This is the release of Metagenome Standard Draft "Rifle Groundwater A2, ASSEMBLY_DATE=2018-03-21T16:19:45".

IMG assembly of Rifle Groundwater A2, ASSEMBLY_DATE=2018-03-21T16:19:45

Proposal Name:  Soil microbial communities from Rifle, Colorado, USA
Proposal ID: 564
Analysis Project/Task ID: 1165726/214647
Sequencing Project ID(s): 1006504

Input Data:
Library: CZBC
Platform: Illumina HiSeq-2000 Regular (DNA) 500 bp fragment
RawData: /global/dna/dm_archive/sdm/illumina/00/61/81/6181.4.39404.fastq.gz

Read Pre-processing:
The number of raw input reads is: 268060728

Assembly Stats:
A	C	G	T	N	IUPAC	Other	GC	GC_stdev
0.2657	0.2344	0.2343	0.2656	0.0000	0.0000	0.0000	0.4687	0.1124

Main genome scaffold total:         	1424945
Main genome contig total:           	1424945
Main genome scaffold sequence total:	1499332537
Main genome contig sequence total:  	1499332537  	0.000% gap
Main genome scaffold N/L50:         	119017/1975
Main genome contig N/L50:           	119017/1975
Main genome scaffold N/L90:         	950109/362
Main genome contig N/L90:           	950109/362
Max scaffold length:                	1553869
Max contig length:                  	1553869
Number of scaffolds > 50 KB:        	1207
% main genome in scaffolds > 50 KB: 	8.15%
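
Note on reading the N/L values: the figures above are consistent with the first number in each pair being a scaffold count and the second a length in bp (the length table in this file lists 87238 scaffolds of at least 2500 bp and 286778 of at least 1000 bp, which brackets 119017/1975). The following is a minimal Python sketch of that calculation under this interpretation; the function name and the toy lengths are hypothetical and are not part of the JGI pipeline.

```python
def count_and_length_at(lengths, fraction=0.5):
    """Walk scaffolds from longest to shortest and return (count, length)
    at the point where the running total first reaches `fraction` of the
    summed length. With fraction=0.5 this corresponds to the N/L50 pair."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    return 0, 0  # empty input


# Toy example (hypothetical lengths, not taken from this assembly).
toy = [10, 8, 6, 4, 2]
print(count_and_length_at(toy, 0.5))
print(count_and_length_at(toy, 0.9))
```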


Minimum 	Number        	Number        	Total         	Total         	Scaffold
Scaffold	of            	of            	Scaffold      	Contig        	Contig  
Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
--------	--------------	--------------	--------------	--------------	--------
    All 	       1424945	       1424945	    1499332537	    1499332537	 100.00%
    100 	       1424945	       1424945	    1499332537	    1499332537	 100.00%
    250 	       1423576	       1423576	    1499094581	    1499094581	 100.00%
    500 	        614542	        614542	    1210202004	    1210202004	 100.00%
   1000 	        286778	        286778	     978562432	     978562432	 100.00%
   2500 	         87238	         87238	     679391692	     679391692	 100.00%
   5000 	         35016	         35016	     500510510	     500510510	 100.00%
  10000 	         13723	         13723	     354366289	     354366289	 100.00%
  25000 	          3607	          3607	     203677780	     203677780	 100.00%
  50000 	          1208	          1208	     122178167	     122178167	 100.00%
 100000 	           340	           340	      64250745	      64250745	 100.00%
 250000 	            52	            52	      22528799	      22528799	 100.00%
 500000 	            11	            11	       8722210	       8722210	 100.00%
1000000 	             3	             3	       3747220	       3747220	 100.00%
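
The "% main genome in scaffolds > 50 KB" figure can be reproduced from the table above by dividing the total scaffold length in the 50000 row by the total in the All row. A small Python sketch of that arithmetic follows; the constant and function names are hypothetical, and the numbers are copied from the table.

```python
TOTAL_BP = 1_499_332_537          # "All" row, Total Scaffold Length
BP_IN_50KB_ROW = 122_178_167      # "50000" row, Total Scaffold Length


def percent_of_assembly(part_bp, total_bp):
    """Share of the assembly, in percent, held by a subset of scaffolds."""
    return 100.0 * part_bp / total_bp


print(f"{percent_of_assembly(BP_IN_50KB_ROW, TOTAL_BP):.2f}%")  # prints 8.15%
```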


Alignment of reads to final assembly:
The number of reads used as input to aligner is: 268060728
The number of aligned reads is: 216787618 (81%)
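
The 81% figure above is the aligned read count divided by the aligner input count, rounded to a whole percent. A short Python check of that arithmetic, using only the two counts reported here (the variable names are hypothetical):

```python
INPUT_READS = 268_060_728     # reads used as input to the aligner
ALIGNED_READS = 216_787_618   # reads reported as aligned

aligned_pct = 100.0 * ALIGNED_READS / INPUT_READS
unaligned_reads = INPUT_READS - ALIGNED_READS

print(f"aligned: {aligned_pct:.2f}% (rounds to {round(aligned_pct)}%)")
print(f"unaligned reads: {unaligned_reads}")
```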

Assembly Methods:

The assembly was produced by the JGI IMG team as part of their effort to reassemble older projects with modern assembly tools.  The steps are similar to the current JGI Assembly protocol (as of 7/2017), with the notable exception of the treatment of contaminants.  The JGI Assembly group filters reads for quality and contaminants prior to assembly; this assembly, however, was produced by the IMG group from the raw/unfiltered reads, and contaminant contigs are treated differently.  Large eukaryote contaminant contigs (human, mouse, cat, dog) are discarded, but suspected prokaryote contaminant contigs are not removed and are instead listed in the contaminants.txt file.  Users should be aware that the microbes indicated in the contaminants file are commonly found in sequencing libraries, having been introduced by the reagents used in library preparation.

The raw/unfiltered reads were error-corrected using bfc (2; version r181) with a k-mer size of 21.

The resulting reads were then assembled using the SPAdes assembler (3; version 3.10.0-dev) with a range of k-mer sizes and the following options: --meta --only-assembler -k 21,33,55,77,99,127

The entire filtered read set was mapped to the final assembly and coverage information generated using bwa mem (4; version 0.7.15-r1142-dirty) using default parameters.
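
As a rough illustration of the three steps above (correction, assembly, mapping), the Python sketch below only builds the command lines and does not run anything. The options -k 21 for bfc and --meta --only-assembler -k 21,33,55,77,99,127 for SPAdes come from this document; everything else is an assumption for illustration: the file names, the thread count, the use of interleaved paired reads (--12), and mapping against contigs.fasta in the SPAdes output directory. The exact invocations used by the IMG group are not given here.

```python
def build_commands(reads, assembly_dir, threads=16):
    """Return illustrative argument lists for each step (nothing is executed)."""
    corrected = "corrected.fastq"                 # hypothetical file name
    contigs = f"{assembly_dir}/contigs.fasta"     # assumed SPAdes output used for mapping
    return {
        # bfc writes corrected reads to stdout; redirect it to `corrected`.
        "correct": ["bfc", "-k", "21", "-t", str(threads), reads],
        # SPAdes options as listed in this document, plus assumed input/output.
        "assemble": ["spades.py", "--meta", "--only-assembler",
                     "-k", "21,33,55,77,99,127",
                     "--12", corrected, "-o", assembly_dir],
        # bwa needs an index of the assembly before mapping.
        "index": ["bwa", "index", contigs],
        # "default parameters": no extra options passed to bwa mem.
        "map": ["bwa", "mem", contigs, reads],
    }


for step, argv in build_commands("reads.fastq.gz", "spades_out").items():
    print(step, ":", " ".join(argv))
```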

Contaminant contigs were identified by mapping contigs to the JGI Assembly group's standard human, dog, cat, and mouse reference databases; contigs with >= 90% length match were discarded.  Putative contaminant contigs were identified by mapping contigs to the JGI Assembly group's standard microbial contaminants reference database; contigs with >= 90% length match were listed in the contaminants.txt file but not removed.
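
The decision rule described above can be sketched in Python as follows. This is an illustration only: the function name, the database labels, and the way the matched length is supplied are hypothetical, and how the "length match" is actually computed by the JGI tools is not specified in this document.

```python
def classify_contig(contig_len, matched_len, database, threshold=0.90):
    """Apply the >= 90% length-match rule to one contig.

    database: "eukaryote" (human, dog, cat, mouse references) or
              "microbial" (microbial contaminants reference).
    Returns "discard", "list" (kept, but recorded in contaminants.txt),
    or "keep".
    """
    if contig_len <= 0:
        raise ValueError("contig_len must be positive")
    if matched_len / contig_len < threshold:
        return "keep"
    if database == "eukaryote":
        return "discard"
    if database == "microbial":
        return "list"
    raise ValueError(f"unknown database: {database}")


print(classify_contig(1000, 950, "eukaryote"))
print(classify_contig(1000, 950, "microbial"))
print(classify_contig(1000, 500, "microbial"))
```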

If you have any questions, please contact the JGI project manager.  Please indicate if your questions pertain to the IMG reassembly or the official JGI assembly.

(1) B. Bushnell: BBTools software package (http://bbtools.jgi.doe.gov)
(2) Li, H. BFC: correcting Illumina sequencing errors. Bioinformatics 31, 2885-2887 (2015). Doi: 10.1093/bioinformatics/btv290 (https://github.com/lh3/bfc)
(3) Nurk, S., Meleshko, D., Korobeynikov, A. and Pevzner, P.A., 2017. metaSPAdes: a new versatile metagenomic assembler. Genome research, 27(5), pp.824-834.
(4) Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]

The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.