This is the release of Metagenome Standard Draft "Rifle Groundwater C1, ASSEMBLY_DATE=2018-03-21T16:05:38".

IMG assembly of Rifle Groundwater C1, ASSEMBLY_DATE=2018-03-21T16:05:38

Proposal Name: Soil microbial communities from Rifle, Colorado, USA
Proposal ID: 564
Analysis Project/Task ID: 1166243/215681
Sequencing Project ID(s): 1006513

Input Data:
Library: HSBN
Platform: Illumina HiSeq-2000
Regular (DNA) 500 bp fragment
RawData: /global/dna/dm_archive/sdm/illumina/00/63/18/6318.5.42196.ATCACG.fastq.gz

Read Pre-processing:
The number of raw input reads is: 104316000

Assembly Stats:
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2847  0.2152  0.2154  0.2847  0.0000  0.0000  0.0000  0.4306  0.1027

Main genome scaffold total: 340101
Main genome contig total: 340101
Main genome scaffold sequence total: 379268725
Main genome contig sequence total: 379268725  0.000% gap
Main genome scaffold N/L50: 27156/2035
Main genome contig N/L50: 27156/2035
Main genome scaffold N/L90: 223157/389
Main genome contig N/L90: 223157/389
Max scaffold length: 754741
Max contig length: 754741
Number of scaffolds > 50 KB: 392
% main genome in scaffolds > 50 KB: 10.44%

Minimum    Number          Number          Total           Total           Scaffold
Scaffold   of              of              Scaffold        Contig          Contig
Length     Scaffolds       Contigs         Length          Length          Coverage
--------   --------------  --------------  --------------  --------------  --------
All        340101          340101          379268725       379268725       100.00%
100        340101          340101          379268725       379268725       100.00%
250        339778          339778          379211132       379211132       100.00%
500        168888          168888          317686099       317686099       100.00%
1000       70486           70486           249415986       249415986       100.00%
2500       20545           20545           174797889       174797889       100.00%
5000       8497            8497            133491192       133491192       100.00%
10000      3538            3538            99415065        99415065        100.00%
25000      1063            1063            62812549        62812549        100.00%
50000      392             392             39606140        39606140        100.00%
100000     107             107             19920419        19920419        100.00%
250000     17              17              6824648         6824648         100.00%
500000     4               4               2496658         2496658         100.00%

Alignment of reads to final assembly:
The number of reads used as input to aligner is: 104316000
The number of aligned reads is: 89077586 (85%)

Assembly Methods:
The assembly was produced by the JGI IMG team as part of their effort to reassemble older projects with modern assembly tools. The steps are similar to the current JGI Assembly protocol (as of 7/2017), with the notable exception of the treatment of contaminants. The JGI Assembly group filters for quality and contaminants prior to assembly; this assembly, however, was produced by the IMG group from the raw/unfiltered reads, and contaminant contigs are treated differently. Large eukaryote contaminant contigs (human, mouse, cat, dog) are discarded, but suspected prokaryote contaminant contigs are not removed and are instead listed in the contaminants.txt file. Users should be aware that the microbes listed in the contaminants file are commonly found in sequencing libraries, where they are introduced by the reagents used in library preparation.

The raw/unfiltered reads were error-corrected using bfc (r181) (2) with a k-mer size of 21. The resulting reads were then assembled using SPAdes (version 3.10.0-dev) (3) with a range of k-mer sizes and the following options:

    --meta --only-assembler -k 21,33,55,77,99,127

The entire filtered read set was mapped to the final assembly, and coverage information was generated, using bwa mem (4; version 0.7.15-r1142-dirty) with default parameters.

Contaminant contigs were identified by mapping to the JGI Assembly group's standard human, dog, cat, and mouse reference databases; contigs with >= 90% length match were discarded. Putative contaminant contigs were identified by mapping contigs to the JGI Assembly group's standard microbial contaminants reference database; contigs with >= 90% length match are listed in the contaminants.txt file but were not removed.

If you have any questions, please contact the JGI project manager. Please indicate whether your questions pertain to the IMG reassembly or the official JGI assembly.

(1) Bushnell, B. BBTools software package (http://bbtools.jgi.doe.gov)
(2) Li, H. (2015) BFC: correcting Illumina sequencing errors. Bioinformatics, 31:2885-2887. doi: 10.1093/bioinformatics/btv290 (https://github.com/lh3/bfc)
(3) Nurk, S., Meleshko, D., Korobeynikov, A. and Pevzner, P.A. (2017) metaSPAdes: a new versatile metagenomic assembler. Genome Research, 27(5):824-834.
(4) Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]

The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
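
Checking the summary statistics:
The summary figures under Assembly Stats can be re-derived from the scaffold lengths and the read counts. The sketch below is illustrative only and is not the code that produced this report. The function names count_and_length_at and percent and the toy length list are hypothetical. It assumes the N/L pairs are reported as count/length, which the table above supports: the scaffolds of at least 2500 bp hold about 46% of the assembly and those of at least 1000 bp hold about 66%, so a length of 2035 at the 50% point is plausible, whereas 27156 is not. It also assumes that the threshold is reached when the running total is greater than or equal to the target fraction.

```python
def count_and_length_at(lengths, fraction):
    """Return (count, length) for the sequence at which the running total
    of lengths, taken longest first, first reaches `fraction` of the
    total assembly size (assumed >= threshold rule)."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("lengths must be non-empty")


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# Toy scaffold lengths (hypothetical), total 30 bp.
toy_lengths = [10, 8, 6, 4, 2]
print(count_and_length_at(toy_lengths, 0.5))  # 50% point as count/length
print(count_and_length_at(toy_lengths, 0.9))  # 90% point as count/length

# Figures taken from this README.
aligned_pct = percent(89077586, 104316000)   # aligned reads / aligner input reads
large_pct = percent(39606140, 379268725)     # bases in scaffolds > 50 KB / total bases
print(f"aligned reads: {aligned_pct:.2f}%")
print(f"main genome in scaffolds > 50 KB: {large_pct:.2f}%")
```

With the real scaffold lengths (for example, parsed from the assembly FASTA), count_and_length_at(lengths, 0.5) and count_and_length_at(lengths, 0.9) would be expected to give the N/L50 and N/L90 pairs if the assumptions above hold.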