This is the release of Metagenome Standard Draft "Rifle Groundwater A2, ASSEMBLY_DATE=2018-03-21T16:19:45".

IMG assembly of Rifle Groundwater A2, ASSEMBLY_DATE=2018-03-21T16:19:45

Proposal Name: Soil microbial communities from Rifle, Colorado, USA
Proposal ID: 564
Analysis Project/Task ID: 1165726/214647
Sequencing Project ID(s): 1006504

Input Data:
Library: CZBC
Platform: Illumina HiSeq-2000 Regular (DNA) 500 bp fragment
RawData: /global/dna/dm_archive/sdm/illumina/00/61/81/6181.4.39404.fastq.gz

Read Pre-processing:
The number of raw input reads is: 268060728

Assembly Stats:
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2657  0.2344  0.2343  0.2656  0.0000  0.0000  0.0000  0.4687  0.1124

Main genome scaffold total: 1424945
Main genome contig total: 1424945
Main genome scaffold sequence total: 1499332537
Main genome contig sequence total: 1499332537  0.000% gap
Main genome scaffold N/L50: 119017/1975
Main genome contig N/L50: 119017/1975
Main genome scaffold N/L90: 950109/362
Main genome contig N/L90: 950109/362
Max scaffold length: 1553869
Max contig length: 1553869
Number of scaffolds > 50 KB: 1207
% main genome in scaffolds > 50 KB: 8.15%

Minimum   Number          Number          Total           Total           Scaffold
Scaffold  of              of              Scaffold        Contig          Contig
Length    Scaffolds       Contigs         Length          Length          Coverage
--------  --------------  --------------  --------------  --------------  --------
     All         1424945         1424945      1499332537      1499332537   100.00%
     100         1424945         1424945      1499332537      1499332537   100.00%
     250         1423576         1423576      1499094581      1499094581   100.00%
     500          614542          614542      1210202004      1210202004   100.00%
    1000          286778          286778       978562432       978562432   100.00%
    2500           87238           87238       679391692       679391692   100.00%
    5000           35016           35016       500510510       500510510   100.00%
   10000           13723           13723       354366289       354366289   100.00%
   25000            3607            3607       203677780       203677780   100.00%
   50000            1208            1208       122178167       122178167   100.00%
  100000             340             340        64250745        64250745   100.00%
  250000              52              52        22528799        22528799   100.00%
  500000              11              11         8722210         8722210   100.00%
 1000000               3               3         3747220         3747220   100.00%

Alignment of reads to final assembly:
The number of reads used as input to aligner is: 268060728
The number of aligned reads is: 216787618 (81%)

Assembly Methods:
The assembly was produced by the JGI IMG team as part of their effort to reassemble older projects with modern assembly tools. The steps are similar to the current JGI Assembly protocol (as of 7/2017), with the notable exception of the treatment of contaminants. The JGI Assembly group filters for quality and contaminants prior to assembly; however, this assembly was produced by the IMG group using the raw/unfiltered reads, and contaminant contigs are treated differently. Large eukaryote contaminant contigs (human, mouse, cat, dog) are discarded, but suspected prokaryote contigs are not removed and are listed in the contaminants.txt file instead. Users should be aware that microbes indicated in the contaminants file are commonly found in sequencing libraries, being introduced from the reagents used in library preparation.

The raw/unfiltered reads were error-corrected using bfc (2; r181) with a k-mer size of 21. The resulting reads were then assembled using the SPAdes assembler (3; SPAdes version 3.10.0-dev) using a range of k-mer sizes with the following options:

--meta --only-assembler -k 21,33,55,77,99,127

The entire filtered read set was mapped to the final assembly and coverage information generated using bwa mem (4; version 0.7.15-r1142-dirty) with default parameters.

Contaminant contigs were identified by mapping to the JGI Assembly group's standard human, dog, cat, and mouse reference databases; contigs with >= 90% length match were discarded. Putative contaminant contigs were identified by mapping contigs to the JGI Assembly group's standard microbial contaminants reference database; contigs with >= 90% length match are listed in the contaminants.txt file, but not removed.

If you have any questions, please contact the JGI project manager. Please indicate whether your questions pertain to the IMG reassembly or the official JGI assembly.

(1) B. Bushnell: BBTools software package (http://bbtools.jgi.doe.gov)
(2) Li, H. BFC: correcting Illumina sequencing errors. Bioinformatics 31, 2885-2887 (2015). doi: 10.1093/bioinformatics/btv290 (https://github.com/lh3/bfc)
(3) Nurk, S., Meleshko, D., Korobeynikov, A. and Pevzner, P.A., 2017. metaSPAdes: a new versatile metagenomic assembler. Genome research, 27(5), pp.824-834.
(4) Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]

The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
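
Note on reading the N/L50 and N/L90 values:
In the Assembly Stats above, each "N/L" pair is given as count/length. For example, "N/L50: 119017/1975" is consistent with the length table: the 50% point of 1499332537 bp falls between the 2500 bp row (87238 scaffolds) and the 1000 bp row (286778 scaffolds). The following is a minimal Python sketch of that count/length convention, not the code that produced this report. The function name n_l_stat and the toy lengths are illustrative assumptions, and the reporting tool may handle ties or rounding differently.

```python
from typing import Iterable, Tuple


def n_l_stat(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (count, length) in the report's "N/L" order.

    count  = fewest sequences, taken longest first, whose lengths sum to
             at least `fraction` of the total assembly length
    length = length of the shortest sequence in that set
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    # Only reached if fraction > 1; fall back to the whole set.
    return len(ordered), ordered[-1]


if __name__ == "__main__":
    toy_lengths = [100, 80, 60, 40, 20]  # hypothetical contig lengths (bp)
    print(n_l_stat(toy_lengths, 0.5))  # N/L50 of the toy set
    print(n_l_stat(toy_lengths, 0.9))  # N/L90 of the toy set
```

To apply this to the real assembly, the lengths would have to be read from the released contig FASTA file, which is not part of this README.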
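
Note on the reported percentages:
The two percentages in this README can be re-derived from the counts it lists. The sketch below is only an arithmetic check; the helper name percent is an illustrative assumption. One caveat: the length table row for 50000 counts 1208 scaffolds with length >= 50000 bp, while the summary line counts 1207 scaffolds > 50 KB, so the base total used here may include one scaffold that the summary line excludes.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts copied from this README.
READS_INPUT_TO_ALIGNER = 268_060_728
READS_ALIGNED = 216_787_618
TOTAL_SCAFFOLD_BP = 1_499_332_537
BP_IN_SCAFFOLDS_50K_ROW = 122_178_167  # table row "50000" (>= 50000 bp)

aligned_pct = percent(READS_ALIGNED, READS_INPUT_TO_ALIGNER)
large_pct = percent(BP_IN_SCAFFOLDS_50K_ROW, TOTAL_SCAFFOLD_BP)

print(f"aligned reads: {aligned_pct:.0f}%")          # aligned reads: 81%
print(f"bp in large scaffolds: {large_pct:.2f}%")    # bp in large scaffolds: 8.15%
```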
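
Note on the contaminant rules:
The Assembly Methods describe two outcomes for contigs with >= 90% length match: discard for the eukaryote databases (human, dog, cat, mouse), and list-but-keep for the microbial contaminants database. The Python sketch below restates that decision rule under stated assumptions. The ContigHit record, its field names, and the "eukaryote"/"microbial" labels are hypothetical; the README does not describe the mapping tool's output format or how "length match" is computed.

```python
from dataclasses import dataclass

MIN_MATCH_PERCENT = 90  # ">= 90% length match" from the Assembly Methods


@dataclass
class ContigHit:
    """Hypothetical summary of one contig's best match to a reference database."""
    contig_id: str
    contig_length: int   # bp
    matched_length: int  # bp of the contig covered by the match
    database: str        # "eukaryote" or "microbial" (illustrative labels)


def classify(hit: ContigHit) -> str:
    """Return "discard", "list", or "keep" for one contig."""
    if hit.contig_length <= 0:
        raise ValueError("contig_length must be positive")
    # Integer comparison avoids floating-point edge cases at exactly 90%.
    if hit.matched_length * 100 < MIN_MATCH_PERCENT * hit.contig_length:
        return "keep"
    if hit.database == "eukaryote":
        return "discard"  # removed from the assembly
    if hit.database == "microbial":
        return "list"     # kept, but reported in contaminants.txt
    raise ValueError(f"unknown database label: {hit.database}")


if __name__ == "__main__":
    for hit in (
        ContigHit("contig_a", 1000, 950, "eukaryote"),
        ContigHit("contig_b", 1000, 900, "microbial"),
        ContigHit("contig_c", 1000, 500, "eukaryote"),
    ):
        print(hit.contig_id, classify(hit))
```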
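
Note on reproducing the tool invocations:
The Assembly Methods give the SPAdes options and the bfc k-mer size, but not complete command lines. The Python sketch below assembles plausible argument lists for illustration only and does not run anything. Everything beyond "--meta --only-assembler -k 21,33,55,77,99,127" and the k-mer size of 21 is an assumption: the file names, the use of --12 (interleaved paired reads) for SPAdes input, the choice of contigs.fasta as the mapped assembly, and which read file is passed to bwa mem. The exact commands used by the IMG group are not stated in this README.

```python
from typing import Dict, List

# Taken from the Assembly Methods text.
SPADES_OPTIONS = ["--meta", "--only-assembler", "-k", "21,33,55,77,99,127"]
BFC_KMER = "21"


def build_commands(
    raw_reads: str,
    corrected_reads: str,
    out_dir: str,
    reads_to_map: str,
) -> Dict[str, List[str]]:
    """Return illustrative argv lists for each step (all paths hypothetical)."""
    assembly = f"{out_dir}/contigs.fasta"  # assumption: contigs file is mapped
    return {
        # bfc writes corrected reads to stdout; a real run would redirect
        # that output into `corrected_reads`.
        "correct": ["bfc", "-k", BFC_KMER, raw_reads],
        # Assumption: reads are interleaved pairs, hence --12.
        "assemble": ["spades.py", *SPADES_OPTIONS,
                     "--12", corrected_reads, "-o", out_dir],
        "index": ["bwa", "index", assembly],
        # "default parameters" per the README, so no extra flags here.
        "map": ["bwa", "mem", assembly, reads_to_map],
    }


if __name__ == "__main__":
    commands = build_commands(
        raw_reads="reads.fastq.gz",            # hypothetical file names
        corrected_reads="reads.bfc.fastq.gz",
        out_dir="spades_out",
        reads_to_map="reads.fastq.gz",
    )
    for step, argv in commands.items():
        print(f"{step}: {' '.join(argv)}")
```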