JGI assembly of:
Co-Assembly - Metagenome Draft - Combined assembly Anza Borrego MGs, ASSEMBLY_DATE=20250402

Proposal: 1948 - Systems level insights into methane cycling in arid and semi-arid ecosystems via community metagenomics and metatranscriptomics
Principal Investigator: Marina Kalyuzhnaya - mkalyuzhnaya@sdsu.edu
Analysis Project/Task ID: 1541801/625116
Sequencing Project ID(s): 1140290, 1140291, 1140295, 1140297, 1140298, 1140299, 1140300, 1140307, 1140308, 1140309, 1140310, 1140311, 1140313, 1140314, 1140315, 1140317, 1140318

External Data Files:
MERGED_ABUNV1_merged_001.anqdpht.fastq.gz
MERGED_ABUNV2_merged_001.anqdpht.fastq.gz
MERGED_ABUNV3_merged_001.anqdpht.fastq.gz
MERGED_ABUNV4_merged_001.anqdpht.fastq.gz
MERGED_ABUNV5_merged_001.anqdpht.fastq.gz
MERGED_ABVEG1_merged_001.anqdpht.fastq.gz
MERGED_ABVEG2_merged_001.anqdpht.fastq.gz
MERGED_ABVEG3_merged_001.anqdpht.fastq.gz
MERGED_ABVEG4_merged_001.anqdpht.fastq.gz
MERGED_ABVEG5_merged_001.anqdpht.fastq.gz

Assembly Stats:
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.1768  0.3252  0.3239  0.1741  0.0000  0.0000  0.0000  0.6492  0.0916

Main genome scaffold total:             21751919
Main genome contig total:               21751919
Main genome scaffold sequence total:    19617.098 MB
Main genome contig sequence total:      19617.090 MB  0.000% gap
Main genome scaffold N/L50:             6098117/861
Main genome contig N/L50:               6098111/861
Main genome scaffold N/L90:             303776/3.324 KB
Main genome contig N/L90:               303775/3.324 KB
Max scaffold length:                    1.716 MB
Max contig length:                      1.716 MB
Number of scaffolds > 50 KB:            2778
% main genome in scaffolds > 50 KB:     1.52%

Minimum   Number          Number          Total           Total           Scaffold
Scaffold  of              of              Scaffold        Contig          Contig
Length    Scaffolds       Contigs         Length          Length          Coverage
--------  --------------  --------------  --------------  --------------  --------
     All      21,751,919      21,751,919  19,617,097,495  19,617,088,527   100.00%
     250      21,751,919      21,751,919  19,617,097,495  19,617,088,527   100.00%
     500      21,751,919      21,751,919  19,617,097,495  19,617,088,527   100.00%
    1 KB       4,336,369       4,336,369   8,191,704,126   8,191,702,427   100.00%
  2.5 KB         567,447         567,447   2,899,256,911   2,899,256,743   100.00%
    5 KB         125,373         125,373   1,437,594,612   1,437,594,565   100.00%
   10 KB          31,885          31,885     820,999,639     820,999,638   100.00%
   25 KB           7,824           7,824     474,284,902     474,284,902   100.00%
   50 KB           2,778           2,778     298,838,781     298,838,781   100.00%
  100 KB             863             863     168,381,163     168,381,163   100.00%
  250 KB             155             155      66,084,244      66,084,244   100.00%
  500 KB              34              34      26,074,698      26,074,698   100.00%
    1 MB               5               5       6,605,048       6,605,048   100.00%

Alignment Stats:
The number of reads used as input to aligner is: 2555028980
The number of reads aligned is: 1740176086 (68.108%)

Methods:
External reads were interleaved with BBTools (1) version 39.18 [reformat.sh in in2 out] and filtered with BBTools version 39.18 [rqcfilter2.sh barcodefilter=f clumpify=t kapa=t khist=t maq=3 maxns=3 minlen=51 mlf=0.33 phix=t pigz=t qtrim=r removecat=t removedog=t removehuman=t removemicrobes=t removemouse=t rna=f sketch tree trimfragadapter=t trimpolyg=5 trimq=0 unpigz=t usejni=f recalibrate=f filterbytile=f]. Filtered JGI reads (2) and filtered external reads were co-assembled with MetaHipMer2 (3) version 2.2.1.0.v2.2.1-1-ge2b8674c-master [mhm2.py --checkpoint=true --post-asm-only=true --checkpoint --adapter-refs all_adapters.fa] on 63 CPU nodes on the NERSC Perlmutter system. Contigs smaller than 500 bp were removed. Alignment information was determined by mapping reads to the assembly reference with MetaHipMer2 version 2.2.1.0.v2.2.1-1-ge2b8674c-master [--post-asm-only]. Coverage was determined by running BBTools (1) version 39.15 [pileup.sh].

(1) B. Bushnell: BBTools software package, http://bbtools.jgi.doe.gov
(2) genome.jgi.doe.gov/lookup?keyName=jgiProjectId&keyValue=1541800
(3) Hofmeyr, S. et al. Terabase-scale metagenome coassembly with MetaHipMer. Sci Rep 10, 10689 (2020).
https://doi.org/10.1038/s41598-020-67416-5

If you have any questions, please let us know: Robert Riley (rwriley@lbl.gov), Kurt LaButti (klabutti@lbl.gov), or Alex Copeland (accopeland@lbl.gov).

The work (proposal: 10.46936/10.25585/60001056) conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231.

JGI Input Data:

Sequencing Project ID: 1140290
Library: BXTSS
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11793.5.220090.AGTCAA.filter-METAGENOME.fastq.gz
Filtered Read Count: 162721448
Filtered Base Count: 24248052685
Raw Data: 11793.5.220090.AGTCAA.fastq.gz
Raw Read Count: 188400466
Raw Base Count: 28448470366

Sequencing Project ID: 1140291
Library: BZZHW
Platform: Illumina, HiSeq-2000 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11945.1.229695.ATCACG.filter-METAGENOME.fastq.gz
Filtered Read Count: 122455218
Filtered Base Count: 18348332857
Raw Data: 11945.1.229695.ATCACG.fastq.gz
Raw Read Count: 123769674
Raw Base Count: 18689220774

Sequencing Project ID: 1140295
Library: BZZHX
Platform: Illumina, HiSeq-2000 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11913.4.227489.CGATGT.filter-METAGENOME.fastq.gz
Filtered Read Count: 60923640
Filtered Base Count: 9130255871
Raw Data: 11913.4.227489.CGATGT.fastq.gz
Raw Read Count: 61783070
Raw Base Count: 9329243570

Sequencing Project ID: 1140295
Library: BZZHX
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 12038.1.234295.CGATGT.filter-METAGENOME.fastq.gz
Filtered Read Count: 72088478
Filtered Base Count: 10804760346
Raw Data: 12038.1.234295.CGATGT.fastq.gz
Raw Read Count: 73003560
Raw Base Count: 11023537560

Sequencing Project ID: 1140297
Library: BZZHY
Platform: Illumina, HiSeq-2000 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11913.4.227489.TTAGGC.filter-METAGENOME.fastq.gz
Filtered Read Count: 59171058
Filtered Base Count: 8869184284
Raw Data: 11913.4.227489.TTAGGC.fastq.gz
Raw Read Count: 59406740
Raw Base Count: 8970417740

Sequencing Project ID: 1140297
Library: BZZHY
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 12038.1.234295.TTAGGC.filter-METAGENOME.fastq.gz
Filtered Read Count: 69534746
Filtered Base Count: 10423537078
Raw Data: 12038.1.234295.TTAGGC.fastq.gz
Raw Read Count: 69871208
Raw Base Count: 10550552408

Sequencing Project ID: 1140298
Library: BXTST
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11793.6.220094.AGTTCC.filter-METAGENOME.fastq.gz
Filtered Read Count: 193511086
Filtered Base Count: 28948390492
Raw Data: 11793.6.220094.AGTTCC.fastq.gz
Raw Read Count: 205965188
Raw Base Count: 31100743388

Sequencing Project ID: 1140299
Library: BXTSU
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11793.6.220094.ATGTCA.filter-METAGENOME.fastq.gz
Filtered Read Count: 200784668
Filtered Base Count: 30060466304
Raw Data: 11793.6.220094.ATGTCA.fastq.gz
Raw Read Count: 214222344
Raw Base Count: 32347573944

Sequencing Project ID: 1140300
Library: BXTXY
Platform: Illumina, HiSeq-2500 1TB, Illumina Regular Fragment, 300bp, Plates
Filtered Data: 11774.3.218920.AGTAGTC-GGACTAC.filter-METAGENOME.fastq.gz
Filtered Read Count: 91063564
Filtered Base Count: 13632166869
Raw Data: 11774.3.218920.AGTAGTC-GGACTAC.fastq.gz
Raw Read Count: 94306158
Raw Base Count: 14240229858

Sequencing Project ID: 1140307
Library: BXTXZ
Platform: Illumina, HiSeq-2500 1TB, Illumina Regular Fragment, 300bp, Plates
Filtered Data: 11774.1.218908.AGAGCCT-AAGGCTC.filter-METAGENOME.fastq.gz
Filtered Read Count: 56669860
Filtered Base Count: 8480326075
Raw Data: 11774.1.218908.AGAGCCT-AAGGCTC.fastq.gz
Raw Read Count: 59383746
Raw Base Count: 8966945646

Sequencing Project ID: 1140308
Library: BXTYA
Platform: Illumina, HiSeq-2500 1TB, Illumina Regular Fragment, 300bp, Plates
Filtered Data: 11774.1.218908.TCTCTTC-GGAAGAG.filter-METAGENOME.fastq.gz
Filtered Read Count: 60953476
Filtered Base Count: 9128302546
Raw Data: 11774.1.218908.TCTCTTC-GGAAGAG.fastq.gz
Raw Read Count: 64877146
Raw Base Count: 9796449046

Sequencing Project ID: 1140309
Library: BXTYB
Platform: Illumina, HiSeq-2500 1TB, Illumina Regular Fragment, 300bp, Plates
Filtered Data: 11774.1.218908.GAGGACT-AAGTCCT.filter-METAGENOME.fastq.gz
Filtered Read Count: 63242506
Filtered Base Count: 9468476400
Raw Data: 11774.1.218908.GAGGACT-AAGTCCT.fastq.gz
Raw Read Count: 66199416
Raw Base Count: 9996111816

Sequencing Project ID: 1140310
Library: BXTSW
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11870.1.224598.CCGTCC.filter-METAGENOME.fastq.gz
Filtered Read Count: 61958714
Filtered Base Count: 9279438572
Raw Data: 11870.1.224598.CCGTCC.fastq.gz
Raw Read Count: 65119652
Raw Base Count: 9833067452

Sequencing Project ID: 1140311
Library: BXTSX
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11870.1.224598.GTAGAG.filter-METAGENOME.fastq.gz
Filtered Read Count: 64826722
Filtered Base Count: 9708296911
Raw Data: 11870.1.224598.GTAGAG.fastq.gz
Raw Read Count: 73464796
Raw Base Count: 11093184196

Sequencing Project ID: 1140313
Library: BXTYC
Platform: Illumina, HiSeq-2500 1TB, Illumina Regular Fragment, 300bp, Plates
Filtered Data: 11774.1.218908.AGAATGC-GGCATTC.filter-METAGENOME.fastq.gz
Filtered Read Count: 50554906
Filtered Base Count: 7564469291
Raw Data: 11774.1.218908.AGAATGC-GGCATTC.fastq.gz
Raw Read Count: 53870994
Raw Base Count: 8134520094

Sequencing Project ID: 1140314
Library: BXTYG
Platform: Illumina, HiSeq-2500 1TB, Illumina Regular Fragment, 300bp, Plates
Filtered Data: 11774.1.218908.GCTGGAT-AATCCAG.filter-METAGENOME.fastq.gz
Filtered Read Count: 49669120
Filtered Base Count: 7433492499
Raw Data: 11774.1.218908.GCTGGAT-AATCCAG.fastq.gz
Raw Read Count: 52173414
Raw Base Count: 7878185514

Sequencing Project ID: 1140315
Library: BXTSY
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11870.1.224598.GTCCGC.filter-METAGENOME.fastq.gz
Filtered Read Count: 67594574
Filtered Base Count: 10125713609
Raw Data: 11870.1.224598.GTCCGC.fastq.gz
Raw Read Count: 71921570
Raw Base Count: 10860157070

Sequencing Project ID: 1140317
Library: BXTSZ
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11870.1.224598.GTGAAA.filter-METAGENOME.fastq.gz
Filtered Read Count: 69639150
Filtered Base Count: 10431293924
Raw Data: 11870.1.224598.GTGAAA.fastq.gz
Raw Read Count: 76433872
Raw Base Count: 11541514672

Sequencing Project ID: 1140318
Library: BXTTA
Platform: Illumina, HiSeq-2500 1TB, Illumina Low Input Fragment, 300bp, Tubes
Filtered Data: 11870.1.224598.GTGGCC.filter-METAGENOME.fastq.gz
Filtered Read Count: 74837962
Filtered Base Count: 11213091237
Raw Data: 11870.1.224598.GTGGCC.fastq.gz
Raw Read Count: 79953144
Raw Base Count: 12072924744
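
Example: checking read retention and alignment rate

The read counts listed in this report can be cross-checked with a few lines of Python. The sketch below is not part of the JGI pipeline. It assumes the report text is available as a string, and the names `RECORD`, `read_retention`, `aligned_percent`, and `SAMPLE` are hypothetical helpers introduced only for illustration. It extracts each "JGI Input Data" record, computes the fraction of raw reads that survived filtering, and recomputes the aligned-read percentage from the two counts under "Alignment Stats".

```python
import re

# Two records copied verbatim from the "JGI Input Data" section (whitespace-insensitive).
SAMPLE = """
Sequencing Project ID: 1140290 Library: BXTSS
Filtered Read Count: 162721448 Filtered Base Count: 24248052685
Raw Data: 11793.5.220090.AGTCAA.fastq.gz
Raw Read Count: 188400466 Raw Base Count: 28448470366
Sequencing Project ID: 1140291 Library: BZZHW
Filtered Read Count: 122455218 Filtered Base Count: 18348332857
Raw Data: 11945.1.229695.ATCACG.fastq.gz
Raw Read Count: 123769674 Raw Base Count: 18689220774
"""

# Matches one per-library record. The header line "Sequencing Project ID(s):"
# is not matched because of the "(s)" before the colon.
RECORD = re.compile(
    r"Sequencing Project ID:\s*(?P<project>\d+)\s+"
    r"Library:\s*(?P<library>\S+).*?"
    r"Filtered Read Count:\s*(?P<filtered>\d+).*?"
    r"Raw Read Count:\s*(?P<raw>\d+)",
    re.DOTALL,
)


def read_retention(text):
    """Return (project, library, filtered_reads, raw_reads, retained_fraction) per record."""
    rows = []
    for m in RECORD.finditer(text):
        filtered = int(m["filtered"])
        raw = int(m["raw"])
        rows.append((m["project"], m["library"], filtered, raw, filtered / raw))
    return rows


def aligned_percent(aligned_reads, input_reads):
    """Percentage of aligner input reads that aligned, rounded to three decimals."""
    return round(100.0 * aligned_reads / input_reads, 3)


if __name__ == "__main__":
    for project, library, filtered, raw, frac in read_retention(SAMPLE):
        print(f"{project} {library}: {filtered}/{raw} reads kept ({frac:.1%})")
    # Counts taken from the "Alignment Stats" section of this report.
    print(aligned_percent(1740176086, 2555028980))
```

Run against the full report text instead of `SAMPLE`, the same function should yield one row per library; project IDs 1140295 and 1140297 each appear twice because they were sequenced on two runs.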