Table of Contents
	Project Information
		report description
		projects db info
		taxonomy summary
		genome size estimates
		contamination summary
		project base counts
	Libraries and Reads
		assembled average insert size estimates
		library / read quality summary
		reads2plates summary
		trimmed read length histograms
		library vector screening
		GC Content of reads histogram
	Contigs and Assemblies
		contig size and read count table
		depth summary
		depth histogram
		depth values
	Assembler Specific Info
		reads in assembly summary from assembler
		assembly parameters

Project Information

-------------------------------------------------------------------
Assembly QC Report
Date: 12-12-2005
Runby: Kerrie Barry
Description: analysis of libraries in progress (or initial analysis of completed)
-------------------------------------------------------------------

-------------------------------------------------------------------
Project information from 'PROJECTS' db
-------------------------------------------------------------------
Project  Size(KB)  TaxID  GenusSpecies
3634478  4500      863    Syntrophomonas wolfei

-------------------------------------------------------------------
Taxonomy summary
Command: /home/copeland/scripts/tax2tree.sh Syntrophomonas_wolfei
-------------------------------------------------------------------
Clostridiales, order, eubacteria
Clostridia, class, eubacteria
Syntrophomonadaceae, family, eubacteria
Firmicutes (Gram-positive bacteria), phylum, eubacteria
Syntrophomonas wolfei, species, eubacteria
Syntrophomonas, genus, eubacteria
Bacteria (eubacteria), superkingdom, eubacteria
cellular organisms
root

-------------------------------------------------------------------
Genome size estimates
-------------------------------------------------------------------
# contigs: 4632475
# phrap: 3047800
# db: altered.
4500000 4060091 +/- 717838

-------------------------------------------------------------------
Contam Summary with *.contigs:
Command: /psf/QC/bin/sparc/summarizeCrossMatchHits -o
-------------------------------------------------------------------
Number of reads with X's: 3421
Number of reads with percent X's >= 20%: 541 = 3.6%
Number of reads with percent X's >= 50%: 490 = 3.2%
Number of reads with percent X's >= 80%: 418 = 2.8%
Total reads in project: 15135
Total bp X'd : 567172

                             reads  >= 20%  >= 50%  >= 80%  screened
Nr with L09136                3031     166     148     122
Nr with pMCL200_JGI_XZX+XZK    390     375     342     296

-------------------------------------------------------------------
Contam Summary with *.singlets:
Command: /psf/QC/bin/sparc/summarizeCrossMatchHits -o -s
-------------------------------------------------------------------

-------------------------------------------------------------------
Base Count for Project:
Command: /home/copeland/scripts/projectBaseCount.pl phrap.out
-------------------------------------------------------------------
A = 4455099
C = 2870916
G = 2871352
T = 4343444
N = 104796
X = 581467
GC fraction = 0.38
Total = 15227074

-------------------------------------------------------------------
Base Count for contigs:
Command: /psf/QC/bin/sparc/faCount 3634478_fasta.screen.contigs
-------------------------------------------------------------------
A 1537447
C 915847
G 912906
T 1539826
N 3615
fraction GC = 0.37
total bases = 4909641
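
The two GC fractions above can be reproduced from the listed base counts. The following is a minimal Python sketch, not the code of projectBaseCount.pl or faCount. It assumes the fraction is (C + G) divided by the total base count including N and X; that assumption is inferred from the reported numbers, and the function name gc_fraction is hypothetical.

```python
def gc_fraction(counts):
    """Return (C + G) / total, where total includes N and X (assumed)."""
    total = sum(counts.values())
    return (counts["C"] + counts["G"]) / total

# Base counts copied from the two "Base Count" blocks of the report.
project = {"A": 4455099, "C": 2870916, "G": 2871352,
           "T": 4343444, "N": 104796, "X": 581467}
contigs = {"A": 1537447, "C": 915847, "G": 912906,
           "T": 1539826, "N": 3615}

project_total = sum(project.values())
contigs_total = sum(contigs.values())
project_gc = round(gc_fraction(project), 2)
contigs_gc = round(gc_fraction(contigs), 2)
print(project_total, project_gc)
print(contigs_total, contigs_gc)
```

Dividing by A + C + G + T only would give about 0.39 for the project, which does not match the reported 0.38, so the denominator appears to include the N and X bases.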

Libraries and Reads

-------------------------------------------------------------------
Histogram of Assembled Average Insert Sizes:
Command: /home/copeland/scripts/phrapView2.pl -p phrap.out -C > reads.list
-------------------------------------------------------------------

-------------------------------------------------------------------
Command: /usr/xpg4/bin/grep AHYO reads.list > grep.reads.list.AHYO
Command: /home/copeland/scripts/histogram2.pl grep.reads.list.AHYO 4 500
-------------------------------------------------------------------
#Found 1703 total values totalling 5688004.0000. <3339.990605 +/- 328.529309>
#Range: [ 1092 - 4097 ]
#Most likely bin: [ 3000 - 3500 ] 1061 counts
#Median bin: [ 3000 - 3500 ] 1061 counts
#Histogram Bins Count Fraction Cum_Fraction
| 1000 - 1500 : [ 7 0.00 0.00 ]
| 1500 - 2000 : [ 5 0.00 0.01 ]
| 2000 - 2500 : [ 9 0.01 0.01 ]
|XXXXX 2500 - 3000 : [ 124 0.07 0.09 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3000 - 3500 : [ 1061 0.62 0.71 ]
|XXXXXXXXXXXXXXXXXX 3500 - 4000 : [ 478 0.28 0.99 ]
|X 4000 - 4500 : [ 19 0.01 1.00 ]

-------------------------------------------------------------------
Estimated Assembled Average Insert Sizes:
Command: /home/copeland/scripts/estInsertSize.pl -f phrap.out
-------------------------------------------------------------------
# AHYO 3179 +- 435 (n=878)
# AHYP 4830 +- 2484 (n=95)

-------------------------------------------------------------------
Library / Read Quality summary extracted from
3634478_fasta.screen.trimQ15.SaF and database (md_run table)
* note J15 is Jazz trimmed length, Q20 is count of Quality 20+ bases
-------------------------------------------------------------------
      DB     |----J15---|  |----Q20---|  Fasta  |----J15---|  |----Q20---|
LIB   Reads  %pass AvgLen  %pass AvgNum  Reads  %pass AvgLen  %pass AvgNum
AHYO  15360     94    762     97    759   7647     89    680     93    694
AHYP   7680     93    813     97    788   7488     93    814     97    812

      FWD    |----J15---|  |----Q20---|  REV    |----J15---|  |----Q20---|
LIB   Reads  %pass AvgLen  %pass AvgNum  Reads  %pass AvgLen  %pass AvgNum
AHYO   3828     90    687     94    704   3819     87    672     91    683
AHYP   3840     92    816     98    813   3648     93    812     97    811

-------------------------------------------------------------------
reads2plates summary extracted from file: 3634478_fasta.screen.r2p
[ from "READ/CLONE COUNT SUMMARY" to end ]
-------------------------------------------------------------------
plate(s)  reads  clones  N/plate  avg%    LIBRARY
40         7647    3836    95.90   99.90  AHYO
40         7488    3840    96.00  100.00  AHYP
]         15135    7676    95.95          cumulative total

LIBRARY PLATE ID COUNT [ AHYO 40 AHYP 40 ] for 80 total 96 well plate ids.

Only indicates plates present in input file. Make no assumption regarding
plates (not) present in project that do not appear above.

-------------------------------------------------------------------
trimt JAZZ trim 15 readlength histogram:
Command: /home/copeland/scripts/histogram2.pl 3634478_fasta.screen.trimQ15.SaF 4 50
-------------------------------------------------------------------
#Found 15135 total values totalling 10265447.0000. <678.258804 +/- 263.380933>
#Range: [ 0 - 977 ]
#Most likely bin: [ 800 - 850 ] 3984 counts
#Median bin: [ 750 - 800 ] 1961 counts
#Histogram Bins Count Fraction Cum_Fraction
|XXXXXXXXXXXXX 0 - 50 : [ 1290 0.09 0.09 ]
|X 50 - 100 : [ 135 0.01 0.09 ]
|X 100 - 150 : [ 115 0.01 0.10 ]
|X 150 - 200 : [ 116 0.01 0.11 ]
|X 200 - 250 : [ 118 0.01 0.12 ]
|X 250 - 300 : [ 112 0.01 0.12 ]
|X 300 - 350 : [ 143 0.01 0.13 ]
|XX 350 - 400 : [ 153 0.01 0.14 ]
|XX 400 - 450 : [ 193 0.01 0.16 ]
|XX 450 - 500 : [ 245 0.02 0.17 ]
|XXX 500 - 550 : [ 286 0.02 0.19 ]
|XXXX 550 - 600 : [ 426 0.03 0.22 ]
|XXXXXX 600 - 650 : [ 584 0.04 0.26 ]
|XXXXXXXXX 650 - 700 : [ 899 0.06 0.32 ]
|XXXXXXXXXXX 700 - 750 : [ 1082 0.07 0.39 ]
|XXXXXXXXXXXXXXXXXXXX 750 - 800 : [ 1961 0.13 0.52 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 800 - 850 : [ 3984 0.26 0.78 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXX 850 - 900 : [ 2847 0.19 0.97 ]
|XXXX 900 - 950 : [ 430 0.03 1.00 ]
| 950 - 1000 : [ 16 0.00 1.00 ]

trimt JAZZ trim 15 readlength histogram for AHYO
-------------------------------------------------------------------
Command: /usr/xpg4/bin/grep AHYO 3634478_fasta.screen.trimQ15.SaF > reads.trim15.AHYO.rl
Command: /home/copeland/scripts/histogram2.pl reads.trim15.AHYO.rl 2 50
-------------------------------------------------------------------
#Found 7647 total values totalling 4622832.0000. <604.528835 +/- 275.066064>
#Range: [ 0 - 977 ]
#Most likely bin: [ 850 - 900 ] 964 counts
#Median bin: [ 650 - 700 ] 835 counts
#Histogram Bins Count Fraction Cum_Fraction
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 0 - 50 : [ 746 0.10 0.10 ]
|XXXXX 50 - 100 : [ 121 0.02 0.11 ]
|XXXX 100 - 150 : [ 102 0.01 0.13 ]
|XXXX 150 - 200 : [ 101 0.01 0.14 ]
|XXXX 200 - 250 : [ 104 0.01 0.15 ]
|XXXX 250 - 300 : [ 96 0.01 0.17 ]
|XXXXX 300 - 350 : [ 125 0.02 0.18 ]
|XXXXXX 350 - 400 : [ 141 0.02 0.20 ]
|XXXXXXX 400 - 450 : [ 176 0.02 0.22 ]
|XXXXXXXXXX 450 - 500 : [ 232 0.03 0.25 ]
|XXXXXXXXXXX 500 - 550 : [ 264 0.03 0.29 ]
|XXXXXXXXXXXXXXXXX 550 - 600 : [ 403 0.05 0.34 ]
|XXXXXXXXXXXXXXXXXXXXXXX 600 - 650 : [ 550 0.07 0.41 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 650 - 700 : [ 835 0.11 0.52 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 700 - 750 : [ 818 0.11 0.63 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXX 750 - 800 : [ 670 0.09 0.72 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 800 - 850 : [ 881 0.12 0.83 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 850 - 900 : [ 964 0.13 0.96 ]
|XXXXXXXXXXXXX 900 - 950 : [ 303 0.04 1.00 ]
|X 950 - 1000 : [ 15 0.00 1.00 ]

trimt JAZZ trim 15 readlength histogram for AIGA
trimt JAZZ trim 15 readlength histogram for AIGB
trimt JAZZ trim 15 readlength histogram for AWZG
trimt JAZZ trim 15 readlength histogram for AWZH
trimt JAZZ trim 15 readlength histogram for AWZI

-------------------------------------------------------------------
Library vector screening
Command: /home/copeland/scripts/checkScreen.sh 3634478
-------------------------------------------------------------------
AHYO.000001.000100  pUC18.fa    pUC18.fa    LRS.fasta
AHYO.000101.000200  pUC18.fa    pUC18.fa    LRS.fasta
AHYP.000001.000100  pMCL200.fa  pMCL200.fa  LRS.fasta
AIGA.000001.000100  pMCL200.fa  pMCL200.fa  LRS.fasta
AIGB.000001.000100  pCC1Fos.fa  pCC1Fos.fa  LRS.fasta

-------------------------------------------------------------------
GC content histogram:
Command: /bin/nawk '{print $5+$6}' GC.3634478_fasta.screen.trimQ20 | /home/copeland/scripts/histogram2.pl - 1 0.005
-------------------------------------------------------------------
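
The histograms in this section all have the same shape: fixed-width bins, a count, a fraction, a cumulative fraction, and a bar of X's. The following is a minimal Python sketch of that layout, not the source of histogram2.pl. It assumes the last command-line number (500, 50, 0.005) is the bin width and that each bar is the count scaled so that the tallest bin gets 40 X's, rounded to the nearest integer; both are inferred from the printed output. The function name histogram and the sample values are hypothetical.

```python
from collections import Counter

def histogram(values, bin_width):
    """Bin values into [lo, lo + bin_width) intervals and return report rows."""
    bins = Counter(int(v // bin_width) for v in values)
    peak = max(bins.values())
    total = len(values)
    rows, running = [], 0
    for idx in range(min(bins), max(bins) + 1):
        count = bins.get(idx, 0)
        running += count
        bar = "X" * int(count / peak * 40 + 0.5)  # tallest bin gets 40 X's
        rows.append((idx * bin_width, (idx + 1) * bin_width, count,
                     round(count / total, 2), round(running / total, 2), bar))
    return rows

# Illustrative insert sizes only; the report does not include the raw values.
rows = histogram([1092, 1600, 3100, 3200, 3300, 4097], 500)
for lo, hi, count, frac, cum, bar in rows:
    print(f"|{bar} {lo} - {hi} : [ {count} {frac:.2f} {cum:.2f} ]")
```

Unlike the report, which collapses runs of empty bins into a "#..." line, this sketch prints every bin between the smallest and largest value.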

Contigs and Assemblies

-------------------------------------------------------------------
Command: /usr/local/bin/contig > contig.out [ final 30 lines ]
-------------------------------------------------------------------
Contig 2088. 46 reads; 11942 bp (untrimmed), 11838 (trimmed).
Contig 2089. 47 reads; 9559 bp (untrimmed), 9557 (trimmed).
Contig 2090. 47 reads; 13306 bp (untrimmed), 13067 (trimmed).
Contig 2091. 47 reads; 11347 bp (untrimmed), 11339 (trimmed).
Contig 2092. 48 reads; 8638 bp (untrimmed), 8550 (trimmed).
Contig 2093. 49 reads; 9115 bp (untrimmed), 8910 (trimmed).
Contig 2094. 49 reads; 9801 bp (untrimmed), 9711 (trimmed).
Contig 2095. 50 reads; 11205 bp (untrimmed), 11187 (trimmed).
Contig 2096. 52 reads; 11792 bp (untrimmed), 11749 (trimmed).
Contig 2097. 52 reads; 13650 bp (untrimmed), 13563 (trimmed).
Contig 2098. 53 reads; 12970 bp (untrimmed), 12947 (trimmed).
Contig 2099. 54 reads; 8320 bp (untrimmed), 8167 (trimmed).
Contig 2100. 54 reads; 17011 bp (untrimmed), 16745 (trimmed).
Contig 2101. 55 reads; 10754 bp (untrimmed), 10651 (trimmed).
Contig 2102. 55 reads; 11158 bp (untrimmed), 11112 (trimmed).
Contig 2103. 58 reads; 16232 bp (untrimmed), 16087 (trimmed).
Contig 2104. 60 reads; 14469 bp (untrimmed), 14465 (trimmed).
Contig 2105. 62 reads; 17548 bp (untrimmed), 17520 (trimmed).
Contig 2106. 62 reads; 12371 bp (untrimmed), 12278 (trimmed).
Contig 2107. 64 reads; 15779 bp (untrimmed), 15604 (trimmed).
Contig 2108. 65 reads; 14265 bp (untrimmed), 14088 (trimmed).
Contig 2109. 70 reads; 19511 bp (untrimmed), 19390 (trimmed).
Contig 2110. 74 reads; 17292 bp (untrimmed), 17252 (trimmed).
Contig 2111. 79 reads; 15679 bp (untrimmed), 15613 (trimmed).
Contig 2112. 84 reads; 20756 bp (untrimmed), 20659 (trimmed).
Contig 2113. 89 reads; 14103 bp (untrimmed), 13875 (trimmed).
Contig 2114. 124 reads; 29602 bp (untrimmed), 28752 (trimmed).
--------------------------------------------------------------
Totals 12821 reads; 4909641 bp (untrimmed), 4632475 (trimmed).
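
The contig entries follow a regular pattern, so per-contig numbers can be pulled out and re-summed to check a Totals line. The following is a minimal Python sketch under that assumption; the regular expression and the name parse_contigs are hypothetical and are not part of the contig tool. Only the three largest contigs are used as sample input, so the sums below are for those three entries, not for the whole assembly.

```python
import re

# Matches entries such as:
#   Contig 2114. 124 reads; 29602 bp (untrimmed), 28752 (trimmed).
CONTIG_RE = re.compile(
    r"Contig (\d+)\.\s+(\d+) reads; (\d+) bp \(untrimmed\), (\d+) \(trimmed\)\."
)

def parse_contigs(text):
    """Yield (contig_id, reads, untrimmed_bp, trimmed_bp) for each contig entry."""
    for match in CONTIG_RE.finditer(text):
        yield tuple(int(group) for group in match.groups())

sample = """
Contig 2112. 84 reads; 20756 bp (untrimmed), 20659 (trimmed).
Contig 2113. 89 reads; 14103 bp (untrimmed), 13875 (trimmed).
Contig 2114. 124 reads; 29602 bp (untrimmed), 28752 (trimmed).
"""

rows = list(parse_contigs(sample))
total_reads = sum(row[1] for row in rows)
total_untrimmed = sum(row[2] for row in rows)
total_trimmed = sum(row[3] for row in rows)
print(total_reads, total_untrimmed, total_trimmed)
```

Because the pattern is applied with finditer, it works whether the entries arrive one per line or run together on a single line.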
-------------------------------------------------------------------
Depth Summary
Command: /home/copeland/scripts/depth_summary.pl depth.out
-------------------------------------------------------------------
depth.out contains 4842974 bases = 2.43 +- 1.57 = 0.73 +- 1.50

-------------------------------------------------------------------
Histogram of All Contig Depth Values:
Command: /home/copeland/scripts/histogram2.pl depth.out 9 0.5
-------------------------------------------------------------------
#Found 2037 total values totalling 4206.6200. <2.065106 +/- 0.805209>
#Range: [ 1.04 - 7.72 ]
#Most likely bin: [ 1.5 - 2 ] 769 counts
#Median bin: [ 1.5 - 2 ] 769 counts
#Histogram Bins Count Fraction Cum_Fraction
|XXXXXXXXXXXXXXXXXXXXXXXX 1 - 1.5 : [ 466 0.23 0.23 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 1.5 - 2 : [ 769 0.38 0.61 ]
|XXXXXXXXXXXXXXXXXXX 2 - 2.5 : [ 358 0.18 0.78 ]
|XXXXXXXXXX 2.5 - 3 : [ 194 0.10 0.88 ]
|XXXXXX 3 - 3.5 : [ 107 0.05 0.93 ]
|XXXX 3.5 - 4 : [ 76 0.04 0.97 ]
|XX 4 - 4.5 : [ 36 0.02 0.98 ]
|X 4.5 - 5 : [ 17 0.01 0.99 ]
| 5 - 5.5 : [ 7 0.00 1.00 ]
| 5.5 - 6 : [ 2 0.00 1.00 ]
| 6 - 6.5 : [ 3 0.00 1.00 ]
#...
| 7 - 7.5 : [ 1 0.00 1.00 ]
| 7.5 - 8 : [ 1 0.00 1.00 ]

-------------------------------------------------------------------
Histogram of Major Contig Depth Values:
Command: /home/copeland/scripts/histogram2.pl depth.out 9 0.5 3 10 10000000 5 2000 10000000
-------------------------------------------------------------------
#Found 257 total values totalling 888.2100. <3.456070 +/- 0.883951>
#Range: [ 1.76 - 7.72 ]
#Most likely bin: [ 3.5 - 4 ] 61 counts
#Median bin: [ 3 - 3.5 ] 55 counts
#Histogram Bins Count Fraction Cum_Fraction
|X 1.5 - 2 : [ 1 0.00 0.00 ]
|XXXXXXXXXXXXXXXXXX 2 - 2.5 : [ 27 0.11 0.11 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 2.5 - 3 : [ 59 0.23 0.34 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3 - 3.5 : [ 55 0.21 0.55 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3.5 - 4 : [ 61 0.24 0.79 ]
|XXXXXXXXXXXXXXXXXX 4 - 4.5 : [ 27 0.11 0.89 ]
|XXXXXXXXX 4.5 - 5 : [ 14 0.05 0.95 ]
|XXXX 5 - 5.5 : [ 6 0.02 0.97 ]
|X 5.5 - 6 : [ 2 0.01 0.98 ]
|XX 6 - 6.5 : [ 3 0.01 0.99 ]
#...
|X 7 - 7.5 : [ 1 0.00 1.00 ]
|X 7.5 - 8 : [ 1 0.00 1.00 ]

-------------------------------------------------------------------
Sorted Depth Values:
Command: sort -n -k 9 depth.out > sorted.depth.out
[first and last 20 lines included]
-------------------------------------------------------------------
Contig 440 2 reads 1917 bases = 1.04 +- 0.21 = 1.04 +- 0.21
Contig 475 2 reads 1919 bases = 1.04 +- 0.19 = 1.04 +- 0.19
Contig 123 2 reads 1907 bases = 1.05 +- 0.22 = 1.05 +- 0.22
Contig 229 2 reads 1804 bases = 1.05 +- 0.22 = -0.03 +- 0.97
Contig 231 2 reads 1757 bases = 1.05 +- 0.21 = 0.04 +- 0.98
Contig 443 2 reads 1905 bases = 1.05 +- 0.22 = 1.05 +- 0.22
Contig 522 2 reads 1822 bases = 1.05 +- 0.22 = 1.05 +- 0.22
Contig 581 2 reads 1910 bases = 1.05 +- 0.22 = 0.05 +- 0.97
Contig 620 2 reads 1950 bases = 1.05 +- 0.21 = 1.05 +- 0.21
Contig 390 2 reads 1841 bases = 1.06 +- 0.25 = 0.01 +- 0.97
Contig 524 2 reads 1935 bases = 1.06 +- 0.23 = 1.06 +- 0.23
Contig 575 2 reads 1920 bases = 1.06 +- 0.24 = 1.06 +- 0.24
Contig 139 2 reads 1582 bases = 1.07 +- 0.25 = 1.07 +- 0.25
Contig 261 2 reads 1890 bases = 1.07 +- 0.25 = 1.07 +- 0.25
Contig 532 2 reads 1447 bases = 1.07 +- 0.25 = 1.07 +- 0.25
Contig 648 2 reads 1884 bases = 1.07 +- 0.26 = 1.07 +- 0.26
Contig 97 2 reads 1814 bases = 1.07 +- 0.26 = 1.07 +- 0.26
Contig 615 2 reads 1865 bases = 1.08 +- 0.28 = 1.08 +- 0.28
Contig 341 2 reads 1822 bases = 1.09 +- 0.29 = 1.09 +- 0.29
Contig 410 2 reads 1772 bases = 1.09 +- 0.28 = 1.09 +- 0.28

Contig 2045 31 reads 6120 bases = 4.81 +- 2.59 = 1.58 +- 2.62
Contig 2026 25 reads 4533 bases = 4.89 +- 2.47 = 1.36 +- 2.28
Contig 2093 49 reads 9115 bases = 4.89 +- 2.42 = 0.21 +- 2.76
Contig 1951 15 reads 2435 bases = 4.90 +- 1.62 = 1.72 +- 1.52
Contig 2028 26 reads 4481 bases = 4.92 +- 2.91 = -0.21 +- 2.24
Contig 2048 32 reads 6148 bases = 4.95 +- 2.24 = 0.54 +- 3.07
Contig 1917 12 reads 2079 bases = 5.03 +- 3.19 = 2.32 +- 1.37
Contig 2009 23 reads 4486 bases = 5.07 +- 3.36 = 0.15 +- 1.97
Contig 1759 7 reads 1309 bases = 5.09 +- 1.89 = 2.22 +- 1.06
Contig 2075 40 reads 6954 bases = 5.12 +- 2.66 = 0.48 +- 2.26
Contig 2092 48 reads 8638 bases = 5.12 +- 1.89 = 0.09 +- 1.67
Contig 2099 54 reads 8320 bases = 5.24 +- 4.18 = 0.34 +- 2.55
Contig 1913 12 reads 2114 bases = 5.28 +- 2.05 = 0.86 +- 1.47
Contig 2113 89 reads 14103 bases = 5.69 +- 4.03 = 0.15 +- 2.37
Contig 1952 15 reads 2436 bases = 5.90 +- 2.75 = 2.01 +- 2.14
Contig 1964 16 reads 2662 bases = 6.05 +- 4.10 = 2.27 +- 1.72
Contig 2079 43 reads 6981 bases = 6.10 +- 5.03 = 1.29 +- 3.48
Contig 2081 44 reads 6954 bases = 6.16 +- 4.48 = 0.01 +- 4.54
Contig 2040 28 reads 3426 bases = 7.28 +- 4.12 = 2.10 +- 1.54
Contig 2013 23 reads 2532 bases = 7.72 +- 6.23 = 0.53 +- 0.87
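
The "Major Contig" histogram is built from a subset of the contigs in depth.out. The following is a minimal Python sketch of one plausible selection rule, not the logic of histogram2.pl. It assumes the trailing arguments "3 10 10000000 5 2000 10000000" mean "read count between 10 and 10000000" and "base count between 2000 and 10000000"; this reading is an inference from the report, and the name major_contigs is hypothetical. The sample input is the set of listed contigs with depth 5.0 or higher.

```python
import re

# Captures contig id, read count, base count, and the first depth value.
DEPTH_RE = re.compile(
    r"Contig (\d+) (\d+) reads (\d+) bases = ([\d.]+) \+- [\d.]+"
)

def major_contigs(text, min_reads=10, min_bases=2000):
    """Return (contig_id, depth) pairs passing the assumed size thresholds."""
    kept = []
    for cid, reads, bases, depth in DEPTH_RE.findall(text):
        if int(reads) >= min_reads and int(bases) >= min_bases:
            kept.append((int(cid), float(depth)))
    return kept

sample = """
Contig 1917 12 reads 2079 bases = 5.03 +- 3.19 = 2.32 +- 1.37
Contig 2009 23 reads 4486 bases = 5.07 +- 3.36 = 0.15 +- 1.97
Contig 1759 7 reads 1309 bases = 5.09 +- 1.89 = 2.22 +- 1.06
Contig 2075 40 reads 6954 bases = 5.12 +- 2.66 = 0.48 +- 2.26
Contig 2092 48 reads 8638 bases = 5.12 +- 1.89 = 0.09 +- 1.67
Contig 2099 54 reads 8320 bases = 5.24 +- 4.18 = 0.34 +- 2.55
Contig 1913 12 reads 2114 bases = 5.28 +- 2.05 = 0.86 +- 1.47
Contig 2113 89 reads 14103 bases = 5.69 +- 4.03 = 0.15 +- 2.37
Contig 1952 15 reads 2436 bases = 5.90 +- 2.75 = 2.01 +- 2.14
Contig 1964 16 reads 2662 bases = 6.05 +- 4.10 = 2.27 +- 1.72
Contig 2079 43 reads 6981 bases = 6.10 +- 5.03 = 1.29 +- 3.48
Contig 2081 44 reads 6954 bases = 6.16 +- 4.48 = 0.01 +- 4.54
Contig 2040 28 reads 3426 bases = 7.28 +- 4.12 = 2.10 +- 1.54
Contig 2013 23 reads 2532 bases = 7.72 +- 6.23 = 0.53 +- 0.87
"""

kept = major_contigs(sample)
kept_ids = [cid for cid, _ in kept]
print(len(kept), max(depth for _, depth in kept))
```

Under these assumed thresholds only Contig 1759 (7 reads, 1309 bases) is dropped, leaving 13 contigs. That agrees with the bins from 5 to 8 of the Major Contig histogram, whose counts are 6, 2, 3, 1, and 1.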

Assembler Specific Info

-------------------------------------------------------------------
Reads in assembly summary
-------------------------------------------------------------------
Small Inserts = 282
HQ Discrepant reads = 297
Chimeric reads = 70
Suspect alignments = 40

-------------------------------------------------------------------
Assembly parameters
-------------------------------------------------------------------
phrap version SPS - 3.57
SUN/Ultra-2/3
Equivalent to Phil Green's version 0.990329

Score matrix (set by value of penalty: -2)
     A   C   G   T   N   X
A    1  -2  -2  -2   0  -3
C   -2   1  -2  -2   0  -3
G   -2  -2   1  -2   0  -3
T   -2  -2  -2   1   0  -3
N    0   0   0   0   0   0
X   -3  -3  -3  -3   0  -3

gap_init: -4  gap_ext: -3  ins_gap_ext: -3  del_gap_ext: -3
Using complexity-adjusted scores. Assumed background frequencies:
A: 0.250  C: 0.250  G: 0.250  T: 0.250  N: 0.000  X: 0.000

minmatch: 30  maxmatch: 55  max_group_size: 20  minscore: 55  bandwidth: 14  indexwordsize: 10
vector_bound: 20  word_raw: 0
trim_penalty: -2  trim_score: 20  trim_qual: 13  maxgap: 30
repeat_stringency: 0.950000
qual_show: 20
confirm_length: 8  confirm_trim: 1  confirm_penalty: -5  confirm_score: 30
node_seg: 8  node_space: 4
forcelevel: 0  bypasslevel: 1
max_subclone_size: 50000

File generated in /psf/project/microbe4/3634478/edit_dir.23Nov04.QC
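
The score matrix above is described as "set by value of penalty: -2". The following is a minimal Python sketch that rebuilds the printed values from that single number. The rules it uses (match = 1, mismatch = penalty, anything against N = 0, anything else against X = penalty - 1) are inferred from the printed matrix, not taken from phrap's source, and the function name score_matrix is hypothetical.

```python
def score_matrix(penalty=-2, alphabet="ACGTNX"):
    """Build a substitution matrix as a dict keyed by (row, column) symbols."""
    matrix = {}
    for a in alphabet:
        for b in alphabet:
            if "N" in (a, b):
                score = 0              # N is neutral against every symbol
            elif "X" in (a, b):
                score = penalty - 1    # X (screened base) scores below a mismatch
            elif a == b:
                score = 1              # match
            else:
                score = penalty        # mismatch
            matrix[a, b] = score
    return matrix

matrix = score_matrix(-2)
for a in "ACGTNX":
    print(a, " ".join(f"{matrix[a, b]:3d}" for b in "ACGTNX"))
```

With penalty = -2 this prints the same 6 x 6 grid as the report, including the -3 on the X/X diagonal entry.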