Table of Contents
Project Information
report description
projects db info
taxonomy summary
genome size estimates
contamination summary
project base counts
Libraries and Reads
assembled average insert size estimates
library / read quality summary
reads2plates summary
trimmed read length histograms
library vector screening
GC Content of reads histogram
Contigs and Assemblies
contig size and read count table
depth summary
depth histogram
depth values
Assembler Specific Info
reads in assembly summary from assembler
assembly parameters
Project Information
-------------------------------------------------------------------
Assembly QC Report
Date: 12-12-2005
Runby: Kerrie Barry
Description: analysis of libraries in progress (or initial analysis of completed libraries)
-------------------------------------------------------------------
-------------------------------------------------------------------
Project information from 'PROJECTS' db
-------------------------------------------------------------------
Project   Size(KB)  TaxID  GenusSpecies
3634478       4500    863  Syntrophomonas wolfei
-------------------------------------------------------------------
Taxonomy summary
Command: /home/copeland/scripts/tax2tree.sh Syntrophomonas_wolfei
-------------------------------------------------------------------
Clostridiales, order, eubacteria
Clostridia, class, eubacteria
Syntrophomonadaceae, family, eubacteria
Firmicutes (Gram-positive bacteria), phylum, eubacteria
Syntrophomonas wolfei, species, eubacteria
Syntrophomonas, genus, eubacteria
Bacteria (eubacteria), superkingdom, eubacteria
cellular organisms
root
-------------------------------------------------------------------
Genome size estimates
-------------------------------------------------------------------
# contigs: 4632475
# phrap:   3047800
# db:      4500000 (altered)
# mean:    4060091 +/- 717838
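The final figure appears to be the mean of the three estimates plus or minus their population standard deviation, truncated to whole bases. Recomputing from the report's own numbers reproduces 4060091 +/- 717838 (a sample standard deviation would give roughly 879000 instead). A minimal Python sketch; `size_consensus` is an illustrative name, not one of the QC scripts.

```python
from statistics import mean, pstdev

# Genome size estimates from the report, in bp. The "db" value matches the
# 4500 KB project size listed in the PROJECTS db table.
estimates = {"contigs": 4632475, "phrap": 3047800, "db": 4500000}

def size_consensus(values):
    """Mean and population SD of the estimates, truncated to whole bp."""
    vals = list(values)
    return int(mean(vals)), int(pstdev(vals))

m, sd = size_consensus(estimates.values())
print(f"{m} +/- {sd}")  # 4060091 +/- 717838
```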
-------------------------------------------------------------------
Contam Summary with *.contigs:
Command: /psf/QC/bin/sparc/summarizeCrossMatchHits -o
-------------------------------------------------------------------
Number of reads with X's: 3421
Number of reads with percent X's >= 20%: 541 = 3.6%
Number of reads with percent X's >= 50%: 490 = 3.2%
Number of reads with percent X's >= 80%: 418 = 2.8%
Total reads in project: 15135
Total bp X'd : 567172
Screened against     reads w/ X's  >= 20%  >= 50%  >= 80%
L09136                       3031     166     148     122
pMCL200_JGI_XZX+XZK           390     375     342     296
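The per-screen rows add up exactly to the summary lines above (3031 + 390 = 3421 reads with X's, 166 + 375 = 541 at >= 20%, and so on), and each percentage is that count divided by the 15135 reads in the project. A small Python sketch of the arithmetic, assuming no read is counted under both screens (which the matching sums suggest); `summarize` is a hypothetical helper.

```python
TOTAL_READS = 15135

# Per-screen counts from the table: reads with X's, >= 20%, >= 50%, >= 80% X'd.
by_screen = {
    "L09136": (3031, 166, 148, 122),
    "pMCL200_JGI_XZX+XZK": (390, 375, 342, 296),
}

def summarize(by_screen, total_reads):
    # Column-wise sums across screens (assumes each read falls under one screen).
    totals = [sum(col) for col in zip(*by_screen.values())]
    lines = [f"Number of reads with X's: {totals[0]}"]
    for label, count in zip(("20%", "50%", "80%"), totals[1:]):
        pct = 100 * count / total_reads
        lines.append(f"Number of reads with percent X's >= {label}: {count} = {pct:.1f}%")
    return totals, lines

totals, lines = summarize(by_screen, TOTAL_READS)
print("\n".join(lines))
```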
-------------------------------------------------------------------
Contam Summary with *.singlets:
Command: /psf/QC/bin/sparc/summarizeCrossMatchHits -o -s
-------------------------------------------------------------------
-------------------------------------------------------------------
Base Count for Project:
Command: /home/copeland/scripts/projectBaseCount.pl phrap.out
-------------------------------------------------------------------
A = 4455099
C = 2870916
G = 2871352
T = 4343444
N = 104796
X = 581467
GC fraction = 0.38
Total = 15227074
-------------------------------------------------------------------
Base Count for contigs:
Command: /psf/QC/bin/sparc/faCount 3634478_fasta.screen.contigs
-------------------------------------------------------------------
A 1537447
C 915847
G 912906
T 1539826
N 3615
fraction GC = 0.37
total bases = 4909641
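Both base-count blocks can be checked from their own numbers. The project's reported GC fraction of 0.38 is reproduced only when N and X bases are included in the denominator; dividing by A+C+G+T alone gives 0.39. The contig value rounds to 0.37 either way. A minimal Python sketch; `gc_fraction` and its `include_nx` flag are illustrative, not options of projectBaseCount.pl or faCount.

```python
def gc_fraction(counts, include_nx=True):
    """(C+G) divided by all counted bases, or by A/C/G/T only."""
    gc = counts["C"] + counts["G"]
    denom = sum(counts.values()) if include_nx else sum(counts[b] for b in "ACGT")
    return gc / denom

project = {"A": 4455099, "C": 2870916, "G": 2871352, "T": 4343444,
           "N": 104796, "X": 581467}
contigs = {"A": 1537447, "C": 915847, "G": 912906, "T": 1539826, "N": 3615}

print(sum(project.values()))                  # 15227074
print(round(gc_fraction(project), 2))         # 0.38
print(round(gc_fraction(project, False), 2))  # 0.39
print(sum(contigs.values()))                  # 4909641
print(round(gc_fraction(contigs), 2))         # 0.37
```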
Libraries and Reads
-------------------------------------------------------------------
Histogram of Assembled Average Insert Sizes:
Command: /home/copeland/scripts/phrapView2.pl -p phrap.out -C > reads.list
-------------------------------------------------------------------
-------------------------------------------------------------------
Command: /usr/xpg4/bin/grep AHYO reads.list > grep.reads.list.AHYO
Command: /home/copeland/scripts/histogram2.pl grep.reads.list.AHYO 4 500
-------------------------------------------------------------------
#Found 1703 total values totalling 5688004.0000. <3339.990605 +/- 328.529309>
#Range: [ 1092 - 4097 ]
#Most likely bin: [ 3000 - 3500 ] 1061 counts
#Median bin: [ 3000 - 3500 ] 1061 counts
#Histogram Bins Count Fraction Cum_Fraction
| 1000 - 1500 : [ 7 0.00 0.00 ]
| 1500 - 2000 : [ 5 0.00 0.01 ]
| 2000 - 2500 : [ 9 0.01 0.01 ]
|XXXXX 2500 - 3000 : [ 124 0.07 0.09 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3000 - 3500 : [ 1061 0.62 0.71 ]
|XXXXXXXXXXXXXXXXXX 3500 - 4000 : [ 478 0.28 0.99 ]
|X 4000 - 4500 : [ 19 0.01 1.00 ]
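The histogram2.pl calls in this report appear to take a column number and a bin width (500 here, 50 for read lengths, 0.5 for depths). Each row gives the bin's count, its fraction of all values, and the cumulative fraction; in every histogram in this report the "Median bin" is the first bin whose cumulative fraction reaches 0.5, and the "Most likely bin" is the one with the largest count. The sketch below reproduces that layout in Python under those assumptions, omitting empty bins (the "#..." rows elsewhere seem to mark skipped bins). It is not the script's actual code.

```python
import math

def histogram(values, width):
    """Bin values into [lo, lo + width) bins; empty bins are omitted."""
    counts = {}
    for v in values:
        lo = math.floor(v / width) * width
        counts[lo] = counts.get(lo, 0) + 1
    n, cum, rows = len(values), 0, []
    for lo in sorted(counts):
        cum += counts[lo]
        rows.append((lo, lo + width, counts[lo], counts[lo] / n, cum / n))
    return rows

def most_likely_bin(rows):
    return max(rows, key=lambda r: r[2])

def median_bin(rows):
    # First bin whose cumulative fraction reaches one half.
    return next(r for r in rows if r[4] >= 0.5)

rows = histogram([1092, 3100, 3200, 3600, 4097], 500)
for lo, hi, count, frac, cum in rows:
    print(f"{lo} - {hi} : [ {count} {frac:.2f} {cum:.2f} ]")
```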
-------------------------------------------------------------------
Estimated Assembled Average Insert Sizes:
Command: /home/copeland/scripts/estInsertSize.pl -f phrap.out
-------------------------------------------------------------------
# AHYO 3179 +- 435 (n=878)
# AHYP 4830 +- 2484 (n=95)
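The report does not say how estInsertSize.pl measures an assembled insert. One common approach is to take clones whose forward and reverse reads land in the same contig in opposite orientations and measure the outer span between them. The Python sketch below illustrates that idea only; `Placement`, `insert_size`, the contig name, and the 1-based inclusive coordinates are all hypothetical, and the population SD is an assumed convention for the "+-" figure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional

@dataclass
class Placement:
    contig: str
    start: int   # leftmost aligned base (1-based, inclusive)
    end: int     # rightmost aligned base (1-based, inclusive)
    strand: str  # "+" or "-"

def insert_size(a: Placement, b: Placement) -> Optional[int]:
    """Outer span of a mate pair, or None if it cannot be measured in one contig."""
    if a.contig != b.contig or a.strand == b.strand:
        return None
    left, right = (a, b) if a.strand == "+" else (b, a)
    if left.start > right.end:  # reads point away from each other
        return None
    return right.end - left.start + 1

pairs = [
    (Placement("Contig2100", 100, 800, "+"), Placement("Contig2100", 2600, 3300, "-")),
    (Placement("Contig2100", 5200, 5900, "-"), Placement("Contig2100", 2100, 2800, "+")),
]
sizes = [s for s in (insert_size(f, r) for f, r in pairs) if s is not None]
print(f"{mean(sizes):.0f} +- {pstdev(sizes):.0f} (n={len(sizes)})")  # 3501 +- 300 (n=2)
```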
-------------------------------------------------------------------
Library / Read Quality summary
extracted from 3634478_fasta.screen.trimQ15.SaF and database (md_run table)
* Note: J15 is the JAZZ-trimmed read length (Q15 trim); Q20 is the count of bases with quality 20 or higher.
-------------------------------------------------------------------
DB          |-----J15----|  |-----Q20----|  Fasta  |-----J15----|  |-----Q20----|
LIB   Reads  %pass  AvgLen   %pass  AvgNum   Reads  %pass  AvgLen   %pass  AvgNum
AHYO  15360     94     762      97     759    7647     89     680      93     694
AHYP   7680     93     813      97     788    7488     93     814      97     812
FWD         |-----J15----|  |-----Q20----|  REV    |-----J15----|  |-----Q20----|
LIB   Reads  %pass  AvgLen   %pass  AvgNum   Reads  %pass  AvgLen   %pass  AvgNum
AHYO   3828     90     687      94     704    3819     87     672      91     683
AHYP   3840     92     816      98     813    3648     93     812      97     811
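The Q20 columns count, per read, the bases whose quality is 20 or higher, averaged over the reads of a library. A minimal Python sketch of that definition; the read names and quality lists are invented, and taking the first four characters of a read name as the library code is an assumption based on the AHYO/AHYP prefixes used in this report.

```python
from collections import defaultdict

def q20_count(quals):
    """Number of bases with a quality value of 20 or higher."""
    return sum(1 for q in quals if q >= 20)

def avg_q20_by_library(reads):
    """reads: iterable of (read_name, quality_list) pairs."""
    totals = defaultdict(lambda: [0, 0])  # library -> [sum of Q20 counts, read count]
    for name, quals in reads:
        lib = name[:4]  # assumption: names start with the 4-letter library code
        totals[lib][0] += q20_count(quals)
        totals[lib][1] += 1
    return {lib: s / n for lib, (s, n) in totals.items()}

# Hypothetical reads, for illustration only.
reads = [
    ("AHYO1001.b1", [30, 25, 19, 40]),
    ("AHYO1001.g1", [10, 20, 20]),
    ("AHYP2002.b1", [35, 35]),
]
print(avg_q20_by_library(reads))  # {'AHYO': 2.5, 'AHYP': 2.0}
```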
-------------------------------------------------------------------
reads2plates summary
extracted from file: 3634478_fasta.screen.r2p [ from "READ/CLONE COUNT SUMMARY" to end ]
-------------------------------------------------------------------
plates  reads  clones  clones/plate    avg%  LIBRARY
    40   7647    3836         95.90   99.90  AHYO
    40   7488    3840         96.00  100.00  AHYP
    80  15135    7676         95.95          cumulative total
Library plate ID count: AHYO 40, AHYP 40; 80 total 96-well plate IDs.
Only plates present in the input file are counted; make no assumptions about plates in the project that do not appear above.
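The per-plate figures can be reproduced from the clone and plate counts: clones divided by plates gives clones per plate (3836 / 40 = 95.90), and that value as a percentage of the 96 wells on a plate gives avg% (99.90). A short Python sketch; `plate_stats` is a hypothetical helper, and reading avg% as "percent of wells that yielded a clone" is an interpretation.

```python
WELLS_PER_PLATE = 96  # 96-well plates, per the plate ID count line

def plate_stats(plates, clones):
    """Clones per plate, and that figure as a percent of the wells on a plate."""
    per_plate = clones / plates
    return per_plate, 100 * per_plate / WELLS_PER_PLATE

for lib, plates, clones in [("AHYO", 40, 3836), ("AHYP", 40, 3840), ("total", 80, 7676)]:
    per_plate, pct = plate_stats(plates, clones)
    print(f"{lib}: {per_plate:.2f} clones/plate, {pct:.2f}%")
```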
-------------------------------------------------------------------
Trimmed read length histogram (JAZZ trim Q15):
Command: /home/copeland/scripts/histogram2.pl 3634478_fasta.screen.trimQ15.SaF 4 50
-------------------------------------------------------------------
#Found 15135 total values totalling 10265447.0000. <678.258804 +/- 263.380933>
#Range: [ 0 - 977 ]
#Most likely bin: [ 800 - 850 ] 3984 counts
#Median bin: [ 750 - 800 ] 1961 counts
#Histogram Bins Count Fraction Cum_Fraction
|XXXXXXXXXXXXX 0 - 50 : [ 1290 0.09 0.09 ]
|X 50 - 100 : [ 135 0.01 0.09 ]
|X 100 - 150 : [ 115 0.01 0.10 ]
|X 150 - 200 : [ 116 0.01 0.11 ]
|X 200 - 250 : [ 118 0.01 0.12 ]
|X 250 - 300 : [ 112 0.01 0.12 ]
|X 300 - 350 : [ 143 0.01 0.13 ]
|XX 350 - 400 : [ 153 0.01 0.14 ]
|XX 400 - 450 : [ 193 0.01 0.16 ]
|XX 450 - 500 : [ 245 0.02 0.17 ]
|XXX 500 - 550 : [ 286 0.02 0.19 ]
|XXXX 550 - 600 : [ 426 0.03 0.22 ]
|XXXXXX 600 - 650 : [ 584 0.04 0.26 ]
|XXXXXXXXX 650 - 700 : [ 899 0.06 0.32 ]
|XXXXXXXXXXX 700 - 750 : [ 1082 0.07 0.39 ]
|XXXXXXXXXXXXXXXXXXXX 750 - 800 : [ 1961 0.13 0.52 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 800 - 850 : [ 3984 0.26 0.78 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXX 850 - 900 : [ 2847 0.19 0.97 ]
|XXXX 900 - 950 : [ 430 0.03 1.00 ]
| 950 - 1000 : [ 16 0.00 1.00 ]
Trimmed read length histogram (JAZZ trim Q15) for AHYO
-------------------------------------------------------------------
Command: /usr/xpg4/bin/grep AHYO 3634478_fasta.screen.trimQ15.SaF > reads.trim15.AHYO.rl
Command: /home/copeland/scripts/histogram2.pl reads.trim15.AHYO.rl 2 50
-------------------------------------------------------------------
#Found 7647 total values totalling 4622832.0000. <604.528835 +/- 275.066064>
#Range: [ 0 - 977 ]
#Most likely bin: [ 850 - 900 ] 964 counts
#Median bin: [ 650 - 700 ] 835 counts
#Histogram Bins Count Fraction Cum_Fraction
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 0 - 50 : [ 746 0.10 0.10 ]
|XXXXX 50 - 100 : [ 121 0.02 0.11 ]
|XXXX 100 - 150 : [ 102 0.01 0.13 ]
|XXXX 150 - 200 : [ 101 0.01 0.14 ]
|XXXX 200 - 250 : [ 104 0.01 0.15 ]
|XXXX 250 - 300 : [ 96 0.01 0.17 ]
|XXXXX 300 - 350 : [ 125 0.02 0.18 ]
|XXXXXX 350 - 400 : [ 141 0.02 0.20 ]
|XXXXXXX 400 - 450 : [ 176 0.02 0.22 ]
|XXXXXXXXXX 450 - 500 : [ 232 0.03 0.25 ]
|XXXXXXXXXXX 500 - 550 : [ 264 0.03 0.29 ]
|XXXXXXXXXXXXXXXXX 550 - 600 : [ 403 0.05 0.34 ]
|XXXXXXXXXXXXXXXXXXXXXXX 600 - 650 : [ 550 0.07 0.41 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 650 - 700 : [ 835 0.11 0.52 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 700 - 750 : [ 818 0.11 0.63 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXX 750 - 800 : [ 670 0.09 0.72 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 800 - 850 : [ 881 0.12 0.83 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 850 - 900 : [ 964 0.13 0.96 ]
|XXXXXXXXXXXXX 900 - 950 : [ 303 0.04 1.00 ]
|X 950 - 1000 : [ 15 0.00 1.00 ]
Trimmed read length histograms (JAZZ trim Q15) for AIGA, AIGB, AWZG, AWZH, AWZI: no data reported
-------------------------------------------------------------------
Library vector screening
Command: /home/copeland/scripts/checkScreen.sh 3634478
-------------------------------------------------------------------
AHYO.000001.000100 pUC18.fa pUC18.fa LRS.fasta
AHYO.000101.000200 pUC18.fa pUC18.fa LRS.fasta
AHYP.000001.000100 pMCL200.fa pMCL200.fa LRS.fasta
AIGA.000001.000100 pMCL200.fa pMCL200.fa LRS.fasta
AIGB.000001.000100 pCC1Fos.fa pCC1Fos.fa LRS.fasta
-------------------------------------------------------------------
GC content histogram:
Command: /bin/nawk '{print $5+$6}' GC.3634478_fasta.screen.trimQ20 | /home/copeland/scripts/histogram2.pl - 1 0.005
-------------------------------------------------------------------
Contigs and Assemblies
-------------------------------------------------------------------
Command: /usr/local/bin/contig > contig.out [ final 30 lines ]
-------------------------------------------------------------------
Contig 2088. 46 reads; 11942 bp (untrimmed), 11838 (trimmed).
Contig 2089. 47 reads; 9559 bp (untrimmed), 9557 (trimmed).
Contig 2090. 47 reads; 13306 bp (untrimmed), 13067 (trimmed).
Contig 2091. 47 reads; 11347 bp (untrimmed), 11339 (trimmed).
Contig 2092. 48 reads; 8638 bp (untrimmed), 8550 (trimmed).
Contig 2093. 49 reads; 9115 bp (untrimmed), 8910 (trimmed).
Contig 2094. 49 reads; 9801 bp (untrimmed), 9711 (trimmed).
Contig 2095. 50 reads; 11205 bp (untrimmed), 11187 (trimmed).
Contig 2096. 52 reads; 11792 bp (untrimmed), 11749 (trimmed).
Contig 2097. 52 reads; 13650 bp (untrimmed), 13563 (trimmed).
Contig 2098. 53 reads; 12970 bp (untrimmed), 12947 (trimmed).
Contig 2099. 54 reads; 8320 bp (untrimmed), 8167 (trimmed).
Contig 2100. 54 reads; 17011 bp (untrimmed), 16745 (trimmed).
Contig 2101. 55 reads; 10754 bp (untrimmed), 10651 (trimmed).
Contig 2102. 55 reads; 11158 bp (untrimmed), 11112 (trimmed).
Contig 2103. 58 reads; 16232 bp (untrimmed), 16087 (trimmed).
Contig 2104. 60 reads; 14469 bp (untrimmed), 14465 (trimmed).
Contig 2105. 62 reads; 17548 bp (untrimmed), 17520 (trimmed).
Contig 2106. 62 reads; 12371 bp (untrimmed), 12278 (trimmed).
Contig 2107. 64 reads; 15779 bp (untrimmed), 15604 (trimmed).
Contig 2108. 65 reads; 14265 bp (untrimmed), 14088 (trimmed).
Contig 2109. 70 reads; 19511 bp (untrimmed), 19390 (trimmed).
Contig 2110. 74 reads; 17292 bp (untrimmed), 17252 (trimmed).
Contig 2111. 79 reads; 15679 bp (untrimmed), 15613 (trimmed).
Contig 2112. 84 reads; 20756 bp (untrimmed), 20659 (trimmed).
Contig 2113. 89 reads; 14103 bp (untrimmed), 13875 (trimmed).
Contig 2114. 124 reads; 29602 bp (untrimmed), 28752 (trimmed).
--------------------------------------------------------------
Totals 12821 reads; 4909641 bp (untrimmed), 4632475 (trimmed).
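The Totals line should equal the sums over every contig in contig.out (only the last 27 contigs are shown above). Its untrimmed total, 4909641 bp, matches the total bases faCount reports for the contigs file, and its trimmed total, 4632475 bp, is the value used as the "contigs" genome size estimate. A minimal Python sketch that parses lines in the format shown and sums them; `contig_totals` is an illustrative name.

```python
import re

CONTIG_LINE = re.compile(
    r"Contig (\d+)\. (\d+) reads; (\d+) bp \(untrimmed\), (\d+) \(trimmed\)\."
)

def contig_totals(lines):
    """Sum reads, untrimmed bp and trimmed bp over 'Contig N.' lines."""
    reads = untrimmed = trimmed = 0
    for line in lines:
        m = CONTIG_LINE.search(line)
        if m:  # the "Totals" line does not match and is skipped
            reads += int(m.group(2))
            untrimmed += int(m.group(3))
            trimmed += int(m.group(4))
    return reads, untrimmed, trimmed

sample = [
    "Contig 2088. 46 reads; 11942 bp (untrimmed), 11838 (trimmed).",
    "Contig 2089. 47 reads; 9559 bp (untrimmed), 9557 (trimmed).",
    "Totals 12821 reads; 4909641 bp (untrimmed), 4632475 (trimmed).",
]
print(contig_totals(sample))  # (93, 21501, 21395)
```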
-------------------------------------------------------------------
Depth Summary
Command: /home/copeland/scripts/depth_summary.pl depth.out
-------------------------------------------------------------------
depth.out contains 4842974 bases = 2.43 +- 1.57 = 0.73 +- 1.50
-------------------------------------------------------------------
Histogram of All Contig Depth Values:
Command: /home/copeland/scripts/histogram2.pl depth.out 9 0.5
-------------------------------------------------------------------
#Found 2037 total values totalling 4206.6200. <2.065106 +/- 0.805209>
#Range: [ 1.04 - 7.72 ]
#Most likely bin: [ 1.5 - 2 ] 769 counts
#Median bin: [ 1.5 - 2 ] 769 counts
#Histogram Bins Count Fraction Cum_Fraction
|XXXXXXXXXXXXXXXXXXXXXXXX 1 - 1.5 : [ 466 0.23 0.23 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 1.5 - 2 : [ 769 0.38 0.61 ]
|XXXXXXXXXXXXXXXXXXX 2 - 2.5 : [ 358 0.18 0.78 ]
|XXXXXXXXXX 2.5 - 3 : [ 194 0.10 0.88 ]
|XXXXXX 3 - 3.5 : [ 107 0.05 0.93 ]
|XXXX 3.5 - 4 : [ 76 0.04 0.97 ]
|XX 4 - 4.5 : [ 36 0.02 0.98 ]
|X 4.5 - 5 : [ 17 0.01 0.99 ]
| 5 - 5.5 : [ 7 0.00 1.00 ]
| 5.5 - 6 : [ 2 0.00 1.00 ]
| 6 - 6.5 : [ 3 0.00 1.00 ]
#...
| 7 - 7.5 : [ 1 0.00 1.00 ]
| 7.5 - 8 : [ 1 0.00 1.00 ]
-------------------------------------------------------------------
Histogram of Major Contig Depth Values:
Command: /home/copeland/scripts/histogram2.pl depth.out 9 0.5 3 10 10000000 5 2000 10000000
-------------------------------------------------------------------
#Found 257 total values totalling 888.2100. <3.456070 +/- 0.883951>
#Range: [ 1.76 - 7.72 ]
#Most likely bin: [ 3.5 - 4 ] 61 counts
#Median bin: [ 3 - 3.5 ] 55 counts
#Histogram Bins Count Fraction Cum_Fraction
|X 1.5 - 2 : [ 1 0.00 0.00 ]
|XXXXXXXXXXXXXXXXXX 2 - 2.5 : [ 27 0.11 0.11 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 2.5 - 3 : [ 59 0.23 0.34 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3 - 3.5 : [ 55 0.21 0.55 ]
|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3.5 - 4 : [ 61 0.24 0.79 ]
|XXXXXXXXXXXXXXXXXX 4 - 4.5 : [ 27 0.11 0.89 ]
|XXXXXXXXX 4.5 - 5 : [ 14 0.05 0.95 ]
|XXXX 5 - 5.5 : [ 6 0.02 0.97 ]
|X 5.5 - 6 : [ 2 0.01 0.98 ]
|XX 6 - 6.5 : [ 3 0.01 0.99 ]
#...
|X 7 - 7.5 : [ 1 0.00 1.00 ]
|X 7.5 - 8 : [ 1 0.00 1.00 ]
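The extra arguments "3 10 10000000 5 2000 10000000" appear to restrict this histogram to contigs with at least 10 reads and at least 2000 bases. The data are consistent with that reading: the 5 - 5.5 bin holds 7 contigs in the all-contig histogram but 6 here, and the only contig in that range dropped is Contig 1759 (7 reads, 1309 bases). A Python sketch of the filter under that inferred rule, using the seven lines from the sorted depth listing; `parse_depth` and `is_major` are hypothetical names.

```python
import re

DEPTH_LINE = re.compile(
    r"Contig\s+(\d+)\s+(\d+)\s+reads\s+(\d+)\s+bases\s+=\s+(-?[\d.]+)\s+\+-"
)

def parse_depth(line):
    """Return (contig, reads, bases, depth) from a depth.out-style line."""
    contig, reads, bases, depth = DEPTH_LINE.search(line).groups()
    return int(contig), int(reads), int(bases), float(depth)

def is_major(reads, bases, min_reads=10, min_bases=2000, max_val=10_000_000):
    # Thresholds inferred from the histogram2.pl arguments, not documented.
    return min_reads <= reads <= max_val and min_bases <= bases <= max_val

lines = [
    "Contig 1917 12 reads 2079 bases = 5.03 +- 3.19 = 2.32 +- 1.37",
    "Contig 2009 23 reads 4486 bases = 5.07 +- 3.36 = 0.15 +- 1.97",
    "Contig 1759 7 reads 1309 bases = 5.09 +- 1.89 = 2.22 +- 1.06",
    "Contig 2075 40 reads 6954 bases = 5.12 +- 2.66 = 0.48 +- 2.26",
    "Contig 2092 48 reads 8638 bases = 5.12 +- 1.89 = 0.09 +- 1.67",
    "Contig 2099 54 reads 8320 bases = 5.24 +- 4.18 = 0.34 +- 2.55",
    "Contig 1913 12 reads 2114 bases = 5.28 +- 2.05 = 0.86 +- 1.47",
]
parsed = [parse_depth(l) for l in lines]
all_bin = [p for p in parsed if 5.0 <= p[3] < 5.5]
major_bin = [p for p in all_bin if is_major(p[1], p[2])]
print(len(all_bin), len(major_bin))  # 7 6
```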
-------------------------------------------------------------------
Sorted Depth Values:
Command: sort -n -k 9 depth.out > sorted.depth.out [first and last 20 lines included]
-------------------------------------------------------------------
Contig 440 2 reads 1917 bases = 1.04 +- 0.21 = 1.04 +- 0.21
Contig 475 2 reads 1919 bases = 1.04 +- 0.19 = 1.04 +- 0.19
Contig 123 2 reads 1907 bases = 1.05 +- 0.22 = 1.05 +- 0.22
Contig 229 2 reads 1804 bases = 1.05 +- 0.22 = -0.03 +- 0.97
Contig 231 2 reads 1757 bases = 1.05 +- 0.21 = 0.04 +- 0.98
Contig 443 2 reads 1905 bases = 1.05 +- 0.22 = 1.05 +- 0.22
Contig 522 2 reads 1822 bases = 1.05 +- 0.22 = 1.05 +- 0.22
Contig 581 2 reads 1910 bases = 1.05 +- 0.22 = 0.05 +- 0.97
Contig 620 2 reads 1950 bases = 1.05 +- 0.21 = 1.05 +- 0.21
Contig 390 2 reads 1841 bases = 1.06 +- 0.25 = 0.01 +- 0.97
Contig 524 2 reads 1935 bases = 1.06 +- 0.23 = 1.06 +- 0.23
Contig 575 2 reads 1920 bases = 1.06 +- 0.24 = 1.06 +- 0.24
Contig 139 2 reads 1582 bases = 1.07 +- 0.25 = 1.07 +- 0.25
Contig 261 2 reads 1890 bases = 1.07 +- 0.25 = 1.07 +- 0.25
Contig 532 2 reads 1447 bases = 1.07 +- 0.25 = 1.07 +- 0.25
Contig 648 2 reads 1884 bases = 1.07 +- 0.26 = 1.07 +- 0.26
Contig 97 2 reads 1814 bases = 1.07 +- 0.26 = 1.07 +- 0.26
Contig 615 2 reads 1865 bases = 1.08 +- 0.28 = 1.08 +- 0.28
Contig 341 2 reads 1822 bases = 1.09 +- 0.29 = 1.09 +- 0.29
Contig 410 2 reads 1772 bases = 1.09 +- 0.28 = 1.09 +- 0.28
Contig 2045 31 reads 6120 bases = 4.81 +- 2.59 = 1.58 +- 2.62
Contig 2026 25 reads 4533 bases = 4.89 +- 2.47 = 1.36 +- 2.28
Contig 2093 49 reads 9115 bases = 4.89 +- 2.42 = 0.21 +- 2.76
Contig 1951 15 reads 2435 bases = 4.90 +- 1.62 = 1.72 +- 1.52
Contig 2028 26 reads 4481 bases = 4.92 +- 2.91 = -0.21 +- 2.24
Contig 2048 32 reads 6148 bases = 4.95 +- 2.24 = 0.54 +- 3.07
Contig 1917 12 reads 2079 bases = 5.03 +- 3.19 = 2.32 +- 1.37
Contig 2009 23 reads 4486 bases = 5.07 +- 3.36 = 0.15 +- 1.97
Contig 1759 7 reads 1309 bases = 5.09 +- 1.89 = 2.22 +- 1.06
Contig 2075 40 reads 6954 bases = 5.12 +- 2.66 = 0.48 +- 2.26
Contig 2092 48 reads 8638 bases = 5.12 +- 1.89 = 0.09 +- 1.67
Contig 2099 54 reads 8320 bases = 5.24 +- 4.18 = 0.34 +- 2.55
Contig 1913 12 reads 2114 bases = 5.28 +- 2.05 = 0.86 +- 1.47
Contig 2113 89 reads 14103 bases = 5.69 +- 4.03 = 0.15 +- 2.37
Contig 1952 15 reads 2436 bases = 5.90 +- 2.75 = 2.01 +- 2.14
Contig 1964 16 reads 2662 bases = 6.05 +- 4.10 = 2.27 +- 1.72
Contig 2079 43 reads 6981 bases = 6.10 +- 5.03 = 1.29 +- 3.48
Contig 2081 44 reads 6954 bases = 6.16 +- 4.48 = 0.01 +- 4.54
Contig 2040 28 reads 3426 bases = 7.28 +- 4.12 = 2.10 +- 1.54
Contig 2013 23 reads 2532 bases = 7.72 +- 6.23 = 0.53 +- 0.87
Assembler Specific Info
-------------------------------------------------------------------
Reads in assembly summary
-------------------------------------------------------------------
Small Inserts = 282
HQ Discrepant reads = 297
Chimeric reads = 70
Suspect alignments = 40
-------------------------------------------------------------------
Assembly parameters
-------------------------------------------------------------------
phrap version SPS - 3.57 SUN/Ultra-2/3
Equivalent to Phil Green's version 0.990329
Score matrix (set by value of penalty: -2)
A C G T N X
A 1 -2 -2 -2 0 -3
C -2 1 -2 -2 0 -3
G -2 -2 1 -2 0 -3
T -2 -2 -2 1 0 -3
N 0 0 0 0 0 0
X -3 -3 -3 -3 0 -3
gap_init: -4
gap_ext: -3
ins_gap_ext: -3
del_gap_ext: -3
Using complexity-adjusted scores. Assumed background frequencies:
A: 0.250 C: 0.250 G: 0.250 T: 0.250 N: 0.000 X: 0.000
minmatch: 30
maxmatch: 55
max_group_size: 20
minscore: 55
bandwidth: 14
indexwordsize: 10
vector_bound: 20
word_raw: 0
trim_penalty: -2
trim_score: 20
trim_qual: 13
maxgap: 30
repeat_stringency: 0.950000
qual_show: 20
confirm_length: 8
confirm_trim: 1
confirm_penalty: -5
confirm_score: 30
node_seg: 8
node_space: 4
forcelevel: 0
bypasslevel: 1
max_subclone_size: 50000
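The score matrix above gives +1 for a match, -2 for a mismatch, 0 against N and -3 against X. The Python sketch below only illustrates how that printed matrix scores an ungapped alignment; phrap states that it uses complexity-adjusted scores, so the values it actually compares against minscore (55) will differ, and gaps (gap_init, gap_ext) are not modeled. `ungapped_score` is a hypothetical helper.

```python
BASES = "ACGTNX"
# Score matrix as printed in the phrap parameters (row = read 1, column = read 2).
ROWS = [
    [1, -2, -2, -2, 0, -3],
    [-2, 1, -2, -2, 0, -3],
    [-2, -2, 1, -2, 0, -3],
    [-2, -2, -2, 1, 0, -3],
    [0, 0, 0, 0, 0, 0],
    [-3, -3, -3, -3, 0, -3],
]
MATRIX = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}

def ungapped_score(s1, s2):
    """Raw sum of matrix scores for two equal-length, gap-free sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(MATRIX[a][b] for a, b in zip(s1.upper(), s2.upper()))

print(ungapped_score("ACGTACGT", "ACGAACGT"))  # 5: seven matches, one mismatch
```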
File generated in /psf/project/microbe4/3634478/edit_dir.23Nov04.QC