(blastReads) using cached blast results: megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT ...
(blastReads) using cached filtered blast results: megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200 ...
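The ".parsedL200" suffix and the removed-reads notes in this file ("98%id, 200bp+") suggest that the cached results keep only hits with at least 98% identity over at least 200 bp. The parser's real output format is not shown here. The following is a minimal Python sketch of such a filter that assumes the hits are in NCBI's 12-column tabular layout (as produced by, e.g., blastall -m 8). The name filter_hits and its thresholds are illustrative assumptions.

```python
from typing import Iterable, List

# Assumed thresholds, taken from the "98%id, 200bp+" note in this report.
MIN_IDENTITY = 98.0  # percent identity
MIN_LENGTH = 200     # alignment length in bp


def filter_hits(lines: Iterable[str],
                min_identity: float = MIN_IDENTITY,
                min_length: int = MIN_LENGTH) -> List[List[str]]:
    """Keep tabular BLAST hits at or above both thresholds.

    Assumed column order (NCBI tabular): qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    kept: List[List[str]] = []
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        if pident >= min_identity and length >= min_length:
            kept.append(fields)
    return kept
```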
(blastReads) extracting gi numbers from megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(blastReads) 1359 unique gi numbers in megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
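The gi-extraction step reduces the filtered hits to a set of distinct database identifiers. Here is a minimal sketch, assuming subject IDs use the legacy NCBI "gi|NNN|..." form. The name unique_gis is hypothetical, and subject IDs without a gi field are simply skipped.

```python
import re
from typing import Iterable, Set

# Legacy NCBI subject IDs look like "gi|12345|gb|ACCESSION|".
GI_RE = re.compile(r"gi\|(\d+)\|")


def unique_gis(subject_ids: Iterable[str]) -> Set[str]:
    """Return the distinct gi numbers found in the given subject IDs."""
    gis: Set[str] = set()
    for sid in subject_ids:
        match = GI_RE.search(sid)
        if match:  # IDs without a gi field are ignored
            gis.add(match.group(1))
    return gis
```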
(eukaryotaFilter) filtering gi numbers from megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi by taxa
(eukaryotaFilter) 44 unique eukaryotic gi numbers in megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi
(eukaryotaFilter) identifying eukaryotic reads from megablast.2662201_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(eukaryotaFilter) 140 eukaryotic reads
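The eukaryota filter works in two stages. It first reduces the gi numbers to those classified as eukaryotic, and then flags the reads that hit them. The sketch below covers only the second stage. It assumes that the gi-to-taxonomy lookup has already produced a set of eukaryotic gi numbers (that lookup is not shown). It also assumes that a read is flagged if any of its filtered hits is eukaryotic. The real script might use a stricter rule, such as best hit only.

```python
from typing import Iterable, Set, Tuple


def eukaryotic_reads(hits: Iterable[Tuple[str, str]],
                     eukaryotic_gis: Set[str]) -> Set[str]:
    """Return reads with at least one hit to a eukaryotic gi.

    hits: (read_name, gi) pairs from the filtered blast results.
    Assumption: ANY eukaryotic hit flags the read.
    """
    return {read for read, gi in hits if gi in eukaryotic_gis}
```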
(contamFilter) run_megablast 2662201_fasta.screen JGIContaminants
(contamFilter) parsing megablast.2662201_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT
(contamFilter) /home/copeland/scripts/blastParser_P.pl megablast.2662201_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT
(contamFilter) 746 probable JGI contaminant reads
(qualFilter) identifying low quality reads (<100 Q20 bases)...
(qualFilter) 9870 low quality reads
(smallContigFilter) identifying reads from 2-read contigs...
(smallContigFilter) 113 reads from 2-read contigs found
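The small-contig filter collects every read that belongs to a contig assembled from exactly two reads. Here is a minimal sketch, assuming a hypothetical contig-to-reads mapping has already been extracted from the assembly (parsing the assembly file itself is not shown).

```python
from typing import Dict, List, Set


def reads_in_small_contigs(contigs: Dict[str, List[str]],
                           size: int = 2) -> Set[str]:
    """Return all reads from contigs containing exactly `size` reads."""
    flagged: Set[str] = set()
    for reads in contigs.values():
        if len(reads) == size:
            flagged.update(reads)
    return flagged
```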
(cleanFasta) backing up 2662201_fasta.screen (cp 2662201_fasta.screen 2662201_fasta.screen.orig)
(cleanFasta) creating master list of reads to remove...
(cleanFasta) 10425 reads to remove
(cleanFasta) /home/copeland/scripts/fasta_coll.pl -output fasta_good -exclude reads.toRemove -fasta 2662201_fasta.screen
(cleanFasta) verifying read removal...
(cleanFasta) mv fasta_good 2662201_fasta.screen
(cleanFasta) mv fasta_good.qual 2662201_fasta.screen.qual
(cleanFasta) mv reads.toRemove reads_removed
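The clean-up step builds one master list from the per-filter lists and then writes a FASTA file without those reads. The master list is a union, so a read flagged by several filters is removed only once. This is why the unique total in the removed-reads summary is smaller than the sum of the per-list counts. The following is a minimal in-memory Python sketch of the same idea. All function names are hypothetical, and it assumes that the read name is the first whitespace-delimited token of each header. Because the .qual file uses the same record names, the same exclusion can be applied to it.

```python
from typing import Iterable, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without ">", body lines joined by "\n")


def parse_fasta(text: str) -> List[Record]:
    """Split FASTA (or .qual) text into (header, body) records."""
    records: List[Record] = []
    header: Optional[str] = None
    body: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "\n".join(body)))
            header, body = line[1:], []
        elif header is not None:
            body.append(line)
    if header is not None:
        records.append((header, "\n".join(body)))
    return records


def read_name(header: str) -> str:
    """Assumption: the read name is the first token of the header."""
    tokens = header.split()
    return tokens[0] if tokens else ""


def master_remove_list(*lists: Iterable[str]) -> Set[str]:
    """Union of the per-filter lists; duplicates collapse to one entry."""
    remove: Set[str] = set()
    for names in lists:
        remove.update(n.strip() for n in names if n.strip())
    return remove


def exclude_reads(records: List[Record], remove: Set[str]) -> List[Record]:
    """Drop listed reads and verify the number removed."""
    kept = [(h, b) for h, b in records if read_name(h) not in remove]
    present = {read_name(h) for h, _ in records} & remove
    # Verification: exactly one record dropped per listed read that was
    # present (duplicate read names in the input would also trip this).
    if len(records) - len(kept) != len(present):
        raise RuntimeError("unexpected number of reads removed")
    return kept


def to_fasta(records: List[Record]) -> str:
    return "".join(f">{h}\n{b}\n" for h, b in records)
```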

#######################################################################

WARNING. WARNING. WARNING.

14Jun04

This project has been automatically analyzed by the 'draftQD.sh' script,
a crude, zeroth-order project clean-up script designed to identify and
remove low-quality and contaminant reads from a project. The heuristics
used by the script may both remove valid project data and fail to remove
bona fide contaminants.

The following sets of reads have been removed from the project fasta in
this edit_dir only. The trace data of possible contaminants has not been
removed from the project. Therefore, if you use automated assembly
procedures that recreate a project fasta from data in the partitions,
the reads removed in this edit_dir will reappear in the new fasta file.

Removed reads lists:

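reads.lowQual.q20lt100 -- fewer than 100 contiguous Q20 bases, not
	counting X-masked bases
reads.JGIContaminants -- megablast hits to the JGIContaminants database
reads.2RdContigs -- all reads from 2-read contigs
reads.possible.eukaryota -- 98% identity, 200+ bp blast hits to eukaryotic
	entries in 'nt'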
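The low-quality criterion above depends on the longest run of consecutive bases that are both Q20 or better and not masked to X. Here is a minimal sketch, assuming Phred quality values per base (as in the project's .qual file). The function names are hypothetical.

```python
from typing import Sequence


def longest_q20_run(bases: str, quals: Sequence[int], min_q: int = 20) -> int:
    """Length of the longest run of bases with quality >= min_q that are
    not masked to 'X'. A low-quality or masked base ends the current run."""
    if len(bases) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    best = run = 0
    for base, q in zip(bases, quals):
        if q >= min_q and base.upper() != "X":
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def is_low_quality(bases: str, quals: Sequence[int],
                   min_run: int = 100, min_q: int = 20) -> bool:
    """True when the read has fewer than min_run contiguous Q20 bases."""
    return longest_q20_run(bases, quals, min_q) < min_run
```

With this definition, a read that has 200 Q20 bases split by a vector-masked X into two runs of 100 still passes. A read split into two runs of 60 fails, even though it has 120 Q20 bases in total.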

After removing suspect reads from the project fasta, a new assembly
was created in this directory using the cleaned fasta file.


#######################################################################

reads removed from fasta:


140 reads.possible.eukaryota
746 reads.JGIContaminants
113 reads.2RdContigs
9869 reads.lowQual.q20lt100
10424 total unique reads removed

46221 reads prior to clean up
35797 reads remaining after clean up


-----------------------------------
Assembly prior to QD:

readsTotal:                               46221
readsInAllContigs:                        37854
readsInMajorContigs:                      36062
 
contigsTotal:                               330
contigsMajor:                               176
contigsSingletons:                         8367

basesInConsensusUntrimmed:              2173646
basesInConsensusTrimmed:                2126571 
basesInConsensusMajContigsTrimmed:      1982626
basesInConsensusMajContigsUntrimmed:     2007176

depthMajorContigsTrimmed:                    16

-----------------------------------
Assembly after QD:

readsTotal:                               35797
readsInAllContigs:                        35701
readsInMajorContigs:                      35112
 
contigsTotal:                               269
contigsMajor:                               177

basesInConsensusUntrimmed:              2125955
basesInConsensusTrimmed:                2087146 
basesInConsensusMajContigsTrimmed:      1978218
basesInConsensusMajContigsUntrimmed:     2002539
 
depthMajorContigsTrimmed:                    16
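The two statistics blocks can be compared field by field. The drop in readsTotal (46221 to 35797) equals the 10424 unique reads removed. In the prior assembly, contigsSingletons (8367) is exactly readsTotal minus readsInAllContigs. The sketch below treats that relationship as an assumed definition and applies it to both blocks, although the after-QD block does not report a singleton count. Only values copied from the tables above are used. The helper names are hypothetical.

```python
from typing import Dict

# Values copied from the two statistics blocks in this report.
BEFORE: Dict[str, int] = {
    "readsTotal": 46221,
    "readsInAllContigs": 37854,
    "readsInMajorContigs": 36062,
    "contigsTotal": 330,
    "contigsMajor": 176,
    "basesInConsensusTrimmed": 2126571,
}
AFTER: Dict[str, int] = {
    "readsTotal": 35797,
    "readsInAllContigs": 35701,
    "readsInMajorContigs": 35112,
    "contigsTotal": 269,
    "contigsMajor": 177,
    "basesInConsensusTrimmed": 2087146,
}


def deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Change (after minus before) for every field present in both."""
    return {k: after[k] - before[k] for k in before if k in after}


def singletons(stats: Dict[str, int]) -> int:
    # Assumption: singletons are the reads not placed in any contig.
    return stats["readsTotal"] - stats["readsInAllContigs"]


if __name__ == "__main__":
    for key, change in deltas(BEFORE, AFTER).items():
        print(f"{key:35s}{change:+d}")
    print("singletons before/after:", singletons(BEFORE), singletons(AFTER))
```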