(blastReads) Didn't find blast results file. blast 2351355_fasta.screen against nt ...
cat megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsed | megablastFilter.sh 98 200 > megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
extracting gi numbers from megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(blastReads) 5475 unique gi numbers in megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(eukaryotaFilter) filtering gi numbers from megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi by taxa
(eukaryotaFilter) 141 unique eukaryotic gi numbers in megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi
(eukaryotaFilter) identifying eukaryotic reads from megablast.2351355_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(eukaryotaFilter) 315 eukaryotic reads
(contamFilter) run_megablast 2351355_fasta.screen JGIContaminants
(contamFilter) parsing megablast.2351355_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT
(contamFilter) /home/copeland/scripts/blastParser_P.pl megablast.2351355_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT
(contamFilter) 177 probable JGI contaminant reads
(qualFilter) identifying low quality reads...
(qualFilter) 25106 low quality reads
(smallContigFilter) identifying reads from 2 reads contigs...
(smallContigFilter) 1099 reads from 2 reads contigs found
(cleanFasta) creating master list of reads to remove...
(cleanFasta) 26456 reads to remove
(cleanFasta) /home/copeland/scripts/extractReads.pl -o fasta_good -X reads.toRemove 2351355_fasta.screen
(cleanFasta) verifying read removal...
(cleanFasta) backing up 2351355_fasta.screen
(cleanFasta) mv fasta_good 2351355_fasta.screen
(cleanFasta) mv fasta_good.qual 2351355_fasta.screen.qual
(cleanFasta) mv reads.toRemove reads_removed

#######################################################################

WARNING. WARNING. WARNING.

02Apr04

This project has been automatically analyzed by the 'draftQD.sh' script,
a crude, zeroth-order project clean-up script designed to identify and
remove low-quality and contaminant reads from a project. The heuristics
used by the script may both remove valid project data and fail to remove
bona fide contaminants.

The following sets of reads have been removed from the project fasta in
this edit_dir only. The trace data of possible contaminants has not been
removed from the project. Therefore, if you use automated assembly
procedures that recreate a project fasta from data in the partitions,
the reads removed in this edit_dir will reappear in the new file.

Removed reads lists:

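reads.lowQual.q20lt100 -- fewer than 100 contiguous Q20 bases that are not X
reads.2RdContigs -- all reads from 2-read contigs
reads.possible.eukaryota -- megablast hits of at least 98% identity over
	200 bp or more to eukaryotic entries in 'nt'
reads.JGIContaminants -- probable contaminants: megablast hits to the
	JGIContaminants database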
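Three of the list criteria above (low quality, 2-read contigs, possible eukaryota) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code draftQD.sh actually runs: the helper names are hypothetical, the hit-line layout (read, subject gi, percent identity, alignment length, tab-separated) is invented because the *.parsed format is not described here, the read-to-contig mapping is assumed to come from the assembly, and the 98% and 200 bp thresholds are treated as inclusive.

```python
# Sketch of the removal criteria listed above. Hypothetical helpers, not the
# code draftQD.sh runs; input formats are assumptions (see docstrings).
from collections import Counter
from typing import Dict, Iterable, Sequence, Set


def longest_good_run(bases: str, quals: Sequence[int], min_q: int = 20) -> int:
    """Longest run of consecutive bases with quality >= min_q that are not 'X'."""
    best = run = 0
    for base, q in zip(bases, quals):
        if q >= min_q and base.upper() != "X":
            run += 1
            best = max(best, run)
        else:
            run = 0  # a low-quality or X-masked base breaks the run
    return best


def is_low_quality(bases: str, quals: Sequence[int], min_run: int = 100) -> bool:
    """reads.lowQual.q20lt100: fewer than min_run contiguous Q20, non-X bases."""
    return longest_good_run(bases, quals) < min_run


def reads_in_two_read_contigs(read_to_contig: Dict[str, str]) -> Set[str]:
    """reads.2RdContigs: every read whose contig contains exactly two reads.
    ASSUMPTION: read_to_contig maps read name -> contig name from the assembly."""
    sizes = Counter(read_to_contig.values())
    return {read for read, contig in read_to_contig.items() if sizes[contig] == 2}


def eukaryotic_reads(hits: Iterable[str], eukaryotic_gis: Set[str],
                     min_pid: float = 98.0, min_len: int = 200) -> Set[str]:
    """reads.possible.eukaryota: reads with a hit of >= min_pid % identity over
    >= min_len bp to a gi in eukaryotic_gis.
    ASSUMPTION: each hit line is 'read<TAB>gi<TAB>pct_identity<TAB>align_len';
    the real *.parsed layout is not documented here."""
    reads = set()
    for line in hits:
        read, gi, pid, length = line.rstrip("\n").split("\t")[:4]
        if float(pid) >= min_pid and int(length) >= min_len and gi in eukaryotic_gis:
            reads.add(read)
    return reads
```

The eukaryotic gi set is taken as given here; in the log it comes from a separate taxonomy lookup (the eukaryotaFilter step), which is not sketched.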

After removing suspect reads from the project fasta, a new assembly
was created in this directory using the cleaned fasta file.
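Removing the listed reads amounts to copying the fasta (and its .qual file) record by record while skipping every record whose name is on the removal list. A minimal Python sketch follows, assuming one read name per line in reads.toRemove and that a record's name is the first word of its '>' header. The helper names are hypothetical and this is not extractReads.pl, whose behaviour may differ; the leftover() check is in the spirit of the "verifying read removal" step in the log.

```python
# Minimal sketch of removing listed reads from a fasta or .qual file.
# Hypothetical helpers; not extractReads.pl. Assumes one read name per line
# in the removal list and that a record's name is the first word after '>'.
from typing import Iterable, Iterator, Set


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect read names, one per line; blank lines are ignored."""
    return {line.split()[0] for line in lines if line.strip()}


def drop_reads(lines: Iterable[str], remove: Set[str]) -> Iterator[str]:
    """Yield the input lines, skipping every record whose name is in remove.
    Works for both fasta and .qual files, since both use '>' headers."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            keep = not words or words[0] not in remove
        if keep:
            yield line


def leftover(lines: Iterable[str], remove: Set[str]) -> Set[str]:
    """Names of listed reads still present (should be empty after cleaning)."""
    found = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words and words[0] in remove:
                found.add(words[0])
    return found


# Example usage (file names taken from the log above):
# with open("reads.toRemove") as f:
#     remove = read_ids(f)
# with open("2351355_fasta.screen") as src, open("fasta_good", "w") as dst:
#     dst.writelines(drop_reads(src, remove))
```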


#######################################################################

reads removed from fasta:


315 reads.possible.eukaryota
177 reads.JGIContaminants
1099 reads.2RdContigs
24979 reads.lowQual.q20lt100
--------------------------------
26455 total unique reads removed


# Additional reads removed: these reads were not properly screened, and there
# is no information about their vector type. They are most likely responsible
# for causing the assembler to dump core, so they have been removed until their
# vector type can be identified.

1700 reads.unknown.vector 
--------------------------------
28155 total unique reads removed

76006 reads prior to clean-up
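The "total unique reads removed" figures count each read once even if it appears in several lists, so they are the size of the union of the lists rather than the sum of their sizes. A short Python check using only the numbers reported above (and assuming, as the 28155 total implies, that reads.unknown.vector does not overlap the other lists); union_size() is a hypothetical helper showing how a master list would be counted:

```python
# Read counts copied from the tables above.
list_sizes = {
    "reads.possible.eukaryota": 315,
    "reads.JGIContaminants": 177,
    "reads.2RdContigs": 1099,
    "reads.lowQual.q20lt100": 24979,
}
unique_removed = 26455  # reported union of the four lists
unknown_vector = 1700   # reads.unknown.vector
before_cleanup = 76006  # reads prior to clean-up


def union_size(*lists):
    """Number of distinct read names across several removal lists."""
    return len(set().union(*lists))


listed = sum(list_sizes.values())                 # plain sum of list sizes
duplicate_entries = listed - unique_removed       # entries for reads already counted
total_removed = unique_removed + unknown_vector   # matches the 28155 reported
remaining = before_cleanup - total_removed        # reads left after clean-up

print(listed, duplicate_entries, total_removed, remaining)
# 26570 115 28155 47851
```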
