Found a core file...check assembly and blast results before proceeding
Found a core file...check assembly and blast results before proceeding
(blastReads) Didn't find blast results file. BLASTING 3634475_fasta.screen against nt ...
(blastReads) using cached parsed blast results (83595 lines): megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI
(blastReads) cat megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed | megablastFilter.sh 98 200 > megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200
(blastReads) Looking for cached gi file megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi ...
(blastReads) Extracting gi numbers from megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200
(blastReads) 147 unique gi numbers in megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi
(eukaryotaFilter) filtering gi numbers from megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi by taxa
(eukaryotaFilter) 22 unique eukaryotic gi numbers in megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi
(eukaryotaFilter) identifying eukaryotic reads from megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200
(eukaryotaFilter) 20 eukaryotic reads
(contamFilter) Blast 3634475_fasta.screen against JGIContaminants
(contamFilter) parsing megablast.3634475_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfTI
(contamFilter) Parsing megablast.3634475_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfTI
(contamFilter) 0 probable JGI contaminant reads
(vectorFilter) run_megablast 3634475_fasta.screen JGIVectors
(vectorFilter) parsing megablast.3634475_fasta.screen.v.JGIVectors.FmLD2a10p98e1e30JFfTI
(vectorFilter) /home/copeland/scripts/blastParser_P.pl megablast.3634475_fasta.screen.v.JGIVectors.FmLD2a10p98e1e30JFfTI
(vectorFilter) 408 probable JGI vector reads
(qualFilter) identifying low quality reads (<100 Q20 bases)...
(qualFilter) 13194 low quality reads
Found a core file...check assembly and blast results before proceeding
(blastReads) using cached blast results (1358247 lines): megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI
(blastReads) using cached parsed blast results (83595 lines): megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI
(blastReads) using cached filtered blast results: megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200 ...
(blastReads) Looking for cached gi file megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi ...
(blastReads) using cached gi numbers from megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi
(eukaryotaFilter) filtering gi numbers from megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200.gi by taxa
(eukaryotaFilter) using cached eukaryotic gi list: 3634475.eukaryota.gi ...
(eukaryotaFilter) identifying eukaryotic reads from megablast.3634475_fasta.screen.v.nt.FmLD2a10p98e1e30JFfTI.parsed.P98L200
(eukaryotaFilter) 20 eukaryotic reads
(contamFilter) using cached blast results 88 lines): megablast.3634475_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfTI ...
(contamFilter) parsing megablast.3634475_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfTI
(contamFilter) Parsing megablast.3634475_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfTI
(contamFilter) 0 probable JGI contaminant reads
(vectorFilter) using cached blast results (46323 lines): megablast.3634475_fasta.screen.v.JGIVectors.FmLD2a10p98e1e30JFfTI ...
(vectorFilter) parsing megablast.3634475_fasta.screen.v.JGIVectors.FmLD2a10p98e1e30JFfTI
(vectorFilter) /home/copeland/scripts/blastParser_P.pl megablast.3634475_fasta.screen.v.JGIVectors.FmLD2a10p98e1e30JFfTI
(vectorFilter) 408 probable JGI vector reads
(qualFilter) identifying low quality reads (<100 Q20 bases)...
(qualFilter) using cached copy of low quality reads file: reads.lowQual.q20lt100
(qualFilter) 13194 low quality reads
(smallContigFilter) identifying reads from 2-read contigs...
(smallContigFilter) 0 reads from 2-read contigs found
(cleanFasta) backing up 3634475_fasta.screen (cp 3634475_fasta.screen 3634475_fasta.screen.orig)
(cleanFasta) creating master list of reads to remove...
(cleanFasta) 13214 reads to remove
(cleanFasta) /home/copeland/scripts/fasta_coll.pl -output fasta_good -exclude reads.toRemove -fasta 3634475_fasta.screen
(cleanFasta) verifying read removal.../home/copeland/local/SPARC/bin/agrep-XL -c -f reads.toRemove fasta_good
(cleanFasta) mv fasta_good 3634475_fasta.screen
(cleanFasta) mv fasta_good.qual 3634475_fasta.screen.qual
(cleanFasta) mv reads.toRemove reads_removed

#######################################################################

WARNING. WARNING. WARNING.

11Aug04

This project has been automatically analyzed by the 'draftQD.sh' script,
a crude, zeroth-order project cleanup script designed to identify and
remove low quality and contaminant reads from a project. The heuristics
used by the script may both remove valid project data and fail to remove
bona fide contaminants.

The following sets of reads have been removed from the project fasta in
this edit_dir only. The trace data of possible contaminants has not been
removed from the project. Therefore, if you are using automated assembly
procedures which recreate a project fasta from data in the partitions,
the reads removed in this edit_dir will be present in the new file.

Removed reads lists:

reads.lowQual.q20lt100 -- fewer than 100 contiguous q20 bases not X
reads.2RdContigs -- all reads from 2-read contigs
reads.possible.eukaryota -- 98% id, 200bp+ blast hits to eukaryotic
	entries in 'nt'
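
The two numeric heuristics named in the list above can be sketched as
follows. This is an illustrative sketch only, not the code used by
'draftQD.sh'. It assumes that "not X" refers to bases that have not been
masked to X by vector screening, that "contiguous" means a single unbroken
run, and that both blast thresholds are inclusive. All function and
constant names are hypothetical.

```python
from typing import Sequence

# Thresholds read off the list names above; treating them as inclusive
# cutoffs is an assumption of this sketch.
Q_MIN = 20          # "q20"
MIN_RUN = 100       # "lt100"
MIN_PCT_ID = 98.0   # "98% id"
MIN_HIT_LEN = 200   # "200bp+"


def longest_good_run(bases: str, quals: Sequence[int]) -> int:
    """Length of the longest unbroken stretch of bases that are not 'X'
    and whose quality value is at least Q_MIN."""
    best = run = 0
    for base, q in zip(bases, quals):
        if base.upper() != "X" and q >= Q_MIN:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a masked or low-quality base breaks the run
    return best


def is_low_quality(bases: str, quals: Sequence[int]) -> bool:
    """True if the read has fewer than MIN_RUN contiguous good bases."""
    return longest_good_run(bases, quals) < MIN_RUN


def is_strong_hit(pct_identity: float, align_len: int) -> bool:
    """True if a blast hit meets both the identity and length cutoffs."""
    return pct_identity >= MIN_PCT_ID and align_len >= MIN_HIT_LEN
```

Under these assumptions a read with exactly 100 good contiguous bases is
kept, and a hit must satisfy both cutoffs at once to count.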

After removing suspect reads from the project fasta, a new assembly
was created in this directory using the cleaned fasta file.


#######################################################################

reads removed from fasta:


20 reads.possible.eukaryota
0 reads.JGIContaminants
0 reads.2RdContigs
13193 reads.lowQual.q20lt100
13213 total unique reads removed

34899 reads prior to clean up
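
The "total unique reads removed" figure is the size of the union of the
individual lists, so a read that appears in more than one list is counted
once. In the counts above, 20 + 0 + 0 + 13193 = 13213, which matches the
unique total, so no read appears in more than one list, and
34899 - 13213 = 21686 reads remain. A minimal sketch of that bookkeeping,
with hypothetical function names and toy read names rather than project
data:

```python
from typing import Dict, Set


def master_remove_list(lists: Dict[str, Set[str]]) -> Set[str]:
    """Union of all per-filter read lists; duplicates collapse."""
    combined: Set[str] = set()
    for reads in lists.values():
        combined |= reads
    return combined


def summarize(lists: Dict[str, Set[str]], total_reads: int) -> Dict[str, float]:
    """Counts of removed and remaining reads for a project of total_reads."""
    removed = len(master_remove_list(lists))
    return {
        "removed": removed,
        "remaining": total_reads - removed,
        "fraction_removed": removed / total_reads,
    }


if __name__ == "__main__":
    # Toy input: read "r2" is flagged by two filters but counted once.
    toy = {
        "reads.lowQual.q20lt100": {"r1", "r2"},
        "reads.possible.eukaryota": {"r2", "r3"},
        "reads.2RdContigs": set(),
    }
    print(summarize(toy, total_reads=10))
```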
