(blastReads) using cached blast results: megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT ...
(blastReads) using cached filtered blast results: megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200 ...
(blastReads) using cached gi numbers from megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi
(eukaryotaFilter) filtering gi numbers from megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi by taxa
(eukaryotaFilter) using cached eukaryotic gi list: 2351492.eukaryota.gi ...
(eukaryotaFilter) identifying eukaryotic reads from megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(eukaryotaFilter) 16 eukaryotic reads
(contamFilter) run_megablast 2351492_fasta.screen JGIContaminants
(blastReads) using cached blast results: megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT ...
(blastReads) using cached filtered blast results: megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200 ...
(blastReads) using cached gi numbers from megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi
(eukaryotaFilter) filtering gi numbers from megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200.gi by taxa
(eukaryotaFilter) using cached eukaryotic gi list: 2351492.eukaryota.gi ...
(eukaryotaFilter) identifying eukaryotic reads from megablast.2351492_fasta.screen.v.nt.FmLD2a10p98e1e30JFfT.parsedL200
(eukaryotaFilter) 16 eukaryotic reads
(contamFilter) using cached blast results: megablast.2351492_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT ...
(contamFilter) parsing megablast.2351492_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT
/home/copeland/scripts/blastParser_P.pl megablast.2351492_fasta.screen.v.JGIContaminants.FmLD2a10p98e1e30JFfT
263 probable JGI contaminant reads
identifying low quality reads...
8519 low quality reads
identifying reads from 2 reads contigs...
79 reads from 2 reads contigs found
creating master list of reads to remove...
8839 reads to remove
/home/copeland/scripts/extractReads.pl -o fasta_good -X reads.toRemove 2351492_fasta.screen
verifying read removal...
backing up 2351492_fasta.screen
mv fasta_good 2351492_fasta.screen
mv fasta_good.qual 2351492_fasta.screen.qual
mv reads.toRemove reads_removed

#######################################################################

WARNING. WARNING. WARNING.

31Mar04

This project has been automatically analyzed by the 'draftQD.sh' script, a crude, zeroth-order project clean-up script designed to identify and remove low-quality and contaminant reads from a project. The heuristics used by the script may both remove valid project data and fail to remove bona fide contaminants.

The following sets of reads have been removed from the project fasta in this edit_dir only. The trace data of possible contaminants has not been removed from the project. Therefore, if you are using automated assembly procedures that recreate a project fasta from data in the partitions, the reads removed in this edit_dir will be present again in the new fasta file.

Removed reads lists:

reads.lowQual.q20lt100    -- fewer than 100 contiguous q20 bases that are not X
reads.2RdContigs          -- all reads from 2-read contigs
reads.possible.eukaryota  -- blast hits of 98% identity over 200+ bp to eukaryotic entries in 'nt'

After removing suspect reads from the project fasta, a new assembly was created in this directory using the cleaned fasta file.

#######################################################################

reads removed from fasta:
   16 reads.possible.eukaryota
  263 reads.JGIContaminants
   79 reads.2RdContigs
 8518 reads.lowQual.q20lt100
 8838 total unique reads removed
38702 reads prior to clean up
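The 'q20lt100' low-quality criterion can be read as: flag a read when its longest run of consecutive bases with quality of at least 20 is shorter than 100 bases, counting only bases that are not X. Below is a minimal Python sketch under that interpretation. It assumes that "not X" means bases masked with X in the screened fasta (2351492_fasta.screen), and that each read is available as a sequence string plus a per-base quality list. The function names are hypothetical; draftQD.sh's actual implementation does not appear in this log.

```python
from typing import Sequence


def longest_clean_q20_run(bases: str, quals: Sequence[int], min_q: int = 20) -> int:
    """Length of the longest stretch of consecutive bases that have
    quality >= min_q and are not masked as 'X' (assumed masking character)."""
    if len(bases) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    best = current = 0
    for base, q in zip(bases, quals):
        if q >= min_q and base.upper() != "X":
            current += 1              # extend the current clean run
            best = max(best, current)
        else:
            current = 0               # a low-quality or masked base breaks the run
    return best


def is_low_quality(bases: str, quals: Sequence[int],
                   min_run: int = 100, min_q: int = 20) -> bool:
    """True if the read has no clean run of at least min_run bases
    (the 'q20lt100' criterion, as interpreted in this sketch)."""
    return longest_clean_q20_run(bases, quals, min_q) < min_run


# Usage: a 120-base read whose clean stretches are only 50 and 60 bases long
read = "A" * 50 + "X" * 10 + "C" * 60
print(is_low_quality(read, [30] * 120))  # True
```

Scanning for the longest run, rather than counting all q20 bases, matches the word "contiguous" in the list description. A read with 150 good bases split by a single low-quality base into runs of 70 and 79 would still be flagged.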
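The 'possible eukaryota' list combines two steps visible in the log. First, megablast hits against 'nt' are filtered, which is what produces the 'parsedL200' files. Second, each remaining hit's gi number is looked up in a eukaryotic gi list (2351492.eukaryota.gi). The Python sketch below applies the stated thresholds: at least 98% identity and an alignment of at least 200 bp. It rests on several assumptions. The input is taken to be NCBI tabular BLAST output, with query id, subject id, percent identity, and alignment length as the first four tab-separated columns. Subject ids are assumed to have the 'gi|12345|gb|ACC|' form. The pipeline's own parser and intermediate formats are not shown in this log, so the column layout, the inclusive thresholds, and the function names are all hypothetical.

```python
from typing import Iterable, Optional, Set


def subject_gi(sseqid: str) -> Optional[str]:
    """Return the gi number from an NCBI-style 'gi|12345|gb|ACC|' id, else None."""
    parts = sseqid.split("|")
    if len(parts) >= 2 and parts[0] == "gi":
        return parts[1]
    return None


def eukaryotic_reads(rows: Iterable[str], eukaryota_gis: Set[str],
                     min_pident: float = 98.0, min_length: int = 200) -> Set[str]:
    """Names of reads with at least one hit that passes the identity and
    length thresholds AND whose subject gi is in the eukaryotic gi set."""
    flagged: Set[str] = set()
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue                  # skip blank and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        if pident < min_pident or length < min_length:
            continue                  # fails the 98% id / 200 bp+ filter
        gi = subject_gi(subject)
        if gi is not None and gi in eukaryota_gis:
            flagged.add(query)
    return flagged


# Made-up example rows (only the first four tabular columns are used)
hits = [
    "readA\tgi|111|gb|AB000001.1|\t99.0\t250",
    "readB\tgi|222|gb|AB000002.1|\t99.0\t250",   # subject not in the eukaryotic set
    "readC\tgi|111|gb|AB000001.1|\t97.5\t400",   # identity below 98%
    "readD\tgi|111|gb|AB000001.1|\t98.0\t199",   # alignment shorter than 200 bp
]
print(eukaryotic_reads(hits, {"111"}))  # {'readA'}
```

The eukaryotic gi list is held as a set, so each hit costs a single hash lookup. This matters when a project has tens of thousands of reads and many hits per read.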
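In the summary, the per-list counts add up to 16 + 263 + 79 + 8518 = 8876. That total is 38 more than the 8838 total unique reads removed. The gap is consistent with the master list being a set union: a read flagged by more than one heuristic is removed once, but it is counted on every list it appears in. Below is a Python sketch of that merging step. It assumes each list file holds one read name per line. The loader and function names are hypothetical, not taken from draftQD.sh.

```python
from typing import Dict, Set


def load_read_list(path: str) -> Set[str]:
    """Read names from a list file: first whitespace-separated field of each non-blank line."""
    with open(path) as fh:
        return {line.split()[0] for line in fh if line.strip()}


def master_removal_list(lists: Dict[str, Set[str]]) -> Set[str]:
    """Union of all per-heuristic lists; overlapping reads appear only once."""
    master: Set[str] = set()
    for reads in lists.values():
        master |= reads
    return master


def overlap_excess(lists: Dict[str, Set[str]]) -> int:
    """Number of duplicate list memberships: sum of list sizes minus union size."""
    return sum(len(r) for r in lists.values()) - len(master_removal_list(lists))


# Usage with small made-up lists: r2 is flagged by two heuristics
lists = {
    "reads.lowQual.q20lt100": {"r1", "r2"},
    "reads.2RdContigs": {"r2", "r3"},
    "reads.possible.eukaryota": {"r4"},
}
print(sorted(master_removal_list(lists)))  # ['r1', 'r2', 'r3', 'r4']
print(overlap_excess(lists))               # 1
```

Applied to the counts in this log, the excess would be 8876 - 8838 = 38.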