Welcome to ALE’s documentation!¶
Introduction¶
- Getting Started:
Installation¶
Download the latest source:
$ git clone git@github.com:sc932/ALE.git
Enter the directory and run make:
$ cd ALE/src
$ make
To build this documentation (optional) run:
$ cd ../doc
$ make html
This documentation can now be browsed from ../doc/_build/html/index.html
Running ALE¶
After installation we can run ALE from the ALE src directory:
$ ./ALE
Usage: ./ALE [-options] readSorted.[s|b]am assembly.fasta[.gz] ALEoutput.ale
Options:
-h : print out help
From this point, if you have a BAM/SAM file and an assembly, you can run ALE directly. What follows is an example of how to create these synthetically from scratch and produce figures similar to those in the paper.
ALE Synthetic Example From Scratch¶
We will make some synthetic reads from the first part of the E. coli K-12 genome.
First we download the genome for Escherichia_coli_K_12_substr__DH10B:
$ cd example
$ wget ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B_uid20079/CP000948.fna
Then we extract the first 350k bases:
$ head -50001 CP000948.fna > Ecoli_first350k.fna
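Because head counts lines rather than bases, the number of bases extracted depends on the line width of the downloaded FASTA file (at 70 bases per line, for instance, 50,000 sequence lines would hold 3,500,000 bases). As a sanity check, the following Python sketch counts the bases in FASTA text. The function name count_fasta_bases is hypothetical and is not part of ALE.

```python
def count_fasta_bases(lines):
    """Count sequence characters in FASTA text, ignoring '>' header lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # headers and blank lines carry no bases
        total += len(line)
    return total


# Tiny in-memory example; for a real file, pass an open file handle instead,
# e.g. count_fasta_bases(open("Ecoli_first350k.fna")).
example = [">demo sequence", "ACGTACGTAC", "GGTTA"]
print(count_fasta_bases(example))  # prints 15
```

If the count for Ecoli_first350k.fna is not the length you intended, adjust the line count given to head accordingly.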
Now we generate a set of 2 million synthetic reads (for more information, see the synthReadGen documentation):
$ ../synthReadGen -ip 1.0 -nr 2000000 -ps 10 -b Ecoli_first350k.fna Ecoli_first350k.fastq
At this point we can run bowtie to get an initial mapping:
$ bowtie-build Ecoli_first350k.fna Ecoli_first350k
$ bowtie -t -I 0 -X 300 --fr -a -l 10 -v 1 -e 300 -S --threads 2 Ecoli_first350k -1 part1_Ecoli_first350k.fastq -2 part2_Ecoli_first350k.fastq Ecoli_first350k.map.sam
Now we have the required .sam and .fna files to run ALE:
$ ../ALE Ecoli_first350k.map.sam Ecoli_first350k.fna Ecoli_first350k.ale
This results in a file Ecoli_first350k.ale in the following format:
$ head -6 Ecoli_first350k.ale
# Reference: gi|170079663|ref|NC_010473.1| 350000
# contig position depth ln(depthLike) ln(placeLike) ln(kmerLike) ln(totalLike)
0 0 1.000000 -60.000000 0.194888 -5.760798 -65.565910
0 1 3.000000 -60.000000 0.466271 -5.608334 -65.142063
0 2 5.000000 -60.000000 0.010585 -5.541655 -65.531071
0 3 12.000000 -60.000000 -0.057731 -5.380759 -65.438491
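To work with the raw format, each non-comment line can be split into the seven columns named in the header. The sketch below is a hypothetical reader (AleRow and parse_ale_lines are illustrative names, not part of ALE) that assumes whitespace-separated columns as in the sample above. In the sample rows, ln(totalLike) equals the sum of the three component log-likelihoods, which the loop at the end checks; whether that holds for every .ale file is an assumption.

```python
from collections import namedtuple

# One record per position; field names mirror the header line of the .ale file.
AleRow = namedtuple(
    "AleRow",
    "contig position depth depth_like place_like kmer_like total_like",
)


def parse_ale_lines(lines):
    """Parse .ale text lines into AleRow records, skipping '#' comment lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError("expected 7 columns, got %d: %r" % (len(fields), line))
        # Keep the contig column as a string; the rest are numeric.
        rows.append(AleRow(fields[0], int(fields[1]), *map(float, fields[2:])))
    return rows


sample = [
    "# Reference: gi|170079663|ref|NC_010473.1| 350000",
    "# contig position depth ln(depthLike) ln(placeLike) ln(kmerLike) ln(totalLike)",
    "0 1 3.000000 -60.000000 0.466271 -5.608334 -65.142063",
    "0 2 5.000000 -60.000000 0.010585 -5.541655 -65.531071",
    "0 3 12.000000 -60.000000 -0.057731 -5.380759 -65.438491",
]

rows = parse_ale_lines(sample)
for row in rows:
    component_sum = row.depth_like + row.place_like + row.kmer_like
    # Allow for rounding in the six-decimal text output.
    assert abs(component_sum - row.total_like) < 1e-5
```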
We can use this information in its raw format or plot it using plotter3.py.
For a more complete example, see image_maker.py.
Plotting the Output¶
The authors recommend using IGV to view the output.
http://www.broadinstitute.org/igv/
Just import the assembly, bam and ALE scores. You can convert the .ale file to a set of .wig files with ale2wiggle.py and IGV can read those directly. Depending on your genome size you may want to convert the .wig files to the BigWig format.
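For orientation, a wiggle track is a plain-text file with a track line, a declaration line, and one value per position. The sketch below writes a fixedStep track from a list of per-position values. It is not the implementation of ale2wiggle.py, which may organize its output differently; depth_to_wiggle and the contig name contig0 are hypothetical. Wiggle coordinates are 1-based, and the chrom value is assumed to match the sequence name in the assembly loaded into IGV.

```python
def depth_to_wiggle(chrom, values, track_name="ALE depth"):
    """Format per-position values as a fixedStep wiggle track (1-based start)."""
    lines = [
        'track type=wiggle_0 name="%s"' % track_name,
        "fixedStep chrom=%s start=1 step=1" % chrom,
    ]
    # One value per line, in position order.
    lines.extend("%g" % value for value in values)
    return "\n".join(lines) + "\n"


# Example with four depth values for a hypothetical contig name.
print(depth_to_wiggle("contig0", [1.0, 3.0, 5.0, 12.0]), end="")
```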
Here we show how to use the built-in debugging plotter. Note that this plotter is no longer in active development and is not supported. We recommend using IGV.
We can invoke the plotter by running:
$ ./plotter3.py ALEoutput.ale
This produces output similar to the following figure (link to figure).
For a full list of options, see the plotter3.py source or run:
$ ./plotter3.py -h
Usage: ./ALE_plotter.py [-options] <inputfile.ale>
where basic options are:
-h : show brief help on version and full usage
-nosave : do not save the figure as a pdf (instead plot to screen)
parameter options accepting <f>loats and <i>ntegers and <s>trings (default):
-s <i> : the starting position to plot (for all contigs, ie a single insert length)
-e <i> : the ending position of the plot
-pt <s> : plot type 't'otal 'k'mer 'p'lacement 'd'epth (-pt dpkt)
-dsw <i> : depth smoothing window, averaging over position (-dsw 10000)
-psw <i> : placement smoothing window (-psw 1000)
-ksw <i> : kmer smoothing window (-ksw 1000)
-t <f> : threshold percentage, see paper (-t 0.99999)
-pt <f> : plot threshold, only plot if more than % of errors (-pt 0.0)
-st <i> : number of standard deviations to engage threshold (-st 5)
-fn <s> : figure name (default: contig name)
-mps <i> : minimum plot size in bp (-mps 20000)
-sc <s> : plot only a specific contig (ie -sc contigName213)
-pmo : plot meta information only (off)
-dpm : don't plot meta information at all (off)
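The smoothing-window options (-dsw, -psw, -ksw) average a score over neighbouring positions. The sketch below shows one common way to do this, a centred moving average that shrinks the window at the ends of the contig. How plotter3.py actually handles window edges is not described here, so treat the edge behaviour and the function name moving_average as assumptions.

```python
def moving_average(values, window):
    """Centred moving average; the window is clipped at both ends of the list.

    Uses window // 2 positions on each side of the centre, so an even
    window behaves like the next odd size.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(values)
    # Prefix sums make each window sum an O(1) lookup.
    prefix = [0.0]
    for value in values:
        prefix.append(prefix[-1] + value)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


depths = [1.0, 3.0, 5.0, 12.0]
print(moving_average(depths, 3))
```

Larger windows, such as the default of 10000 for depth, trade positional detail for a less noisy curve.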
synthReadGen¶
Compiling synthReadGen¶
Running make in the src/ directory should compile synthReadGen automatically:
$ cd src
$ make
Alternatively, you can compile it manually with:
$ cc -g -O3 synthReadGen.c -o synthReadGen -lz -lm -Isamtools-0.1.18 -Lsamtools-0.1.18
Running synthReadGen¶
The usage can be found by running:
$ ./synthReadGen -h
Usage: ./synthReadGen [options] <inputFile> <outputFile>
Options: <i>nt <f>loat [default]
-h : print out this help
-id <i> : set distribution used for insert length
[1 = normal], 2 = poisson
-ld <i> : set distribution used for read length
[1 = normal], 2 = poisson
-im <f> : inward insert length mean [200.0]
-om <f> : outward insert length mean [500.0]
-is <f> : inward insert length std dev [10.0]
-os <f> : outward insert length std dev [15.0]
-ip <f> : probability for an inward read [0.5]
-er <c> : illumina error char [^]
-nr <i> : number of reads to make [1000]
-rl <x> : read length mean [85.0]
-rs <x> : read length sigma [7.0]
-ps <x> : no error for first x bases in a read [0]
-b : outputs two fastq files for bowtie mapping [off]
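To make the defaults above concrete, the following sketch draws the orientation, insert length, and read lengths for one read pair using the normal-distribution defaults listed in the help text. It only illustrates what the parameters mean; it is not how synthReadGen is implemented, and sample_pair_shape and its rounding and clamping choices are assumptions.

```python
import random


def sample_pair_shape(rng, inward_prob=0.5,
                      inward_mean=200.0, inward_sd=10.0,
                      outward_mean=500.0, outward_sd=15.0,
                      read_mean=85.0, read_sd=7.0):
    """Draw (inward?, insert length, (read1 length, read2 length)) for one pair."""
    # -ip: probability that the pair is inward facing.
    inward = rng.random() < inward_prob
    # -im/-is for inward pairs, -om/-os for outward pairs.
    mean, sd = (inward_mean, inward_sd) if inward else (outward_mean, outward_sd)
    insert_length = max(1, round(rng.gauss(mean, sd)))
    # -rl/-rs: each read length is drawn independently (an assumption).
    read_lengths = tuple(
        max(1, round(rng.gauss(read_mean, read_sd))) for _ in range(2)
    )
    return inward, insert_length, read_lengths


# With -ip 1.0, as in the E. coli example, every pair is inward facing.
rng = random.Random(42)
pairs = [sample_pair_shape(rng, inward_prob=1.0) for _ in range(1000)]
mean_insert = sum(pair[1] for pair in pairs) / len(pairs)
print("mean inward insert length: %.1f" % mean_insert)
```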
artificial_errors.py¶
Running artificial_errors.py¶
We can invoke the artificial error maker by running:
$ ./artificial_errors.py [-options] <inputfile.fna>
This creates a new file, errors_<inputfile.fna>, containing the transformations requested in [-options], applied left to right. If no options are given, errors_<inputfile.fna> will be identical to <inputfile.fna>.
Options are:
$ ./artificial_errors.py -h
Usage: ./artificial_errors.py [-options] <inputfile.fna>
where basic options are:
-h : show brief help on version and full usage
parameter options accepting <i>ntegers and <s>trings (default):
Note: transformations will be made left to right
-ase <i> <i> : add substitution error at <location> for <length>
-ade <i> <i> : add deletion error at <location> for <length>
-aie <i> <i> : add insertion error at <location> for <length>
-inv <i> <i> : add inversion error at <location> for <length>
-cip <i> <i> : copy part of the assembly at <location> for <length>
-trp <i> : transpose assembly around <pivot>
-ab <i> : add a break (split into 2 contigs) at <location>
-o <s> : output file name (error_ + inputfile.fna)
artificial_errors.py functions and classes¶
/*
Copyright (C) 2010,2011,2012 Scott Clark. All rights reserved.
Developed by:
Scott Clark
Cornell University Center for Applied Mathematics
http://cam.cornell.edu
AND
Rob Egan
Department of Energy Joint Genome Institute
http://jgi.doe.gov
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the “Software”),
to deal with the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
- Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimers.
- Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimers in the
  documentation and/or other materials provided with the distribution.
- Neither the names of Cornell University, The Joint Genome Institute,
  nor the names of its contributors may be used to endorse or promote
  products derived from this Software without specific prior written
  permission.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS WITH THE SOFTWARE.
*/
// For more information on the license please see
// The University of Illinois/NCSA Open Source License
// http://www.opensource.org/licenses/UoI-NCSA.php
artificial_errors.add_deletion_error(assembly, location, length)
    Add a deletion error to <assembly> at <location> for a set <length>.
artificial_errors.add_insertion_error(assembly, location, length)
    Add an insertion error to <assembly> at <location> for a set <length>.
artificial_errors.add_inversion_error(assembly, location, length)
    Add an inversion error to <assembly> at <location> for a set <length>.
artificial_errors.add_sub_error(assembly, location, length)
    Add a substitution error to <assembly> at <location> for a set <length>.
artificial_errors.copy_in_place(assembly, location, length)
    Copy a section of <assembly> at <location> for a given <length>.
artificial_errors.main()
    Read in an assembly file, transform it based on the command-line options, then write it out to another FASTA file; see __full_usage__.
artificial_errors.output_assembly(file_name, assembly)
    Output a list of bases as an assembly file <file_name>.
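As a rough illustration of what these transformations do to a list of bases, and of why the left-to-right order of options matters, here is a minimal sketch. The sketch_ functions are hypothetical stand-ins, not the code in artificial_errors.py; in particular, whether the real inversion also complements the bases, and where the real copy is placed, are assumptions.

```python
def sketch_deletion(assembly, location, length):
    """Return a copy with `length` bases removed, starting at `location`."""
    return assembly[:location] + assembly[location + length:]


def sketch_inversion(assembly, location, length):
    """Return a copy with the segment reversed (no complementing, by assumption)."""
    end = location + length
    return assembly[:location] + assembly[location:end][::-1] + assembly[end:]


def sketch_copy_in_place(assembly, location, length):
    """Return a copy with the segment duplicated directly after itself."""
    end = location + length
    return assembly[:end] + assembly[location:end] + assembly[end:]


bases = list("ACGTACGT")

# Deletion first, then inversion ...
a = sketch_inversion(sketch_deletion(bases, 2, 2), 0, 4)
# ... versus inversion first, then deletion, with the same arguments.
b = sketch_deletion(sketch_inversion(bases, 0, 4), 2, 2)

print("".join(a))  # prints CACAGT
print("".join(b))  # prints TGACGT
```

Because each transformation shifts or rearranges the bases that follow it, the <location> given to a later option refers to the already-modified assembly.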
image_maker.py¶
image_maker.py functions and classes¶