Welcome to ALE’s documentation!

Introduction

Getting Started:
  1. Installation
  2. Running ALE
  3. Plotting the Output

Installation

Download the latest source:

$ git clone git@github.com:sc932/ALE.git

Enter the directory and run make:

$ cd ALE/src
$ make

To build this documentation (optional) run:

$ cd ../doc
$ make html

This documentation can now be browsed at ../doc/_build/html/index.html

Running ALE

After installation we can run ALE from the ALE src directory:

$ ./ALE
Usage: ./ALE [-options] readSorted.[s|b]am assembly.fasta[.gz] ALEoutput.ale
   Options:
   -h : print out help

From this point, if you have a bam/sam file and an assembly, you can run ALE directly. What follows is an example of how to create these synthetically from scratch and produce figures similar to those in the paper.

ALE Synthetic Example From Scratch

We will make some synthetic reads from the first part of E. coli K12.

First we download the genome for Escherichia_coli_K_12_substr__DH10B:

$ cd example
$ wget ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B_uid20079/CP000948.fna

Then we extract the first 350k bases:

$ head -50001 CP000948.fna > Ecoli_first350k.fna
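
The number of bases kept by head depends on the line width of the FASTA file, so it can be worth checking the length of the truncated sequence before generating reads. The following is a minimal Python sketch for that check; the helper name count_fasta_bases is hypothetical and is not part of ALE.

```python
def count_fasta_bases(lines):
    """Return {header: number_of_bases} for FASTA text given as an iterable of lines."""
    counts = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:]  # header text without the leading '>'
            counts[current] = 0
        elif current is not None:
            counts[current] += len(line)  # sequence line: count its characters
    return counts


# Small in-memory demonstration (hypothetical data, not the E. coli genome).
demo = [">demo sequence\n", "ACGTACGT\n", "ACGT\n"]
for header, n_bases in count_fasta_bases(demo).items():
    print(header, n_bases)
```

To check the real file, pass an open file handle instead of the demo list, for example count_fasta_bases(open("Ecoli_first350k.fna")).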

Now we generate a set of 2 million synthetic reads (for more information see the synthReadGen section):

$ ../synthReadGen -ip 1.0 -nr 2000000 -ps 10 -b Ecoli_first350k.fna Ecoli_first350k.fastq

At this point we can run bowtie to get an initial mapping:

$ bowtie-build Ecoli_first350k.fna Ecoli_first350k
$ bowtie -t -I 0 -X 300 --fr -a -l 10 -v 1 -e 300 -S --threads 2 Ecoli_first350k -1 part1_Ecoli_first350k.fastq -2 part2_Ecoli_first350k.fastq Ecoli_first350k.map.sam

Now we have the required .sam and .fna files to run ALE:

$ ../ALE Ecoli_first350k.map.sam Ecoli_first350k.fna Ecoli_first350k.ale

This results in a file Ecoli_first350k.ale in the following format:

$ head -6 Ecoli_first350k.ale
# Reference: gi|170079663|ref|NC_010473.1| 350000
# contig position depth ln(depthLike) ln(placeLike) ln(kmerLike) ln(totalLike)
0 0 1.000000 -60.000000 0.194888 -5.760798 -65.565910
0 1 3.000000 -60.000000 0.466271 -5.608334 -65.142063
0 2 5.000000 -60.000000 0.010585 -5.541655 -65.531071
0 3 12.000000 -60.000000 -0.057731 -5.380759 -65.438491
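
To use the raw output in a script, the column header above suggests a simple whitespace-separated layout. The following Python sketch parses it under that assumption; the names AleRow and parse_ale are hypothetical and are not part of ALE.

```python
from collections import namedtuple

# One record per data line, following the column header of the .ale file:
# contig position depth ln(depthLike) ln(placeLike) ln(kmerLike) ln(totalLike)
AleRow = namedtuple(
    "AleRow",
    "contig position depth depth_like place_like kmer_like total_like",
)


def parse_ale(lines):
    """Parse .ale text (an iterable of lines) into a list of AleRow records."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        fields = line.split()
        if len(fields) != 7:
            raise ValueError("expected 7 columns, got %d: %r" % (len(fields), line))
        rows.append(AleRow(fields[0], int(fields[1]), *map(float, fields[2:])))
    return rows


sample = [
    "# contig position depth ln(depthLike) ln(placeLike) ln(kmerLike) ln(totalLike)",
    "0 1 3.000000 -60.000000 0.466271 -5.608334 -65.142063",
    "0 2 5.000000 -60.000000 0.010585 -5.541655 -65.531071",
]
for row in parse_ale(sample):
    print(row.position, row.depth, row.total_like)
```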

We can use this information in its raw format or plot it using plotter3.py.

For a more complete example see image_maker.py.

Plotting the Output

The authors recommend using IGV to view the output.

http://www.broadinstitute.org/igv/

Import the assembly, the BAM file, and the ALE scores. The .ale file can be converted to a set of .wig files with ale2wiggle.py, which IGV reads directly. For large genomes you may want to convert the .wig files to the BigWig format.
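
For orientation, a fixedStep wiggle track is a short header followed by one value per position. The following Python sketch writes one score column in that layout. It only illustrates the file format and may differ from what ale2wiggle.py actually produces; the function name to_fixedstep_wig, the track name, and the contig name are hypothetical.

```python
def to_fixedstep_wig(chrom, values, track_name="ALE total"):
    """Return the lines of a fixedStep wiggle track with one value per base.

    chrom must match the sequence name used in the assembly loaded into the
    viewer. Wiggle coordinates are 1-based, so the first value is at start=1.
    """
    lines = [
        'track type=wiggle_0 name="%s"' % track_name,
        "fixedStep chrom=%s start=1 step=1" % chrom,
    ]
    lines.extend("%.6f" % value for value in values)
    return lines


# Hypothetical contig name with three total-likelihood values.
for line in to_fixedstep_wig("contig0", [-65.142063, -65.531071, -65.438491]):
    print(line)
```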

Here we show how to use the built-in debugging plotter. Note that this plotter is no longer in active development and is not supported. We recommend using IGV.

We can invoke the plotter by running:

$ ./plotter3.py ALEoutput.ale

This produces output similar to the following figure:

example/Ecoli_first350k.ale.pdf.png

For a full list of options run:

$ ./plotter3.py -h
Usage: ./ALE_plotter.py [-options] <inputfile.ale>

where basic options are:
  -h      : show brief help on version and full usage
  -nosave : do not save the figure as a pdf (instead plot to screen)

parameter options accepting <f>loats and <i>ntegers and <s>trings (default):
  -s <i>   : the starting position to plot (for all contigs, ie a single insert length)
  -e <i>   : the ending position of the plot
  -pt <s>  : plot type 't'otal 'k'mer 'p'lacement 'd'epth (-pt dpkt)
  -dsw <i> : depth smoothing window, averaging over position (-dsw 10000)
  -psw <i> : placement smoothing window (-psw 1000)
  -ksw <i> : kmer smoothing window (-ksw 1000)
  -t <f>   : threshold percentage, see paper (-t 0.99999)
  -pt <f>  : plot threshold, only plot if more than % of errors (-pt 0.0)
  -st <i>  : number of standard deviations to engage threshold (-st 5)
  -fn <s>  : figure name (default: contig name)
  -mps <i> : minimum plot size in bp (-mps 20000)
  -sc <s>  : plot only a specific contig (ie -sc contigName213)
  -pmo     : plot meta information only (off)
  -dpm     : don't plot meta information at all (off)
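
The smoothing windows (-dsw, -psw, -ksw) average a score over neighbouring positions. The following Python sketch shows one common way to do this, a centered moving average. It is an illustration under that assumption, not the code used by plotter3.py, and the function name smooth is hypothetical.

```python
def smooth(values, window):
    """Centered moving average of values.

    Each output is the mean of the inputs within window // 2 positions on
    either side, so an even window is effectively widened by one. Windows
    are truncated at both ends of the sequence.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    n = len(values)
    # Prefix sums make every window mean an O(1) lookup.
    prefix = [0.0]
    for value in values:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


print(smooth([1, 2, 3, 4, 5], 3))
```

A larger window suppresses position-to-position noise at the cost of blurring short features, which is presumably why the depth default (10000) is larger than the placement and kmer defaults (1000).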

plotter3.py

Jump to:
  1. Requirements for plotter3.py
  2. Running plotter3.py
  3. plotter3.py functions and classes

Requirements for plotter3.py

Running plotter3.py

The invocation, example figure, and full option list for plotter3.py are given under Plotting the Output above.

plotter3.py functions and classes

synthReadGen

Jump to:
  1. Compiling synthReadGen
  2. Running synthReadGen

Compiling synthReadGen

Running make in the src/ directory should compile synthReadGen automatically:

$ cd src
$ make

Alternatively, you can compile it manually with:

$ cc -g -O3 synthReadGen.c -o synthReadGen -lz -lm -Isamtools-0.1.18 -Lsamtools-0.1.18

Running synthReadGen

The usage can be found by running:

$ ./synthReadGen -h
Usage: ./synthReadGen [options] <inputFile> <outputFile>

Options: <i>nt <f>loat [default]
  -h      : print out this help
  -id <i> : set distribution used for insert length
            [1 = normal], 2 = poisson
  -ld <i> : set distribution used for read length
            [1 = normal], 2 = poisson
  -im <f> : inward insert length mean [200.0]
  -om <f> : outward insert length mean [500.0]
  -is <f> : inward insert length std dev [10.0]
  -os <f> : outward insert length std dev [15.0]
  -ip <f> : probability for an inward read [0.5]
  -er <c> : illumina error char [^]
  -nr <i> : number of reads to make [1000]
  -rl <x> : read length mean [85.0]
  -rs <x> : read length sigma [7.0]
  -ps <x> : no error for first x bases in a read [0]
  -b      : outputs two fastq files for bowtie mapping [off]
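
To make the role of the insert and read length parameters concrete, the following Python sketch draws one read pair using the defaults listed above (normal distributions only). It is an illustration of the parameters, not a port of synthReadGen.c, which may sample, round, or clamp differently; the function name sample_pair is hypothetical.

```python
import random


def sample_pair(rng, inward_prob=0.5, inward=(200.0, 10.0),
                outward=(500.0, 15.0), read_len=(85.0, 7.0)):
    """Draw (orientation, insert_length, (read1_length, read2_length)).

    inward, outward and read_len are (mean, std dev) pairs matching the
    -im/-is, -om/-os and -rl/-rs defaults; inward_prob matches -ip.
    """
    is_inward = rng.random() < inward_prob
    mean, std = inward if is_inward else outward
    # Assumption: lengths are rounded to whole bases and kept positive.
    insert_length = max(1, int(round(rng.gauss(mean, std))))
    read_lengths = tuple(
        max(1, int(round(rng.gauss(read_len[0], read_len[1])))) for _ in range(2)
    )
    return ("inward" if is_inward else "outward"), insert_length, read_lengths


rng = random.Random(0)  # fixed seed so the demonstration is repeatable
for _ in range(3):
    print(sample_pair(rng, inward_prob=1.0))  # -ip 1.0, as in the example run
```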

artificial_errors.py

Jump to:
  1. Running artificial_errors.py
  2. artificial_errors.py functions and classes

Running artificial_errors.py

We can invoke the artificial error maker by running:

$ ./artificial_errors.py [-options] <inputfile.fna>

This will create a new file errors_<inputfile.fna> that has the transformations requested in [-options] applied (performed left to right). If no options are given, errors_<inputfile.fna> will be identical to <inputfile.fna>.

Options are:

$ ./artificial_errors.py -h
Usage: ./artificial_errors.py [-options] <inputfile.fna>

where basic options are:
  -h      : show brief help on version and full usage

parameter options accepting <i>ntegers and <s>trings (default):
  Note: transformations will be made left to right
  -ase <i> <i> : add substitution error at <location> for <length>
  -ade <i> <i> : add deletion error at <location> for <length>
  -aie <i> <i> : add insertion error at <location> for <length>
  -inv <i> <i> : add inversion error at <location> for <length>
  -cip <i> <i> : copy part of the assembly at <location> for <length>
  -trp <i>     : transpose assembly around <pivot>
  -ab  <i>     : add a break (split into 2 contigs) at <location>
  -o   <s>     : output file name (error_ + inputfile.fna)

artificial_errors.py functions and classes

/*
  Copyright (C) 2010,2011,2012 Scott Clark. All rights reserved.

  Developed by:
  Scott Clark
  Cornell University Center for Applied Mathematics
  http://cam.cornell.edu
  AND
  Rob Egan
  Department of Energy Joint Genome Institute
  http://jgi.doe.gov

  Permission is hereby granted, free of charge, to any person obtaining a
  copy of this software and associated documentation files (the “Software”),
  to deal with the Software without restriction, including without limitation
  the rights to use, copy, modify, merge, publish, distribute, sublicense,
  and/or sell copies of the Software, and to permit persons to whom the
  Software is furnished to do so, subject to the following conditions:

  1. Redistributions of source code must retain the above copyright notice,
     this list of conditions and the following disclaimers.
  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimers in the
     documentation and/or other materials provided with the distribution.
  3. Neither the names of Cornell University, The Joint Genome Institute,
     nor the names of its contributors may be used to endorse or promote
     products derived from this Software without specific prior written
     permission.

  THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  DEALINGS WITH THE SOFTWARE.

*/

// For more information on the license please see
// The University of Illinois/NCSA Open Source License
// http://www.opensource.org/licenses/UoI-NCSA.php

artificial_errors.add_break(assembly, location)

add a break in <assembly> at <location>

artificial_errors.add_deletion_error(assembly, location, length)

add a deletion error to <assembly> at <location> for a set <length>

artificial_errors.add_insertion_error(assembly, location, length)

add an insertion error to <assembly> at <location> for a set <length>

artificial_errors.add_inversion_error(assembly, location, length)

add an inversion error to <assembly> at <location> for a set <length>

artificial_errors.add_sub_error(assembly, location, length)

add a substitution error to <assembly> at <location> for a set <length>

artificial_errors.copy_in_place(assembly, location, length)

copy a section of <assembly> at <location> for a given <length>

artificial_errors.main()

read in an assembly file, transform it based on the command line options, then output it again in another fasta file; see __full_usage__

artificial_errors.output_assembly(file_name, assembly)

output a list of bases as an assembly file <file_name>

artificial_errors.read_in_assembly(assembly_file)

read a fasta file <assembly_file> into a list of bases and return it

artificial_errors.transpose_assembly(assembly, start, end, pos)

transpose <assembly> from <start> to <end> placing it at <pos>
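
Because transformations are applied left to right, their order changes the result. The following Python sketch illustrates this on a list of bases with two stand-in transformations. The sketch_ functions are hypothetical and are not the artificial_errors implementations; in particular, they assume 0-based locations and an inversion that reverses the segment without complementing it.

```python
def sketch_deletion(assembly, location, length):
    """Return a copy of assembly with length bases removed starting at location."""
    return assembly[:location] + assembly[location + length:]


def sketch_inversion(assembly, location, length):
    """Return a copy of assembly with the given segment reversed in place."""
    segment = assembly[location:location + length]
    return assembly[:location] + segment[::-1] + assembly[location + length:]


def apply_left_to_right(assembly, steps):
    """Apply (function, args) steps in the order given, as the options are read."""
    for function, args in steps:
        assembly = function(assembly, *args)
    return assembly


bases = list("ACGTACGT")
delete_then_invert = apply_left_to_right(
    bases, [(sketch_deletion, (2, 3)), (sketch_inversion, (0, 3))])
invert_then_delete = apply_left_to_right(
    bases, [(sketch_inversion, (0, 3)), (sketch_deletion, (2, 3))])
print("".join(delete_then_invert))
print("".join(invert_then_delete))
```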

image_maker.py

Jump to:
  1. Running image_maker.py
  2. image_maker.py functions and classes

Running image_maker.py

Run the following command:

$ ./image_maker.py

image_maker.py functions and classes

image_maker.main()

Perform the following (if needed):

  1. Download an E. coli genome
  2. Truncate it to 350k bases
  3. Synthesize 2M reads
  4. Run ALE on some basic transformations
  5. Generate output plots

image_maker.run_it_through(file_name, error_opts)

Perform the following (if needed):

  1. make a fasta file using artificial_errors
  2. make a bowtie db
  3. run bowtie on the db and fasta file
  4. run ALE on the map and fasta
  5. run plotter3 on the ALE file
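
The five steps above correspond to the commands shown earlier in this documentation. The following Python sketch assembles them as argument lists without running anything. It is an outline under stated assumptions, not the image_maker.py source: the function name pipeline_commands, the executable paths, and the file naming are hypothetical, and the output name is set explicitly with the -o option of artificial_errors.py.

```python
def pipeline_commands(source_fna, name, reads1, reads2, error_opts=()):
    """Return the argv lists for the five steps, in the order they would run."""
    error_fna = name + ".fna"   # assembly with artificial errors
    sam = name + ".map.sam"     # bowtie mapping
    ale = name + ".ale"         # ALE output
    return [
        # 1. make a fasta file using artificial_errors
        ["./artificial_errors.py", *error_opts, "-o", error_fna, source_fna],
        # 2. make a bowtie db
        ["bowtie-build", error_fna, name],
        # 3. run bowtie on the db and the two read files
        ["bowtie", "-t", "-I", "0", "-X", "300", "--fr", "-a", "-l", "10",
         "-v", "1", "-e", "300", "-S", "--threads", "2", name,
         "-1", reads1, "-2", reads2, sam],
        # 4. run ALE on the map and fasta
        ["./ALE", sam, error_fna, ale],
        # 5. run plotter3 on the ALE file
        ["./plotter3.py", ale],
    ]


for argv in pipeline_commands(
        "Ecoli_first350k.fna", "Ecoli_inverted",
        "part1_Ecoli_first350k.fastq", "part2_Ecoli_first350k.fastq",
        error_opts=["-inv", "100000", "500"]):
    print(" ".join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) once the paths have been adjusted to the local layout.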
