README.md

ONT transcriptome assemblies

The output of RNA-Bloom and rnaSPAdes, along with the data and scripts used, is in /global/cscratch1/sd/mmingay/ont_transcriptome_assembly. Let me know if you have trouble accessing the data or have any questions.

JGI in-house ONT DRS-seq dataset

The ONT data came from an in-house JGI ONT PromethION direct RNA sequencing (DRS-seq) experiment and can be found here:

fastq/Czofi_X0179P_pass_trimmed.fastq

The run produced 7.2M reads and 6.2G called bases.
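The read and base totals can be re-checked directly from the FASTQ file. The function below is a minimal sketch, not part of the original scripts. The name fastq_stats is hypothetical, and it assumes a standard four-line-per-record FASTQ with no wrapped sequence lines.

```shell
# Hypothetical helper: count reads and called bases in a FASTQ file.
# Assumes 4 lines per record (header, sequence, '+', qualities).
fastq_stats() {
    # Line 2 of every 4-line record is the sequence.
    # %.0f avoids 32-bit integer clamping in some awk builds for multi-gigabase totals.
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "reads=%.0f bases=%.0f\n", reads, bases }' "$1"
}
```

Usage: fastq_stats fastq/Czofi_X0179P_pass_trimmed.fastq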

Publicly available Illumina RNA-seq data

Because rnaSPAdes is a hybrid assembler that requires Illumina paired-end data, publicly available Chromochloris RNA-seq data was downloaded from the SRA Run Selector.

The data was downloaded by running the script shell/dump_illumina_fastq.sh, but only data from SRR5117282 was used in the analysis.
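For anyone who only needs the one run that was used, the download can be sketched with fasterq-dump from sra-tools. This is an assumption about how such a download could look, not a copy of shell/dump_illumina_fastq.sh. The function name, the output directory, and the thread count are all hypothetical.

```shell
# Hypothetical sketch; the real shell/dump_illumina_fastq.sh may differ.
# Needs sra-tools (fasterq-dump) on PATH unless DRY_RUN is set.
# $1 = SRA run accession, $2 = output directory (assumed layout)
dump_illumina_fastq() {
    # --split-files writes paired reads to <accession>_1.fastq and <accession>_2.fastq.
    # With DRY_RUN set to a non-empty value the command is printed instead of run.
    ${DRY_RUN:+echo} fasterq-dump --split-files --outdir "$2" --threads 4 "$1"
}
```

Usage: dump_illumina_fastq SRR5117282 fastq/illumina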

The data was published in the article:

Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production


rnaSPAdes

To run rnaSPAdes I used a conda environment that can be created by running:

module load python
conda env create -f rnaspades.yml

I then ran the script shell/run_rnaspades_X0179P.sh

You might have to change the hardcoded path variables for the files / outputs.

You can also include an existing assembly as an argument. An example of this can be found here:

shell/run_rnaspades_X0179P.sh

I ran into memory issues when trying to use too many threads. I got it to work with 8 threads, but it's far from optimized.

The rnaSPAdes output can be found in the directory ./output_8t, and the output from running it with a transcriptome fasta file can be found in ./output_wtranscriptome.

Both directories contained an identical folder of split reads from the fastqs. This is now just a single folder: ./fastq/split_input.
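A hybrid run with a capped thread count and memory limit might look like the sketch below. It is not the contents of shell/run_rnaspades_X0179P.sh. The function name, the 100 GB default, and the Illumina file names in the usage line are assumptions. The flags themselves (-1, -2, --nanopore, -t, -m, -o) are standard SPAdes options, and -m is the memory cap in GB at which SPAdes stops.

```shell
# Hypothetical wrapper; adjust paths and limits to your allocation.
# $1 = Illumina R1, $2 = Illumina R2, $3 = ONT reads, $4 = output dir
# $5 = threads (default 8), $6 = memory cap in GB (default 100, a placeholder)
run_rnaspades() {
    # With DRY_RUN set to a non-empty value the command is printed instead of run.
    ${DRY_RUN:+echo} rnaspades.py \
        -1 "$1" -2 "$2" \
        --nanopore "$3" \
        -t "${5:-8}" -m "${6:-100}" \
        -o "$4"
}
```

Usage (Illumina paths are assumed): run_rnaspades fastq/illumina/SRR5117282_1.fastq fastq/illumina/SRR5117282_2.fastq fastq/Czofi_X0179P_pass_trimmed.fastq output_8t

Lowering the thread count is the first thing to try if a run dies on memory, since that is what got this dataset through.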


RNA-Bloom

I also wanted to explore RNA-Bloom, which is similar to rnaSPAdes but can produce transcriptome assemblies from ONT DRS-seq data alone.

I ran RNA-Bloom in a conda environment that can be created from the rnabloom.yml file.

RNA-Bloom was run on the same ONT dataset (X0179P) using the script ./shell/run_rnabloom.sh

The output can be found in ./rnabloom_output.
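A long-read-only RNA-Bloom run can be sketched as below. This is an assumption about the invocation, not a copy of ./shell/run_rnabloom.sh. The function name and the default of 8 threads are hypothetical, and the rnabloom launcher plus the -long and -stranded options depend on the RNA-Bloom version installed in your environment, so check rnabloom -help first. -stranded is used here because direct RNA reads are strand-specific.

```shell
# Hypothetical wrapper; option names follow the RNA-Bloom README for long reads.
# $1 = ONT reads, $2 = output dir, $3 = threads (default 8, an assumed value)
run_rnabloom() {
    # With DRY_RUN set to a non-empty value the command is printed instead of run.
    ${DRY_RUN:+echo} rnabloom -long "$1" -stranded -t "${3:-8}" -outdir "$2"
}
```

Usage: run_rnabloom fastq/Czofi_X0179P_pass_trimmed.fastq rnabloom_output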

You might have to change the path to the conda environment in the included scripts to reflect the location of your environment.


mmingay@lbl.gov