ONT transcriptome assemblies
Please see the data and scripts in /global/cscratch1/sd/mmingay/ont_transcriptome_assembly
for the outputs of RNA Bloom and rnaSPAdes. Let me know if you have any trouble accessing the data or have any questions.
JGI in-house ONT DRS-seq dataset
The ONT data came from an in-house JGI ONT PromethION direct RNA sequencing (DRS-seq) experiment and can be found here:
fastq/Czofi_X0179P_pass_trimmed.fastq
The run produced 7.2M reads and 6.2 Gb of called bases.
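The read and base totals can be recomputed directly from the FASTQ. The function below is a minimal sketch, assuming an uncompressed FASTQ with the standard four lines per record (no wrapped sequence lines); fastq_stats is a hypothetical helper name, not part of the scripts in this directory.

```shell
#!/usr/bin/env bash
# Count reads and called bases in an uncompressed FASTQ file.
# Assumes 4-line records: the sequence is line 2 of every record.
fastq_stats() {
    local fq="$1"
    # %.0f instead of %d so base totals above 2^31 print correctly in any awk.
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%.0f reads\t%.0f bases\n", reads, bases }' "$fq"
}
```

Running fastq_stats fastq/Czofi_X0179P_pass_trimmed.fastq reports totals for the trimmed pass reads, which may differ slightly from the run-level figures above.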
Publicly available Illumina RNA-seq data
Because rnaSPAdes is a hybrid assembler that requires short-read data (here, Illumina paired-end reads), publicly available Chromochloris RNA-seq data was downloaded from the SRA Run Selector.
The data was downloaded by running the script shell/dump_illumina_fastq.sh,
but only data from SRR5117282 was used in the analysis.
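After downloading, it is worth checking that the two mate files for SRR5117282 contain the same number of reads before passing them to rnaSPAdes. A minimal sketch follows; it assumes uncompressed 4-line FASTQ and that the mates are written as SRR5117282_1.fastq and SRR5117282_2.fastq (the fasterq-dump naming convention). The file names actually produced by shell/dump_illumina_fastq.sh may differ, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Number of records in an uncompressed 4-line FASTQ file.
count_fastq_records() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Succeed only if both mate files hold the same number of reads.
check_pairs() {
    local r1="$1" r2="$2" n1 n2
    n1=$(count_fastq_records "$r1")
    n2=$(count_fastq_records "$r2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $r1 has $n1 reads, $r2 has $n2" >&2
        return 1
    fi
}
```

For example: check_pairs fastq/SRR5117282_1.fastq fastq/SRR5117282_2.fastq (hypothetical paths).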
The data was published in the article:
rnaSPAdes
To run rnaSPAdes I used a conda environment that can be created by running:
module load python
conda env create -f rnaspades.yml
I then ran the script shell/run_rnaspades_X0179P.sh.
You may have to change the hardcoded path variables for the input files and outputs.
You can also include an existing assembly as an argument. An example of this can be found here:
shell/run_rnaspades_X0179P.sh
I ran into memory issues when trying to use too many threads. I got it to work with 8 threads, but it's far from optimized.
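Instead of hardcoding paths and the thread count, they can be exposed as parameters, which makes it easier to retry with different -t (threads) and -m (memory limit in GB) values when memory runs out. The sketch below is not the contents of shell/run_rnaspades_X0179P.sh; it assumes the ONT reads are passed with rnaSPAdes' --nanopore option and that the Illumina mates are named as shown. The environment variable names (ILLUMINA_R1, ILLUMINA_R2, ONT_FASTQ, THREADS, MEM_GB, OUTDIR, DRY_RUN) are hypothetical, the flags should be checked against rnaspades.py --help for the installed SPAdes version, and supplying an existing assembly is not covered.

```shell
#!/usr/bin/env bash
# Build (and optionally run) an rnaSPAdes hybrid assembly command.
# All variable names and default paths here are assumptions.
run_rnaspades() {
    local r1="${ILLUMINA_R1:-fastq/SRR5117282_1.fastq}"
    local r2="${ILLUMINA_R2:-fastq/SRR5117282_2.fastq}"
    local ont="${ONT_FASTQ:-fastq/Czofi_X0179P_pass_trimmed.fastq}"
    local threads="${THREADS:-8}"    # 8 threads was the setting that worked
    local mem_gb="${MEM_GB:-250}"    # SPAdes -m: memory limit in GB (250 is the SPAdes default)
    local outdir="${OUTDIR:-output_${threads}t}"

    # --nanopore for the ONT reads is an assumption; verify with --help.
    local cmd=(rnaspades.py -1 "$r1" -2 "$r2" --nanopore "$ont"
               -t "$threads" -m "$mem_gb" -o "$outdir")

    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"    # print the command instead of running it
    else
        "${cmd[@]}"
    fi
}
```

DRY_RUN=1 run_rnaspades prints the command without running it; THREADS=8 MEM_GB=200 run_rnaspades runs the assembly with those limits.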
The rnaSPAdes output can be found in ./output_8t,
and the output from running it with a transcriptome FASTA file can be found in ./output_wtranscriptome.
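To compare the assemblies in these directories, a quick summary of each transcript FASTA (sequence count, total length and N50) is often enough. A minimal awk sketch follows; fasta_stats is a hypothetical helper, and the file names in the usage line assume the rnaSPAdes default output name transcripts.fasta.

```shell
#!/usr/bin/env bash
# Summarise a FASTA file: number of sequences, total length and N50.
# Handles wrapped (multi-line) records; records with empty sequence are skipped.
fasta_stats() {
    # Pass 1: emit one length per record.
    awk '
        /^>/ { if (len) print len; len = 0; next }
        { len += length($0) }
        END { if (len) print len }
    ' "$1" | sort -nr | awk '
        # Pass 2: lengths arrive longest first; N50 is the length at which
        # the running total first reaches half of the assembly size.
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2; acc = 0; n50 = 0
            for (i = 1; i <= NR; i++) {
                acc += lens[i]
                if (acc >= half) { n50 = lens[i]; break }
            }
            printf "n=%d\ttotal=%.0f\tN50=%d\n", NR, total, n50
        }'
}
```

For example: fasta_stats output_8t/transcripts.fasta and fasta_stats output_wtranscriptome/transcripts.fasta.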
Both directories contained an identical folder of split reads from the FASTQs. This is now a single folder: ./fastq/split_input.
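Merging two copies of a folder into one is only safe if they really are identical. The sketch below checks that with diff -rq before moving one copy to the new location and deleting the other. The function name is hypothetical and the source folder paths are left to the caller, since the original folder name is not recorded here; any script that referred to the old location would also need updating.

```shell
#!/usr/bin/env bash
# Keep a single copy of two identical directories at a new location.
consolidate_identical_dirs() {
    local a="$1" b="$2" dest="$3"
    if ! diff -rq "$a" "$b" > /dev/null; then
        echo "Directories differ; not consolidating" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists; not overwriting" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")"
    mv "$a" "$dest"   # keep one copy at the shared location
    rm -r "$b"        # drop the verified duplicate
}
```

For example: consolidate_identical_dirs output_8t/<split folder> output_wtranscriptome/<split folder> fastq/split_input, where <split folder> is a placeholder.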
RNA Bloom
I also wanted to explore RNA Bloom, which is similar to rnaSPAdes but can produce transcriptome assemblies from ONT DRS-seq data alone.
I ran RNA Bloom in a conda environment that can be created from the rnabloom.yml file.
RNA Bloom was run on the same ONT dataset (X0179P) using the script ./shell/run_rnabloom.sh.
The output can be found in ./rnabloom_output.
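For reference, an RNA Bloom run on ONT reads alone can be sketched as below. This is not the contents of shell/run_rnabloom.sh: it assumes the -long, -stranded, -t and -outdir options described in the RNA-Bloom README, with -stranded chosen because direct RNA reads preserve transcript strand, and a default of 8 threads, which is an assumption. Check rnabloom --help for the installed version; the environment variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Build (and optionally run) an RNA Bloom long-read-only assembly command.
# Flags and defaults are assumptions; verify against rnabloom --help.
run_rnabloom() {
    local ont="${ONT_FASTQ:-fastq/Czofi_X0179P_pass_trimmed.fastq}"
    local threads="${THREADS:-8}"
    local outdir="${OUTDIR:-rnabloom_output}"
    # -stranded: direct RNA reads come from the transcript strand.
    local cmd=(rnabloom -long "$ont" -stranded -t "$threads" -outdir "$outdir")
    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"    # print the command instead of running it
    else
        "${cmd[@]}"
    fi
}
```

DRY_RUN=1 run_rnabloom prints the command without running it; the assembled transcripts then end up under rnabloom_output.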
You might have to change the path to the conda environment in the included scripts to reflect the location of your environment.