Hi all,
Apologies for yet another request! I want to simulate ONT amplicon-seq reads with NanoSim, but the simulation step does not finish (at least not within hours).
I have a reference sequence based on Sanger sequencing of the amplicon (Stor1_cox1.fa). In addition, I have ONT data of the same amplicon (COX1.fastq), which I could use for model training.
Following your suggestion in issue 112, I am using the "transcriptome" method.
conda activate nanosim
read_analysis.py transcriptome \
-i ${wd}Syrphid/results/demo_ext/data/demultiplexed/Stor-1/COX1.fastq \
-rg ${wd}simulations/data/Stor1_cox1.fa \
-rt ${wd}simulations/data/Stor1_cox1.fa \
-o ${wd}simulations/data/COX1_training \
--no_intron_retention \
-t 100
This finishes without error. However, when I use the model for simulation, the script gets stuck even when simulating only 100 reads.
printf 'target_id\test_counts\ttpm\nENSStor-1\t1000\t1000\n' > ${wd}simulations/data/Stor1_cox1.exp
simulator.py transcriptome \
-rt ${wd}simulations/data/Stor1_cox1.fa \
-c ${wd}simulations/data/COX1_training \
-o ${wd}simulations/data/Stor1_cox1_sim \
-e ${wd}simulations/data/Stor1_cox1.exp \
-n 100 \
--no_model_ir \
-t 4
Can you help me with this?
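In case it is relevant, here is a minimal sketch of a check I could run on my side to see whether the IDs in the expression file also appear as sequence names in the reference FASTA. I am only assuming that a mismatch between the two could matter here; I have not confirmed that it causes the hang. The function name `check_exp_ids`, the temporary files, and the IDs `ampliconA`/`ampliconB` are made up for illustration and are not my real data.

```shell
# check_exp_ids FASTA EXP
# Prints every ID from the first column of the tab-separated expression file
# (header line skipped) that is not a sequence name in the FASTA file.
check_exp_ids() {
    awk -F'\t' '
        # First file (FASTA): remember the name after ">" up to the first blank.
        NR == FNR {
            if (/^>/) {
                id = substr($0, 2)
                sub(/[ \t].*/, "", id)
                seen[id] = 1
            }
            next
        }
        # Second file (expression table): report IDs that were never seen.
        FNR > 1 && $1 != "" && !($1 in seen) { print $1 }
    ' "$1" "$2"
}

# Hypothetical demo data, only to show the behaviour of the check.
tmp=$(mktemp -d)
printf '>ampliconA demo sequence\nACGTACGT\n' > "$tmp/ref.fa"
printf 'target_id\test_counts\ttpm\nampliconA\t10\t500000\nampliconB\t10\t500000\n' > "$tmp/ref.exp"

missing=$(check_exp_ids "$tmp/ref.fa" "$tmp/ref.exp")
echo "IDs missing from FASTA: ${missing:-none}"
```

On my real files the call would be `check_exp_ids ${wd}simulations/data/Stor1_cox1.fa ${wd}simulations/data/Stor1_cox1.exp`; an empty result would mean every ID in the expression file has a matching FASTA record.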
Moreover, I am wondering whether this model can also be used for other amplicons with longer read lengths. I fear not, if I understand the logic correctly. What would you recommend in that case, when no amplicon-specific training data is available?
Thanks a lot,
Martin