How to simulate amplicon-seq data?

Hi all, 

Apologies for yet another request! Specifically, I want to simulate amplicon-seq reads of ONT data using NanoSim, but the simulation step does not finish (at least not within hours).

I have a reference sequence based on Sanger sequencing of the amplicon (Stor1_cox1.fa). In addition, I have ONT data of the same amplicon (COX1.fastq), which I could use for model training.

Following your suggestion in [issue 112](https://github.com/bcgsc/NanoSim/issues/112#issuecomment-820651030), I am using the "transcriptome" method.

```bash
conda activate nanosim

read_analysis.py transcriptome \
    -i ${wd}Syrphid/results/demo_ext/data/demultiplexed/Stor-1/COX1.fastq \
    -rg ${wd}simulations/data/Stor1_cox1.fa \
    -rt ${wd}simulations/data/Stor1_cox1.fa \
    -o ${wd}simulations/data/COX1_training \
    --no_intron_retention \
    -t 100
```

This finishes without error. However, when I use the model for simulation, the script gets stuck, even when simulating only 100 reads.

```bash
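# Expression profile: tab-separated columns target_id, est_counts, tpm
# (the third header needs "\ttpm", i.e. a tab followed by "tpm")
printf 'target_id\test_counts\ttpm\nENSStor-1\t1000\t1000\n' > ${wd}simulations/data/Stor1_cox1.exp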

simulator.py transcriptome \
    -rt ${wd}simulations/data/Stor1_cox1.fa \
    -c ${wd}simulations/data/COX1_training \
    -o ${wd}simulations/data/Stor1_cox1_sim \
    -e ${wd}simulations/data/Stor1_cox1.exp \
    -n 100 \
    --no_model_ir \
    -t 4
```
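One thing that can be ruled out quickly is a mismatch between the ID in the expression profile and the record ID in the reference FASTA. The following is only a minimal sketch of such a check, not a confirmed cause of the hang: the helper name `check_exp_ids` is hypothetical (it is not part of NanoSim), and it assumes that a FASTA record ID is the first whitespace-delimited token after `>` and that the FASTA file is not empty.

```bash
# Hypothetical helper (not part of NanoSim): report expression-profile IDs
# that have no matching record in the reference FASTA.
# Assumption: a FASTA record ID is the first whitespace-delimited token after ">".
# Assumption: the FASTA file is non-empty (the NR == FNR idiom relies on it).
check_exp_ids() {
    local exp_file=$1 fasta_file=$2
    awk -F'\t' '
        NR == FNR {                       # first file: the FASTA
            if (/^>/) { split(substr($0, 2), a, /[ \t]/); ids[a[1]] = 1 }
            next
        }
        FNR > 1 && !($1 in ids) {         # second file: skip the header row
            print "missing in FASTA: " $1
            missing = 1
        }
        END { exit (missing ? 1 : 0) }    # non-zero exit if any ID is missing
    ' "$fasta_file" "$exp_file"
}
```

Called as `check_exp_ids ${wd}simulations/data/Stor1_cox1.exp ${wd}simulations/data/Stor1_cox1.fa`, it prints nothing and returns 0 when every `target_id` is found in the FASTA.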

Can you help me with this? 

Moreover, I am wondering whether this model can also be used for other amplicons with longer read lengths. I fear not, if I understand the logic correctly. What should I do in that case (when no amplicon-specific training data are available)?
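To put numbers on the read-length question, the length distribution of the training reads can be compared with the length of the other amplicon. Below is a minimal sketch of such a summary; the helper name `fastq_length_stats` is hypothetical (not part of NanoSim), and it assumes standard four-line FASTQ records without wrapped sequence lines.

```bash
# Hypothetical helper (not part of NanoSim): print the number of reads and the
# min, mean and max read length of a FASTQ file.
# Assumption: standard 4-line FASTQ records (sequence on every 2nd line of 4).
fastq_length_stats() {
    awk 'NR % 4 == 2 {
             len = length($0); sum += len; n++
             if (n == 1 || len < min) min = len
             if (len > max) max = len
         }
         END {
             if (n == 0) { print "no reads found"; exit 1 }
             printf "reads=%d min=%d mean=%.1f max=%d\n", n, min, sum / n, max
         }' "$1"
}
```

For example, `fastq_length_stats COX1.fastq` summarises the training reads, which can then be compared with the reference length of the longer amplicon.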

[Testdata.zip](https://github.com/user-attachments/files/16922217/Testdata.zip)

Thanks a lot,

Martin



How to simulate amplicon-seq data? #221
