Hi there! I'm new to this tool, so sorry in advance if I'm asking something silly.
I'm trying to use the read_analysis.py script for metagenome analysis, but I'm encountering issues with the genome_list input. Here's what I’ve done so far:
My genome_list.tsv file is structured like this:
Identifier FilePath
AB008394_Torque_teno_virus_1 References.split/AB008394.fasta
AB017613_Torque_teno_virus_16 References.split/AB017613.fasta
AB025946_Torque_teno_virus_19 References.split/AB025946.fasta
AB026929_Torque_teno_mini_virus_6 References.split/AB026929.fasta
AB026931_Torque_teno_mini_virus_1 References.split/AB026931.fasta
AB028668_Torque_teno_virus_15 References.split/AB028668.fasta
AB037926_Torque_teno_virus_14 References.split/AB037926.fasta
AB038621_Torque_teno_virus_29 References.split/AB038621.fasta
AB038627_Torque_teno_mini_virus_7 References.split/AB038627.fasta
AB038629_Torque_teno_mini_virus_2 References.split/AB038629.fasta
AB038630_Torque_teno_mini_virus_3 References.split/AB038630.fasta
AB038631_Torque_teno_mini_virus_9 References.split/AB038631.fasta
AB041957_Torque_teno_virus_4 References.split/AB041957.fasta
AB041958_Torque_teno_virus_26 References.split/AB041958.fasta
AB041959_Torque_teno_virus_25 References.split/AB041959.fasta
AB041960_Torque_teno_tamarin_virus References.split/AB041960.fasta
...
- Each identifier corresponds to a reference genome.
- File paths point to valid .fasta files in the specified directory.
- I'm using NanoSim in a conda environment, with the latest version.
I ran the command:
read_analysis.py metagenome -i /path/to/myfile.fastq.gz -gl genome_list.tsv --no_model_fit -o nanosim_output -t 16
The script failed with the following error:
(nanosim) [fmarti34@login02 NANOSIM-TEST]$ read_analysis.py metagenome -i /home/fmarti34/data_sclipma1/Anellome_outputs_hash/AS1_12_mo./AS1_12_mo..fastq.gz -gl genome_list.tsv --no_model_fit -o nanosim_1_test -t 16
Running the code with following parameters:
infile /home/fmarti34/data_sclipma1/Anellome_outputs_hash/AS1_12_mo./AS1_12_mo..fastq.gz
genome_list genome_list.tsv
g_alnm
prefix nanosim_1_test
num_threads 16
model_fit False
chimeric False
homopolymer False
fastq False
quantification False
2024-11-21 10:29:43: Read pre-process
2024-11-21 10:31:32: Processing reference genome
Traceback (most recent call last):
File "/home/fmarti34/.conda/envs/nanosim/bin/read_analysis.py", line 879, in <module>
main()
File "/home/fmarti34/.conda/envs/nanosim/bin/read_analysis.py", line 675, in main
metagenome_list[species] = {'path': info[1]}
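The last frame indexes `info[1]`, so one possible cause is a line in genome_list.tsv that yields fewer than two fields when split. Below is a minimal sketch for checking the file, assuming the script splits each line on a tab character; that is a guess based on the traceback and not confirmed from the NanoSim source. The helper name `check_genome_list` and its parameters are hypothetical.

```python
import os


def check_genome_list(lines, delimiter="\t", check_paths=False):
    """Return (line_number, problem) tuples for lines that look malformed.

    Assumption: each line should hold an identifier and a file path
    separated by `delimiter` (a tab here; not confirmed for NanoSim).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        fields = stripped.split(delimiter)
        if len(fields) < 2:
            # Indexing fields[1] on such a line would raise an IndexError.
            problems.append((number, "fewer than two fields"))
        elif check_paths and not os.path.isfile(fields[1]):
            problems.append((number, "file not found: " + fields[1]))
    return problems


# Illustrative input: the first line is tab-separated, the second uses a space.
sample = [
    "AB008394_Torque_teno_virus_1\tReferences.split/AB008394.fasta",
    "AB017613_Torque_teno_virus_16 References.split/AB017613.fasta",
]
print(check_genome_list(sample))
```

To check the real file, pass an open file handle instead of `sample` and set `check_paths=True`, for example `check_genome_list(open("genome_list.tsv"), check_paths=True)`. Under the same tab assumption, a header row such as `Identifier FilePath` would either be flagged or be treated as a genome entry.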
Questions:
- Could you clarify what the correct format for the genome_list file should be?
- Is additional information, like abundance, required in this file? Would output from tools like Bracken be appropriate as input?
Thank you in advance!
Best regards,
Flor