Open
Description
When running pairtools merge with a large list of files (~4000 files), I received the following error:
Traceback (most recent call last):
File "/home/epi2melabs/conda/bin/pairtools", line 11, in <module>
sys.exit(cli())
File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pairtools/cli/merge.py", line 134, in merge
merge_py(
File "/home/epi2melabs/conda/lib/python3.8/site-packages/pairtools/cli/merge.py", line 254, in merge_py
subprocess.check_call(command, shell=True, stdout=outstream)
File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 359, in check_call
retcode = call(*popenargs, **kwargs)
File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 1720, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/bin/sh'
I am running pairtools through the wf-pore-c Nextflow pipeline, which internally calls pairtools with the following command (my ~4000 files are in the to_merge/ directory; see source):
pairtools merge -o output.pairs.gz --concatenate 'to_merge/*'
I can see from the pairtools source code that the merge function iterates over all the files given to it and combines them into a single subprocess command:
pairtools/pairtools/cli/merge.py
Lines 235 to 254 in f896311
    for path in paths:
        if kwargs.get("cmd_in", None):
            command += r""" <(cat {} | {} | sed -n -e '\''/^[^#]/,$p'\'')""".format(
                path, kwargs["cmd_in"]
            )
        elif path.endswith(".gz"):
            command += (
                r""" <(bgzip -dc -@ {} {} | sed -n -e '\''/^[^#]/,$p'\'')""".format(
                    kwargs["nproc_in"], path
                )
            )
        elif path.endswith(".lz4"):
            command += r""" <(lz4c -dc {} | sed -n -e '\''/^[^#]/,$p'\'')""".format(
                path
            )
        else:
            command += r""" <(sed -n -e '\''/^[^#]/,$p'\'' {})""".format(path)
    command += "'"
    subprocess.check_call(command, shell=True, stdout=outstream)
This causes the OSError when the assembled command string exceeds the operating system's limit on argument length.
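To illustrate the scale of the problem, here is a rough estimate of how long the command gets for ~4000 inputs. It reuses the `.gz` fragment template from the snippet above. The file names are hypothetical, and the 128 KiB figure is an assumption: as far as I understand, Linux caps a single argument string at 32 pages, and with `shell=True` the whole command is passed to `/bin/sh -c` as one argument.

```python
# Assumed limit: 32 pages of 4 KiB for a single argument string on Linux.
MAX_ARG_STRLEN = 32 * 4096

# Fragment appended per .gz input, copied from the merge.py snippet above.
FRAGMENT = r""" <(bgzip -dc -@ {} {} | sed -n -e '\''/^[^#]/,$p'\'')"""


def estimated_command_bytes(paths, nproc_in=1):
    """Sum the bytes that the per-file fragments add to the command string."""
    return sum(len(FRAGMENT.format(nproc_in, path).encode()) for path in paths)


# Hypothetical file names standing in for the contents of to_merge/.
paths = [f"to_merge/part_{i:04d}.pairs.gz" for i in range(4000)]
total = estimated_command_bytes(paths)
print(f"estimated fragment bytes: {total}, assumed limit: {MAX_ARG_STRLEN}")
print("exceeds assumed limit:", total > MAX_ARG_STRLEN)
```

The estimate ignores the fixed prefix of the command, so the real string is slightly longer still.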
Is there a way to restructure this code so that the command does not become so large (potentially process the files in chunks or batches)?
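As a starting point for discussion, here is a minimal sketch of what batching could look like for the `--concatenate` case. This is not the pairtools implementation: `fragment_for`, `batch_commands`, `concatenate_in_batches`, and the 100000-byte budget are all hypothetical names and values, and the sketch assumes the merged header has already been written to `outstream` and that `outstream` is a real file opened for writing.

```python
import shlex
import subprocess

# Assumed budget per shell command, chosen to stay well under 128 KiB.
MAX_COMMAND_BYTES = 100_000

# Same sed expression as in merge.py: print from the first non-header line on.
SED_SKIP_HEADER = "sed -n -e '/^[^#]/,$p'"


def fragment_for(path):
    """Build a pipeline that writes one file's header-free body to stdout."""
    quoted = shlex.quote(path)
    if path.endswith(".gz"):
        return f"bgzip -dc {quoted} | {SED_SKIP_HEADER}"
    if path.endswith(".lz4"):
        return f"lz4c -dc {quoted} | {SED_SKIP_HEADER}"
    return f"{SED_SKIP_HEADER} {quoted}"


def batch_commands(paths, max_bytes=MAX_COMMAND_BYTES):
    """Yield shell commands, each covering as many files as fit in max_bytes."""
    separator = " && "
    batch, size = [], 0
    for path in paths:
        fragment = fragment_for(path)
        extra = len(fragment.encode()) + len(separator)
        if batch and size + extra > max_bytes:
            yield separator.join(batch)
            batch, size = [], 0
        batch.append(fragment)
        size += extra
    if batch:
        yield separator.join(batch)


def concatenate_in_batches(paths, outstream):
    """Run one subprocess per batch, all writing to the same output stream."""
    for command in batch_commands(paths):
        outstream.flush()  # keep Python-side buffered data ahead of child output
        subprocess.check_call(command, shell=True, stdout=outstream)
```

Because the batches run in order and write to the same stream, the output should match a single concatenation. The sorted (non-concatenate) merge is a different matter: each batch would presumably need to be merged into an intermediate file, followed by a final merge of those intermediates.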
Much appreciated.