Using GATK HaplotypeCaller/CombineGVCFs/GenotypeGVCFs as a Java library. Any drawbacks? #9277

@lindenb

Hi all,

This is not a bug report but a technical question about using the GATK jar as a Java library.

I want to genotype a lot of BAM files over a small region of the genome, inside a single process of my pipeline.
The JVM is slow to start, and I think initialization takes more time than the calling itself.

So I wrote a small Java program that uses GATK as a library and invokes all the steps in the same JVM instance.

Each call to GATK basically runs the following code:

public class Gatk4ProxyImpl extends org.broadinstitute.hellbender.Main implements Gatk4Proxy {
(...)
	public void execute(final List<String> argv) throws Exception {
		final String[] args = argv.toArray(new String[argv.size()]);
		LOG.info(getCommandLineName() + ":Executing:  gatk " + String.join(" ", argv));
		final CommandLineProgram program =
			this.setupConfigAndExtractProgram(args,
				this.getPackageList(),
				this.getClassList(),
				this.getCommandLineName()
				);
		final Object result = Main.runCommandLineProgram(program, args);

(...)

(full code at https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/gatk/Gatk4ProxyImpl.java )
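
To make the calling pattern concrete, here is a minimal sketch of how the argument lists for the three steps might be built before being handed to `execute(argv)` one after another. The class `GatkArgs`, its method names, and all file names are hypothetical and are not part of jvarkit or GATK; the flags shown (`-R`, `-I`, `-L`, `-ERC GVCF`, `-V`, `-O`) are the usual GATK4 command-line options, and the first element of each list is the tool name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of jvarkit or GATK): builds the argv lists
// that would be passed to Gatk4Proxy.execute(), one list per GATK tool call.
final class GatkArgs {
    private GatkArgs() {}

    // HaplotypeCaller in GVCF mode, restricted to one region.
    static List<String> haplotypeCaller(String ref, String bam, String region, String outGvcf) {
        return List.of("HaplotypeCaller", "-R", ref, "-I", bam,
                "-L", region, "-ERC", "GVCF", "-O", outGvcf);
    }

    // CombineGVCFs takes one -V per input GVCF.
    static List<String> combineGvcfs(String ref, List<String> gvcfs, String out) {
        final List<String> argv = new ArrayList<>(List.of("CombineGVCFs", "-R", ref));
        for (final String gvcf : gvcfs) {
            argv.add("-V");
            argv.add(gvcf);
        }
        argv.add("-O");
        argv.add(out);
        return argv;
    }

    // GenotypeGVCFs on the combined GVCF.
    static List<String> genotypeGvcfs(String ref, String combined, String outVcf) {
        return List.of("GenotypeGVCFs", "-R", ref, "-V", combined, "-O", outVcf);
    }

    public static void main(String[] args) {
        // Placeholder file names, for illustration only.
        final String ref = "ref.fa";
        final List<String> gvcfs = new ArrayList<>();
        for (final String sample : List.of("S1", "S2")) {
            final String gvcf = sample + ".g.vcf.gz";
            System.out.println(String.join(" ", haplotypeCaller(ref, sample + ".bam", "chr1:1000-2000", gvcf)));
            gvcfs.add(gvcf);
        }
        System.out.println(String.join(" ", combineGvcfs(ref, gvcfs, "combined.g.vcf.gz")));
        System.out.println(String.join(" ", genotypeGvcfs(ref, "combined.g.vcf.gz", "final.vcf.gz")));
    }
}
```

In the real program each printed list would instead be passed to `execute(argv)`, so that all three tools run inside the same JVM.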

My question is: the code works, but is there any hidden drawback to using GATK this way? For example, is there any chance that I'm missing a call that disposes of some resources, or that a huge resource gets reloaded on each call?
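
One rough way to look for the kind of leak I am worried about would be to sample the used heap after each call and see whether it keeps growing. This is only a sketch using the standard `Runtime` API: `HeapProbe` and its methods are hypothetical names, the `Runnable` stands in for one `execute(argv)` call (which would have to be wrapped because it throws `Exception`), and `System.gc()` is only a hint to the JVM, so the numbers are approximate. It also says nothing about open file handles or native memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples used heap after repeated calls to spot steady growth.
final class HeapProbe {
    private HeapProbe() {}

    // Approximate used heap in bytes; System.gc() is only a request to the JVM.
    static long usedHeapAfterGc() {
        final Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the call `repeats` times and records the used heap after each run.
    static List<Long> measure(final Runnable call, final int repeats) {
        final List<Long> samples = new ArrayList<>();
        for (int i = 0; i < repeats; i++) {
            call.run();
            samples.add(usedHeapAfterGc());
        }
        return samples;
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice this would wrap one execute(argv) call.
        final Runnable call = () -> {
            final byte[] scratch = new byte[1 << 20];
            scratch[0] = 1;
        };
        for (final long bytes : measure(call, 5)) {
            System.out.println("used heap after call: " + bytes + " bytes");
        }
    }
}
```

If the samples climb steadily across many identical calls, something is probably being retained between invocations.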

thanks,

Pierre
