Getting "subprocess" error for the same PDF files which are working fine with Tabula in local machine. #540

@deepakdhiman7

We are getting the "subprocess" error below when we run our code in a container. On a local machine, however, the same code works fine. We installed Tabula on the local machine a year ago. Even in the container, it was working fine until this week. I am attaching the PDFs for which it fails, and the package versions are listed below. Could the PDF files be the cause, even though the same files run on the local machine with the same version? Or is it the environment? We checked, and there has been no change to the environment's permissions or similar settings.

PDFs:
IONIS Registartion document (002).pdf
test_Vinayak.pdf
Uploading Annual_Report.pdf…

Package Versions:
(llms) dd00740409@ns3067540:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

(llms) dd00740409@ns3067540:~$ python
Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00)
[GCC 11.4.0] on linux

Error:
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', '/usr/local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar', '--pages', '9', '--stream', '--guess', '--format', 'JSON', 'Roa8dvYUVmHQLKhhvTiPL.pdf']' returned non-zero exit status 1.
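The `CalledProcessError` only reports the exit status, so the Java-side cause is easy to miss. A minimal sketch of re-running a command with stderr captured is below; `run_and_capture` is a hypothetical helper name, not part of tabula-py, and the demo uses the Python interpreter as a stand-in child process so that it runs anywhere.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit status, stderr text) without raising."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Stand-in for the failing java command: writes to stderr, exits with 1.
    demo_cmd = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('simulated failure'); sys.exit(1)",
    ]
    status, stderr_text = run_and_capture(demo_cmd)
    print("exit status:", status)
    print("stderr:", stderr_text)
```

Passing the argument list from the error above (with the jar path and PDF name as they exist in the container) should surface the Java exception text alongside the non-zero exit status.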

Logs:
Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjavajpeg.so: libjpeg.so.8: cannot open shared object file: No such file or directory
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1838)
    at java.lang.Runtime.loadLibrary0(Runtime.java:843)
    at java.lang.System.loadLibrary(System.java:1136)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader$1.run(JPEGImageReader.java:92)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader$1.run(JPEGImageReader.java:90)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader.<clinit>(JPEGImageReader.java:89)
    at com.sun.imageio.plugins.jpeg.JPEGImageReaderSpi.createReaderInstance(JPEGImageReaderSpi.java:85)
    at javax.imageio.spi.ImageReaderSpi.createReaderInstance(ImageReaderSpi.java:320)
    at javax.imageio.ImageIO$ImageReaderIterator.next(ImageIO.java:529)
    at javax.imageio.ImageIO$ImageReaderIterator.next(ImageIO.java:513)
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:155)
    at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:58)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:80)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:243)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.createInputStream(PDImageXObject.java:791)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:517)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:226)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:481)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:462)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1110)
    at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:67)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:347)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:268)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:254)
    at technology.tabula.Utils.pageConvertToImage(Utils.java:285)
    at technology.tabula.detectors.NurminenDetectionAlgorithm.detect(NurminenDetectionAlgorithm.java:101)
    at technology.tabula.CommandLineApp$TableExtractor.extractTablesBasic(CommandLineApp.java:421)
    at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:408)
    at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:180)
    at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:124)
    at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:106)
    at technology.tabula.CommandLineApp.main(CommandLineApp.java:76)
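The first line of the log says the JVM's `libjavajpeg.so` could not find `libjpeg.so.8`, which points at a missing system library in the container rather than at the PDFs. A rough way to check this from Python inside the container is sketched below; `can_load_shared_library` is a hypothetical helper name, and the check only tells us whether the dynamic loader can resolve the name, not which package should provide it (that depends on the base image, which is not stated here).

```python
import ctypes


def can_load_shared_library(soname: str) -> bool:
    """Return True if the dynamic loader can resolve and load soname."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    # Library name taken from the UnsatisfiedLinkError in the log.
    name = "libjpeg.so.8"
    status = "found" if can_load_shared_library(name) else "NOT found"
    print(name, status)
```

If this reports the library as not found in the container but found on the local machine, the difference between the two environments is likely the installed system packages and not Tabula or the PDF files.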
