[feat] Add multimodal PDF reader (text + image captions) #5623

Bessouat40 · 2025-12-05T09:03:10Z

Summary

This PR introduces a new VllmPDFReader, a multimodal PDF ingestion component that extracts:

Text from every page
Vision captions for each image via a VLLM / OpenAI-compatible model

Previously, image information was silently lost during ingestion, especially problematic for:

Technical documentation
Engineering reports
Training manuals
Scientific papers including diagrams, schematics, and figures

This new reader ensures full preservation of PDF semantics, producing separate Document objects for both text and image captions.

Related issue

This PR relates to #4677 (similar in goal although not a direct implementation).

Type of change

Checklist

Code complies with style guidelines
Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
Self-review completed
Documentation updated (comments, docstrings)
Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
Tested in clean environment
Tests added/updated (if applicable)

… VllmPDFReader

Bessouat40 added 4 commits December 4, 2025 18:48

feat: Implement VLM-based PDF reader for multimodal content extraction

569c5d5

format code

8c33f18

add cookbook example

85d3a34

add unit tests for vllm pdf reader

30c7028

Bessouat40 requested a review from a team as a code owner December 5, 2025 09:03

Bessouat40 changed the title ~~Feature/pdf reader with vllm~~ [feat] PDF reader with VLLM Dec 5, 2025

Bessouat40 and others added 3 commits December 5, 2025 10:31

fix: Ensure PyMuPDF is imported correctly and update dependencies for…

ead7e46

… VllmPDFReader

fix: Add type ignore for PyMuPDF import to resolve type checking issues

2184eea

Merge branch 'main' into feature/pdf-reader-with-vllm

323232d

Bessouat40 changed the title ~~[feat] PDF reader with VLLM~~ [feat] Add multimodal PDF reader (text + image captions) Dec 5, 2025

Bessouat40 added 3 commits December 5, 2025 11:57

Merge branch 'main' into feature/pdf-reader-with-vllm

8efcdcb

Merge branch 'main' into feature/pdf-reader-with-vllm

0821fc6

Merge branch 'main' into feature/pdf-reader-with-vllm

d0bcb3d

Bessouat40 mentioned this pull request Dec 8, 2025

[Feature Request] Add Multimodal Support to PDFKnowledgeBase with multimodal=True Parameter #4677

Open

3 tasks

Bessouat40 added 5 commits December 9, 2025 10:46

Merge branch 'main' into feature/pdf-reader-with-vllm

38de3a3

Merge branch 'main' into feature/pdf-reader-with-vllm

1152eec

Merge branch 'main' into feature/pdf-reader-with-vllm

ce2450a

Merge branch 'main' into feature/pdf-reader-with-vllm

4134e7d

Merge branch 'main' into feature/pdf-reader-with-vllm

dd9f638

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[feat] Add multimodal PDF reader (text + image captions) #5623

[feat] Add multimodal PDF reader (text + image captions) #5623

Uh oh!

Bessouat40 commented Dec 5, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[feat] Add multimodal PDF reader (text + image captions) #5623

Are you sure you want to change the base?

[feat] Add multimodal PDF reader (text + image captions) #5623

Uh oh!

Conversation

Bessouat40 commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related issue

Type of change

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bessouat40 commented Dec 5, 2025 •

edited

Loading