Skip to content

Conversation

@Bessouat40
Copy link
Contributor

@Bessouat40 Bessouat40 commented Dec 5, 2025

Summary

This PR introduces a new VllmPDFReader, a multimodal PDF ingestion component that extracts:

  • Text from every page
  • Vision captions for each image via a VLLM / OpenAI-compatible model

Previously, image information was silently lost during ingestion, especially problematic for:

  • Technical documentation
  • Engineering reports
  • Training manuals
  • Scientific papers including diagrams, schematics, and figures

This new reader ensures full preservation of PDF semantics, producing separate Document objects for both text and image captions.

Related issue

This PR relates to #4677 (similar in goal although not a direct implementation).

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Improvement
  • Model update
  • Other:

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
  • Tested in clean environment
  • Tests added/updated (if applicable)

@Bessouat40 Bessouat40 requested a review from a team as a code owner December 5, 2025 09:03
@Bessouat40 Bessouat40 changed the title Feature/pdf reader with vllm [feat] PDF reader with VLLM Dec 5, 2025
@Bessouat40 Bessouat40 changed the title [feat] PDF reader with VLLM [feat] Add multimodal PDF reader (text + image captions) Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant