HelpMateAI is a modular Retrieval-Augmented Generation (RAG) system designed to navigate complex insurance documents using semantic search and generative answering. Built using LangChain, HuggingFace Transformers, Ollama 3.2, and ChromaDB, it efficiently extracts and surfaces relevant information from lengthy, structured PDF documents like insurance policies.
Insurance policies are often long, dense, and difficult to interpret. This project addresses:
- ❓ Difficulty locating specific clauses or definitions
- ⏱️ Time-consuming manual search
- 🧾 Need for human-readable, context-aware summaries
HelpMateAI provides a lightweight, extensible framework to improve document comprehension and accelerate decision-making.
- 📄 PDF Processing: Parses text and tables using `pdfplumber`
- 🧠 Semantic Embeddings: Leverages `all-MiniLM-L6-v2` from HuggingFace for robust semantic representation
- 🔎 Vector Search: ChromaDB for persistent, fast, chunk-level retrieval
- 🔁 Re-ranking (Optional): Cross-encoder for refining semantic hits
- ✍️ LLM Response Generation: Uses Ollama 3.2 via LangChain for contextual answers
- 🧩 Metadata Handling: Captures page numbers and source chunks for better traceability
- ⚡ Caching Layer: Avoids redundant retrievals for recurring queries
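
As an illustration of the caching idea in the last bullet, here is a minimal sketch of an in-memory cache keyed on a normalized query string. The names `RetrievalCache`, `normalize`, and `fake_retriever` are hypothetical and are not taken from the project code; the real retriever would be the vector search call.

```python
from typing import Callable, Dict, List


def normalize(query: str) -> str:
    """Collapse case and whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


class RetrievalCache:
    """In-memory cache wrapping any retriever callable (hypothetical helper)."""

    def __init__(self, retriever: Callable[[str], List[str]]) -> None:
        self._retriever = retriever
        self._store: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> List[str]:
        key = normalize(query)
        if key in self._store:
            # Recurring query: skip the retriever entirely.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._retriever(query)
        self._store[key] = result
        return result


def fake_retriever(query: str) -> List[str]:
    # Stand-in for a real vector search call; returns a placeholder chunk.
    return [f"chunk for: {normalize(query)}"]


cache = RetrievalCache(fake_retriever)
first = cache.get("What is the grace period?")
second = cache.get("  what is the GRACE period? ")  # served from the cache
```

A persistent variant could swap the dictionary for a file or database, at the cost of needing an invalidation rule when the indexed documents change.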
| Component | Usage |
|---|---|
| Python | Core scripting and orchestration |
| pdfplumber | PDF parsing (text and tables) |
| HuggingFace | Embedding model (all-MiniLM-L6-v2) |
| ChromaDB | Vector store for semantic search |
| LangChain | Tool orchestration + Ollama integration |
| Ollama 3.2 | Lightweight LLM for question answering |
| pandas | Data handling and JSON/DF preprocessing |
┌────────────┐    ┌────────────┐    ┌───────────────┐    ┌──────────────┐
│ PDF Parser │ →→ │ Embeddings │ →→ │ Vector Search │ →→ │ LLM Response │
└────────────┘    └────────────┘    └───────────────┘    └──────────────┘
      ↑                 ↓                   ↑                    ↓
   Metadata      ChromaDB Store      Reranker (opt.)        Final Answer
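
The flow in the diagram can be sketched end to end with standard-library stand-ins. This is an illustration of the data flow only, not the project's implementation: the bag-of-words `embed` replaces the `all-MiniLM-L6-v2` model, the in-memory `search` replaces ChromaDB, and `build_prompt` stops short of calling the LLM. All function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

Chunk = Dict[str, object]  # {"text": str, "page": int}
Reranker = Callable[[str, List[Chunk]], List[Chunk]]


def parse_pages(pages: List[str]) -> List[Chunk]:
    """Stand-in for the PDF parser: one chunk per page, keeping the page number."""
    return [{"text": text, "page": i + 1} for i, text in enumerate(pages)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real run would use a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Return the k chunks most similar to the query (stand-in for the vector store)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(str(c["text"]))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: List[Chunk], reranker: Optional[Reranker] = None) -> str:
    """Optionally rerank, then assemble the context that would be sent to the LLM."""
    if reranker is not None:
        hits = reranker(query, hits)
    context = "\n".join(f"[p.{c['page']}] {c['text']}" for c in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


pages = [
    "The grace period for premium payment is thirty days.",
    "Claims must be filed within ninety days of the event.",
    "Definitions: the insured is the person named in the schedule.",
]
question = "How long is the grace period?"
hits = search(question, parse_pages(pages), k=1)
prompt = build_prompt(question, hits)
```

Carrying the page number through every stage is what lets the final answer cite its source chunk.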
# 1. Install dependencies
pip install -r requirements.txt
# 2. Run the notebook
jupyter notebook notebooks/HelpMate_AI_Ollama.ipynb

A detailed technical report is available in the reports/ folder:
📄 📥 Download Project Report HELPMATE_AI.pdf
Planned extensions to improve usability, transparency, and scale:
- Streamlit/Gradio UI for non-technical users
- Hybrid RAG with rule-based fallback and citations
- In-app table visualizer and clause tracking
- Multi-document summarization & comparison
This system is evolving toward a more robust, fallback-aware document QA pipeline.
Current areas of focus:
- Add Fallback Logic for Missing Vector Embeddings (#1)
- Score and Rank Answers by Retrieval Confidence (#2)
- Chunking Strategy Evaluation Notebook (#3)
- .env Template for Local Config (#4)
- Architecture Diagram for System Flow (#5)
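
To make the first two items concrete, here is one possible shape for confidence scoring with a rule-based fallback. It is a sketch under stated assumptions, not the planned design: the `1 / (1 + distance)` mapping, the `0.5` threshold, and all function names are illustrative choices, and the keyword match stands in for whatever fallback the project eventually adopts.

```python
from typing import List, Tuple

Hit = Tuple[str, float]     # (chunk text, distance from the vector store; lower is closer)
Scored = Tuple[str, float]  # (chunk text, confidence; higher is better)


def to_confidence(distance: float) -> float:
    """Map a non-negative distance to (0, 1]; an illustrative choice, not a standard."""
    return 1.0 / (1.0 + distance)


def rank_by_confidence(hits: List[Hit]) -> List[Scored]:
    scored = [(text, to_confidence(d)) for text, d in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def keyword_fallback(query: str, corpus: List[str]) -> List[Scored]:
    """Rule-based fallback: keep chunks sharing at least one query word (unscored, so 0.0)."""
    words = set(query.lower().split())
    return [(text, 0.0) for text in corpus if words & set(text.lower().split())]


def retrieve(
    query: str, hits: List[Hit], corpus: List[str], min_confidence: float = 0.5
) -> Tuple[str, List[Scored]]:
    """Use vector hits when the best one is confident enough, otherwise fall back."""
    ranked = rank_by_confidence(hits)
    if ranked and ranked[0][1] >= min_confidence:
        return "vector", ranked
    return "fallback", keyword_fallback(query, corpus)


corpus = ["grace period is thirty days", "claims within ninety days"]
vector_hits = [("claims within ninety days", 1.5), ("grace period is thirty days", 0.25)]

mode, results = retrieve("grace period", vector_hits, corpus)
empty_mode, empty_results = retrieve("grace period", [], corpus)  # no embeddings available
```

Returning the mode alongside the results lets the caller tell the user when an answer came from the fallback path rather than from semantic search.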
Check out all 📌 Open Issues or open a new one to explore edge cases, extensions, or improvements.
- Shibani Roychoudhury
  Data Science Professional | Applied NLP | Explainable AI
  LinkedIn | GitHub
- Himanshu Agrawal
  AI Practitioner | Software Engineer at Mediaocean
  LinkedIn
- Adarsh S A
  Analytics Professional at ANZ | Data & Decision Systems
  LinkedIn
Shibani Roychoudhury
Data Science Professional | Applied NLP | Explainable AI
📫 LinkedIn
🌐 GitHub
This project is licensed under the MIT License.