🧠 HelpMateAI: Retrieval-Augmented Search System for Insurance Documents

HelpMateAI is a modular Retrieval-Augmented Generation (RAG) system for navigating complex insurance documents with semantic search and generative answering. Built with LangChain, HuggingFace Transformers, Llama 3.2 served through Ollama, and ChromaDB, it retrieves the relevant passages from long, structured PDFs such as insurance policies and generates answers grounded in them.

🔍 Problem Statement

Insurance policies are often long, dense, and difficult to interpret. This project addresses:

❓ Difficulty locating specific clauses or definitions
⏱️ Time-consuming manual search
🧾 Need for human-readable, context-aware summaries

HelpMateAI provides a lightweight, extensible framework to improve document comprehension and accelerate decision-making.

🚀 Features

📄 PDF Processing: Parses text and tables using pdfplumber
🧠 Semantic Embeddings: Leverages all-MiniLM-L6-v2 from HuggingFace for robust semantic representation
🔎 Vector Search: ChromaDB for persistent, fast, chunk-level retrieval
🔁 Re-ranking (Optional): Cross-encoder for refining semantic hits
✍️ LLM Response Generation: Uses Llama 3.2 (served by Ollama) via LangChain for contextual answers
🧩 Metadata Handling: Captures page numbers and source chunks for better traceability
⚡ Caching Layer: Avoids redundant retrievals for recurring queries
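
As an illustration of the caching layer listed above, here is a minimal sketch of an LRU cache wrapped around a retrieval function. It is not the project's actual implementation: `RetrievalCache` and `fake_retrieve` are hypothetical names, and the real retriever would query ChromaDB rather than return a canned string.

```python
from collections import OrderedDict
from typing import Callable, List


class RetrievalCache:
    """Small LRU cache keyed on a normalized query string (illustrative only)."""

    def __init__(self, retrieve: Callable[[str], List[str]], max_size: int = 128):
        self._retrieve = retrieve
        self._max_size = max_size
        self._store: "OrderedDict[str, List[str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Collapse case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def __call__(self, query: str) -> List[str]:
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = self._retrieve(query)
        self._store[key] = result
        if len(self._store) > self._max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


def fake_retrieve(query: str) -> List[str]:
    # Hypothetical stand-in for the real vector search call.
    return [f"chunk matching: {query}"]


cached_retrieve = RetrievalCache(fake_retrieve, max_size=2)
cached_retrieve("What is the grace period?")      # miss: calls fake_retrieve
cached_retrieve("  what is the GRACE period? ")   # hit: same normalized key
```

Normalizing the key is a design choice: it raises the hit rate for recurring questions, at the cost of treating queries that differ only in case or spacing as identical.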

📦 Technology Stack

| Component | Usage |
| --- | --- |
| `Python` | Core scripting and orchestration |
| `pdfplumber` | PDF parsing (text and tables) |
| `HuggingFace` | Embedding model (`all-MiniLM-L6-v2`) |
| `ChromaDB` | Vector store for semantic search |
| `LangChain` | Tool orchestration + Ollama integration |
| `Ollama` (Llama 3.2) | Local LLM runtime and model for question answering |
| `pandas` | Data handling and JSON/DF preprocessing |

🧱 System Architecture

┌────────────┐     ┌────────────┐     ┌───────────────┐     ┌──────────────┐
│ PDF Parser │ ──→ │ Embeddings │ ──→ │ Vector Search │ ──→ │ LLM Response │
└────────────┘     └────────────┘     └───────────────┘     └──────────────┘
      ↑                  ↓                    ↑                     ↓
   Metadata        ChromaDB Store      Reranker (opt.)        Final Answer
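
The flow in the diagram can be sketched end to end with standard-library stand-ins. This is a toy under stated assumptions, not the project's code: a bag-of-words `Counter` replaces the `all-MiniLM-L6-v2` embedding, an in-memory list replaces ChromaDB, the optional reranker is omitted, and the prompt is only built, not sent to an LLM. All function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Chunk = Dict[str, object]  # {"text": str, "page": int}


def chunk_pages(pages: List[str], max_words: int = 40) -> List[Chunk]:
    """Split each page into word-bounded chunks, keeping the page number as metadata."""
    chunks: List[Chunk] = []
    for page_no, text in enumerate(pages, start=1):
        words = text.split()
        for i in range(0, len(words), max_words):
            chunks.append({"text": " ".join(words[i:i + max_words]), "page": page_no})
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: List[Chunk], top_k: int = 2) -> List[Tuple[float, Chunk]]:
    """Score every chunk against the query and return the best top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(str(c["text"]))), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


def build_prompt(query: str, hits: List[Tuple[float, Chunk]]) -> str:
    """Assemble the retrieved chunks, tagged with page numbers, into an LLM prompt."""
    context = "\n".join(f"[page {c['page']}] {c['text']}" for _, c in hits)
    return (
        "Answer using only the context below and cite page numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Two made-up "pages" standing in for parsed PDF text.
pages = [
    "The grace period for premium payment is thirty days from the due date.",
    "Claims must be reported in writing within ninety days of the loss.",
]
chunks = chunk_pages(pages)
hits = search("How long is the grace period?", chunks, top_k=1)
prompt = build_prompt("How long is the grace period?", hits)
```

Carrying the page number on every chunk is what lets the final prompt, and therefore the answer, point back to its source.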

🧪 Quickstart

# 1. Install dependencies
pip install -r requirements.txt

# 2. Run the notebook
jupyter notebook notebooks/HelpMate_AI_Ollama.ipynb

📚 Project Report

A detailed technical report is available in the reports/ folder:
📄 Project Report: HELPMATE_AI.pdf

🔄 Future Roadmap

Planned extensions to improve usability, transparency, and scale:

Streamlit/Gradio UI for non-technical users
Hybrid RAG with rule-based fallback and citations
In-app table visualizer and clause tracking
Multi-document summarization & comparison

🗺️ Roadmap & Open Issues

This system is evolving toward a more robust, fallback-aware document QA pipeline.

Current areas of focus:

Add Fallback Logic for Missing Vector Embeddings (#1)
Score and Rank Answers by Retrieval Confidence (#2)
Chunking Strategy Evaluation Notebook (#3)
.env Template for Local Config (#4)
Architecture Diagram for System Flow (#5)
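
As one possible reading of issues #1 and #2, the sketch below ranks retrieved chunks by a confidence score and returns a fallback message when nothing clears a threshold (including the case where retrieval returns no hits at all). Everything here is an assumption for illustration: the `Hit` type, the `rank_with_fallback` name, the 0.35 threshold, and the premise that the retriever's scores have already been converted to a similarity in [0, 1].

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    text: str
    page: int
    score: float  # assumed similarity in [0, 1]; higher means more relevant


FALLBACK_MESSAGE = "No sufficiently relevant passage was found in the indexed documents."


def rank_with_fallback(hits: List[Hit], min_score: float = 0.35, top_k: int = 3) -> dict:
    """Keep hits above a confidence floor, best first; fall back if none remain."""
    confident = sorted(
        (h for h in hits if h.score >= min_score),
        key=lambda h: h.score,
        reverse=True,
    )[:top_k]
    if not confident:
        return {"status": "fallback", "hits": [], "message": FALLBACK_MESSAGE}
    return {"status": "ok", "hits": confident, "message": None}


# Made-up hits: two clear the hypothetical threshold, one does not.
results = rank_with_fallback([
    Hit("Grace period is thirty days.", 4, 0.82),
    Hit("Definitions.", 1, 0.20),
    Hit("Premium due dates.", 5, 0.51),
])
empty = rank_with_fallback([Hit("Unrelated clause.", 9, 0.10)])
```

Returning an explicit `status` keeps the decision visible to the caller, which can then skip the LLM call or route to a rule-based path instead of generating from weak context.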

Check out all 📌 Open Issues, or open a new one to explore edge cases, extensions, or improvements.

👥 Authors

Shibani Roychoudhury
Data Science Professional | Applied NLP | Explainable AI
LinkedIn | GitHub
Himanshu Agrawal
AI Practitioner | Software Engineer at Mediaocean
LinkedIn
Adarsh S A
Analytics Professional at ANZ | Data & Decision Systems
LinkedIn

📜 License

This project is licensed under the MIT License.
