
Retrieval-Augmented Question Answering system for complex insurance documents using Ollama, LangChain, and ChromaDB. Designed for scalable, intuitive document navigation and decision support.


HelloShibani/HelpMate_AI


🧠 HelpMateAI: Retrieval-Augmented Search System for Insurance Documents


HelpMateAI is a modular Retrieval-Augmented Generation (RAG) system for navigating complex insurance documents through semantic search and generative answering. Built with LangChain, HuggingFace Transformers, Llama 3.2 (served through Ollama), and ChromaDB, it extracts and surfaces relevant information from lengthy, structured PDF documents such as insurance policies.


🔍 Problem Statement

Insurance policies are often long, dense, and difficult to interpret. This project addresses:

  • ❓ Difficulty locating specific clauses or definitions
  • ⏱️ Time-consuming manual search
  • 🧾 Need for human-readable, context-aware summaries

HelpMateAI provides a lightweight, extensible framework to improve document comprehension and accelerate decision-making.


🚀 Features

  • 📄 PDF Processing: Parses text and tables using pdfplumber
  • 🧠 Semantic Embeddings: Leverages all-MiniLM-L6-v2 from HuggingFace for robust semantic representation
  • 🔎 Vector Search: ChromaDB for persistent, fast, chunk-level retrieval
  • 🔁 Re-ranking (Optional): Cross-encoder for refining semantic hits
  • ✍️ LLM Response Generation: Uses Llama 3.2, served through Ollama, via LangChain for contextual answers
  • 🧩 Metadata Handling: Captures page numbers and source chunks for better traceability
  • Caching Layer: Avoids redundant retrievals for recurring queries

📦 Technology Stack

| Component            | Usage                                    |
|----------------------|------------------------------------------|
| Python               | Core scripting and orchestration         |
| pdfplumber           | PDF parsing (text and tables)            |
| HuggingFace          | Embedding model (all-MiniLM-L6-v2)       |
| ChromaDB             | Vector store for semantic search         |
| LangChain            | Tool orchestration + Ollama integration  |
| Llama 3.2 via Ollama | Lightweight LLM for question answering   |
| pandas               | Data handling and JSON/DF preprocessing  |

🧱 System Architecture

┌────────────┐     ┌────────────┐     ┌──────────────┐     ┌──────────────┐
│ PDF Parser │ →→ │ Embeddings │ →→ │ Vector Search │ →→ │ LLM Response │
└────────────┘     └────────────┘     └──────────────┘     └──────────────┘
     ↑                    ↓                 ↑                       ↓
 Metadata           ChromaDB Store      Reranker (opt.)       Final Answer
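
The flow in the diagram can be illustrated with a toy, dependency-free sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for all-MiniLM-L6-v2, the in-memory list stands in for ChromaDB, the optional reranker is omitted, and `build_prompt` only prepares the text that would be sent to the LLM. The names `Chunk`, `embed`, `search`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # page number kept as metadata for traceability


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for real sentence embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Vector search stage: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[Chunk]) -> str:
    """Assemble retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"[page {c.page}] {c.text}" for c in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    chunks = [
        Chunk("The grace period for premium payment is 30 days.", page=4),
        Chunk("Claims must be filed within 90 days of the event.", page=9),
        Chunk("The policy covers accidental death benefits.", page=2),
    ]
    question = "How long is the grace period for premium payment?"
    print(build_prompt(question, search(question, chunks, k=1)))
```

Carrying the page number on every chunk is what lets the final answer point back to its source, which is the role of the Metadata box in the diagram.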

🧪 Quickstart

# 1. Install dependencies
pip install -r requirements.txt

# 2. Run the notebook
jupyter notebook notebooks/HelpMate_AI_Ollama.ipynb

📚 Project Report

A detailed technical report is available in the reports/ folder:
📄 Project Report: HELPMATE_AI.pdf


🔄 Future Roadmap

Planned extensions to improve usability, transparency, and scale:

  • Streamlit/Gradio UI for non-technical users
  • Hybrid RAG with rule-based fallback and citations
  • In-app table visualizer and clause tracking
  • Multi-document summarization & comparison
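
As a rough illustration of the planned hybrid approach with a rule-based fallback and citations, the sketch below tries a semantic search first and falls back to keyword overlap when nothing comes back. It is a sketch under assumptions, not a description of planned code: `keyword_fallback`, `retrieve_with_fallback`, `with_citations`, and the `min_overlap` threshold are all hypothetical, and `vector_search` is any callable supplied by the caller.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int


def _terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_fallback(query: str, chunks: list[Chunk], min_overlap: int = 2) -> list[Chunk]:
    """Rule-based fallback: keep chunks sharing at least min_overlap words with the query."""
    q_terms = _terms(query)
    scored = []
    for chunk in chunks:
        overlap = len(q_terms & _terms(chunk.text))
        if overlap >= min_overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]


def retrieve_with_fallback(query: str, chunks: list[Chunk], vector_search) -> list[Chunk]:
    """Try semantic search first; use keyword rules only when it returns nothing."""
    hits = vector_search(query)
    return hits if hits else keyword_fallback(query, chunks)


def with_citations(answer: str, hits: list[Chunk]) -> str:
    """Append the source pages so a reader can verify the answer."""
    pages = sorted({chunk.page for chunk in hits})
    if not pages:
        return answer
    return f"{answer} [pages: {', '.join(str(p) for p in pages)}]"
```

For example, calling `retrieve_with_fallback(query, chunks, lambda q: [])` simulates a semantic search that finds nothing, so the keyword rules decide which chunks are cited.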

🗺️ Roadmap & Open Issues

This system is evolving toward a more robust, fallback-aware document QA pipeline.

Current areas of focus:

  • Add Fallback Logic for Missing Vector Embeddings (#1)
  • Score and Rank Answers by Retrieval Confidence (#2)
  • Chunking Strategy Evaluation Notebook (#3)
  • .env Template for Local Config (#4)
  • Architecture Diagram for System Flow (#5)

Check out all 📌 Open Issues
or open a new one to explore edge cases, extensions, or improvements.
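
One possible shape for scoring and ranking by retrieval confidence (#2) is sketched below. It assumes the vector store returns cosine distances in the range [0, 2], where lower means closer; that convention, the 0.6 threshold, and the function names are assumptions for illustration and should be adapted to whatever distance metric the collection actually uses.

```python
def confidence_from_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a confidence in [0, 1] (assumed convention)."""
    return max(0.0, min(1.0, 1.0 - distance / 2.0))


def rank_by_confidence(hits, threshold: float = 0.6):
    """Turn (text, distance) pairs into (text, confidence), filtered and sorted."""
    scored = [(text, confidence_from_distance(d)) for text, d in hits]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_passage_or_none(hits, threshold: float = 0.6):
    """Return the most confident passage, or None so the caller can abstain."""
    ranked = rank_by_confidence(hits, threshold)
    return ranked[0][0] if ranked else None
```

Returning `None` instead of a low-confidence passage gives the caller an explicit signal to abstain or to trigger fallback logic rather than generate an answer from weak context.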


👥 Authors

  • Shibani Roychoudhury
    Data Science Professional | Applied NLP | Explainable AI
    LinkedIn | GitHub

  • Himanshu Agrawal
    AI Practitioner | Software Engineer at Mediaocean
    LinkedIn

  • Adarsh S A
    Analytics Professional at ANZ | Data & Decision Systems
    LinkedIn


📜 License

This project is licensed under the MIT License.
