Sentinel is an autonomous knowledge graph that automatically scrapes, extracts, stores, and maintains structured knowledge from the web. It uses AI to understand content, tracks changes over time, and heals itself when information becomes stale.
Tip
Sentinel Core v0.1.7 is Now Live!
The official sentinel-core package has been released on PyPI. You can now install it directly via pip.
Sentinel is a production-ready library for autonomous, self-healing knowledge graphs. While we continue to add new features, the core API is stable and ready for use in your RAG pipelines.
Build smarter, faster, and more reliable AI agents today!
- Autonomous: Automatically scrapes, extracts, and updates knowledge
- Temporal: Tracks how knowledge evolves over time
- Self-Healing: Detects and updates stale information automatically
- AI-Powered: Uses LLMs to extract entities and relationships
- Graph-Based: Stores knowledge in a Neo4j temporal graph
- Web Scraping: Intelligent scraping with Firecrawl or a local fallback
- Developer-Friendly: Simple Python API and CLI tool
- Beautiful UI: 3D graph visualization with Next.js
pip install sentinel-core

# Interactive setup wizard
sentinel init
# Or manually create .env file
cat > .env << EOF
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
OLLAMA_MODEL=ollama/phi3
EOF

# Start Neo4j
docker run -d -p 7687:7687 -p 7474:7474 \
-e NEO4J_AUTH=neo4j/your-password \
neo4j:latest
# Start Ollama (for local LLM)
ollama serve
ollama pull phi3

# Process a URL
sentinel watch https://stripe.com/pricing
# Check status
sentinel status
# View in UI
cd sentinel_platform/ui
npm install && npm run dev
# Visit http://localhost:3000

import asyncio

from sentinel_core import Sentinel, GraphManager, GraphExtractor
from sentinel_core.scraper import get_scraper

async def main():
    # Initialize
    graph = GraphManager()
    scraper = get_scraper()
    extractor = GraphExtractor(model_name="ollama/phi3")
    sentinel = Sentinel(graph, scraper, extractor)

    # Process URL
    result = await sentinel.process_url("https://example.com")
    print(f"Extracted {result['extracted_nodes']} nodes!")

    # Query graph
    snapshot = graph.get_graph_snapshot()
    print(f"Total: {snapshot['metadata']['node_count']} nodes")

    graph.close()

asyncio.run(main())

# Show version
sentinel version
# Check system status
sentinel status
# Process a URL
sentinel watch https://example.com
# Run healing cycle
sentinel heal --days 7
# Interactive setup
sentinel init

Track pricing changes across competitors automatically.
urls = [
    "https://stripe.com/pricing",
    "https://paypal.com/pricing",
    "https://square.com/pricing",
]
for url in urls:
    await sentinel.process_url(url)

Monitor documentation changes for your favorite libraries.
docs = {
    "React": "https://react.dev/learn",
    "Next.js": "https://nextjs.org/docs",
}
for name, url in docs.items():
    await sentinel.process_url(url)

# Auto-heal to detect changes
await sentinel.run_healing_cycle(days_threshold=7)

Build a knowledge graph from multiple news sources.
news_sources = [
    "https://techcrunch.com/",
    "https://theverge.com/",
]
for url in news_sources:
    await sentinel.process_url(url)

Track research papers and their citations.
papers = [
    "https://arxiv.org/abs/2303.08774",  # GPT-4
    "https://arxiv.org/abs/2005.14165",  # GPT-3
]
for paper in papers:
    await sentinel.process_url(paper)

- User Guide - Start Here!
- Quick Start Guide
- CLI Reference
- Usage Examples
LLMs can occasionally "hallucinate" relationships or misinterpret complex DOM structures. Sentinel mitigates this by:
- Using Firecrawl: Converts complex JS/HTML into clean Markdown, reducing noise.
- Structured Extraction: Uses instructor to enforce strict Pydantic schemas for nodes and edges.
- Verification: The heal command re-verifies content hashes before any costly LLM extraction.
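
As a rough illustration of why schema enforcement helps, the sketch below validates raw extractor output and rejects edges that point at nodes the extractor never produced. It uses only standard-library dataclasses, not instructor or Pydantic, and the `Node`, `Edge`, and `parse_graph` names and fields are assumptions made for this example, not Sentinel's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical fields; Sentinel's real node schema may differ.
    id: str
    label: str


@dataclass(frozen=True)
class Edge:
    # Hypothetical fields; Sentinel's real edge schema may differ.
    source: str
    target: str
    relation: str


def parse_graph(raw: dict) -> tuple[list[Node], list[Edge]]:
    """Validate raw extractor output; raise ValueError if it is malformed."""
    try:
        # Missing or unexpected keys raise TypeError in the constructors.
        # Note: dataclasses do not check field types; Pydantic would.
        nodes = [Node(**n) for n in raw["nodes"]]
        edges = [Edge(**e) for e in raw["edges"]]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"output does not match schema: {exc}") from exc

    # Reject relationships whose endpoints were never extracted as nodes.
    known_ids = {n.id for n in nodes}
    for edge in edges:
        if edge.source not in known_ids or edge.target not in known_ids:
            raise ValueError(f"edge references unknown node: {edge}")
    return nodes, edges


good = {
    "nodes": [{"id": "stripe", "label": "Company"}, {"id": "pro", "label": "Plan"}],
    "edges": [{"source": "stripe", "target": "pro", "relation": "OFFERS"}],
}
nodes, edges = parse_graph(good)

bad = {
    "nodes": [{"id": "stripe", "label": "Company"}],
    "edges": [{"source": "stripe", "target": "enterprise", "relation": "OFFERS"}],
}
try:
    parse_graph(bad)
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly on a dangling edge keeps a plausible-looking but unsupported relationship out of the graph, which is the same goal the strict schemas serve.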
Sentinel uses a Hash-based Change Detection strategy:
- Monitor: Checks for nodes that haven't been verified within days_threshold days (default: 7).
- Scrape & Hash: Re-scrapes the URL and computes a SHA-256 hash of the content.
- Diff: Compares the new hash with the stored hash in Neo4j.
  - Match: Updates the last_verified timestamp (zero LLM cost).
  - Mismatch: Triggers a full LLM extraction and graph update.
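
The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration of hash-based change detection, not Sentinel's implementation: `StoredNode`, `is_stale`, and `heal` are hypothetical names, the node is held in memory instead of Neo4j, and the LLM extraction on a mismatch is left as a comment.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredNode:
    """Hypothetical in-memory stand-in for what the graph stores per URL."""
    url: str
    content_hash: str
    last_verified: datetime


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the scraped content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(node: StoredNode, now: datetime, days_threshold: int = 7) -> bool:
    """Monitor step: True if the node was last verified more than the threshold ago."""
    return now - node.last_verified > timedelta(days=days_threshold)


def heal(node: StoredNode, fresh_content: str, now: datetime) -> str:
    """Diff step: compare hashes and report which branch was taken."""
    new_hash = content_hash(fresh_content)
    if new_hash == node.content_hash:
        # Match: only the timestamp changes, so no LLM call is needed.
        node.last_verified = now
        return "match"
    # Mismatch: a real system would run the LLM extraction here.
    node.content_hash = new_hash
    node.last_verified = now
    return "mismatch"


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
node = StoredNode(
    url="https://example.com",
    content_hash=content_hash("Pro plan: $20/month"),
    last_verified=now - timedelta(days=8),
)

stale = is_stale(node, now)                           # verified 8 days ago
unchanged = heal(node, "Pro plan: $20/month", now)    # same content
changed = heal(node, "Pro plan: $25/month", now)      # content changed
print(stale, unchanged, changed)
```

Because the hash comparison is cheap, the expensive extraction step runs only for pages whose content actually changed.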
- LLM Costs: Frequent updates on large sites can be expensive. Use the --days option of sentinel heal (days_threshold in the Python API) to control how often content is re-verified.
- Storage: The temporal graph grows over time. Currently, Sentinel does not auto-prune old versions. We recommend periodically archiving old VALID_TO relationships if storage is a concern.
# Clone repository
git clone https://github.com/Om7035/Sentinel-The-Self-Healing-Knowledge-Graph
cd Sentinel-The-Self-Healing-Knowledge-Graph
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -e ".[all]"
# Run tests
pytest tests/

sentinel/
├── sentinel_core/           # Core library (pip-installable)
│   ├── scraper/             # Web scraping (Firecrawl + Local)
│   ├── graph_store.py       # Neo4j temporal graph
│   ├── graph_extractor.py   # LLM-based extraction
│   └── orchestrator.py      # Main Sentinel class
├── sentinel_platform/       # Demo platform
│   ├── api/                 # FastAPI backend
│   └── ui/                  # Next.js frontend
├── tests/                   # Test suite
├── docs/                    # Documentation
└── sentinel_cli.py          # CLI tool

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with LangChain, Neo4j, and FastAPI
- Inspired by the need for self-maintaining knowledge systems
- Special thanks to the open-source community
- Author: Om Kawale
- Email: [email protected]
- GitHub: @Om7035
- Project: Sentinel
If you find Sentinel useful, please consider giving it a star!