An intelligent documentation and code analysis system that combines vector-based semantic search with graph database relationships to provide comprehensive answers about software tools, libraries, frameworks, and programming languages.
- Hybrid Search: Combines vector-based semantic search with graph relationship traversal (see the sketch after this list)
- Repository Analysis: Automated processing of GitHub repositories to extract code structure and documentation
- Version Management: Handles multiple versions of tools/libraries with version-specific queries
- Contextual RAG: Advanced retrieval-augmented generation with cross-referenced information
- Secure API: JWT-based authentication with comprehensive REST endpoints
- Async Processing: Background job processing for large repository analysis
- Observability: Built-in monitoring, logging, and health checks
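To make the hybrid idea concrete, here is a minimal, purely illustrative sketch of blending the two retrieval signals. The `hybrid_score` function, the candidate fields, and the 0.7 weighting are assumptions for illustration, not the project's actual API.

```python
# Illustrative only: blend a vector-similarity score with a graph-proximity score.
# Function name, field names, and the alpha weighting are assumptions, not project code.
def hybrid_score(vector_similarity: float, graph_proximity: float, alpha: float = 0.7) -> float:
    """Blend semantic similarity (0-1) with graph-relationship proximity (0-1)."""
    return alpha * vector_similarity + (1 - alpha) * graph_proximity


candidates = [
    {"doc": "auth_guide.md", "vector_similarity": 0.91, "graph_proximity": 0.40},
    {"doc": "jwt_module.py", "vector_similarity": 0.78, "graph_proximity": 0.95},
]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c["vector_similarity"], c["graph_proximity"]),
    reverse=True,
)
print([c["doc"] for c in ranked])  # highest blended relevance first
```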
graph TB
Client[Client Applications] --> API[FastAPI Backend]
API --> Auth[JWT Authentication]
API --> Queue[Celery Workers]
Queue --> GitHub[GitHub Integration]
Queue --> Parser[Code/Doc Parser]
Parser --> DB[(PostgreSQL + pgvector)]
Parser --> Cache[(Redis Cache/Queue)]
API --> Search[Search Service]
Search --> DB
Search --> Cache
- Backend: FastAPI with async support (minimal endpoint sketch after this list)
- Database: PostgreSQL with pgvector extension for vector operations
- Cache/Queue: Redis 7+ for caching and job processing with persistence
- Workers: Celery for background task processing
- Authentication: JWT-based security
- Monitoring: OpenTelemetry with structured logging
- Deployment: Docker Compose with Kubernetes readiness
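As a point of orientation, a FastAPI health endpoint like the `/health` route probed throughout this guide can be as small as the sketch below. This is illustrative only, not the project's actual `src/mindbridge/main.py`.

```python
# Minimal, illustrative FastAPI app exposing a health check; not the project's real entrypoint.
from fastapi import FastAPI

app = FastAPI(title="Mindbridge API")


@app.get("/health")
async def health() -> dict[str, str]:
    # A real implementation would also verify database and Redis connectivity.
    return {"status": "ok"}
```

Served with uvicorn, such an app answers the same `curl http://localhost:8000/health` checks used throughout this README.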
- Python 3.12+
- PostgreSQL 15+ with pgvector extension
- Redis 7+
- Git
- Docker & Docker Compose (for containerized setup)
# Clone the repository
git clone https://github.com/nhlongnguyen/mindbridge.git
cd mindbridge
# Start all services using the automated setup script
./scripts/docker-setup.sh
# Or manually start services
docker-compose --env-file .env.docker up -d
# Check service health
curl http://localhost:8000/health

The project includes a comprehensive Docker setup with production-ready configurations:
# One-command setup (recommended)
./scripts/docker-setup.sh

This script will:
- Build all Docker images
- Start PostgreSQL and Redis services
- Run database migrations with all enhancements
- Start the full application stack
- Verify all services are healthy
# 1. Build and start database services
docker-compose --env-file .env.docker up -d postgres redis
# 2. Wait for services to be healthy
docker-compose --env-file .env.docker logs postgres
docker-compose --env-file .env.docker logs redis
# 3. Run database migrations (includes pgvector, HNSW indexes, constraints)
docker-compose --env-file .env.docker run --rm -e PYTHONPATH=/app/src app alembic upgrade head
# 4. Start the full application stack
docker-compose --env-file .env.docker up -d
# 5. Verify all services
docker-compose --env-file .env.docker ps
curl http://localhost:8000/health

| Service | Port | Description |
|---|---|---|
| API Server | 8000 | FastAPI application with health checks |
| PostgreSQL | 5432 | Database with pgvector extension |
| Redis | 6379 | Cache and job queue |
| Redis Insight | 8001 | Redis management interface |
| Celery Worker | - | Background job processing |
| Celery Beat | - | Scheduled task management |
The Docker setup includes all performance and data integrity enhancements:
- ✅ PostgreSQL with pgvector - Vector similarity search
- ✅ HNSW Indexing - 100-1000x faster vector queries
- ✅ Composite Indexes - Query optimization
- ✅ CHECK Constraints - Database-level validation
- ✅ Cascade Deletes - Data integrity
- ✅ Enum Types - Type safety
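For reference, the HNSW index mentioned above can be expressed as plain DDL. The sketch below issues roughly equivalent SQL through SQLAlchemy; the connection URL and the `m`/`ef_construction` values are placeholders, not the exact parameters the project's migrations use.

```python
# Roughly equivalent DDL for an HNSW index on the embedding column (pgvector >= 0.5).
# Connection URL, driver, and index parameters are placeholders, not the migration's exact values.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://mindbridge:docker-dev-password-123@localhost:5432/mindbridge")

with engine.begin() as conn:
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_vector_documents_embedding_hnsw "
        "ON vector_documents USING hnsw (embedding vector_cosine_ops) "
        "WITH (m = 16, ef_construction = 64)"
    ))
```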
The Docker setup uses .env.docker for configuration:
# Database Configuration
POSTGRES_DB=mindbridge
POSTGRES_USER=mindbridge
POSTGRES_PASSWORD=docker-dev-password-123
# Redis Configuration
REDIS_PASSWORD=docker-redis-password-123
# Application Configuration
JWT_SECRET_KEY=docker-development-secret-key-256-bits-long
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080,http://localhost:8000

# View service status
docker-compose --env-file .env.docker ps
# View service logs
docker-compose --env-file .env.docker logs -f app
docker-compose --env-file .env.docker logs -f postgres
docker-compose --env-file .env.docker logs -f redis
# Stop all services
docker-compose --env-file .env.docker down
# Reset database (removes all data)
docker-compose --env-file .env.docker down -v
# Run migrations manually
docker-compose --env-file .env.docker run --rm app alembic upgrade head
# Access database directly
docker-compose --env-file .env.docker exec postgres psql -U mindbridge -d mindbridge
# Test vector operations
docker-compose --env-file .env.docker exec postgres psql -U mindbridge -d mindbridge -c "
SELECT COUNT(*) as total_tables FROM information_schema.tables WHERE table_schema = 'public';
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'vector_documents';
"

# Validate vector operations
docker-compose --env-file .env.docker exec postgres psql -U mindbridge -d mindbridge -c "
INSERT INTO vector_documents (content, title, embedding)
VALUES ('Docker test content', 'Docker Test', ARRAY(SELECT random() FROM generate_series(1, 1536))::vector);
SELECT id, title, vector_dims(embedding) as dimensions
FROM vector_documents WHERE title = 'Docker Test';
"
# Test API endpoints
curl http://localhost:8000/health
curl http://localhost:8000/docs
curl http://localhost:8000/metrics
# Check Redis connectivity
docker-compose --env-file .env.docker exec redis redis-cli -a docker-redis-password-123 ping

# Install Python dependencies (Poetry is already available)
python3 -m poetry install
# Install pre-commit hooks
python3 -m poetry run pre-commit install
# Set up environment variables
cp .env.example .env
# Edit .env with your configuration (ensure ALLOWED_ORIGINS is set)
# Start local development services (PostgreSQL + Redis)
docker-compose -f docker-compose.dev.yml up -d
# Initialize database
python3 -m poetry run alembic upgrade head
# Start development server
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run uvicorn src.mindbridge.main:app --reload --host 0.0.0.0 --port 8000
# Start worker process (in separate terminal)
python3 -m poetry run celery -A src.worker.celery_app worker --loglevel=info

Code Quality & Linting:
# Run all pre-commit hooks (recommended)
python3 -m poetry run pre-commit run --all-files
# Individual commands (if needed)
python3 -m poetry run ruff check src/ tests/ --fix
python3 -m poetry run black src/ tests/
python3 -m poetry run mypy src/

Testing:
# Run all tests with verbose output (with required environment variable)
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ -v
# Run tests with coverage report
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ --cov=src --cov-report=html
# Run specific test file
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/database/test_models.py -v
# Quality gate: Run tests with minimum coverage
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ --cov=src --cov-fail-under=85
# Start development services for integration testing
docker-compose -f docker-compose.dev.yml up -d

Application Management:
# Start development server
python3 -m poetry run uvicorn src.mindbridge.main:app --reload
# Run database migrations
python3 -m poetry run alembic upgrade head
# Generate new migration
python3 -m poetry run alembic revision --autogenerate -m "Description"
# Start Celery worker
python3 -m poetry run celery -A src.worker.celery_app worker --loglevel=info

Dependency Management:
# Add new dependency
python3 -m poetry add package-name
# Add development dependency
python3 -m poetry add --group dev package-name
# Update all dependencies
python3 -m poetry update
# Show dependency tree
python3 -m poetry show --tree

# Database Configuration
DATABASE_URL=postgresql://user:password@localhost:5432/mindbridge
REDIS_URL=redis://localhost:6379/0
# Redis Configuration
REDIS_MAX_CONNECTIONS=10
REDIS_CONNECTION_TIMEOUT=5
REDIS_SOCKET_TIMEOUT=2
# Celery Configuration
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
CELERY_VISIBILITY_TIMEOUT=3600
CELERY_WORKER_PREFETCH_MULTIPLIER=4
CELERY_TASK_SOFT_TIME_LIMIT=300
CELERY_TASK_TIME_LIMIT=600
# Security
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30
# GitHub Integration
GITHUB_TOKEN=your-github-personal-access-token
# Application Settings
LOG_LEVEL=INFO
ENVIRONMENT=development
DEBUG=false
# Embedding Configuration
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384
BATCH_SIZE=32
# Search Configuration
DEFAULT_SEARCH_LIMIT=10
MAX_SEARCH_LIMIT=100
SIMILARITY_THRESHOLD=0.7
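The embedding settings above map onto the sentence-transformers library roughly as follows. This is a hedged sketch that assumes the package is installed; it is not the project's internal embedding service.

```python
# Sketch of how EMBEDDING_MODEL, EMBEDDING_DIMENSION, and BATCH_SIZE might be consumed.
# Assumes the sentence-transformers package; not the project's actual embedding code.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # EMBEDDING_MODEL
texts = ["How do I configure JWT authentication?", "Connect to PostgreSQL with pgvector"]
embeddings = model.encode(texts, batch_size=32, normalize_embeddings=True)  # BATCH_SIZE
print(embeddings.shape)  # (2, 384) -- matches EMBEDDING_DIMENSION=384
```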
# Install PostgreSQL and pgvector extension
# Ubuntu/Debian:
sudo apt-get install postgresql-15 postgresql-15-pgvector
# macOS with Homebrew:
brew install postgresql pgvector
# Enable pgvector extension in your database
psql -d mindbridge -c "CREATE EXTENSION IF NOT EXISTS vector;"
# Run database migrations
python3 -m poetry run alembic upgrade head

Once the server is running, access the interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
# Get JWT token
curl -X POST "http://localhost:8000/auth/login" \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "admin"}'
# Use token in subsequent requests
curl -X GET "http://localhost:8000/repositories" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"

# Analyze a GitHub repository
curl -X POST "http://localhost:8000/repositories" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"github_url": "https://github.com/user/repo"}'
# Check analysis progress
curl -X GET "http://localhost:8000/jobs/{job_id}" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
# List analyzed repositories
curl -X GET "http://localhost:8000/repositories" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"

# Semantic search across documentation
curl -X POST "http://localhost:8000/search" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "how to implement authentication",
"repository_id": "optional-repo-filter",
"limit": 10
}'
# Search within specific repository
curl -X POST "http://localhost:8000/search" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "database connection",
"repository_id": "uuid-of-repository",
"limit": 5
}'
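Under the hood, a semantic search over pgvector comes down to a similarity query along the lines of the hedged sketch below. The connection URL is a placeholder, and the SQL mirrors the configuration values above (SIMILARITY_THRESHOLD, DEFAULT_SEARCH_LIMIT) rather than the service's actual implementation.

```python
# Rough sketch of the kind of pgvector query a semantic search could run;
# URL, table/column names, and constants are illustrative, not the service's code.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://mindbridge:password@localhost:5432/mindbridge")
query_embedding = [0.0] * 384  # in practice: the query text encoded with the embedding model
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"  # dimension must match the column

with engine.connect() as conn:
    rows = conn.execute(
        text(
            "SELECT id, title, 1 - (embedding <=> (:q)::vector) AS similarity "
            "FROM vector_documents "
            "WHERE 1 - (embedding <=> (:q)::vector) >= :threshold "
            "ORDER BY embedding <=> (:q)::vector "
            "LIMIT :limit"
        ),
        {"q": vector_literal, "threshold": 0.7, "limit": 10},
    ).fetchall()

for row in rows:
    print(row.id, row.title, round(row.similarity, 3))
```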
The project uses pytest with comprehensive test coverage requirements (85%+). All tests require the ALLOWED_ORIGINS environment variable to be set.

# Run all tests (with required environment variable)
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ -v
# Run with coverage report
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ --cov=src --cov-report=html
# Run specific test categories
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/database/ -v # Database tests
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ -k "integration" -v # Integration tests only
# Run tests with detailed output
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ -v --tb=short
# Run specific test file
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/database/test_models.py -v
# Run tests matching pattern
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ -k "test_vector" -v

For integration tests that require database and Redis connections:
# Start PostgreSQL and Redis services for testing
docker-compose -f docker-compose.dev.yml up -d
# Check service health
docker-compose -f docker-compose.dev.yml ps
# View service logs
docker-compose -f docker-compose.dev.yml logs postgres
docker-compose -f docker-compose.dev.yml logs redis
# Stop services when done
docker-compose -f docker-compose.dev.yml down

✅ All 167 tests passing ✅ 70% test coverage achieved ✅ CI/CD pipeline fixed and operational
Each test module follows the mandatory pattern:
- Expected Use Case: Normal operation with valid inputs
- Edge Case: Boundary conditions, empty inputs, maximum limits
- Failure Case: Invalid inputs, error conditions, exception handling
Example:
class TestDocumentProcessor:
def test_process_document_success(self):
"""Expected use case: Process valid document successfully"""
# Test implementation
def test_process_empty_document(self):
"""Edge case: Handle empty document"""
# Test implementation
def test_process_document_embedding_failure(self):
"""Failure case: Handle embedding service failure"""
# Test implementation

# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# Health check
curl http://your-domain.com/health
# View logs
docker-compose logs -f api
docker-compose logs -f worker

# Apply Kubernetes manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/postgresql.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/api.yaml
kubectl apply -f k8s/worker.yaml
kubectl apply -f k8s/ingress.yaml
# Check deployment status
kubectl get pods -n mindbridge
kubectl get services -n mindbridge

- Development: docker-compose.yml
- Staging: docker-compose.staging.yml
- Production: docker-compose.prod.yml
# Application health
curl http://localhost:8000/health
# Database connectivity
curl http://localhost:8000/health/db
# Redis connectivity
curl http://localhost:8000/health/redis
# Detailed system metrics
curl http://localhost:8000/metrics

The application uses structured logging with OpenTelemetry integration:
import structlog
logger = structlog.get_logger(__name__)
# Example usage
logger.info(
"Repository analysis started",
repository_id=repo_id,
github_url=url,
user_id=user.id
)

- Metrics: Prometheus-compatible metrics available at /metrics
- Tracing: OpenTelemetry traces for request tracking
- Dashboards: Grafana dashboards for visualization
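For custom instrumentation, a span can be emitted through the OpenTelemetry API as in the sketch below. It assumes the opentelemetry packages are installed and an exporter is configured elsewhere; the span and attribute names are illustrative, not the project's actual instrumentation.

```python
# Minimal sketch of emitting a custom span with the OpenTelemetry API.
# Tracer, span, and attribute names are illustrative assumptions.
from opentelemetry import trace

tracer = trace.get_tracer("mindbridge.repository_analysis")


def analyze_repository(repo_id: str) -> None:
    with tracer.start_as_current_span("repository-analysis") as span:
        span.set_attribute("repository.id", repo_id)
        # ... fetch, parse, and embed the repository here ...
```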
We follow strict development standards outlined in CLAUDE.md.
- Fetch Task: Get assigned task from GitHub Projects
- Understand Requirements: Analyze issue thoroughly
- Create Implementation Plan: Detail technical approach in issue comments
- Wait for Approval: Get plan reviewed before coding
- TDD Development: Write tests first, implement second
- Quality Gates: Pass all tests, linting, and coverage requirements
- Update Status: Keep GitHub issue status current
- Test Coverage: Minimum 85%
- Linting: Ruff, Black, isort compliance
- Type Checking: mypy strict mode
- Documentation: Google-style docstrings for all public APIs
# Run quality checks
python3 -m poetry run pytest tests/ --cov=src --cov-fail-under=85
python3 -m poetry run pre-commit run --all-files

# Install pre-commit hooks
python3 -m poetry run pre-commit install
# Run hooks manually
python3 -m poetry run pre-commit run --all-files

- Repository Analysis: ~2-5 minutes for typical Python repository (1000+ files)
- Search Latency: <100ms for vector similarity search
- Throughput: 100+ concurrent search requests
- Storage: ~10MB per 1000 documents (with embeddings)
- Batch Processing: Use appropriate batch sizes for embedding generation
- Database Indexing: Ensure proper indexes on frequently queried columns
- Caching: Leverage Redis for frequently accessed data
- Connection Pooling: Configure appropriate database connection pools (see the sketch below)
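As a concrete example of the connection-pooling item above, the sketch below sets pool parameters on a SQLAlchemy async engine. The URL, driver, and specific values are assumptions, not the project's configuration.

```python
# Illustrative pool settings for an async SQLAlchemy engine; assumes the asyncpg driver.
# Values are placeholders, not the project's tuned configuration.
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://mindbridge:password@localhost:5432/mindbridge",
    pool_size=10,        # steady-state connections kept open
    max_overflow=20,     # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection
    pool_pre_ping=True,  # detect and recycle dead connections
)
```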
- Authentication: JWT-based with configurable expiration (see the sketch after this list)
- Authorization: Role-based access control (RBAC)
- Input Validation: Comprehensive request validation
- Rate Limiting: Protection against abuse
- HTTPS: TLS encryption for all communications
- Secrets Management: Environment-based configuration
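To illustrate how the JWT settings used elsewhere in this README (JWT_SECRET_KEY, JWT_ALGORITHM, JWT_ACCESS_TOKEN_EXPIRE_MINUTES) fit together, here is a minimal sketch using the PyJWT package. It is not the service's actual auth module.

```python
# Minimal HS256 token issuance/verification sketch using PyJWT; illustrative only.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "your-super-secret-jwt-key"  # JWT_SECRET_KEY

token = jwt.encode(
    {"sub": "admin", "exp": datetime.now(timezone.utc) + timedelta(minutes=30)},
    SECRET,
    algorithm="HS256",  # JWT_ALGORITHM
)
claims = jwt.decode(token, SECRET, algorithms=["HS256"])  # raises if expired or tampered
print(claims["sub"])
```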
# Run security scan
python3 -m poetry run bandit -r src/
# Check for vulnerabilities (if safety is installed)
python3 -m poetry run safety check
# Audit dependencies
python3 -m poetry audit

- ✅ PostgreSQL + pgvector for vector operations
- ✅ Python repository analysis
- ✅ Markdown documentation processing
- ✅ Semantic search functionality
- ✅ JWT authentication
- ✅ Background job processing
- Neo4j integration for code relationships
- Hybrid search (vector + graph traversal)
- Advanced dependency analysis
- Cross-reference linking
- JavaScript/TypeScript analysis
- Java code processing
- Go language support
- Generic language framework
- Query planning and optimization
- Tool usage and function calling
- Learning from user feedback
- Context-aware recommendations
- Multi-tenant architecture
- Advanced analytics and reporting
- Custom embedding models
- Enterprise SSO integration
# Check PostgreSQL service
sudo systemctl status postgresql
# Test database connectivity
psql -h localhost -U username -d mindbridge -c "SELECT version();"
# Check pgvector extension
psql -d mindbridge -c "SELECT * FROM pg_extension WHERE extname = 'vector';"

# Check Redis service
redis-cli ping
# Monitor Redis activity
redis-cli monitor
# Check Redis memory usage
redis-cli info memory
# Check Redis configuration
redis-cli config get "*"
# Monitor Redis slow queries
redis-cli slowlog get 10
# Check connected clients
redis-cli client list

# Test cache operations
python3 -c "
import asyncio
from mindbridge.cache.redis_cache import get_redis_cache
async def test_cache():
cache = get_redis_cache()
await cache.connect()
print('Cache connected:', await cache.ping())
await cache.set('test', 'value')
print('Cache get:', await cache.get('test'))
await cache.disconnect()
asyncio.run(test_cache())
"
# Check Celery broker connection
python3 -c "
from mindbridge.jobs.celery_config import get_celery_config
from mindbridge.jobs.celery_app import check_broker_connection
import asyncio
async def test_broker():
config = get_celery_config()
connected = await check_broker_connection(config)
print('Broker connected:', connected)
asyncio.run(test_broker())
"
# Monitor Celery queue lengths
redis-cli -h localhost -p 6379 -n 0 llen celery
# Clear Redis cache (use with caution)
redis-cli flushdb

# Check available system resources
htop
# Monitor GPU usage (if available)
nvidia-smi
# Adjust batch size in configuration
export BATCH_SIZE=16  # Reduce if memory constrained

# Check Celery worker status
python3 -m poetry run celery -A src.worker.celery_app inspect active
# Monitor job queue
python3 -m poetry run celery -A src.worker.celery_app inspect reserved
# Restart workers
docker-compose restart worker

# Issue: Tests fail with "ALLOWED_ORIGINS environment variable must be set"
# Solution: Always run tests with the environment variable
ALLOWED_ORIGINS="http://localhost:3000,http://localhost:8080" \
python3 -m poetry run pytest tests/ -v
# Issue: Health check tests fail
# Solution: Start local development services
docker-compose -f docker-compose.dev.yml up -d
# Issue: AsyncMock import errors in tests
# Solution: Ensure proper async mocking in test files
from unittest.mock import AsyncMock
# Issue: CI tests failing but local tests pass
# Solution: Check that the CI workflow sets the ALLOWED_ORIGINS environment variable

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Wiki
This project is licensed under the MIT License - see the LICENSE file for details.
- sentence-transformers for embedding generation
- pgvector for PostgreSQL vector operations
- FastAPI for the excellent web framework
- Celery for distributed task processing
Built with ❤️ for the developer community
For detailed development guidelines, see CLAUDE.md