# Concordance

Concordance is an open-source inference stack that lets you observe, modify, and control LLM generation in real time. It provides:
- Quote Engine — An inference server with a programmable mod system for token-level intervention
- Thunder Backend — Observability service that captures full inference traces
- Web UI — Frontend for exploring traces, viewing mod actions, and debugging generation
- CLI — Command-line tool for local development and mod management
- [Architecture](#architecture)
- [Quick Start](#quick-start)
- [Manual Setup](#manual-setup)
- [Writing Mods](#writing-mods)
- [Deployment (Modal)](#deployment-modal)
- [Component Documentation](#component-documentation)
- [Configuration Reference](#configuration-reference)
- [Project Structure](#project-structure)
- [Contributing](#contributing)
- [License](#license)
## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Frontend   │────▶│   Backend   │◀────│   Engine    │
│ (React/TS)  │     │ (Rust/Axum) │     │  (Python)   │
└─────────────┘     └─────────────┘     └─────────────┘
     :3000               :6767               :8000
```

The frontend talks to the backend's API, and the engine sends inference logs to the backend for observability.
## Quick Start

The fastest way to get Concordance running is with our interactive setup script.

### Prerequisites

Before running the setup script, make sure you have the following installed:
| Tool | Purpose | Installation |
|---|---|---|
| uv | Python package manager | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Rust | Backend and CLI | `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \| sh` |
| Node.js 18+ | Frontend | Download from [nodejs.org](https://nodejs.org) or use your package manager |
| psql | Database migrations | `brew install postgresql` (macOS) or `apt install postgresql-client` (Linux) |
You'll also need:
- A Hugging Face account with an API token (create one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens))
**1. Set up a PostgreSQL database**

The backend requires a PostgreSQL database. We recommend [Neon](https://neon.tech) for a free, serverless Postgres:

- Create an account at [neon.tech](https://neon.tech)
- Create a new project
- Copy your connection string from the dashboard

Your connection string will look like:

```
postgresql://user:[email protected]/dbname?sslmode=require
```

**2. Get your Hugging Face token**

- Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- Create a new token with read access
- Copy the token (it starts with `hf_`)
Clone the repository and run the interactive setup:

```bash
git clone https://github.com/concordance-co/quote.git
cd quote
./setup.sh
```

The setup script will guide you through configuring all components:
| Step | What it configures |
|---|---|
| Prerequisites Check | Verifies uv, Rust, Node.js, npm are installed; optionally installs missing tools |
| Backend Setup | Database URL, server host/port, bootstrap secret, playground settings |
| Database Migrations | Runs SQL migrations to create required tables |
| Engine Setup | HF token, admin key, model ID, deployment mode (local/Modal), server settings |
| Frontend Setup | API URL, WebSocket URL for real-time streaming |
| Dependency Installation | Builds backend, installs Python/Node packages |
You can also run setup for individual components:

```bash
./setup.sh --quick backend    # Set up only the backend
./setup.sh --quick engine     # Set up only the engine
./setup.sh --quick frontend   # Set up only the frontend
./setup.sh --quick all        # Set up everything (non-interactive defaults)
```

After setup, use the run script to start all services:
```bash
./run.sh start          # Start all services
./run.sh status         # Check service status
./run.sh logs engine    # View engine logs
./run.sh stop           # Stop all services
```

Or start services individually:

```bash
./run.sh start backend
./run.sh start engine
./run.sh start frontend
```

Once running:
- Frontend: http://localhost:3000
- Backend API: http://localhost:6767
- Engine API: http://localhost:8000
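
To confirm all three services are up, you can probe the same endpoints this README uses for verification in the Manual Setup section below. A stdlib-only Python sketch (the frontend check simply fetches the dev server's index page):

```python
import urllib.request

# Probe each service. /healthz and /v1/models are the verification
# endpoints used in the Manual Setup section; the frontend URL is
# just the dev server's root page.
checks = {
    "backend":  "http://localhost:6767/healthz",
    "engine":   "http://localhost:8000/v1/models",
    "frontend": "http://localhost:3000/",
}

for name, url in checks.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{name}: HTTP {resp.status}")
    except Exception as exc:  # connection refused, timeout, HTTP errors
        print(f"{name}: DOWN ({exc})")
```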
Ready to build your first mod? Visit docs.concordance.co to get started!
## Manual Setup

If you prefer to set things up manually, follow these steps:
### 1. Set up a PostgreSQL database

The backend requires a PostgreSQL database. We recommend [Neon](https://neon.tech) for a free, serverless Postgres:

- Create an account at [neon.tech](https://neon.tech)
- Create a new project
- Copy your connection string from the dashboard (it looks like `postgresql://user:[email protected]/dbname`)

See Neon's quickstart guide for detailed instructions.
### 2. Backend

```bash
cd backend
cp .env.example .env
```

Edit `.env` and set your database URL:

```
DATABASE_URL=postgresql://user:[email protected]/dbname?sslmode=require
```

Run database migrations:

```bash
./run_migration.sh
```

Then start the server:

```bash
cargo run
```

Verify it's running:

```bash
curl http://localhost:6767/healthz
```

### 3. Engine

The engine runs LLM inference with mod support.
```bash
cd engine
```

Create an `inference/.env` file with your Hugging Face token:

```
HF_TOKEN=hf_your_token_here
MODEL_ID=modularai/Llama-3.1-8B-Instruct-GGUF
```

Install dependencies and start the server:

```bash
uv sync --all-packages
uv pip install -e inference
uv run -m quote.server.openai.local --host 0.0.0.0 --port 8000
```

**Note:** First run downloads the model and compiles it, which takes several minutes. Subsequent starts are faster.
Test the engine:

```bash
curl http://localhost:8000/v1/models
```

### 4. Frontend

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000 in your browser.
## Writing Mods

Mods let you intercept and modify inference at the token level. Here's a simple example:
```python
from quote_mod_sdk import mod, ForwardPassEvent, tokenize

@mod
def inject_thinking(event, actions, tokenizer):
    if isinstance(event, ForwardPassEvent) and event.step == 0:
        tokens = tokenize("<think>", tokenizer)
        return actions.force_tokens(tokens)
    return actions.noop()
```
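
The same hooks support step-dependent behavior later in generation. Here is a sketch built only from the calls shown above (`event.step`, `tokenize`, `force_tokens`, `noop`); the step threshold is an arbitrary illustration, not an SDK constant:

```python
from quote_mod_sdk import mod, ForwardPassEvent, tokenize

@mod
def close_thinking(event, actions, tokenizer):
    # Hypothetical companion to inject_thinking: after a fixed budget of
    # forward passes, force the closing tag so generation leaves the
    # thinking block. Step 64 is an arbitrary example value.
    if isinstance(event, ForwardPassEvent) and event.step == 64:
        return actions.force_tokens(tokenize("</think>", tokenizer))
    return actions.noop()
```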
Upload mods to a running server:

```bash
# Install the CLI first
cargo install --path cli

# Upload your mod
concai mod upload --file-name my_mod.py
```

Then enable the mod in your API calls by appending the mod name to the model ID:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF/inject_thinking",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

See engine/sdk/README.md for the full mod authoring guide, or visit docs.concordance.co to build your first mod!
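
Because the engine exposes the OpenAI-compatible `/v1` routes shown above, any OpenAI client can call it. A minimal Python sketch (assumes `pip install openai`; the API key value is a placeholder for a local dev server, so substitute whatever your engine's auth expects):

```python
from openai import OpenAI

# Point the client at the local engine instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="placeholder-for-local-dev",  # assumption: swap in a real key if required
)

# Appending "/inject_thinking" to the model ID enables that mod, as above.
response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF/inject_thinking",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```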
## Deployment (Modal)

For GPU inference in production, deploy the engine to Modal:

```bash
cd engine/inference
modal serve src/quote/server/openai/remote.py
```

Modal provides serverless GPU instances that scale to zero when not in use. See engine/inference/README.md for full deployment details.
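
For orientation only, a Modal ASGI deployment generally has the shape sketched below. This is not the contents of `remote.py`; the app name, image, GPU type, and stub route are all illustrative assumptions:

```python
import modal

# Illustrative only: remote.py is the source of truth for the real
# image, GPU class, and server wiring.
image = modal.Image.debian_slim().pip_install("fastapi")
app = modal.App("quote-engine-sketch", image=image)

@app.function(gpu="A100")  # GPU class is an assumption; size to the model
@modal.asgi_app()
def serve():
    from fastapi import FastAPI

    api = FastAPI()

    @api.get("/v1/models")
    def models():
        # Stub route; the real server exposes the full
        # OpenAI-compatible surface with mod support.
        return {"data": [{"id": "modularai/Llama-3.1-8B-Instruct-GGUF"}]}

    return api
```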
## Component Documentation

| Component | Description | Docs |
|---|---|---|
| Engine | Inference server with mod system | engine/inference/README.md |
| Mod SDK | Python SDK for authoring mods | engine/sdk/README.md |
| Backend | Observability and logging service | backend/README.md |
| CLI | Command-line tool | cli/README.md |
| Frontend | Web UI | frontend/README.md |
## Configuration Reference

### Backend (`backend/.env`)

| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | Postgres connection string |
| `APP_HOST` | No | Server bind address (default: `127.0.0.1`) |
| `APP_PORT` | No | Server port (default: `6767`) |
| `BOOTSTRAP_SECRET` | No | Secret for creating the initial admin API key |
| `PLAYGROUND_ADMIN_KEY` | No | Admin key for the playground feature |
| `PLAYGROUND_LLAMA_8B_URL` | No | Modal URL for the Llama 8B playground |
| `PLAYGROUND_QWEN_14B_URL` | No | Modal URL for the Qwen 14B playground |
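
Putting the backend variables together, a local development `backend/.env` might look like this (all values are placeholders):

```
DATABASE_URL=postgresql://user:[email protected]/dbname?sslmode=require
APP_HOST=127.0.0.1
APP_PORT=6767
BOOTSTRAP_SECRET=change-me
```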
### Engine (`engine/inference/.env`)

| Variable | Required | Description |
|---|---|---|
| `HF_TOKEN` | Yes* | Hugging Face token for model downloads |
| `MODEL_ID` | No | Model to load (default: `modularai/Llama-3.1-8B-Instruct-GGUF`) |
| `ADMIN_KEY` | No | Admin key for authenticated operations |
| `HOST` | No | Server bind address (default: `0.0.0.0`) |
| `PORT` | No | Server port (default: `8000`) |
| `USERS_PATH` | No | Path to users JSON (default: `./users/users.json`) |
| `MODS_BASE` | No | Base path for mods storage (default: `./mods`) |
| `QUOTE_LOG_INGEST_URL` | No | Backend URL for sending inference logs |
### Frontend (`frontend/.env`)

| Variable | Required | Description |
|---|---|---|
| `VITE_API_URL` | No | Backend API URL (default: `/api`) |
| `VITE_WS_URL` | No | WebSocket URL for log streaming (default: `ws://localhost:6767`) |
See each component's `.env.example` for all available options.
## Project Structure

```
concordance/
├── backend/        # Rust observability service (Thunder)
├── cli/            # Rust CLI tool (concai)
├── engine/
│   ├── inference/  # Python inference server (Quote)
│   ├── sdk/        # Mod SDK
│   └── shared/     # Shared utilities
├── frontend/       # React web UI
├── scripts/        # Build and release scripts
├── setup.sh        # Interactive setup script
└── run.sh          # Service management script
```
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests (`cargo test`, `uv run pytest`, `npm test`)
5. Submit a pull request
## License

MIT