This repository contains a locally run prototype for AI-assisted coding and planning, backed by a llama.cpp LLM. The latest build replaces the heuristic agent with a multi-step LangGraph pipeline (planner → coder → reviewer) so the model produces higher-quality plans and code snippets.
- `app/agent/engine.py` – LangGraph orchestration that forwards chat history to a llama.cpp HTTP server. It now chains planner, coder, and reviewer nodes and streams each stage (SSE) back to the UI for real-time visibility.
- `app/server.py` – threaded HTTP server that exposes `/api/session` and `/api/agent`, and serves the static frontend.
- `public/` – minimal UI written in vanilla JS + CSS that lets you iterate on prompts and read the agent's output.
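For orientation, here is a minimal sketch of how a planner → coder → reviewer chain can be wired with LangGraph. The state fields and node bodies are illustrative placeholders, not the actual implementation in `app/agent/engine.py`, where each node calls the llama.cpp server and streams its output.

```python
# Illustrative planner → coder → reviewer chain; node bodies are placeholders.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    prompt: str
    plan: str
    code: str
    review: str


def planner(state: AgentState) -> dict:
    # The real node would call the llama.cpp server with the chat history.
    return {"plan": f"1. Sketch an approach for: {state['prompt']}"}


def coder(state: AgentState) -> dict:
    return {"code": f"# code derived from plan:\n# {state['plan']}"}


def reviewer(state: AgentState) -> dict:
    return {"review": f"Reviewed {len(state['code'])} characters of generated code."}


graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("coder", coder)
graph.add_node("reviewer", reviewer)
graph.set_entry_point("planner")
graph.add_edge("planner", "coder")
graph.add_edge("coder", "reviewer")
graph.add_edge("reviewer", END)

pipeline = graph.compile()
result = pipeline.invoke({"prompt": "Write a function that reverses a string"})
print(result["review"])
```

In the real engine, each stage's output is additionally streamed over SSE so the UI can render the plan, code, and review as they are produced.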
Future work can swap the engine with an OpenAI/Anthropic client, add persistent vector memory, and stream tokens back to the UI.
- Install dependencies:
pip install -r requirements.txt
- Install llama.cpp (example on macOS):
brew install llama.cpp
- Start the DeepSeek model with the built-in server:
llama-server -hf lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF:Q4_K_M
By default this listens on http://127.0.0.1:8080. Export a different URL via `LLAMA_SERVER_URL` if needed.
- Launch the web app:
python app/server.py
Then open http://127.0.0.1:8000 in a browser. Each browser tab initializes a new session so you can keep experiments separate.
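If you want to poke at the model outside the UI, llama-server exposes an OpenAI-compatible chat endpoint. The sketch below shows one way a client could honor `LLAMA_SERVER_URL`; it is a standalone example, not the code in `app/agent/engine.py`.

```python
# Standalone sketch: call the llama.cpp server directly, honoring LLAMA_SERVER_URL.
import json
import os
import urllib.request

LLAMA_SERVER_URL = os.environ.get("LLAMA_SERVER_URL", "http://127.0.0.1:8080")


def chat(messages: list[dict]) -> str:
    """Send a chat request to llama-server's OpenAI-compatible endpoint."""
    payload = json.dumps({"messages": messages, "temperature": 0.2}).encode()
    request = urllib.request.Request(
        f"{LLAMA_SERVER_URL}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat([{"role": "user", "content": "Plan a function that reverses a string."}]))
```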
You can batch-evaluate prompts/models/agent variants with the built-in runner.
- Define tasks in `benchmarks/tasks.jsonl` (or the new `benchmarks/tasks_swe.jsonl` for larger SWE-style evaluations). Each line is a JSON object with the fields `id`, `prompt`, optional `language`, and `test` (path to a Python checker that receives the generated code). Algorithmic checkers now live in `benchmarks/algorithm_test/`, while the higher-context SWE exercises reside in `benchmarks/swe_benchmark_test/`. A sample task line and checker sketch follow the run command below.
- Choose an engine: `local-multi` and `local-single` are llama.cpp-backed agents (they need the local server running, configured via `LLAMA_SERVER_URL`/`LLAMA_SERVER_MODEL`); `api-multi` and `api-single` are OpenAI-backed agents (they require `OPENAI_API_KEY`).
- Run the suite:
python app/run_bench.py \
  --engine local-multi \
  --tasks benchmarks/tasks.jsonl \
  --output results/multi.jsonl
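For the `test` field above, the exact contract between the runner and a checker is defined in `app/run_bench.py`; the hypothetical example below assumes the checker receives the path to the generated code as its first argument and signals failure with a non-zero exit code. A matching task line might look like `{"id": "reverse-string", "prompt": "Write reverse_string(s) that returns s reversed.", "language": "python", "test": "benchmarks/algorithm_test/check_reverse.py"}` (the id, prompt, and file name are made up).

```python
# benchmarks/algorithm_test/check_reverse.py — hypothetical checker.
# Assumes the runner passes the generated code's path as argv[1] and treats a
# non-zero exit status as failure; adapt to the runner's actual convention.
import importlib.util
import sys


def load_candidate(path: str):
    """Import the generated code as a throwaway module."""
    spec = importlib.util.spec_from_file_location("candidate", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def main() -> int:
    candidate = load_candidate(sys.argv[1])
    if candidate.reverse_string("abc") != "cba":
        print("reverse_string('abc') did not return 'cba'")
        return 1
    print("ok")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```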
The runner stores newline-delimited JSON outputs (success, elapsed seconds,
checker logs, raw responses) so you can compute aggregate metrics later. Use
`--limit N` for smoke tests.
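Because results are plain JSONL, aggregation takes only a few lines of Python. The field names below (`success`, `elapsed`) are assumptions based on the description above; inspect a line of your results file for the actual keys.

```python
# Summarize a results file produced by app/run_bench.py.
# Field names ("success", "elapsed") are assumptions — check your own output.
import json
import sys


def summarize(path: str) -> None:
    with open(path) as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    passed = sum(1 for record in records if record.get("success"))
    total = len(records)
    avg_seconds = sum(record.get("elapsed", 0.0) for record in records) / max(total, 1)
    print(f"{passed}/{total} tasks passed, avg {avg_seconds:.1f}s per task")


if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else "results/latest.jsonl")
```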
- Local execution agent (no toolchain, just codegen + checker):
python app/run_bench.py --engine local-exec --label exec-loop --output results/exec.jsonl
- Local self-test agent (agent writes and runs its own tests):
python app/run_bench.py --engine local-selftest --tasks benchmarks/tasks.jsonl --output results/english_selftest.jsonl
- API self-test (OpenAI):
python app/run_bench.py --engine api-selftest --tasks benchmarks/tasks.jsonl --output results/english_selftest_api.jsonl
- Smoke run on the first 5 tasks:
python app/run_bench.py --engine local-multi --limit 5
- Default output path (if `--output` is omitted):
results/latest.jsonl
- Add LangGraph subgraphs for tool selection + retrieval-augmented planning.
- Persist session history (Redis/Postgres) for multi-device continuity.
- Add WebSocket streaming so plans render progressively.
- Package the server as FastAPI / Next.js API routes for production use.