Evaluation: Use Config Management #477
base: main
Conversation
📝 Walkthrough
Evaluation endpoints, CRUD, models, and tests now use stored configs referenced by `config_id` and `config_version`.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant API as Evaluation API
participant Config as ConfigVersionCrud
participant Eval as Evaluation CRUD
participant Batch as Batch Service
participant LLM as OpenAI Provider
Client->>API: POST /evaluate (config_id, config_version, dataset_id...)
API->>Config: resolve_config_blob(config_id, config_version)
Config-->>API: config blob (KaapiLLMParams) or error
API->>API: validate provider == "openai"
API->>Eval: create_evaluation_run(config_id, config_version, metadata)
Eval-->>API: EvaluationRun (id, config_id, config_version)
API->>Batch: start_evaluation_batch(eval_run, config.completion.params)
Batch->>LLM: enqueue/call provider
Batch-->>Eval: update run status / attach batch ids
API-->>Client: 200 Created (EvaluationRun) or 4xx on error
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
📜 Recent review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Files:
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (1)
📚 Learning: 2025-12-17T15:39:30.469Z
Applied to files:
🧬 Code graph analysis (1)
backend/app/tests/crud/evaluations/test_processing.py (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (7)
Actionable comments posted: 3
🧹 Nitpick comments (4)
backend/app/crud/evaluations/embeddings.py (1)
366-367: Misleading comment - update to reflect actual behavior.
The comment says "Get embedding model from config" but the code hardcodes the value. Update the comment to accurately describe the implementation.
- # Get embedding model from config (default: text-embedding-3-large)
+ # Use fixed embedding model (text-embedding-3-large)
  embedding_model = "text-embedding-3-large"
backend/app/tests/api/routes/test_evaluation.py (1)
524-545: Consider renaming function to match its new purpose.
The function `test_start_batch_evaluation_missing_model` was repurposed to test invalid `config_id` scenarios. The docstring was updated but the function name still references "missing_model". Consider renaming for clarity.
- def test_start_batch_evaluation_missing_model(self, client, user_api_key_header):
-     """Test batch evaluation fails with invalid config_id."""
+ def test_start_batch_evaluation_invalid_config_id(self, client, user_api_key_header):
+     """Test batch evaluation fails with invalid config_id."""
backend/app/api/routes/evaluation.py (1)
499-510: Consider validating that `model` is present in config params.
The model is extracted with `.get("model")`, which returns `None` if not present. Since `model` is critical for cost tracking (used in `create_langfuse_dataset_run`), consider validating its presence and returning an error if missing.
  # Extract model from config for storage
  model = config.completion.params.get("model")
+ if not model:
+     raise HTTPException(
+         status_code=400,
+         detail="Config must specify a 'model' in completion params for evaluation",
+     )
  # Create EvaluationRun record with config references
backend/app/crud/evaluations/core.py (1)
15-69: Config-based `create_evaluation_run` refactor is correctly implemented; consider logging `model` for improved traceability.
The refactor from an inline config dict to `config_id: UUID` and `config_version: int` is properly implemented throughout:
- The sole call site in `backend/app/api/routes/evaluation.py:503` correctly passes all new parameters with the right types (`config_id` as UUID, `config_version` as int, `model` extracted from config).
- The `EvaluationRun` model in `backend/app/models/evaluation.py` correctly defines all three fields with appropriate types and descriptions.
- All type hints align with Python 3.11+ guidelines.
One suggested improvement for debugging: include `model` in the creation log for better traceability when correlating evaluation runs with model versions:
  logger.info(
      f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
-     f"config_id={config_id}, config_version={config_version}"
+     f"config_id={config_id}, config_version={config_version}, model={model}"
  )
Since the model is already extracted at the call site and passed to the function, including it in the log will provide fuller context for operational debugging without any additional cost.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py (1 hunks)
backend/app/api/routes/evaluation.py (5 hunks)
backend/app/crud/evaluations/core.py (5 hunks)
backend/app/crud/evaluations/embeddings.py (1 hunks)
backend/app/crud/evaluations/processing.py (1 hunks)
backend/app/models/evaluation.py (3 hunks)
backend/app/tests/api/routes/test_evaluation.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/api/routes/evaluation.py
backend/app/models/evaluation.py
backend/app/crud/evaluations/embeddings.py
backend/app/tests/api/routes/test_evaluation.py
backend/app/crud/evaluations/processing.py
backend/app/crud/evaluations/core.py
backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/evaluation.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/evaluations/embeddings.py
backend/app/crud/evaluations/processing.py
backend/app/crud/evaluations/core.py
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_evaluation.py (2)
backend/app/crud/evaluations/batch.py (1)
  build_evaluation_jsonl (62-115)
backend/app/models/evaluation.py (2)
  EvaluationDataset (74-130)
  EvaluationRun (133-248)
backend/app/crud/evaluations/processing.py (1)
backend/app/crud/evaluations/langfuse.py (1)
create_langfuse_dataset_run(20-163)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/evaluations/processing.py (1)
257-264: LGTM! Clean refactor to use stored model field.
The change correctly retrieves the model from `eval_run.model` instead of extracting it from config. This aligns with the new data model where the model is snapshotted at evaluation creation time.
backend/app/models/evaluation.py (1)
148-157: LGTM! Well-structured config reference fields.
The new `config_id` and `config_version` fields properly establish the relationship to stored configs with appropriate constraints (`ge=1` for version). The nullable design allows backward compatibility with existing data.
backend/app/api/routes/evaluation.py (1)
478-495: LGTM! Robust config resolution with provider validation.
The config resolution flow properly validates that the stored config exists and uses the OPENAI provider. Error handling returns appropriate HTTP 400 responses with descriptive messages.
backend/app/alembic/versions/040_add_config_in_evals_run_table.py (outdated review thread, resolved)
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/models/evaluation.py (1)
148-158: Align `EvaluationRun` type hints with nullable DB columns for config fields
`config_id` and `config_version` are nullable in the schema but annotated as non-optional types. This can mislead callers and type checkers into assuming they're always present, even for legacy runs or transitional data. Consider updating the annotations to reflect nullability:
- config_id: UUID = SQLField(
+ config_id: UUID | None = SQLField(
      foreign_key="config.id",
      nullable=True,
      description="Reference to the stored config used for this evaluation",
  )
- config_version: int = SQLField(
+ config_version: int | None = SQLField(
      nullable=True,
      ge=1,
      description="Version of the config used for this evaluation",
  )
This keeps the schema the same while making runtime and type expectations clearer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/models/evaluation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/models/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/evaluation.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/evaluation.py (1)
271-273: Public model nullability now matches the schema
Making `config_id`, `config_version`, and `model` nullable in `EvaluationRunPublic` correctly reflects the DB fields and avoids validation issues for existing rows. This resolves the earlier mismatch between the table and the public model.
backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py (outdated review thread, resolved)
Force-pushed from f5b94b0 to cdb0b2e
…ssistant_id handling
…nstead of config dict
…ig_version fields
…ve_model_from_config function, and update processing logic to use config references
Force-pushed from c9cc51a to a2c8a95
Prajna1999 left a comment
lgtm
hold merge - until frontend is ready.

good to go. Can be merged
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/evaluation.py (1)
8-16: Critical: Missing `Depends` import.
The pipeline failure indicates `NameError: name 'Depends' is not defined` at line 130. The import statement for `Depends` from `fastapi` is missing.
🐛 Add missing import
  from fastapi import (
      APIRouter,
      Body,
+     Depends,
      File,
      Form,
      HTTPException,
      Query,
      UploadFile,
  )
🤖 Fix all issues with AI agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py:
- Line 20: The migration functions upgrade() and downgrade() lack return type
hints; update both function definitions (upgrade and downgrade) to include
explicit return types (e.g., change "def upgrade():" and "def downgrade():" to
"def upgrade() -> None:" and "def downgrade() -> None:") so they conform to the
project's typing guidelines.
- Around line 47-56: The downgrade currently re-adds the "config" column on the
"evaluation_run" table using op.add_column with sa.Column(..., nullable=False)
which will fail if rows exist; update that op.add_column call in the downgrade
to use nullable=True (or alternatively add a server_default or a prior data
migration to populate values before setting non-nullable), ensuring the column
is created nullable during downgrade to avoid PostgreSQL errors.
In @backend/app/api/routes/evaluation.py:
- Around line 505-509: The code references a non-existent constant
LLMProvider.OPENAI in the evaluation config validation, causing AttributeError;
update the check in evaluation.py (the block that raises HTTPException) to
compare against the actual provider string "openai" (i.e., use
config.completion.provider != "openai") or alternatively add a new constant
OPENAI = "openai" to the LLMProvider class in
backend/app/services/llm/providers/registry.py so the symbol exists and matches
tests; pick one approach and ensure the error message and tests remain
consistent with the chosen value.
In @backend/app/crud/evaluations/core.py:
- Around line 308-349: resolve_model_from_config currently declares returning
str but assigns model = config.completion.params.get("model") which may be None;
update resolve_model_from_config to validate that model is present and a str
(e.g., if not model: raise ValueError(...) with context including eval_run.id,
config_id, config_version) before returning, or coerce/choose a safe default
only if intended; reference the resolve_model_from_config function and the model
variable from config.completion.params.get("model") when implementing the check.
In @backend/app/crud/evaluations/processing.py:
- Around line 257-263: resolve_model_from_config currently uses
config.get("model") which can return None despite its str return annotation and
docstring promise; modify resolve_model_from_config to validate the retrieved
value and raise ValueError if missing (or alternatively change the function
signature to return str | None and update callers), e.g., after fetching model =
config.get("model") check if model is truthy and raise ValueError("missing model
in config") to enforce the contract so callers like
resolve_model_from_config(session=session, eval_run=eval_run) always receive a
str or an explicit None-aware type is used consistently.
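Both prompts above amount to the same guard: fail fast when a stored config has no usable `model`. A hedged sketch of that guard as a standalone helper; the function name, parameter names, and types are illustrative and not part of the existing modules:

```python
from typing import Any
from uuid import UUID


def require_model(
    params: dict[str, Any], *, run_id: UUID | int, config_id: UUID, config_version: int
) -> str:
    """Return params['model'] or fail loudly with enough context to debug."""
    model = params.get("model")
    if not isinstance(model, str) or not model:
        raise ValueError(
            f"[require_model] config {config_id} v{config_version} for "
            f"evaluation run {run_id} does not define a 'model'"
        )
    return model
```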
🧹 Nitpick comments (1)
backend/app/alembic/versions/041_add_config_in_evals_run_table.py (1)
1-60: Consider a multi-step migration strategy for safer deployment.
Given the destructive nature of this schema change (dropping the `config` column) and the PR status ("hold merge - until frontend is ready"), consider deploying this as a multi-phase migration (a minimal phase-1 sketch follows this list):
Phase 1: Add new columns without dropping old ones
- Add `config_id` and `config_version` (nullable)
- Add foreign key constraint
- Deploy application code that writes to both old and new columns
Phase 2: Backfill existing data
- Create a data migration script to populate `config_id`/`config_version` from existing `config` JSONB
- Validate data integrity
Phase 3: Cut over
- Deploy application code that only uses new columns
- Monitor for issues
Phase 4: Cleanup
- Drop the old `config` column in a subsequent migration
This approach provides:
- Zero downtime deployment
- Easy rollback at each phase
- Data preservation and validation
- Safer production deployment
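A minimal sketch of what such a phase-1 revision could look like; the revision identifiers and the foreign-key constraint name are assumptions for illustration, not the actual migration shipped in this PR:

```python
import sqlalchemy as sa
from alembic import op

# Hypothetical revision identifiers for the phase-1 migration.
revision = "041"
down_revision = "040"


def upgrade() -> None:
    # Add the new references without touching the legacy `config` column yet.
    op.add_column("evaluation_run", sa.Column("config_id", sa.Uuid(), nullable=True))
    op.add_column(
        "evaluation_run", sa.Column("config_version", sa.Integer(), nullable=True)
    )
    op.create_foreign_key(
        "fk_evaluation_run_config_id", "evaluation_run", "config", ["config_id"], ["id"]
    )


def downgrade() -> None:
    op.drop_constraint(
        "fk_evaluation_run_config_id", "evaluation_run", type_="foreignkey"
    )
    op.drop_column("evaluation_run", "config_version")
    op.drop_column("evaluation_run", "config_id")
```

Naming the constraint explicitly here also sidesteps the auto-generated-name problem flagged later in this review.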
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
backend/app/alembic/versions/041_add_config_in_evals_run_table.py
backend/app/api/routes/evaluation.py
backend/app/crud/evaluations/__init__.py
backend/app/crud/evaluations/core.py
backend/app/crud/evaluations/embeddings.py
backend/app/crud/evaluations/processing.py
backend/app/models/evaluation.py
backend/app/tests/api/routes/test_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/crud/evaluations/embeddings.py
- backend/app/models/evaluation.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/alembic/versions/041_add_config_in_evals_run_table.py
backend/app/api/routes/evaluation.py
backend/app/crud/evaluations/__init__.py
backend/app/crud/evaluations/processing.py
backend/app/crud/evaluations/core.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/alembic/versions/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Generate database migrations using
`alembic revision --autogenerate -m "Description" --rev-id <number>` where rev-id is the latest existing revision ID + 1
Files:
backend/app/alembic/versions/041_add_config_in_evals_run_table.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in `backend/app/api/` organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using `load_description("domain/action.md")`
Files:
backend/app/api/routes/evaluation.py
🧠 Learnings (2)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Applies to backend/app/alembic/versions/*.py : Generate database migrations using `alembic revision --autogenerate -m "Description" --rev-id <number>` where rev-id is the latest existing revision ID + 1
Applied to files:
backend/app/alembic/versions/041_add_config_in_evals_run_table.py
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Organize backend code in `backend/app/` following the layered architecture: Models, CRUD, Routes, Core, Services, and Celery directories
Applied to files:
backend/app/api/routes/evaluation.py
🧬 Code graph analysis (5)
backend/app/tests/api/routes/test_evaluation.py (3)
backend/app/crud/evaluations/batch.py (1)
  build_evaluation_jsonl (62-115)
backend/app/models/evaluation.py (2)
  EvaluationDataset (74-168)
  EvaluationRun (171-322)
backend/app/tests/utils/test_data.py (1)
  create_test_config (239-302)
backend/app/api/routes/evaluation.py (6)
backend/app/crud/config/version.py (1)
  ConfigVersionCrud (15-142)
backend/app/models/llm/request.py (1)
  LLMCallConfig (132-188)
backend/app/services/llm/jobs.py (1)
  resolve_config_blob (84-116)
backend/app/services/llm/providers/registry.py (1)
  LLMProvider (14-41)
backend/app/utils.py (4)
  APIResponse (33-57)
  get_langfuse_client (212-248)
  get_openai_client (179-209)
  load_description (393-398)
backend/app/crud/evaluations/core.py (1)
create_evaluation_run(18-71)
backend/app/crud/evaluations/__init__.py (1)
backend/app/crud/evaluations/core.py (1)
resolve_model_from_config(308-349)
backend/app/crud/evaluations/processing.py (2)
backend/app/crud/evaluations/core.py (2)
  update_evaluation_run (154-206)
  resolve_model_from_config (308-349)
backend/app/crud/evaluations/langfuse.py (1)
create_langfuse_dataset_run(21-164)
backend/app/crud/evaluations/core.py (3)
backend/app/crud/config/version.py (1)
  ConfigVersionCrud (15-142)
backend/app/models/llm/request.py (1)
  LLMCallConfig (132-188)
backend/app/services/llm/jobs.py (1)
resolve_config_blob(84-116)
🪛 GitHub Actions: Kaapi CI
backend/app/api/routes/evaluation.py
[error] 130-130: NameError: name 'Depends' is not defined.
🔇 Additional comments (4)
backend/app/crud/evaluations/__init__.py (1)
8-8: LGTM!
The new `resolve_model_from_config` function is correctly imported and exported for public use.
Also applies to: 43-43
backend/app/tests/api/routes/test_evaluation.py (1)
3-3: LGTM!
The test updates correctly reflect the shift from inline config dictionaries to stored config references. The use of the `create_test_config` factory function aligns with the coding guidelines for test fixtures, and the error scenarios properly test config-not-found cases.
Also applies to: 10-10, 499-545, 728-803
backend/app/api/routes/evaluation.py (1)
492-509: Verify config resolution error handling covers all failure modes.
The config resolution logic handles errors from `resolve_config_blob` and validates the provider, but ensure that (one such check is sketched after this list):
- Config version not found scenarios are properly handled
- Invalid/corrupted config blobs are caught
- The provider validation matches actual config schemas used in production
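As one concrete shape such a check could take, a hedged test sketch; the endpoint path, payload keys, and fixtures mirror the tests elsewhere in this PR, and the expected 400 follows the error handling described above, so treat all of it as assumptions rather than the project's actual test:

```python
from uuid import uuid4


def test_evaluate_with_unknown_config(client, user_api_key_header) -> None:
    """A config reference that does not exist should be rejected before batch work starts."""
    response = client.post(
        "/api/v1/evaluations",
        json={
            "experiment_name": "unknown_config_run",
            "dataset_id": 1,
            "config_id": str(uuid4()),  # no such config
            "config_version": 999,  # no such version either
        },
        headers=user_api_key_header,
    )
    assert response.status_code == 400
```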
backend/app/crud/evaluations/core.py (1)
66-69: LGTM!
The logging statement correctly follows the coding guideline format with function context and includes the new config_id and config_version fields.
  depends_on = None


  def upgrade():
Add return type hints to migration functions.
Both upgrade() and downgrade() functions are missing return type hints.
As per coding guidelines, all functions should have type hints.
📝 Proposed fix
-def upgrade():
+def upgrade() -> None:
-def downgrade():
+def downgrade() -> None:
Also applies to: 45-45
🤖 Prompt for AI Agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py at line
20, The migration functions upgrade() and downgrade() lack return type hints;
update both function definitions (upgrade and downgrade) to include explicit
return types (e.g., change "def upgrade():" and "def downgrade():" to "def
upgrade() -> None:" and "def downgrade() -> None:") so they conform to the
project's typing guidelines.
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config_id",
            sa.Uuid(),
            nullable=True,
            comment="Reference to the stored config used",
        ),
    )
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config_version",
            sa.Integer(),
            nullable=True,
            comment="Version of the config used",
        ),
    )
    op.create_foreign_key(None, "evaluation_run", "config", ["config_id"], ["id"])
    op.drop_column("evaluation_run", "config")
Critical: Data loss and foreign key constraint naming issues.
This migration has several critical problems:
- Data loss: Line 41 drops the `config` column without migrating existing data to the new `config_id`/`config_version` columns. Any existing evaluation runs will lose their configuration data permanently.
- Foreign key constraint naming: Line 40 creates a foreign key with `None` as the constraint name, causing Alembic to auto-generate a name. However, the downgrade function (Line 57) also uses `None` to drop the constraint, which won't match the auto-generated name and will fail.
Required actions:
- Add a data migration step before dropping the `config` column (a hedged backfill sketch follows the proposed fix below). You'll need to:
  - Parse each existing `config` JSONB object
  - Look up or create corresponding `config` records with appropriate versions
  - Update `config_id` and `config_version` for each evaluation_run
  - Or, if data migration isn't feasible, add a comment explaining why data loss is acceptable
- Specify an explicit constraint name instead of `None`:
🔧 Proposed fix for FK constraint naming
- op.create_foreign_key(None, "evaluation_run", "config", ["config_id"], ["id"])
+ op.create_foreign_key(
+ "fk_evaluation_run_config_id",
+ "evaluation_run",
+ "config",
+ ["config_id"],
+ ["id"]
+ )And update the downgrade:
- op.drop_constraint(None, "evaluation_run", type_="foreignkey")
+ op.drop_constraint("fk_evaluation_run_config_id", "evaluation_run", type_="foreignkey")
Committable suggestion skipped: line range outside the PR's diff.
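A hedged sketch of the backfill step from the "Required actions" list above. Table and column names follow the diff, while `find_or_create_config` is a hypothetical placeholder for the project-specific mapping from the legacy JSONB blob to a stored config row and version:

```python
import sqlalchemy as sa
from alembic import op


def find_or_create_config(bind, blob: dict) -> tuple[object, int]:
    # Placeholder: map the legacy JSONB config blob to (config_id, config_version).
    raise NotImplementedError


def backfill_config_references() -> None:
    bind = op.get_bind()
    rows = bind.execute(
        sa.text("SELECT id, config FROM evaluation_run WHERE config IS NOT NULL")
    ).fetchall()
    for run_id, config_blob in rows:
        config_id, config_version = find_or_create_config(bind, config_blob)
        bind.execute(
            sa.text(
                "UPDATE evaluation_run SET config_id = :cid, "
                "config_version = :cv WHERE id = :rid"
            ),
            {"cid": config_id, "cv": config_version, "rid": run_id},
        )
```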
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config",
            postgresql.JSONB(astext_type=sa.Text()),
            autoincrement=False,
            nullable=False,
            comment="Evaluation configuration (model, instructions, etc.)",
        ),
    )
Critical: Downgrade will fail with existing data.
The downgrade re-adds the config column with nullable=False (Line 53). If the evaluation_run table contains any records when downgrading, this operation will fail because PostgreSQL cannot add a non-nullable column to a table with existing rows without specifying a default value.
Either:
- Make the column nullable during downgrade: `nullable=True`
- Provide a server default value
- Add a data migration to populate the column before setting it non-nullable
🔧 Proposed fix (Option 1: Make nullable)
op.add_column(
"evaluation_run",
sa.Column(
"config",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
- nullable=False,
+ nullable=True,
comment="Evaluation configuration (model, instructions, etc.)",
),
)🤖 Prompt for AI Agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py around
lines 47 - 56, The downgrade currently re-adds the "config" column on the
"evaluation_run" table using op.add_column with sa.Column(..., nullable=False)
which will fail if rows exist; update that op.add_column call in the downgrade
to use nullable=True (or alternatively add a server_default or a prior data
migration to populate values before setting non-nullable), ensuring the column
is created nullable during downgrade to avoid PostgreSQL errors.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/evaluation.py (1)
527-534: Fix type incompatibility in the `start_evaluation_batch` call.
The function expects `config: dict[str, Any]`, but `config.completion.params` could be either `dict[str, Any]` (NativeCompletionConfig) or `KaapiLLMParams` (KaapiCompletionConfig). Update the function signature to accept both types: `config: dict[str, Any] | KaapiLLMParams`. Alternatively, convert `config.completion.params` to a dict before passing it if the function strictly requires a dict.
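A sketch of the widened-signature option; only the typing and the normalization step are shown, the remaining parameters are elided behind `**kwargs`, and the import path is taken from the review context rather than verified against the codebase:

```python
from typing import Any

from app.models.llm.request import KaapiLLMParams  # path assumed from the review context


def start_evaluation_batch(
    *, config: dict[str, Any] | KaapiLLMParams, **kwargs: Any
) -> None:
    # Normalize to a plain dict so the rest of the batch code can keep
    # treating the config as a mapping.
    params: dict[str, Any] = (
        config.model_dump(exclude_none=True)
        if isinstance(config, KaapiLLMParams)
        else config
    )
    ...
```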
🤖 Fix all issues with AI agents
In @backend/app/api/routes/evaluation.py:
- Around line 520-524: The attribute access is inconsistent: replace
auth_context.organization.id and auth_context.project.id with the existing
variables organization_.id and project_.id to match the rest of the file; update
the call that currently passes config_id=config_id,
config_version=config_version, organization_id=auth_context.organization.id,
project_id=auth_context.project.id to use organization_.id and project_.id
(keeping the same parameter names) so all references are consistent with the
file’s use of organization_ and project_.
- Around line 493-495: Replace direct nullable attribute access to AuthContext
with the non-null asserting properties: update usages of auth_context.project.id
to auth_context.project_.id and auth_context.organization.id to
auth_context.organization_.id (e.g., where ConfigVersionCrud is instantiated as
config_version_crud = ConfigVersionCrud(session=_session, config_id=config_id,
project_id=auth_context.project.id) change to use project_); scan the rest of
this module for any other auth_context.project or auth_context.organization
usages and switch them to project_ and organization_ to avoid potential
AttributeError when those attributes are None.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/api/routes/evaluation.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in `backend/app/api/` organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using `load_description("domain/action.md")`
Files:
backend/app/api/routes/evaluation.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluation.py
🧠 Learnings (2)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Organize backend code in `backend/app/` following the layered architecture: Models, CRUD, Routes, Core, Services, and Celery directories
Applied to files:
backend/app/api/routes/evaluation.py
📚 Learning: 2025-12-17T10:16:25.880Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 502
File: backend/app/models/collection.py:29-32
Timestamp: 2025-12-17T10:16:25.880Z
Learning: In backend/app/models/collection.py, the `provider` field indicates the LLM provider name (e.g., "openai"), while `llm_service_name` specifies which particular service from that provider is being used. These fields serve different purposes and are not redundant.
Applied to files:
backend/app/api/routes/evaluation.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/batch.py (1)
104-119: Filter out `None` values from the request body to avoid API errors.
The body construction includes fields that may be `None` (e.g., `instructions`, `temperature`, `reasoning`). Passing explicit `None` values to the OpenAI API may cause errors, as some endpoints don't accept `null` for optional fields.
Proposed fix
  batch_request = {
      "custom_id": item["id"],
      "method": "POST",
      "url": "/v1/responses",
-     "body": {
-         # Use config as-is
-         "model": config.model,
-         "instructions": config.instructions,
-         "temperature": config.temperature,
-         "reasoning": {"effort": config.reasoning} if config.reasoning else None,
-         "tools": config.knowledge_base_ids if config.knowledge_base_ids else [],
-         "input": question,  # Add input from dataset
-     },
+     "body": {
+         k: v
+         for k, v in {
+             "model": config.model,
+             "instructions": config.instructions,
+             "temperature": config.temperature,
+             "reasoning": {"effort": config.reasoning} if config.reasoning else None,
+             "tools": config.knowledge_base_ids if config.knowledge_base_ids else None,
+             "input": question,
+         }.items()
+         if v is not None
+     },
  }
🤖 Fix all issues with AI agents
In @backend/app/crud/evaluations/batch.py:
- Around line 108-116: The body currently sets "tools" to
config.knowledge_base_ids (a list of IDs) but OpenAI expects tool objects;
update the body construction in batch.py to use the mapper or construct the
required tool object: call map_kaapi_to_openai_params(config) or transform
config.knowledge_base_ids into [{"type":"file_search","vector_store_ids":
config.knowledge_base_ids, "max_num_results": config.max_num_results or 20}] and
assign that to "tools" so the payload matches the OpenAI Responses API format
(refer to map_kaapi_to_openai_params in backend/app/services/llm/mappers.py and
the body assembly where "tools" is set).
In @backend/app/crud/evaluations/core.py:
- Around line 342-348: The code directly accesses config.completion.params.model
which can raise AttributeError if completion or params are missing; update the
block that follows the config resolution (around the use of eval_run and config)
to defensively validate these nested attributes: verify config has a non-None
completion and that completion has non-None params and a model (e.g., using
explicit checks or getattr with defaults), and if any are missing raise a
ValueError that includes eval_run.id, eval_run.config_id,
eval_run.config_version and a clear message about the missing field(s); ensure
subsequent uses of model assume it is valid.
In @backend/app/services/audio/speech_to_text.py:
- Around line 610-625: The __main__ test block contains developer-specific
hardcoded file paths and should be removed or guarded; delete or replace the if
__name__ == "__main__": section (which calls
transcribe_audio_with_indic_conformer with ai4b_file_path) or wrap it behind a
non-committed debug flag/CLI using argparse or environment variables, or move
the scenario into a proper test/integration test that uses fixtures or temp
files; ensure no absolute local paths remain and any manual-run examples are
documented separately rather than committed in the module.
- Around line 53-66: In the __init__ of the speech-to-text class fix the typo
and clarify the error: replace the incorrect attribute assignment self.provide =
provider with self.provider = provider, and change the API-key missing
ValueError message ("Missing OpenAI API Key for Client STT Client
initialization") to a provider-agnostic message like "Missing API key for
speech-to-text client initialization" so it no longer references only OpenAI;
keep the rest of the provider branching (provider == "openai" / provider ==
"gemini") and the unsupported-provider ValueError as-is.
- Around line 9-13: Remove the duplicate Literal import and modernize typing:
keep only the needed typing imports (e.g., BinaryIO) and remove the second
"Literal" import from the top of speech_to_text.py, then replace all usages of
List[...] and Dict[...] in function signatures, class BaseModel fields, and
return types (e.g., any occurrences in functions like transcribe_* or classes
that reference lists/dicts) with the Python 3.9+ built-in generics list[...] and
dict[...]; ensure annotations still import BinaryIO and, if Literal is required,
import it only once (or use typing.Literal) and run a quick lint/type-check to
confirm no remaining deprecated typing usages.
- Around line 76-92: The code opens audio_file when it's a string but never
closes it; change the handling around isinstance(audio_file, str) to open the
file with a context manager or track the opened file in a local variable and
ensure it's closed in a finally block so that the file handle is always released
even on exceptions; keep using self.openai_client.audio.transcriptions.create
for transcription and return the result, and update the logger calls
(logger.info and logger.error) to prefix messages with the function/method name
(e.g., the enclosing method name) per guidelines so logs include that context.
In @backend/app/services/llm/providers/registry.py:
- Around line 16-23: The registry currently includes the OPENAI constant but
get_llm_provider only recognizes OPENAI_NATIVE, causing a runtime ValueError;
fix by either removing the OPENAI entry from the _registry dict so only
OPENAI_NATIVE maps to OpenAIProvider, or update get_llm_provider to accept the
OPENAI symbol as an alias (handle provider_type == OPENAI by mapping it to
OpenAIProvider) and ensure the registry check and the ValueError logic align;
refer to OPENAI, OPENAI_NATIVE, _registry, and get_llm_provider when making the
change.
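A minimal sketch of the simpler first option from the prompt above, written as a standalone helper; the function name `ensure_openai_provider` is illustrative and not part of the codebase:

```python
from fastapi import HTTPException


def ensure_openai_provider(provider: str) -> None:
    """Reject evaluation configs whose provider is not the plain "openai" string."""
    if provider != "openai":
        raise HTTPException(
            status_code=400,
            detail=f"Evaluation currently supports only the 'openai' provider, got {provider!r}",
        )
```

The route would then call something like `ensure_openai_provider(config.completion.provider)` instead of referencing the non-existent `LLMProvider.OPENAI` constant; the alternative is to add that constant (and a registry entry) so the symbol resolves, as the prompt describes.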
🧹 Nitpick comments (4)
backend/app/services/audio/speech_to_text.py (2)
316-323: Use the `tempfile` module instead of a hardcoded `/tmp` path.
The hardcoded `/tmp` path is not portable (e.g., Windows). Consider using `tempfile.NamedTemporaryFile` for cross-platform compatibility.
Proposed fix
+ import tempfile
  ...
- media_extention = (
+ media_extension = (
      file_data.media_type.split("/")[-1] if file_data.media_type else ".ogg"
  )
- temp_file = f"/tmp/{uuid4()}.{media_extention}"
+ temp_file = tempfile.NamedTemporaryFile(
+     suffix=f".{media_extension}", delete=False
+ ).name
52-67: Add type hints to all function parameters and return values.
Per coding guidelines, all functions should have type hints. Several functions in this file are missing return type annotations. For example:
Example fixes
-def __init__(self, provider, api_key: str | None = None):
+def __init__(self, provider: str, api_key: str | None = None) -> None:
-def transcribe_with_openai(
+def transcribe_with_openai(
     self,
     audio_file: BinaryIO | str,
     model: str = "gpt-4o-transcribe",
     prompt: str | None = None,
-):
+) -> str:
-def transcribe_audio_with_chirp_v3(audio_file_path: str):
+def transcribe_audio_with_chirp_v3(audio_file_path: str) -> str | None:
-def transcribe_audio_with_indic_conformer(audio_file_path: str):
+def transcribe_audio_with_indic_conformer(audio_file_path: str) -> str:
Apply similar changes to other functions missing return type hints.
backend/app/crud/evaluations/core.py (2)
61-69: Add function name prefix to log messages.
Per coding guidelines, log messages should be prefixed with the function name in square brackets.
Proposed fix
  except Exception as e:
      session.rollback()
-     logger.error(f"Failed to create EvaluationRun: {e}", exc_info=True)
+     logger.error(f"[create_evaluation_run] Failed to create EvaluationRun: {e}", exc_info=True)
      raise
- logger.info(
-     f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
+ logger.info(
+     f"[create_evaluation_run] Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
      f"config_id={config_id}, config_version={config_version}"
  )
308-311: Add return type hint.
The function is missing a return type annotation in the signature on line 311.
The signature already shows `-> str:`, so this appears correct. The function implementation looks good overall.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/app/crud/evaluations/batch.py
backend/app/crud/evaluations/core.py
backend/app/services/audio/speech_to_text.py
backend/app/services/llm/providers/registry.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic in services located in
backend/app/services/
Files:
backend/app/services/llm/providers/registry.py
backend/app/services/audio/speech_to_text.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/services/llm/providers/registry.py
backend/app/crud/evaluations/batch.py
backend/app/crud/evaluations/core.py
backend/app/services/audio/speech_to_text.py
🧠 Learnings (1)
📚 Learning: 2025-12-17T10:16:25.880Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 502
File: backend/app/models/collection.py:29-32
Timestamp: 2025-12-17T10:16:25.880Z
Learning: In backend/app/models/collection.py, the `provider` field indicates the LLM provider name (e.g., "openai"), while `llm_service_name` specifies which particular service from that provider is being used. These fields serve different purposes and are not redundant.
Applied to files:
backend/app/services/llm/providers/registry.py
🧬 Code graph analysis (4)
backend/app/services/llm/providers/registry.py (2)
backend/app/services/llm/providers/base.py (1)
  BaseProvider (13-63)
backend/app/services/llm/providers/openai.py (1)
OpenAIProvider(21-110)
backend/app/crud/evaluations/batch.py (1)
backend/app/models/llm/request.py (1)
KaapiLLMParams(8-41)
backend/app/crud/evaluations/core.py (5)
backend/app/crud/config/version.py (1)
  ConfigVersionCrud (15-142)
backend/app/crud/evaluations/langfuse.py (1)
  fetch_trace_scores_from_langfuse (318-557)
backend/app/models/evaluation.py (1)
  EvaluationRun (171-322)
backend/app/models/llm/request.py (1)
  LLMCallConfig (132-188)
backend/app/services/llm/jobs.py (1)
resolve_config_blob(84-116)
backend/app/services/audio/speech_to_text.py (1)
backend/app/core/providers.py (1)
ProviderConfig(18-21)
🪛 Ruff (0.14.10)
backend/app/services/audio/speech_to_text.py
13-13: typing.List is deprecated, use list instead
(UP035)
13-13: typing.Dict is deprecated, use dict instead
(UP035)
13-13: Redefinition of unused Literal from line 9
Remove definition: Literal
(F811)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (2)
backend/app/crud/evaluations/batch.py (1)
63-65: LGTM on type change.
The signature change from `dict[str, Any]` to `KaapiLLMParams` provides better type safety and aligns with the PR's goal of using stored configurations.
backend/app/crud/evaluations/core.py (1)
18-27: LGTM on signature change.
The refactor from `config: dict` to `config_id: UUID, config_version: int` properly implements the PR objective of using stored configurations instead of raw config dictionaries.
from typing import BinaryIO, Literal
from pydantic import BaseModel
from dotenv import load_dotenv
from uuid import uuid4
from typing import List, Literal, Dict
Fix duplicate import and use modern type hints.
Literal is imported twice (lines 9 and 13), and List/Dict from typing are deprecated in Python 3.9+.
Proposed fix
-from typing import BinaryIO, Literal
+from typing import BinaryIO
from pydantic import BaseModel
from dotenv import load_dotenv
from uuid import uuid4
-from typing import List, Literal, Dict
+from typing import LiteralThen replace List[...] with list[...] and Dict[...] with dict[...] throughout the file.
🧰 Tools
🪛 Ruff (0.14.10)
13-13: typing.List is deprecated, use list instead
(UP035)
13-13: typing.Dict is deprecated, use dict instead
(UP035)
13-13: Redefinition of unused Literal from line 9
Remove definition: Literal
(F811)
🤖 Prompt for AI Agents
In @backend/app/services/audio/speech_to_text.py around lines 9 - 13, Remove the
duplicate Literal import and modernize typing: keep only the needed typing
imports (e.g., BinaryIO) and remove the second "Literal" import from the top of
speech_to_text.py, then replace all usages of List[...] and Dict[...] in
function signatures, class BaseModel fields, and return types (e.g., any
occurrences in functions like transcribe_* or classes that reference
lists/dicts) with the Python 3.9+ built-in generics list[...] and dict[...];
ensure annotations still import BinaryIO and, if Literal is required, import it
only once (or use typing.Literal) and run a quick lint/type-check to confirm no
remaining deprecated typing usages.
def __init__(self, provider, api_key: str | None = None):
    if api_key is None:
        raise ValueError(
            "Missing OpenAI API Key for Client STT Client initialization"
        )
    self.provide = provider
    self.openai_client = None
    self.gemini_client = None
    if provider == "openai":
        self.openai_client = OpenAI(api_key=api_key)
    elif provider == "gemini":
        self.gemini_client = genai.Client(api_key=api_key)
    else:
        raise ValueError("This provider is not supported yet.")
Typo and misleading error message in __init__.
- Line 58: `self.provide = provider` should be `self.provider = provider`
- Lines 54-57: Error message mentions "OpenAI" but this class supports multiple providers
Proposed fix
def __init__(self, provider, api_key: str | None = None):
if api_key is None:
raise ValueError(
- "Missing OpenAI API Key for Client STT Client initialization"
+ f"Missing API Key for {provider} STT Client initialization"
)
- self.provide = provider
+ self.provider = provider🤖 Prompt for AI Agents
In @backend/app/services/audio/speech_to_text.py around lines 53 - 66, In the
__init__ of the speech-to-text class fix the typo and clarify the error: replace
the incorrect attribute assignment self.provide = provider with self.provider =
provider, and change the API-key missing ValueError message ("Missing OpenAI API
Key for Client STT Client initialization") to a provider-agnostic message like
"Missing API key for speech-to-text client initialization" so it no longer
references only OpenAI; keep the rest of the provider branching (provider ==
"openai" / provider == "gemini") and the unsupported-provider ValueError as-is.
try:
    # Handle file path vs file object
    if isinstance(audio_file, str):
        audio_file = open(audio_file, "rb")

    transcription = self.openai_client.audio.transcriptions.create(
        model=model,
        file=audio_file,
        # language="hi",
        response_format="text",
        prompt=prompt or DEFAULT_PROMPT,
    )
    logger.info(f"Successfully transcribed audio using OpenAI model: {model}")
    return transcription
except Exception as e:
    logger.error(f"OpenAI transcription failed: {str(e)}", exc_info=True)
    raise
Resource leak: file handle not closed when path is provided.
When audio_file is a string path, the file is opened at line 79 but never closed, causing a resource leak. Also, log messages should be prefixed with the function name per coding guidelines.
Proposed fix
def transcribe_with_openai(
self,
audio_file: BinaryIO | str,
model: str = "gpt-4o-transcribe",
prompt: str | None = None,
):
if self.openai_client is None:
raise ValueError("OpenAI client is not initialized.")
+ should_close = False
try:
# Handle file path vs file object
if isinstance(audio_file, str):
audio_file = open(audio_file, "rb")
+ should_close = True
transcription = self.openai_client.audio.transcriptions.create(
model=model,
file=audio_file,
# language="hi",
response_format="text",
prompt=prompt or DEFAULT_PROMPT,
)
- logger.info(f"Successfully transcribed audio using OpenAI model: {model}")
+ logger.info(f"[transcribe_with_openai] Successfully transcribed audio using OpenAI model: {model}")
return transcription
except Exception as e:
- logger.error(f"OpenAI transcription failed: {str(e)}", exc_info=True)
+ logger.error(f"[transcribe_with_openai] OpenAI transcription failed: {str(e)}", exc_info=True)
raise
+ finally:
+ if should_close and hasattr(audio_file, 'close'):
+ audio_file.close()🤖 Prompt for AI Agents
In @backend/app/services/audio/speech_to_text.py around lines 76 - 92, The code
opens audio_file when it's a string but never closes it; change the handling
around isinstance(audio_file, str) to open the file with a context manager or
track the opened file in a local variable and ensure it's closed in a finally
block so that the file handle is always released even on exceptions; keep using
self.openai_client.audio.transcriptions.create for transcription and return the
result, and update the logger calls (logger.info and logger.error) to prefix
messages with the function/method name (e.g., the enclosing method name) per
guidelines so logs include that context.
def transcribe_audio_with_indic_conformer(audio_file_path: str):
    indic_conformer_api_url = str(os.getenv("AI4B_STT_URL"))
    with open(audio_file_path, "rb") as file:
        audio_content = file.read()

    response = requests.post(
        url=indic_conformer_api_url,
        data={"language_code": "hi", "decoding_strategy": "ctc"},
        files={"audio_file": audio_content},
    )
    logger.info(response.json())
    transcription = response.json()["transcription"]
    return transcription
Missing timeout and error handling for external API call.
The requests.post call has no timeout, which can cause the application to hang indefinitely if the AI4B service is unresponsive. Additionally, there's no error handling for HTTP errors or JSON parsing failures.
Proposed fix
def transcribe_audio_with_indic_conformer(audio_file_path: str):
indic_conformer_api_url = str(os.getenv("AI4B_STT_URL"))
+ if not indic_conformer_api_url:
+ raise ValueError("AI4B_STT_URL environment variable not set")
+
with open(audio_file_path, "rb") as file:
audio_content = file.read()
response = requests.post(
url=indic_conformer_api_url,
data={"language_code": "hi", "decoding_strategy": "ctc"},
files={"audio_file": audio_content},
+ timeout=60,
)
- logger.info(response.json())
- transcription = response.json()["transcription"]
+ response.raise_for_status()
+ result = response.json()
+ logger.info(f"[transcribe_audio_with_indic_conformer] Response received: {result}")
+ transcription = result["transcription"]
return transcriptionThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/crud/evaluations/batch.py (2)
63-91: Docstring is inconsistent with the new parameter type.
The `config` parameter is now `KaapiLLMParams`, but the docstring at line 77 still describes it as "Evaluation configuration dict with OpenAI Responses API parameters." Update the docstring to reflect the typed model and its actual fields.
📝 Suggested docstring update
  Args:
      dataset_items: List of dataset items from Langfuse
-     config: Evaluation configuration dict with OpenAI Responses API parameters.
-         This config is used as-is in the body, with only "input" being added
-         from the dataset. Config can include any fields like:
-         - model (required)
-         - instructions
-         - tools
-         - reasoning
-         - text
-         - temperature
-         - include
-         etc.
+     config: KaapiLLMParams containing LLM configuration parameters including
+         model, instructions, knowledge_base_ids, reasoning, temperature,
+         and max_num_results.
  Returns:
      List of dictionaries (JSONL data)
142-148: Docstring is inconsistent with the new parameter type.
The `config` parameter docstring at line 147 still describes it as "Evaluation configuration dict" but the type is now `KaapiLLMParams`.
📝 Suggested docstring update
  Args:
      langfuse: Configured Langfuse client
      openai_client: Configured OpenAI client
      session: Database session
      eval_run: EvaluationRun database object (with run_name, dataset_name, config)
-     config: Evaluation configuration dict with llm, instructions, vector_store_ids
+     config: KaapiLLMParams containing LLM configuration parameters
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/crud/evaluations/batch.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/crud/evaluations/batch.py
🧬 Code graph analysis (1)
backend/app/crud/evaluations/batch.py (1)
backend/app/models/llm/request.py (1)
KaapiLLMParams(8-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (2)
backend/app/crud/evaluations/batch.py (2)
20-20: LGTM!
The import of `KaapiLLMParams` is appropriate and aligns with the refactoring to use typed configuration objects instead of raw dictionaries.
176-182: LGTM!
Good use of `model_dump(exclude_none=True)` to serialize the config without `None` values. This ensures clean storage of the evaluation configuration.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/tests/api/routes/test_evaluation.py (4)
510-540: Critical: Corrupted method definition causing syntax error.
This method has a duplicated/malformed signature that's causing the CI pipeline to fail with a Black parsing error. Lines 510-517 start defining the method, but then lines 518-522 appear to be remnants of an old or conflicting definition with type hints.
The method body also has inconsistencies - it references the `sample_evaluation_config` fixture in the old signature but never uses it.
🐛 Proposed fix
  def test_start_batch_evaluation_invalid_dataset_id(
-     self, client, user_api_key_header, db, user_api_key
- ):
-     """Test batch evaluation fails with invalid dataset_id."""
-     # Create a valid config to use
-     config = create_test_config(db, project_id=user_api_key.project.id)
-
-     # Try to start evaluation with non-existent dataset_id
      self,
      client: TestClient,
      user_api_key_header: dict[str, str],
-     sample_evaluation_config: dict[str, Any],
+     db: Session,
+     user_api_key: TestAuthContext,
  ) -> None:
      """Test batch evaluation fails with invalid/non-existent dataset_id."""
+     # Create a valid config to use
+     config = create_test_config(db, project_id=user_api_key.project.id)
+
      response = client.post(
          "/api/v1/evaluations",
          json={
              "experiment_name": "test_evaluation_run",
              "dataset_id": 99999,  # Non-existent
              "config_id": str(config.id),
              "config_version": 1,
          },
          headers=user_api_key_header,
      )
845-854: Inconsistent: Test still uses deprecated `config` field.
This test (`test_get_evaluation_run_without_trace_info`) creates an `EvaluationRun` with the old `config` dict field instead of the new `config_id`/`config_version` pattern used in other updated tests. Based on the AI summary and related model changes, the schema now expects `config_id` and `config_version` fields.
🔧 Suggested fix for consistency
+ # Create a config for the evaluation run
+ config = create_test_config(db, project_id=user_api_key.project.id)
+
  eval_run = EvaluationRun(
      run_name="test_simple_run",
      dataset_name=create_test_dataset.name,
      dataset_id=create_test_dataset.id,
-     config={"model": "gpt-4o"},
+     config_id=config.id,
+     config_version=1,
      status="completed",
      total_items=3,
      organization_id=user_api_key.organization_id,
      project_id=user_api_key.project_id,
  )
880-888: Inconsistent: Another test using deprecated `config` field.
Same issue as above - `test_get_evaluation_run_resync_without_trace_info_fails` still uses the old `config` dict pattern. Should be updated to use `config_id`/`config_version` for consistency with the refactored schema.
🔧 Suggested fix for consistency
+ # Create a config for the evaluation run
+ config = create_test_config(db, project_id=user_api_key.project.id)
+
  eval_run = EvaluationRun(
      run_name="test_run",
      dataset_name=create_test_dataset.name,
      dataset_id=create_test_dataset.id,
-     config={"model": "gpt-4o"},
+     config_id=config.id,
+     config_version=1,
      status="completed",
      total_items=3,
      organization_id=user_api_key.organization_id,
      project_id=user_api_key.project_id,
  )
577-654: Type mismatch: Tests pass dict but build_evaluation_jsonl expects KaapiLLMParams.
The function at backend/app/crud/evaluations/batch.py (lines 63-65) declares config: KaapiLLMParams, and its implementation (lines 110-122) accesses attributes directly: config.model, config.instructions, config.temperature, config.reasoning, config.knowledge_base_ids, config.max_num_results. Plain dicts do not support attribute access, so these tests would fail at runtime with AttributeError: 'dict' object has no attribute 'model'. Construct KaapiLLMParams instances instead:

```python
from app.models.llm import KaapiLLMParams

config = KaapiLLMParams(
    model="gpt-4o",
    temperature=0.2,
    instructions="You are a helpful assistant",
)
```
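For illustration, a minimal sketch of passing the typed params to the helper, assuming the `build_evaluation_jsonl` signature and `KaapiLLMParams` fields quoted above; the dataset item shape mirrors the existing tests, and this is not a drop-in replacement for them:

```python
# Sketch under the assumptions above; not a drop-in replacement for the tests.
from app.crud.evaluations.batch import build_evaluation_jsonl
from app.models.llm import KaapiLLMParams

dataset_items = [
    {
        "id": "item1",
        "input": {"question": "What is 2+2?"},
        "expected_output": {"answer": "4"},
        "metadata": {},
    }
]

# Typed params instead of a plain dict, so attribute access inside the helper works.
config = KaapiLLMParams(
    model="gpt-4o",
    temperature=0.2,
    instructions="You are a helpful assistant",
)

jsonl_data = build_evaluation_jsonl(dataset_items, config)
assert len(jsonl_data) == 1  # one request line per non-empty question
```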
🧹 Nitpick comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
542-572: Missing type hints on method parameters.Per coding guidelines, all function parameters and return values should have type hints. The
clientanduser_api_key_headerparameters are missing type annotations.Also, the test docstring says "model is missing from config" but the test now validates config-not-found since it uses
config_id/config_version. Consider updating the docstring to reflect the actual test behavior.♻️ Suggested improvement
def test_start_batch_evaluation_missing_model( - self, client: TestClient, user_api_key_header: dict[str, str] + self, + client: TestClient, + user_api_key_header: dict[str, str], ) -> None: - """Test batch evaluation fails when model is missing from config.""" - # We don't need a real dataset for this test - the validation should happen - # before dataset lookup. Use any dataset_id and expect config validation error - invalid_config = { - "instructions": "You are a helpful assistant", - "temperature": 0.5, - } - + """Test batch evaluation fails when config_id references non-existent config.""" response = client.post( "/api/v1/evaluations", json={ "experiment_name": "test_no_config", "dataset_id": 1, # Dummy ID, config validation happens first "config_id": str(uuid4()), # Non-existent config "config_version": 1, }, headers=user_api_key_header, )Note: The
invalid_configvariable on lines 548-551 is now dead code and should be removed.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/tests/api/routes/test_evaluation.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
🧬 Code graph analysis (1)
backend/app/tests/api/routes/test_evaluation.py (5)
backend/app/tests/crud/documents/test_doc_transformation_job.py (1)
crud(24-25)backend/app/crud/evaluations/batch.py (1)
build_evaluation_jsonl(63-126)backend/app/models/evaluation.py (2)
EvaluationDataset(74-168)EvaluationRun(171-322)backend/app/tests/utils/test_data.py (1)
create_test_config(238-301)backend/app/tests/utils/document.py (2)
project(65-66)get(118-122)
🪛 GitHub Actions: Kaapi CI
backend/app/tests/api/routes/test_evaluation.py
[error] 519-519: Black formatting failed. Cannot parse: 519:26: client: TestClient,
🔇 Additional comments (3)
backend/app/tests/api/routes/test_evaluation.py (3)
4-4: LGTM: New imports support config-based testing.The additions of
uuid4andcreate_test_configproperly support the refactored test cases that now use stored configs withconfig_id/config_versioninstead of inline config dicts.Also applies to: 12-12
742-758: LGTM: Correctly updated to use config_id/config_version.The test now properly creates a test config and uses
config_idandconfig_versionfields when constructing theEvaluationRun, aligning with the new schema.
782-804: LGTM: Test updated to use config references.Correctly creates a config and uses
config_id/config_versionfields in theEvaluationRunconstruction.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/tests/api/routes/test_evaluation.py (3)
839-863: Test uses deprecatedconfigdict field instead ofconfig_id/config_version.The
EvaluationRunmodel no longer has aconfigdict field (per the model definition in relevant snippets showingconfig_id: UUIDandconfig_version: int). This test creates anEvaluationRunwithconfig={"model": "gpt-4o"}which is the old pattern and will likely fail.def test_get_evaluation_run_without_trace_info( self, client: TestClient, user_api_key_header: dict[str, str], db: Session, user_api_key: TestAuthContext, create_test_dataset: EvaluationDataset, ) -> None: """Test getting evaluation run without requesting trace info.""" + # Create a config for the evaluation run + config = create_test_config(db, project_id=user_api_key.project.id) + eval_run = EvaluationRun( run_name="test_simple_run", dataset_name=create_test_dataset.name, dataset_id=create_test_dataset.id, - config={"model": "gpt-4o"}, + config_id=config.id, + config_version=1, status="completed", total_items=3, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, )
865-902: Same issue: Test uses deprecatedconfigdict field.This test also creates
EvaluationRunwithconfig={"model": "gpt-4o"}which needs to be updated to useconfig_idandconfig_version.def test_get_evaluation_run_resync_without_trace_info_fails( self, client: TestClient, user_api_key_header: dict[str, str], db: Session, user_api_key: TestAuthContext, create_test_dataset: EvaluationDataset, ) -> None: """Test that resync_score=true requires get_trace_info=true.""" + # Create a config for the evaluation run + config = create_test_config(db, project_id=user_api_key.project.id) + eval_run = EvaluationRun( run_name="test_run", dataset_name=create_test_dataset.name, dataset_id=create_test_dataset.id, - config={"model": "gpt-4o"}, + config_id=config.id, + config_version=1, status="completed", total_items=3, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, )
571-600: Replace dict config withKaapiLLMParamsinstantiation—test will fail at runtime.The test passes a plain dictionary to
build_evaluation_jsonl, which expects aKaapiLLMParamsobject. Since the function accesses config using attribute notation (e.g.,config.model,config.temperature), the test will fail with anAttributeErrorat runtime when trying to access attributes on a dict. UseKaapiLLMParams(model="gpt-4o", temperature=0.2, instructions="You are a helpful assistant")instead, consistent with how other tests in the codebase instantiate this model. This also aligns with the factory pattern requirement for test fixtures.
🤖 Fix all issues with AI agents
In `@backend/app/tests/api/routes/test_evaluation.py`:
- Around line 510-517: The test method
test_start_batch_evaluation_invalid_dataset_id contains a duplicated docstring —
remove the redundant docstring so the method has a single descriptive docstring
(keep one of the two identical strings and delete the other) to eliminate the
editing artifact.
🧹 Nitpick comments (4)
backend/app/crud/evaluations/core.py (2)
308-358: Return type should be str not str | None.
The function raises ValueError in all cases where the model would be missing (lines 353-357), so it can never return None. The return type annotation should be str to accurately reflect the contract.

```diff
 def resolve_model_from_config(
     session: Session,
     eval_run: EvaluationRun,
-) -> str | None:
+) -> str:
```
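A short, hypothetical caller sketch of the resulting contract (the wrapper name is illustrative): the function either returns a model name or raises ValueError, so callers need no None handling.

```python
# Illustrative call site only; assumes the narrowed `-> str` signature above.
from sqlmodel import Session

from app.crud.evaluations.core import resolve_model_from_config
from app.models import EvaluationRun


def model_for_run(session: Session, eval_run: EvaluationRun) -> str:
    # With `-> str`, no None-check is needed; a missing model surfaces as ValueError.
    return resolve_model_from_config(session, eval_run)
```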
63-69: Log messages should include function name prefix.Per coding guidelines, log messages should be prefixed with the function name in square brackets. The log statements here and in other CRUD functions (
list_evaluation_runs,get_evaluation_run_by_id,update_evaluation_run) don't follow this pattern.Example fix for this function
except Exception as e: session.rollback() - logger.error(f"Failed to create EvaluationRun: {e}", exc_info=True) + logger.error(f"[create_evaluation_run] Failed to create EvaluationRun: {e}", exc_info=True) raise - logger.info( - f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, " - f"config_id={config_id}, config_version={config_version}" - ) + logger.info( + f"[create_evaluation_run] Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, " + f"config_id={config_id}, config_version={config_version}" + )As per coding guidelines, prefix all log messages with the function name in square brackets.
backend/app/api/routes/evaluation.py (1)
501-510: Add defensive None check forconfigbefore accessing.completion.The
elifon line 506 accessesconfig.completion.providerbut only checksif error:. Ifresolve_config_blobreturns(None, None)unexpectedly, this would raiseAttributeError.Suggested fix
if error: raise HTTPException( status_code=400, detail=f"Failed to resolve config from stored config: {error}", ) - elif config.completion.provider != LLMProvider.OPENAI: + if config is None: + raise HTTPException( + status_code=400, + detail="Config resolution returned empty config", + ) + if config.completion.provider != LLMProvider.OPENAI: raise HTTPException( status_code=422, detail="Only 'openai' provider is supported for evaluation configs", )backend/app/tests/api/routes/test_evaluation.py (1)
536-565: Test name and docstring are now misleading.The test
test_start_batch_evaluation_missing_modelwith docstring "Test batch evaluation fails when model is missing from config" no longer tests that scenario. It now tests that a non-existent config_id returns an error. Consider renaming totest_start_batch_evaluation_invalid_configand updating the docstring.- def test_start_batch_evaluation_missing_model( + def test_start_batch_evaluation_invalid_config( self, client: TestClient, user_api_key_header: dict[str, str] ) -> None: - """Test batch evaluation fails when model is missing from config.""" + """Test batch evaluation fails when config_id does not exist."""
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/api/routes/evaluation.pybackend/app/crud/evaluations/batch.pybackend/app/crud/evaluations/core.pybackend/app/services/llm/providers/registry.pybackend/app/tests/api/routes/test_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/services/llm/providers/registry.py
- backend/app/crud/evaluations/batch.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluation.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluation.pybackend/app/tests/api/routes/test_evaluation.pybackend/app/crud/evaluations/core.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
🧠 Learnings (2)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Organize backend code in `backend/app/` following the layered architecture: Models, CRUD, Routes, Core, Services, and Celery directories
Applied to files:
backend/app/api/routes/evaluation.py
📚 Learning: 2025-12-17T10:16:25.880Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 502
File: backend/app/models/collection.py:29-32
Timestamp: 2025-12-17T10:16:25.880Z
Learning: In backend/app/models/collection.py, the `provider` field indicates the LLM provider name (e.g., "openai"), while `llm_service_name` specifies which particular service from that provider is being used. These fields serve different purposes and are not redundant.
Applied to files:
backend/app/api/routes/evaluation.py
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_evaluation.py (3)
backend/app/crud/evaluations/batch.py (1)
build_evaluation_jsonl(63-133)backend/app/models/evaluation.py (2)
EvaluationDataset(74-168)EvaluationRun(171-322)backend/app/tests/utils/test_data.py (1)
create_test_config(238-301)
backend/app/crud/evaluations/core.py (3)
backend/app/crud/config/version.py (1)
ConfigVersionCrud(15-142)backend/app/models/llm/request.py (1)
LLMCallConfig(132-188)backend/app/services/llm/jobs.py (1)
resolve_config_blob(84-116)
🔇 Additional comments (7)
backend/app/crud/evaluations/core.py (2)
348-357: Good defensive validation for nested config access.
The defensive checks for config.completion, config.completion.params, and model address the concerns raised in previous reviews. This ensures proper error messages when the config structure is incomplete.
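As a rough sketch of the pattern being approved (attribute names follow this comment; the error messages are assumptions, not the actual ones in core.py):

```python
# Defensive nested access: validate each level before drilling deeper,
# so an incomplete config blob fails with a specific message.
from typing import Any


def _model_from_config_blob(config: Any) -> str:
    if config is None or getattr(config, "completion", None) is None:
        raise ValueError("Config has no completion section")
    if getattr(config.completion, "params", None) is None:
        raise ValueError("Config completion section has no params")
    model = getattr(config.completion.params, "model", None)
    if not model:
        raise ValueError("Config completion params do not specify a model")
    return model
```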
18-71: LGTM: Function signature update for config management integration.The
create_evaluation_runfunction has been properly updated to acceptconfig_id: UUIDandconfig_version: intinstead of a config dict. Type hints are complete and the docstring reflects the new parameters.backend/app/api/routes/evaluation.py (3)
446-534: LGTM: Evaluation endpoint updated for config management.The
evaluatefunction properly:
- Accepts
config_id: UUIDandconfig_version: intas required body parameters- Resolves the config via
ConfigVersionCrudandresolve_config_blob- Validates the provider is OpenAI
- Passes
config.completion.paramstostart_evaluation_batch, which aligns with theKaapiLLMParamstype expected bybuild_evaluation_jsonl
512-512: Log message follows the coding guidelines.Good use of function name prefix in the log message:
[evaluate] Successfully resolved config from config management.
493-510:LLMProvider.OPENAIconstant exists and is properly implemented.The constant
LLMProvider.OPENAI = "openai"is defined atbackend/app/services/llm/providers/registry.py:16and is already integrated into the provider registry. It is also used in theget_llm_providerfunction at line 66.Additionally, the None-checking for
configat line 506 is safe. Theresolve_config_blobfunction returnstuple[ConfigBlob | None, str | None]where the contract guarantees either(config_blob, None)on success or(None, error_message)on failure. Since line 506 is only reached after theif error:check at line 501 (which raises on any error),configcannot beNoneat that point.Likely an incorrect or invalid review comment.
backend/app/tests/api/routes/test_evaluation.py (2)
510-534: LGTM: Test updated to use config management pattern.The test correctly creates a test config using
create_test_configand passesconfig_idandconfig_versionin the API payload, aligning with the new endpoint contract.
736-752: LGTM: Test properly updated for config management.The test correctly creates a test config and uses
config_idandconfig_versionwhen constructing theEvaluationRun.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```python
    def test_start_batch_evaluation_invalid_dataset_id(
        self,
        client: TestClient,
        user_api_key_header: dict[str, str],
        sample_evaluation_config: dict[str, Any],
    ) -> None:
        self, client, user_api_key_header, db, user_api_key
    ):
        """Test batch evaluation fails with invalid dataset_id."""
        # Create a valid config to use
        config = create_test_config(db, project_id=user_api_key.project.id)

        """Test batch evaluation fails with invalid/non-existent dataset_id."""
```
Duplicate docstring in test method.
Lines 513 and 517 both contain docstrings for the same test method. This appears to be an editing artifact - remove one of them.
def test_start_batch_evaluation_invalid_dataset_id(
self, client, user_api_key_header, db, user_api_key
):
- """Test batch evaluation fails with invalid dataset_id."""
# Create a valid config to use
config = create_test_config(db, project_id=user_api_key.project.id)
"""Test batch evaluation fails with invalid/non-existent dataset_id."""Should be:
def test_start_batch_evaluation_invalid_dataset_id(
self, client, user_api_key_header, db, user_api_key
):
"""Test batch evaluation fails with invalid/non-existent dataset_id."""
# Create a valid config to use
config = create_test_config(db, project_id=user_api_key.project.id)🤖 Prompt for AI Agents
In `@backend/app/tests/api/routes/test_evaluation.py` around lines 510 - 517, The
test method test_start_batch_evaluation_invalid_dataset_id contains a duplicated
docstring — remove the redundant docstring so the method has a single
descriptive docstring (keep one of the two identical strings and delete the
other) to eliminate the editing artifact.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
backend/app/tests/api/routes/test_evaluation.py (6)
551-580: Remove unusedinvalid_configvariable.The
invalid_configdict at lines 557-560 is defined but never used. The test now usesuuid4()for a non-existent config_id instead.🧹 Suggested fix
def test_start_batch_evaluation_missing_model( self, client: TestClient, user_api_key_header: dict[str, str] ) -> None: """Test batch evaluation fails when model is missing from config.""" # We don't need a real dataset for this test - the validation should happen # before dataset lookup. Use any dataset_id and expect config validation error - invalid_config = { - "instructions": "You are a helpful assistant", - "temperature": 0.5, - } response = client.post(
601-631: Tests will fail:build_evaluation_jsonlnow expectsKaapiLLMParams, not dict.All tests in
TestBatchEvaluationJSONLBuildingpass dict configs tobuild_evaluation_jsonl, but the function signature changed to acceptKaapiLLMParams. These tests will raiseAttributeErrorwhen accessingconfig.model,config.instructions, etc.🔧 Suggested fix for test_build_batch_jsonl_basic
+from app.models.llm.request import KaapiLLMParams + class TestBatchEvaluationJSONLBuilding: """Test JSONL building logic for batch evaluation.""" def test_build_batch_jsonl_basic(self) -> None: """Test basic JSONL building with minimal config.""" dataset_items = [ { "id": "item1", "input": {"question": "What is 2+2?"}, "expected_output": {"answer": "4"}, "metadata": {}, } ] - config = { - "model": "gpt-4o", - "temperature": 0.2, - "instructions": "You are a helpful assistant", - } + config = KaapiLLMParams( + model="gpt-4o", + temperature=0.2, + instructions="You are a helpful assistant", + ) jsonl_data = build_evaluation_jsonl(dataset_items, config)Similar changes are needed for all other tests in this class.
632-659: Test usestoolskey which doesn't exist inKaapiLLMParams.This test assumes a
toolsfield in the config, butKaapiLLMParamsusesknowledge_base_idsinstead. The function internally convertsknowledge_base_idsto thetoolsformat for OpenAI. Update to useKaapiLLMParamswithknowledge_base_ids.🔧 Suggested fix
def test_build_batch_jsonl_with_tools(self) -> None: """Test JSONL building with tools configuration.""" dataset_items = [ { "id": "item1", "input": {"question": "Search the docs"}, "expected_output": {"answer": "Answer from docs"}, "metadata": {}, } ] - config = { - "model": "gpt-4o-mini", - "instructions": "Search documents", - "tools": [ - { - "type": "file_search", - "vector_store_ids": ["vs_abc123"], - } - ], - } + config = KaapiLLMParams( + model="gpt-4o-mini", + instructions="Search documents", + knowledge_base_ids=["vs_abc123"], + ) jsonl_data = build_evaluation_jsonl(dataset_items, config) assert len(jsonl_data) == 1 request = jsonl_data[0] assert request["body"]["tools"][0]["type"] == "file_search" assert "vs_abc123" in request["body"]["tools"][0]["vector_store_ids"]
661-710: Update remaining tests to useKaapiLLMParams.
test_build_batch_jsonl_minimal_configandtest_build_batch_jsonl_skips_empty_questionsalso pass dict configs and need similar updates.🔧 Suggested fixes
def test_build_batch_jsonl_minimal_config(self) -> None: """Test JSONL building with minimal config (only model required).""" dataset_items = [...] - config = {"model": "gpt-4o"} # Only model provided + config = KaapiLLMParams(model="gpt-4o") jsonl_data = build_evaluation_jsonl(dataset_items, config)def test_build_batch_jsonl_skips_empty_questions(self) -> None: """Test that items with empty questions are skipped.""" dataset_items = [...] - config = {"model": "gpt-4o", "instructions": "Test"} + config = KaapiLLMParams(model="gpt-4o", instructions="Test") jsonl_data = build_evaluation_jsonl(dataset_items, config)
712-736: Updatetest_build_batch_jsonl_multiple_itemsto useKaapiLLMParams.🔧 Suggested fix
def test_build_batch_jsonl_multiple_items(self) -> None: """Test JSONL building with multiple items.""" dataset_items = [...] - config = { - "model": "gpt-4o", - "instructions": "Answer questions", - } + config = KaapiLLMParams( + model="gpt-4o", + instructions="Answer questions", + ) jsonl_data = build_evaluation_jsonl(dataset_items, config)
869-877: Update tests to use config_id and config_version instead of config dict.
The EvaluationRun model does not have a config field—it has config_id (UUID) and config_version (int). The test initializes EvaluationRun with config={"model": "gpt-4o"}, which will be rejected by the model. Update lines 869-877 and the similar test at lines 904-912 to use config_id and config_version to match the pattern in lines 766-778 and 806-827.
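For reference, the construction pattern already used by the migrated tests in this file looks roughly like this (values are illustrative; db, user_api_key, and create_test_dataset are the existing pytest fixtures):

```python
# Mirrors the config_id/config_version pattern from the migrated tests.
from app.models import EvaluationRun
from app.tests.utils.test_data import create_test_config

config = create_test_config(db, project_id=user_api_key.project.id)

eval_run = EvaluationRun(
    run_name="test_simple_run",
    dataset_name=create_test_dataset.name,
    dataset_id=create_test_dataset.id,
    config_id=config.id,       # reference to the stored config
    config_version=1,          # version of that stored config
    status="completed",
    total_items=3,
    organization_id=user_api_key.organization_id,
    project_id=user_api_key.project_id,
)
```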
♻️ Duplicate comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
525-549: Duplicate docstring in test method.Lines 528 and 532 both contain docstrings for the same test method.
🔧 Suggested fix
def test_start_batch_evaluation_invalid_dataset_id( self, client, user_api_key_header, db, user_api_key ): """Test batch evaluation fails with invalid dataset_id.""" # Create a valid config to use config = create_test_config(db, project_id=user_api_key.project.id) - """Test batch evaluation fails with invalid/non-existent dataset_id.""" response = client.post(
🧹 Nitpick comments (4)
backend/app/crud/evaluations/__init__.py (1)
40-70: Inconsistency:save_scoreandfetch_trace_scores_from_langfuseimported but not in__all__.Two functions are imported at the module level (lines 9 and 29) but omitted from
__all__:
save_score(fromcore)fetch_trace_scores_from_langfuse(fromlangfuse)If these are intended as internal-only, consider removing them from the top-level imports to avoid confusion. Otherwise, add them to
__all__for consistency.Option A: Add missing exports to `__all__`
__all__ = [ # Core "create_evaluation_run", "get_evaluation_run_by_id", "list_evaluation_runs", "resolve_model_from_config", + "save_score", # Cron "process_all_pending_evaluations", "process_all_pending_evaluations_sync", ... # Langfuse "create_langfuse_dataset_run", + "fetch_trace_scores_from_langfuse", "update_traces_with_cosine_scores", "upload_dataset_to_langfuse", ]Option B: Remove unused top-level imports
from app.crud.evaluations.core import ( create_evaluation_run, get_evaluation_run_by_id, list_evaluation_runs, resolve_model_from_config, - save_score, ) ... from app.crud.evaluations.langfuse import ( create_langfuse_dataset_run, - fetch_trace_scores_from_langfuse, update_traces_with_cosine_scores, upload_dataset_to_langfuse, )backend/app/crud/evaluations/embeddings.py (1)
365-366: Stale comment: model is hard-coded, not read from config.The comment on line 365 states "Get embedding model from config" but the code now hard-codes the value directly. Consider updating the comment to reflect the intentional hard-coding, or remove the comment entirely since the assignment is self-explanatory.
📝 Suggested comment fix
- # Get embedding model from config (default: text-embedding-3-large) + # Use fixed embedding model for similarity scoring embedding_model = "text-embedding-3-large"backend/app/crud/evaluations/batch.py (2)
74-86: Docstring references fields not present inKaapiLLMParams.The docstring mentions config fields like
tools,text, andinclude, butKaapiLLMParams(per the relevant snippet) only has:model,instructions,knowledge_base_ids,reasoning,temperature, andmax_num_results. Update the docstring to reflect the actualKaapiLLMParamsstructure.📝 Suggested docstring update
Args: dataset_items: List of dataset items from Langfuse - config: Evaluation configuration dict with OpenAI Responses API parameters. - This config is used as-is in the body, with only "input" being added - from the dataset. Config can include any fields like: - - model (required) - - instructions - - tools - - reasoning - - text - - temperature - - include - etc. + config: KaapiLLMParams containing evaluation configuration: + - model (required): Model identifier + - instructions: System instructions + - knowledge_base_ids: Vector store IDs for file_search tools + - reasoning: Reasoning effort level + - temperature: Sampling temperature + - max_num_results: Max results for file search Returns: List of dictionaries (JSONL data)
148-153: Update docstring to reflectKaapiLLMParamstype.The docstring still describes
configas "Evaluation configuration dict with llm, instructions, vector_store_ids" but the parameter is nowKaapiLLMParams.📝 Suggested fix
session: Database session eval_run: EvaluationRun database object (with run_name, dataset_name, config) - config: Evaluation configuration dict with llm, instructions, vector_store_ids + config: KaapiLLMParams containing model, instructions, and other LLM parameters
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/crud/evaluations/__init__.pybackend/app/crud/evaluations/batch.pybackend/app/crud/evaluations/embeddings.pybackend/app/crud/evaluations/processing.pybackend/app/tests/api/routes/test_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/crud/evaluations/processing.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/crud/evaluations/embeddings.pybackend/app/crud/evaluations/batch.pybackend/app/tests/api/routes/test_evaluation.pybackend/app/crud/evaluations/__init__.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
🧬 Code graph analysis (2)
backend/app/crud/evaluations/batch.py (1)
backend/app/models/llm/request.py (1)
KaapiLLMParams(8-41)
backend/app/tests/api/routes/test_evaluation.py (2)
backend/app/models/evaluation.py (2)
EvaluationDataset(74-168)EvaluationRun(171-322)backend/app/tests/utils/test_data.py (1)
create_test_config(238-301)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (4)
backend/app/crud/evaluations/__init__.py (1)
4-10: LGTM on the newresolve_model_from_configimport.The new function is properly imported from
coreand correctly added to__all__, making it part of the module's public API.backend/app/crud/evaluations/batch.py (1)
103-128: LGTM! Body building logic correctly handles None values.The conditional field inclusion properly addresses the previous review feedback about not sending
Nonevalues to the OpenAI API. Fields are only added when they have valid values.backend/app/tests/api/routes/test_evaluation.py (2)
766-778: LGTM! Test correctly usesconfig_idandconfig_version.The test properly creates a config using
create_test_configand references it viaconfig_idandconfig_versionon theEvaluationRun, aligning with the new config management pattern.
806-827: LGTM! Consistent config reference pattern.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
…aapiLLMParams for jsonl bu ild func
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/tests/api/routes/test_evaluation.py (4)
644-660: Type mismatch:build_evaluation_jsonlexpectsKaapiLLMParams, notdict.The function signature is
build_evaluation_jsonl(dataset_items: list[dict[str, Any]], config: KaapiLLMParams). This test passes a plain dict, which will cause a runtime error when the function accessesconfig.model,config.instructions, etc. as attributes.Proposed fix using KaapiLLMParams
- config = { - "model": "gpt-4o-mini", - "instructions": "Search documents", - "tools": [ - { - "type": "file_search", - "vector_store_ids": ["vs_abc123"], - } - ], - } + config = KaapiLLMParams( + model="gpt-4o-mini", + instructions="Search documents", + knowledge_base_ids=["vs_abc123"], + )Note: Based on the
build_evaluation_jsonlimplementation, tools are constructed fromknowledge_base_ids, not passed directly. You may need to adjust the test assertions accordingly.
870-882:EvaluationRunno longer has aconfigfield - useconfig_idandconfig_version.Based on the
EvaluationRunmodel, theconfigdict field has been replaced withconfig_id(UUID) andconfig_version(int). This test will fail at runtime because it's setting a non-existent field.Proposed fix
+ # Create a config for the evaluation run + config = create_test_config(db, project_id=user_api_key.project.id) + eval_run = EvaluationRun( run_name="test_simple_run", dataset_name=create_test_dataset.name, dataset_id=create_test_dataset.id, - config={"model": "gpt-4o"}, + config_id=config.id, + config_version=1, status="completed", total_items=3, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, )
905-917: Same issue:configfield doesn't exist onEvaluationRun.This test also uses the deprecated
configdict field instead ofconfig_id/config_version.Proposed fix
+ # Create a config for the evaluation run + config = create_test_config(db, project_id=user_api_key.project.id) + eval_run = EvaluationRun( run_name="test_run", dataset_name=create_test_dataset.name, dataset_id=create_test_dataset.id, - config={"model": "gpt-4o"}, + config_id=config.id, + config_version=1, status="completed", total_items=3, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, )
656-660: Test setup and assertions need updating for KaapiLLMParams structure.The test setup passes a plain dict with pre-formed
tools, butbuild_evaluation_jsonlexpects aKaapiLLMParamsobject withknowledge_base_idsfield. Update the config to useknowledge_base_idsinstead of pre-formed tools, and add assertions for themax_num_resultsfield that is automatically included in the generated tools array (defaults to 20 if not specified).Example correction for test setup:
config = KaapiLLMParams( model="gpt-4o-mini", instructions="Search documents", knowledge_base_ids=["vs_abc123"], ) jsonl_data = build_evaluation_jsonl(dataset_items, config) assert len(jsonl_data) == 1 request = jsonl_data[0] assert request["body"]["tools"][0]["type"] == "file_search" assert "vs_abc123" in request["body"]["tools"][0]["vector_store_ids"] assert request["body"]["tools"][0]["max_num_results"] == 20
♻️ Duplicate comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
526-533: Duplicate docstring in test method.Lines 529 and 533 both contain docstrings for the same test method. This appears to be an editing artifact from the refactor - remove one of them.
Proposed fix
def test_start_batch_evaluation_invalid_dataset_id( self, client, user_api_key_header, db, user_api_key ): - """Test batch evaluation fails with invalid dataset_id.""" # Create a valid config to use config = create_test_config(db, project_id=user_api_key.project.id) """Test batch evaluation fails with invalid/non-existent dataset_id."""Should become:
def test_start_batch_evaluation_invalid_dataset_id( self, client, user_api_key_header, db, user_api_key ): """Test batch evaluation fails with invalid/non-existent dataset_id.""" # Create a valid config to use config = create_test_config(db, project_id=user_api_key.project.id)
🧹 Nitpick comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
12-15: Duplicate import ofcreate_test_config.
create_test_configis imported twice - once on line 13 and implicitly available from line 15's import oftest_datamodule. Remove the redundant import.Proposed fix
from app.models import EvaluationDataset, EvaluationRun from app.models.llm.request import KaapiLLMParams -from app.tests.utils.test_data import create_test_config from app.tests.utils.auth import TestAuthContext -from app.tests.utils.test_data import create_test_evaluation_dataset +from app.tests.utils.test_data import create_test_config, create_test_evaluation_dataset
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/tests/api/routes/test_evaluation.pybackend/app/tests/crud/evaluations/test_processing.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/crud/evaluations/test_processing.pybackend/app/tests/api/routes/test_evaluation.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/crud/evaluations/test_processing.pybackend/app/tests/api/routes/test_evaluation.py
🧬 Code graph analysis (1)
backend/app/tests/api/routes/test_evaluation.py (3)
backend/app/crud/evaluations/batch.py (1)
build_evaluation_jsonl(62-132)backend/app/models/evaluation.py (2)
EvaluationDataset(74-168)EvaluationRun(171-322)backend/app/models/llm/request.py (1)
KaapiLLMParams(8-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (11)
backend/app/tests/crud/evaluations/test_processing.py (6)
279-295: LGTM! Test fixture correctly updated to use config_id and config_version.The
eval_run_with_batchfixture properly creates an evaluation run with the new config reference fields. The hardcoded UUID is acceptable for test purposes.
401-410: LGTM! Consistent config reference usage in test.The test for processing evaluation without batch_job_id correctly uses the same config_id/config_version pattern.
469-485: LGTM! Embedding batch test fixture updated correctly.The
eval_run_with_embedding_batchfixture follows the same pattern as other fixtures.
618-632: LGTM! Check and process evaluation test updated.The test correctly creates an evaluation run with config references for testing completed batch processing.
681-695: LGTM! Failed batch test fixture updated.Consistent with other test fixtures in the file.
781-795: LGTM! Poll pending evaluations test updated.The final test fixture correctly uses config_id/config_version for testing poll functionality.
backend/app/tests/api/routes/test_evaluation.py (5)
602-631: LGTM! Basic JSONL building test correctly usesKaapiLLMParams.The test properly constructs a
KaapiLLMParamsinstance and validates the output structure.
662-680: LGTM! Minimal config test usesKaapiLLMParamscorrectly.Good coverage of the minimal required configuration.
682-711: LGTM! Empty questions skip test usesKaapiLLMParamscorrectly.Proper testing of edge case handling.
713-738: LGTM! Multiple items test usesKaapiLLMParamscorrectly.Good coverage of batch processing multiple items.
758-796: LGTM! Trace info tests properly updated to use config references.Both
test_get_evaluation_run_trace_info_not_completedandtest_get_evaluation_run_trace_info_completedcorrectly create a test config and useconfig_id/config_version.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/tests/crud/evaluations/test_processing.py`:
- Around line 356-359: Remove the debug print statements that output Result and
mock call counts (the three print(...) lines referencing
mock_download.call_count, mock_fetch_dataset.call_count, and
mock_create_langfuse.call_count) from the test; the existing assertions already
verify behavior so delete those prints and keep the assertions unchanged to
avoid noisy test output.
♻️ Duplicate comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
526-533: Duplicate docstring still present.Lines 529 and 533 both contain docstrings for the same test method. This was flagged in a previous review and remains unresolved.
Proposed fix
def test_start_batch_evaluation_invalid_dataset_id( self, client, user_api_key_header, db, user_api_key ): - """Test batch evaluation fails with invalid dataset_id.""" + """Test batch evaluation fails with invalid/non-existent dataset_id.""" # Create a valid config to use config = create_test_config(db, project_id=user_api_key.project.id) - """Test batch evaluation fails with invalid/non-existent dataset_id.""" response = client.post(
🧹 Nitpick comments (3)
backend/app/tests/api/routes/test_evaluation.py (3)
558-561: Remove unused variableinvalid_config.This dictionary is defined but never used. It appears to be a leftover from when the test passed an inline config instead of referencing a stored config via
config_id.Proposed fix
def test_start_batch_evaluation_missing_model( self, client: TestClient, user_api_key_header: dict[str, str] ) -> None: """Test batch evaluation fails when model is missing from config.""" - # We don't need a real dataset for this test - the validation should happen - # before dataset lookup. Use any dataset_id and expect config validation error - invalid_config = { - "instructions": "You are a helpful assistant", - "temperature": 0.5, - } - response = client.post(
633-661: Commented test contains syntax errors.When this test is uncommented, it will fail because:
- Lines 645-654 use dict literal syntax inside
KaapiLLMParams()constructor instead of keyword argumentsKaapiLLMParamsdoesn't have atoolsparameter—it usesknowledge_base_idswhich maps tofile_searchtools internally (perbuild_evaluation_jsonl)If you plan to re-enable this test, consider:
config = KaapiLLMParams( model="gpt-4o-mini", instructions="Search documents", knowledge_base_ids=["vs_abc123"], # This becomes file_search tool )
871-879: Consider migrating remaining tests to use stored config.This test still uses the embedded
config={"model": "gpt-4o"}pattern while other tests in this PR have been updated to useconfig_id/config_version. Per the AI summary, this is pending migration work.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/tests/api/routes/test_evaluation.pybackend/app/tests/crud/evaluations/test_processing.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/crud/evaluations/test_processing.pybackend/app/tests/api/routes/test_evaluation.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/crud/evaluations/test_processing.pybackend/app/tests/api/routes/test_evaluation.py
🧬 Code graph analysis (2)
backend/app/tests/crud/evaluations/test_processing.py (2)
backend/app/tests/utils/test_data.py (1)
create_test_config(238-301)backend/app/crud/evaluations/core.py (1)
create_evaluation_run(18-71)
backend/app/tests/api/routes/test_evaluation.py (4)
backend/app/crud/evaluations/batch.py (1)
build_evaluation_jsonl(62-132)backend/app/models/evaluation.py (2)
EvaluationDataset(74-168)EvaluationRun(171-322)backend/app/models/llm/request.py (1)
KaapiLLMParams(8-41)backend/app/tests/utils/test_data.py (1)
create_test_config(238-301)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (14)
backend/app/tests/crud/evaluations/test_processing.py (7)
16-16: LGTM!The import of
create_test_configfollows the factory pattern for test fixtures as per coding guidelines.
259-296: LGTM!The fixture correctly uses the new
config_id/config_versionpattern and follows the factory approach with proper type hints.
401-430: LGTM!Test correctly updated to use the new config management pattern.
454-492: LGTM!Fixture correctly updated to use config management pattern consistent with other test fixtures.
594-657: LGTM!Test correctly updated to use the new config management pattern with proper fixture setup.
659-721: LGTM!Test correctly updated for the failed batch scenario with new config management pattern.
759-821: LGTM!Test correctly updated to use the new config management pattern for polling pending evaluations.
backend/app/tests/api/routes/test_evaluation.py (7)
4-15: LGTM!Imports are correctly added and all are utilized in the test file.
613-631: LGTM!Test correctly migrated from dict to
KaapiLLMParams. Assertions properly verify the JSONL structure.
674-681: LGTM!Minimal config test correctly uses
KaapiLLMParamswith only the requiredmodelparameter.
706-712: LGTM!Test correctly validates that items with empty or missing questions are filtered out.
726-738: LGTM!Test properly validates batch processing of multiple items with
KaapiLLMParams.
768-784: LGTM!Test correctly migrated to use stored config reference via
config_idandconfig_version. The factory pattern withcreate_test_configaligns with coding guidelines.
808-827: LGTM!Test correctly uses stored config reference and properly validates the completed evaluation response structure.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| print(f"Result: {result}") | ||
| print(f"Download called: {mock_download.call_count}") | ||
| print(f"Fetch dataset called: {mock_fetch_dataset.call_count}") | ||
| print(f"Create langfuse called: {mock_create_langfuse.call_count}") |
Remove debug print statements.
These debug print statements should be removed before merging. The assertions below already validate the expected behavior, making these statements unnecessary test artifacts that pollute output.
🧹 Suggested fix
- print(f"Result: {result}")
- print(f"Download called: {mock_download.call_count}")
- print(f"Fetch dataset called: {mock_fetch_dataset.call_count}")
- print(f"Create langfuse called: {mock_create_langfuse.call_count}")
-
assert result is not None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"Result: {result}") | |
| print(f"Download called: {mock_download.call_count}") | |
| print(f"Fetch dataset called: {mock_fetch_dataset.call_count}") | |
| print(f"Create langfuse called: {mock_create_langfuse.call_count}") | |
| assert result is not None |
🤖 Prompt for AI Agents
In `@backend/app/tests/crud/evaluations/test_processing.py` around lines 356 -
359, Remove the debug print statements that output Result and mock call counts
(the three print(...) lines referencing mock_download.call_count,
mock_fetch_dataset.call_count, and mock_create_langfuse.call_count) from the
test; the existing assertions already verify behavior so delete those prints and
keep the assertions unchanged to avoid noisy test output.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
552-581: Remove unusedinvalid_configvariable.The
invalid_configdict defined at lines 558-561 is never used - it appears to be leftover from the previous implementation that passed config as a dict. The test now correctly usesconfig_idwith a non-existent UUID.def test_start_batch_evaluation_missing_model( self, client: TestClient, user_api_key_header: dict[str, str] ) -> None: """Test batch evaluation fails when model is missing from config.""" # We don't need a real dataset for this test - the validation should happen # before dataset lookup. Use any dataset_id and expect config validation error - invalid_config = { - "instructions": "You are a helpful assistant", - "temperature": 0.5, - } response = client.post(
♻️ Duplicate comments (2)
backend/app/tests/api/routes/test_evaluation.py (1)
526-541: Duplicate docstring in test method.Lines 529 and 533 both contain docstrings. Line 533 appears to be misplaced - it should be removed as the docstring at line 529 already describes the test purpose.
def test_start_batch_evaluation_invalid_dataset_id( self, client, user_api_key_header, db, user_api_key ): """Test batch evaluation fails with invalid dataset_id.""" # Create a valid config to use config = create_test_config(db, project_id=user_api_key.project.id) - """Test batch evaluation fails with invalid/non-existent dataset_id.""" response = client.post(backend/app/tests/crud/evaluations/test_processing.py (1)
356-359: Remove debug print statements.These debug print statements should be removed before merging. The assertions below already validate the expected behavior, making these statements unnecessary test artifacts that pollute output.
result = await process_completed_evaluation( eval_run=eval_run_with_batch, session=db, openai_client=mock_openai, langfuse=mock_langfuse, ) - print(f"Result: {result}") - print(f"Download called: {mock_download.call_count}") - print(f"Fetch dataset called: {mock_fetch_dataset.call_count}") - print(f"Create langfuse called: {mock_create_langfuse.call_count}") assert result is not None
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/tests/api/routes/test_evaluation.pybackend/app/tests/crud/evaluations/test_processing.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/api/routes/test_evaluation.pybackend/app/tests/crud/evaluations/test_processing.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.pybackend/app/tests/crud/evaluations/test_processing.py
🧬 Code graph analysis (1)
backend/app/tests/crud/evaluations/test_processing.py (2)
backend/app/tests/utils/test_data.py (1)
create_test_config(238-301)backend/app/crud/evaluations/core.py (1)
create_evaluation_run(18-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (17)
backend/app/tests/api/routes/test_evaluation.py (9)
4-4: LGTM!Import of
uuid4is correctly added for generating non-existent config IDs in tests.
12-13: LGTM!Imports for
KaapiLLMParamsandcreate_test_configare properly added to support the config-based evaluation flow.
613-617: LGTM!Correctly migrated to use
KaapiLLMParamsfor typed configuration in JSONL building tests.
644-648: LGTM!Properly uses
KaapiLLMParamswithknowledge_base_idsfor testing tools configuration.
668-668: LGTM!Minimal config test correctly uses
KaapiLLMParamswith only the requiredmodelparameter.
700-700: LGTM!Test correctly uses
KaapiLLMParamsfor the empty questions skipping test.
720-723: LGTM!Multiple items test properly migrated to use
KaapiLLMParams.
762-770: LGTM!Test correctly creates a config via
create_test_configand usesconfig_id/config_versionwhen constructing theEvaluationRuninstance.
802-810: LGTM!Properly updated to use config-based approach consistent with the model changes.
backend/app/tests/crud/evaluations/test_processing.py (8)
16-16: LGTM!Import of
create_test_configis correctly added alongside existingcreate_test_evaluation_dataset.
263-263: LGTM!Correctly creates a test config with
use_kaapi_schema=Trueto match the evaluation run requirements.
280-289: LGTM!The
create_evaluation_runcall correctly usesconfig_idandconfig_versionparameters matching the updated function signature fromcore.py.
406-416: LGTM!Test correctly creates config and uses
config_id/config_versionfor the no batch job ID test case.
454-485: LGTM!Fixture properly creates a config and uses the config-based approach for the embedding batch evaluation run.
607-635: LGTM!Test correctly migrated to use config creation and config_id/config_version parameters.
670-699: LGTM!Failed evaluation test properly uses the config-based pattern.
772-800: LGTM!Poll pending evaluations test correctly creates config and uses config_id/config_version.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
…_completed_evaluation_success
Summary
This change refactors the evaluation run process to utilize a stored configuration instead of a configuration dictionary. It introduces fields for
config_id, config_version, and model in the evaluation run table, streamlining the evaluation process and improving data integrity.
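As a quick illustration of the new request shape (a sketch based on the updated tests, not an authoritative API reference; the dataset ID, config UUID, and auth header are placeholders):

```python
# Hypothetical client-side call; the JSON field names follow the updated test payloads.
import httpx

# Fill with the project's API-key header, as in the user_api_key_header test fixture.
auth_headers: dict[str, str] = {}

payload = {
    "experiment_name": "nightly_eval",
    "dataset_id": 42,  # an existing evaluation dataset
    "config_id": "11111111-2222-3333-4444-555555555555",  # UUID of a stored config
    "config_version": 1,
}

response = httpx.post(
    "http://localhost:8000/api/v1/evaluations",
    json=payload,
    headers=auth_headers,
)
response.raise_for_status()
```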
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
fastapi run --reload app/main.py or docker compose up in the repository root and test.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.