Evaluation: Refactor #503
Conversation
📝 Walkthrough
Split evaluation functionality into an …
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as Evaluations API
    participant Validator as Validators
    participant Service as Dataset Service
    participant ObjectStore as Object Store
    participant Langfuse as Langfuse
    participant DB as Database
    Client->>API: POST /evaluations/datasets/ (CSV)
    API->>Validator: validate_csv_file(file)
    Validator-->>API: csv_content (bytes)
    API->>Service: upload_dataset(csv_content, name, ...)
    Service->>Service: sanitize_dataset_name / parse_csv_items
    Service->>ObjectStore: upload CSV (optional)
    ObjectStore-->>Service: object_store_url / error
    Service->>Langfuse: upload dataset
    Langfuse-->>Service: langfuse_dataset_id
    Service->>DB: create EvaluationDataset record
    DB-->>Service: EvaluationDataset
    Service-->>API: EvaluationDataset
    API-->>Client: DatasetUploadResponse
```

```mermaid
sequenceDiagram
    participant Client
    participant API as Evaluations API
    participant Service as Evaluation Service
    participant DB as Database
    participant Langfuse as Langfuse
    participant Provider as Batch Provider
    Client->>API: POST /evaluations/ (dataset_id, config, ...)
    API->>Service: start_evaluation(dataset_id, config, ...)
    Service->>DB: fetch EvaluationDataset
    DB-->>Service: EvaluationDataset
    Service->>Service: build_evaluation_config (merge assistant defaults)
    Service->>DB: create EvaluationRun record
    DB-->>Service: EvaluationRun
    Service->>Langfuse: initialize evaluation run
    Langfuse-->>Service: langfuse_run_id
    Service->>Provider: start batch job
    Provider-->>Service: provider_batch_id
    Service->>DB: update EvaluationRun with provider ids
    Service-->>API: EvaluationRunPublic
    API-->>Client: EvaluationRunPublic
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/evaluation.py (1)
52-82: Fix event loop blocking: async endpoint calling sync I/O.

The endpoint is async def (required for await validate_csv_file), but line 72 calls the synchronous upload_dataset service, which performs blocking I/O operations (database queries, Langfuse API calls, object store uploads). This blocks FastAPI's event loop and degrades concurrency.

🔎 Recommended fixes (choose one):

Option 1 (simplest): Run the sync service in a thread pool

```diff
+from fastapi.concurrency import run_in_threadpool
+
 async def upload_dataset_endpoint(
     ...
 ) -> APIResponse[DatasetUploadResponse]:
     """Upload an evaluation dataset."""
     # Validate and read CSV file
     csv_content = await validate_csv_file(file)

     # Upload dataset using service
-    dataset = upload_dataset(
+    dataset = await run_in_threadpool(
+        upload_dataset,
         session=_session,
         csv_content=csv_content,
         dataset_name=dataset_name,
         description=description,
         duplication_factor=duplication_factor,
         organization_id=auth_context.organization.id,
         project_id=auth_context.project.id,
     )

     return APIResponse.success_response(data=_dataset_to_response(dataset))
```

Option 2: Make the service async (requires refactoring the service layer)

This would require making upload_dataset and its dependencies async, which is a larger refactor but provides better async integration.

Note: The other endpoints (evaluate, get_evaluation_run_status) are correctly defined as sync (def), so FastAPI automatically runs them in a thread pool.
🧹 Nitpick comments (5)
backend/app/services/evaluation/dataset.py (1)
131-138: Consider sanitizing exception details in the client-facing error message.

Exposing raw exception messages ({e}) could leak internal implementation details (stack traces, connection strings, etc.) to API clients.

🔎 Apply this diff to provide a generic error message:

```diff
     except Exception as e:
         logger.error(
             f"[upload_dataset] Failed to upload dataset to Langfuse | {e}",
             exc_info=True,
         )
         raise HTTPException(
-            status_code=500, detail=f"Failed to upload dataset to Langfuse: {e}"
+            status_code=500,
+            detail="Failed to upload dataset to Langfuse. Please check your Langfuse credentials and try again.",
         )
```

backend/app/services/evaluation/evaluation.py (3)
23-28: Consider more specific type hints for the config parameter.

Using dict without type parameters reduces type safety. As per coding guidelines requiring type hints, consider using dict[str, Any] for clarity.

🔎 Apply this diff:

```diff
+from typing import Any
+
 def build_evaluation_config(
     session: Session,
-    config: dict,
+    config: dict[str, Any],
     assistant_id: str | None,
     project_id: int,
-) -> dict:
+) -> dict[str, Any]:
```
101-109: Consider more specific type hints for the config parameter here as well. Same suggestion as for build_evaluation_config, for consistency.
313-319: Consider sanitizing exception details in the error message.

Similar to the dataset service, exposing raw exception messages could leak internal details.

🔎 Apply this diff:

```diff
     except Exception as e:
         logger.error(
             f"[get_evaluation_with_scores] Failed to fetch trace info | "
             f"evaluation_id={evaluation_id} | error={e}",
             exc_info=True,
         )
-        return eval_run, f"Failed to fetch trace info from Langfuse: {str(e)}"
+        return eval_run, "Failed to fetch trace info from Langfuse. Please try again later."
```

backend/app/api/routes/evaluation.py (1)
67-67: Consider enriching docstrings with parameter and behavior details. The added docstrings are present but minimal, mostly restating the function names. For route handlers that delegate to services, consider documenting:
- Key parameter constraints or validation rules
- Important response scenarios (e.g., 404 vs 422)
- Side effects or async behavior notes
Example for upload_dataset_endpoint:

```python
"""
Upload an evaluation dataset from a CSV file.

Validates CSV format and content, uploads to Langfuse and optional object store,
and persists metadata. Runs synchronously in a thread pool to avoid blocking.
"""
```

This is optional since the Swagger descriptions are loaded from external markdown files.
Also applies to: 96-96, 124-124, 156-156, 200-200, 225-225, 269-269
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- backend/app/api/routes/evaluation.py (10 hunks)
- backend/app/crud/evaluations/__init__.py (4 hunks)
- backend/app/services/evaluation/__init__.py (1 hunks)
- backend/app/services/evaluation/dataset.py (1 hunks)
- backend/app/services/evaluation/evaluation.py (1 hunks)
- backend/app/services/evaluation/validators.py (1 hunks)
- backend/app/tests/api/routes/test_evaluation.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic in services located in
backend/app/services/
Files:
- backend/app/services/evaluation/validators.py
- backend/app/services/evaluation/dataset.py
- backend/app/services/evaluation/__init__.py
- backend/app/services/evaluation/evaluation.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
- backend/app/services/evaluation/validators.py
- backend/app/crud/evaluations/__init__.py
- backend/app/services/evaluation/dataset.py
- backend/app/services/evaluation/__init__.py
- backend/app/services/evaluation/evaluation.py
- backend/app/tests/api/routes/test_evaluation.py
- backend/app/api/routes/evaluation.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in backend/app/api/ organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using load_description("domain/action.md")
Files:
backend/app/api/routes/evaluation.py
🧬 Code graph analysis (4)
backend/app/crud/evaluations/__init__.py (2)
- backend/app/crud/evaluations/core.py (1): save_score (259-295)
- backend/app/crud/evaluations/langfuse.py (1): fetch_trace_scores_from_langfuse (304-543)

backend/app/services/evaluation/__init__.py (3)
- backend/app/services/evaluation/dataset.py (1): upload_dataset (24-163)
- backend/app/services/evaluation/evaluation.py (2): build_evaluation_config (23-98), get_evaluation_with_scores (229-329)
- backend/app/services/evaluation/validators.py (2): parse_csv_items (118-174), sanitize_dataset_name (23-68)

backend/app/services/evaluation/evaluation.py (7)
- backend/app/crud/assistants.py (1): get_assistant_by_id (19-30)
- backend/app/crud/evaluations/core.py (3): create_evaluation_run (14-61), get_evaluation_run_by_id (103-141), save_score (259-295)
- backend/app/crud/evaluations/langfuse.py (1): fetch_trace_scores_from_langfuse (304-543)
- backend/app/crud/evaluations/dataset.py (1): get_dataset_by_id (107-140)
- backend/app/crud/evaluations/batch.py (1): start_evaluation_batch (118-212)
- backend/app/models/evaluation.py (1): EvaluationRun (170-319)
- backend/app/utils.py (2): get_langfuse_client (212-248), get_openai_client (179-209)

backend/app/api/routes/evaluation.py (3)
- backend/app/services/evaluation/evaluation.py (1): get_evaluation_with_scores (229-329)
- backend/app/services/evaluation/dataset.py (1): upload_dataset (24-163)
- backend/app/services/evaluation/validators.py (1): validate_csv_file (71-115)
🔇 Additional comments (29)
backend/app/tests/api/routes/test_evaluation.py (7)
56-64: LGTM! Mock paths correctly updated to reflect service layer refactoring. The patch targets are properly updated from app.api.routes.evaluation.* to app.services.evaluation.dataset.*, aligning with the new service-oriented architecture.
142-150: Mock paths consistently updated for empty rows test.
188-196: Mock paths consistently updated for default duplication test.
229-237: Mock paths consistently updated for custom duplication test.
270-278: Mock paths consistently updated for description test.
362-370: Mock paths consistently updated for boundary minimum test.
407-410: Mock paths consistently updated for Langfuse configuration failure test.
backend/app/crud/evaluations/__init__.py (3)
4-9: LGTM! Properly exports save_score from the core module. The new export aligns with the service layer's need to persist evaluation scores.
26-31: LGTM! Properly exports fetch_trace_scores_from_langfuse from the langfuse module. This enables the evaluation service to retrieve trace scores via the CRUD layer's public API.
39-70: LGTM! __all__ correctly updated with new exports. Both save_score and fetch_trace_scores_from_langfuse are properly listed in the appropriate sections.
backend/app/services/evaluation/validators.py (4)
13-20: LGTM! Well-defined security constants. Clear file upload limits with appropriate MIME type allowances, including text/plain for compatibility.
23-68: LGTM! Well-documented sanitization with comprehensive rules. The function handles edge cases properly, includes clear examples in the docstring, and follows coding guidelines with type hints.
100-115: LGTM! Thorough file size validation. Properly seeks to the end for the size check and resets the position before reading.
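As a side note, a minimal sketch of the seek/tell size-check idea described here, assuming a hypothetical size cap and helper name rather than the project's actual validate_csv_file:

```python
from fastapi import HTTPException, UploadFile

MAX_FILE_SIZE = 10 * 1024 * 1024  # assumed 10 MB cap, purely illustrative


async def check_file_size(file: UploadFile) -> None:
    # Seek to the end to learn the size without reading the file into memory...
    file.file.seek(0, 2)  # 2 == os.SEEK_END
    size = file.file.tell()
    # ...then rewind so later reads start from the beginning.
    file.file.seek(0)
    if size > MAX_FILE_SIZE:
        raise HTTPException(status_code=422, detail="File too large")
```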
118-174: LGTM! Robust CSV parsing with case-insensitive headers. Good approach to normalize headers while preserving original column names for row access. Properly filters empty rows and re-raises HTTPException without wrapping.
backend/app/services/evaluation/dataset.py (4)
24-32: LGTM! Well-typed function signature. All parameters have proper type hints, including the union type for the optional description. As per coding guidelines, type hints are correctly applied.
58-74: LGTM! Proper sanitization with informative logging. Logging follows the [function_name] prefix convention as per coding guidelines.
86-108: LGTM! Graceful degradation for object store failures. Non-fatal handling allows the workflow to continue when object storage is unavailable, which is appropriate for optional storage backends.
140-163: LGTM! Clean metadata construction and database persistence. The metadata structure is well-organized and the final logging confirms successful creation.
backend/app/services/evaluation/__init__.py (1)
1-32: LGTM! Clean public API surface definition. Well-organized exports with clear categorization. The __all__ list provides a clear contract for the evaluation services package.
backend/app/services/evaluation/evaluation.py (8)
47-86: LGTM! Robust assistant config merging logic. Properly handles the merge precedence where provided config values override assistant values, and conditionally adds tools when vector stores are available.
88-98: LGTM! Proper validation for direct config usage. Clear error message when the model is missing and no assistant_id is provided.
142-170: LGTM! Thorough dataset validation with clear error messages. Properly validates both dataset existence and Langfuse ID presence before proceeding.
179-200: LGTM! Clean client initialization and evaluation run creation. Follows the documented workflow steps clearly.
229-236: LGTM! Well-typed return signature. The tuple[EvaluationRun | None, str | None] return type clearly communicates the possible outcomes.
276-286: LGTM! Proper caching logic for trace scores. Correctly checks both completion status and existing cached scores before fetching from Langfuse.
321-329: LGTM! Clean score persistence flow. Uses the separate-session save_score function correctly to persist scores after the primary fetch operation.
219-226: The error handling behavior is correct and intentional. start_evaluation_batch properly sets status = "failed" with an error_message before re-raising the exception. The exception is then caught in start_evaluation, and session.refresh(eval_run) retrieves the updated failure state from the database. The approach is sound: error details are persisted before exception propagation, ensuring the evaluation run maintains failure context when returned to the caller.
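To make the "persist the failure, then re-raise" flow concrete, here is a schematic, self-contained sketch. EvalRun, submit_batch, and the in-memory "persistence" are stand-ins for the real SQLModel session and provider call, not the kaapi-backend implementation:

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    # Minimal stand-in for the EvaluationRun model; the real code persists
    # these fields through the DB session instead of a plain object.
    status: str = "pending"
    error_message: str | None = None


def submit_batch(run: EvalRun) -> str:
    # Assumed provider call; it fails unconditionally in this sketch.
    raise RuntimeError("provider rejected the batch")


def start_evaluation_batch(run: EvalRun) -> str:
    try:
        return submit_batch(run)
    except Exception as exc:
        # Record the failure on the run *before* propagating, so a caller that
        # catches the exception can still read status/error_message.
        run.status = "failed"
        run.error_message = str(exc)
        raise


def start_evaluation(run: EvalRun) -> EvalRun:
    try:
        start_evaluation_batch(run)
        run.status = "completed"
    except Exception:
        # In the real service this is where session.refresh(eval_run) reloads
        # the persisted failure state instead of aborting the request.
        pass
    return run


print(start_evaluation(EvalRun()).status)  # -> "failed"
```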
backend/app/api/routes/evaluation.py (2)
1-23: LGTM: Clean service-oriented refactoring. The refactoring to delegate business logic to service modules improves separation of concerns and maintainability. The route handlers are now thin wrappers focused on HTTP concerns.
294-295: Graceful degradation pattern is intentional and well-designed. Lines 294-295 correctly use APIResponse.failure_response(), which is explicitly designed to support returning data alongside errors. The failure_response() method signature includes an optional data: Optional[T] = None parameter, and the response is constructed with success=False, error=error_message, and data=data. This pattern is appropriate for cases where the evaluation run exists but additional enrichment (like trace info) fails, allowing clients to receive partial data along with error context.
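A rough sketch of the partial-success response shape described above. The field names and classmethod follow the comment's description of app.utils.APIResponse, but this is an assumed reconstruction, not the verified source:

```python
from typing import Generic, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class APIResponse(BaseModel, Generic[T]):
    # Simplified stand-in for the project's APIResponse wrapper.
    success: bool
    data: Optional[T] = None
    error: Optional[str] = None

    @classmethod
    def failure_response(cls, error: str, data: Optional[T] = None) -> "APIResponse[T]":
        # Partial-success shape: the caller gets the run data plus the error.
        return cls(success=False, error=error, data=data)


resp = APIResponse[dict].failure_response(
    error="Failed to fetch trace info from Langfuse",
    data={"id": 42, "status": "completed"},
)
print(resp.model_dump())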
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/evaluation.py (1)
51-81: Add logging to the endpoint and address the blocking I/O pattern.

Two issues found:

Missing logging: This endpoint lacks logging, despite other endpoints in the file logging operations with a [function_name] prefix. Per coding guidelines, add a log statement prefixed with [upload_dataset_endpoint].

Blocking I/O in async context: upload_dataset is a synchronous function that performs blocking I/O operations (database writes, object store uploads, Langfuse API calls). Calling it directly in an async endpoint blocks the event loop. Consider wrapping it with asyncio.to_thread() or refactoring the service function to be async.

Suggested fix for logging:

```diff
 async def upload_dataset_endpoint(
     ...
 ) -> APIResponse[DatasetUploadResponse]:
     """Upload an evaluation dataset."""
+    logger.info(
+        f"[upload_dataset_endpoint] Uploading dataset | "
+        f"name={dataset_name} | org_id={auth_context.organization.id} | "
+        f"project_id={auth_context.project.id}"
+    )
     # Validate and read CSV file
     csv_content = await validate_csv_file(file)

     # Upload dataset using service
     dataset = upload_dataset(
```
🧹 Nitpick comments (4)
backend/app/api/routes/evaluation.py (4)
33-43: Add a type hint for the dataset parameter.

The dataset parameter is missing a type hint. As per coding guidelines, all function parameters should have type hints.

🔎 Suggested fix:

```diff
-def _dataset_to_response(dataset) -> DatasetUploadResponse:
+def _dataset_to_response(dataset: "Dataset") -> DatasetUploadResponse:
```

You'll need to import the Dataset model or use the appropriate type from your models. If you want to avoid a circular import, you can use a string annotation or a TYPE_CHECKING block.
89-110: Add logging for the list operation.

This endpoint lacks logging. Per coding guidelines, consider adding a log statement prefixed with [list_datasets_endpoint] to track dataset listing operations.

🔎 Suggested addition:

```diff
 def list_datasets_endpoint(
     ...
 ) -> APIResponse[list[DatasetUploadResponse]]:
     """List evaluation datasets."""
+    logger.info(
+        f"[list_datasets_endpoint] Listing datasets | "
+        f"org_id={auth_context.organization.id} | "
+        f"project_id={auth_context.project.id} | limit={limit} | offset={offset}"
+    )
     # Enforce maximum limit
     if limit > 100:
         limit = 100
```
186-210: Add logging for evaluation start.

This endpoint lacks logging for the evaluation start operation. Starting an evaluation is a significant action that should be logged for observability and debugging purposes.

🔎 Suggested addition:

```diff
 def evaluate(
     ...
 ) -> APIResponse[EvaluationRunPublic]:
     """Start an evaluation run."""
+    logger.info(
+        f"[evaluate] Starting evaluation | dataset_id={dataset_id} | "
+        f"experiment_name={experiment_name} | assistant_id={assistant_id} | "
+        f"org_id={auth_context.organization.id} | project_id={auth_context.project.id}"
+    )
     eval_run = start_evaluation(
```
247-296: Add logging for the evaluation status fetch.

This endpoint lacks logging. Consider adding a log statement prefixed with [get_evaluation_run_status] for consistency with other endpoints like get_dataset.

🔎 Suggested addition:

```diff
 def get_evaluation_run_status(
     ...
 ) -> APIResponse[EvaluationRunPublic]:
     """Get evaluation run status with optional trace info."""
+    logger.info(
+        f"[get_evaluation_run_status] Fetching evaluation | id={evaluation_id} | "
+        f"get_trace_info={get_trace_info} | resync_score={resync_score} | "
+        f"org_id={auth_context.organization.id}"
+    )
     if resync_score and not get_trace_info:
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/api/routes/evaluation.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in backend/app/api/ organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using load_description("domain/action.md")
Files:
backend/app/api/routes/evaluation.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluation.py
🔇 Additional comments (2)
backend/app/api/routes/evaluation.py (2)
113-142: LGTM! Proper logging with the [get_dataset] prefix, complete type hints, and appropriate error handling for the 404 case.
145-178: LGTM! Proper logging with the [delete_dataset] prefix at entry and exit. Type hints are complete.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/__init__.py (1)
1-37: Add an explicit __all__ list to declare the public API.

Per PEP 8, modules should explicitly declare the names in their public API using the __all__ attribute. This module currently imports 21+ functions from six submodules without defining __all__, which means all imported names become part of the public API by default. Using __all__ allows you to explicitly specify the public interface of your packages and modules and prevents accidental usage of objects that shouldn't be used from outside the module. Define an __all__ list after the module docstring containing the functions intended as part of the public API.
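A fragmentary sketch of what such a declaration could look like, using names that appear elsewhere in this review; the exact export list is an assumption, not the module's real contents:

```python
"""CRUD helpers for evaluations (illustrative __init__.py fragment)."""

# Re-export a curated subset of the submodules' functions.
from app.crud.evaluations.core import create_evaluation_run, save_score
from app.crud.evaluations.langfuse import fetch_trace_scores_from_langfuse

# __all__ makes the intended public surface explicit; anything not listed
# here is considered internal to the package.
__all__ = [
    "create_evaluation_run",
    "fetch_trace_scores_from_langfuse",
    "save_score",
]
```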
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- backend/app/crud/evaluations/__init__.py (2 hunks)
- backend/app/services/evaluation/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/services/evaluation/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/crud/evaluations/__init__.py
🧬 Code graph analysis (1)
backend/app/crud/evaluations/__init__.py (2)
- backend/app/crud/evaluations/core.py (1): save_score (259-295)
- backend/app/crud/evaluations/langfuse.py (1): fetch_trace_scores_from_langfuse (304-543)
🔇 Additional comments (2)
backend/app/crud/evaluations/__init__.py (2)
8-8: LGTM! The import of save_score is correctly added and maintains alphabetical ordering with other imports from the core module.
28-28: LGTM! The import of fetch_trace_scores_from_langfuse is correctly added and maintains alphabetical ordering with other imports from the langfuse module.
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/api/routes/evaluation.py (3)
5-36: Critical: Missing imports causing pipeline failure.

The code references require_permission and Permission on lines 60, 99, 129, 162, 199, 232, and 262, but these are not imported. This is causing the pytest collection to fail with NameError: name 'require_permission' is not defined.

🔎 Add missing imports:

```diff
 from app.api.deps import AuthContextDep, SessionDep
+from app.api.deps import require_permission, Permission
 from app.crud.evaluations import (
```
43-53: Add a type hint for the dataset parameter.

The dataset parameter is missing a type hint, which violates the coding guideline requiring type hints on all function parameters. Based on coding guidelines, all function parameters must have type hints.

🔎 Add type hint:

```diff
-def _dataset_to_response(dataset) -> DatasetUploadResponse:
+def _dataset_to_response(dataset: EvaluationDataset) -> DatasetUploadResponse:
     """Convert a dataset model to a DatasetUploadResponse."""
```

Note: You may need to import EvaluationDataset from the appropriate module if it is not already imported.
101-122: Unify inconsistent auth_context attribute access patterns across all endpoints.

This function uses auth_context.organization_.id and auth_context.project_.id (non-optional property accessors), while other endpoints like upload_dataset_endpoint (lines 88-89) and start_evaluation (lines 221-222) use auth_context.organization.id and auth_context.project.id (optional attribute accessors). All endpoints have REQUIRE_PROJECT permission dependencies, ensuring these values are never None; however, the codebase should consistently use the non-optional property accessors (organization_ and project_) for semantic clarity and fail-fast error handling.

Update lines 114-115 and the other inconsistent endpoints to use the underscore versions for uniformity.
♻️ Duplicate comments (3)
backend/app/api/routes/evaluation.py (3)
131-155: Verify inconsistent auth_context attribute access. Same issue as in list_datasets_endpoint: this function uses auth_context.organization_.id and auth_context.project_.id (with underscores) on lines 139-140 and 146-147, while refactored endpoints use the non-underscore version. This inconsistency needs to be resolved.
164-192: Verify inconsistent auth_context attribute access. Same issue: this function uses auth_context.organization_.id and auth_context.project_.id (with underscores) on lines 172-173, while refactored endpoints use the non-underscore version.
234-255: Verify inconsistent auth_context attribute access. Same issue: this function uses auth_context.organization_.id and auth_context.project_.id (with underscores) on lines 243-244 and 250-251, while refactored endpoints use the non-underscore version.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/api/routes/evaluation.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in backend/app/api/ organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using load_description("domain/action.md")
Files:
backend/app/api/routes/evaluation.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluation.py
🧬 Code graph analysis (1)
backend/app/api/routes/evaluation.py (4)
- backend/app/services/evaluation/evaluation.py (1): get_evaluation_with_scores (229-329)
- backend/app/services/evaluation/dataset.py (1): upload_dataset (24-163)
- backend/app/services/evaluation/validators.py (1): validate_csv_file (71-115)
- backend/app/utils.py (3): APIResponse (33-57), success_response (40-43), failure_response (46-57)
🪛 GitHub Actions: Kaapi CI
backend/app/api/routes/evaluation.py
[error] 60-60: NameError: name 'require_permission' is not defined. This occurred during pytest collection while running: 'coverage run --source=app -m pytest'.
🔇 Additional comments (3)
backend/app/api/routes/evaluation.py (3)
62-92: LGTM: Clean service delegation. The refactored endpoint properly delegates CSV validation and dataset upload to the service layer, maintaining clear separation of concerns. The error handling is appropriately managed by the service functions.
201-225: LGTM: Simplified evaluation orchestration. The endpoint correctly delegates evaluation orchestration to start_evaluation, simplifying the route handler and moving business logic to the service layer as intended by this refactor.
264-312: LGTM: Proper error handling for evaluation scores. The endpoint correctly handles the tuple return from get_evaluation_with_scores and appropriately uses failure_response when an error is present. This maintains a consistent API response structure while delegating the complex Langfuse interaction to the service layer.
…ech4DevAI/kaapi-backend into enhancement/evaluation-refactor
Actionable comments posted: 0
🧹 Nitpick comments (6)
backend/app/crud/evaluations/embeddings.py (1)
83-85: Missing function name prefix in log messages.

Per coding guidelines, log messages should be prefixed with the function name in square brackets. The current format "Building embedding JSONL..." should be "[build_embedding_jsonl] Building embedding JSONL...".

♻️ Suggested logging format:

```diff
     logger.info(
-        f"Building embedding JSONL for {len(results)} items with model {embedding_model}"
+        f"[build_embedding_jsonl] Building JSONL | items={len(results)} | model={embedding_model}"
     )
```

Similar updates are needed for lines 95, 101, 106, and 127.
backend/app/services/evaluations/validators.py (1)
84-90: Consider handling the None filename edge case.

UploadFile.filename can potentially be None in some edge cases, and Path(None) would raise a TypeError.

♻️ Suggested defensive check:

```diff
     # Security validation: Check file extension
+    if not file.filename:
+        raise HTTPException(
+            status_code=422,
+            detail="File must have a filename",
+        )
     file_ext = Path(file.filename).suffix.lower()
```

backend/app/api/routes/evaluations/dataset.py (1)
94-115: Add type hints for query parameters.Per coding guidelines, all function parameters should have type hints. The
limitandoffsetparameters are missing explicit type annotations in the function signature (they have defaults but no type hints).Suggested fix
def list_datasets_endpoint( _session: SessionDep, auth_context: AuthContextDep, - limit: int = 50, - offset: int = 0, + limit: int = Query(default=50, ge=1, le=100, description="Maximum number of datasets to return"), + offset: int = Query(default=0, ge=0, description="Number of datasets to skip"), ) -> APIResponse[list[DatasetUploadResponse]]:This also adds validation constraints, making the explicit
if limit > 100check on line 102-103 unnecessary.backend/app/api/routes/evaluations/evaluation.py (2)
37-61: Add logging for the evaluate endpoint.Per coding guidelines, log messages should be prefixed with the function name. The
evaluatefunction has no logging, unlike the other endpoints in this file.Suggested addition
def evaluate( _session: SessionDep, auth_context: AuthContextDep, dataset_id: int = Body(..., description="ID of the evaluation dataset"), experiment_name: str = Body( ..., description="Name for this evaluation experiment/run" ), config: dict = Body(default_factory=dict, description="Evaluation configuration"), assistant_id: str | None = Body( None, description="Optional assistant ID to fetch configuration from" ), ) -> APIResponse[EvaluationRunPublic]: """Start an evaluation run.""" + logger.info( + f"[evaluate] Starting evaluation | dataset_id={dataset_id} | " + f"experiment_name={experiment_name} | " + f"org_id={auth_context.organization_.id} | " + f"project_id={auth_context.project_.id}" + ) eval_run = start_evaluation(
70-91: Consider adding validation constraints for pagination parameters.Similar to the dataset listing endpoint,
limitandoffsetlack validation constraints. Consider usingQuerywithge/leconstraints for consistency across endpoints.Suggested fix
def list_evaluation_runs( _session: SessionDep, auth_context: AuthContextDep, - limit: int = 50, - offset: int = 0, + limit: int = Query(default=50, ge=1, le=100, description="Maximum number of runs to return"), + offset: int = Query(default=0, ge=0, description="Number of runs to skip"), ) -> APIResponse[list[EvaluationRunPublic]]:backend/app/services/evaluations/evaluation.py (1)
23-98: Consider using a TypedDict for the config parameter and return type.

The config: dict and -> dict type hints are loose. Using a TypedDict would provide better type safety and documentation for the expected structure.

Additionally, the logging at lines 60-64 truncates instructions to 50 chars, which is good, but consider using mask_string if instructions could contain sensitive data (as per coding guidelines for sensitive values).
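To illustrate the TypedDict suggestion, here is a rough, runnable sketch. The field names (model, instructions, tools) are guesses based on the discussion above, not the actual kaapi-backend config schema:

```python
from typing import Any, NotRequired, TypedDict


class EvaluationConfig(TypedDict):
    # Required key; type checkers flag call sites that omit it.
    model: str
    # Optional keys, marked with NotRequired (Python 3.11+).
    instructions: NotRequired[str]
    tools: NotRequired[list[dict[str, Any]]]


def build_evaluation_config_typed(config: EvaluationConfig) -> EvaluationConfig:
    # A type checker can now verify key names and value types here and at callers.
    merged: EvaluationConfig = {"model": config["model"]}
    if "instructions" in config:
        merged["instructions"] = config["instructions"]
    return merged


print(build_evaluation_config_typed({"model": "gpt-4o-mini"}))
```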
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
backend/app/api/main.pybackend/app/api/routes/evaluations/__init__.pybackend/app/api/routes/evaluations/dataset.pybackend/app/api/routes/evaluations/evaluation.pybackend/app/core/batch/__init__.pybackend/app/core/batch/operations.pybackend/app/core/batch/polling.pybackend/app/crud/evaluations/batch.pybackend/app/crud/evaluations/embeddings.pybackend/app/crud/evaluations/processing.pybackend/app/crud/job/__init__.pybackend/app/crud/job/batch.pybackend/app/crud/job/job.pybackend/app/services/evaluations/__init__.pybackend/app/services/evaluations/dataset.pybackend/app/services/evaluations/evaluation.pybackend/app/services/evaluations/validators.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/crud/job/batch.pybackend/app/crud/evaluations/batch.pybackend/app/core/batch/operations.pybackend/app/core/batch/polling.pybackend/app/api/routes/evaluations/dataset.pybackend/app/services/evaluations/__init__.pybackend/app/services/evaluations/validators.pybackend/app/crud/evaluations/processing.pybackend/app/api/routes/evaluations/evaluation.pybackend/app/services/evaluations/evaluation.pybackend/app/api/main.pybackend/app/core/batch/__init__.pybackend/app/services/evaluations/dataset.pybackend/app/api/routes/evaluations/__init__.pybackend/app/crud/job/__init__.pybackend/app/crud/evaluations/embeddings.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in backend/app/api/ organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using load_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.pybackend/app/api/routes/evaluations/evaluation.pybackend/app/api/main.pybackend/app/api/routes/evaluations/__init__.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic in services located in
backend/app/services/
Files:
backend/app/services/evaluations/__init__.pybackend/app/services/evaluations/validators.pybackend/app/services/evaluations/evaluation.pybackend/app/services/evaluations/dataset.py
🧠 Learnings (1)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Applies to backend/app/crud/*.py : Use CRUD pattern for database access operations located in `backend/app/crud/`
Applied to files:
backend/app/crud/job/__init__.py
🧬 Code graph analysis (11)
backend/app/crud/job/batch.py (2)
backend/app/core/batch/operations.py (4)
download_batch_results(86-112)process_completed_batch(115-158)start_batch_job(17-83)upload_batch_results_to_object_store(161-188)backend/app/core/batch/polling.py (1)
poll_batch_status(15-53)
backend/app/crud/evaluations/batch.py (2)
backend/app/core/batch/openai.py (1)
OpenAIBatchProvider(14-254)backend/app/core/batch/operations.py (1)
start_batch_job(17-83)
backend/app/core/batch/operations.py (1)
backend/app/crud/job/job.py (2)
create_batch_job(13-56)update_batch_job(76-120)
backend/app/core/batch/polling.py (3)
backend/app/core/batch/base.py (1)
BatchProvider(7-105)backend/app/crud/job/job.py (1)
update_batch_job(76-120)backend/app/models/batch_job.py (2)
BatchJob(15-136)BatchJobUpdate(156-165)
backend/app/api/routes/evaluations/dataset.py (4)
backend/app/crud/evaluations/dataset.py (2)
get_dataset_by_id(107-140)list_datasets(175-210)backend/app/models/evaluation.py (2)
DatasetUploadResponse(25-44)EvaluationDataset(73-167)backend/app/api/permissions.py (2)
Permission(10-15)require_permission(45-70)backend/app/utils.py (2)
APIResponse(33-57)success_response(40-43)
backend/app/services/evaluations/__init__.py (3)
backend/app/services/evaluations/dataset.py (1)
upload_dataset(24-163)backend/app/services/evaluations/evaluation.py (3)
build_evaluation_config(23-98)get_evaluation_with_scores(229-329)start_evaluation(101-226)backend/app/services/evaluations/validators.py (3)
parse_csv_items(118-174)sanitize_dataset_name(23-68)validate_csv_file(71-115)
backend/app/crud/evaluations/processing.py (5)
backend/app/core/batch/openai.py (2)
OpenAIBatchProvider(14-254)download_batch_results(143-191)backend/app/core/batch/operations.py (1)
download_batch_results(86-112)backend/app/core/batch/base.py (1)
download_batch_results(55-72)backend/app/core/batch/polling.py (1)
poll_batch_status(15-53)backend/app/crud/job/job.py (1)
get_batch_job(59-73)
backend/app/core/batch/__init__.py (3)
backend/app/core/batch/openai.py (2)
OpenAIBatchProvider(14-254)download_batch_results(143-191)backend/app/core/batch/operations.py (4)
download_batch_results(86-112)process_completed_batch(115-158)start_batch_job(17-83)upload_batch_results_to_object_store(161-188)backend/app/core/batch/polling.py (1)
poll_batch_status(15-53)
backend/app/services/evaluations/dataset.py (6)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(261-278)backend/app/crud/evaluations/dataset.py (1)
create_evaluation_dataset(31-104)backend/app/crud/evaluations/langfuse.py (1)
upload_dataset_to_langfuse(224-315)backend/app/models/evaluation.py (1)
EvaluationDataset(73-167)backend/app/services/evaluations/validators.py (2)
parse_csv_items(118-174)sanitize_dataset_name(23-68)backend/app/utils.py (1)
get_langfuse_client(212-248)
backend/app/crud/job/__init__.py (1)
backend/app/crud/job/job.py (6)
create_batch_job(13-56)delete_batch_job(197-222)get_batch_job(59-73)get_batch_jobs_by_ids(123-149)get_batches_by_type(152-194)update_batch_job(76-120)
backend/app/crud/evaluations/embeddings.py (2)
backend/app/core/batch/openai.py (1)
OpenAIBatchProvider(14-254)backend/app/core/batch/operations.py (1)
start_batch_job(17-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (34)
backend/app/core/batch/operations.py (2)
11-11: LGTM: Import path updated correctly.The import path has been updated to reflect the new CRUD structure where batch job operations are now under
app.crud.job.job. This aligns with the broader refactor that consolidates batch-related CRUD operations.
17-188: Type hints and logging look good.All functions have complete type hints on parameters and return values, and log messages properly use the
[function_name]prefix format.backend/app/crud/job/__init__.py (1)
1-24: Well-structured public API aggregator.This module correctly re-exports batch job CRUD operations and provides clear guidance to users about where to import batch operations (from
app.core.batch). The separation between CRUD operations (here) and batch operations (inapp.core.batch) follows good architectural practices.backend/app/crud/evaluations/batch.py (2)
17-17: LGTM: Import consolidation improves maintainability.The consolidated import from
app.core.batchsimplifies the code and aligns with the refactor that centralizes batch operations. This makes it easier to maintain and understand the dependency structure.
23-211: Code follows all coding guidelines.All functions have complete type hints and proper log message formatting with function names in square brackets. The evaluation batch orchestration logic is well-structured and delegates appropriately to the generic batch infrastructure.
backend/app/api/main.py (1)
23-23: Router refactor is consistent.The change from
evaluationtoevaluationsaligns with the refactor that reorganizes the evaluation routes from a single file into a package structure. The router wiring correctly includes the newevaluations.router.Also applies to: 40-40
backend/app/api/routes/evaluations/__init__.py (1)
1-13: Router structure follows best practices.The router configuration properly organizes evaluation endpoints by domain with nested dataset routes under
/evaluations/datasetsand direct evaluation routes under/evaluations. This provides a clean, hierarchical API structure.Verified that endpoint implementations in
dataset.pyandevaluation.pyproperly load Swagger descriptions from external markdown files usingload_description(). All endpoint functions include complete type hints on parameters and return types, and all logger calls use the correct[function_name]prefix format as specified in the coding guidelines.backend/app/crud/job/batch.py (1)
1-22: Clean re-export layer for batch operations.This module correctly consolidates batch-related CRUD operations from the core layer, providing a convenient import surface for consumers. The
__all__list properly documents the public API.backend/app/crud/evaluations/processing.py (6)
22-39: Import consolidation aligns with the refactor.The updated imports from
app.core.batchandapp.crud.jobcorrectly centralize batch utilities, improving code organization and reducing import fragmentation.
46-177: Well-structured parsing with proper error handling.The function correctly:
- Uses type hints for parameters and return value
- Follows the
[function_name]logging prefix convention- Gracefully handles parsing errors with
continuestatements- Creates a lookup map for O(1) dataset item access
180-317: Comprehensive evaluation processing with good observability.The async function properly:
- Declares all parameter and return type hints
- Uses consistent log prefixes with org/project/eval context
- Handles embedding batch failures gracefully without failing the entire evaluation
- Updates evaluation status appropriately on errors
320-441: Robust embedding batch processing with trace updates.The function correctly handles the complete embedding workflow including Langfuse trace updates. The decision to mark as "completed" with an error message (rather than "failed") when embedding processing fails is a reasonable design choice to avoid blocking the overall evaluation.
444-643: Comprehensive batch status checking with dual-batch handling.The function properly orchestrates both embedding and main response batch processing, with appropriate status transitions and error handling throughout.
646-807: Well-organized batch polling with project-level credential isolation.The function correctly groups evaluations by project to handle per-project API credentials, with comprehensive error handling that persists failure status to the database for each failed run.
backend/app/crud/evaluations/embeddings.py (6)
18-18: Import consolidation aligns with the batch refactor.The centralized import from
app.core.batchcorrectly sources bothOpenAIBatchProviderandstart_batch_job, matching the new package structure.
32-46: Simple validation function with proper type hints.The function correctly validates embedding models against the allowed list and raises a descriptive
ValueErrorfor invalid models.
131-217: Robust embedding result parsing with proper error handling.The function correctly parses embedding pairs by index and handles various error conditions gracefully. Type hints are complete.
220-251: Correct cosine similarity implementation with edge case handling.The function properly handles zero vectors to avoid division by zero, and the numpy-based implementation is efficient.
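For reference, a generic sketch of the zero-vector-safe cosine similarity this comment describes (not the repository's implementation):

```python
import numpy as np


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two vectors, returning 0.0 when either vector has
    # zero norm so there is no division by zero.
    v1, v2 = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    norm1, norm2 = np.linalg.norm(v1), np.linalg.norm(v2)
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return float(np.dot(v1, v2) / (norm1 * norm2))


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ~0.707
print(cosine_similarity([0.0, 0.0], [1.0, 1.0]))  # 0.0
```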
254-256: Type hints and trailing comma added.The function signature update maintains good formatting practices.
333-433: Well-orchestrated embedding batch creation.The function correctly:
- Validates and falls back to default embedding model
- Uses the centralized
start_batch_jobinfrastructure- Links the embedding batch job to the evaluation run
backend/app/core/batch/polling.py (1)
15-53: Clean polling implementation with proper logging.The function correctly:
- Uses type hints for all parameters and return value
- Follows the
[function_name]logging prefix convention- Only updates the database when status actually changes (optimization)
- Propagates optional fields like
provider_output_file_idanderror_messagebackend/app/core/batch/__init__.py (1)
1-21: Well-organized public API surface for batch operations.The package init correctly consolidates all batch-related utilities into a single import surface, making it easy for consumers to access
BatchProvider,OpenAIBatchProvider, and all batch operations from one import.backend/app/services/evaluations/dataset.py (1)
24-163: Well-structured service layer for dataset upload orchestration.The function correctly:
- Uses complete type hints for all parameters and return value
- Follows the
[function_name]logging prefix convention throughout- Implements a resilient workflow that continues without object store if upload fails
- Properly sequences the steps: sanitize → parse → upload to storage → upload to Langfuse → create DB record
- Uses appropriate HTTP status codes (422 for validation, 500 for server errors)
As per coding guidelines, business logic is correctly implemented in the services layer.
backend/app/services/evaluations/validators.py (2)
23-68: Robust dataset name sanitization for Langfuse compatibility.The function correctly handles:
- Case normalization
- Space/hyphen replacement
- Special character removal
- Underscore collapse
- Empty name validation
The docstring examples clearly document expected behavior.
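A generic sketch of the sanitization rules listed above (lowercase, spaces/hyphens to underscores, strip other characters, collapse repeated underscores, reject empty results). The function name and regexes are illustrative, not the actual sanitize_dataset_name implementation:

```python
import re


def sanitize_name(name: str) -> str:
    # Normalize case and surrounding whitespace.
    cleaned = name.strip().lower()
    # Replace runs of spaces/hyphens with a single underscore.
    cleaned = re.sub(r"[\s\-]+", "_", cleaned)
    # Drop any remaining characters outside [a-z0-9_].
    cleaned = re.sub(r"[^a-z0-9_]", "", cleaned)
    # Collapse repeated underscores and trim leading/trailing ones.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    if not cleaned:
        raise ValueError("Dataset name is empty after sanitization")
    return cleaned


print(sanitize_name("My Eval - Dataset v2!"))  # my_eval_dataset_v2
```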
118-174: Robust CSV parsing with case-insensitive header matching.The function correctly:
- Uses type hints for parameters and return value
- Handles case-insensitive header matching
- Filters out rows with empty question or answer
- Provides clear error messages with column information
- Follows the logging prefix convention
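A small, self-contained sketch of the case-insensitive header handling described above; parse_qa_rows and its error handling are illustrative stand-ins, not the project's parse_csv_items:

```python
import csv
import io


def parse_qa_rows(csv_content: bytes) -> list[dict[str, str]]:
    reader = csv.DictReader(io.StringIO(csv_content.decode("utf-8")))
    # Map lowercased header names back to the original column names so rows
    # can still be accessed with whatever casing the file used.
    header_map = {name.lower().strip(): name for name in (reader.fieldnames or [])}
    if "question" not in header_map or "answer" not in header_map:
        raise ValueError(
            f"CSV must contain 'question' and 'answer' columns, got {reader.fieldnames}"
        )

    items: list[dict[str, str]] = []
    for row in reader:
        question = (row[header_map["question"]] or "").strip()
        answer = (row[header_map["answer"]] or "").strip()
        # Skip rows where either value is empty.
        if question and answer:
            items.append({"question": question, "answer": answer})
    return items


sample = b"Question,ANSWER\nWhat is 2+2?,4\n,\n"
print(parse_qa_rows(sample))  # [{'question': 'What is 2+2?', 'answer': '4'}]
```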
backend/app/api/routes/evaluations/dataset.py (4)
36-46: LGTM! Helper function is well-structured.The conversion function correctly maps
EvaluationDatasetfields toDatasetUploadResponse, with sensible defaults for missing metadata keys.
49-85: LGTM! Upload endpoint follows best practices.The endpoint correctly validates CSV content, delegates to the service layer, and uses project-level permissions. The form parameters with validation constraints (
ge=1, le=5for duplication_factor) are appropriate.
118-148: LGTM! Get endpoint is well-implemented.The endpoint correctly logs access with the function name prefix, handles the not-found case with a 404, and uses the helper function for response conversion.
151-185: LGTM! Delete endpoint handles error cases appropriately.The endpoint properly distinguishes between "not found" (404) and other errors (400) based on the message content, and logs both the attempt and success.
backend/app/services/evaluations/__init__.py (1)
1-16: LGTM! Clean package exports.The
__init__.pyproperly consolidates the public API surface for the evaluations services package, making imports cleaner for consumers (e.g.,from app.services.evaluations import upload_dataset).backend/app/api/routes/evaluations/evaluation.py (1)
100-148: LGTM! Status endpoint is well-implemented.The endpoint correctly:
- Validates the
resync_score/get_trace_infodependency- Returns appropriate 404 for missing evaluations
- Uses
failure_responsewhen there's an error but data is still availablebackend/app/services/evaluations/evaluation.py (3)
101-226: LGTM! Evaluation start flow is well-orchestrated.The function:
- Validates dataset existence and Langfuse ID requirement
- Builds config via the dedicated function
- Creates the run record before starting the batch (good for error tracking)
- Handles batch failures gracefully by refreshing and returning the run
The logging follows the
[function_name]prefix convention correctly.
283-286: Potential issue: cache check may fail ifscoreis an empty dict.The check
eval_run.score is not None and "traces" in eval_run.scorewill pass ifscoreis{}, but this edge case seems unlikely given the flow. Just flagging for awareness.Consider whether an empty
scoredict (withouttraceskey) could ever be persisted. If so, the condition correctly handles it by fetching from Langfuse.
229-329: LGTM! Score fetching with caching is well-designed.The function properly:
- Returns early for non-completed evaluations with a helpful message
- Uses cached scores when available (unless
resync_scoreis true)- Handles both
ValueErrorand general exceptions distinctly- Captures necessary data before external calls to avoid session issues
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/crud/evaluations/embeddings.py (4)
49-128: Apply the log prefix guideline to all log messages in this function.Line 84 correctly follows the coding guideline by prefixing the log message with
[build_embedding_jsonl]. However, the log messages at lines 95, 101, 106, and 127 are missing the function name prefix.As per coding guidelines: "Prefix all log messages with the function name in square brackets"
♻️ Apply consistent log prefixes
- logger.warning("Skipping result with no item_id") + logger.warning("[build_embedding_jsonl] Skipping result with no item_id") - logger.warning(f"Skipping item {item_id} - no trace_id found") + logger.warning(f"[build_embedding_jsonl] Skipping item {item_id} - no trace_id found") - logger.warning(f"Skipping item {item_id} - empty output or ground_truth") + logger.warning(f"[build_embedding_jsonl] Skipping item {item_id} - empty output or ground_truth") - logger.info(f"Built {len(jsonl_data)} embedding JSONL lines") + logger.info(f"[build_embedding_jsonl] Built {len(jsonl_data)} embedding JSONL lines")Based on coding guidelines.
131-217: Apply the log prefix guideline to all log messages in this function.All log messages in this function are missing the required function name prefix. As per coding guidelines: "Prefix all log messages with the function name in square brackets"
♻️ Apply consistent log prefixes
- logger.info(f"Parsing embedding results from {len(raw_results)} lines") + logger.info(f"[parse_embedding_results] Parsing embedding results from {len(raw_results)} lines") - logger.warning(f"Line {line_num}: No custom_id found, skipping") + logger.warning(f"[parse_embedding_results] Line {line_num}: No custom_id found, skipping") - logger.error(f"Trace {trace_id} had error: {error_msg}") + logger.error(f"[parse_embedding_results] Trace {trace_id} had error: {error_msg}") - logger.warning( - f"Trace {trace_id}: Expected 2 embeddings, got {len(embedding_data)}" - ) + logger.warning( + f"[parse_embedding_results] Trace {trace_id}: Expected 2 embeddings, got {len(embedding_data)}" + ) - logger.warning( - f"Trace {trace_id}: Missing embeddings (output={output_embedding is not None}, " - f"ground_truth={ground_truth_embedding is not None})" - ) + logger.warning( + f"[parse_embedding_results] Trace {trace_id}: Missing embeddings (output={output_embedding is not None}, " + f"ground_truth={ground_truth_embedding is not None})" + ) - logger.error(f"Line {line_num}: Unexpected error: {e}", exc_info=True) + logger.error(f"[parse_embedding_results] Line {line_num}: Unexpected error: {e}", exc_info=True) - logger.info( - f"Parsed {len(embedding_pairs)} embedding pairs from {len(raw_results)} lines" - ) + logger.info( + f"[parse_embedding_results] Parsed {len(embedding_pairs)} embedding pairs from {len(raw_results)} lines" + )Based on coding guidelines.
254-330: Apply the log prefix guideline to all log messages in this function.All log messages in this function are missing the required function name prefix. As per coding guidelines: "Prefix all log messages with the function name in square brackets"
♻️ Apply consistent log prefixes
- logger.info(f"Calculating similarity for {len(embedding_pairs)} pairs") + logger.info(f"[calculate_average_similarity] Calculating similarity for {len(embedding_pairs)} pairs") - logger.error( - f"Error calculating similarity for trace {pair.get('trace_id')}: {e}" - ) + logger.error( + f"[calculate_average_similarity] Error calculating similarity for trace {pair.get('trace_id')}: {e}" + ) - logger.warning("No valid similarities calculated") + logger.warning("[calculate_average_similarity] No valid similarities calculated") - logger.info( - f"Calculated similarity stats: avg={stats['cosine_similarity_avg']:.3f}, " - f"std={stats['cosine_similarity_std']:.3f}" - ) + logger.info( + f"[calculate_average_similarity] Calculated similarity stats: avg={stats['cosine_similarity_avg']:.3f}, " + f"std={stats['cosine_similarity_std']:.3f}" + )Based on coding guidelines.
333-433: Apply the log prefix guideline to all log messages in this function.All log messages in this function are missing the required function name prefix. As per coding guidelines: "Prefix all log messages with the function name in square brackets"
♻️ Apply consistent log prefixes
- logger.info(f"Starting embedding batch for evaluation run {eval_run.id}") + logger.info(f"[start_embedding_batch] Starting embedding batch for evaluation run {eval_run.id}") - logger.warning( - f"Invalid embedding model '{embedding_model}' in config: {e}. " - f"Falling back to text-embedding-3-large" - ) + logger.warning( + f"[start_embedding_batch] Invalid embedding model '{embedding_model}' in config: {e}. " + f"Falling back to text-embedding-3-large" + ) - logger.info( - f"Successfully started embedding batch: batch_job_id={batch_job.id}, " - f"provider_batch_id={batch_job.provider_batch_id} " - f"for evaluation run {eval_run.id} with {batch_job.total_items} items" - ) + logger.info( + f"[start_embedding_batch] Successfully started embedding batch: batch_job_id={batch_job.id}, " + f"provider_batch_id={batch_job.provider_batch_id} " + f"for evaluation run {eval_run.id} with {batch_job.total_items} items" + ) - logger.error(f"Failed to start embedding batch: {e}", exc_info=True) + logger.error(f"[start_embedding_batch] Failed to start embedding batch: {e}", exc_info=True)Based on coding guidelines.
🧹 Nitpick comments (4)
backend/app/api/routes/evaluations/dataset.py (4)
55-85: Add structured logging for the upload operation.The
get_datasetanddelete_datasetendpoints include logging, but this upload endpoint does not. Consider adding logging after the upload completes to maintain consistency and aid in debugging.📝 Suggested logging addition
Add logging after line 83:
dataset = upload_dataset( session=_session, csv_content=csv_content, dataset_name=dataset_name, description=description, duplication_factor=duplication_factor, organization_id=auth_context.organization_.id, project_id=auth_context.project_.id, ) logger.info( f"[upload_dataset_endpoint] Successfully uploaded dataset | " f"id={dataset.id} | name={dataset_name} | " f"org_id={auth_context.organization_.id} | " f"project_id={auth_context.project_.id}" ) return APIResponse.success_response(data=_dataset_to_response(dataset))Based on coding guidelines requiring log message prefixes with function names.
94-117: Add structured logging for consistency.Like the upload endpoint, this lacks logging while
get_datasetanddelete_datasetinclude it. Adding a log statement would improve observability for list operations.📝 Suggested logging addition
Add logging after line 113:
datasets = list_datasets( session=_session, organization_id=auth_context.organization_.id, project_id=auth_context.project_.id, limit=limit, offset=offset, ) logger.info( f"[list_datasets_endpoint] Retrieved datasets | " f"count={len(datasets)} | limit={limit} | offset={offset} | " f"org_id={auth_context.organization_.id} | " f"project_id={auth_context.project_.id}" ) return APIResponse.success_response( data=[_dataset_to_response(dataset) for dataset in datasets] )Based on coding guidelines requiring log message prefixes with function names.
104-105: Remove redundant limit check.The
Queryparameter on line 98 already enforcesle=100, making this manual check unnecessary.♻️ Proposed simplification
offset: int = Query(default=0, ge=0, description="Number of datasets to skip"), ) -> APIResponse[list[DatasetUploadResponse]]: """List evaluation datasets.""" - # Enforce maximum limit - if limit > 100: - limit = 100 - datasets = list_datasets( session=_session,
178-182: Consider more robust error handling.The string matching on line 179 (
if "not found" in message.lower()) is fragile. If the CRUD layer changes its error message format, this logic could break.Consider one of these approaches:
- Have
delete_dataset_crudraise specific exception types (e.g.,DatasetNotFoundError) instead of returning success/message tuples- Return an enum or error code alongside the message for more reliable classification
- Document the expected message formats in the CRUD layer
For now, the current implementation works, but a future refactor would improve maintainability.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/api/routes/evaluations/dataset.pybackend/app/crud/evaluations/embeddings.pybackend/app/services/evaluations/validators.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/services/evaluations/validators.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluations/dataset.pybackend/app/crud/evaluations/embeddings.py
🧬 Code graph analysis (2)
backend/app/api/routes/evaluations/dataset.py (6)
backend/app/crud/evaluations/dataset.py (2)
get_dataset_by_id(107-140)list_datasets(175-210)backend/app/models/evaluation.py (2)
DatasetUploadResponse(25-44)EvaluationDataset(73-167)backend/app/api/permissions.py (2)
Permission(10-15)require_permission(45-70)backend/app/services/evaluations/dataset.py (1)
upload_dataset(24-163)backend/app/services/evaluations/validators.py (1)
validate_csv_file(71-120)backend/app/utils.py (2)
APIResponse(33-57)success_response(40-43)
backend/app/crud/evaluations/embeddings.py (2)
backend/app/core/batch/openai.py (1)
OpenAIBatchProvider(14-254)backend/app/core/batch/operations.py (1)
start_batch_job(17-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (3)
backend/app/api/routes/evaluations/dataset.py (2)
36-46: LGTM! Clean conversion helper.The function properly converts the database model to the response schema with safe dictionary access using defaults.
120-150: LGTM! Well-structured endpoint.Proper logging, error handling, and type hints. The 404 response for missing datasets is appropriate.
backend/app/crud/evaluations/embeddings.py (1)
18-18: The centralized batch imports are properly re-exported. The refactoring correctly imports OpenAIBatchProvider and start_batch_job from app.core.batch, which properly re-exports both symbols from their respective implementation files (openai.py and operations.py) and includes them in the module's __all__ declaration.
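For reference, a minimal sketch of what such a re-exporting __init__.py could look like; the real app/core/batch/__init__.py may export more names than shown here:

# Hypothetical app/core/batch/__init__.py: re-export the two symbols verified above.
from app.core.batch.openai import OpenAIBatchProvider
from app.core.batch.operations import start_batch_job

__all__ = ["OpenAIBatchProvider", "start_batch_job"]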
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @backend/app/api/routes/evaluations/dataset.py:
- Around line 94-113: The endpoint list_datasets_endpoint lacks structured
logging; add a log call prefixed with "[list_datasets_endpoint]" before/after
calling list_datasets that records relevant fields (organization id, project id,
limit, offset, and number of datasets returned) using the same logger used
elsewhere (e.g., processLogger or module logger); place the log near the call to
list_datasets and ensure it executes whether the call succeeds or fails (include
error logging on exception), leaving the existing return via
APIResponse.success_response and conversion via _dataset_to_response unchanged.
- Around line 55-85: Add structured logging to upload_dataset_endpoint: log an
info entry prefixed with "[upload_dataset_endpoint]" before calling
upload_dataset and another on success (including dataset.id/name) and log an
error with context on exceptions. Use the existing logger used by
get_dataset/delete_dataset (match their format), include key fields like
auth_context.organization_.id, auth_context.project_.id, dataset_name,
duplication_factor, and any validation errors from validate_csv_file; ensure
logs reference function name in square brackets and do not leak sensitive data.
- Line 72: Add structured info-level logging to the upload_dataset_endpoint:
insert a logger.info call prefixed with "[upload_dataset_endpoint]" (matching
get_dataset/delete_dataset) to log the upload operation after validating the
file (near the call to validate_csv_file and variable csv_content). Use the same
message pattern as other endpoints (function name in square brackets and a
concise description of the action) so logs are consistent with the existing
logger usage.
🧹 Nitpick comments (1)
backend/app/api/routes/evaluations/dataset.py (1)
174-178: Consider more robust error detection.The error type detection relies on string matching (
"not found" in message.lower()), which is fragile. If the CRUD layer changes its error message format, the HTTP status code mapping will break.💡 More robust alternatives
Consider one of these approaches:
Option 1: Have
delete_dataset_crudraise specific exceptions instead of returning(bool, str):# In CRUD layer raise DatasetNotFoundError(f"Dataset {dataset_id} not found") raise DatasetDeleteError(f"Failed to delete: {reason}") # In API layer try: delete_dataset_crud(...) except DatasetNotFoundError as e: raise HTTPException(status_code=404, detail=str(e)) except DatasetDeleteError as e: raise HTTPException(status_code=400, detail=str(e))Option 2: Return an error code enum alongside the message:
# CRUD returns: (success: bool, message: str, error_code: ErrorCode | None) success, message, error_code = delete_dataset_crud(...) if not success: status = 404 if error_code == ErrorCode.NOT_FOUND else 400 raise HTTPException(status_code=status, detail=message)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/api/routes/evaluations/dataset.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluations/dataset.py
🧬 Code graph analysis (1)
backend/app/api/routes/evaluations/dataset.py (7)
backend/app/crud/evaluations/dataset.py (2)
get_dataset_by_id(107-140)list_datasets(175-210)backend/app/models/evaluation.py (2)
DatasetUploadResponse(25-44)EvaluationDataset(73-167)backend/app/api/permissions.py (2)
Permission(10-15)require_permission(45-70)backend/app/services/evaluations/dataset.py (1)
upload_dataset(24-163)backend/app/services/evaluations/validators.py (1)
validate_csv_file(71-120)backend/app/utils.py (2)
APIResponse(33-57)success_response(40-43)backend/app/models/auth.py (2)
organization_(25-29)project_(32-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (3)
backend/app/api/routes/evaluations/dataset.py (3)
116-146: LGTM! Proper logging and error handling.The endpoint correctly implements structured logging with the function name prefix, appropriate 404 error handling, and has complete type hints.
161-165: LGTM! Proper structured logging.The endpoint correctly implements structured logging with the function name prefix as required by the coding guidelines.
Also applies to: 180-180
36-46: Remove the review comment - the concern is unfounded. The dataset_metadata field in the EvaluationDataset model is guaranteed to never be None. The field is declared as dataset_metadata: dict[str, Any] with default_factory=dict and a database-level constraint of nullable=False. The code in _dataset_to_response is safe and correct.
Likely an incorrect or invalid review comment.
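For illustration, a field declared that way in SQLModel would look roughly like the sketch below; the actual EvaluationDataset model may use a different column type or extra options:

from typing import Any

from sqlalchemy import JSON, Column
from sqlmodel import Field, SQLModel


class EvaluationDatasetSketch(SQLModel, table=True):
    # Stand-in for the real model; only the metadata field matters here.
    id: int | None = Field(default=None, primary_key=True)
    dataset_metadata: dict[str, Any] = Field(
        default_factory=dict,                    # Python-side default: empty dict
        sa_column=Column(JSON, nullable=False),  # DB-side constraint: NOT NULL
    )

With this shape, dataset.dataset_metadata.get("total_items_count", 0) never dereferences None.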
async def upload_dataset_endpoint(
    _session: SessionDep,
    auth_context: AuthContextDep,
    file: UploadFile = File(
        ..., description="CSV file with 'question' and 'answer' columns"
    ),
    dataset_name: str = Form(..., description="Name for the dataset"),
    description: str | None = Form(None, description="Optional dataset description"),
    duplication_factor: int = Form(
        default=1,
        ge=1,
        le=5,
        description="Number of times to duplicate each item (min: 1, max: 5)",
    ),
) -> APIResponse[DatasetUploadResponse]:
    """Upload an evaluation dataset."""
    # Validate and read CSV file
    csv_content = await validate_csv_file(file)

    # Upload dataset using service
    dataset = upload_dataset(
        session=_session,
        csv_content=csv_content,
        dataset_name=dataset_name,
        description=description,
        duplication_factor=duplication_factor,
        organization_id=auth_context.organization_.id,
        project_id=auth_context.project_.id,
    )

    return APIResponse.success_response(data=_dataset_to_response(dataset))
Add structured logging for the upload operation.
According to the coding guidelines, all operations should include structured logging with the function name prefixed in square brackets. The get_dataset and delete_dataset endpoints follow this pattern, but this endpoint is missing logging.
📝 Suggested logging additions
async def upload_dataset_endpoint(
_session: SessionDep,
auth_context: AuthContextDep,
file: UploadFile = File(
..., description="CSV file with 'question' and 'answer' columns"
),
dataset_name: str = Form(..., description="Name for the dataset"),
description: str | None = Form(None, description="Optional dataset description"),
duplication_factor: int = Form(
default=1,
ge=1,
le=5,
description="Number of times to duplicate each item (min: 1, max: 5)",
),
) -> APIResponse[DatasetUploadResponse]:
"""Upload an evaluation dataset."""
+ logger.info(
+ f"[upload_dataset_endpoint] Uploading dataset | "
+ f"dataset_name={dataset_name} | "
+ f"duplication_factor={duplication_factor} | "
+ f"org_id={auth_context.organization_.id} | "
+ f"project_id={auth_context.project_.id}"
+ )
+
# Validate and read CSV file
csv_content = await validate_csv_file(file)
# Upload dataset using service
dataset = upload_dataset(
session=_session,
csv_content=csv_content,
dataset_name=dataset_name,
description=description,
duplication_factor=duplication_factor,
organization_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
)
+ logger.info(
+ f"[upload_dataset_endpoint] Successfully uploaded dataset | "
+ f"dataset_id={dataset.id} | "
+ f"total_items={dataset.dataset_metadata.get('total_items_count', 0)}"
+ )
+
return APIResponse.success_response(data=_dataset_to_response(dataset))As per coding guidelines, all log messages should be prefixed with the function name in square brackets.
🤖 Prompt for AI Agents
In @backend/app/api/routes/evaluations/dataset.py around lines 55 - 85, Add
structured logging to upload_dataset_endpoint: log an info entry prefixed with
"[upload_dataset_endpoint]" before calling upload_dataset and another on success
(including dataset.id/name) and log an error with context on exceptions. Use the
existing logger used by get_dataset/delete_dataset (match their format), include
key fields like auth_context.organization_.id, auth_context.project_.id,
dataset_name, duplication_factor, and any validation errors from
validate_csv_file; ensure logs reference function name in square brackets and do
not leak sensitive data.
) -> APIResponse[DatasetUploadResponse]:
    """Upload an evaluation dataset."""
    # Validate and read CSV file
    csv_content = await validate_csv_file(file)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for validate_csv_file function definition with async keyword
ast-grep --pattern $'async def validate_csv_file($$$):
$$$'Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Search for validate_csv_file function definition without async
ast-grep --pattern $'def validate_csv_file($$$):
$$$'Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Find the validators module and examine validate_csv_file
fd -t f "validators.py" backend/app/services/Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 121
🏁 Script executed:
#!/bin/bash
# Get the full endpoint function to check for logging and other issues
fd -t f "dataset.py" backend/app/api/routes/evaluations/Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 120
🏁 Script executed:
#!/bin/bash
# Check the validators file to find validate_csv_file definition
wc -l backend/app/services/evaluations/validators.pyRepository: ProjectTech4DevAI/kaapi-backend
Length of output: 125
🏁 Script executed:
#!/bin/bash
# Read the validators file
cat -n backend/app/services/evaluations/validators.pyRepository: ProjectTech4DevAI/kaapi-backend
Length of output: 6580
🏁 Script executed:
#!/bin/bash
# Check the dataset.py file size
wc -l backend/app/api/routes/evaluations/dataset.pyRepository: ProjectTech4DevAI/kaapi-backend
Length of output: 124
🏁 Script executed:
#!/bin/bash
# Read the dataset.py file
cat -n backend/app/api/routes/evaluations/dataset.pyRepository: ProjectTech4DevAI/kaapi-backend
Length of output: 7149
Add structured logging to the endpoint.
The upload_dataset_endpoint function is missing structured logging. Other endpoints in the same file (get_dataset, delete_dataset) follow the coding guideline of prefixing log messages with the function name in square brackets. Add logger.info(f"[upload_dataset_endpoint] ...") to log the upload operation, matching the pattern used elsewhere in this file.
🤖 Prompt for AI Agents
In @backend/app/api/routes/evaluations/dataset.py at line 72, Add structured
info-level logging to the upload_dataset_endpoint: insert a logger.info call
prefixed with "[upload_dataset_endpoint]" (matching get_dataset/delete_dataset)
to log the upload operation after validating the file (near the call to
validate_csv_file and variable csv_content). Use the same message pattern as
other endpoints (function name in square brackets and a concise description of
the action) so logs are consistent with the existing logger usage.
def list_datasets_endpoint(
    _session: SessionDep,
    auth_context: AuthContextDep,
    limit: int = Query(
        default=50, ge=1, le=100, description="Maximum number of datasets to return"
    ),
    offset: int = Query(default=0, ge=0, description="Number of datasets to skip"),
) -> APIResponse[list[DatasetUploadResponse]]:
    """List evaluation datasets."""
    datasets = list_datasets(
        session=_session,
        organization_id=auth_context.organization_.id,
        project_id=auth_context.project_.id,
        limit=limit,
        offset=offset,
    )

    return APIResponse.success_response(
        data=[_dataset_to_response(dataset) for dataset in datasets]
    )
Add structured logging for the list operation.
According to the coding guidelines, all operations should include structured logging with the function name prefixed in square brackets. This endpoint is missing logging, while get_dataset and delete_dataset follow the correct pattern.
📝 Suggested logging additions
def list_datasets_endpoint(
_session: SessionDep,
auth_context: AuthContextDep,
limit: int = Query(
default=50, ge=1, le=100, description="Maximum number of datasets to return"
),
offset: int = Query(default=0, ge=0, description="Number of datasets to skip"),
) -> APIResponse[list[DatasetUploadResponse]]:
"""List evaluation datasets."""
+ logger.info(
+ f"[list_datasets_endpoint] Listing datasets | "
+ f"limit={limit} | offset={offset} | "
+ f"org_id={auth_context.organization_.id} | "
+ f"project_id={auth_context.project_.id}"
+ )
+
datasets = list_datasets(
session=_session,
organization_id=auth_context.organization_.id,
project_id=auth_context.project_.id,
limit=limit,
offset=offset,
)
+ logger.info(
+ f"[list_datasets_endpoint] Retrieved {len(datasets)} dataset(s)"
+ )
+
return APIResponse.success_response(
data=[_dataset_to_response(dataset) for dataset in datasets]
)As per coding guidelines, all log messages should be prefixed with the function name in square brackets.
🤖 Prompt for AI Agents
In @backend/app/api/routes/evaluations/dataset.py around lines 94 - 113, The
endpoint list_datasets_endpoint lacks structured logging; add a log call
prefixed with "[list_datasets_endpoint]" before/after calling list_datasets that
records relevant fields (organization id, project id, limit, offset, and number
of datasets returned) using the same logger used elsewhere (e.g., processLogger
or module logger); place the log near the call to list_datasets and ensure it
executes whether the call succeeds or fails (include error logging on
exception), leaving the existing return via APIResponse.success_response and
conversion via _dataset_to_response unchanged.
router = APIRouter(prefix="/evaluations", tags=["evaluation"])

# Include dataset routes under /evaluations/datasets
router.include_router(dataset.router, prefix="/datasets")
What was the need for a dedicated init file for the evaluation router, since we have not followed this pattern till now?
poll_batch_status(session=session, provider=provider, batch_job=batch_job)

# Refresh batch_job to get the updated provider_status
remove comments that are not needed
we need this comment
)

__all__ = [
    # CRUD operations
this comment is not needed here either
    organization_id: Organization ID
    project_id: Project ID
Returns:
The Returns and Raises sections are not needed in the function docstring.
didn't get this, can you elaborate
Returns:
    CSV content as bytes if valid
Raises:
This Raises part is not needed in the function docstring either.
| "/api/v1/evaluations/", | ||
| json={ | ||
| "experiment_name": "test_evaluation_run", | ||
| "dataset_id": 1, |
the test coverage of this PR is really low
    response_model=APIResponse[DatasetUploadResponse],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
)
def get_dataset(
Either this router function could be named list_one_dataset_by_id to stay consistent with the get-all datasets endpoint above, or vice versa. IMO the get_dataset convention makes a bit more semantic sense than list_dataset.
        project_id=auth_context.project_.id,
    )

    return APIResponse.success_response(data=eval_run)
What happens if the evaluation run creation fails (something like a 422 error or rate limiting)? Is proper error handling taken care of inside the start_evaluation function?
fixed it
    response_model=APIResponse[EvaluationRunPublic],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
)
def get_evaluation_run_status(
Likewise, keep semantic consistency with the get-all endpoint here.
Actionable comments posted: 0
🧹 Nitpick comments (2)
backend/app/api/routes/evaluations/dataset.py (2)
56-86: Consider adding logging for observability.Unlike
get_datasetanddelete_dataset, this endpoint lacks logging. For consistency and to aid debugging/monitoring, consider adding a log statement.♻️ Suggested logging addition
async def upload_dataset_endpoint( _session: SessionDep, auth_context: AuthContextDep, file: UploadFile = File( ..., description="CSV file with 'question' and 'answer' columns" ), dataset_name: str = Form(..., description="Name for the dataset"), description: str | None = Form(None, description="Optional dataset description"), duplication_factor: int = Form( default=1, ge=1, le=5, description="Number of times to duplicate each item (min: 1, max: 5)", ), ) -> APIResponse[DatasetUploadResponse]: """Upload an evaluation dataset.""" + logger.info( + f"[upload_dataset_endpoint] Uploading dataset | name={dataset_name} | " + f"org_id={auth_context.organization_.id} | " + f"project_id={auth_context.project_.id}" + ) # Validate and read CSV file csv_content = await validate_csv_file(file)
95-114: Consider adding logging for consistency with other endpoints.This endpoint lacks logging while
get_datasetanddelete_datasethave it. Adding a log statement would improve observability and maintain consistency. As per coding guidelines, prefix log messages with the function name.♻️ Suggested logging addition
def list_datasets_endpoint( _session: SessionDep, auth_context: AuthContextDep, limit: int = Query( default=50, ge=1, le=100, description="Maximum number of datasets to return" ), offset: int = Query(default=0, ge=0, description="Number of datasets to skip"), ) -> APIResponse[list[DatasetUploadResponse]]: """List evaluation datasets.""" + logger.info( + f"[list_datasets_endpoint] Listing datasets | " + f"org_id={auth_context.organization_.id} | " + f"project_id={auth_context.project_.id} | " + f"limit={limit} | offset={offset}" + ) datasets = list_datasets(
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/api/routes/evaluations/dataset.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluations/dataset.py
🧠 Learnings (1)
📚 Learning: 2025-12-23T03:57:06.891Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 488
File: backend/app/tests/seed_data/seed_data.py:344-354
Timestamp: 2025-12-23T03:57:06.891Z
Learning: In backend/app/tests/seed_data/seed_data.py, log messages should use the file name as the prefix (e.g., `[tests.seed_data]`) rather than the function name prefix. This file-based logging prefix convention is preferred for test seed data modules.
Applied to files:
backend/app/api/routes/evaluations/dataset.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (3)
backend/app/api/routes/evaluations/dataset.py (3)
117-147: LGTM!Proper logging with function name prefix, appropriate 404 handling, and type hints are all in place.
150-184: LGTM with a minor observation. Logging and error handling are well implemented. The string-matching approach for distinguishing 404 vs 400 errors (line 176) is a bit fragile; consider having the CRUD layer return a status code or enum instead if this becomes problematic.
The APIResponse[dict] return type is loose; a dedicated response model could improve type safety, but this is optional.
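If the loose return type ever becomes a problem, a dedicated response model is a small change. A possible sketch, with illustrative field names rather than the project's actual schema:

from pydantic import BaseModel


class DatasetDeleteResponse(BaseModel):
    dataset_id: int
    message: str


# The endpoint could then declare response_model=APIResponse[DatasetDeleteResponse]
# and return APIResponse.success_response(
#     data=DatasetDeleteResponse(dataset_id=dataset_id, message=message)
# )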
37-47: No changes required. Thedataset_metadatafield cannot beNonebecause it is typed asdict[str, Any](notOptional) withdefault_factory=dictand the database column is markednullable=False. Calling.get()on it is safe and will not raise anAttributeError. The function has appropriate type hints.Likely an incorrect or invalid review comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
backend/app/services/evaluations/evaluation.py (3)
59-63: Consider masking potentially sensitive instructions content.Per coding guidelines, sensitive values in logs should use
mask_string(). Assistant instructions might contain sensitive information. Consider either removing the instructions preview from the log or masking it.♻️ Suggested change
logger.info( f"[build_evaluation_config] Found assistant in DB | id={assistant.id} | " - f"model={assistant.model} | instructions=" - f"{assistant.instructions[:50] if assistant.instructions else 'None'}..." + f"model={assistant.model} | has_instructions={assistant.instructions is not None}" )
76-82: Simplify the truthiness check.The
len(vector_store_ids) > 0check is redundant whenvector_store_idsis already being checked for truthiness. A non-empty list is truthy in Python.♻️ Suggested simplification
- if vector_store_ids and len(vector_store_ids) > 0: + if vector_store_ids: merged_config["tools"] = [ { "type": "file_search", "vector_store_ids": vector_store_ids, } ]
263-264: Consider logging when evaluation run is not found.When
eval_runisNone, the function silently returns(None, None). Adding a log statement would help with debugging and monitoring when callers request non-existent or inaccessible evaluation runs.♻️ Suggested change
if not eval_run: + logger.warning( + f"[get_evaluation_with_scores] Evaluation run not found | " + f"evaluation_id={evaluation_id} | org_id={organization_id} | project_id={project_id}" + ) return None, None
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/evaluations/evaluation.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic in services located in
backend/app/services/
Files:
backend/app/services/evaluations/evaluation.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/services/evaluations/evaluation.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (3)
backend/app/services/evaluations/evaluation.py (3)
1-20: LGTM!Imports are well-organized with clear separation between FastAPI/SQLModel framework imports, CRUD operations, models, and utilities. Logger setup follows standard Python conventions.
312-317: No action needed – save_score intentionally creates its own session. The function is correctly designed to manage its own database session, as documented in the docstring: "This function creates its own database session to persist the score, allowing it to be called after releasing the request's main session." This is an intentional architectural pattern, not a bug. The function properly implements type hints and logging guidelines.
Likely an incorrect or invalid review comment.
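A minimal sketch of that own-session pattern, assuming a module-level SQLModel engine and a simplified stand-in model; this is not the actual save_score implementation:

from sqlmodel import Field, Session, SQLModel, create_engine

engine = create_engine("sqlite:///example.db")  # assumed engine for illustration


class RunSketch(SQLModel, table=True):
    # Stand-in for the real EvaluationRun model.
    id: int | None = Field(default=None, primary_key=True)
    score: float | None = None


SQLModel.metadata.create_all(engine)


def save_score(run_id: int, score: float) -> None:
    """Persist a score with a session independent of the request session."""
    with Session(engine) as session:
        run = session.get(RunSketch, run_id)
        if run is None:
            return
        run.score = score
        session.add(run)
        session.commit()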
196-219: Exception handling in start_evaluation_batch is correctly implemented. The CRUD function properly updates eval_run.status to "failed" and persists the change to the database before re-raising the exception. The service catches this exception, refreshes the run object from the database, and returns it with the failure status intact. This pattern correctly allows callers to detect failures by inspecting the returned run's status.
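A hedged sketch of that failure-propagation pattern; the function and attribute names below are illustrative, not the actual CRUD/service code:

from typing import Any


def submit_batch(eval_run: Any) -> str:
    """Stand-in for the provider call; may raise on 422s or rate limits."""
    raise RuntimeError("provider rejected the batch")


def start_evaluation_batch(session: Any, eval_run: Any) -> None:
    """CRUD layer: persist the failed status before re-raising."""
    try:
        eval_run.provider_batch_id = submit_batch(eval_run)
    except Exception:
        eval_run.status = "failed"
        session.add(eval_run)
        session.commit()
        raise


def start_evaluation(session: Any, eval_run: Any) -> Any:
    """Service layer: catch, refresh from the DB, and return the failed run."""
    try:
        start_evaluation_batch(session, eval_run)
    except Exception:
        session.refresh(eval_run)  # pick up the persisted "failed" status
    return eval_run                # callers inspect eval_run.status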
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @backend/app/services/evaluations/validators.py:
- Around line 142-146: The HTTPException raised in validators.py builds the
detail string by concatenating two f-strings but omits punctuation/spacing
between the clauses; update the detail message in the raise HTTPException block
(the f-string that references csv_reader.fieldnames) to include proper
punctuation and spacing (e.g., add a period or colon and a space before "Found
columns:") so the error reads correctly.
- Around line 92-97: The validator uses file.content_type directly and may
report "got: None" when the header is missing; update the check around
content_type (from file.content_type) to explicitly handle a None value before
comparing to ALLOWED_MIME_TYPES and raise a 422 with a clearer message (e.g.,
"Missing Content-Type header" or similar) instead of including None, ensuring
the HTTPException raised by this validator communicates the missing header case
separately from an invalid MIME type.
🧹 Nitpick comments (1)
backend/app/api/routes/evaluations/dataset.py (1)
175-179: String-based error detection is fragile.The logic
if "not found" in message.lower()couples the HTTP status code to the exact wording of the error message from the CRUD layer. If the message changes, this could return incorrect status codes.Consider having the CRUD function return a more structured error indicator (e.g., an enum or specific exception type) rather than relying on string matching.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/api/routes/evaluations/dataset.pybackend/app/services/evaluations/validators.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/api/routes/evaluations/dataset.pybackend/app/services/evaluations/validators.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic in services located in
backend/app/services/
Files:
backend/app/services/evaluations/validators.py
🧠 Learnings (1)
📚 Learning: 2025-12-23T03:57:06.891Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 488
File: backend/app/tests/seed_data/seed_data.py:344-354
Timestamp: 2025-12-23T03:57:06.891Z
Learning: In backend/app/tests/seed_data/seed_data.py, log messages should use the file name as the prefix (e.g., `[tests.seed_data]`) rather than the function name prefix. This file-based logging prefix convention is preferred for test seed data modules.
Applied to files:
backend/app/api/routes/evaluations/dataset.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (9)
backend/app/services/evaluations/validators.py (3)
1-20: LGTM!Well-structured imports and clearly defined constants for file validation constraints.
22-64: LGTM!Well-documented sanitization function with clear rules and examples. The validation logic correctly handles edge cases and raises an appropriate
ValueErrorfor empty results.
115-169: LGTM!Solid CSV parsing implementation with proper case-insensitive header matching, empty row filtering, and appropriate error handling. The logging follows the coding guidelines with the
[parse_csv_items] prefix.
backend/app/api/routes/evaluations/dataset.py (6)
1-34: LGTM!Clean imports and router setup. Good separation between CRUD operations and service layer imports.
37-47: LGTM!Clean helper function with proper type hints and safe dictionary access with defaults.
50-86: LGTM!Well-structured upload endpoint with proper validation, form parameters with constraints, and permission checks. Uses
load_descriptionas per coding guidelines.
89-114: LGTM!Clean list endpoint with proper pagination parameters and constraints.
117-147: LGTM!Well-implemented get endpoint with proper logging following the
[function_name]format guideline, and appropriate 404 handling.
150-184: LGTM overall!Delete endpoint with proper logging following the coding guidelines and appropriate permission checks.
content_type = file.content_type
if content_type not in ALLOWED_MIME_TYPES:
    raise HTTPException(
        status_code=422,
        detail=f"Invalid content type. Expected CSV, got: {content_type}",
    )
Handle potential None content type.
file.content_type can be None if the client doesn't send a Content-Type header. The current check content_type not in ALLOWED_MIME_TYPES would raise an HTTPException with "got: None", but it's cleaner to handle this explicitly.
Suggested improvement
content_type = file.content_type
+ if not content_type:
+ raise HTTPException(
+ status_code=422,
+ detail="Content-Type header is missing",
+ )
if content_type not in ALLOWED_MIME_TYPES:
raise HTTPException(
status_code=422,
detail=f"Invalid content type. Expected CSV, got: {content_type}",
)🤖 Prompt for AI Agents
In @backend/app/services/evaluations/validators.py around lines 92 - 97, The
validator uses file.content_type directly and may report "got: None" when the
header is missing; update the check around content_type (from file.content_type)
to explicitly handle a None value before comparing to ALLOWED_MIME_TYPES and
raise a 422 with a clearer message (e.g., "Missing Content-Type header" or
similar) instead of including None, ensuring the HTTPException raised by this
validator communicates the missing header case separately from an invalid MIME
type.
raise HTTPException(
    status_code=422,
    detail=f"CSV must contain 'question' and 'answer' columns "
    f"Found columns: {csv_reader.fieldnames}",
)
Minor: Missing separator in error message.
The error detail message is missing punctuation between the two parts.
Suggested fix
raise HTTPException(
status_code=422,
- detail=f"CSV must contain 'question' and 'answer' columns "
- f"Found columns: {csv_reader.fieldnames}",
+ detail=f"CSV must contain 'question' and 'answer' columns. "
+ f"Found columns: {csv_reader.fieldnames}",
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
raise HTTPException(
    status_code=422,
    detail=f"CSV must contain 'question' and 'answer' columns "
    f"Found columns: {csv_reader.fieldnames}",
)
raise HTTPException(
    status_code=422,
    detail=f"CSV must contain 'question' and 'answer' columns. "
    f"Found columns: {csv_reader.fieldnames}",
)
🤖 Prompt for AI Agents
In @backend/app/services/evaluations/validators.py around lines 142 - 146, The
HTTPException raised in validators.py builds the detail string by concatenating
two f-strings but omits punctuation/spacing between the clauses; update the
detail message in the raise HTTPException block (the f-string that references
csv_reader.fieldnames) to include proper punctuation and spacing (e.g., add a
period or colon and a space before "Found columns:") so the error reads
correctly.
…ech4DevAI/kaapi-backend into enhancement/evaluation-refactor
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@backend/app/tests/api/routes/test_evaluation.py`:
- Around line 495-508: The test file has Black formatting issues; run the
formatter (e.g., `black .` or `pre-commit run --all-files`) and commit the
changes so tests like test_upload_without_authentication in
backend/app/tests/api/routes/test_evaluation.py conform to Black style; ensure
imports, line breaks, and indentation in that file are corrected and re-run the
tests/CI to verify the formatting error is resolved.
- Around line 495-508: The test function test_upload_without_authentication is
missing type hints on its parameters and return value; update its signature to
add appropriate annotations (e.g., annotate client with the test client type
used in your suite and valid_csv_content with its type such as str or IO) and
add a -> None return type so the function and parameters conform to the project
typing guidelines (adjust concrete types to match your test fixtures' types).
- Around line 576-589: The test function
test_start_batch_evaluation_without_authentication is missing type hints for its
parameters and return type; add appropriate annotations (e.g., client:
FlaskClient or TestClient and sample_evaluation_config: dict) and a return type
of None to the function signature to match the style used in
test_upload_without_authentication so linters and type checkers are satisfied.
🧹 Nitpick comments (2)
backend/app/api/routes/evaluations/evaluation.py (2)
37-67: Consider adding logging for observability.Unlike
list_evaluation_runs, this endpoint doesn't log incoming requests. Adding a log statement would improve traceability and align with the pattern established elsewhere in this file.♻️ Suggested logging
def evaluate( _session: SessionDep, auth_context: AuthContextDep, dataset_id: int = Body(..., description="ID of the evaluation dataset"), experiment_name: str = Body( ..., description="Name for this evaluation experiment/run" ), config: dict = Body(default_factory=dict, description="Evaluation configuration"), assistant_id: str | None = Body( None, description="Optional assistant ID to fetch configuration from" ), ) -> APIResponse[EvaluationRunPublic]: """Start an evaluation run.""" + logger.info( + f"[evaluate] Starting evaluation | " + f"org_id={auth_context.organization_.id} | " + f"project_id={auth_context.project_.id} | " + f"dataset_id={dataset_id} | experiment_name={experiment_name}" + ) eval_run = start_evaluation(As per coding guidelines, log messages should be prefixed with the function name in square brackets.
76-81: Add Query() wrapper with validation bounds for pagination parameters.The
limitandoffsetparameters lack explicit Query() declarations, which means they won't have OpenAPI documentation. Additionally, an unboundedlimitcould allow expensive queries.♻️ Suggested improvement
def list_evaluation_runs( _session: SessionDep, auth_context: AuthContextDep, - limit: int = 50, - offset: int = 0, + limit: int = Query(default=50, ge=1, le=100, description="Maximum number of runs to return"), + offset: int = Query(default=0, ge=0, description="Number of runs to skip"), ) -> APIResponse[list[EvaluationRunPublic]]:
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/api/routes/evaluations/evaluation.pybackend/app/tests/api/routes/test_evaluation.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/api/routes/test_evaluation.pybackend/app/api/routes/evaluations/evaluation.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/evaluation.py
🧠 Learnings (1)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Implement automatic coverage reporting for tests
Applied to files:
backend/app/tests/api/routes/test_evaluation.py
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_evaluation.py (1)
backend/app/tests/conftest.py (1)
client(65-68)
backend/app/api/routes/evaluations/evaluation.py (5)
backend/app/models/evaluation.py (1)
EvaluationRunPublic(336-354)backend/app/api/permissions.py (2)
Permission(10-15)require_permission(45-70)backend/app/services/evaluations/evaluation.py (2)
get_evaluation_with_scores(222-319)start_evaluation(100-219)backend/app/utils.py (4)
APIResponse(33-57)load_description(393-398)failure_response(46-57)success_response(40-43)backend/app/models/auth.py (2)
organization_(25-29)project_(32-36)
🪛 GitHub Actions: Kaapi CI
backend/app/tests/api/routes/test_evaluation.py
[error] Black formatting failed. The hook reformatted 1 file (backend/app/tests/api/routes/test_evaluation.py). Please run 'pre-commit run --all-files' or 'black .' to fix formatting.
🔇 Additional comments (4)
backend/app/api/routes/evaluations/evaluation.py (2)
1-28: LGTM!Imports are well-organized and the logger is properly initialized. The router setup follows standard FastAPI patterns.
106-154: LGTM with optional logging suggestion.The endpoint correctly validates that
resync_score=truerequiresget_trace_info=true, properly handles the not-found case with a 404, and returns partial data with an error message when score fetching fails. The separation of concerns between "not found" (404) and "found but with error" (failure_response with data) is appropriate.Consider adding a log statement for consistency with
list_evaluation_runs:logger.info( f"[get_evaluation_run_status] Fetching evaluation | " f"evaluation_id={evaluation_id} | get_trace_info={get_trace_info}" )backend/app/tests/api/routes/test_evaluation.py (2)
62-109: LGTM!The patch targets correctly reflect the service layer migration (from
app.api.routes.evaluation.*toapp.services.evaluations.dataset.*), and the trailing slash on the endpoint URL aligns with the router restructuring.
732-918: LGTM!The
TestGetEvaluationRunStatusclass provides good coverage for the various scenarios:
- Incomplete evaluation with trace info request
- Completed evaluation with cached scores
- Not found case
- Without trace info
- Invalid
resync_scorewithoutget_trace_infoThe test fixtures correctly use the factory pattern via
create_test_evaluation_dataset.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
def test_upload_without_authentication(self, client, valid_csv_content):
    """Test uploading without authentication."""
    filename, file_obj = create_csv_file(valid_csv_content)

    response = client.post(
        "/api/v1/evaluations/datasets/",
        files={"file": (filename, file_obj, "text/csv")},
        data={
            "dataset_name": "test_dataset",
            "duplication_factor": 5,
        },
    )

    assert response.status_code == 401  # Unauthorized
Fix Black formatting to unblock CI.
The pipeline failed due to Black formatting issues in this file. Run pre-commit run --all-files or black . to fix the formatting before merging.
🧰 Tools
🪛 GitHub Actions: Kaapi CI
[error] Black formatting failed. The hook reformatted 1 file (backend/app/tests/api/routes/test_evaluation.py). Please run 'pre-commit run --all-files' or 'black .' to fix formatting.
🤖 Prompt for AI Agents
In `@backend/app/tests/api/routes/test_evaluation.py` around lines 495 - 508, The
test file has Black formatting issues; run the formatter (e.g., `black .` or
`pre-commit run --all-files`) and commit the changes so tests like
test_upload_without_authentication in
backend/app/tests/api/routes/test_evaluation.py conform to Black style; ensure
imports, line breaks, and indentation in that file are corrected and re-run the
tests/CI to verify the formatting error is resolved.
Add missing type hints.
The test function is missing type hints for parameters and return type, which violates the coding guidelines and may be contributing to the Black formatting failure.
🔧 Proposed fix
- def test_upload_without_authentication(self, client, valid_csv_content):
+ def test_upload_without_authentication(
+ self, client: TestClient, valid_csv_content: str
+ ) -> None:
"""Test uploading without authentication."""As per coding guidelines, always add type hints to all function parameters and return values in Python code.
🧰 Tools
🪛 GitHub Actions: Kaapi CI
[error] Black formatting failed. The hook reformatted 1 file (backend/app/tests/api/routes/test_evaluation.py). Please run 'pre-commit run --all-files' or 'black .' to fix formatting.
🤖 Prompt for AI Agents
In `@backend/app/tests/api/routes/test_evaluation.py` around lines 495 - 508, The
test function test_upload_without_authentication is missing type hints on its
parameters and return value; update its signature to add appropriate annotations
(e.g., annotate client with the test client type used in your suite and
valid_csv_content with its type such as str or IO) and add a -> None return type
so the function and parameters conform to the project typing guidelines (adjust
concrete types to match your test fixtures' types).
def test_start_batch_evaluation_without_authentication(
    self, client, sample_evaluation_config
):
    """Test batch evaluation requires authentication."""
    response = client.post(
        "/api/v1/evaluations/",
        json={
            "experiment_name": "test_evaluation_run",
            "dataset_id": 1,
            "config": sample_evaluation_config,
        },
    )

    assert response.status_code == 401  # Unauthorized
Add missing type hints.
Same issue as test_upload_without_authentication - missing type hints for parameters and return type.
🔧 Proposed fix
def test_start_batch_evaluation_without_authentication(
- self, client, sample_evaluation_config
- ):
+ self, client: TestClient, sample_evaluation_config: dict[str, Any]
+ ) -> None:
"""Test batch evaluation requires authentication."""🧰 Tools
🪛 GitHub Actions: Kaapi CI
[error] Black formatting failed. The hook reformatted 1 file (backend/app/tests/api/routes/test_evaluation.py). Please run 'pre-commit run --all-files' or 'black .' to fix formatting.
🤖 Prompt for AI Agents
In `@backend/app/tests/api/routes/test_evaluation.py` around lines 576 - 589, The
test function test_start_batch_evaluation_without_authentication is missing type
hints for its parameters and return type; add appropriate annotations (e.g.,
client: FlaskClient or TestClient and sample_evaluation_config: dict) and a
return type of None to the function signature to match the style used in
test_upload_without_authentication so linters and type checkers are satisfied.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/api/routes/evaluations/dataset.py`:
- Around line 180-191: The endpoint accesses dataset.name after the CRUD
delete/commit which can hit a detached SQLAlchemy instance; extract dataset.name
into a local variable before calling delete_dataset_crud and use that variable
in the success message and log. Specifically, read dataset_name = dataset.name
prior to calling delete_dataset_crud(session=_session, dataset=dataset) and then
use dataset_name in logger.info and the APIResponse.success_response message
instead of dataset.name.
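A minimal sketch of the fix described above; the surrounding names (delete_dataset_crud, logger) are taken from the comment and assumed rather than verified:

import logging
from typing import Any

logger = logging.getLogger(__name__)


def delete_dataset_sketch(
    _session: Any, dataset: Any, delete_dataset_crud: Any
) -> dict[str, str]:
    # Capture the name while the instance is still attached, before the delete commit.
    dataset_name: str = dataset.name
    delete_dataset_crud(session=_session, dataset=dataset)
    logger.info(f"[delete_dataset] Deleted dataset | name={dataset_name}")
    return {"message": f"Dataset '{dataset_name}' deleted successfully"}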
♻️ Duplicate comments (2)
backend/app/api/routes/evaluations/dataset.py (2)
56-86: Add structured logging to upload_dataset endpoint.This endpoint is missing structured logging, which is inconsistent with
get_datasetanddelete_datasetin the same file. As per coding guidelines, log messages should be prefixed with the function name in square brackets.📝 Suggested logging additions
async def upload_dataset( _session: SessionDep, auth_context: AuthContextDep, file: UploadFile = File( ..., description="CSV file with 'question' and 'answer' columns" ), dataset_name: str = Form(..., description="Name for the dataset"), description: str | None = Form(None, description="Optional dataset description"), duplication_factor: int = Form( default=1, ge=1, le=5, description="Number of times to duplicate each item (min: 1, max: 5)", ), ) -> APIResponse[DatasetUploadResponse]: """Upload an evaluation dataset.""" + logger.info( + f"[upload_dataset] Uploading dataset | " + f"dataset_name={dataset_name} | " + f"duplication_factor={duplication_factor} | " + f"org_id={auth_context.organization_.id} | " + f"project_id={auth_context.project_.id}" + ) + # Validate and read CSV file csv_content = await validate_csv_file(file) # Upload dataset using service dataset = upload_evaluation_dataset( session=_session, csv_content=csv_content, dataset_name=dataset_name, description=description, duplication_factor=duplication_factor, organization_id=auth_context.organization_.id, project_id=auth_context.project_.id, ) + logger.info( + f"[upload_dataset] Successfully uploaded dataset | " + f"dataset_id={dataset.id} | dataset_name={dataset.name}" + ) + return APIResponse.success_response(data=_dataset_to_response(dataset))
95-114: Add structured logging to list_datasets endpoint.This endpoint is missing structured logging for consistency with other endpoints in this file.
📝 Suggested logging additions
def list_datasets( _session: SessionDep, auth_context: AuthContextDep, limit: int = Query( default=50, ge=1, le=100, description="Maximum number of datasets to return" ), offset: int = Query(default=0, ge=0, description="Number of datasets to skip"), ) -> APIResponse[list[DatasetUploadResponse]]: """List evaluation datasets.""" + logger.info( + f"[list_datasets] Listing datasets | " + f"limit={limit} | offset={offset} | " + f"org_id={auth_context.organization_.id} | " + f"project_id={auth_context.project_.id}" + ) + datasets = list_evaluation_datasets( session=_session, organization_id=auth_context.organization_.id, project_id=auth_context.project_.id, limit=limit, offset=offset, ) + logger.info(f"[list_datasets] Retrieved {len(datasets)} dataset(s)") + return APIResponse.success_response( data=[_dataset_to_response(dataset) for dataset in datasets] )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/api/routes/evaluations/dataset.pybackend/app/crud/evaluations/dataset.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets:logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/crud/evaluations/dataset.pybackend/app/api/routes/evaluations/dataset.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints inbackend/app/api/organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings usingload_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.py
🧠 Learnings (1)
📚 Learning: 2025-12-23T03:57:06.891Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 488
File: backend/app/tests/seed_data/seed_data.py:344-354
Timestamp: 2025-12-23T03:57:06.891Z
Learning: In backend/app/tests/seed_data/seed_data.py, log messages should use the file name as the prefix (e.g., `[tests.seed_data]`) rather than the function name prefix. This file-based logging prefix convention is preferred for test seed data modules.
Applied to files:
backend/app/api/routes/evaluations/dataset.py
🧬 Code graph analysis (2)
backend/app/crud/evaluations/dataset.py (2)
backend/app/api/routes/evaluations/dataset.py (1)
delete_dataset (156-191)
backend/app/models/evaluation.py (2)
EvaluationDataset (73-167)
EvaluationRun (170-319)
backend/app/api/routes/evaluations/dataset.py (6)
backend/app/api/permissions.py (2)
Permission (10-15)
require_permission (45-70)
backend/app/crud/evaluations/dataset.py (3)
get_dataset_by_id (107-140)
list_datasets (175-210)
delete_dataset (323-370)
backend/app/models/evaluation.py (2)
DatasetUploadResponse (25-44)
EvaluationDataset (73-167)
backend/app/services/evaluations/dataset.py (1)
upload_dataset (24-163)
backend/app/services/evaluations/validators.py (1)
validate_csv_file (67-112)
backend/app/utils.py (2)
APIResponse (33-57)
success_response (40-43)
🔇 Additional comments (4)
backend/app/crud/evaluations/dataset.py (1)
323-370: LGTM! Clean refactor of delete_dataset signature.

The refactored signature accepting a pre-fetched EvaluationDataset object is cleaner and follows the separation of concerns principle: lookup/validation is now handled by the caller. Good practice extracting dataset_id, dataset_name, etc. before session.delete() to avoid accessing the deleted object's attributes in the success log.
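To make the new caller-side contract concrete, here is a minimal sketch; the wrapper function, the keyword arguments, and the 404 handling are assumptions inferred from this review, not code lifted from the repository:

```python
from fastapi import HTTPException
from sqlmodel import Session

from app.crud.evaluations.dataset import delete_dataset, get_dataset_by_id


def remove_dataset(
    session: Session, dataset_id: int, organization_id: int, project_id: int
) -> None:
    # The caller owns the lookup and the "not found" decision (assumed behavior).
    dataset = get_dataset_by_id(
        session=session,
        dataset_id=dataset_id,
        organization_id=organization_id,
        project_id=project_id,
    )
    if dataset is None:
        raise HTTPException(status_code=404, detail="Dataset not found")

    # The CRUD layer now receives the pre-fetched object and only deletes it.
    delete_dataset(session=session, dataset=dataset)
```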
backend/app/api/routes/evaluations/dataset.py (3)

1-34: LGTM! Clean module setup.

Imports are properly organized, logger is initialized correctly, and required dependencies for permission checks and API responses are all present.
37-47: LGTM! Clean mapper function.

Type hints are present, and using .get() with defaults provides defensive handling for potentially missing metadata keys.
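If it helps to visualize the pattern being praised, a tiny illustrative sketch follows; the key names are hypothetical and not taken from DatasetUploadResponse:

```python
from typing import Any


def metadata_to_fields(metadata: dict[str, Any] | None) -> dict[str, Any]:
    # .get() with defaults keeps the mapper tolerant of missing metadata keys.
    metadata = metadata or {}
    return {
        "total_items": metadata.get("total_items", 0),          # hypothetical key
        "object_store_url": metadata.get("object_store_url"),   # None if absent
    }
```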
117-147: LGTM! Well-structured endpoint.

Proper type hints, structured logging with function name prefix, permission checks, and appropriate 404 handling when dataset is not found.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/tests/crud/evaluations/test_processing.py (1)
584-645: Configure or remove the unused mock_poll parameter.

The mock_poll parameter from the poll_batch_status patch is neither configured with a return value nor verified. Since check_and_process_evaluation calls poll_batch_status on the batch job (even when already completed), the mock should either be configured with an expected return value or verified with assert_called_once() to ensure it's called. Alternatively, if intentionally not testing this dependency, remove the patch decorator to keep the test focused.
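A possible shape for the first option, using standard unittest.mock calls; the patch target follows the module path mentioned above, while the return-value shape, the call signature of check_and_process_evaluation, and the completed_batch_job factory are assumptions:

```python
from unittest.mock import MagicMock, patch

from app.crud.evaluations.processing import check_and_process_evaluation
from app.tests.utils.factories import completed_batch_job  # hypothetical factory helper


@patch("app.crud.evaluations.processing.poll_batch_status")
def test_check_and_process_completed_batch(mock_poll: MagicMock) -> None:
    # Option 1: give the mock a concrete return value the code under test can consume.
    mock_poll.return_value = {"status": "completed"}  # shape assumed for illustration

    check_and_process_evaluation(completed_batch_job())  # call signature assumed

    # Option 2 (or in addition): verify the dependency was actually exercised.
    mock_poll.assert_called_once()
```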
backend/app/tests/api/test_auth_failures.py (1)

34-36: Fix trailing slash in evaluation list endpoints to match route definitions.

Lines 34-35 should use /evaluations/ (with trailing slash) instead of /evaluations. The evaluation router is mounted at /evaluations with routes defined as / for list operations, resulting in the final path /evaluations/. This matches the pattern used in the evaluation route tests (test_evaluation.py) and is consistent with the dataset endpoint format on lines 28-29. Without the trailing slash, these requests will return 404 errors.
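For illustration, the corrected entries might look like this; the PROTECTED_ENDPOINTS name and the (method, path) tuple layout are assumptions about how test_auth_failures.py organizes its cases:

```python
# Hypothetical excerpt of the endpoint table exercised by test_auth_failures.py.
PROTECTED_ENDPOINTS: list[tuple[str, str]] = [
    ("GET", "/evaluations/datasets/"),  # dataset list (matches the lines 28-29 format)
    ("GET", "/evaluations/"),           # evaluation list: trailing slash required
    ("POST", "/evaluations/"),
]
```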
♻️ Duplicate comments (4)
backend/app/api/routes/evaluations/dataset.py (2)
56-86: Add structured logging for the upload operation.

The upload_dataset endpoint is missing structured logging. Other endpoints in the same file (get_dataset, delete_dataset) follow the coding guideline of prefixing log messages with the function name in square brackets.

📝 Suggested logging additions
 async def upload_dataset(
     ...
 ) -> APIResponse[DatasetUploadResponse]:
     """Upload an evaluation dataset."""
+    logger.info(
+        f"[upload_dataset] Uploading dataset | "
+        f"dataset_name={dataset_name} | "
+        f"duplication_factor={duplication_factor} | "
+        f"org_id={auth_context.organization_.id} | "
+        f"project_id={auth_context.project_.id}"
+    )
+
     # Validate and read CSV file
     csv_content = await validate_csv_file(file)
     ...
+    logger.info(
+        f"[upload_dataset] Successfully uploaded dataset | "
+        f"dataset_id={dataset.id}"
+    )
+
     return APIResponse.success_response(data=_dataset_to_response(dataset))

As per coding guidelines, all log messages should be prefixed with the function name in square brackets.
95-114: Add structured logging for the list operation.

The list_datasets endpoint is missing structured logging, while get_dataset and delete_dataset follow the correct pattern.

📝 Suggested logging additions
 def list_datasets(
     ...
 ) -> APIResponse[list[DatasetUploadResponse]]:
     """List evaluation datasets."""
+    logger.info(
+        f"[list_datasets] Listing datasets | "
+        f"limit={limit} | offset={offset} | "
+        f"org_id={auth_context.organization_.id} | "
+        f"project_id={auth_context.project_.id}"
+    )
+
     datasets = list_evaluation_datasets(...)

+    logger.info(f"[list_datasets] Retrieved {len(datasets)} dataset(s)")
+
     return APIResponse.success_response(...)

As per coding guidelines, all log messages should be prefixed with the function name in square brackets.
backend/app/tests/api/routes/test_evaluation.py (2)
495-508: Add missing type hints.

The test function is missing type hints for parameters and return type, which violates the coding guidelines.
🔧 Proposed fix
-    def test_upload_without_authentication(self, client, valid_csv_content):
+    def test_upload_without_authentication(
+        self, client: TestClient, valid_csv_content: str
+    ) -> None:
         """Test uploading without authentication."""

As per coding guidelines, always add type hints to all function parameters and return values in Python code.
577-590: Add missing type hints.

Same issue as test_upload_without_authentication - missing type hints for parameters and return type.

🔧 Proposed fix
     def test_start_batch_evaluation_without_authentication(
-        self, client, sample_evaluation_config
-    ):
+        self, client: TestClient, sample_evaluation_config: dict[str, Any]
+    ) -> None:
         """Test batch evaluation requires authentication."""
🧹 Nitpick comments (2)
backend/app/tests/crud/evaluations/test_processing.py (2)
647-707: Same issue: mock_poll is unconfigured and unverified.

Consistent with the previous test, mock_poll is injected but not used. Either configure it to simulate poll_batch_status behavior and add assertions, or remove the patch if it's not needed for the "failed batch" scenario.
24-26: Consider adding return type hints to test methods.

Per coding guidelines, all functions should have type hints on return values. Test methods should include -> None:

    def test_parse_evaluation_output_basic(self) -> None:

This applies to all test methods in the file. This is a minor consistency improvement.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/api/routes/evaluations/dataset.py
backend/app/tests/api/routes/test_evaluation.py
backend/app/tests/api/test_auth_failures.py
backend/app/tests/crud/evaluations/test_dataset.py
backend/app/tests/crud/evaluations/test_processing.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/tests/crud/evaluations/test_dataset.py
backend/app/api/routes/evaluations/dataset.py
backend/app/tests/crud/evaluations/test_processing.py
backend/app/tests/api/test_auth_failures.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/tests/crud/evaluations/test_dataset.py
backend/app/tests/crud/evaluations/test_processing.py
backend/app/tests/api/test_auth_failures.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in backend/app/api/ organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using load_description("domain/action.md")
Files:
backend/app/api/routes/evaluations/dataset.py
🧠 Learnings (2)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Implement automatic coverage reporting for tests
Applied to files:
backend/app/tests/api/routes/test_evaluation.py
📚 Learning: 2025-12-23T03:57:06.891Z
Learnt from: nishika26
Repo: ProjectTech4DevAI/kaapi-backend PR: 488
File: backend/app/tests/seed_data/seed_data.py:344-354
Timestamp: 2025-12-23T03:57:06.891Z
Learning: In backend/app/tests/seed_data/seed_data.py, log messages should use the file name as the prefix (e.g., `[tests.seed_data]`) rather than the function name prefix. This file-based logging prefix convention is preferred for test seed data modules.
Applied to files:
backend/app/api/routes/evaluations/dataset.py
🧬 Code graph analysis (3)
backend/app/tests/api/routes/test_evaluation.py (1)
backend/app/tests/conftest.py (1)
client(65-68)
backend/app/tests/crud/evaluations/test_dataset.py (2)
backend/app/api/routes/evaluations/dataset.py (1)
delete_dataset (156-191)
backend/app/crud/evaluations/dataset.py (2)
delete_dataset (323-370)
get_dataset_by_id (107-140)
backend/app/api/routes/evaluations/dataset.py (4)
backend/app/crud/evaluations/dataset.py (3)
get_dataset_by_id (107-140)
list_datasets (175-210)
delete_dataset (323-370)
backend/app/models/evaluation.py (2)
DatasetUploadResponse (25-44)
EvaluationDataset (73-167)
backend/app/services/evaluations/dataset.py (1)
upload_dataset (24-163)
backend/app/utils.py (3)
APIResponse (33-57)
load_description (393-398)
success_response (40-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.12, 6)
🔇 Additional comments (13)
backend/app/tests/crud/evaluations/test_processing.py (1)
8-14: LGTM on the import updates.

The imports correctly reference the new app.crud.evaluations.processing module path, aligning with the refactored structure.

backend/app/tests/api/test_auth_failures.py (1)
28-29: LGTM! Trailing slashes added for datasets endpoints.

The changes correctly align the protected endpoints with the new route structure for /evaluations/datasets/.

backend/app/tests/api/routes/test_evaluation.py (3)
62-72: LGTM! Patch targets correctly updated to the new service layer paths.

The mock patches now target app.services.evaluations.dataset.*, which aligns with the refactored module structure.
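For anyone updating similar tests, the mocks now live at the service-layer path; a sketch follows, where the two patched helper names are placeholders rather than the real attributes of app/services/evaluations/dataset.py:

```python
from unittest.mock import patch

with (
    patch("app.services.evaluations.dataset.upload_to_object_store") as mock_store,  # placeholder name
    patch("app.services.evaluations.dataset.upload_to_langfuse") as mock_langfuse,   # placeholder name
):
    mock_store.return_value = "https://object-store.example/datasets/sample.csv"
    mock_langfuse.return_value = "langfuse-dataset-id"
    # ... exercise POST /evaluations/datasets/ through the test client here ...
```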
82-84: LGTM! Endpoint path correctly updated with trailing slash.
530-538: LGTM! Endpoint path correctly updated with trailing slash for the batch evaluation POST request.
backend/app/tests/crud/evaluations/test_dataset.py (4)
402-416: LGTM! The test correctly implements the new delete_dataset API contract: it captures dataset_id before deletion, passes the dataset object to delete_dataset, and verifies the dataset is removed by checking that get_dataset_by_id returns None.
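Compressed, the flow this test exercises looks roughly like the following; the dataset_factory fixture and the keyword arguments are assumptions:

```python
from sqlmodel import Session

from app.crud.evaluations.dataset import delete_dataset, get_dataset_by_id


def test_delete_dataset_removes_row(db: Session, dataset_factory) -> None:
    dataset = dataset_factory()  # factory fixture assumed, per the test conventions
    # Capture identifiers before deletion.
    dataset_id = dataset.id
    org_id, project_id = dataset.organization_id, dataset.project_id

    delete_dataset(session=db, dataset=dataset)

    assert (
        get_dataset_by_id(
            session=db,
            dataset_id=dataset_id,
            organization_id=org_id,
            project_id=project_id,
        )
        is None
    )
```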
418-434: LGTM! The test correctly validates the new pattern where the caller must fetch the dataset first. For non-existent datasets, get_dataset_by_id returns None, and the caller handles the "not found" case.
451-467: LGTM! The test correctly validates that datasets cannot be fetched (and thus deleted) using the wrong organization ID. The access control check happens at the fetch level.
499-513: LGTM! The test correctly validates that datasets with associated evaluation runs cannot be deleted. The error message check appropriately uses flexible matching for the expected phrases.
backend/app/api/routes/evaluations/dataset.py (4)
1-34: LGTM! Imports are well-organized following PEP8 guidelines, and the logger is properly configured.
37-47: LGTM! Clean helper function with proper type hints and defensive defaults for metadata fields.
123-147: LGTM! Well-implemented endpoint with proper structured logging, error handling, and type hints.
156-191: LGTM! Well-implemented endpoint with proper structured logging and error handling. Good practice extracting dataset_name before deletion (line 179) to avoid accessing attributes on a detached instance.
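The detail being praised, reduced to a standalone sketch (the function name and session handling here are illustrative, not the actual route code):

```python
import logging

from sqlmodel import Session

from app.models.evaluation import EvaluationDataset

logger = logging.getLogger(__name__)


def delete_and_log(session: Session, dataset: EvaluationDataset) -> None:
    # Capture attributes while the instance is still attached to the session.
    dataset_id, dataset_name = dataset.id, dataset.name

    session.delete(dataset)
    session.commit()

    # Safe: log the captured values, not attributes of the now-deleted object.
    logger.info(f"[delete_and_log] Deleted dataset | id={dataset_id} | name={dataset_name}")
```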
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary
Target issue is #492
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
fastapi run --reload app/main.py or docker compose up in the repository root and test.

Notes
Summary by CodeRabbit
New Features
Improvements
Tests
✏️ Tip: You can customize this high-level summary in your review settings.