
Conversation

Contributor

@are-ces are-ces commented Jan 28, 2026

Description

  • now extracting doc_title, doc_url, and doc_id from the "attributes" in the response
  • added doc_id to keep a link between rag_chunks and referenced_docs
  • removed chunk metadata extraction from results that are not of "type": "file_search_call"
  • deleted a deprecated method in the old query endpoint

new:

  • updated the BYOK config generation

Important change

An additional field, doc_id, has been added to each entry of referenced_documents. It points to the OpenAI Files object created by llama-stack.

E.g.

"referenced_documents":` [
    {
      "doc_url": "https://www.redhat.com/openshift_docs/virt/about_virt/about-virt.txt",
      "doc_title": "About OpenShift Virtualization",
      "doc_id": "file-6376556fe5b8435d830140109b73834e"
    }
],
"rag_chunks": [
    {
      "content": "# About OpenShift [...]",
      "source": "file-6376556fe5b8435d830140109b73834e",
      "score": 2.5908213494396537
    }
]

The reason is to keep a link between referenced_documents and rag_chunks in the query endpoint.
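For instance, a client consuming the response above could join the two arrays on doc_id (a sketch using the example payload from this description; not part of the service code):

```python
# Join rag_chunks to referenced_documents via the shared doc_id,
# using the example payload shown above.
referenced_documents = [
    {
        "doc_url": "https://www.redhat.com/openshift_docs/virt/about_virt/about-virt.txt",
        "doc_title": "About OpenShift Virtualization",
        "doc_id": "file-6376556fe5b8435d830140109b73834e",
    }
]
rag_chunks = [
    {
        "content": "# About OpenShift [...]",
        "source": "file-6376556fe5b8435d830140109b73834e",
        "score": 2.5908213494396537,
    }
]

docs_by_id = {doc["doc_id"]: doc for doc in referenced_documents}
for chunk in rag_chunks:
    doc = docs_by_id.get(chunk["source"])  # chunk.source carries the doc_id
    title = doc["doc_title"] if doc else "unknown"
    print(f"{title} (score={chunk['score']:.2f})")
```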

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue # LCORE-1179
  • Closes # LCORE-1179

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

I have registered a vector store, called the /v1/query and /v1/streaming_query endpoints, and made sure that the referenced_docs field is well structured.

E.g.

"referenced_documents":` [
    {
      "doc_url": "https://www.redhat.com/openshift_docs/virt/about_virt/about-virt.txt",
      "doc_title": "About OpenShift Virtualization",
      "doc_id": "file-6376556fe5b8435d830140109b73834e"
    },

Summary by CodeRabbit

  • Improvements

    • Referenced documents now include a doc_id for more reliable identification; extraction consistently captures title, URL and ID.
    • BYOK RAG config key renamed to vector_store_id for clearer storage naming and backends.
  • Documentation

    • API docs updated to expose the new doc_id field and expanded endpoint request/context parameters.
  • Tests

    • Unit tests updated to reflect the new document schema, BYOK naming, and configuration-driven storage behavior.


@are-ces are-ces requested review from asimurka and tisnik January 28, 2026 15:57
Contributor

coderabbitai bot commented Jan 28, 2026

Walkthrough

Removed TextContentItem metadata parsing from query.py and consolidated referenced-document extraction into query_v2.py; added an optional doc_id field to ReferencedDocument and propagated that change through tests, docs, cache expectations, and BYOK/llama-stack configuration helpers (vector_db_id → vector_store_id and new storage/providers/models sections).

Changes

  • Query endpoints: src/app/endpoints/query.py, src/app/endpoints/query_v2.py
    Removed parse_metadata_from_text_item and related imports from query.py. query_v2.py now builds ReferencedDocument with doc_url, doc_title, and doc_id, unifying extraction paths and using seen-doc identity (doc_url, doc_title, doc_id). Message-annotation/file-citation branching consolidated.
  • Data model: src/models/responses.py, docs/openapi.json, docs/openapi.md
    Added doc_id: Optional[str] to the public ReferencedDocument model/schema; updated OpenAPI docs and schema references accordingly.
  • BYOK / Llama Stack config: src/llama_stack_configuration.py, src/models/config.py, tests/configuration/run.yaml
    Renamed vector_db_id → vector_store_id in ByokRag; replaced construct_vector_dbs_section with construct_storage_backends_section; added construct_vector_stores_section and construct_models_section and updated construct_vector_io_providers_section; enrich_byok_rag now populates storage.backends, providers.vector_io, registered_resources.vector_stores, and registered_resources.models.
  • Tests (unit & config): tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_query_v2.py, tests/unit/cache/test_postgres_cache.py, tests/unit/models/config/*, tests/unit/test_llama_stack_configuration.py, tests/unit/models/config/test_dump_configuration.py
    Removed tests for parse_metadata_from_text_item; updated expectations to include doc_id (often None) and changed file-search result mappings to docs_url/title/document_id. Adjusted tests to the vector_store_id rename and added assertions for generated storage/providers/models entries.
  • Config / Manifests: tests/configuration/run.yaml
    Large config changes to use storage-backed providers, new storage/backend/provider sections, updated vector_io persistence, added registered_resources skeleton and telemetry; aligns with new BYOK handling and vector_store_id naming.
  • Docs / OpenAPI changes: docs/openapi.json, docs/openapi.md
    Expanded /v1/infer documented inputs (FastAPI request, background_tasks), added doc_id to the ReferencedDocument schema, and renamed vector_db_id → vector_store_id in the ByokRag schema and references.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
  • asimurka
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately and specifically describes the main change: updating metadata extraction from chunks to include doc_id and refining how metadata is derived from file_search_call responses.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/query_v2.py (1)

547-588: Include doc_id in de-duplication to avoid dropping distinct documents.
If two documents share title/URL (or URL is missing) but have different doc_id values, the current (final_url, doc_title) key will collapse them, which undermines the new doc_id linkage.

✅ Suggested fix
-    # Use a set to track unique documents by (doc_url, doc_title) tuple
-    seen_docs: set[tuple[Optional[str], Optional[str]]] = set()
+    # Use a set to track unique documents by (doc_id, doc_url, doc_title) tuple
+    seen_docs: set[tuple[Optional[str], Optional[str], Optional[str]]] = set()
@@
-                if (final_url, doc_title) not in seen_docs:
+                unique_key = (doc_id, final_url, doc_title)
+                if unique_key not in seen_docs:
                     documents.append(
                         ReferencedDocument(
                             doc_url=final_url, doc_title=doc_title, doc_id=doc_id
                         )
                     )
-                    seen_docs.add((final_url, doc_title))
+                    seen_docs.add(unique_key)
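A minimal sketch of the failure mode described above (document values are hypothetical):

```python
# Two distinct documents that share a title (and lack URLs)
# but carry different doc_ids. Values are hypothetical.
docs = [
    {"doc_title": "About Virt", "doc_url": None, "doc_id": "file-aaa"},
    {"doc_title": "About Virt", "doc_url": None, "doc_id": "file-bbb"},
]

# Keying only on (doc_url, doc_title) collapses the two into one entry...
two_field_keys = {(d["doc_url"], d["doc_title"]) for d in docs}

# ...while including doc_id in the key preserves both documents.
three_field_keys = {(d["doc_id"], d["doc_url"], d["doc_title"]) for d in docs}

print(len(two_field_keys), len(three_field_keys))  # 1 vs 2
```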

@are-ces are-ces requested a review from onmete January 28, 2026 16:15
# 2. Parse from message content annotations
Contributor

@asimurka asimurka Jan 29, 2026

What is the reason for dropping this section?
Check also the OpenAIResponseOutputMessageContent type that can be part of OpenAIResponseMessage content. Isn't this also relevant? More specifically, the OpenAIResponseAnnotations object.

Contributor Author

We already get the references to the chunks used for the response.

Annotations are useful if we want to determine exactly which files the model used to give the answer (as determined by the model) and at which position in the text.

Personally I think we could use this data, but right now it is not needed, because the scope of referenced_docs is to keep data on all chunks retrieved and used as input for reference. The data we get from citations is redundant, since it is a small subset of these docs.

I think we can make use of citations in a separate feature, after agreement across our customers, since it adds new functionality.

Contributor

@asimurka asimurka left a comment

LGTM

@are-ces are-ces marked this pull request as draft January 30, 2026 09:31
@are-ces are-ces force-pushed the fix-metadata branch 2 times, most recently from 2dd8162 to 88ed424 Compare January 30, 2026 12:05
- now extracting doc_title, doc_url, doc_id from the "attributes" in response
- added doc_id to keep a link between rag_chunks and referenced_docs
- removed chunk metadata extraction from results which are not "type": "file_search_call"
- citations could be added in future, needs discussion with other teams
@are-ces are-ces marked this pull request as ready for review January 30, 2026 12:10
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/openapi.md`:
- Around line 5324-5328: The ReferencedDocument markdown table pipes are
misaligned with the header row causing MD060; fix by aligning the pipe
separators vertically and ensuring the header separator row matches the number
of columns and header styles—update the table containing the fields doc_url,
doc_title, and doc_id so the header row "| Field | Type | Description |" and the
separator row (the row of dashes) have matching columns and properly placed
pipes, and then ensure each data row (the lines with doc_url, doc_title, doc_id)
uses the same pipe alignment and column count.

In `@src/app/endpoints/query_v2.py`:
- Around line 561-566: The code sets attributes = result.get("attributes", {})
or uses getattr(result, "attributes", {}), but if the key/attribute exists with
value None the default isn't applied and attributes becomes None causing
attributes.get(...) to raise AttributeError; change the assignment so attributes
is always a dict by applying the "or {}" pattern after both retrievals (i.e.,
ensure attributes = result.get("attributes", {}) or {} and attributes =
getattr(result, "attributes", {}) or {}), referencing the variables result and
attributes and the subsequent attributes.get(...) usage to locate where to apply
the fix.
- Around line 575-586: The current filter only appends ReferencedDocument when
doc_title or doc_url exists, which drops entries that only have doc_id; change
the if-condition to check any of doc_title, doc_url, or doc_id so documents with
only a document_id are preserved. Keep the existing final_url normalization
(treat empty string as None), use the same dedupe tuple (final_url, doc_title,
doc_id) against seen_docs, and append ReferencedDocument(doc_url=final_url,
doc_title=doc_title, doc_id=doc_id) when that tuple is not in seen_docs.
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

503-507: Consider using doc_id as RAGChunk.source (with filename fallback).

If clients are expected to join rag_chunks with referenced_documents via doc_id, source should carry that ID.

🔄 Suggested linkage-friendly source
-            rag_chunk = RAGChunk(
-                content=result.text, source=result.filename, score=result.score
-            )
+            attributes = getattr(result, "attributes", None) or {}
+            doc_id = None
+            if isinstance(attributes, dict):
+                doc_id = attributes.get("document_id") or attributes.get("doc_id")
+            source = doc_id or result.filename
+            rag_chunk = RAGChunk(content=result.text, source=source, score=result.score)

Comment on lines +575 to +586

             doc_title = attributes.get("title")
             doc_id = attributes.get("document_id") or attributes.get("doc_id")

-            # If we have at least a filename or url
-            if filename or doc_url:
+            if doc_title or doc_url:
                 # Treat empty string as None for URL to satisfy Optional[AnyUrl]
                 final_url = doc_url if doc_url else None
-                if (final_url, filename) not in seen_docs:
+                if (final_url, doc_title, doc_id) not in seen_docs:
                     documents.append(
-                        ReferencedDocument(doc_url=final_url, doc_title=filename)
+                        ReferencedDocument(
+                            doc_url=final_url, doc_title=doc_title, doc_id=doc_id
+                        )
                     )
-                    seen_docs.add((final_url, filename))
+                    seen_docs.add((final_url, doc_title, doc_id))

             # 2. Parse from message content annotations
             elif item_type == "message":
                 content = getattr(output_item, "content", None)
                 if isinstance(content, list):
                     for part in content:
                         # Skip if part is a string or doesn't have annotations
                         if isinstance(part, str):
                             continue

                         annotations = getattr(part, "annotations", []) or []
                         for annotation in annotations:
                             # Handle both object and dict access for annotations
                             if isinstance(annotation, dict):
                                 anno_type = annotation.get("type")
                                 anno_url = annotation.get("url")
                                 anno_title = annotation.get("title") or annotation.get(
                                     "filename"
                                 )
                             else:
                                 anno_type = getattr(annotation, "type", None)
                                 anno_url = getattr(annotation, "url", None)
                                 anno_title = getattr(annotation, "title", None) or getattr(
                                     annotation, "filename", None
                                 )

                             if anno_type == "url_citation":
                                 # Treat empty string as None
                                 final_url = anno_url if anno_url else None
                                 if (final_url, anno_title) not in seen_docs:
                                     documents.append(
                                         ReferencedDocument(
                                             doc_url=final_url, doc_title=anno_title
                                         )
                                     )
                                     seen_docs.add((final_url, anno_title))

                             elif anno_type == "file_citation":
                                 if (None, anno_title) not in seen_docs:
                                     documents.append(
                                         ReferencedDocument(
                                             doc_url=None, doc_title=anno_title
                                         )
                                     )
                                     seen_docs.add((None, anno_title))
Contributor

@coderabbitai coderabbitai bot Jan 30, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code around lines 575-586 in query_v2.py
cat -n src/app/endpoints/query_v2.py | sed -n '570,590p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 1147


🏁 Script executed:

# Let's also check the ReferencedDocument class definition to understand its structure
rg -A 10 'class ReferencedDocument' src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 708


🏁 Script executed:

# Let's see more context around this code section to understand the full logic
cat -n src/app/endpoints/query_v2.py | sed -n '550,600p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 2444


🏁 Script executed:

# Get full ReferencedDocument class definition
cat -n src/models/responses.py | grep -A 20 'class ReferencedDocument'

Repository: lightspeed-core/lightspeed-stack

Length of output: 840


🏁 Script executed:

# Search for where ReferencedDocument is used to understand the intent
rg 'ReferencedDocument' src/ --type py -B 2 -A 2

Repository: lightspeed-core/lightspeed-stack

Length of output: 10260


🏁 Script executed:

# Check if there are any tests that show expected behavior with doc_id-only documents
fd 'test.*query' src/ --type py | head -5

Repository: lightspeed-core/lightspeed-stack

Length of output: 245


Include doc_id-only documents to prevent data loss.

Documents with only document_id but no title or url are currently filtered out. Since doc_id is explicitly extracted and used in deduplication, it should be preserved. Update the condition to include doc_id:

Suggested fix
-                if doc_title or doc_url:
+                if doc_title or doc_url or doc_id:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

             doc_title = attributes.get("title")
             doc_id = attributes.get("document_id") or attributes.get("doc_id")

-            if doc_title or doc_url:
+            if doc_title or doc_url or doc_id:
                 # Treat empty string as None for URL to satisfy Optional[AnyUrl]
                 final_url = doc_url if doc_url else None
                 if (final_url, doc_title, doc_id) not in seen_docs:
                     documents.append(
                         ReferencedDocument(
                             doc_url=final_url, doc_title=doc_title, doc_id=doc_id
                         )
                     )

(The remainder of the annotation-parsing branch is unchanged.)

Contributor Author

referenced_docs are used because we need to show in the frontend which documents have been referenced; IMO it does not make sense to return empty documents without a Title / Url.

Contributor

@are-ces, that makes perfect sense! You're right that a document reference with only an ID wouldn't provide meaningful information for the frontend to display. The current condition ensures that only documents with at least a title or URL—something actionable and human-readable—are returned to the UI. Thank you for the clarification!


✏️ Learnings added
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:34.749Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.


@are-ces are-ces marked this pull request as draft January 30, 2026 13:55
@are-ces are-ces marked this pull request as ready for review January 30, 2026 16:10
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

4287-4312: ⚠️ Potential issue | 🟠 Major

Ensure unique operationId values for GET vs POST.
Line 4293 and Line 4311 both use handle_a2a_jsonrpc_a2a_post, which can break client codegen (operationId must be unique). Consider distinct IDs per method.

🔧 Suggested fix
-                "operationId": "handle_a2a_jsonrpc_a2a_post",
+                "operationId": "handle_a2a_jsonrpc_a2a_get",
🧹 Nitpick comments (2)
src/llama_stack_configuration.py (1)

310-347: Avoid in-place mutation of ls_config in enrich_byok_rag.
Returning a new mapping reduces side‑effects and aligns with the project guideline.

♻️ Proposed refactor to avoid in-place mutation
-def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> None:
+def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> dict[str, Any]:
     """Enrich Llama Stack config with BYOK RAG settings.

     Args:
-        ls_config: Llama Stack configuration dict (modified in place)
+        ls_config: Llama Stack configuration dict
         byok_rag: List of BYOK RAG configurations
+    Returns:
+        dict[str, Any]: Enriched configuration mapping.
     """
-    if len(byok_rag) == 0:
-        logger.info("BYOK RAG is not configured: skipping")
-        return
+    if len(byok_rag) == 0:
+        logger.info("BYOK RAG is not configured: skipping")
+        return ls_config
+
+    output = {**ls_config}

     logger.info("Enriching Llama Stack config with BYOK RAG")

     # Add storage backends
-    if "storage" not in ls_config:
-        ls_config["storage"] = {}
-    ls_config["storage"]["backends"] = construct_storage_backends_section(
-        ls_config, byok_rag
-    )
+    storage = {**output.get("storage", {})}
+    storage["backends"] = construct_storage_backends_section(output, byok_rag)
+    output["storage"] = storage

     # Add vector_io providers
-    if "providers" not in ls_config:
-        ls_config["providers"] = {}
-    ls_config["providers"]["vector_io"] = construct_vector_io_providers_section(
-        ls_config, byok_rag
-    )
+    providers = {**output.get("providers", {})}
+    providers["vector_io"] = construct_vector_io_providers_section(output, byok_rag)
+    output["providers"] = providers

     # Add registered vector stores
-    if "registered_resources" not in ls_config:
-        ls_config["registered_resources"] = {}
-    ls_config["registered_resources"]["vector_stores"] = (
-        construct_vector_stores_section(ls_config, byok_rag)
-    )
-
-    # Add embedding models
-    ls_config["registered_resources"]["models"] = construct_models_section(
-        ls_config, byok_rag
-    )
+    registered_resources = {**output.get("registered_resources", {})}
+    registered_resources["vector_stores"] = construct_vector_stores_section(
+        output, byok_rag
+    )
+    registered_resources["models"] = construct_models_section(output, byok_rag)
+    output["registered_resources"] = registered_resources
+
+    return output
🔁 Caller update
-    enrich_byok_rag(ls_config, config.get("byok_rag", []))
+    ls_config = enrich_byok_rag(ls_config, config.get("byok_rag", []))

As per coding guidelines: Avoid in-place parameter modification anti-patterns; instead return new data structures.
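The guideline in miniature: return an enriched copy instead of mutating the argument. The names below are illustrative, not the project's actual helpers:

```python
from typing import Any


def enrich(config: dict[str, Any]) -> dict[str, Any]:
    """Return a new mapping with a storage.backends entry, leaving the input untouched."""
    # Shallow-copy the top level and the nested "storage" section before editing.
    storage = {**config.get("storage", {}), "backends": ["kv_default"]}
    return {**config, "storage": storage}


original = {"version": 2}
updated = enrich(original)

assert original == {"version": 2}  # caller's dict is unchanged
assert updated["storage"]["backends"] == ["kv_default"]
```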

docs/openapi.json (1)

8434-8451: Clarify when doc_id will actually appear.
The schema allows doc_id alone, but the API filters out entries lacking doc_title/doc_url. A short note will prevent consumer confusion.

📝 Suggested doc clarification
-                "description": "Model representing a document referenced in generating a response.\n\nAttributes:\n    doc_url: Url to the referenced doc.\n    doc_title: Title of the referenced doc.\n    doc_id: ID of the referenced doc."
+                "description": "Model representing a document referenced in generating a response.\n\nAttributes:\n    doc_url: Url to the referenced doc.\n    doc_title: Title of the referenced doc.\n    doc_id: ID of the referenced doc.\n\nNote: referenced documents are only emitted when doc_title or doc_url is present; doc_id-only entries are filtered."

Based on learnings: In src/app/endpoints/query_v2.py, the parse_referenced_documents_from_responses_api function filters referenced documents to include only those with doc_title or doc_url because these documents are displayed in the frontend. Documents with only doc_id are intentionally excluded as they wouldn't provide useful information to end users.

Contributor

@asimurka asimurka left a comment

The update LGTM
