@tschellenbach tschellenbach commented Dec 5, 2025

  • Inbound and outbound calling examples
  • RAG via TurboPuffer or Gemini File Search
  • RAG docs in progress

TODO

  • remove or clean up the Twilio README
  • release ulaw conversion utils
  • @d3xvn to implement tool support for gemini ✅
  • @tbarbugli - ulaw utils from stream py
  • hashing for Gemini doesn't work properly: it keeps its check in memory, but it should check against the API. When the same document is uploaded twice, it should skip the update API call. This is much faster when syncing a larger directory (1 API call instead of always 5 when uploading a directory with 5 items). @d3xvn
  • documents returned by TurboPuffer are often too small and don't include enough context (for instance, ask about our chat API). @d3xvn ✅ (The chunk size was set too low when uploading; nothing was wrong with the implementation.)
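
To illustrate the chunk-size point in the last item, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and the default sizes are hypothetical and are not taken from the TurboPuffer plugin; the point is only that a larger window (plus some overlap) gives each retrieved chunk more surrounding context.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `chunk_size` that share `overlap` characters.

    Hypothetical helper for illustration; the real plugin may chunk differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With a very small `chunk_size`, a question about the chat API may only match a fragment of a sentence; raising it trades index granularity for more context per hit.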

Summary by CodeRabbit

Release Notes

  • New Features

    • Abstract RAG framework with Document class and search/indexing interface
    • Gemini File Search RAG implementation with content deduplication
    • TurboPuffer hybrid RAG supporting vector, keyword, and combined search modes
    • Twilio phone integration for inbound/outbound calls with real-time WebSocket media streaming
    • Phone & RAG example combining voice calls with knowledge base responses
    • Gemini built-in tools support including File Search, Google Search, and Computer Use
    • Audio codec utilities for Twilio streams
    • Batch user creation support in Stream edge transport
  • Documentation

    • New guides for Phone & RAG example, TurboPuffer, and Twilio plugins


coderabbitai bot commented Dec 5, 2025

Caution: Review failed. The pull request is closed.

Walkthrough

Introduces a core RAG framework abstraction with a Document dataclass and an async base class supporting document ingestion, directory indexing, and hybrid search. Adds Gemini File Search RAG and TurboPuffer hybrid RAG (vector + BM25) implementations. Implements a new Twilio plugin with phone call management, WebSocket media streaming, and mulaw/PCM audio conversion. Provides a phone + RAG example with inbound/outbound call orchestration via AI agents.
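
As a rough picture of what such an abstraction can look like, here is a minimal sketch. Only the names `Document` and `add_directory` come from the walkthrough; the field names, the `add_documents`/`search` signatures, and the `InMemoryRAG` toy backend are assumptions made for illustration and are not the actual `vision_agents.core.rag` API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """A unit of indexable text (field names are assumed, not the PR's)."""
    text: str
    source: str = ""
    metadata: dict = field(default_factory=dict)


class RAG(ABC):
    """Async base class: backends implement indexing and search."""

    @abstractmethod
    async def add_documents(self, documents: list[Document]) -> int:
        """Index documents and return how many were added."""

    @abstractmethod
    async def search(self, query: str, top_k: int = 3) -> list[Document]:
        """Return up to `top_k` documents relevant to `query`."""

    async def add_directory(self, path, extensions=(".md", ".txt")) -> int:
        # Normalize extensions so "md", ".md" and ".MD" are treated alike.
        wanted = {e.lower() if e.startswith(".") else "." + e.lower() for e in extensions}
        docs = [
            Document(text=p.read_text(encoding="utf-8"), source=p.name)
            for p in sorted(Path(path).rglob("*"))
            if p.is_file() and p.suffix.lower() in wanted
        ]
        return await self.add_documents(docs)


class InMemoryRAG(RAG):
    """Toy backend using substring matching, only to exercise the interface."""

    def __init__(self) -> None:
        self._docs: list[Document] = []

    async def add_documents(self, documents: list[Document]) -> int:
        self._docs.extend(documents)
        return len(documents)

    async def search(self, query: str, top_k: int = 3) -> list[Document]:
        q = query.lower()
        return [d for d in self._docs if q in d.text.lower()][:top_k]
```

Keeping `add_directory` concrete in the base class means each backend only has to implement indexing and search, which matches the split the walkthrough describes.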

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Core RAG Framework | agents-core/vision_agents/core/rag/__init__.py, agents-core/vision_agents/core/rag/rag.py | New abstract RAG base class with Document dataclass; methods for document indexing, directory ingestion, search, and lifecycle management; concrete add_directory normalizes extensions and handles file discovery. |
| Gemini Plugin – File Search | plugins/gemini/vision_agents/plugins/gemini/file_search.py | Implements GeminiFilesearchRAG with store lifecycle, deduplication via content hashing, concurrent file uploads, and integration with Gemini's File Search tool for hybrid retrieval. |
| Gemini Plugin – Tools | plugins/gemini/vision_agents/plugins/gemini/tools.py | New GeminiTool abstractions for FileSearch, GoogleSearch, CodeExecution, URLContext, GoogleMaps, ComputerUse; each provides to_tool conversion with validation and configuration. |
| Gemini Plugin – Integration | plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py, plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py, plugins/gemini/vision_agents/plugins/gemini/__init__.py | Updated GeminiLLM and GeminiRealtime to accept and configure built-in Gemini tools; exports new file search and tools APIs. |
| Gemini Plugin – Tests | plugins/gemini/tests/test_gemini_file_search.py, plugins/gemini/tests/test_gemini_tools.py | Integration and unit tests for File Search deduplication, batch upload, and tool conversions with store lifecycle validation. |
| TurboPuffer Plugin | plugins/turbopuffer/vision_agents/plugins/turbopuffer/turbopuffer_rag.py, plugins/turbopuffer/vision_agents/plugins/turbopuffer/__init__.py, plugins/turbopuffer/pyproject.toml, plugins/turbopuffer/README.md, plugins/turbopuffer/tests/test_turbopuffer_rag.py | Full hybrid RAG implementation using Reciprocal Rank Fusion to merge vector (Gemini embeddings) and BM25 search results; includes directory indexing, result caching, and convenience factory. |
| Twilio Plugin – Core | plugins/twilio/vision_agents/plugins/twilio/audio.py, plugins/twilio/vision_agents/plugins/twilio/call_registry.py, plugins/twilio/vision_agents/plugins/twilio/media_stream.py, plugins/twilio/vision_agents/plugins/twilio/models.py, plugins/twilio/vision_agents/plugins/twilio/utils.py | Twilio integration with mulaw/PCM audio conversion (lookup tables + fallback encoders), in-memory call registry with lifecycle tracking, WebSocket media stream handler, signature verification, and TwiML generation. |
| Twilio Plugin – Package & Tests | plugins/twilio/vision_agents/plugins/twilio/__init__.py, plugins/twilio/pyproject.toml, plugins/twilio/README.md, plugins/twilio/tests/test_audio.py, plugins/twilio/tests/test_twilio.py | Public API aggregation, project configuration, comprehensive test coverage for audio round-trip conversion and call registry operations. |
| Phone + RAG Example | examples/03_phone_and_rag_example/inbound_phone_and_rag_example.py, examples/03_phone_and_rag_example/outbound_phone_example.py, examples/03_phone_and_rag_example/utils.py, examples/03_phone_and_rag_example/pyproject.toml, examples/03_phone_and_rag_example/conftest.py, examples/03_phone_and_rag_example/pytest.ini | FastAPI servers demonstrating inbound/outbound Twilio call handling with RAG backend selection (Gemini File Search or TurboPuffer), agent orchestration, real-time audio streaming, and call lifecycle management. |
| Example Knowledge Base | examples/03_phone_and_rag_example/knowledge/chat.md, examples/03_phone_and_rag_example/knowledge/feeds.md, examples/03_phone_and_rag_example/knowledge/moderation.md, examples/03_phone_and_rag_example/knowledge/needle_in_haystack.md, examples/03_phone_and_rag_example/knowledge/video.md, examples/03_phone_and_rag_example/instructions.md, examples/03_phone_and_rag_example/README.md | Documentation files for Stream product features (Chat, Feeds, Moderation, Video) and example instructions; not code logic. |
| GetStream & Root Config Updates | plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py, plugins/getstream/pyproject.toml, examples/02_golf_coach_example/pyproject.toml, pyproject.toml | Added batch create_users method to StreamEdge; bumped getstream dependency to >=2.5.20; tightened Python version constraints in examples; adjusted blockbuster dev dependency floor. |
| Root Tests | tests/test_mulaw_conversion.py | Validates mulaw↔PCM round-trip conversion, resampling behavior, and speech signal preservation. |
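
The TurboPuffer row mentions Reciprocal Rank Fusion for merging vector and BM25 results. A minimal sketch of the general technique follows; the function name, the use of plain string IDs, and the constant `k = 60` (a commonly used default) are assumptions here and may not match what `turbopuffer_rag.py` does.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Example: one list from vector search, one from BM25 keyword search.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF only uses ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales.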

Sequence Diagram(s)

sequenceDiagram
    actor User as Phone User
    participant Twilio as Twilio SIP
    participant FastAPI as FastAPI Server
    participant WS as WebSocket
    participant Agent as AI Agent
    participant RAG as RAG Backend
    
    User->>Twilio: Initiates call
    Twilio->>FastAPI: POST /twilio/voice (with signature)
    FastAPI->>FastAPI: Validate signature & create call
    FastAPI->>Twilio: Return TwiML with WebSocket URL
    
    Twilio->>WS: Connect to /twilio/media/{call_id}/{token}
    WS->>FastAPI: WebSocket established
    FastAPI->>Agent: Initialize agent session
    FastAPI->>Agent: Greet caller
    
    User-->>Twilio: Speaks (mulaw audio)
    Twilio->>WS: Send media message (base64 mulaw)
    WS->>FastAPI: Decode mulaw → PCM
    FastAPI->>Agent: Feed PCM audio to agent
    
    Agent->>RAG: Query knowledge base
    RAG-->>Agent: Return search results
    Agent->>Agent: Generate response
    Agent-->>FastAPI: PCM audio response
    
    FastAPI->>WS: Encode PCM → mulaw
    WS->>Twilio: Send media message (base64 mulaw)
    Twilio->>User: Play audio
    
    Note over User,FastAPI: Real-time streaming loop continues
    
    User->>Twilio: Ends call
    Twilio->>WS: Send stop message
    WS->>FastAPI: Close connection
    FastAPI->>FastAPI: Clean up call registry
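
The "Validate signature" step in the diagram above can be sketched with the standard library alone. This follows Twilio's publicly documented webhook scheme as I understand it (HMAC-SHA1 over the full request URL followed by the POST parameters sorted by name, then base64); the function names are hypothetical and the plugin's own verification helper may differ. In production, the `RequestValidator` class from the official `twilio` package is likely the safer choice.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Build the value expected in the X-Twilio-Signature header."""
    # Append each POST parameter as name + value, sorted by parameter name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], header_signature: str
) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_signature)
```

The URL must be the exact public URL Twilio called (scheme, host, path, query string), which matters when the FastAPI server sits behind a tunnel or reverse proxy.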
sequenceDiagram
    participant Example as Outbound Example
    participant Twilio as Twilio Client
    participant FastAPI as FastAPI Server
    participant WS as WebSocket
    participant Agent as AI Agent
    participant User as Phone User
    
    Example->>Twilio: Initiate outbound call
    Twilio->>Twilio: Dial phone (from → to)
    Twilio->>FastAPI: Connect to WebSocket URL
    
    WS->>FastAPI: WebSocket established
    FastAPI->>Agent: Prepare agent & user
    FastAPI->>Agent: Attach phone user to call
    
    Agent->>Agent: Run agent session
    Agent->>User: Start greeting
    
    User-->>Twilio: Responds (mulaw audio)
    Twilio->>WS: Media message (mulaw)
    WS->>FastAPI: Decode → PCM
    FastAPI->>Agent: Feed to agent session
    
    Agent->>Agent: Process input & generate response
    Agent-->>FastAPI: PCM output
    
    FastAPI->>WS: Encode → mulaw
    WS->>Twilio: Media message
    Twilio->>User: Play audio
    
    Note over Twilio,Agent: Streaming continues until end
    
    User->>Twilio: Ends call
    Twilio->>WS: Stop signal
    WS->>FastAPI: Close & cleanup
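
Both diagrams rely on decoding mulaw to PCM and encoding it back. Below is a minimal pure-Python sketch of standard G.711 mu-law conversion with a decode lookup table. The function names are hypothetical and this is not the plugin's `audio.py`; resampling between Twilio's 8 kHz stream and the agent's sample rate is deliberately left out.

```python
import struct

BIAS = 0x84   # 132, added before finding the segment (exponent)
CLIP = 32635  # largest magnitude that can be encoded after biasing


def ulaw_byte_to_linear(u: int) -> int:
    """Decode one mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if sign else sample


def linear_to_ulaw_byte(sample: int) -> int:
    """Encode one signed 16-bit PCM sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


# Decoding is a pure function of the byte, so precompute all 256 values.
_DECODE_TABLE = [ulaw_byte_to_linear(b) for b in range(256)]


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Convert mu-law bytes to little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(data)}h", *(_DECODE_TABLE[b] for b in data))


def pcm16_to_mulaw(data: bytes) -> bytes:
    """Convert little-endian 16-bit PCM bytes to mu-law bytes."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return bytes(linear_to_ulaw_byte(s) for s in samples)
```

Mu-law is lossy, so a round trip preserves the decoded value of each byte rather than the original PCM sample; a round-trip test is best phrased as decode, encode, decode.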

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Update for Gemini 3 #203: Modifies plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py for tools/config handling, overlapping with this PR's Gemini LLM enhancements.

Suggested labels

cli

Suggested reviewers

  • dangusev

Poem

In wax and wire, the mouth is made—
Twilio's voice through void conveys
a fractured song in eight-bit chains.
The RAG learns, speaks, forgets,
while mulaw bleeds to PCM.
A phone call's fever dream encoded.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5c966d3 and 78200fd.

⛔ Files ignored due to path filters (2)
  • examples/04_football_commentator_example/images/sam3_example_highlighting.png is excluded by !**/*.png
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (53)
  • agents-core/vision_agents/core/rag/__init__.py
  • agents-core/vision_agents/core/rag/rag.py
  • examples/02_golf_coach_example/pyproject.toml
  • examples/03_phone_and_rag_example/README.md
  • examples/03_phone_and_rag_example/__init__.py
  • examples/03_phone_and_rag_example/conftest.py
  • examples/03_phone_and_rag_example/inbound_phone_and_rag_example.py
  • examples/03_phone_and_rag_example/instructions.md
  • examples/03_phone_and_rag_example/knowledge/chat.md
  • examples/03_phone_and_rag_example/knowledge/feeds.md
  • examples/03_phone_and_rag_example/knowledge/moderation.md
  • examples/03_phone_and_rag_example/knowledge/needle_in_haystack.md
  • examples/03_phone_and_rag_example/knowledge/video.md
  • examples/03_phone_and_rag_example/outbound_phone_example.py
  • examples/03_phone_and_rag_example/pyproject.toml
  • examples/03_phone_and_rag_example/pytest.ini
  • examples/03_phone_and_rag_example/utils.py
  • examples/04_football_commentator_example/README.md
  • examples/04_football_commentator_example/RUNNING_THE_EXAMPLE.md
  • examples/04_football_commentator_example/__init__.py
  • examples/04_football_commentator_example/env.example
  • examples/04_football_commentator_example/football_commentator_example.py
  • examples/04_football_commentator_example/instructions.md
  • examples/04_football_commentator_example/utils.py
  • plugins/gemini/tests/test_gemini_file_search.py
  • plugins/gemini/tests/test_gemini_tools.py
  • plugins/gemini/vision_agents/plugins/gemini/__init__.py
  • plugins/gemini/vision_agents/plugins/gemini/file_search.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
  • plugins/gemini/vision_agents/plugins/gemini/tools.py
  • plugins/getstream/pyproject.toml
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/turbopuffer/README.md
  • plugins/turbopuffer/py.typed
  • plugins/turbopuffer/pyproject.toml
  • plugins/turbopuffer/tests/test_turbopuffer_rag.py
  • plugins/turbopuffer/vision_agents/plugins/turbopuffer/__init__.py
  • plugins/turbopuffer/vision_agents/plugins/turbopuffer/turbopuffer_rag.py
  • plugins/twilio/README.md
  • plugins/twilio/py.typed
  • plugins/twilio/pyproject.toml
  • plugins/twilio/tests/__init__.py
  • plugins/twilio/tests/test_audio.py
  • plugins/twilio/tests/test_twilio.py
  • plugins/twilio/vision_agents/plugins/twilio/__init__.py
  • plugins/twilio/vision_agents/plugins/twilio/audio.py
  • plugins/twilio/vision_agents/plugins/twilio/call_registry.py
  • plugins/twilio/vision_agents/plugins/twilio/media_stream.py
  • plugins/twilio/vision_agents/plugins/twilio/models.py
  • plugins/twilio/vision_agents/plugins/twilio/utils.py
  • pyproject.toml
  • tests/test_mulaw_conversion.py

tschellenbach and others added 16 commits December 5, 2025 15:50
- Add tools.py with wrapper classes for all 6 Gemini tools:
  - FileSearch: RAG over documents
  - GoogleSearch: Ground responses with web data
  - CodeExecution: Run Python code
  - URLContext: Read specific web pages
  - GoogleMaps: Location-aware queries
  - ComputerUse: Browser automation

- Replace hardcoded file_search_store param with generic tools list
- Update _build_config() to handle multiple tools
- Update phone example to use new tools API
- Add unit tests for all tool wrappers

Usage:
  llm = gemini.LLM(tools=[
      gemini.tools.FileSearch(store),
      gemini.tools.GoogleSearch(),
      gemini.tools.CodeExecution(),
  ])
- Reuse existing stores with the same display_name instead of creating new ones
- Store content hash (SHA-256) in document custom_metadata for persistence
- Load existing hashes from API on startup to skip duplicate uploads
- Works across app restarts: same content = skipped, regardless of filename
- Update tests to use unique store names to avoid interference
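
The deduplication behaviour described in this commit message can be sketched as follows. The class name `HashDeduplicator` and its methods are hypothetical; in the real plugin the known hashes would be loaded from document `custom_metadata` via the Gemini API on startup, which is represented here by the `known_hashes` argument.

```python
import hashlib
from typing import Iterable


class HashDeduplicator:
    """Skip uploads whose content hash has already been seen."""

    def __init__(self, known_hashes: Iterable[str] = ()) -> None:
        # Assumed to be seeded from hashes persisted alongside stored documents.
        self._seen: set[str] = set(known_hashes)

    @staticmethod
    def content_hash(content: bytes) -> str:
        """SHA-256 of the raw bytes, so the filename plays no role."""
        return hashlib.sha256(content).hexdigest()

    def should_upload(self, content: bytes) -> bool:
        """Return True and record the hash if this content is new."""
        digest = self.content_hash(content)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Seeding the set from persisted hashes is what makes the check survive an app restart; an in-memory set alone would re-upload everything on the next run.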
@github-actions github-actions bot added the assets label Jan 6, 2026
@tschellenbach tschellenbach marked this pull request as ready for review January 6, 2026 19:07
@tschellenbach tschellenbach merged commit 8b35466 into main Jan 6, 2026
5 of 10 checks passed
@tschellenbach tschellenbach deleted the rag_and_sip branch January 6, 2026 19:07
4 participants