Phone & RAG #239
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Introduces a core RAG framework abstraction with a Document dataclass and an async base class supporting document ingestion, directory indexing, and hybrid search. Adds Gemini File Search RAG and TurboPuffer hybrid RAG (vector + BM25) implementations. Implements a new Twilio plugin with phone call management, WebSocket media streaming, and mulaw/PCM audio conversion. Provides a phone+RAG example with inbound/outbound call orchestration via AI agents.
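For readers skimming the walkthrough, here is a rough sketch of what a Document dataclass plus an async RAG base class could look like. Everything in it is an assumption made for illustration: the field names, `BaseRAG`, `add_documents`, `add_directory`, `search`, and the toy `KeywordRAG` backend are hypothetical and are not taken from the PR's code.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """A unit of text to index (field names are assumptions)."""
    text: str
    source: str = ""
    metadata: dict = field(default_factory=dict)


class BaseRAG(ABC):
    """Hypothetical async base class; backends implement add_documents and search."""

    @abstractmethod
    async def add_documents(self, documents: list[Document]) -> int:
        """Index documents and return how many were added."""

    @abstractmethod
    async def search(self, query: str, top_k: int = 3) -> list[Document]:
        """Return up to top_k documents relevant to the query."""

    async def add_directory(self, path: str, pattern: str = "*.md") -> int:
        """Read every matching file under path and index it."""
        docs = [
            Document(text=p.read_text(encoding="utf-8"), source=p.name)
            for p in sorted(Path(path).glob(pattern))
        ]
        return await self.add_documents(docs)


class KeywordRAG(BaseRAG):
    """Toy in-memory backend that ranks documents by shared-word count."""

    def __init__(self) -> None:
        self._docs: list[Document] = []

    async def add_documents(self, documents: list[Document]) -> int:
        self._docs.extend(documents)
        return len(documents)

    async def search(self, query: str, top_k: int = 3) -> list[Document]:
        words = set(query.lower().split())
        scored = [(len(words & set(d.text.lower().split())), d) for d in self._docs]
        scored = [(s, d) for s, d in scored if s > 0]  # drop non-matching docs
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:top_k]]
```

A real backend would replace the word-overlap scoring with vector and BM25 retrieval; the point here is only the shape of the interface that both implementations could share.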
Sequence Diagram(s)
sequenceDiagram
actor User as Phone User
participant Twilio as Twilio SIP
participant FastAPI as FastAPI Server
participant WS as WebSocket
participant Agent as AI Agent
participant RAG as RAG Backend
User->>Twilio: Initiates call
Twilio->>FastAPI: POST /twilio/voice (with signature)
FastAPI->>FastAPI: Validate signature & create call
FastAPI->>Twilio: Return TwiML with WebSocket URL
Twilio->>WS: Connect to /twilio/media/{call_id}/{token}
WS->>FastAPI: WebSocket established
FastAPI->>Agent: Initialize agent session
FastAPI->>Agent: Greet caller
User-->>Twilio: Speaks (mulaw audio)
Twilio->>WS: Send media message (base64 mulaw)
WS->>FastAPI: Decode mulaw → PCM
FastAPI->>Agent: Feed PCM audio to agent
Agent->>RAG: Query knowledge base
RAG-->>Agent: Return search results
Agent->>Agent: Generate response
Agent-->>FastAPI: PCM audio response
FastAPI->>WS: Encode PCM → mulaw
WS->>Twilio: Send media message (base64 mulaw)
Twilio->>User: Play audio
Note over User,FastAPI: Real-time streaming loop continues
User->>Twilio: Ends call
Twilio->>WS: Send stop message
WS->>FastAPI: Close connection
FastAPI->>FastAPI: Clean up call registry
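The first two server-side steps in the inbound flow (validate the signature, return TwiML with a WebSocket URL) can be sketched as follows. This is a minimal, framework-free illustration rather than the plugin's actual code: the function names are hypothetical, and the URL layout simply mirrors the `/twilio/media/{call_id}/{token}` path shown in the diagram. The signature scheme follows Twilio's documented approach (HMAC-SHA1 over the request URL plus the sorted POST parameters, base64-encoded).

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET


def compute_twilio_signature(auth_token: str, url: str, params: dict) -> str:
    """Sign the full URL followed by each POST key and value, sorted by key."""
    payload = url + "".join(f"{key}{params[key]}" for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict, header_signature: str
) -> bool:
    """Compare against the X-Twilio-Signature header in constant time."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_signature)


def build_stream_twiml(host: str, call_id: str, token: str) -> str:
    """Return TwiML that connects the call to a bidirectional media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(
        connect, "Stream", url=f"wss://{host}/twilio/media/{call_id}/{token}"
    )
    return ET.tostring(response, encoding="unicode")
```

In a FastAPI handler, the URL passed to the validator has to be the exact public URL Twilio requested, which can differ from what the app sees behind a proxy or tunnel.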
sequenceDiagram
participant Example as Outbound Example
participant Twilio as Twilio Client
participant FastAPI as FastAPI Server
participant WS as WebSocket
participant Agent as AI Agent
participant User as Phone User
Example->>Twilio: Initiate outbound call
Twilio->>Twilio: Dial phone (from → to)
Twilio->>FastAPI: Connect to WebSocket URL
WS->>FastAPI: WebSocket established
FastAPI->>Agent: Prepare agent & user
FastAPI->>Agent: Attach phone user to call
Agent->>Agent: Run agent session
Agent->>User: Start greeting
User-->>Twilio: Responds (mulaw audio)
Twilio->>WS: Media message (mulaw)
WS->>FastAPI: Decode → PCM
FastAPI->>Agent: Feed to agent session
Agent->>Agent: Process input & generate response
Agent-->>FastAPI: PCM output
FastAPI->>WS: Encode → mulaw
WS->>Twilio: Media message
Twilio->>User: Play audio
Note over Twilio,Agent: Streaming continues until end
User->>Twilio: Ends call
Twilio->>WS: Stop signal
WS->>FastAPI: Close & cleanup
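The decode and encode steps in both diagrams boil down to G.711 mu-law conversion wrapped in Twilio's JSON media messages. Below is a pure-Python sketch of that round trip, assuming 16-bit little-endian PCM and Twilio's `media` event shape (`event`, `streamSid`, `media.payload`). The helper names are hypothetical, and resampling between Twilio's 8 kHz audio and whatever rate the agent expects is deliberately left out.

```python
import base64
import json
import struct
from typing import Optional

BIAS = 0x84   # standard G.711 mu-law bias
CLIP = 32635  # largest magnitude that can be encoded after biasing


def mulaw_byte_to_pcm(u: int) -> int:
    """Decode one mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if sign else sample


def pcm_to_mulaw_byte(sample: int) -> int:
    """Encode one signed 16-bit sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def decode_media_message(raw: str) -> Optional[bytes]:
    """Return PCM bytes for a 'media' event, or None for other events."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None
    mulaw = base64.b64decode(message["media"]["payload"])
    samples = [mulaw_byte_to_pcm(b) for b in mulaw]
    return struct.pack(f"<{len(samples)}h", *samples)


def encode_media_message(pcm: bytes, stream_sid: str) -> str:
    """Wrap PCM audio as a base64 mu-law 'media' message for the stream."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mulaw = bytes(pcm_to_mulaw_byte(s) for s in samples)
    payload = base64.b64encode(mulaw).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )
```

A per-sample Python loop is fine for illustration, but a production path would likely use a lookup table or a vectorized implementation to keep up with real-time audio.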
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
⛔ Files ignored due to path filters (2)
📒 Files selected for processing (53)
- Add tools.py with wrapper classes for all 6 Gemini tools:
  - FileSearch: RAG over documents
  - GoogleSearch: Ground responses with web data
  - CodeExecution: Run Python code
  - URLContext: Read specific web pages
  - GoogleMaps: Location-aware queries
  - ComputerUse: Browser automation
- Replace hardcoded file_search_store param with generic tools list
- Update _build_config() to handle multiple tools
- Update phone example to use new tools API
- Add unit tests for all tool wrappers
Usage:
    llm = gemini.LLM(tools=[
        gemini.tools.FileSearch(store),
        gemini.tools.GoogleSearch(),
        gemini.tools.CodeExecution(),
    ])
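To illustrate the idea of replacing a single hardcoded parameter with a generic tools list, here is one way a config builder could fold several wrappers into a request config. This is a sketch under assumptions, not the code from this commit: the wrapper classes, the `to_config` method, the `store_name` argument, and the dictionary shapes are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol


class Tool(Protocol):
    """Anything that can describe itself as one entry in a tools list."""

    def to_config(self) -> dict[str, Any]: ...


@dataclass
class FileSearch:
    store_name: str  # assumption: the wrapper only needs the store's name

    def to_config(self) -> dict[str, Any]:
        return {"file_search": {"file_search_store_names": [self.store_name]}}


@dataclass
class GoogleSearch:
    def to_config(self) -> dict[str, Any]:
        return {"google_search": {}}


@dataclass
class CodeExecution:
    def to_config(self) -> dict[str, Any]:
        return {"code_execution": {}}


def build_config(
    tools: list[Tool], base: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Merge any number of tools into a copy of the base config."""
    config = dict(base or {})
    if tools:
        config["tools"] = [tool.to_config() for tool in tools]
    return config
```

With this shape, adding a seventh tool means adding one wrapper class; the builder itself does not change.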
- Reuse existing stores with the same display_name instead of creating new ones
- Store content hash (SHA-256) in document custom_metadata for persistence
- Load existing hashes from API on startup to skip duplicate uploads
- Works across app restarts: same content = skipped, regardless of filename
- Update tests to use unique store names to avoid interference
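The deduplication described in this commit can be sketched with a small in-memory stand-in. The class and method names below are hypothetical, and the real implementation talks to the Gemini File Search API rather than a Python list; the sketch only shows the hashing logic, where the hash is computed over content so a renamed file is still recognized.

```python
import hashlib
from typing import Iterable


class DedupingStore:
    """In-memory stand-in for a document store that skips duplicate content."""

    def __init__(self, existing_hashes: Iterable[str] = ()) -> None:
        # In the scenario described above, these would be loaded from the
        # store's document metadata on startup.
        self._hashes = set(existing_hashes)
        self.uploaded: list[tuple[str, dict]] = []

    @staticmethod
    def content_hash(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def upload(self, filename: str, content: bytes) -> bool:
        """Upload unless identical content is already present; return True if uploaded."""
        digest = self.content_hash(content)
        if digest in self._hashes:
            return False  # same content, possibly under a different filename
        # Keep the hash alongside the document so it survives restarts.
        self.uploaded.append((filename, {"content_hash": digest}))
        self._hashes.add(digest)
        return True

    def known_hashes(self) -> set[str]:
        return set(self._hashes)
```

Seeding a new instance with `known_hashes()` from a previous one mimics an app restart: content uploaded earlier is skipped even if it arrives under a new name.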
TODO