# Phase 2B: Octopus API caching optimization with JWT token persistence #3045
## Conversation
Added debug print statements to trace:
- When fetch_octopus_rates is called with entity_id
- Which entity is being queried for rates
- Data import results (type and length)
- Total accumulated rate entries
- Sample rate data structure

These are temporary development logs for troubleshooting tariff data loading issues. They can be removed or made conditional on the debug_enable flag. Also includes gecloud.py cleanup changes from previous work.
Removed 5 debug print statements that were added during Phase 2B development to trace fetch_octopus_rates() execution. These were temporary debugging aids used to verify that:
- fetch_octopus_rates was being called correctly
- Entity IDs were properly constructed
- Data was successfully fetched from Supabase
- Rate data had the expected structure

Now that the Octopus NATS integration is working correctly, these debug prints are no longer needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implements a cache refresh strategy that prevents all pods from simultaneously fetching from the Octopus API when the cache expires.

## Problem

When the cache expires at exactly the same time for all 1000 pods:
- 4:30:00.000 - Cache expires for ALL pods
- 4:30:00.001-500 - All 1000 pods fetch from the Octopus API simultaneously
- Result: Thundering herd 💥 overwhelming the Octopus API

## Solution: Stale-While-Revalidate

Implements a three-tier cache strategy:
1. **Fresh (< 30 min)**: Return cached data immediately
2. **Stale (30-35 min)**: Serve stale data while ONE pod refreshes
3. **Too stale (> 35 min)**: Must fetch fresh data

## How It Works

When the cache is 30-35 minutes old:
- First pod to check: acquires an atomic file lock and refreshes the cache
- Other pods: see the lock exists and serve 5-min-stale data (acceptable for tariff rates)
- No blocking: all pods return immediately
- Eventually consistent: fresh data available within seconds

## Lock Implementation

Uses atomic file creation with O_CREAT | O_EXCL flags:
- Non-blocking: failed acquisition means another pod is refreshing
- Automatic cleanup: lock file removed after refresh
- No deadlock risk: lock holder always completes and removes the lock

## Expected Impact

Before:
- 1000 pods × cache expiry = 1000 simultaneous API calls
- Octopus API rate limiting and potential failures

After:
- 1 pod fetches, 999 pods serve stale data
- 99.9% reduction in API calls during cache expiry
- 5-minute staleness is acceptable for tariff optimization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
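The lock-plus-tiers logic described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the constant names, lock path, and function signatures are assumptions; only the 30/35-minute tiers and the O_CREAT | O_EXCL lock come from the commit message.

```python
import os

# Tier boundaries from the commit message; names are illustrative.
CACHE_FRESH_S = 30 * 60   # < 30 min: fresh
CACHE_STALE_S = 35 * 60   # 30-35 min: stale but servable
LOCK_PATH = "/tmp/cache/shared/refresh.lock"  # assumed location

os.makedirs(os.path.dirname(LOCK_PATH), exist_ok=True)


def try_acquire_lock(path=LOCK_PATH):
    """Atomically create the lock file; False means another pod holds it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False


def release_lock(path=LOCK_PATH):
    """Remove the lock file after the refresh completes."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def get_rates(cache_age_s, cached, fetch_fresh):
    """Three-tier stale-while-revalidate decision."""
    if cache_age_s < CACHE_FRESH_S:
        return cached                      # fresh: serve immediately
    if cache_age_s < CACHE_STALE_S:
        if try_acquire_lock():
            try:
                return fetch_fresh()       # this pod refreshes the cache
            finally:
                release_lock()             # always release, no deadlock
        return cached                      # another pod is refreshing
    return fetch_fresh()                   # too stale: must refresh
```

Because the lock is acquired with a single non-blocking `os.open` call, losing the race costs nothing: the pod simply serves the stale copy and returns immediately.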
- Add decode_kraken_token_expiry() to extract the expiry from the JWT payload
- Update async_refresh_token() to use the JWT expiry instead of a hardcoded 1 hour
- Save/load the Kraken token in the per-user cache (octopus_user.yaml)
- Add error-driven token refresh on auth errors (KT-CT-1139, KT-CT-1111, KT-CT-1143)
- Auto-retry GraphQL queries once on authentication failure

Benefits:
- Token survives pod restarts (loaded from cache)
- Accurate expiry tracking (directly from the JWT exp field)
- Automatic recovery from expired tokens
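A minimal sketch of extracting the `exp` claim from a JWT, which is what decode_kraken_token_expiry() is described as doing. The function name below and the absence of signature verification are simplifying assumptions; a JWT's payload is just URL-safe base64-encoded JSON, so reading `exp` needs no secret.

```python
import base64
import json


def decode_jwt_expiry(token):
    """Return the `exp` claim (Unix epoch seconds) from a JWT, or None."""
    # A JWT is header.payload.signature; the payload is the middle segment.
    payload_b64 = token.split(".")[1]
    # JWTs use URL-safe base64 without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")
```

Caching the token alongside this decoded expiry is what lets a restarted pod know whether the saved token is still usable without a round trip to Kraken.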
```python
"args": {
    "ge_cloud_direct": {
        "required_true": True,
        "required": True,
```
Probably should still be `required_true`
yeah agreed, ignore these, they need reverting!
```python
"name": "GivEnergy Cloud Data",
"args": {
    "ge_cloud_data": {
        "required_true": True,
```
Also here
yeah agreed, ignore these :D
The octopus change was re-implemented on main
## Summary
This PR implements a comprehensive caching optimization for the Octopus Energy API integration, reducing API load by ~99.8% and adding JWT token persistence across pod restarts.
## Problem Statement

Current inefficiency: every PredBat instance (1000+ users) independently fetches rates from the Octopus API and maintains its own Kraken token.
## Solution: 3-Part Caching Architecture

### 1. Cache Split Architecture (Commit 1)
Separated user-specific from shared data:
User-specific cache (`/tmp/cache/{user_id}/octopus_user.yaml`): holds per-user data such as the cached Kraken token.

Shared cache (`/tmp/cache/shared/`): holds data common to all users, such as tariff rates.
Benefits:
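Assuming the cache paths quoted above, the split could be sketched as follows; the helper names are illustrative, not the PR's actual API.

```python
import os

CACHE_ROOT = "/tmp/cache"  # root path as quoted in this description


def user_cache_path(user_id):
    """Per-user cache file, e.g. the Kraken token in octopus_user.yaml."""
    return os.path.join(CACHE_ROOT, str(user_id), "octopus_user.yaml")


def shared_cache_path(filename):
    """Shared cache file for data common to all users (e.g. tariff rates)."""
    return os.path.join(CACHE_ROOT, "shared", filename)
```

Keeping shared data out of the per-user directories is what allows one pod's fetch to serve every user, while tokens stay isolated per user.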
### 2. Stale-While-Revalidate Pattern (Commit 2)
Problem: Thundering herd when cache expires (all 1000 pods fetch simultaneously)
Solution: 3-tier cache strategy
- **Fresh (< 30 min)**: return cached data immediately
- **Stale (30-35 min)**: serve stale data while one pod refreshes
- **Too stale (> 35 min)**: fetch fresh data
Benefits:
### 3. JWT Token Caching (Commit 3)
Problem: Kraken tokens were lost on pod restart
Solution: JWT-based token persistence with error-driven refresh
Benefits:
- Token survives pod restarts (loaded from cache)
- Accurate expiry tracking (directly from the JWT `exp` field)
- Automatic recovery from expired tokens
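The error-driven refresh with a single retry described in this part could be sketched as follows. The exception type, helper signatures, and the error-code check are assumptions; only the KT-CT error codes and the retry-once behaviour come from the PR.

```python
# Kraken auth error codes listed in the commit message.
AUTH_ERROR_CODES = {"KT-CT-1139", "KT-CT-1111", "KT-CT-1143"}


class AuthError(Exception):
    """Stand-in for a GraphQL error carrying a Kraken error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def run_query(execute, refresh_token):
    """Run a query; on a known auth error, refresh the token and retry once."""
    try:
        return execute()
    except AuthError as err:
        if err.code not in AUTH_ERROR_CODES:
            raise                # not an auth problem: propagate
        refresh_token()          # error-driven token refresh
        return execute()         # single retry with the fresh token
```

Retrying exactly once keeps the failure mode bounded: if the refreshed token still fails, the error surfaces instead of looping.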
## Expected Impact

API Load Reduction: ~99.8% fewer Octopus API calls overall; during cache expiry, 1 pod fetches while 999 serve stale data.
Storage Efficiency:
## Files Changed
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>