@mgazza (Collaborator) commented Dec 8, 2025

Summary

This PR implements a comprehensive caching optimization for the Octopus Energy API integration, reducing API load by ~99.8% and adding JWT token persistence across pod restarts.

Problem Statement

Current inefficiency: Every PredBat instance (1000+ users) independently:

  1. Fetches the SAME Octopus Agile rates every 30 minutes
  2. Stores duplicate copies in per-user cache files
  3. Makes redundant API calls (500 users on AGILE-24-10-01 = 500 identical API calls)
  4. Loses Kraken auth tokens on pod restart (in-memory only)

Solution: 3-Part Caching Architecture

1. Cache Split Architecture (Commit 1)

Separated user-specific from shared data:

User-specific cache (`/tmp/cache/{user_id}/octopus_user.yaml`):

  • Account agreements (which tariffs apply when)
  • Saving session enrollments
  • Intelligent device settings
  • Kraken authentication token

Shared cache (`/tmp/cache/shared/`):

  • `tariffs/{product_code}_{tariff_code}.yaml` - Tariff rates (one file per tariff)
  • `urls/{sha256_hash}.yaml` - HTTP responses (one file per URL)

Benefits:

  • 99.9% storage reduction (10 tariff files vs 10,000 duplicate entries)
  • 99.8% API call reduction (1 call per tariff vs 500 calls)
  • Instant propagation (Pod A fetches, Pods B/C/D benefit immediately)
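
As a rough sketch of how the split maps onto paths, helpers along these lines would route each data class to its cache file (the helper names are illustrative assumptions; only the directory layout above comes from the PR):

```python
import hashlib
import os

CACHE_ROOT = "/tmp/cache"

def user_cache_path(user_id: str) -> str:
    """Per-user data: agreements, saving sessions, devices, Kraken token."""
    return os.path.join(CACHE_ROOT, user_id, "octopus_user.yaml")

def tariff_cache_path(product_code: str, tariff_code: str) -> str:
    """Shared rates: one file per tariff, reused by every pod and user."""
    return os.path.join(CACHE_ROOT, "shared", "tariffs", f"{product_code}_{tariff_code}.yaml")

def url_cache_path(url: str) -> str:
    """Shared HTTP responses, keyed by a SHA-256 hash of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_ROOT, "shared", "urls", f"{digest}.yaml")
```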

2. Stale-While-Revalidate Pattern (Commit 2)

Problem: Thundering herd when cache expires (all 1000 pods fetch simultaneously)

Solution: 3-tier cache strategy

  • Fresh (< 30 min): Return immediately
  • Stale (30-35 min): Serve stale data while ONE pod refreshes
  • Too stale (> 35 min): Must fetch

Benefits:

  • Only 1 pod fetches, 999 serve stale data
  • No race conditions (atomic file locking)
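
A minimal sketch of the three-tier check, assuming file modification time stands in for the cache timestamp (the function name and mtime-based aging are assumptions, not the exact mechanism in `octopus.py`):

```python
import os
import time

FRESH_SECONDS = 30 * 60      # under 30 min: fresh
MAX_STALE_SECONDS = 35 * 60  # 30-35 min: stale but servable

def classify_cache(path: str) -> str:
    """Classify a cache file as 'fresh', 'stale', or 'expired' by its age."""
    if not os.path.exists(path):
        return "expired"  # no cache yet: caller must fetch
    age = time.time() - os.path.getmtime(path)
    if age < FRESH_SECONDS:
        return "fresh"    # return cached data immediately
    if age < MAX_STALE_SECONDS:
        return "stale"    # serve cached data; one pod refreshes in the background
    return "expired"      # too stale: caller must fetch
```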

3. JWT Token Caching (Commit 3)

Problem: Kraken tokens were lost on pod restart

Solution: JWT-based token persistence with error-driven refresh

Benefits:

  • Token survives pod restarts
  • Accurate expiry tracking (JWT exp field)
  • Automatic recovery from expired tokens
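
The commits below reference `decode_kraken_token_expiry()`; a minimal version of the underlying idea — reading the `exp` claim straight out of the JWT payload, without signature verification — might look like this (the names and the 60-second safety margin are illustrative):

```python
import base64
import json
import time

def decode_jwt_expiry(token: str) -> float | None:
    """Extract the 'exp' claim (a Unix timestamp) from a JWT payload."""
    try:
        payload_b64 = token.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
        return float(payload["exp"])
    except (IndexError, KeyError, ValueError):
        return None  # malformed token: treat as expired

def token_is_usable(token: str, margin: float = 60.0) -> bool:
    """Accept a cached token only if it expires more than `margin` seconds from now."""
    exp = decode_jwt_expiry(token)
    return exp is not None and exp - time.time() > margin
```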

Expected Impact

API Load Reduction:

  • Before: 500 users = 500 API calls every 30 min
  • After: 1 API call per tariff every 30 min
  • Reduction: 99.8%

Storage Efficiency:

  • Before: 10,000 duplicate entries
  • After: 10 unique tariff files
  • Reduction: 99.9%

Files Changed

  • `apps/predbat/octopus.py` - Core caching implementation

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

springfall2008 and others added 9 commits November 30, 2025 20:16
Added debug print statements to trace:
- When fetch_octopus_rates is called with entity_id
- Which entity is being queried for rates
- Data import results (type and length)
- Total accumulated rate entries
- Sample rate data structure

These are temporary development logs for troubleshooting tariff
data loading issues. They can be removed or made conditional on the
debug_enable flag.

Also includes gecloud.py cleanup changes from previous work.

Removed 5 debug print statements that were added during Phase 2B
development to trace fetch_octopus_rates() execution. These were
temporary debugging aids used to verify that:

- fetch_octopus_rates was being called correctly
- Entity IDs were properly constructed
- Data was successfully fetched from Supabase
- Rate data had expected structure

Now that the Octopus NATS integration is working correctly, these
debug prints are no longer needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implements a cache refresh strategy that prevents all pods from simultaneously
fetching from the Octopus API when the cache expires.

## Problem
When the cache expires at exactly the same time for all 1000 pods:
- 4:30:00.000 - Cache expires for ALL pods
- 4:30:00.001-4:30:00.500 - All 1000 pods fetch from the Octopus API within half a second
- Result: Thundering herd 💥 overwhelming the Octopus API

## Solution: Stale-While-Revalidate

Implements a three-tier cache strategy:
1. **Fresh (< 30 min)**: Return cached data immediately
2. **Stale (30-35 min)**: Serve stale data while ONE pod refreshes
3. **Too stale (> 35 min)**: Must fetch fresh data

## How It Works

When cache is 30-35 minutes old:
- First pod to check: Acquires atomic file lock, refreshes cache
- Other pods: See lock exists, serve 5-min-stale data (acceptable for tariff rates)
- No blocking: All pods return immediately
- Eventually consistent: Fresh data available within seconds

## Lock Implementation

Uses atomic file creation with O_CREAT | O_EXCL flags:
- Non-blocking: Failed acquisition means another pod is refreshing
- Automatic cleanup: Lock file removed after refresh
- No deadlock risk: Lock holder always completes and removes lock
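
A sketch of that lock, assuming a plain lock-file path (names are illustrative):

```python
import os

def try_acquire_refresh_lock(lock_path: str) -> bool:
    """Non-blocking lock via atomic file creation.

    O_CREAT | O_EXCL guarantees exactly one pod can create the file;
    every other pod fails instantly and serves stale data instead.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False  # another pod is already refreshing

def release_refresh_lock(lock_path: str) -> None:
    """Remove the lock file once the refresh completes."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

The acquiring pod would refresh the cache and call `release_refresh_lock()` in a `finally` block, so the lock is removed even if the refresh raises.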

## Expected Impact

Before:
- 1000 pods × cache expiry = 1000 simultaneous API calls
- Octopus API rate limiting and potential failures

After:
- 1 pod fetches, 999 pods serve stale data
- 99.9% reduction in API calls during cache expiry
- 5-minute staleness is acceptable for tariff optimization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Add decode_kraken_token_expiry() to extract expiry from JWT payload
- Update async_refresh_token() to use JWT expiry instead of hardcoded 1-hour
- Save/load Kraken token in per-user cache (octopus_user.yaml)
- Add error-driven token refresh on auth errors (KT-CT-1139, KT-CT-1111, KT-CT-1143)
- Auto-retry GraphQL queries once on authentication failure

Benefits:
- Token survives pod restarts (loaded from cache)
- Accurate expiry tracking (directly from JWT exp field)
- Automatic recovery from expired tokens
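
A hedged sketch of the error-driven refresh with a single retry; the `client` wrapper, its `execute()`/`refresh_token()` methods, and the `extensions.errorCode` field layout are assumptions about the GraphQL error shape, while the error codes themselves come from the commit above:

```python
AUTH_ERROR_CODES = {"KT-CT-1139", "KT-CT-1111", "KT-CT-1143"}

async def graphql_with_retry(client, query: str, variables: dict) -> dict:
    """Run a GraphQL query; on a Kraken auth error, refresh the token and retry once."""
    result = await client.execute(query, variables)
    errors = result.get("errors") or []
    codes = {e.get("extensions", {}).get("errorCode") for e in errors}
    if codes & AUTH_ERROR_CODES:
        await client.refresh_token()                     # error-driven refresh
        result = await client.execute(query, variables)  # retry exactly once
    return result
```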
"args": {
"ge_cloud_direct": {
"required_true": True,
"required": True,
springfall2008 (Owner) commented:

Probably should still be `required_true`

mgazza (Collaborator, Author) replied:

yeah agreed, ignore these, they need reverting!

"name": "GivEnergy Cloud Data",
"args": {
"ge_cloud_data": {
"required_true": True,
springfall2008 (Owner) commented:

Also here

mgazza (Collaborator, Author) replied:

yeah agreed, ignore these :D

@springfall2008 (Owner) commented:

The octopus change was re-implemented on main
