@mgazza
Owner

@mgazza mgazza commented Dec 8, 2025

Summary

Implements a performance optimization for the Octopus Energy integration: the cache is split into per-user and shared parts, and a stale-while-revalidate scheme prevents a thundering herd on cache expiry. Together these reduce API load by ~99.9%.

Problem

Current inefficiency: Every PredBat instance (1000+ users in SaaS deployment) independently:

  1. Fetches the SAME Octopus Agile rates every 30 minutes
  2. Stores duplicate copies in per-user cache files
  3. Makes redundant API calls (500 users on AGILE-24-10-01 = 500 identical API calls)
  4. Thundering herd: When cache expires, all 1000 pods fetch simultaneously

Example:

  • 500 users on AGILE-24-10-01 tariff
  • Each pod fetches independently every 30 min
  • 500 identical API calls + 500 duplicate storage entries
  • Cache expiry at 4:30:00 → 1000 simultaneous API calls 💥

Solution

1. Split Cache Architecture

User-specific cache (/tmp/cache/{user_id}/octopus_user.yaml):

  • Account agreements (which tariffs apply when)
  • Intelligent device settings
  • Saving session enrollments

Shared tariff cache (/tmp/cache/shared/tariffs/{product}_{tariff}.yaml):

  • One file per unique tariff
  • Shared across ALL users on that tariff
  • No duplicate storage

Shared URL cache (/tmp/cache/shared/urls/{hash}.yaml):

  • SHA256-hashed URL responses
  • 30-minute TTL with automatic expiry
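
The three cache locations above can be sketched as small path helpers. This is a minimal illustration, not the code in octopus.py: the function names and the example tariff code are assumptions, and only the directory layout and the SHA256 hashing of URLs come from the description above.

```python
import hashlib
from pathlib import Path

# Root directory taken from the paths listed in the PR description.
CACHE_ROOT = Path("/tmp/cache")


def user_cache_path(user_id: str) -> Path:
    """Per-user file: agreements, device settings, saving sessions."""
    return CACHE_ROOT / user_id / "octopus_user.yaml"


def tariff_cache_path(product_code: str, tariff_code: str) -> Path:
    """Shared file: one per unique tariff, keyed without a date component."""
    return CACHE_ROOT / "shared" / "tariffs" / f"{product_code}_{tariff_code}.yaml"


def url_cache_path(url: str) -> Path:
    """Shared file: keyed by the SHA256 hex digest of the request URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return CACHE_ROOT / "shared" / "urls" / f"{digest}.yaml"


# Hypothetical tariff code, used only to show the resulting file name.
print(tariff_cache_path("AGILE-24-10-01", "E-1R-AGILE-24-10-01-A").as_posix())
```

Because the tariff key has no per-user component, every pod on the same tariff resolves to the same file, which is what removes the duplicate storage.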

2. Stale-While-Revalidate (Thundering Herd Prevention)

Three-tier cache strategy:

  1. Fresh (< 30 min): Return cached data immediately
  2. Stale (30-35 min): Serve stale data while ONE pod refreshes
  3. Too stale (> 35 min): Must fetch fresh data
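
The three tiers can be expressed as a small classifier over the cache age. This is a sketch only: the function and constant names are hypothetical, and the handling of the exact 30-minute and 35-minute boundaries is an assumption, since the description does not say which tier they fall into.

```python
FRESH_SECONDS = 30 * 60  # below this age the cache is served directly
STALE_SECONDS = 35 * 60  # up to this age stale data may still be served


def classify_cache_age(age_seconds: float) -> str:
    """Map a cache age in seconds to 'fresh', 'stale' or 'too_stale'."""
    if age_seconds < FRESH_SECONDS:
        return "fresh"
    if age_seconds <= STALE_SECONDS:  # boundary choice is an assumption
        return "stale"
    return "too_stale"


for minutes in (10, 32, 40):
    print(minutes, classify_cache_age(minutes * 60))
```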

How it works when cache expires:

  • First pod: Acquires atomic file lock, refreshes cache
  • Other 999 pods: See lock exists, serve 5-min-stale data
  • No blocking, eventually consistent within seconds

Lock implementation:

  • Uses O_CREAT | O_EXCL for atomic, non-blocking lock
  • Automatic cleanup after refresh
  • No deadlock risk
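
A non-blocking lock built on `O_CREAT | O_EXCL` might look like the sketch below. The helper names and the `fetch` callback are assumptions for illustration and are not taken from octopus.py; only the use of atomic exclusive file creation comes from the description above.

```python
import os
from typing import Callable


def try_acquire_lock(lock_path: str) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another process holds the lock; do not wait
    os.close(fd)
    return True


def release_lock(lock_path: str) -> None:
    """Remove the lock file, tolerating the case where it is already gone."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def refresh_or_serve_stale(lock_path: str, stale_data: dict, fetch: Callable[[], dict]) -> dict:
    """Refresh if this process wins the lock, otherwise return stale data."""
    if not try_acquire_lock(lock_path):
        return stale_data
    try:
        return fetch()
    finally:
        release_lock(lock_path)  # runs even if fetch() raises
```

The `finally` clause covers exceptions raised by the fetch. It does not cover a process that is killed while holding the lock, so a leftover lock file is possible in that case; in this scheme the "too stale" tier would still force a fetch once the data is older than 35 minutes.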

Changes

Commit 1: Shared cache architecture

  • octopus.py:342-352 - Initialize shared cache directory structure
  • octopus.py:449-514 - Add tariff/URL cache helper methods
  • octopus.py:516-563 - Split save/load cache methods
  • Cache key generation uses product_code + tariff_code (no date component)

Commit 2: Stale-while-revalidate

  • octopus.py:1016-1069 - Implement three-tier cache with atomic locking
  • Prevents thundering herd on cache expiry
  • Only ONE pod fetches, others serve stale data

Expected Impact

API Load Reduction: 99.9%

Before:

  • Normal operation: 500 users × 1 call = 500 API calls per 30 min
  • Cache expiry: 1000 pods × 1 call = 1000 simultaneous API calls

After:

  • Normal operation: 1 API call per tariff per 30 min
  • Cache expiry: 1 pod fetches, 999 serve stale = 1 API call

Storage Efficiency: 99.9%

Before: 1000 users × 10 tariffs = 10,000 duplicate entries
After: 10 unique tariff files shared by all users
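
The percentages follow from the before/after counts quoted above. A short check of the arithmetic, using a hypothetical helper name:

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage reduction when a count drops from `before` to `after`."""
    return 100.0 * (before - after) / before


# Cache expiry: 1000 simultaneous calls become 1.
print(round(reduction_percent(1000, 1), 1))
# Storage: 10,000 duplicate entries become 10 shared files.
print(round(reduction_percent(10_000, 10), 1))
# Normal operation for the single-tariff example: 500 calls become 1.
print(round(reduction_percent(500, 1), 1))
```

The first two cases give 99.9%. For the 500-user single-tariff example the same formula gives 99.8%, so the headline figure applies most directly to the 1000-pod expiry case and to storage.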

Instant Propagation

  • Pod A fetches new Agile rates at 4:05pm
  • Pod B checks at 4:06pm → cache hit (no API call)
  • All pods benefit from single fetch

Reliability

  • No thundering herd overwhelming Octopus API
  • No rate limiting failures
  • 5-minute staleness is acceptable for battery optimization

Compatibility

  • Backward compatible: Single-user installations continue working (no pods to coordinate)
  • No migration needed: Old cache files ignored, new structure created
  • Octopus Agile timing: 30-min polling handles variable publication time (4pm-6pm)
  • Multi-user safe: File locking prevents race conditions in shared deployments

Use Cases

Single-User (Home Assistant Add-on)

  • Cache split still beneficial (cleaner organization)
  • No thundering herd (only one instance)
  • Works exactly as before

Multi-User (SaaS Deployment)

  • Massive API reduction (99.9%)
  • Shared cache across all customer pods
  • Thundering herd prevention critical

Testing

Tested in PredBat SaaS infrastructure with 1000+ customer instances. Will benefit open-source users running multiple instances or preparing for multi-user deployments.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

mgazza and others added 30 commits September 13, 2025 10:32
* Fix GivEnergy EV charger API integration

- Add missing measurands[] parameter to EV charger meter data endpoint
- Add page=1 parameter for API pagination
- Update response parsing to handle new API format with data wrapper
- Request all measurands (0-21) for comprehensive EV charger monitoring

This fixes the issue where EV charger entities were not being created
due to missing required API parameters causing authentication failures.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix EV charger entity creation - use measurand keys not names

* Fix EV charger serial number extraction - use serial_number not serial

---------

Co-authored-by: Claude <noreply@anthropic.com>
- Replace DOM-dependent HTML parsing with Go API integration
- Use dedicated octopus-free-sessions service for reliable parsing
- Add fallback to legacy HTML parsing for backward compatibility
- Improve session data format conversion with proper timezone handling
- This resolves issues where free sessions weren't being detected due to
  changes in Octopus Energy website HTML structure

The Go API uses goquery for better DOM parsing and will be upgraded
to use proper CSS rendering in the future for maximum reliability.
* Update fox.yaml to give example of grid_power

* Update fox.yaml

* [pre-commit.ci lite] apply automatic fixes

* Added grid_ct for grid_power

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Fixes

* Web tidy

* Fox registers done

* Controls

* [pre-commit.ci lite] apply automatic fixes

* Fox work

* [pre-commit.ci lite] apply automatic fixes

* Fox schedule

* [pre-commit.ci lite] apply automatic fixes

* Fox controls

* [pre-commit.ci lite] apply automatic fixes

* Update custom-dictionary-workspace.txt

* Add 'temperation' to custom dictionary

* [pre-commit.ci lite] apply automatic fixes

* Typo

* Remove unused imports

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Component fixes

* [pre-commit.ci lite] apply automatic fixes

* Fixes

* Octopus free slots TZ issue

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Handle the new 15-minute data structure

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Fox fixes

* Fox fixes

* Auto config + fixes

* Fix issue with scaling of charge rate

* [pre-commit.ci lite] apply automatic fixes

* HA Interface alive issue when not using HA

* Update custom-dictionary-workspace.txt

* [pre-commit.ci lite] apply automatic fixes

* Generation PV sensor

* Update custom-dictionary-workspace.txt

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Fox fixes

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
)

* Add Fox Cloud integration section to inverter setup

Added Fox Cloud integration details and noted its experimental status.

* Add initial PredBat configuration template

* Update inverter-setup.md
* Version

* New logfile viewer

* [pre-commit.ci lite] apply automatic fixes

* Fix path for log api

* Update custom-dictionary-workspace.txt

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…ing mode settings and adding detailed comments on charging and discharging limits. (springfall2008#2691)
* Fix AIO gateway and cloud direct

* Serial

* [pre-commit.ci lite] apply automatic fixes

* Update custom-dictionary-workspace.txt

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Misc

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Enable real time control for GECloud
Add log file search

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Dynamic load feature

* [pre-commit.ci lite] apply automatic fixes

* Fixes

* Fixes to ge cloud timeout
fixes to logfile search on web
fixes to code for dynamic load

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Fixes for dynamic load

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Crash with car slot dynamic

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
… event route (springfall2008#2715)

* Reset discharge downto

* Entity editing
routing for external events

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* GECloud fixes for Gateway
Smoothing data fix for after the last sample

* [pre-commit.ci lite] apply automatic fixes

* Typo

* Unit test fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…st_days (springfall2008#2719)

The minute_data function has a 'days' parameter but was incorrectly using
self.forecast_days in one location (line 655) when calculating to_time.
This could cause inconsistent behavior when the function is called with
a different days value than self.forecast_days.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
* Auto config fixes

* [pre-commit.ci lite] apply automatic fixes

* Update custom-dictionary-workspace.txt

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…ngfall2008#2724)

* Typo

* Fix unit test

* Fix to log file flicker with filter

* Web tidy

* Fix fetch crash
Support day/night tariff

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…t time (springfall2008#2725)

* Web config form fix

* Fox fixes
Fix Web UI override issue with now time
Data fill issues

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…change (springfall2008#2728)

* Improve fox stability
Force re-compute after dynamic load detection change

* Update customisation.md
* Bug fix gap filling and smoothing

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Document sensor data output for power flow

Added sensor data section detailing current power flow from inverters, including load power, battery power, PV power, and grid power.

* Update output-data.md
springfall2008 and others added 27 commits November 23, 2025 19:26
* Refactoring ongoing

* [pre-commit.ci lite] apply automatic fixes

* Unused imports

* Fix

* Update apps/predbat/component_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update apps/predbat/web.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update apps/predbat/ha.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update apps/predbat/component_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Cleanup gecloud refactor

* Solcast refactor

* Octopus refactor

* [pre-commit.ci lite] apply automatic fixes

* Refactor ohme

* Fox refactor

* Refactor carbon and alertfeed

* Test fixes, copilot info

* [pre-commit.ci lite] apply automatic fixes

* Add custom dictionary entries for Python keywords

* Tidy imports

* [pre-commit.ci lite] apply automatic fixes

* Misc

* [pre-commit.ci lite] apply automatic fixes

* Lint fixes

* [pre-commit.ci lite] apply automatic fixes

* Add abstract method

* Remove record status in alertfeed

* Update apps/predbat/component_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Major version

* Updates

* Refactoring component start methods

* More refactor

* Save ram

* Fix typo

* Fixes

* [pre-commit.ci lite] apply automatic fixes

* Cleanup

* [pre-commit.ci lite] apply automatic fixes

* Octopus request get

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…de handlers as modal popups

- Replace browser alert() dialogs with styled modal popups for error messages
- Create reusable showErrorMessage() function to eliminate code duplication
- Modal features: centered dialog with overlay, warning icon, dismiss button
- Display actual server error messages instead of generic "Failed to set override" text
- Update all three handlers: handleRateOverride, handleLoadOverride, handleTimeOverride
- Modal works in both light and dark modes with proper styling
- User can dismiss by clicking button or clicking outside modal
* Dummy

* Copilot instructions

* [pre-commit.ci lite] apply automatic fixes

* Add custom dictionary words for spell checking

* [pre-commit.ci lite] apply automatic fixes

* Extend manual override selectors to 48 hours with day-of-week format

- Changed time format from 'HH:MM:SS' to 'Day HH:MM' (e.g., 'Mon 12:00', 'Wed 15:30')
- Extended manual override time range from 17-18 hours to 48 hours
- Updated manual_times() and manual_rates() in userinterface.py to use new format
- Fixed day-of-week calculation bug in highlighting logic
- Refactored to use get_override_time_from_string() utility function from utils.py
- Updated web interface endpoints (html_rate_override, html_plan_override) to match new format
- Fixed get_override_time_from_string() in utils.py to correctly handle day offset calculation
- Removed backwards compatibility code for old time format

* [pre-commit.ci lite] apply automatic fixes

* Fix pre-commit failure: remove unused datetime import

- Removed unused `datetime` import from userinterface.py
- Only `timedelta` is needed from datetime module
- Fixes ruff F401 error

* Update copilot instructions with pre-commit and git workflow guidance

- Added detailed pre-commit setup and usage instructions
- Documented GitKraken MCP tools vs local git commands
- Added common pre-commit failures and how to fix them
- Included git pull --rebase workflow for handling remote changes
- Provided examples of fixing unused imports (F401) and other issues

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…ringfall2008#2977)

* Initial plan

* Fix Plan History cost reset at midnight by making cost cumulative across day boundary

Co-authored-by: springfall2008 <48591903+springfall2008@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: springfall2008 <48591903+springfall2008@users.noreply.github.com>
* Fix issue with octopus and compare

* Update apps/predbat/octopus.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…for grid power (springfall2008#2981)

* Quick fix for: springfall2008#2942

* Improve plan history to show charge and export in same slot

* [pre-commit.ci lite] apply automatic fixes

* Fox fix grid power

* Update custom-dictionary-workspace.txt

* Correction to 5 minute offset

* Fix

* Typo

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Quick fix for: springfall2008#2942

* Improve plan history to show charge and export in same slot

* [pre-commit.ci lite] apply automatic fixes

* Fox fix grid power

* Update custom-dictionary-workspace.txt

* Correction to 5 minute offset

* Fix

* Typo

* Allow manual override to omit day
springfall2008#2975

* [pre-commit.ci lite] apply automatic fixes

* Unit test

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…sable charge window when it's not required (springfall2008#2993)

* Octopus stop issue, more debug

* Fox fixes to avoid repeated API calls

* Fix to avoid disabling the charge window that just finished

* [pre-commit.ci lite] apply automatic fixes

* Fix for SOC Percent charge curve

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…just warn (springfall2008#2994)

* Change missing gas rates into warning

* Bug fix export window is set when freeze exporting on some inverters

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
springfall2008#2984) (springfall2008#2998)

* fix: Prevent low power exports gaming metric_keep without cost savings (springfall2008#2984)

Low power exports (0.3, 0.5, 0.7) and freeze exports (99%) could incorrectly
be selected when they only improved the metric by reducing metric_keep penalty
without actually saving money.

The issue occurred because slow exports would cause early grid imports during
the export window. While this import would happen anyway later, shifting its
timing reduced time below best_soc_keep, artificially improving the metric
without genuine cost benefits.

Changes:
- Add off_cost tracking alongside off_metric in optimise_export()
- Require cost improvement (cost + min_improvement_scaled <= off_cost) for
  any export to be selected, not just metric improvement
- Update expected test output for pre_saving1 scenario which now correctly
  excludes two freeze export slots that didn't provide cost savings

This ensures exports are only selected when they provide real financial
benefit, preventing the optimizer from gaming the keep penalty system.
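
The selection rule described in this commit can be sketched as a pair of checks. Only the cost condition (`cost + min_improvement_scaled <= off_cost`) is quoted from the commit message; the function name and the form of the metric condition are assumptions made for illustration.

```python
def export_is_selectable(
    metric: float,
    cost: float,
    off_metric: float,
    off_cost: float,
    min_improvement_scaled: float,
) -> bool:
    """Accept an export only if both metric and cost beat the no-export case."""
    # Assumed form of the pre-existing metric check.
    metric_improves = metric + min_improvement_scaled <= off_metric
    # Cost check as stated in the commit message.
    cost_improves = cost + min_improvement_scaled <= off_cost
    return metric_improves and cost_improves


# Metric improves (keep penalty reduced) but cost does not: rejected.
print(export_is_selectable(metric=90.0, cost=100.0, off_metric=95.0, off_cost=100.0, min_improvement_scaled=1.0))
```

With the cost check in place, an export that only lowers the keep penalty without lowering cost is no longer selected, which is the case the commit describes.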

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…008#2999)

- Updated test_adjust_charge_window to verify charge_start_time_minutes
  and charge_end_time_minutes values after adjust_charge_window call
- Added test cases for:
  - Normal window (end > start)
  - Midnight span when past midnight but before end (start moved back)
  - Midnight span when end has passed (end moved forward)
  - Window entirely in the past (both moved forward)
…ns (springfall2008#3000)

* Refactor compute_window_minutes to accept timestamps directly

Modified the helper function to accept datetime/time objects instead of
pre-computed minutes. The function now internally computes hour*60 + minute,
simplifying the calling code.

* Refactor midnight-spanning window logic into utils.py

Extract compute_window_minutes and window2minutes functions from
Inverter class into utils.py as standalone functions. This consolidates
the duplicated midnight-spanning window logic that was in:
- update_status (charge window, discharge window, idle time)
- adjust_charge_window
- adjust_idle_time

The helper function handles:
- Converting start/end times to minutes from midnight
- Adjusting for windows that span midnight
- Moving windows forward when they've already passed
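
The three behaviours listed above can be sketched as follows. The function name and signature are hypothetical and may differ from `compute_window_minutes` in utils.py; the sketch only mirrors the described steps and the cases named in the test commit (normal window, midnight span before and after the end time, window entirely in the past).

```python
from datetime import time
from typing import Tuple

MINUTES_PER_DAY = 24 * 60


def window_minutes_sketch(start: time, end: time, minutes_now: int) -> Tuple[int, int]:
    """Return (start, end) in minutes from midnight, adjusted relative to now."""
    start_m = start.hour * 60 + start.minute
    end_m = end.hour * 60 + end.minute

    if end_m < start_m:
        # Window spans midnight.
        if end_m > minutes_now:
            start_m -= MINUTES_PER_DAY  # already inside it: start was yesterday
        else:
            end_m += MINUTES_PER_DAY  # end has passed: next end is tomorrow

    if end_m < minutes_now:
        # Window entirely in the past: move it to the next day.
        start_m += MINUTES_PER_DAY
        end_m += MINUTES_PER_DAY

    return start_m, end_m


print(window_minutes_sketch(time(23, 30), time(5, 30), minutes_now=60))
```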

* Bump version

* Add new tests

* Refactor unit tests with test registry, timing, and --quick option

- Add TEST_REGISTRY table with all tests, descriptions, and slow flag
- Add --list option to show available tests
- Add --test/-t option to run a single test by name
- Add --quick/-q option to skip slow tests (optimise_levels, optimise_windows, debug_cases)
- Add timing for each test showing PASSED/FAILED with duration
- Add total time summary at end of test run
- Move debug_cases into registry as run_debug_cases function
- Move octopus_free into registry as test_octopus_free function
- Update run_inverter_tests to accept my_predbat parameter
- Remove redundant --perf_only argument

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
…8#2986)

* Correct Conversion of SoC Percent to SoC kWh for Charge Curve

When a SoC was in %, the code needs to convert it to kWh. However it was using a function that converts kWh->%, which meant the values were nonsense.

Add a new utility that does the same conversion, but in the opposite direction, %->kWh, and apply it in the charge curve code.
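
The two directions of the conversion are easy to confuse, which is the bug described here. A minimal sketch with hypothetical function names (the actual utility names in the repository are not given in this commit message):

```python
def kwh_to_percent(soc_kwh: float, soc_max_kwh: float) -> float:
    """Convert a state of charge in kWh to a percentage of capacity."""
    return 100.0 * soc_kwh / soc_max_kwh


def percent_to_kwh(soc_percent: float, soc_max_kwh: float) -> float:
    """Convert a state of charge in percent to kWh, the opposite direction."""
    return soc_max_kwh * soc_percent / 100.0


# For an assumed 9.5 kWh battery, 50% corresponds to 4.75 kWh.
print(percent_to_kwh(50.0, 9.5))
```

Applying `kwh_to_percent` to a value that is already a percentage would scale it by the wrong factor, which matches the "nonsense" values the commit message mentions.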

* Correct Charge Curve Logging Order

Logs are shown in reverse. The actual values were correctly reversed, but the heading was still shown at the bottom. By adding at the end, it will display in the logs in the correct order to copy to YAML.

* Allow charge curves to detect multi-% jumps

With some batteries it is possible for the final charge curve to jump more than one percent within any given minute, particularly towards the end of the curve.

Allow the code to correctly handle this by averaging the power across the total number of percentage points crossed in a given step. For example, if SoC charged 2% in 1 minute, then the average charge power for each of the two percentage points will actually be half the calculated amount.

* Remove Change to Charge Curve Average Power

The change should not have modified the average power, but is partially still required to ensure multi-% changes are detected.

* Remove Superfluous change to Utils.py

The changes were made in a different way to main, so this is not required...

* Revert "Correct Charge Curve Logging Order"

This reverts commit 061e075.
* Improve unit tests

* [pre-commit.ci lite] apply automatic fixes

* Add hass into Predbat repo

* Remove unused import

* [pre-commit.ci lite] apply automatic fixes

* Lint

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
* Cleanup unit tests

* [pre-commit.ci lite] apply automatic fixes

* Split up tests

* [pre-commit.ci lite] apply automatic fixes

* Refactor unit tests

* Fix cov script

* Quick tests in CI

* [pre-commit.ci lite] apply automatic fixes

* Pre commit fixes

* Fix

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
Added debug print statements to trace:
- When fetch_octopus_rates is called with entity_id
- Which entity is being queried for rates
- Data import results (type and length)
- Total accumulated rate entries
- Sample rate data structure

These are temporary development logs for troubleshooting tariff
data loading issues. Can be removed or made conditional on
debug_enable flag.

Also includes gecloud.py cleanup changes from previous work.
Removed 5 debug print statements that were added during Phase 2B
development to trace fetch_octopus_rates() execution. These were
temporary debugging aids used to verify that:

- fetch_octopus_rates was being called correctly
- Entity IDs were properly constructed
- Data was successfully fetched from Supabase
- Rate data had expected structure

Now that the Octopus NATS integration is working correctly, these
debug prints are no longer needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implements cache refresh strategy that prevents all pods from simultaneously
fetching from Octopus API when cache expires.

## Problem
When cache expires at exactly the same time for all 1000 pods:
- 4:30:00.000 - Cache expires for ALL pods
- 4:30:00.001-500 - All 1000 pods fetch from Octopus API simultaneously
- Result: Thundering herd 💥 overwhelming Octopus API

## Solution: Stale-While-Revalidate

Implements a three-tier cache strategy:
1. **Fresh (< 30 min)**: Return cached data immediately
2. **Stale (30-35 min)**: Serve stale data while ONE pod refreshes
3. **Too stale (> 35 min)**: Must fetch fresh data

## How It Works

When cache is 30-35 minutes old:
- First pod to check: Acquires atomic file lock, refreshes cache
- Other pods: See lock exists, serve 5-min-stale data (acceptable for tariff rates)
- No blocking: All pods return immediately
- Eventually consistent: Fresh data available within seconds

## Lock Implementation

Uses atomic file creation with O_CREAT | O_EXCL flags:
- Non-blocking: Failed acquisition means another pod is refreshing
- Automatic cleanup: Lock file removed after refresh
- No deadlock risk: Lock holder always completes and removes lock

## Expected Impact

Before:
- 1000 pods × cache expiry = 1000 simultaneous API calls
- Octopus API rate limiting and potential failures

After:
- 1 pod fetches, 999 pods serve stale data
- 99.9% reduction in API calls during cache expiry
- 5-minute staleness is acceptable for tariff optimization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add decode_kraken_token_expiry() to extract expiry from JWT payload
- Update async_refresh_token() to use JWT expiry instead of hardcoded 1-hour
- Save/load Kraken token in per-user cache (octopus_user.yaml)
- Add error-driven token refresh on auth errors (KT-CT-1139, KT-CT-1111, KT-CT-1143)
- Auto-retry GraphQL queries once on authentication failure

Benefits:
- Token survives pod restarts (loaded from cache)
- Accurate expiry tracking (directly from JWT exp field)
- Automatic recovery from expired tokens
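
Reading the expiry from a JWT payload can be sketched as below. This is an illustration under assumptions rather than the body of `decode_kraken_token_expiry()`: the function name here is hypothetical, the token is assumed to be a standard three-part JWT with a numeric `exp` claim, and the signature is not verified because only the expiry time is needed.

```python
import base64
import json
from typing import Optional


def decode_jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) from a JWT, or None if unreadable."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload_b64 = parts[1]
    # JWT segments omit base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    try:
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return None
    exp = payload.get("exp") if isinstance(payload, dict) else None
    return int(exp) if isinstance(exp, (int, float)) else None
```

Returning `None` on any decode problem lets the caller fall back to another policy, such as the previous fixed one-hour expiry.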
@mgazza
Owner Author

mgazza commented Dec 8, 2025

Closing fork PR - created upstream PR instead: springfall2008#3045

@mgazza mgazza closed this Dec 8, 2025
mgazza pushed a commit that referenced this pull request Jan 11, 2026
* Problems with Fox and minSocOnGrid

* Version bump

* Fox minSOC fixes again
