Skip to content

Conversation

@magnetised
Copy link
Contributor

@magnetised magnetised commented Jan 15, 2026

Reduce load on sqlite by moving the core metadata checked by most read requests into ETS and prevent timeouts during sustained parallel shape creation by writing to sqlite asynchronously in a background process. Writes are now done directly to ETS and flushed to disk in batches.

Only shape -> handle and handle -> shapes lookups are now handled by reading directly from sqlite. All other, small, metadata is held in an ets wriite-through cache and populated from sqlite on boot.

Pending writes are flushed to disk on shutdown. This is quick, ~1,000 writes/s, but could cause delays to shutdown in an overloaded system. There is a risk of loss of in-flight writes if the system crashes.

Associated changes:

  • Move ShapeCleaner up in the supervisor hierarchy so it's available before the ShapeStatus process attempts to cleanup incomplete/invalid shapes
  • Allow for configuring load shedding via an environment variable - makes it easier to run load tests locally without maintaining a patch on the default configuration

Summary by CodeRabbit

  • New Features
    • SQLite-backed shape store with an ETS-backed write buffer, background flush controls (pause/resume), labeled DB checkout telemetry, pending-buffer telemetry, and optional max-concurrent-requests config.
  • Bug Fixes
    • Reduced races between buffer and persistent store; more consistent snapshot lifecycle, LRU/meta handling, and removals.
  • Chores
    • Supervision changes: added cleanup-task supervisor and removed legacy cleaner wiring.
  • Tests
    • Expanded unit and integration tests for buffering, snapshot flows, crash recovery, and ordering.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Replaces per-shape ETS last-used table with a 4-tuple shape_meta_table and adds an ETS-backed WriteBuffer GenServer to batch/asynchronously persist shape metadata to SQLite; ShapeDb, queries, connection checkout labeling, supervisors, telemetry, runtime config, and tests are updated to be buffer-aware and support snapshot lifecycle events.

Changes

Cohort / File(s) Summary
ETS metadata refactor
packages/sync-service/lib/electric/shape_cache/shape_status.ex
Replace shape_last_used_table with shape_meta_table storing 4-tuples (handle, hash, snapshot_started, last_read_time); update ETS helpers, validation, LRU logic, and snapshot flag handling.
Shape DB core & buffering
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex, packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/*.ex
Add SQLite-backed ShapeDb semantics with a WriteBuffer: reads merge buffer+SQLite, tombstone-aware lookups/listings/counts, buffering for adds/removes/snapshot events, rename reduce_shape_handlesreduce_shape_meta, add pending_buffer_size/1, validate_existing_shapes/1, and ShapeDb.Error.
WriteBuffer implementation
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
New GenServer providing per-stack ETS op queue and shape buffer, tombstoning, snapshot lifecycle queueing, periodic/background batch flush to SQLite, synchronous flush API, lookup APIs, telemetry/tracing, error/retry handling, and public inspection APIs.
Connection pool & labeled checkouts
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
Enqueue/checkout accept labels ({:checkout, label}), compute enqueue duration, emit OpenTelemetry with queue_time_μs and label, and change checkout!/checkout_write! signatures to accept labels.
SQL queries & streaming
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
Replace list_handles with list_shape_meta (handle, hash, snapshot_state), rename streaming entry to list_shape_meta_stream/1, and remove single-field hash/snapshot queries.
Supervisor wiring
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex, packages/sync-service/lib/electric/monitored_core_supervisor.ex, packages/sync-service/lib/electric/core_supervisor.ex, packages/sync-service/lib/electric/shapes/supervisor.ex
Add ShapeDb.WriteBuffer to ShapeDb supervision, add a CleanupTaskSupervisor child via tweaks, and remove legacy shape_cleaner wiring.
Telemetry & metrics
packages/sync-service/lib/electric/stack_supervisor/telemetry.ex
Add periodic measurement report_write_buffer_size/2 and metric [:electric, :shape_db, :write_buffer, :pending_writes] reporting pending buffer size.
Runtime config
packages/sync-service/config/runtime.exs
Add optional max_concurrent_requests config read from ELECTRIC_MAX_CONCURRENT_REQUESTS (JSON-decoded) and wired into :electric config.
Tests & test helpers
packages/sync-service/test/.../shape_db_test.exs, packages/sync-service/test/support/component_setup.ex, integration-tests/tests/crash-recovery.lux, packages/sync-service/test/.../request_batcher_test.exs
Update tests to use reduce_shape_meta/3, exercise snapshot states and buffer flushes, set manual_flush_only: true in test setup, expand crash-recovery to two shapes, and assert deterministic removal ordering in batching tests.
Changelog / changeset
.changeset/gorgeous-readers-return.md
Add release note documenting ETS write-through cache and asynchronous SQLite writes.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant ShapeDb as ShapeDb
  participant WriteBuffer as WriteBuffer
  participant SQLite as SQLite
  participant Telemetry as Telemetry

  Client->>ShapeDb: add_shape(stack_id, handle, shape_bin, hash, relations)
  ShapeDb->>WriteBuffer: enqueue {:add, handle, hash, shape_bin, relations}
  WriteBuffer-->>ShapeDb: {:ok, shape_hash}
  Note right of WriteBuffer: rgba(255,165,0,0.5) Background flush or flush_sync batches ops
  WriteBuffer->>SQLite: batch_apply(adds/removes/snapshot events)
  SQLite-->>WriteBuffer: ok
  WriteBuffer->>Telemetry: emit batch traces/metrics
  WriteBuffer-->>ShapeDb: flush completion / tombstone pruning
  ShapeDb-->>Client: {:ok, shape_hash}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • icehaunter

Poem

🐰 I cached my shapes in ETS, snug and round,
I queued small hops till the flusher came down.
Batches scurried off through sqlite's night,
Labels kept checkouts timed and bright.
Hop—metadata safe with a soft, steady sound.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main architectural change: moving SQLite writes to background processing and buffering read state via ETS, which is the core objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.36%. Comparing base (4c7a4b6) to head (5f99549).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3716   +/-   ##
=======================================
  Coverage   87.36%   87.36%           
=======================================
  Files          23       23           
  Lines        2011     2011           
  Branches      531      533    +2     
=======================================
  Hits         1757     1757           
  Misses        252      252           
  Partials        2        2           
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 93.47% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 87.36% <ø> (ø)
unit-tests 87.36% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blacksmith-sh

This comment has been minimized.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 1cdec3c to 2734d7b Compare January 15, 2026 12:34
@magnetised
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@magnetised
Copy link
Contributor Author

benchmark this

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex`:
- Around line 106-125: The fallback clause handle_checkout({:checkout, _label},
_from, conn, pool_state) is unreachable because handle_enqueue({:checkout,
label}, pool_state) always enqueues {:checkout, label, now()}, so remove the
dead clause (the two-arity pattern) from connection.ex (or instead replace it
with a clear comment if you intentionally want defensive behavior); update tests
accordingly to ensure no behavior changes.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c25526c and 2734d7b.

📒 Files selected for processing (5)
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-14T14:45:05.838Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
🧬 Code graph analysis (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (3)
packages/sync-service/lib/electric/connection/manager.ex (1)
  • pool_name (180-182)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
  • explain (182-186)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)
  • explain (263-265)
  • explain (267-269)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (2)
  • checkout_write! (168-180)
  • checkout! (159-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Check and build examples/linearlite-read-only
  • GitHub Check: Check and build examples/tanstack-db-web-starter
  • GitHub Check: Check and build examples/linearlite
  • GitHub Check: Check and build examples/nextjs
  • GitHub Check: Check and build examples/react
  • GitHub Check: Check and build examples/encryption
  • GitHub Check: Check and build examples/proxy-auth
  • GitHub Check: Run Lux integration tests
🔇 Additional comments (18)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)

6-8: LGTM!

The new list_shape_meta query correctly selects the required metadata fields (handle, hash, snapshot_state) for the ETS cache population.


249-253: LGTM!

The stream function correctly transforms the database's integer snapshot_state to a boolean representing "snapshot started" status, consistent with the snapshot_started? function logic at Line 199.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (3)

12-15: LGTM!

The updated imports correctly reflect the new function signatures that accept action labels for telemetry tracking.


35-47: LGTM!

Action labels are consistently applied across all checkout operations, enabling meaningful telemetry tracking per operation type.


100-106: LGTM!

The renamed reduce_shape_meta/3 correctly uses the new list_shape_meta_stream and the name accurately reflects the richer metadata being processed.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (1)

176-199: LGTM!

The test comprehensively validates the renamed reduce_shape_meta/3 function, correctly verifying the {handle, hash, snapshot_started} tuple structure with proper setup of both started and non-started snapshots.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (3)

132-140: LGTM!

The reduced SQLite page cache (512KB) is appropriate since hot-path lookups now go through the ETS cache. The comment clearly explains the negative value convention.


114-118: LGTM!

The telemetry emission provides valuable observability for pool checkout latency, enabling monitoring and debugging of potential connection pool bottlenecks per operation type.


159-180: LGTM!

The updated function signatures correctly propagate the label through to NimblePool, enabling the telemetry instrumentation in handle_checkout.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (9)

32-34: LGTM!

The tuple structure {handle, hash, snapshot_started, last_read_time} and position constant are clearly documented, making the code self-explanatory.


69-75: LGTM!

The ETS insertion correctly uses insert_new to prevent race conditions and properly sequences the persistent store write before the cache write.


169-175: LGTM!

The hash validation via ETS lookup is an effective optimization, avoiding SQLite queries for shape validation on the hot path.


182-186: LGTM!

The update correctly sequences the persistent write before the cache update and properly uses update_element with the correct tuple position.


190-193: LGTM!

Reading snapshot_started? from ETS cache provides fast hot-path access while maintaining a safe default of false for missing entries.


203-205: Verify: snapshot_complete? not cached in ETS.

While snapshot_started? now reads from the ETS cache (Line 190-193), snapshot_complete? still queries SQLite directly. This asymmetry may be intentional if snapshot_complete? is called less frequently, but it creates inconsistent performance characteristics.

If snapshot_complete? is also on a hot path, consider extending the meta tuple to {handle, hash, snapshot_started, snapshot_complete, last_read_time} and caching both states.


231-267: LGTM!

The LRU algorithm using gb_trees is an efficient approach (O(n) scan with O(log k) tree operations) and correctly handles the new 4-tuple structure.


297-310: LGTM!

The population logic correctly transforms the 3-tuple from ShapeDb.reduce_shape_meta into the 4-tuple required by ETS, initializing all shapes with the same start_time for a fair LRU baseline after restart.


283-295: LGTM!

The ETS table configuration with read_concurrency: true and write_concurrency: :auto is well-suited for a cache that's primarily read-heavy with occasional writes.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@blacksmith-sh

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 15, 2026

Benchmark results, triggered for 2734d

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 2734d7b to bb41c9a Compare January 15, 2026 13:49
@balegas
Copy link
Contributor

balegas commented Jan 19, 2026

Concurrent shape creation looks significantly slower. Do you think this might cause timeouts with high load?

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch 2 times, most recently from fee08a1 to 62d685a Compare January 19, 2026 15:32
@blacksmith-sh

This comment has been minimized.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 62d685a to 67a2d9f Compare January 19, 2026 15:42
@magnetised
Copy link
Contributor Author

benchmark this

@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2026

Benchmark results, triggered for 55864

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

@blacksmith-sh

This comment has been minimized.

@blacksmith-sh

This comment has been minimized.

@magnetised
Copy link
Contributor Author

benchmark this

@github-actions
Copy link
Contributor

github-actions bot commented Jan 20, 2026

Benchmark results, triggered for b02c0

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

@magnetised magnetised self-assigned this Jan 20, 2026
@blacksmith-sh

This comment has been minimized.

@magnetised
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from b9b0fce to fceff29 Compare January 20, 2026 13:52
@magnetised magnetised changed the title Put ets cache in front of sqlite Move SQLite writes to a background thread and buffer read state Jan 20, 2026
@magnetised magnetised marked this pull request as ready for review January 20, 2026 14:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@packages/sync-service/config/runtime.exs`:
- Around line 199-206: The current config decodes JSON into atoms using
Jason.decode!(..., keys: :atoms) which can create new atoms and exhaust the BEAM
atom table; update the decode call for max_concurrent_requests to use keys:
:atoms! instead (i.e., replace Jason.decode!(json, keys: :atoms) with
Jason.decode!(json, keys: :atoms!)) so only existing atoms are used; locate the
max_concurrent_requests assignment and the Jason.decode! invocation to make this
change.

In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 401-425: The batch write silently ignores non-:ok returns causing
failed DB ops to be dropped; update do_batch_write so each call to
Query.remove_shape, Query.mark_snapshot_started, and
Query.mark_snapshot_complete is enforced with pattern matching (e.g., expect :ok
and let a non-:ok return raise) similar to the existing :ok = Query.add_shape
line, so failures propagate to the rescue block instead of being ignored during
the Enum.each over entries.
- Around line 339-381: flush_until_empty currently leaves entries marked with
flushing=true when do_batch_write returns :error, preventing retries; after
detecting a non-:ok result from do_batch_write(stack_id, entries) you should
iterate the collected entries and reset their flushing flag back to false in the
ops_table so they can be retried later (use the same key shape used by
mark_and_collect_entries/ops_table entries, updating the flushing field via
:ets.update_element or a select/replace on ops_table for each {ts, _op} in
entries). Ensure this logic runs only on failure paths and references the local
variables ops_table, entries, and the flushing flag so successful paths are
unchanged.
- Around line 56-58: The ops table currently keys only by timestamp() which can
collide; add a new function op_key/0 that returns a composite monotonic
timestamp plus a unique integer (e.g., {System.monotonic_time(),
:erlang.unique_integer([:positive])}) and keep timestamp/0 if needed for
readability, then replace all uses of ts = timestamp() in add_shape,
remove_shape, queue_snapshot_started, and queue_snapshot_complete with ts =
op_key() so each op key is ordered and unique, preventing overwrites in the
:ordered_set.

In
`@packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs`:
- Around line 117-120: Replace the fixed Process.sleep(5) with a deterministic
synchronization: wait (via assert_receive) for the RequestBatcher to confirm the
shape is fully enqueued or poll a public/readable status function on
RequestBatcher (e.g., RequestBatcher.enqueued?/2 or similar) until it returns
true before calling RequestBatcher.remove_shape(stack_id, `@shape_handle_2`);
ensure you add or use an existing test hook/message (like an :shape_enqueued
notification) from the add_shape handling so the test can reliably
assert_receive that notification instead of sleeping.
🧹 Nitpick comments (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)

123-125: Emit checkout telemetry for non-queued checkouts too.

Fast-path checkouts skip telemetry, which can skew queue-time metrics. Consider emitting the same event with queue_time_μs: 0 in this clause for consistency.

♻️ Suggested adjustment
 def handle_checkout({:checkout, _label}, _from, conn, pool_state) do
+  OpenTelemetry.execute(
+    [:electric, :shape_db, :pool, :checkout],
+    %{queue_time_μs: 0},
+    %{label: label, stack_id: Keyword.get(pool_state, :stack_id)}
+  )
   {:ok, conn, conn, pool_state}
 end
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (1)

223-229: Align snapshot_started? with bitmask semantics.

The comment defines bit 0 as “started.” Using > 0 treats any nonzero state as started. Consider using a bitmask check to match the stated semantics and future-proof if snapshot_state ever becomes 2.

♻️ Suggested adjustment
-    snapshot_started? = snapshot_state > 0
+    snapshot_started? = Bitwise.band(snapshot_state, 1) == 1
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2734d7b and fceff29.

📒 Files selected for processing (15)
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/config/runtime.exs
  • packages/sync-service/lib/electric/core_supervisor.ex
  • packages/sync-service/lib/electric/monitored_core_supervisor.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
  • packages/sync-service/lib/electric/shapes/supervisor.ex
  • packages/sync-service/lib/electric/stack_supervisor/telemetry.ex
  • packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
  • packages/sync-service/test/support/component_setup.ex
💤 Files with no reviewable changes (2)
  • packages/sync-service/lib/electric/core_supervisor.ex
  • packages/sync-service/lib/electric/shapes/supervisor.ex
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-14T14:45:05.838Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.838Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
📚 Learning: 2026-01-15T13:54:51.778Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex:106-126
Timestamp: 2026-01-15T13:54:51.778Z
Learning: In NimblePool, when a worker is immediately available, the pool skips `handle_enqueue/2` and calls `handle_checkout/4` directly with the original checkout command. When no worker is available, the request goes through `handle_enqueue/2` (where state can be transformed) before `handle_checkout/4` is called. This means both patterns must be handled in `handle_checkout/4` to support both immediate and queued checkouts.

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
🧬 Code graph analysis (4)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)
packages/sync-service/lib/electric/telemetry/sentry.ex (1)
  • set_tags_context (23-25)
packages/sync-service/lib/electric/shape_cache/shape_status.ex (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)
  • add_shape (100-131)
  • mark_snapshot_started (176-183)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)
  • mark_snapshot_started (67-73)
packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (2)
  • add_shape (205-221)
  • remove_shape (224-240)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (7)
  • add_shape (100-131)
  • mark_snapshot_started (176-183)
  • shape_for_handle (145-149)
  • handle_for_shape (139-143)
  • mark_snapshot_complete (185-192)
  • remove_shape (194-211)
  • handle_exists? (151-156)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
packages/sync-service/lib/electric/shapes.ex (1)
  • mark_snapshot_started (123-127)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (4)
  • add_shape (100-131)
  • remove_shape (194-211)
  • mark_snapshot_started (176-183)
  • mark_snapshot_complete (185-192)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)
  • checkout_write! (169-181)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
🔇 Additional comments (26)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)

17-17: No action needed — tweaks is guaranteed to be a keyword list by schema validation.

The codebase explicitly defines tweaks as type: :keyword_list in the schema validation (stack_supervisor.ex lines 104–110) with a default of []. This type constraint is enforced across all code paths. Additionally, core_supervisor.ex uses Keyword.fetch!(opts, :tweaks), which would fail immediately if tweaks were ever a map. All existing uses of tweaks in the codebase employ Keyword.get() or Keyword.fetch!() without any defensive logic, confirming that the type is safe and consistent.

Likely an incorrect or invalid review comment.

packages/sync-service/lib/electric/stack_supervisor/telemetry.ex (2)

43-62: Pending-write metric wiring looks good.
Adds the metric and schedules periodic reporting, improving visibility into buffered writes.


84-96: Defensive write-buffer telemetry emission looks good.
Gracefully handles the buffer not being available yet during early startup.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)

176-202: Nice coverage for reduce_shape_meta + snapshot_started propagation.
Flushing before reduction makes the SQLite-backed state deterministic in the test.


223-250: Snapshot start/complete tests look good.
These updates align with the new snapshot lifecycle APIs.


273-277: Helper keeps snapshot assertions compact.
Makes the new snapshot-state checks easier to follow.


336-415: Good write-buffer behavior coverage.
The suite exercises buffered visibility, flush behavior, tombstones, and snapshot transitions well.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (5)

32-61: shape_meta_table initialization looks solid.
The new meta-table lifecycle (create/populate) is clear and coherent.


69-139: Add/remove/reset keep the meta cache in sync.
The ETS updates align with the buffered write path.


157-194: Handle validation + snapshot_started caching aligns with the new tuple.
Consistent use of the meta table makes these paths straightforward.


216-263: LRU/last_read_time updates align with the new tuple layout.
The fold now correctly pulls the last_read_time element.


275-304: Meta-table helpers are clean.
Creation and population helpers look good.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (7)

2-23: Updated ShapeDb doc + buffering wiring look good.
The module-level intent is clear.


39-86: Buffered mutation path looks consistent.
Add/remove/snapshot/reset all route through the buffer cleanly.


88-131: Buffer-first reads with tombstone checks look good.
The fallback to SQLite is clear and safe.


133-176: Merged buffered/SQLite listing with tombstone filtering looks good.
Deduping by handle makes sense here.


184-209: Reduction/count paths align with the buffer model.
The updated flow matches the new read/write semantics.


216-248: Buffer-aware handle checks and pre-validation flush are good.
This keeps validation consistent with persisted state.


256-260: Pending-buffer metric accessor is a good addition.
Useful for telemetry and load testing.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (7)

1-62: Module structure and high-level API look good.
The doc + API layout is easy to follow.


64-69: flush_sync API is handy for tests and shutdown.
Nice addition for deterministic flushes.


73-127: Lookup helpers and tombstone gating read clearly.
The semantics are easy to reason about.


129-153: Buffered shape listing/count helpers look good.
They match the expected buffered semantics.


155-200: Relation filtering and tombstone accounting look consistent.
The logic aligns with the Query-level expectations.


258-315: Init and ETS table setup look good.
The table configuration and manual_flush_only flag are clear.


317-333: Poll/flush/terminate wiring looks good.
The periodic flush loop is straightforward.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch 2 times, most recently from bb30dae to 1b8a197 Compare January 22, 2026 09:23
@blacksmith-sh

This comment has been minimized.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@packages/sync-service/lib/electric/shape_cache/shape_status.ex`:
- Around line 179-185: The function mark_snapshot_started currently ignores a
false result from :ets.update_element which leaves ETS stale; change
mark_snapshot_started to capture the result of
:ets.update_element(shape_meta_table(stack_id), shape_handle, {3, true}) and
handle the false case by either inserting the correct shape metadata into ETS
(fetching the full record from ShapeDb if needed) or returning :error so callers
see the failure; ensure the function only returns :ok when the ETS update/insert
actually succeeded and update any callers/uses of snapshot_started? to rely on
the consistent ETS state.

In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex`:
- Around line 223-229: The current list_shape_meta_stream's snapshot_started?
uses "snapshot_state > 0" which can crash or misreport when snapshot_state is
nil or equals legacy 2; change the logic in list_shape_meta_stream (the clause
that calls stream_query) to guard for nil/legacy values and decode both bits via
bitmask: treat nil and 2 as "not started", compute snapshot_started? with
Bitwise.band(snapshot_state, 1) == 1 and snapshot_complete? with
Bitwise.band(snapshot_state, 2) == 2 (or return false if snapshot_state is nil),
keeping the rest of the stream_query mapping the same.

In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 352-354: The clause flush_until_empty(%{paused: true} = state) has
an unused variable causing compile warnings; update the function head to remove
the binding or mark it as unused—either change it to flush_until_empty(%{paused:
true}) or flush_until_empty(%{paused: true} = _state) so the unused variable
warning is resolved while preserving pattern matching in flush_until_empty/1.
🧹 Nitpick comments (2)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)

25-26: Confirm rest_for_one restart scope with the new CleanupTaskSupervisor placement.

With :rest_for_one, a crash in CleanupTaskSupervisor will restart all later children (ShapeDb supervisor, ShapeStatusOwner, CoreSupervisor). If that’s not the intended failure domain, consider moving it after those children or supervising it separately.

packages/sync-service/test/support/component_setup.ex (1)

253-257: Make manual_flush_only configurable in tests.
Hard-coding true makes it harder to exercise background flush behavior in tests that need it. Consider using a ctx override with a default.

♻️ Proposed tweak
-      {Electric.ShapeCache.ShapeStatus.ShapeDb.Supervisor,
-       stack_id: ctx.stack_id, storage_dir: ctx.tmp_dir, manual_flush_only: true},
+      {Electric.ShapeCache.ShapeStatus.ShapeDb.Supervisor,
+       stack_id: ctx.stack_id,
+       storage_dir: ctx.tmp_dir,
+       manual_flush_only: Map.get(ctx, :manual_flush_only, true)},
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42ac3a4 and bb30dae.

📒 Files selected for processing (16)
  • .changeset/gorgeous-readers-return.md
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/config/runtime.exs
  • packages/sync-service/lib/electric/core_supervisor.ex
  • packages/sync-service/lib/electric/monitored_core_supervisor.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
  • packages/sync-service/lib/electric/shapes/supervisor.ex
  • packages/sync-service/lib/electric/stack_supervisor/telemetry.ex
  • packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
  • packages/sync-service/test/support/component_setup.ex
💤 Files with no reviewable changes (2)
  • packages/sync-service/lib/electric/core_supervisor.ex
  • packages/sync-service/lib/electric/shapes/supervisor.ex
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/sync-service/config/runtime.exs
  • .changeset/gorgeous-readers-return.md
  • packages/sync-service/test/electric/replication/shape_log_collector/request_batcher_test.exs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-20T16:17:48.415Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex:207-224
Timestamp: 2026-01-20T16:17:48.415Z
Learning: In Electric's ShapeCache, shape handles are unique identifiers that are never reused. A handle will never be removed and then re-added, so tombstone+re-add race conditions in the WriteBuffer are not a concern.

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
📚 Learning: 2026-01-14T14:45:05.855Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
📚 Learning: 2026-01-15T13:54:59.352Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex:106-126
Timestamp: 2026-01-15T13:54:59.352Z
Learning: In NimblePool, when a worker is immediately available, the pool skips `handle_enqueue/2` and calls `handle_checkout/4` directly with the original checkout command. When no worker is available, the request goes through `handle_enqueue/2` (where state can be transformed) before `handle_checkout/4` is called. This means both patterns must be handled in `handle_checkout/4` to support both immediate and queued checkouts.

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex
🧬 Code graph analysis (3)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (4)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex (3)
  • name (6-8)
  • start_link (10-12)
  • init (14-52)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (4)
  • add_shape (100-131)
  • remove_shape (194-211)
  • mark_snapshot_started (176-183)
  • mark_snapshot_complete (185-192)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)
  • checkout_write! (169-181)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (3)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (11)
  • add_shape (209-228)
  • remove_shape (231-256)
  • queue_snapshot_started (259-264)
  • queue_snapshot_complete (267-272)
  • clear (275-279)
  • lookup_handle (105-115)
  • is_tombstoned? (118-120)
  • tombstoned_handles (200-204)
  • handles_for_relations (170-185)
  • buffered_shape_count (161-167)
  • pending_operations_count (145-147)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (8)
  • add_shape (100-131)
  • remove_shape (194-211)
  • handle_for_shape (139-143)
  • shape_for_handle (145-149)
  • list_shapes (133-137)
  • shape_handles_for_relations (158-168)
  • list_shape_meta_stream (223-230)
  • count_shapes (170-174)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (2)
  • checkout_write! (169-181)
  • checkout! (160-167)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (2)
packages/sync-service/lib/electric/connection/manager.ex (1)
  • pool_name (180-182)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (2)
  • explain (240-242)
  • explain (244-246)
🪛 GitHub Actions: Elixir formatting
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex

[error] 352-352: Warning: variable "state" is unused (prefix with an underscore). Treat as error because --warnings-as-errors is enabled during compilation.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test packages/experimental w/ sync-service
  • GitHub Check: Test packages/y-electric w/ sync-service
  • GitHub Check: Test packages/start w/ sync-service
  • GitHub Check: Test packages/react-hooks w/ sync-service
  • GitHub Check: Test packages/typescript-client w/ sync-service
  • GitHub Check: Build and test sync-service, pg15
  • GitHub Check: Build and test sync-service, pg14
  • GitHub Check: Build and test sync-service, pg17
  • GitHub Check: Build and test sync-service, pg18
  • GitHub Check: Run Lux integration tests
🔇 Additional comments (53)
packages/sync-service/lib/electric/monitored_core_supervisor.ex (1)

17-26: Keyword.get/3 usage is safe; tweaks is guaranteed to be a keyword list.

The type of tweaks is explicitly constrained to :keyword_list in stack_supervisor.ex (line 104-105), and all usages across the codebase treat it as such. There is no evidence of tweaks being supplied as a map. The suggested refactor using get_in/2 may be a minor optimization, but the concern about type safety does not apply.

Likely an incorrect or invalid review comment.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex (1)

46-48: WriteBuffer child addition looks good.
Supervising the write buffer alongside the write pool fits the batching goal.

packages/sync-service/lib/electric/stack_supervisor/telemetry.ex (2)

43-64: Telemetry wiring for the write-buffer metric looks good.
Default metrics and periodic measurements are updated consistently.


84-96: report_write_buffer_size/2 implementation looks good.
The guard via rescue keeps early-boot telemetry safe.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (1)

2-8: list_shape_meta query addition aligns with the new meta model.
Nice to expose handle/hash/snapshot_state in the read query set.

packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)

176-202: Nice coverage for reduce_shape_meta/3 including snapshot flags and flush.
The explicit flush makes the SQLite read path deterministic.


223-250: Snapshot start/complete tests updated appropriately.
Assertions align with the new snapshot state flow.


273-277: Helper for snapshot state lookup keeps tests concise.
Good test ergonomics.


336-415: Write-buffer behavior tests are comprehensive.
Covers visibility, flush, and pending-removal edge cases well.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (11)

32-34: Tuple layout documentation is clear.
This makes the ETS shape_meta_table usage easier to reason about.


49-61: Initialization now creates and populates the meta table—LGTM.
Flow aligns with the new metadata model.


69-75: Write-through meta insert on add looks good.
Storing {handle, hash, snapshot_started, last_read_time} is consistent.


125-130: ETS cleanup on removal is correct.
Keeps the cache consistent with ShapeDb.


135-139: Reset clears ETS state as expected.
Good parity with ShapeDb.reset/1.


157-159: ETS membership check for handles is fine.
This is the right fast-path.


166-175: Hash validation via ETS looks good.
Keeps validation cheap and local.


189-193: Snapshot-started lookup is consistent with the new meta table.
Nice and simple.


215-220: last_read_time updates now use the tuple position constant—LGTM.
Keeps tuple access centralized.


227-235: LRU fold logic updated correctly for new tuple shape.
Pattern matches now account for hash/snapshot fields.


275-305: Meta-table helper functions look solid.
Creation and population logic align with the new ShapeDb reduce API.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (5)

4-6: Telemetry alias wiring looks good.

The alias is consistent with the new checkout instrumentation below.


106-125: Checkout labeling + queue-time telemetry are solid (incl. fast-path).

The enqueue timestamping, queue-time calculation, and non-enqueued clause keep both queued and immediate checkouts working with consistent telemetry.

Based on learnings, NimblePool can bypass handle_enqueue/2 on immediate checkouts, so the fast-path clause is important.


133-140: Pragma adjustments make sense with the ETS hot path.

Lower cache size and NORMAL synchronous align with the new buffering strategy.


160-180: Labeled checkout API update looks good.

The label threading aligns with the new telemetry hooks.


326-333: Explain path now correctly labels checkouts.

Consistent labeling for read/write explain paths is a nice touch.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (14)

2-23: Docs and buffering wiring are clear.

The updated module doc and WriteBuffer alias/imports align with the new write path.


39-56: Buffered add path looks good.

Writing to ETS first and enqueueing for SQLite keeps the API responsive.


59-81: Removal + snapshot markers are coherently buffered.

The guard via handle_exists?/2 keeps snapshot writes scoped to valid handles.


83-86: Reset correctly clears buffer before DB reset.

This avoids stale buffer state leaking into the fresh DB.


88-113: Handle lookup path is safe with tombstone re-check.

The buffer-first lookup plus tombstone guard avoids serving deleted handles.


115-141: Shape lookup path mirrors handle lookup well.

The post-SQLite tombstone re-check closes the race window.


143-164: Merged list_shapes is well-structured.

Buffer + SQLite merge with tombstone filtering and dedupe should behave as intended.


170-186: Relations lookup merge is clean.

Filtering SQLite results by tombstones and unioning with buffered handles is sensible.


194-199: reduce_shapes now correctly relies on buffer-aware list.

Keeps reduction consistent with buffered state.


201-208: Boot-time meta reduction looks appropriate.

Using the stream for shape meta during boot avoids extra buffering logic.


210-220: Count logic matches the buffer + SQLite model.

Nice to call out potential non-atomicity in the docstring.


235-257: Validation flow is robust.

Flushing before validation and using a longer timeout for the cleanup path is sensible.


266-270: Pending buffer size accessor is clear.

Nice lightweight introspection hook for telemetry.


226-233: No action required. Query.handle_exists?/2 returns a plain boolean (true or false), not {:ok, boolean}. The function explicitly unwraps the result from fetch_one/3 and converts it to a boolean in the case statement (lines 153-154), so the concern about {:ok, false} being truthy does not apply here. The code is correct.

Likely an incorrect or invalid review comment.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (9)

1-57: Module structure and op_key uniqueness look good.

The composite op key avoids timestamp collisions while preserving ordering.


58-82: Control APIs (flush/pause/resume) are clear and test-friendly.

These hooks should make shutdown and test scenarios predictable.


87-205: Buffered lookup/list helpers are consistent with tombstone semantics.

The tombstone checks and reverse index lookups keep reads coherent.


208-279: Buffered write operations look solid.

Add/remove/snapshot operations and clear behavior align with the buffer design.


282-319: Init sets up ETS and process metadata cleanly.

Concurrency options and process labeling look appropriate.


321-351: Polling/flush controls are straightforward.

The manual flush mode and pause toggles are handled cleanly.


356-402: Flush loop and retry-on-failure behavior look good.

Resetting the flushing flag on failure prevents permanent stalls.


404-418: Entry selection and marking look safe.

The select + update sequence is clear and bounded by the drain limit.


420-444: Batch write error propagation is handled correctly.

Ensuring write ops must return :ok improves reliability and error visibility.

integration-tests/tests/crash-recovery.lux (5)

6-6: Good call making handle_valid global for cross-shell assertions.


17-52: Nice split into items1/items2 with separate seeding for dual-path coverage.


77-124: Must‑refetch path + refreshed handle/offset looks solid.


126-180: Good end‑to‑end coverage of pre‑crash + post‑crash updates.


60-75: The test flow already ensures items1 is flushed to SQLite before pausing the WriteBuffer.

The test waits for the [debug] ShapeDb: committed logs (lines 69-70) before calling pause_flush (line 74). These logs are printed after do_batch_write completes successfully, which means the shape has already been written to SQLite. No additional flush call or wait is needed.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Copy link
Contributor

@msfstef msfstef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I've noticed a slower concurrent shape creation on the benchmarks, I don't know how up to date the base line is, and I am slightly worried about any impacts telemetry might have on the system due to the counting of the shapes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
integration-tests/tests/crash-recovery.lux (1)

152-158: Flaky synchronization causes pipeline failure.

The test waits for [debug] Txn received in Shapes.Consumer, but this message may never appear if PostgreSQL acknowledged the replication slot position before the crash. Since the client successfully read the pre-crash insert (lines 136-140), the transaction was already processed and may not be "un-acked" on restart.

Consider one of these alternatives:

  1. Use a timeout pattern that allows proceeding if the message doesn't appear within a reasonable window
  2. Insert data after restart rather than relying on replay of potentially-acked transactions
  3. Use a different synchronization point that reliably indicates the stack is ready for queries
🔧 Suggested fix using optional timeout
   # (that hasn't been flushed to disk). The sqlite-based startup is slightly
   # slower than the pure-ets version (~50ms vs ~5ms) which means that this request
   # arrives before the stack has been received the pending operation.
-  # To mitigate this we ensure the txn has been processed.
-  ??[debug] Txn received in Shapes.Consumer
+  # To mitigate this we add a small delay to allow any pending txn to be processed.
+  [sleep 1]

Alternatively, if the intent is to test that un-acked transactions are replayed, the test should ensure the transaction is definitely not acknowledged before the crash—perhaps by pausing replication acknowledgment or crashing immediately after the insert without waiting for client visibility.

🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 329-401: The flush_sync path currently always replies :ok because
handle_call(:flush_sync, ...) calls flush_until_empty which swallows batch
failures from do_batch_write/2; change flush_until_empty (and its callers) to
return :ok or {:error, reason} and propagate that result back to
handle_call(:flush_sync, ...) so the caller sees failures. Specifically, update
flush_until_empty/1 (used by handle_call(:flush_sync,...), terminate/1 and
recursive calls) to return the result of do_batch_write (or an error tuple)
instead of ignoring it, propagate errors out of the recursion (stop further
attempts on failure), and adjust handle_call(:flush_sync, _from, state) to reply
with the returned value from flush_until_empty(%{state | paused: false}); keep
terminate/1 calling flush_until_empty but ignore its return value to preserve
shutdown behavior. Ensure the error branch still resets the flushing flag on
ops_table entries (via :ets.update_element) so retries remain possible.
🧹 Nitpick comments (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (1)

94-97: Add :safe option to binary_to_term calls for defense-in-depth against unsafe term deserialization.

Using binary_to_term/1 without the :safe option can create new atoms or raise on corrupted binaries. The :safe option (available via binary_to_term/2) prevents atom creation and other unsafe operations, raising ArgumentError for malicious or corrupted data. This is a security best practice for deserializing untrusted binaries.

♻️ Suggested change
-          {:ok, :erlang.binary_to_term(shape_binary)}
+          {:ok, :erlang.binary_to_term(shape_binary, [:safe])}
-    |> Enum.map(fn [handle, shape_binary] -> {handle, :erlang.binary_to_term(shape_binary)} end)
+    |> Enum.map(fn [handle, shape_binary] -> {handle, :erlang.binary_to_term(shape_binary, [:safe])} end)

Applies to lines 96 and 156.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb30dae and 1b8a197.

📒 Files selected for processing (2)
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-20T16:17:48.415Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex:207-224
Timestamp: 2026-01-20T16:17:48.415Z
Learning: In Electric's ShapeCache, shape handles are unique identifiers that are never reused. A handle will never be removed and then re-added, so tombstone+re-add race conditions in the WriteBuffer are not a concern.

Applied to files:

  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
📚 Learning: 2026-01-14T14:45:05.855Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • integration-tests/tests/crash-recovery.lux
🧬 Code graph analysis (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (4)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/supervisor.ex (1)
  • name (6-8)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/query.ex (4)
  • add_shape (100-131)
  • remove_shape (194-211)
  • mark_snapshot_started (176-183)
  • mark_snapshot_complete (185-192)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/connection.ex (1)
  • checkout_write! (169-181)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (2)
  • mark_snapshot_started (67-73)
  • mark_snapshot_complete (75-81)
🪛 GitHub Actions: Lux Integration Tests
integration-tests/tests/crash-recovery.lux

[error] 158-158: Command './run.sh' failed. Crash test 'tests/crash-recovery.lux' failed with 'match_timeout' at line 158.

🔇 Additional comments (8)
integration-tests/tests/crash-recovery.lux (5)

6-53: Test setup for two-shape crash recovery scenario looks correct.

The setup creates two separate tables and populates them, establishing a clear scenario for testing both persisted and unpersisted shape recovery paths.


60-76: Good use of pause_flush to eliminate timing-based flakiness.

Pausing the flush after items1 is persisted ensures deterministic behavior for the unpersisted shape test case.


85-124: First crash/recovery flow correctly validates both persisted and unpersisted shape behaviors.

The test properly verifies that items1 (persisted) remains valid while items2 (unpersisted) requires a must-refetch, aligning with the expected write-buffer semantics.


160-180: Post-recovery verification logic is correct.

The test properly validates that the recovered shape can handle both existing and new data after the second crash, assuming the synchronization issue is resolved.


182-183: Cleanup is standard.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (3)

52-56: Nice: unique op_key prevents ordered_set collisions.

Clean, ordered uniqueness with {monotonic_time, unique_int} should keep the ops queue stable under concurrency.


208-255: Buffer updates + tombstone insert_new logic look solid.

The shape insert + comparable index and the insert_new tombstone guard are consistent and minimize double-remove churn.


282-318: ETS setup + process metadata look good.

Named tables with read_concurrency and write_concurrency plus labeling/metadata are well aligned for a hot-path buffer.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 1b8a197 to e5556c2 Compare January 22, 2026 10:07
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 86-101: The current lookup_shape/2 reads the tombstone before
reading the shape which allows a race to return a stale shape; change
lookup_shape to first read from the shapes_table_name(stack_id) via
:ets.lookup(shapes_table, {:shape, handle}), then if a shape row is found check
:ets.member(shapes_table, {:tombstone, handle}) and return :not_found if a
tombstone exists, otherwise return {:ok, :erlang.binary_to_term(shape_binary)};
if the lookup returned [] return :not_found — this aligns lookup_shape/2 with
has_handle?/2 and avoids the tombstone/lookup race.
♻️ Duplicate comments (1)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (1)

329-332: Flush sync still masks batch failures.

handle_call(:flush_sync, ...) always replies :ok, even if do_batch_write/2 fails; this is still the same issue flagged earlier. Callers can’t detect a failed flush.

Also applies to: 356-401

🧹 Nitpick comments (1)
integration-tests/tests/crash-recovery.lux (1)

99-110: Optional: keep offset_valid scope consistent.

You start with a global, then shadow it locally. Keeping it global avoids confusion if the test grows.

♻️ Suggested tweak
-  [local offset_valid=$1]
+  [global offset_valid=$1]
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8a197 and e5556c2.

📒 Files selected for processing (2)
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-20T16:17:48.415Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex:207-224
Timestamp: 2026-01-20T16:17:48.415Z
Learning: In Electric's ShapeCache, shape handles are unique identifiers that are never reused. A handle will never be removed and then re-added, so tombstone+re-add race conditions in the WriteBuffer are not a concern.

Applied to files:

  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
📚 Learning: 2026-01-14T14:45:05.855Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • integration-tests/tests/crash-recovery.lux
📚 Learning: 2026-01-14T14:45:20.892Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: examples/tanstack-db-web-starter/AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:20.892Z
Learning: Applies to examples/tanstack-db-web-starter/src/lib/trpc/**/*.ts : Generate transaction IDs using `pg_current_xact_id()::xid::text` for Electric sync compatibility

Applied to files:

  • integration-tests/tests/crash-recovery.lux
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test packages/typescript-client w/ sync-service
  • GitHub Check: Build and test sync-service, pg18
  • GitHub Check: Build and test sync-service, pg14
  • GitHub Check: Build and test sync-service, pg15
  • GitHub Check: Build and test sync-service, pg17
  • GitHub Check: Run Lux integration tests
🔇 Additional comments (10)
integration-tests/tests/crash-recovery.lux (10)

5-6: Clear global handle initialization for cross‑block use.

This keeps the persisted handle available for later log and client assertions.


17-52: Nice split into items1/items2 for dual‑shape coverage.

The table split and seeded data make the two‑shape recovery paths deterministic.


60-76: Snapshot capture + flush pause sequencing is solid.

Capturing handle/offset before pausing flush makes the persisted vs. unpersisted shapes clear.


77-84: Second shape initialization is clean.

Handle and offset capture for the unflushed shape is clear and self‑contained.


93-95: Restart expectations are explicit.

Asserting the valid/invalid shape counts makes the recovery intent obvious.


111-124: Must‑refetch flow for the unflushed shape is well covered.

The 409 path and subsequent 0_inf recovery are asserted clearly.


129-140: Pre‑crash update verification looks good.

Live fetch + offset update ensures the change is observed before the crash.


147-154: Txn priming and sync‑ready gating are well documented.

The inserted txn and log wait reduce restart race flakiness.

Also applies to: 160-166


169-173: Post‑restart continuity check is solid.

Confirms the same shape can be queried after restart.


179-188: Post‑crash update validation is clear.

Live fetch asserts the recovered shape continues to stream updates.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from e5556c2 to 6037b8a Compare January 22, 2026 10:32
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@integration-tests/tests/crash-recovery.lux`:
- Around line 63-64: The regex for ?electric-handle currently uses ([\d-]+)
which only matches digits and hyphens and will fail for UUID-style handles
containing hex letters; update the pattern to accept hex digits and hyphens
(e.g. use ([0-9a-fA-F-]+) or ([\da-fA-F-]+]) and apply the same change where
?electric-handle and the corresponding [global handle_valid=$1] pairs occur (the
other occurrences referenced in the review).

In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 320-325: The handle_info(:poll, %{manual_flush_only: false} =
state) currently uses a strict pattern match (:ok = flush_until_empty(state))
which will crash the GenServer if flush_until_empty returns {:error, _}, causing
ETS table loss; change it to call flush_until_empty(state) without
pattern-matching the return and explicitly handle both {:ok, new_state} and
{:error, reason, new_state} (or similar) so on error you log the failure, do not
crash, ensure any flush_entries cleanup/reset logic runs (reuse flush_entries'
error-path behavior to reset flushing flags), and then call schedule_poll(state)
to reschedule—this preserves ETS and buffered data and allows retries instead of
terminating the process.
🧹 Nitpick comments (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (2)

2-32: Consider documenting crash/recovery semantics.

The module documentation thoroughly explains the architecture and operation flow, but doesn't cover what happens on crash or discarded in-flight requests. Adding a brief note about why this is a safe failure mode (e.g., no orphaned shapes, recovery on restart) would help maintainers understand the durability guarantees.


52-53: timestamp/0 appears unused after the op_key/0 fix.

All internal operations now use op_key/0 for unique keys. The timestamp/0 function remains exported but doesn't seem to be called anywhere in this module. If it's not used externally, consider removing it to avoid confusion.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5556c2 and 6037b8a.

📒 Files selected for processing (2)
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-20T16:17:48.415Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex:207-224
Timestamp: 2026-01-20T16:17:48.415Z
Learning: In Electric's ShapeCache, shape handles are unique identifiers that are never reused. A handle will never be removed and then re-added, so tombstone+re-add race conditions in the WriteBuffer are not a concern.

Applied to files:

  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
📚 Learning: 2026-01-14T14:45:05.855Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • integration-tests/tests/crash-recovery.lux
📚 Learning: 2026-01-14T14:45:20.892Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: examples/tanstack-db-web-starter/AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:20.892Z
Learning: Applies to examples/tanstack-db-web-starter/src/lib/trpc/**/*.ts : Generate transaction IDs using `pg_current_xact_id()::xid::text` for Electric sync compatibility

Applied to files:

  • integration-tests/tests/crash-recovery.lux
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test packages/experimental w/ sync-service
  • GitHub Check: Test packages/start w/ sync-service
  • GitHub Check: Test packages/y-electric w/ sync-service
  • GitHub Check: Test packages/react-hooks w/ sync-service
  • GitHub Check: Test packages/typescript-client w/ sync-service
  • GitHub Check: Run Lux integration tests
  • GitHub Check: Build and test sync-service, pg18
  • GitHub Check: Build and test sync-service, pg14
  • GitHub Check: Build and test sync-service, pg17
  • GitHub Check: Build and test sync-service, pg15
🔇 Additional comments (6)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (6)

58-82: LGTM!

The GenServer API is clean with appropriate timeout handling and well-documented testing utilities.


84-141: LGTM!

Lookup functions correctly address the tombstone race condition by checking the tombstone after the shape lookup, ensuring consistent reads even under concurrent deletions.


143-203: LGTM!

The count and list functions use targeted :ets.match patterns and build the tombstone set once per call, which is reasonable for telemetry and testing purposes where the buffer is expected to remain small.


205-278: LGTM!

Write functions are well-structured. The use of insert_new for atomic tombstone creation prevents double-remove races, and per the learned invariant, handles are unique and never reused so tombstone clearing on add is unnecessary.


354-406: LGTM!

The flush logic correctly handles errors by resetting flushing flags for retry and returning the error tuple. The recursive drain on success ensures all pending operations are flushed before returning.


424-451: LGTM!

The batch write logic properly enforces :ok returns from all Query.* calls and surfaces errors via the rescue block.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 6037b8a to 4168319 Compare January 22, 2026 10:49
@blacksmith-sh

This comment has been minimized.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex`:
- Around line 29-31: Update the module docstring in the write_buffer.ex file to
correct the misspelled word "reconcilation" to "reconciliation" so the
documentation reads properly; locate the module-level `@moduledoc` (or the
top-of-file comment) in the WriteBuffer (or ShapeStatus/shape_db write buffer)
module and replace the typo with the correct spelling.
♻️ Duplicate comments (1)
integration-tests/tests/crash-recovery.lux (1)

63-64: Broaden handle regex to accept hex UUIDs.

([\d-]+) fails for a-f characters; use hex-friendly pattern.

🔧 Proposed fix
-  ?electric-handle: ([\d-]+)
+  ?electric-handle: ([\da-fA-F-]+)
-  ?electric-handle: ([\d-]+)
+  ?electric-handle: ([\da-fA-F-]+)
-  ?electric-handle: ([\d-]+)
+  ?electric-handle: ([\da-fA-F-]+)

Also applies to: 80-81, 115-116

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6037b8a and 4168319.

📒 Files selected for processing (2)
  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-20T16:17:48.415Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex:207-224
Timestamp: 2026-01-20T16:17:48.415Z
Learning: In Electric's ShapeCache, shape handles are unique identifiers that are never reused. A handle will never be removed and then re-added, so tombstone+re-add race conditions in the WriteBuffer are not a concern.

Applied to files:

  • integration-tests/tests/crash-recovery.lux
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
📚 Learning: 2026-01-14T14:45:05.855Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • integration-tests/tests/crash-recovery.lux
📚 Learning: 2026-01-14T14:45:20.892Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: examples/tanstack-db-web-starter/AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:20.892Z
Learning: Applies to examples/tanstack-db-web-starter/src/lib/trpc/**/*.ts : Generate transaction IDs using `pg_current_xact_id()::xid::text` for Electric sync compatibility

Applied to files:

  • integration-tests/tests/crash-recovery.lux
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test packages/experimental w/ sync-service
  • GitHub Check: Test packages/y-electric w/ sync-service
  • GitHub Check: Test packages/typescript-client w/ sync-service
  • GitHub Check: Test packages/react-hooks w/ sync-service
  • GitHub Check: Test packages/start w/ sync-service
  • GitHub Check: Build and test sync-service, pg14
  • GitHub Check: Build and test sync-service, pg18
  • GitHub Check: Build and test sync-service, pg17
  • GitHub Check: Run Lux integration tests
  • GitHub Check: Build and test sync-service, pg15
🔇 Additional comments (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (2)

65-67: Good: unique op key avoids ordered_set collisions.

The composite {monotonic_time, unique_integer} key is the right move for uniqueness + ordering.


374-413: Good: failed flush resets flags for retry.

Resetting the flushing flag on error preserves retries without dropping ops.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 1471947 to dbf0748 Compare January 22, 2026 11:41
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (1)

233-255: Propagate flush_sync/1 failures before validation.

validate_existing_shapes/1 relies on persisted state. If the flush fails, continuing can validate against stale data and silently drop buffered writes. Make the flush result part of the with so failures are returned to the caller.

🛠️ Suggested fix
-    # Must flush buffer before validating so we check persisted state
-    WriteBuffer.flush_sync(stack_id)
-
-    with {:ok, removed_handles} <-
+    # Must flush buffer before validating so we check persisted state
+    with :ok <- WriteBuffer.flush_sync(stack_id),
+         {:ok, removed_handles} <-
            checkout_write!(
              stack_id,
              :validate_existing_shapes,
              fn %Connection{} = conn ->
                with {:ok, handles} <- Query.select_invalid(conn) do
                  Enum.each(handles, fn handle ->
                    :ok = Query.remove_shape(conn, handle)
                  end)

                  {:ok, handles}
                end
              end,
              # increase timeout because we may end up doing a lot of work here
              60_000
            ),
          {:ok, count} <- count_shapes(stack_id) do
       {:ok, removed_handles, count}
     end
packages/sync-service/lib/electric/shape_cache/shape_status.ex (1)

157-175: Guard against undefined ETS tables in shape status queries.

The functions has_shape_handle?/2, validate_shape_handle/3, and snapshot_started?/2 will raise ArgumentError (badarg) if called before ShapeStatus.initialize/1 creates the table, risking crashes in tests or during startup races. Each should check table existence and return safe defaults (false/:error) rather than crashing.

Guard pattern
-  def has_shape_handle?(stack_id, shape_handle) do
-    :ets.member(shape_meta_table(stack_id), shape_handle)
-  end
+  def has_shape_handle?(stack_id, shape_handle) do
+    case :ets.whereis(shape_meta_table(stack_id)) do
+      :undefined -> false
+      _ -> :ets.member(shape_meta_table(stack_id), shape_handle)
+    end
+  end

Apply to validate_shape_handle/3 (return :error), and snapshot_started?/2 (return false).

🤖 Fix all issues with AI agents
In `@packages/sync-service/lib/electric/shape_cache/shape_status.ex`:
- Around line 69-75: The pattern match forcing :ets.insert_new to be true can
crash if the shape handle already exists; instead, change the logic in the block
that calls ShapeDb.add_shape so you explicitly handle both :ets.insert_new
returns (true | false). After calling
:ets.insert_new(shape_meta_table(stack_id), {shape_handle, shape_hash, false,
System.monotonic_time()}), check the result: if true proceed normally, if false
fetch the existing ETS entry (via :ets.lookup/2) and verify/merge state or log a
warning and continue without crashing; ensure any divergence between DB and ETS
is reconciled or surfaced via error logging rather than letting a MatchError
terminate the process.
🧹 Nitpick comments (1)
packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (1)

336-469: Assert flush_sync/1 results to fail fast in buffer tests.

These tests depend on flush_sync/1 to expose SQLite state. Asserting :ok makes failures explicit and avoids false positives. Consider applying this pattern to all flush_sync/1 calls in this block (and elsewhere in this file).

🧪 Example tweak
-      ShapeDb.WriteBuffer.flush_sync(ctx.stack_id)
+      assert :ok = ShapeDb.WriteBuffer.flush_sync(ctx.stack_id)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1471947 and dbf0748.

📒 Files selected for processing (4)
  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
  • packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Define shapes in server/proxy – no client-defined tables/WHERE clauses
📚 Learning: 2026-01-20T16:17:48.415Z
Learnt from: magnetised
Repo: electric-sql/electric PR: 3716
File: packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex:207-224
Timestamp: 2026-01-20T16:17:48.415Z
Learning: In Electric's ShapeCache, shape handles are unique identifiers that are never reused. A handle will never be removed and then re-added, so tombstone+re-add race conditions in the WriteBuffer are not a concern.

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
📚 Learning: 2026-01-14T14:45:05.855Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:05.855Z
Learning: Avoid old Electric patterns (bidirectional SQLite sync, `electrify()` API) – use Electric HTTP streaming with TanStack DB collections instead

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex
📚 Learning: 2026-01-14T14:45:20.892Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: examples/tanstack-db-web-starter/AGENTS.md:0-0
Timestamp: 2026-01-14T14:45:20.892Z
Learning: Applies to examples/tanstack-db-web-starter/src/lib/trpc/**/*.ts : Generate transaction IDs using `pg_current_xact_id()::xid::text` for Electric sync compatibility

Applied to files:

  • packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test packages/typescript-client w/ sync-service
  • GitHub Check: Build and test sync-service, pg17
  • GitHub Check: Build and test sync-service, pg15
  • GitHub Check: Build and test sync-service, pg18
  • GitHub Check: Build and test sync-service, pg14
  • GitHub Check: Run Lux integration tests
🔇 Additional comments (36)
packages/sync-service/test/electric/shape_cache/shape_status/shape_db_test.exs (4)

176-201: reduce_shape_meta/3 test matches new snapshot metadata shape.

The tuple expectations and flush-before-reduce flow align with the new SQLite-backed semantics.


223-231: Snapshot-start test update is consistent with the new API.

Covers the expected :ok behavior after the handle is created.


233-249: Snapshot completion flow test looks good.

Covers the “complete before start” allowance and the normal start→complete path.


273-276: get_snapshot_states/1 helper is clear and aligned with new shape meta.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db.ex (14)

2-23: Module docs and buffer-aware imports read well.


39-56: add_shape/3 buffering path looks solid.

Serializing shape/comparable and queuing a buffered add is consistent with the new write-through design.


59-64: remove_shape/2 guard is clear and consistent.


67-81: Snapshot queueing gates on handle existence as expected.


83-86: Reset clears buffer before SQLite reset—good ordering.


88-113: Handle lookup correctly checks buffer first and honors tombstones.


115-140: Double tombstone check around SQLite read is a good race guard.


143-164: Buffered + SQLite merge with tombstone filtering looks correct.


170-186: Relation-based handle lookup merges buffered and persisted data cleanly.


194-198: reduce_shapes/3 delegation to list_shapes/1 is clean.


201-207: Boot-time reduce_shape_meta/3 usage is straightforward.


210-217: Count composition with pending diff is sensible and documented.


224-230: Buffer-aware handle_exists?/2 is concise and clear.


264-268: pending_buffer_size/1 is a clean public exposure of buffer depth.

packages/sync-service/lib/electric/shape_cache/shape_status/shape_db/write_buffer.ex (13)

1-68: WriteBuffer docs and op key uniqueness look solid.


69-93: Public control API (start/flush/pause/resume) is clean and minimal.


97-125: Lookup helpers correctly respect tombstones.


127-152: Handle existence semantics are well-defined and race-aware.


154-174: Pending counts and buffered listing logic are straightforward.


176-191: Relation-based handle filtering in the buffer looks good.


194-199: Tombstone extraction is simple and clear.


203-278: Buffered add/remove/snapshot ops and clear flow are consistent.


280-320: ETS setup and initialization look good.


326-365: Polling and flush orchestration are clean.


366-414: Flush cleanup and retry behavior looks solid.


416-430: Batch selection + flushing-flag updates are OK.


432-459: Batch write application with error handling is robust.

packages/sync-service/lib/electric/shape_cache/shape_status.ex (5)

32-35: Centralizing tuple position avoids magic indexes.

Nice to see the tuple position declared once and reused in updates—this reduces accidental drift as the meta tuple evolves.

Also applies to: 218-222


49-61: Init path cleanly encapsulates meta-table lifecycle.

Splitting table creation and population into helpers keeps the boot flow clear and maintainable.

Also applies to: 277-305


129-130: Cache cleanup stays aligned with DB removal/reset.

The ETS deletes on removal/reset keep the in-memory view consistent with persisted state.

Also applies to: 138-138


182-187: Snapshot-start tracking now fails fast on cache update failure.

Good to see the return value reflect consistency between storage and cache.


231-258: LRU fold updated for the 4-tuple meta shape.

The fold function now correctly extracts the last-read field from the new tuple structure.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@magnetised magnetised force-pushed the magnetised/sqlite-ets-cache branch from 011b43e to 5f99549 Compare January 23, 2026 16:07
@blacksmith-sh

This comment has been minimized.

@magnetised
Copy link
Contributor Author

benchmark this

@github-actions
Copy link
Contributor

github-actions bot commented Jan 23, 2026

Benchmark results, triggered for 5f9954

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

  • many shapes one client latency completed

many shapes one client latency results

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants