
Conversation

Contributor

@AndrewWestberg AndrewWestberg commented Dec 2, 2025

dApps that use witness datums have to keep the datum values off-chain: when you build a transaction that spends a UTXO carrying only a datum hash, you must supply the actual datum value to witness it.

Since Dolos is indexing the blockchain anyway, have it also index these actual datum values, which the ledger represents only as a hash.

This makes it easier when building new transactions to just use dolos as the off-chain store for the datum values.

Summary by CodeRabbit

  • New Features
    • Capture and track Plutus datum bytes added/removed during UTXO updates and recovery.
    • Persist datums with reference counting to manage lifecycle and avoid dangling data.
    • Expose datum lookup in the state API so services can retrieve datum bytes.
    • UTXO query responses now enrich outputs with associated datum payloads and emit clearer, operation‑specific warnings.



coderabbitai bot commented Dec 2, 2025

📝 Walkthrough


Adds end-to-end datum witness tracking: UtxoSetDelta now records datum additions/removals; cardano delta computation records per-transaction datums; redb3 persists datums with refcounts and exposes lookup; StateStore gains get_datum; gRPC UTXO responses are enriched with decoded Plutus datums from storage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core data structures (crates/core/src/lib.rs) | Adds witness_datums_add: HashMap<Hash<32>, Vec<u8>> and witness_datums_remove: HashSet<Hash<32>> to pub struct UtxoSetDelta. |
| StateStore trait (crates/core/src/state.rs) | Adds fn get_datum(&self, datum_hash: &Hash<32>) -> Result<Option<Vec<u8>>, StateError> and imports Hash. |
| Delta computation, Cardano (crates/cardano/src/utxoset.rs) | Builds per-tx witness_datums_map from tx.plutus_data(); populates delta.witness_datums_add for produced outputs and delta.witness_datums_remove for consumed outputs; uses Arc-backed stxi bodies when inserting consumed/recovered entries. |
| Storage layer, redb3 (crates/redb3/src/state/mod.rs, crates/redb3/src/state/utxoset.rs) | Adds DatumsTable (initialize/get/increment/decrement/stats), datum key/value aliases, wires datum refcount increment/decrement into UtxosTable::apply, exposes get_datum, and includes datums in utxoset stats. |
| Query / gRPC enrichment (src/serve/grpc/query.rs) | Changes into_u5c_utxo signature to accept state and return boxed errors; when output has a datum hash, fetches datum via StateStore::get_datum, decodes PlutusData, enriches the UTXO response, and logs warnings on missing/invalid datums; updates call sites to pass state. |
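
The core-level changes summarized above can be pictured with a small, dependency-free sketch. It is not the PR's code: `DatumHash` is a plain `[u8; 32]` standing in for pallas' `Hash<32>`, and `MemoryStore`, its `apply` method, and the unit-struct `StateError` are hypothetical helpers added only so the trait method can be exercised. Ref-counting is deliberately left out here.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for pallas' `Hash<32>` (assumption: a plain byte array is enough here).
type DatumHash = [u8; 32];

/// Reduced delta that carries only the two datum fields named in the summary.
#[derive(Default)]
pub struct UtxoSetDelta {
    pub witness_datums_add: HashMap<DatumHash, Vec<u8>>,
    pub witness_datums_remove: HashSet<DatumHash>,
}

/// Hypothetical placeholder for the real `StateError`.
#[derive(Debug)]
pub struct StateError;

/// Reduced trait: only the new lookup method.
pub trait StateStore {
    fn get_datum(&self, datum_hash: &DatumHash) -> Result<Option<Vec<u8>>, StateError>;
}

/// Toy in-memory store, used only to show the delta flowing into a lookup.
#[derive(Default)]
pub struct MemoryStore {
    datums: HashMap<DatumHash, Vec<u8>>,
}

impl MemoryStore {
    /// Applies removals, then additions (the ordering is an arbitrary choice of this sketch).
    pub fn apply(&mut self, delta: &UtxoSetDelta) {
        for hash in &delta.witness_datums_remove {
            self.datums.remove(hash);
        }
        for (hash, bytes) in &delta.witness_datums_add {
            self.datums.insert(*hash, bytes.clone());
        }
    }
}

impl StateStore for MemoryStore {
    fn get_datum(&self, datum_hash: &DatumHash) -> Result<Option<Vec<u8>>, StateError> {
        Ok(self.datums.get(datum_hash).cloned())
    }
}
```

The `Result<Option<...>>` return type keeps "datum not indexed" (`Ok(None)`) distinct from a storage failure (`Err`), which is what lets callers degrade gracefully on a missing datum.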

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant gRPC as gRPC handler
    participant Mapper as interop::Mapper
    participant State as StateStore
    participant DB as redb3::DatumsTable

    Client->>gRPC: request UTXO(s)
    gRPC->>Mapper: map txo + era body -> parsed output (may include datum_hash)
    alt output has datum_hash
        gRPC->>State: get_datum(datum_hash)
        State->>DB: read DatumsTable
        DB-->>State: Option<Vec<u8>>
        alt datum present
            gRPC->>gRPC: decode PlutusData bytes
            gRPC-->>Client: enriched UTXO with decoded datum
        else missing / decode error
            gRPC-->>Client: UTXO without datum (warn logged)
        end
    else no datum_hash
        gRPC-->>Client: UTXO as-is
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • scarmuega

Poem

🐰 I hopped through deltas, neat and spry,
Collected datums, each byte a sigh.
Counts climb, then tumble, then settle down—
Queries fetch treasures to wear like a crown.
A rabbit's ledger hums—cheer and a nibble! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'feat: Track and ref-count witness-set datum values' accurately and concisely summarizes the main change: adding indexing and reference-counting for witness datum values across the codebase. |



🧹 Recent nitpick comments
crates/redb3/src/state/utxoset.rs (1)

170-190: Verify: silent no-op on missing datum during decrement.

The decrement method returns Ok(0) when the datum doesn't exist (lines 178-180). While this provides graceful degradation, it could mask logic errors where a decrement is attempted for a datum that was never tracked. Consider whether a warning log would be helpful for debugging.

♻️ Optional: Add debug logging for decrement on missing entry
         let Some((count, bytes)) = entry_data else {
+            tracing::debug!(
+                datum_hash = hex::encode(&**datum_hash),
+                "Decrement called for non-existent datum"
+            );
             return Ok(0);
         };
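
To make the ref-counting behaviour under discussion concrete, here is a minimal in-memory sketch, assuming a single table that maps each datum hash to a `(count, bytes)` tuple as the review describes. It uses a `HashMap` instead of redb, a `[u8; 32]` alias instead of `Hash<32>`, and plain return values instead of `Result`, so it illustrates the lifecycle rather than the PR's implementation.

```rust
use std::collections::HashMap;

/// Stand-in for pallas' `Hash<32>`.
type DatumHash = [u8; 32];

/// In-memory model of a ref-counted datum table: hash -> (refcount, datum bytes).
#[derive(Default)]
pub struct DatumsTable {
    entries: HashMap<DatumHash, (u64, Vec<u8>)>,
}

impl DatumsTable {
    /// Returns only the datum bytes, ignoring the stored count.
    pub fn get(&self, hash: &DatumHash) -> Option<&[u8]> {
        self.entries.get(hash).map(|(_, bytes)| bytes.as_slice())
    }

    /// Adds one reference and returns the new count.
    /// Existing bytes are kept, since equal hashes imply equal datum bytes.
    pub fn increment(&mut self, hash: DatumHash, bytes: &[u8]) -> u64 {
        let entry = self
            .entries
            .entry(hash)
            .or_insert_with(|| (0, bytes.to_vec()));
        entry.0 += 1;
        entry.0
    }

    /// Drops one reference and returns the remaining count.
    /// A missing entry is a silent no-op that returns 0, the case the nitpick is about.
    pub fn decrement(&mut self, hash: &DatumHash) -> u64 {
        let Some((count, _)) = self.entries.get_mut(hash) else {
            return 0;
        };
        *count = count.saturating_sub(1);
        let remaining = *count;
        if remaining == 0 {
            // Last reference gone: delete the datum bytes as well.
            self.entries.remove(hash);
        }
        remaining
    }
}
```

Note that a caller cannot tell "decremented to zero" apart from "was never tracked" by the return value alone, which is why a debug log in the missing-entry branch can help when diagnosing mismatched add/remove pairs.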

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8334b9b and 745ec93.

📒 Files selected for processing (6)
  • crates/cardano/src/utxoset.rs
  • crates/core/src/lib.rs
  • crates/core/src/state.rs
  • crates/redb3/src/state/mod.rs
  • crates/redb3/src/state/utxoset.rs
  • src/serve/grpc/query.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/core/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (3)
crates/core/src/state.rs (2)
crates/redb3/src/state/mod.rs (1)
  • get_datum (361-366)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (506-509)
crates/redb3/src/state/mod.rs (2)
crates/redb3/src/state/utxoset.rs (5)
  • initialize (52-56)
  • initialize (143-146)
  • initialize (237-245)
  • get_datum (506-509)
  • get (148-151)
crates/core/src/state.rs (1)
  • get_datum (412-412)
crates/redb3/src/state/utxoset.rs (3)
crates/redb3/src/state/mod.rs (2)
  • new (174-177)
  • get_datum (361-366)
crates/redb3/src/lib.rs (1)
  • initialize (87-98)
crates/redb3/src/archive/indexes.rs (1)
  • initialize (364-380)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check Build
🔇 Additional comments (21)
crates/core/src/state.rs (2)

4-4: LGTM!

The import of Hash from pallas is appropriate for the new get_datum method signature.


411-412: LGTM!

The new get_datum trait method follows the established patterns in StateStore and provides a clean interface for datum retrieval by hash. The signature is consistent with other getter methods in the trait.

crates/redb3/src/state/mod.rs (2)

134-134: LGTM!

Correctly initializes the DatumsTable during schema setup, consistent with the initialization of other tables.


360-366: LGTM!

The get_datum implementation follows the established pattern of other state retrieval methods, properly opening a read transaction and delegating to DatumsTable::get.

crates/cardano/src/utxoset.rs (5)

61-66: LGTM!

Building the witness datums map from tx.plutus_data() using raw_cbor().to_vec() is the correct approach. This safely handles the CBOR data without the panic risk from the previous unwrap() pattern.


71-77: LGTM!

Correctly populates witness_datums_add only when the datum bytes are available in the witness set. This ensures we only track datums that were actually provided in the transaction.


91-100: LGTM!

The use of with_dependent to access the output and detect datum hashes before consumption is correct. The Arc cloning pattern ensures proper ownership handling.


118-123: LGTM!

Correctly marks datum hashes for removal when undoing produced outputs, ensuring the ref-count will be decremented.


144-156: LGTM!

The undo delta correctly restores datum tracking when recovering consumed UTXOs, re-adding datums from the witness set when available.

crates/redb3/src/state/utxoset.rs (7)

21-22: LGTM!

The type aliases for DatumKey and DatumValue are well-defined and consistent with the existing UtxosKey/UtxosValue pattern.


107-113: LGTM!

The integration of datum tracking into UtxosTable::apply correctly invokes increment for additions and decrement for removals, maintaining the ref-count alongside UTXO operations.


138-146: LGTM!

The DatumsTable struct and its initialization follow the established patterns for table definitions in this codebase.


148-151: LGTM!

The get method correctly retrieves the datum bytes by hash, extracting only the bytes (index 1) from the stored tuple.


153-168: LGTM!

The increment method correctly handles both new datum insertions and ref-count increments. The pattern of always storing the provided datum_value is correct since the same datum hash will always correspond to the same datum bytes.


192-196: LGTM!

The stats method follows the same pattern as UtxosTable::stats.


496-509: LGTM!

The stats integration and get_datum method correctly expose the new datum table functionality through the StateStore interface.

src/serve/grpc/query.rs (5)

204-213: LGTM!

The function signature change to accept state and return Box<dyn std::error::Error> is appropriate for integrating datum lookup, which may produce various error types.


216-253: LGTM!

The datum enrichment logic correctly:

  1. Checks for outputs with datum hashes
  2. Attempts to fetch the datum from storage
  3. Decodes and maps the PlutusData on success
  4. Logs warnings for missing or failed lookups without failing the request

This graceful degradation ensures that UTXO queries succeed even when datum lookup fails.
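
The four steps listed above can be sketched as follows. Everything here is a simplified stand-in: `UtxoResponse` is a hypothetical two-field response, the store is a plain `HashMap`, `decode_datum` treats the bytes as UTF-8 rather than decoding PlutusData CBOR, and warnings go to stderr instead of `tracing`.

```rust
use std::collections::HashMap;

/// Stand-in for pallas' `Hash<32>`.
type DatumHash = [u8; 32];

/// Hypothetical response shape; the real u5c output type has many more fields.
#[derive(Debug, Default, PartialEq)]
pub struct UtxoResponse {
    pub datum_hash: Option<DatumHash>,
    pub datum_payload: Option<String>,
}

/// Placeholder decoder: UTF-8 instead of PlutusData CBOR (assumption for this sketch).
fn decode_datum(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    std::str::from_utf8(bytes).map(str::to_owned)
}

/// Enriches the response when possible; never fails the request.
pub fn enrich(mut utxo: UtxoResponse, store: &HashMap<DatumHash, Vec<u8>>) -> UtxoResponse {
    // 1. Only outputs that carry a datum hash are candidates.
    let Some(hash) = utxo.datum_hash else {
        return utxo;
    };
    // 2. Look the datum up, 3. decode it, 4. warn and continue on any miss.
    match store.get(&hash) {
        Some(bytes) => match decode_datum(bytes) {
            Ok(payload) => utxo.datum_payload = Some(payload),
            Err(err) => eprintln!("warning: failed to decode datum: {err}"),
        },
        None => eprintln!("warning: datum not found in state"),
    }
    utxo
}
```

In all three outcomes the caller receives a response; only the presence of `datum_payload` differs.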


338-338: LGTM!

Correctly passes self.domain.state() to the updated into_u5c_utxo function.


385-385: LGTM!

Correctly passes self.domain.state() to the updated into_u5c_utxo function in search_utxos.


276-276: LGTM!

More descriptive log messages improve observability by clearly identifying which gRPC endpoint was invoked.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/cardano/src/utxoset.rs (1)

134-141: Duplicate import inside loop and same panic risk.

The OriginalHash import at line 136 is redundant since it's already imported at the function scope in compute_apply_delta (line 57). Additionally, the same unwrap() concern from the apply path applies here.

     for (_, tx) in txs.iter() {
+        use pallas::ledger::traverse::OriginalHash;
         let mut witness_datums_map: HashMap<_, _> = HashMap::new();
         for datum in tx.plutus_data() {
-            use pallas::ledger::traverse::OriginalHash;
             let datum_hash = datum.original_hash();
             let datum_bytes =
                 pallas::codec::minicbor::to_vec(datum.clone().unwrap()).unwrap_or_default();
             witness_datums_map.insert(datum_hash, datum_bytes);
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17795f8 and a616896.

📒 Files selected for processing (6)
  • crates/cardano/src/utxoset.rs (3 hunks)
  • crates/core/src/lib.rs (1 hunks)
  • crates/core/src/state.rs (2 hunks)
  • crates/redb3/src/state/mod.rs (3 hunks)
  • crates/redb3/src/state/utxoset.rs (5 hunks)
  • src/serve/grpc/query.rs (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
crates/core/src/state.rs (2)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
src/serve/grpc/query.rs (4)
crates/core/src/lib.rs (1)
  • state (607-607)
crates/core/src/state.rs (1)
  • get_datum (412-412)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
🪛 GitHub Actions: CI
src/serve/grpc/query.rs

[error] 231-231: cargo clippy failed. Clippy error: the borrowed expression implements the required traits (src/serve/grpc/query.rs:231:54).

🪛 GitHub Check: Check Build
src/serve/grpc/query.rs

[failure] 247-247:
the borrowed expression implements the required traits


[failure] 241-241:
the borrowed expression implements the required traits


[failure] 240-240:
the borrowed expression implements the required traits


[failure] 231-231:
the borrowed expression implements the required traits

🔇 Additional comments (10)
crates/core/src/lib.rs (1)

213-214: LGTM!

The new witness_datums_add and witness_datums_remove fields are appropriately typed for tracking datum witnesses during delta computation. The use of HashMap<Hash<32>, Vec<u8>> for add operations and HashSet<Hash<32>> for removal tracking aligns well with the reference-counting storage pattern implemented in the redb3 layer.

crates/redb3/src/state/mod.rs (2)

134-135: LGTM!

Table initialization for DatumsTable and DatumRefCountTable follows the existing pattern used for UtxosTable and FilterIndexes.


361-367: LGTM!

The get_datum implementation follows the established pattern for read operations in this store, correctly opening a read transaction and delegating to the table-level method.

crates/core/src/state.rs (1)

411-412: LGTM!

The get_datum method signature is consistent with other data retrieval methods in the StateStore trait, returning Result<Option<Vec<u8>>, StateError> to handle both missing data and errors appropriately.

src/serve/grpc/query.rs (1)

204-213: LGTM on the signature and datum handling approach.

The approach of enriching UTXO responses with datum payloads when a datum hash is present is well-designed. The graceful degradation (warning log but continuing) when datum lookup fails is appropriate for non-critical enrichment.

crates/redb3/src/state/utxoset.rs (4)

107-121: LGTM on reference-counted datum storage integration.

The logic correctly implements reference counting: insert datum only on first reference (refcount == 1), remove datum when last reference is gone (refcount == 0). This ensures datums are retained as long as any UTXO references them.


207-224: Defensive handling for underflow is appropriate.

The early return at line 211-213 when current == 0 prevents underflow and handles edge cases gracefully. This could occur if a datum was never indexed (e.g., from pre-upgrade data) and is being consumed.


146-180: LGTM on DatumsTable implementation.

Clean implementation following the existing table patterns. The use of &**datum_hash to dereference Hash<32> to &[u8; 32] is correct for the key type.


544-548: LGTM!

The get_datum method on StateStore provides a convenient public API for datum retrieval, correctly delegating to DatumsTable::get.

crates/cardano/src/utxoset.rs (1)

64-70: The concern about .unwrap() panicking on datum.clone() is based on a misunderstanding of KeepRaw::unwrap(). This method has the signature pub fn unwrap(self) -> T and simply returns the decoded inner value—it does not panic. CBOR decoding errors occur at the time KeepRaw is created (during minicbor::decode), not when calling unwrap(). The existing code is safe; the .unwrap_or_default() on the minicbor::to_vec() already handles encoding failures appropriately.

Likely an incorrect or invalid review comment.
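
The distinction drawn in that comment (decoding can fail, unwrapping cannot) can be illustrated with a toy wrapper. This is explicitly not the pallas type: the struct below only borrows the names `KeepRaw`, `raw_cbor`, and `unwrap` for readability, and `decode` and `decode_single_byte` are hypothetical helpers invented for this sketch.

```rust
/// Toy model of a "decoded value plus original bytes" wrapper. NOT the pallas type.
pub struct KeepRaw<'b, T> {
    raw: &'b [u8],
    inner: T,
}

impl<'b, T> KeepRaw<'b, T> {
    /// Decoding happens at construction time, so this is where failures surface.
    pub fn decode(raw: &'b [u8], decoder: impl Fn(&[u8]) -> Option<T>) -> Option<Self> {
        let inner = decoder(raw)?;
        Some(Self { raw, inner })
    }

    /// The untouched original bytes.
    pub fn raw_cbor(&self) -> &'b [u8] {
        self.raw
    }

    /// Infallible: hands back the value that was already decoded.
    pub fn unwrap(self) -> T {
        self.inner
    }
}

/// Hypothetical decoder that accepts exactly one byte.
pub fn decode_single_byte(bytes: &[u8]) -> Option<u8> {
    match bytes {
        [b] => Some(*b),
        _ => None,
    }
}
```

Under this model, a wrapper that exists has already decoded successfully, and keeping the raw bytes around also allows reusing them directly instead of re-encoding the decoded value.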


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/serve/grpc/query.rs (1)

229-252: Previous clippy issues appear to be resolved.

The hex::encode calls now use the values directly (e.g., hex::encode(datum_hash)) rather than unnecessary borrows, which should satisfy the clippy lint mentioned in past review comments.
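
For context, the clippy message "the borrowed expression implements the required traits" is, to the best of my knowledge, emitted by the `needless_borrows_for_generic_args` lint. The sketch below reproduces the pattern without the `hex` crate: `encode` is a local stand-in with the same `AsRef<[u8]>` bound as `hex::encode`, and `demo` is a hypothetical caller.

```rust
/// Local stand-in for `hex::encode`, which takes any `T: AsRef<[u8]>` by value.
fn encode<T: AsRef<[u8]>>(data: T) -> String {
    data.as_ref().iter().map(|b| format!("{b:02x}")).collect()
}

pub fn demo() -> (String, String) {
    let datum_hash = [0xabu8, 0xcd];
    // Flagged pattern: the borrow is redundant because `[u8; 2]` itself
    // already implements `AsRef<[u8]>`.
    let borrowed = encode(&datum_hash);
    // Preferred form: pass the value (here a `Copy` array) directly.
    let direct = encode(datum_hash);
    (borrowed, direct)
}
```

Both calls produce the same string; the lint is purely about the unnecessary `&`.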

🧹 Nitpick comments (2)
crates/cardano/src/utxoset.rs (1)

134-141: Duplicate datum extraction logic - consider refactoring to a helper function.

This block duplicates the datum extraction logic from compute_apply_delta (lines 64-70). Consider extracting to a helper function for maintainability.

/// Collects a transaction's witness-set datums, keyed by their original hash.
fn extract_witness_datums(tx: &MultiEraTx) -> HashMap<Hash<32>, Vec<u8>> {
    use pallas::ledger::traverse::OriginalHash;
    // Assumes plutus_data() returns a slice, hence the explicit iter().
    tx.plutus_data()
        .iter()
        .map(|datum| {
            let datum_hash = datum.original_hash();
            let datum_bytes = pallas::codec::minicbor::to_vec(datum.clone().unwrap())
                .unwrap_or_default();
            (datum_hash, datum_bytes)
        })
        .collect()
}
crates/redb3/src/state/utxoset.rs (1)

207-224: Consider logging when decrementing a non-existent datum.

The decrement function silently returns 0 when the datum doesn't exist in the refcount table (line 211-213). While this defensive behavior prevents panics, it could mask bugs where datums are being removed without having been properly added.

Consider adding a debug/trace log when this edge case occurs:

         if current == 0 {
+            // This shouldn't happen in normal operation - could indicate a bug
+            tracing::debug!(datum_hash = %hex::encode(&**datum_hash), "Attempted to decrement non-existent datum refcount");
             return Ok(0);
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a616896 and 8334b9b.

📒 Files selected for processing (6)
  • crates/cardano/src/utxoset.rs (3 hunks)
  • crates/core/src/lib.rs (1 hunks)
  • crates/core/src/state.rs (2 hunks)
  • crates/redb3/src/state/mod.rs (3 hunks)
  • crates/redb3/src/state/utxoset.rs (5 hunks)
  • src/serve/grpc/query.rs (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/redb3/src/state/mod.rs
🧰 Additional context used
🧬 Code graph analysis (3)
crates/core/src/state.rs (2)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
src/serve/grpc/query.rs (2)
crates/core/src/state.rs (1)
  • get_datum (412-412)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
crates/redb3/src/state/utxoset.rs (1)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test (ubuntu-latest)
  • GitHub Check: Test (macos-14)
  • GitHub Check: Test (macos-13)
  • GitHub Check: Test (windows-latest)
🔇 Additional comments (13)
crates/core/src/lib.rs (1)

213-214: LGTM! Well-designed datum tracking fields.

The new witness_datums_add and witness_datums_remove fields are appropriately typed for tracking datum lifecycle. Using HashMap for additions (to store both hash and value) and HashSet for removals (only hash needed) is the right design choice.

crates/core/src/state.rs (1)

411-412: LGTM! Consistent API extension for datum retrieval.

The new get_datum method follows the established pattern of other lookup methods in the StateStore trait, with appropriate return type for optional datum bytes.

crates/cardano/src/utxoset.rs (3)

75-81: Datum is only added if found in witness set - verify this is intentional.

When a produced output has a DatumOption::Hash, the code only adds the datum to witness_datums_add if it exists in the transaction's witness set. This means inline datums or outputs referencing datums from other transactions won't be tracked.

This appears intentional based on the PR description (tracking "witness-set datum values"), but worth confirming this covers all required use cases.


97-104: Consumed outputs mark datums for removal regardless of witness availability.

When consuming a UTXO with a datum hash, the datum is marked for removal in witness_datums_remove. This is correct as the ref-count logic in the storage layer (redb3) will handle the actual removal only when the count reaches zero.


152-162: Undo logic correctly restores datums from witness set when recovering consumed outputs.

The logic properly attempts to restore datum values when recovering previously consumed outputs during a rollback. The asymmetry with the apply path (where removal happens unconditionally) is correct because the ref-counting in storage handles the actual deletion.
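
The apply/undo asymmetry described in these comments can be sketched with a flattened, hypothetical transaction model. `Tx`, `DatumDelta`, `apply_delta`, and `undo_delta` are illustrative names that do not appear in the PR; the real code works on pallas multi-era transactions and the full `UtxoSetDelta`.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for pallas' `Hash<32>`.
type DatumHash = [u8; 32];

/// Datum portion of a UTXO-set delta.
#[derive(Default)]
pub struct DatumDelta {
    pub add: HashMap<DatumHash, Vec<u8>>,
    pub remove: HashSet<DatumHash>,
}

/// Hypothetical flattened transaction: witness datums plus the datum hash
/// (if any) of each produced and consumed output.
pub struct Tx {
    pub witness_datums: HashMap<DatumHash, Vec<u8>>,
    pub produced: Vec<Option<DatumHash>>,
    pub consumed: Vec<Option<DatumHash>>,
}

/// Roll forward: add datums for produced outputs only when the witness set
/// carries the bytes; mark consumed outputs' datums for removal unconditionally.
pub fn apply_delta(tx: &Tx) -> DatumDelta {
    let mut delta = DatumDelta::default();
    for hash in tx.produced.iter().flatten() {
        if let Some(bytes) = tx.witness_datums.get(hash) {
            delta.add.insert(*hash, bytes.clone());
        }
    }
    for hash in tx.consumed.iter().flatten() {
        delta.remove.insert(*hash);
    }
    delta
}

/// Roll back: the mirror image. Produced outputs disappear, so their datums are
/// marked for removal; recovered outputs re-add datums when bytes are available.
pub fn undo_delta(tx: &Tx) -> DatumDelta {
    let mut delta = DatumDelta::default();
    for hash in tx.produced.iter().flatten() {
        delta.remove.insert(*hash);
    }
    for hash in tx.consumed.iter().flatten() {
        if let Some(bytes) = tx.witness_datums.get(hash) {
            delta.add.insert(*hash, bytes.clone());
        }
    }
    delta
}
```

Removals only need the hash because the storage layer decides, via the ref-count, whether the bytes are actually deleted; additions need the bytes, so they are skipped when the witness set does not contain them.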

src/serve/grpc/query.rs (2)

204-209: Function signature changes look good.

The addition of state: &S::State parameter enables datum lookup from storage. The broadened return type Box<dyn std::error::Error> accommodates multiple error types from the new datum handling logic.


216-253: Datum enrichment logic is well-implemented with appropriate error handling.

The code properly:

  1. Only attempts datum fetch for DatumOption::Hash variants (not inline datums)
  2. Gracefully handles missing datums and decode errors with warnings
  3. Enriches the response with both hash and decoded payload when available

The warning logs provide good observability for debugging datum availability issues without failing the request.

crates/redb3/src/state/utxoset.rs (6)

21-22: LGTM! Clear type aliases for datum storage.

The DatumKey and DatumValue type aliases provide clarity and consistency with the existing UtxosKey/UtxosValue pattern.


107-121: Ref-counting logic is correctly integrated into the apply flow.

The implementation properly:

  1. Increments refcount on datum add, only inserting to DatumsTable on first reference
  2. Decrements refcount on datum remove, only deleting from DatumsTable when count reaches zero

This correctly handles the case where multiple UTXOs may reference the same datum hash.


146-181: DatumsTable implementation is clean and follows existing patterns.

The table provides standard CRUD operations with appropriate use of redb transactions. The implementation mirrors the patterns used by other tables in this file.


183-205: DatumRefCountTable correctly implements reference counting.

The increment function properly:

  1. Retrieves current count (defaulting to 0)
  2. Increments and persists the new value
  3. Returns the new count for caller decision-making

530-540: Stats reporting correctly includes new datum tables.

The utxoset_stats method now includes datums and datum_refcount table statistics, providing visibility into datum storage.


544-548: Public datum retrieval API is correctly implemented.

The get_datum method properly delegates to DatumsTable::get and follows the pattern of other public accessors in StateStore.

@AndrewWestberg AndrewWestberg force-pushed the amw/witness_datum branch 2 times, most recently from d971c87 to a9290fb Compare December 2, 2025 14:39
@scarmuega scarmuega merged commit 6d78f01 into txpipe:main Jan 16, 2026
8 of 9 checks passed