
Conversation

@yannham (Member) commented Dec 4, 2025

This PR fixes a useless indirection when storing a closure/thunk as a value (or, before that, as a term). We used to handle this case by storing a `Thunk`, which is itself of the form `Rc<RefCell<ThunkData>>`. However, value blocks are already reference-counted, so it doesn't make much sense to store one rc-ed pointer inside another rc-ed pointer for no obvious benefit.

This PR implements the usual layout for thunks, which is to store the mutable thunk data directly in the value block, removing one indirection and one allocation.

There is the question of how to define `Thunk`, now that it's just one specific case of a `NickelValue` or a `ValueBlockRc`. I started by performing operations on `value::ThunkData := RefCell<lazy::ThunkData>` directly, but some operations require not only the content but also the original `Rc` pointer (typically for cloning or for testing physical equality). In the end, I made `Thunk` a smart constructor/transparent wrapper around `NickelValue`, which encodes at the type level that the value is known to be a block of type thunk.
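
To make the change concrete, here is a rough sketch of the old and new layouts in Rust. The type definitions are simplified stand-ins and not the actual Nickel internals; only the general shape (double `Rc` before, inlined `RefCell` plus a transparent wrapper after) is taken from the description above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Placeholder for the mutable state of a suspended computation.
struct ThunkData {}

// Old layout (simplified): a thunk stored in a value block added a second
// reference-counted allocation and an extra indirection.
struct OldThunk(Rc<RefCell<ThunkData>>);

// New layout (simplified): the mutable thunk data lives directly inside the
// reference-counted value block...
struct NickelValue(Rc<ValueBlock>);

enum ValueBlock {
    Number(f64),
    // ... other kinds of values elided ...
    Thunk(RefCell<ThunkData>),
}

// ...and `Thunk` is a transparent wrapper which records, at the type level,
// that the wrapped value is known to be a thunk block.
#[repr(transparent)]
struct Thunk(NickelValue);

impl Thunk {
    /// Smart constructor: wraps a value only if it actually is a thunk block.
    fn try_from_value(value: NickelValue) -> Option<Thunk> {
        match &*value.0 {
            ValueBlock::Thunk(_) => Some(Thunk(value)),
            _ => None,
        }
    }
}
```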

/// converse isn't true: two thunks can be equal as expressions but have different UIDs.
///
/// In practice, the UID is currently the underlying `Rc` pointer value.
pub fn uid(&self) -> usize {
@yannham (Member, Author)

`uid()` didn't seem to be used anymore.
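
For reference, with the old `Rc<RefCell<ThunkData>>` representation such a UID boils down to the pointer address of the shared allocation. A minimal sketch (not the exact removed code):

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct ThunkData {}
struct Thunk(Rc<RefCell<ThunkData>>);

impl Thunk {
    /// Returns an identifier that is equal for two thunks iff they share the
    /// same underlying allocation (physical equality).
    pub fn uid(&self) -> usize {
        Rc::as_ptr(&self.0) as usize
    }
}
```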

@github-actions (bot) commented Dec 4, 2025

🐰 Bencher Report

Branch: perf/thunks-inline
Testbed: ubuntu-latest

| Benchmark | Latency (µs) |
| --- | --- |
| diagnostics-benches/inputs/goto-perf.ncl | 10,245.00 |
| diagnostics-benches/inputs/large-record-tree.ncl | 191,570.00 |
| diagnostics-benches/inputs/redis-replication-controller.ncl | 306.61 |
| diagnostics-benches/inputs/small-record-tree.ncl | 426.88 |
| fibonacci 10 | 228.37 |
| foldl arrays 50 | 389.59 |
| foldl arrays 500 | 2,859.00 |
| foldr strings 50 | 3,296.70 |
| foldr strings 500 | 30,834.00 |
| generate normal 250 | 23,384.00 |
| generate normal 50 | 1,222.00 |
| generate normal unchecked 1000 | 1,759.70 |
| generate normal unchecked 200 | 362.90 |
| init-diagnostics-benches/inputs/goto-perf.ncl | 55,665.00 |
| init-diagnostics-benches/inputs/large-record-tree.ncl | 208,840.00 |
| init-diagnostics-benches/inputs/redis-replication-controller.ncl | 48,898.00 |
| init-diagnostics-benches/inputs/small-record-tree.ncl | 48,854.00 |
| pidigits 100 | 1,946.00 |
| pipe normal 20 | 608.73 |
| pipe normal 200 | 4,559.70 |
| product 30 | 274.17 |
| requests-benches/inputs/goto-perf.ncl-000 | 3,110.60 |
| requests-benches/inputs/large-record-tree.ncl-000 | 595,160.00 |
| requests-benches/inputs/large-record-tree.ncl-001 | 87.98 |
| scalar 10 | 517.81 |
| sum 30 | 274.82 |

🐰 View full continuous benchmarking report in Bencher

@yannham (Member, Author) commented Dec 4, 2025

This saves a noticeable chunk of memory, but is around 2% slower on the OPL benchmark. I'm investigating a bit. I can see with cachegrind that the instruction cache has different stats (otherwise there seem to be slightly fewer branches, slightly fewer mispredictions, and slightly fewer data accesses/data cache misses), so maybe it's just noise from the changed layout of the binary 🤷‍♂️ Another possibility is that the improvement is cancelled out by our implementation of `Rc` being less optimal than the stdlib's, as clone/drop for `NickelValue` is probably stressed a lot.

@yannham (Member, Author) commented Dec 5, 2025

OK, after a round of optimizing the reference-counting part of `NickelValue` a bit (mostly copying the stdlib), I got the 2% back, and even gained 2% more. I'll push that optimization in a follow-up PR, but let's move forward with this one independently first.
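
For illustration, "mostly copying the stdlib" for the reference counting could look roughly like the sketch below, assuming a header-based value block layout. The names (`Header`, `inc`, `dec`) are hypothetical and not the actual `NickelValue` internals.

```rust
use std::cell::Cell;
use std::process::abort;

// Hypothetical header stored at the start of a value block.
struct Header {
    ref_count: Cell<usize>,
}

impl Header {
    /// Called on clone: bump the count, keeping the hot path to a single
    /// load/add/store. As in the stdlib `Rc`, an (unrealistic) overflow
    /// aborts instead of going through a recoverable error path.
    #[inline]
    fn inc(&self) {
        let count = self.ref_count.get();
        if count == usize::MAX {
            abort();
        }
        self.ref_count.set(count + 1);
    }

    /// Called on drop: decrement the count and tell the caller whether this
    /// was the last reference, i.e. whether the block must be deallocated.
    #[inline]
    fn dec(&self) -> bool {
        let count = self.ref_count.get() - 1;
        self.ref_count.set(count);
        count == 0
    }
}
```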

@yannham yannham marked this pull request as ready for review December 5, 2025 16:19
@yannham yannham requested a review from jneem December 5, 2025 16:19
