
Conversation

@beeme1mr (Member) commented Sep 10, 2025

This PR

  • extends the fractional operator to support distributions as fine-grained as 0.001%.
  • defines the expected behavior of several edge cases.

Notes

Addresses a limitation of the current fractional operator that prevents sub-percent traffic allocations. In high-throughput services, 1% of traffic may represent a significant number of requests.

Related issues

#1788

@beeme1mr beeme1mr requested review from a team as code owners September 10, 2025 16:52
netlify bot commented Sep 10, 2025

Deploy Preview for polite-licorice-3db33c ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 8ac7bb3 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/polite-licorice-3db33c/deploys/68c1e9f653a1c100086a43ec |
| 😎 Deploy Preview | https://deploy-preview-1800--polite-licorice-3db33c.netlify.app |

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 10, 2025
gemini-code-assist bot left a comment

Summary of Changes

Hello @beeme1mr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the fractional traffic allocation mechanism. It aims to provide much finer-grained control over traffic distribution, enabling precise sub-percent allocations critical for high-throughput environments, A/B testing, and canary deployments. The changes are designed to be backward-compatible with the existing API while improving reliability through robust error handling and validation.

Highlights

  • Enhanced Fractional Operator Precision: The fractional operator now supports traffic allocation down to 0.001% granularity, achieved by increasing the internal bucket count from 100 to 100,000.
  • API Compatibility Maintained: The existing weight-based API remains unchanged, ensuring backwards compatibility for current configurations.
  • Robust Edge Case Handling: New logic addresses various edge cases, including minimum allocation guarantees for variants with positive weights, management of excess buckets, and validation for weight sum overflows.
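
(For scale: at this granularity a 0.001% allocation maps to exactly 1 of the 100,000 buckets, whereas the previous 100-bucket scheme bottomed out at 1%, i.e. 1 bucket.)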

gemini-code-assist bot left a comment

Code Review

This is a well-written and thorough ADR that clearly outlines the proposal to enhance the fractional operator. The move to a 100,000-bucket system with a minimum allocation guarantee is a great improvement for fine-grained traffic control. My review includes a few suggestions to address potential issues with implementation details, particularly around ensuring deterministic behavior and handling all allocation scenarios correctly. These points focus on preventing bucket deficits and ensuring cross-language consistency in sorting and arithmetic.

```go
	return nil
}

func calculateBucketAllocations(variants []fractionalEvaluationVariant, totalWeight int) []bucketAllocation {
```
gemini-code-assist bot commented (severity: high):

The calculateBucketAllocations function handles the case where the sum of allocated buckets exceeds bucketCount (excess > 0). However, it doesn't account for the opposite scenario: when the total allocated buckets is less than bucketCount due to rounding down during the proportional calculation (int(...)). If a deficit of buckets occurs, some hash values in distributeValue will not fall into any variant's bucket range, causing the function to incorrectly return an empty string. The ADR should specify how to handle bucket deficits, for instance, by distributing the remaining buckets among the variants (e.g., starting with the largest ones) to ensure the total is exactly bucketCount.
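
For illustration, a minimal sketch of the deficit handling this suggests; the `bucketAllocation` shape and the name-based tiebreak are assumptions, not the PR's actual implementation:

```go
package fractional

import "sort"

// Assumed shape; the real struct in the PR may differ.
type bucketAllocation struct {
	variant string
	buckets int
}

// fillDeficit tops allocations up so they sum to exactly bucketCount,
// handing leftover buckets to the largest allocations first.
// Assumes at least one variant is present.
func fillDeficit(allocations []bucketAllocation, bucketCount int) {
	total := 0
	for _, a := range allocations {
		total += a.buckets
	}
	// Sort descending by bucket count, breaking ties by variant name so
	// every language implementation orders the variants identically.
	sort.Slice(allocations, func(i, j int) bool {
		if allocations[i].buckets != allocations[j].buckets {
			return allocations[i].buckets > allocations[j].buckets
		}
		return allocations[i].variant < allocations[j].variant
	})
	for i := 0; total < bucketCount; i = (i + 1) % len(allocations) {
		allocations[i].buckets++
		total++
	}
}
```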

@beeme1mr (Member, Author) commented:

Properly supporting guaranteed bucketing may add more complexity than I'd like. I'm sure I can address this issue, but I'd like feedback on whether it's worth supporting at all.

The reason I added this is to avoid configurations like ["variant-a", 1], ["variant-b", 1000000] resulting in zero evaluations returning variant-a. You could argue it's a misconfiguration, but it may not be obvious to the user.
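
Concretely: with 100,000 buckets, variant-a's proportional share is ⌊100,000 × 1 / 1,000,001⌋ = ⌊0.0999…⌋ = 0 buckets, so without a minimum-allocation guarantee it would never be served.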

Comment on lines +87 to +88
When minimum allocations cause the total to exceed 100,000 buckets, excess buckets are removed from the variant with the largest allocation.
This approach:
gemini-code-assist bot commented (severity: medium):

The description of how excess buckets are handled is slightly inconsistent. This section states that excess buckets are removed from 'the variant with the largest allocation' (singular), while the 'Edge Case Handling' section on line 127 refers to it as 'Excess distributed fairly among largest variants' (plural). The code example shows a sequential removal process. For clarity and consistency, I suggest refining the description to accurately reflect the implementation, for example: 'Excess buckets are removed sequentially from variants with the largest allocations, starting with the largest, until the total bucket count is exactly 100,000.'
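
To make the suggested wording concrete, a sketch of sequential removal, reusing the illustrative `bucketAllocation` shape from the earlier sketch:

```go
// trimExcess removes buckets one at a time from the current largest
// allocation until the total is exactly bucketCount, matching the
// "removed sequentially, starting with the largest" wording.
// A real implementation would also respect each variant's minimum.
func trimExcess(allocations []bucketAllocation, bucketCount int) {
	total := 0
	for _, a := range allocations {
		total += a.buckets
	}
	for total > bucketCount {
		largest := 0
		for i := range allocations {
			if allocations[i].buckets > allocations[largest].buckets {
				largest = i
			}
		}
		allocations[largest].buckets--
		total--
	}
}
```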

- Support weight values up to a reasonable maximum that works across multiple languages
- Maintain current performance characteristics
- Prevent users from being moved between buckets when only distribution percentages change
- Guarantee that any variant with weight > 0 receives some traffic allocation
@beeme1mr (Member, Author) commented:

I went back and forth on this. It isn't necessary if the flag is configured properly, but I'm afraid it wouldn't be obvious that there's a misconfiguration. This basically prevents 0% distribution if a weight is defined.

@toddbaert (Member) commented Oct 20, 2025:

I also feel conflicted about this, if we were to go forward with a strictly defined max bucket size.

TBH I'm not sure the special handling is worth the possible user confusion in this extreme case.

The other obvious solution is to add a warning at evaluation time (we do similar things for other rules, like invalid semver params)

beeme1mr and others added 2 commits September 10, 2025 17:13
…g.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Michael Beemer <[email protected]>
…g.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Michael Beemer <[email protected]>
Comment on lines +47 to +49
- **Option 1: 10,000 buckets (0.01% precision)** - 1 in every 10,000 users, better but still not sufficient for many high-throughput use cases
- **Option 2: 100,000 buckets (0.001% precision)** - 1 in every 100,000 users, meets most high-precision needs
- **Option 3: 1,000,000 buckets (0.0001% precision)** - 1 in every 1,000,000 users, likely overkill and could impact performance
@toddbaert (Member) commented Oct 20, 2025:

Is there any obvious reason that we don't want the max bucket amount to be the sum of all bucket ratios?

i.e., ["variant-a", 1], ["variant-b", 1000000] results in 1000001 buckets?

This is:

  • backwards compatible
  • supports infinite precision (though we'd still have to do the max integer check you mentioned)
  • allows us to sidestep the "Minimum Allocation Guarantee" (what you are describing here) which I think is somewhat surprising behavior to end users and feels somewhat arbitrary

@beeme1mr (Member, Author) commented:

We must have a static bucket size, or we lose determinism.

Here's an example showing a static bucket size:

[image: diagram of bucketing with a static bucket count]

And here's one showing a dynamic bucket size:

[image: diagram of bucketing with a dynamic bucket count]

The important part is that the bucket used must always remain the same, regardless of the configuration.
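
To show why, a sketch (not flagd's actual code; the murmur3 hashing over a targeting key is an assumption):

```go
package fractional

import "github.com/twmb/murmur3"

// bucketCount is static: the modulus never changes, so a given key
// always lands in the same bucket. Changing weights only changes which
// variant owns each bucket range, not where a key hashes.
const bucketCount = 100000

func bucketFor(key string) uint32 {
	return murmur3.Sum32([]byte(key)) % bucketCount
}
```

With a dynamic bucket count, the modulus itself would change with the configuration, so the same key could land in a different bucket even when its own variant's proportion is untouched.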

@beeme1mr (Member, Author) commented:

I need to think about this a bit more. I don't think my second image is accurate because, despite the bucket sizes changing, the location across the distribution should be consistent. I'll need to run a few tests to see which approach is better.

@cupofcat (Contributor) commented Oct 29, 2025:

If I understand the challenge here properly (and I am not 100% sure, so please correct me), we want to ensure that, for a given value "X", it's always distributed into the same bucket no matter how we express the bucket proportions. So [0.3, 0.5, 0.2], [3, 5, 2], [9, 15, 6], etc. should all work the same, across all platforms and all languages.

Additionally, we would like to ensure that very skewed distributions (e.g. [0.1, 1000000]) don't end up simplifying some buckets to 0.

Lastly, we need to be cautious about floating point arithmetic.

Overall, this is a pretty complex problem :D

To meet all these requirements, I think we need to implement some sort of integer-based bucket normalization to get a canonical representation of the bucket proportions (e.g. the examples above would all normalize to [3, 5, 2]). To achieve that we might need the following:

  1. Use exact-precision libraries (e.g. Decimal) when parsing the buckets rather than built-in float types (not sure if that exists for all our languages).
  2. If non-integer buckets are present, scale everything to integers.
  3. Calculate the greatest common divisor and normalize using that.
  4. Use some advanced algorithm to translate the hash into the correct range (using a simple mod operation might skew the distribution).
  5. Do the scaling, normalization, and subsequent bucket calculations using bigints rather than standard int32/int64 types (because of overflows).

The challenge with that approach, though, is that the sum of all the buckets can now be larger than the maximum hash. There are ways around this (a different approach to "hashing", or downscaling the buckets to fit in an int), or we can just say that such cases constitute invalid inputs (I think saying that the minimum valid resolution is 0.001% essentially guarantees that?).
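
A sketch of step 3 from the list above (illustrative names; steps 1, 2, and 5 would wrap this in exact-precision parsing and big-integer arithmetic):

```go
package fractional

// gcd returns the greatest common divisor via the Euclidean algorithm.
func gcd(a, b int64) int64 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// normalize reduces integer weights to their canonical form, so
// [9, 15, 6] and [3, 5, 2] both normalize to [3, 5, 2].
func normalize(weights []int64) []int64 {
	g := int64(0)
	for _, w := range weights {
		g = gcd(g, w)
	}
	if g == 0 {
		return weights // all-zero weights: nothing to normalize
	}
	out := make([]int64, len(weights))
	for i, w := range weights {
		out[i] = w / g
	}
	return out
}
```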

@toddbaert (Member) commented Nov 26, 2025:

Ya I think you understand the challenge.

I spoke with @beeme1mr a bit about this, and we suspect my proposal here might solve most of our concerns. I don't think we "lose determinism" in any substantially different way than alternatives.

We can lean on JSON schema to mark any non-integer inputs as invalid - we don't really need to support decimals here... as long as we give users the ability to describe relative weights (this can be done with ints, obviously) we can sidestep that mess, IMO.

Overflows might still be a concern for high numbers, but we can also specify and document limits to the total weight and error if we exceed that - that might be better than calculating a GCD, which adds a performance cost that's not going to be necessary in most cases (I'm quite confident most people will just use configs with weights like 1/1/1 or 50/50 etc).
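
A minimal sketch of that proposal as described (integer weights only, bucket space equal to the weight sum, with a documented cap; the names are illustrative, not flagd's actual API):

```go
package fractional

import "fmt"

type weightedVariant struct {
	name   string
	weight int64
}

const maxTotalWeight int64 = 2147483647 // the proposed cross-language cap

// pickVariant maps a hash into a bucket space whose size is the sum of
// the weights, then walks the variants' cumulative ranges.
func pickVariant(variants []weightedVariant, hash uint64) (string, error) {
	var total int64
	for _, v := range variants {
		total += v.weight
		if total > maxTotalWeight {
			return "", fmt.Errorf("total weight exceeds %d", maxTotalWeight)
		}
	}
	if total <= 0 {
		return "", fmt.Errorf("total weight must be positive")
	}
	bucket := int64(hash % uint64(total))
	for _, v := range variants {
		if bucket < v.weight {
			return v.name, nil
		}
		bucket -= v.weight
	}
	return "", fmt.Errorf("no variant matched") // unreachable for valid input
}
```

Note that the modulus now depends on the weight values themselves, which is exactly what the follow-up question about [3, 5, 2] versus [9, 15, 6] probes.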

A member commented:

I'm going to do a little PoC for this.

A contributor commented:

Even if we get rid of floats, in your proposal, how would we deal with same proportions but different values, e.g. [3, 5, 2], [9, 15, 6]?
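
A concrete illustration of the concern, using an arbitrary hash value of 7: under [3, 5, 2] the bucket space is 10, and 7 % 10 = 7 falls in the second variant's range [3, 8); under [9, 15, 6] the space is 30, and 7 % 30 = 7 falls in the first variant's range [0, 9). The same proportions route the same hash to different variants.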


### Maximum Weight Considerations

To ensure cross-language compatibility, we establish a maximum total weight sum equal to the maximum 32-bit signed integer value (2,147,483,647). This limit:
A member commented:

I think this is a good limit, regardless of this.
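
For reference, enforcing that cap at parse time is cheap; a sketch assuming int64 accumulation (not flagd's actual code):

```go
package fractional

import (
	"errors"
	"math"
)

// validateWeights rejects negative weights and any total above the
// 32-bit signed integer maximum, so every language runtime can hold
// the sum safely. The overflow check runs before each addition.
func validateWeights(weights []int64) error {
	var sum int64
	for _, w := range weights {
		if w < 0 {
			return errors.New("weights must be non-negative")
		}
		if w > math.MaxInt32-sum {
			return errors.New("total weight exceeds 2,147,483,647")
		}
		sum += w
	}
	return nil
}
```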

@toddbaert (Member) left a comment:

I'm open to this solution, but I would like to understand why you think this is inferior, as it seems like an obvious choice (but maybe I'm missing something). If I am, can we record why we wouldn't be interested in that approach?

