docs(ADR): extends the fractional operator to support up to .001% distributions #1800
Conversation
…allocations Signed-off-by: Michael Beemer <[email protected]>
Summary of Changes
Hello @beeme1mr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant enhancement to the fractional traffic allocation mechanism. It aims to provide much finer-grained control over traffic distribution, enabling precise sub-percent allocations critical for high-throughput environments, A/B testing, and canary deployments. The changes are designed to be backward-compatible with the existing API while improving reliability through robust error handling and validation.
Highlights
- Enhanced Fractional Operator Precision: The fractional operator now supports traffic allocation down to 0.001% granularity, achieved by increasing the internal bucket count from 100 to 100,000.
- API Compatibility Maintained: The existing weight-based API remains unchanged, ensuring backwards compatibility for current configurations.
- Robust Edge Case Handling: New logic addresses various edge cases, including minimum allocation guarantees for variants with positive weights, management of excess buckets, and validation for weight sum overflows.
Code Review
This is a well-written and thorough ADR that clearly outlines the proposal to enhance the fractional operator. The move to a 100,000-bucket system with a minimum allocation guarantee is a great improvement for fine-grained traffic control. My review includes a few suggestions to address potential issues with implementation details, particularly around ensuring deterministic behavior and handling all allocation scenarios correctly. These points focus on preventing bucket deficits and ensuring cross-language consistency in sorting and arithmetic.
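For reference, here is a minimal Go sketch of the allocation step under review. The `calculateBucketAllocations` signature matches the diff excerpt just below, but the struct fields, the placement of the 100,000-bucket constant, and the truncating division are assumptions for illustration rather than the ADR's final implementation.

```go
const bucketCount = 100000 // 100,000 buckets => 0.001% granularity

type fractionalEvaluationVariant struct {
	variant string
	weight  int
}

type bucketAllocation struct {
	variant string
	buckets int
}

// calculateBucketAllocations splits bucketCount buckets proportionally to
// each variant's weight, guaranteeing at least one bucket for any variant
// with a positive weight.
func calculateBucketAllocations(variants []fractionalEvaluationVariant, totalWeight int) []bucketAllocation {
	allocations := make([]bucketAllocation, 0, len(variants))
	for _, v := range variants {
		buckets := int(int64(v.weight) * bucketCount / int64(totalWeight)) // truncates toward zero
		if v.weight > 0 && buckets == 0 {
			buckets = 1 // minimum allocation guarantee
		}
		allocations = append(allocations, bucketAllocation{variant: v.variant, buckets: buckets})
	}
	// Correction for totals above or below bucketCount would follow here.
	return allocations
}
```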
| return nil |
| } |
|
| func calculateBucketAllocations(variants []fractionalEvaluationVariant, totalWeight int) []bucketAllocation { |
The calculateBucketAllocations function handles the case where the sum of allocated buckets exceeds bucketCount (excess > 0). However, it doesn't account for the opposite scenario: when the total allocated buckets is less than bucketCount due to rounding down during the proportional calculation (int(...)). If a deficit of buckets occurs, some hash values in distributeValue will not fall into any variant's bucket range, causing the function to incorrectly return an empty string. The ADR should specify how to handle bucket deficits, for instance, by distributing the remaining buckets among the variants (e.g., starting with the largest ones) to ensure the total is exactly bucketCount.
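One hypothetical way to close such a deficit, reusing the names from the sketch above; the sort order and tie-break are illustrative, not something the ADR currently specifies.

```go
import "sort"

// distributeDeficit hands out buckets lost to truncation, one at a time,
// starting with the variants holding the largest allocations, so the total
// ends up at exactly bucketCount.
func distributeDeficit(allocations []bucketAllocation) {
	total := 0
	for _, a := range allocations {
		total += a.buckets
	}
	deficit := bucketCount - total
	if deficit <= 0 || len(allocations) == 0 {
		return
	}
	// Deterministic order: largest allocation first, ties broken by name so
	// every language implementation assigns the leftover buckets identically.
	order := make([]int, len(allocations))
	for i := range order {
		order[i] = i
	}
	sort.Slice(order, func(i, j int) bool {
		a, b := allocations[order[i]], allocations[order[j]]
		if a.buckets != b.buckets {
			return a.buckets > b.buckets
		}
		return a.variant < b.variant
	})
	for i := 0; deficit > 0; i = (i + 1) % len(order) {
		allocations[order[i]].buckets++
		deficit--
	}
}
```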
Properly supporting guaranteed bucketing may add more complexity than I'd like. I'm sure I can address this issue, but I'd like feedback on whether it's worth supporting this at all.
The reason I added this is to avoid configurations like ["variant-a", 1], ["variant-b", 1000000], resulting in zero evaluations returning variant-a. You could argue it's a misconfiguration, but it may not be obvious to the user.
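For example, with weights of 1 and 1,000,000, variant-a's proportional share of 100,000 buckets is 100,000 × 1 / 1,000,001 ≈ 0.1 buckets, which truncates to zero, so without a minimum allocation guarantee variant-a would never be returned despite having a non-zero weight.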
| When minimum allocations cause the total to exceed 100,000 buckets, excess buckets are removed from the variant with the largest allocation. |
| This approach: |
The description of how excess buckets are handled is slightly inconsistent. This section states that excess buckets are removed from 'the variant with the largest allocation' (singular), while the 'Edge Case Handling' section on line 127 refers to it as 'Excess distributed fairly among largest variants' (plural). The code example shows a sequential removal process. For clarity and consistency, I suggest refining the description to accurately reflect the implementation, for example: 'Excess buckets are removed sequentially from variants with the largest allocations, starting with the largest, until the total bucket count is exactly 100,000.'
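A rough sketch of that sequential removal, again reusing the names from the earlier sketch; the exact policy is still being discussed in this thread.

```go
// removeExcess trims the overshoot introduced by the minimum allocation
// guarantee, taking one bucket at a time from whichever variant currently
// holds the most, until the total is exactly bucketCount.
func removeExcess(allocations []bucketAllocation) {
	total := 0
	for _, a := range allocations {
		total += a.buckets
	}
	for excess := total - bucketCount; excess > 0; excess-- {
		largest := 0
		for i, a := range allocations {
			if a.buckets > allocations[largest].buckets {
				largest = i
			}
		}
		allocations[largest].buckets--
	}
}
```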
docs/architecture-decisions/high-precision-fractional-bucketing.md
| - Support weight values up to a reasonable maximum that works across multiple languages |
| - Maintain current performance characteristics |
| - Prevent users from being moved between buckets when only distribution percentages change |
| - Guarantee that any variant with weight > 0 receives some traffic allocation |
I went back and forth on this. It isn't necessary if the flag is configured properly, but I'm afraid it wouldn't be obvious that there's a misconfiguration. This basically prevents a 0% distribution when a weight is defined.
I also feel conflicted about this, if we were to go forward with a strictly defined max bucket size.
TBH I'm not sure the special handling is worth the possible user confusion in this extreme case.
The other obvious solution is to add a warning at evaluation time (we do similar things for other rules, like invalid semver params).
…g.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Michael Beemer <[email protected]>
…g.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Michael Beemer <[email protected]>
| - **Option 1: 10,000 buckets (0.01% precision)** - 1 in every 10,000 users, better but still not sufficient for many high-throughput use cases |
| - **Option 2: 100,000 buckets (0.001% precision)** - 1 in every 100,000 users, meets most high-precision needs |
| - **Option 3: 1,000,000 buckets (0.0001% precision)** - 1 in every 1,000,000 users, likely overkill and could impact performance |
Is there any obvious reason that we don't want the max bucket amount to be the sum of all bucket ratios?
i.e., ["variant-a", 1], ["variant-b", 1000000] results in 1000001 buckets?
This approach (sketched just after this list):
- is backwards compatible
- supports infinite precision (though we'd still have to do the max integer check you mentioned)
- allows us to sidestep the "Minimum Allocation Guarantee" (what you are describing here), which I think is somewhat surprising behavior to end users and feels somewhat arbitrary
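For illustration, a minimal sketch of that idea; the hash type, the `distributeValueByWeightSum` name, and the field names are assumptions, and the existing implementation's `distributeValue` would differ.

```go
// distributeValueByWeightSum maps a hash of the targeting key into a bucket
// space whose size is simply the sum of all weights, so [1, 1000000] yields
// 1,000,001 buckets and no minimum-allocation special case is needed.
// Note: a plain modulo can slightly skew the distribution for large totals,
// a concern raised later in this thread.
func distributeValueByWeightSum(hash uint32, variants []fractionalEvaluationVariant, totalWeight int) string {
	bucket := int(hash % uint32(totalWeight))
	cumulative := 0
	for _, v := range variants {
		cumulative += v.weight
		if bucket < cumulative {
			return v.variant
		}
	}
	return "" // unreachable when totalWeight equals the sum of the weights
}
```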
I need to think about this a bit more. I don't think my second image is accurate because, despite the bucket sizes changing, the location across the distribution should be consistent. I'll need to run a few tests to see which approach is better.
If I understand the challenge here properly (and I am not 100% sure, so please correct me), we want to ensure that, for a given value "X", it's always distributed into the same bucket no matter how we express the bucket proportions. So [0.3, 0.5, 0, 0.2], [3, 5, 2], [9, 15, 6], etc. should all work the same, across all platforms and all languages.
Additionally, we would like to ensure that very skewed distributions (e.g. [0.1, 1000000]) don't end up simplifying some buckets to 0.
Lastly, we need to be cautious about floating point arithmetic.
Overall, this is a pretty complex problem :D
To meet all these requirements, I think, we need to implement some sort of integer-based bucket normalization to get the canonical representation of the buckets proportion (e.g. the examples above would all normalize to [3, 5, 2]). To achieve that we might need the following:
- Use exact precision libraries (e.g. Decimal) when parsing the buckets rather than built-in float types (not sure if that exists for all our languages).
- If non-integer buckets are present, scale everything to integers
- Calculate the Greatest Common Divisor and normalize using that
- Use some advanced algorithm to translate the hash into the correct range (using a simple mod operation might skew the distribution).
- We probably need to do the scaling, normalization, and subsequent bucket calculations using BigInts rather than standard int32/int64 types (because of overflows).
The challenge with that approach though is that now we can have the sum of all the buckets be larger than the maximum hash... There are ways around this (that would require using a different approach to "hashing" or downscaling the buckets to fit in int) or we can just say that such cases constitute invalid inputs (I think that saying that the minimum valid resolution is 0.001% essentially guarantees that?).
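A small sketch of the integer normalization step described above; the function names are hypothetical, and the exact-precision parsing of non-integer inputs is left out.

```go
// gcd returns the greatest common divisor of two non-negative integers.
func gcd(a, b int64) int64 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// normalizeWeights reduces integer weights to their canonical proportion,
// e.g. [9, 15, 6] -> [3, 5, 2], so equivalent configurations place the same
// user into the same bucket.
func normalizeWeights(weights []int64) []int64 {
	var g int64
	for _, w := range weights {
		g = gcd(g, w)
	}
	if g <= 1 {
		return weights
	}
	normalized := make([]int64, len(weights))
	for i, w := range weights {
		normalized[i] = w / g
	}
	return normalized
}
```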
Ya I think you understand the challenge.
I spoke with @beeme1mr a bit about this, and we suspect my proposal here might solve most of our concerns. I don't think we "lose determinism" in any substantially different way than alternatives.
We can lean on JSON schema to mark any non-integer inputs as invalid - we don't really need to support decimals here... as long as we give users the ability to describe relative weights (this can be done with ints, obviously) we can sidestep that mess, IMO.
Overflows might still be a concern for high numbers, but we can also specify and document limits to the total weight and error if we exceed that - that might be better than calculating a GCD, which adds a performance cost that's not going to be necessary in most cases (I'm quite confident most people will just use configs with weights like 1/1/1 or 50/50 etc).
I'm going to do a little PoC for this.
Even if we get rid of floats, in your proposal, how would we deal with the same proportions expressed as different values, e.g. [3, 5, 2] vs. [9, 15, 6]?
| ### Maximum Weight Considerations |
|
| To ensure cross-language compatibility, we establish a maximum total weight sum equal to the maximum 32-bit signed integer value (2,147,483,647). This limit: |
I think this is a good limit, regardless of this.
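A sketch of how such a limit could be enforced when parsing the targeting rule; the function name and error message are illustrative only, reusing the variant type from the earlier sketch.

```go
import (
	"errors"
	"math"
)

// sumWeights totals the variant weights and rejects configurations whose sum
// would exceed the 32-bit signed integer maximum (2,147,483,647), keeping
// the arithmetic safe across all supported language implementations.
func sumWeights(variants []fractionalEvaluationVariant) (int, error) {
	var total int64
	for _, v := range variants {
		total += int64(v.weight)
		if total > math.MaxInt32 {
			return 0, errors.New("fractional: total weight exceeds the 32-bit signed integer maximum")
		}
	}
	return int(total), nil
}
```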
toddbaert left a comment
I'm open to this solution, but I would like to understand why you think this is inferior, as it seems like an obvious choice (but maybe I'm missing something). If I am, can we record why we wouldn't be interested in that approach?


This PR
Notes
Addresses a limitation of the current fractional operator that prevents sub-percent traffic allocations. In high-throughput services, 1% of traffic may represent a significant number of requests.
Related issues
#1788