-
Notifications
You must be signed in to change notification settings - Fork 134
Open
Labels
Description
Description:
A token quota is defined on the AIGatewayRouteRuleBackendRef. Token quota will include maximum number of tokens, a conversion function from consumed tokens to consumed quota. Initially conversion function will be a simple multiplication factor and later on can be a CEL expression. quota is specified similarly to rate limit as the number of tokens per time interval.
If a quota is exceeded request is automatically failed over to the next backend ref in the AIGatewayRouteSpec. If all backend refs have exceeded their quota an 429 error response is served.
mathetake and johnugeorge