Skip to content

Support for token quota with failoverΒ #1571

@yanavlasov

Description

@yanavlasov

Description:

A token quota is defined on the AIGatewayRouteRuleBackendRef. Token quota will include maximum number of tokens, a conversion function from consumed tokens to consumed quota. Initially conversion function will be a simple multiplication factor and later on can be a CEL expression. quota is specified similarly to rate limit as the number of tokens per time interval.

If a quota is exceeded request is automatically failed over to the next backend ref in the AIGatewayRouteSpec. If all backend refs have exceeded their quota an 429 error response is served.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions