-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Skip Freshness check for streams not supporting offset lag #17560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a bug where the FreshnessChecker incorrectly marks Kinesis stream consumption as unhealthy because Kinesis does not always provide ending sequence numbers for active (OPEN) shards. The solution skips freshness checks for streams that don't support offset lag calculation.
Changes:
- Added early return logic to skip freshness checks for streams that don't support offset lag operations
- Logs a warning when freshness check is skipped for unsupported stream providers
| if (!streamMetadataProvider.supportsOffsetLag()) { | ||
| // Cannot conclude if segment has caught up or not. Skip such segments. | ||
| _logger.warn("Stream provider for segment: {} does not support offset subtraction. Current offset: {}", | ||
| segmentName, currentOffset); | ||
| return true; | ||
| } |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states 'Cannot conclude if segment has caught up or not' which implies uncertainty, but the code returns true (indicating the segment is caught up). This is misleading. The comment should clarify that returning true here means the segment is considered healthy/passing the freshness check by default for streams that don't support offset lag, not that it has actually caught up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this. The comment should be a bit more clear on why we are returning true here.
9aman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment. LGTM
| if (!streamMetadataProvider.supportsOffsetLag()) { | ||
| // Cannot conclude if segment has caught up or not. Skip such segments. | ||
| _logger.warn("Stream provider for segment: {} does not support offset subtraction. Current offset: {}", | ||
| segmentName, currentOffset); | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this. The comment should be a bit more clear on why we are returning true here.
|
|
||
| if (!streamMetadataProvider.supportsOffsetLag()) { | ||
| // Cannot conclude if segment has caught up or not. Skip such segments. | ||
| _logger.warn("Stream provider for segment: {} does not support offset subtraction. Current offset: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think of just logging this once per table? This will be called for every consuming segment for every health check call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing this. This was the existing behaviour as well.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17560 +/- ##
============================================
+ Coverage 63.14% 63.18% +0.03%
Complexity 1476 1476
============================================
Files 3172 3172
Lines 189783 189793 +10
Branches 29041 29043 +2
============================================
+ Hits 119842 119920 +78
+ Misses 60629 60552 -77
- Partials 9312 9321 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Please fix test: FreshnessBasedConsumptionStatusCheckerTest |
|
cursor messing up. fixing |
Problem
Related to #15682
TLDR: In latest pinot version, FreshnessChecker fails for streams like Kinesis since latest stream offset cannot be determined always.
As per Kinesis code (from software.amazon.awssdk.services.kinesis.model):
When FreshnessChecker is enabler for Kinesis streams, the server status will be marked as bad.
Solution
We skip freshness check for streams not supporting offset lag.