[Draft] [Do not merge] Fix deleted keys workflow to prevent stale record resurrection during restarts #17488
base: master
Conversation
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master    #17488     +/-   ##
============================================
+ Coverage     63.27%    63.28%    +0.01%
+ Complexity     1477      1476        -1
============================================
  Files          3167      3167
  Lines        189062    189097       +35
  Branches      28930     28939        +9
============================================
+ Hits         119624    119669       +45
+ Misses        60153     60150        -3
+ Partials       9285      9278        -7
```

Flags with carried forward coverage won't be shown.
Pull request overview
This PR fixes a critical race condition in the deleted keys workflow during server restarts. The issue occurs when segments load in arbitrary order, potentially causing deleted records to incorrectly reappear in query results. The fix defers primary key removal until all segments are loaded.
Changes:
- Added a segment loading state check before removing expired primary keys (a rough sketch follows this list)
- Introduced a new metric `DELETED_KEYS_SKIPPED_SEGMENT_LOADING` for observability
- Added comprehensive tests to verify the fix works correctly during server restarts
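As a rough sketch of the first change (not the PR's literal diff; the metric-emission call and field names such as `_serverMetrics` are assumptions based on common Pinot conventions), the guard in `doRemoveExpiredPrimaryKeys()` might look like:

```java
protected void doRemoveExpiredPrimaryKeys() {
  // Sketch only: while the server is still loading segments after a restart, skip the
  // deleted-keys cleanup so a delete marker cannot be dropped before the older segments
  // that still reference the same primary key are back in memory.
  if (!areAllSegmentsLoaded()) {
    _serverMetrics.addMeteredTableValue(_tableNameWithType,
        ServerMeter.DELETED_KEYS_SKIPPED_SEGMENT_LOADING, 1L); // illustrative metric call
    return;
  }
  // ... existing TTL-based removal of expired (deleted) primary keys ...
}
```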
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| BasePartitionUpsertMetadataManager.java | Added areAllSegmentsLoaded() method with caching to check if all segments are loaded before allowing PK removal |
| ConcurrentMapPartitionUpsertMetadataManager.java | Modified doRemoveExpiredPrimaryKeys() to skip deletion when segments are still loading |
| ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes.java | Modified doRemoveExpiredPrimaryKeys() to skip deletion when segments are still loading |
| ServerMeter.java | Added new metric DELETED_KEYS_SKIPPED_SEGMENT_LOADING to track skipped deletions |
| ConcurrentMapPartitionUpsertMetadataManagerTest.java | Added test case to verify deleted keys are not removed when segments are still loading |
| ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletesTest.java | Added test case to verify deleted keys are not removed when segments are still loading |
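For the ServerMeter.java row above, here is a self-contained mirror of what the new counter could look like; this is not the actual `org.apache.pinot.common.metrics.ServerMeter` source, and the string name and constructor shape are assumptions about the existing enum pattern.

```java
// Standalone sketch mirroring a "(meterName, global)" enum pattern; the real ServerMeter
// enum has many more entries and may carry additional constructor arguments.
public enum ServerMeterSketch {
  // Existing-style entry shown for context (name illustrative).
  DELETED_KEYS_TTL_PRIMARY_KEYS_REMOVED("deletedKeysTtlPrimaryKeysRemoved", false),
  // New counter: cleanup passes skipped because segments were still loading after a restart.
  DELETED_KEYS_SKIPPED_SEGMENT_LOADING("deletedKeysSkippedSegmentLoading", false);

  private final String _meterName;
  private final boolean _global;

  ServerMeterSketch(String meterName, boolean global) {
    _meterName = meterName;
    _global = global;
  }

  public String getMeterName() {
    return _meterName;
  }

  // Global meters are aggregated per server rather than per table.
  public boolean isGlobal() {
    return _global;
  }
}
```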
```java
protected boolean areAllSegmentsLoaded() {
  if (_allSegmentsLoaded) {
    return true;
  }
  synchronized (this) {
    if (_allSegmentsLoaded) {
      return true;
    }
    long currentTimeMs = System.currentTimeMillis();
    if (currentTimeMs - _lastSegmentsLoadedCheckTimeMs <= SEGMENTS_LOADED_CHECK_INTERVAL_MS) {
      return false;
    }
    _lastSegmentsLoadedCheckTimeMs = currentTimeMs;
    TableDataManager tableDataManager = _context.getTableDataManager();
    if (tableDataManager != null) {
      HelixManager helixManager = tableDataManager.getHelixManager();
      if (helixManager != null) {
        _allSegmentsLoaded = TableStateUtils.isAllSegmentsLoaded(helixManager, _tableNameWithType);
        if (_allSegmentsLoaded) {
          _logger.info("All segments are now loaded for table: {}, partition: {}", _tableNameWithType, _partitionId);
        }
        return _allSegmentsLoaded;
      }
    }
    return false;
  }
}
```
Copilot AI commented on Jan 14, 2026
The double-checked locking pattern is incomplete. The _allSegmentsLoaded field is declared as volatile (line 142), but _lastSegmentsLoadedCheckTimeMs is not volatile. This could lead to visibility issues where one thread updates _lastSegmentsLoadedCheckTimeMs inside the synchronized block but another thread reads a stale value before entering the block. Either make _lastSegmentsLoadedCheckTimeMs volatile or always access it within the synchronized block.
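One way to act on this, sketched under the assumption that both fields are declared in `BasePartitionUpsertMetadataManager`:

```java
// Keep the flag volatile for the lock-free fast path, and make the throttle timestamp
// volatile as well so a read outside the synchronized block cannot observe a stale value.
// (Alternatively, read and write _lastSegmentsLoadedCheckTimeMs only while holding the lock.)
private volatile boolean _allSegmentsLoaded;
private volatile long _lastSegmentsLoadedCheckTimeMs;
```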
```java
upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(0), 0, new Integer(120), true));

// Add another record with a higher timestamp to update largestSeenComparisonValue
// This makes timestamp 120 fall outside the TTL window (150 - 20 = 130 > 120)
upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(1), 1, new Integer(150), false));
```
Copilot AI commented on Jan 14, 2026
Avoid using the new Integer(120) constructor, which is deprecated. Use Integer.valueOf(120) or simply 120 (autoboxing) instead. The Integer constructor was deprecated in Java 9 and should be replaced with valueOf or autoboxing for better performance and memory usage.
Suggested change:

```diff
-upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(0), 0, new Integer(120), true));
+upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(0), 0, 120, true));
 // Add another record with a higher timestamp to update largestSeenComparisonValue
 // This makes timestamp 120 fall outside the TTL window (150 - 20 = 130 > 120)
-upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(1), 1, new Integer(150), false));
+upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(1), 1, 150, false));
```
```java
upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(0), 0, new Integer(120), true));

// Add another record with a higher timestamp to update largestSeenComparisonValue
// This makes timestamp 120 fall outside the TTL window (150 - 20 = 130 > 120)
upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(1), 1, new Integer(150), false));
```
Copilot AI commented on Jan 14, 2026
Avoid using the new Integer(150) constructor, which is deprecated. Use Integer.valueOf(150) or simply 150 (autoboxing) instead. The Integer constructor was deprecated in Java 9 and should be replaced with valueOf or autoboxing for better performance and memory usage.
Suggested change:

```diff
-upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(0), 0, new Integer(120), true));
+upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(0), 0, 120, true));
 // Add another record with a higher timestamp to update largestSeenComparisonValue
 // This makes timestamp 120 fall outside the TTL window (150 - 20 = 130 > 120)
-upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(1), 1, new Integer(150), false));
+upsertMetadataManager.addRecord(segment2, new RecordInfo(makePrimaryKey(1), 1, 150, false));
```
Added checks to ensure all segments are loaded before removing deleted primary keys from memory during TTL cleanup. This prevents out-of-order segment loading during server restart from causing deleted records to incorrectly reappear in query results.
Issue scenario (step-by-step):
1. The server restarts and segments are loaded in arbitrary order; the segment holding the delete marker for a primary key (S2) is loaded while other segments that still contain older records for the same PK are not yet loaded.
2. `doRemoveExpiredPrimaryKeys()` runs and removes the PK from memory (it is outside the TTL window and present only in the one loaded segment, S2). So the PK is deleted even though `enableDeletedKeysCompactionConsistency` is set to true.
3. When the remaining segments finish loading, the older record for that PK is added back with no delete marker left in memory, and the deleted record reappears in query results.

Fix: Defer PK removal until `TableStateUtils.isAllSegmentsLoaded()` returns true, ensuring all historical data is loaded before cleanup. Introduces the `DELETED_KEYS_SKIPPED_SEGMENT_LOADING` metric for observability.
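To make the failure mode and the guard concrete, here is a standalone illustration (plain Java, not Pinot code) using the same numbers as the new tests, where the delete marker at timestamp 120 sits outside the TTL window of 150 - 20 = 130:

```java
// Self-contained sketch: a delete marker that has fallen out of the TTL window must still
// be kept while segments are loading, otherwise an older record for the same primary key
// in a not-yet-loaded segment would "resurrect" the deleted row once it loads.
public class DeletedKeyGuardSketch {
  public static void main(String[] args) {
    boolean allSegmentsLoaded = false;       // server is still restoring segments after restart
    long largestSeenComparisonValue = 150;   // newest comparison value seen by the partition
    long deletedKeysTtl = 20;                // deleted-keys TTL
    long deleteMarkerTimestamp = 120;        // the delete marker held only by segment S2

    // Outside the TTL window: 120 < 150 - 20 = 130.
    boolean outsideTtl = deleteMarkerTimestamp < largestSeenComparisonValue - deletedKeysTtl;
    // The fix: only remove the primary key once every segment is loaded.
    boolean removeMarker = outsideTtl && allSegmentsLoaded;

    System.out.println("outsideTtl=" + outsideTtl + ", removeMarker=" + removeMarker);
    // Prints "outsideTtl=true, removeMarker=false": the marker survives, so the deleted
    // record cannot reappear when the remaining segments finish loading.
  }
}
```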