

@ymuppala ymuppala commented Jan 23, 2026

Problem Statement

Venice storage nodes with the discard mount option enabled experience disk health check failures during large store deletions: synchronous TRIM operations saturate disk I/O, causing the DiskHealthCheckService to time out and mark nodes unhealthy.

Solution

Added configurable deletion rate limiting for RocksDB's SstFileManager to throttle file deletion operations during store cleanup:

  • Added two new configuration parameters to RocksDBServerConfig: rocksdb.sst.file.manager.delete.rate.bytes.per.second (default: 0, i.e. throttling disabled) and rocksdb.sst.file.manager.max.trash.db.ratio (default: 0.25).
  • Updated RocksDBStorageEngineFactory to apply the deletion rate limit to the shared SstFileManager instance when configured (see the sketch below).
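
For illustration, here is a minimal sketch of how these two knobs map onto RocksDB's Java API. This is not the Venice factory code itself; the path and values are placeholders, and it assumes the SstFileManager setters exposed by recent RocksJava releases:

import org.rocksdb.Env;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileManager;

public class SstFileManagerThrottlingSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // Placeholder values mirroring the new configs:
    //   rocksdb.sst.file.manager.delete.rate.bytes.per.second -> 64 MB/s
    //   rocksdb.sst.file.manager.max.trash.db.ratio           -> 0.25
    long deleteRateBytesPerSecond = 64L * 1024 * 1024;
    double maxTrashDbRatio = 0.25;

    try (SstFileManager sstFileManager = new SstFileManager(Env.getDefault());
        Options options = new Options().setCreateIfMissing(true).setSstFileManager(sstFileManager)) {
      // Deleted SST files are renamed into trash and removed at the configured rate,
      // until accumulated trash exceeds maxTrashDbRatio of the live DB size.
      sstFileManager.setDeleteRateBytesPerSecond(deleteRateBytesPerSecond);
      sstFileManager.setMaxTrashDBRatio(maxTrashDbRatio);
      try (RocksDB db = RocksDB.open(options, "/tmp/sst-delete-throttling-sketch")) {
        // ... ingest and flush data, then drop the store; deletions routed through this
        // SstFileManager are throttled instead of hitting the disk all at once.
      }
    }
  }
}

Because Venice shares a single SstFileManager across all RocksDB stores opened by the factory, the configured rate bounds the node's aggregate deletion throughput rather than a per-store rate.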

Code changes

  • Added new code behind a config; the config names and their default values are listed in the PR description above.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

@ymuppala ymuppala force-pushed the delete_throttling branch 2 times, most recently from 2e7fca9 to cc511e8 on January 23, 2026 at 23:34
@ymuppala ymuppala marked this pull request as ready for review on January 23, 2026 at 23:37

@huangminchn huangminchn left a comment


Thanks for the quick turnaround Yug!
Can we double check the code path used during backup version deletion? My understanding is that it's going through "RocksDBStoragePartition#drop()".

We can add a test case for backup version deletion if none exists yet, and validate the deletion rate.

VeniceStoreVersionConfig testStoreConfig =
    new VeniceStoreVersionConfig(testStore, veniceServerProperties, PersistenceType.ROCKS_DB);
StorageEngine storeEngine = factory.getStorageEngine(testStoreConfig);
Assert.assertNotNull(storeEngine, "Storage engine should be created successfully");
factory.removeStorageEngine(storeEngine);
Contributor


Can we double check that this code path is used during backup version deletion?

Collaborator Author


Yeah, it is in the code path. The call stack looks like the following

VersionBackend.delete()
  → IngestionBackend.removeStorageEngine(topicName)
    → StorageService.removeStorageEngine(kafkaTopic)
      → StorageEngine.drop()
        → AbstractStorageEngine.drop() [iterates partitions]
          → RocksDBStoragePartition.drop()
            → RocksDB.destroyDB()

Comment on lines 446 to 452

// SstFileManager deletion rate limiting - disabled by default (0) for backward compatibility
// Recommended: 64 MB/s (67108864) for nodes with 'discard' mount option
this.sstFileManagerDeleteRateBytesPerSecond =
    props.getSizeInBytes(ROCKSDB_SST_FILE_MANAGER_DELETE_RATE_BYTES_PER_SECOND, 0L);
this.sstFileManagerMaxTrashDBRatio =
    props.getDouble(ROCKSDB_SST_FILE_MANAGER_MAX_TRASH_DB_RATIO, 0.25); // 25% default
Contributor


Wouldn't this mean that 75% of the DB will be deleted in a throttled manner, but then the last 25% will be deleted immediately?

We need the throttling to continue to be applied all the way through.

Collaborator Author


I don't think that's what the config value means. It specifies the upper limit on the trash-to-live-DB size ratio that can be tolerated before files are deleted immediately, overriding the rate limit. So throttling applies as long as accumulated trash stays below 25% of the live DB size; once trash hits that ratio (e.g. 25 GB of pending trash against a 100 GB live DB), further deletions bypass the throttle to keep trash from piling up.

@ymuppala
Collaborator Author

Thanks for the quick turnaround Yug! Can we double check the code path used during backup version deletion? My understanding is that it's going through "RocksDBStoragePartition#drop()".

We can add a test case for backup version deletion if none exists yet, and validate the deletion rate.

I have added a test to validate the deletion rate.
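
For reference, here is a rough sketch of the shape such a rate validation could take, built around the calls quoted earlier in this thread. Everything other than those calls is an assumption for illustration: the placeholder rate and data size, the hypothetical storeDataDir handle, and the premise that the server properties set a low delete rate and a trash/DB ratio high enough that the throttle is not bypassed mid-test. It is a fragment of a test method (declaring throws Exception), not the PR's actual test:

// Assumed setup: veniceServerProperties sets rocksdb.sst.file.manager.delete.rate.bytes.per.second
// to a deliberately low value, and ~bytesIngested bytes have been written and flushed to the store.
long deleteRateBytesPerSecond = 8L * 1024 * 1024; // 8 MB/s, placeholder
long bytesIngested = 64L * 1024 * 1024; // placeholder

VeniceStoreVersionConfig testStoreConfig =
    new VeniceStoreVersionConfig(testStore, veniceServerProperties, PersistenceType.ROCKS_DB);
StorageEngine storeEngine = factory.getStorageEngine(testStoreConfig);
Assert.assertNotNull(storeEngine, "Storage engine should be created successfully");

long startMs = System.currentTimeMillis();
factory.removeStorageEngine(storeEngine);
// Throttled deletions run on the SstFileManager's background thread, so wait for the store's
// data directory (hypothetical storeDataDir) to actually disappear instead of timing the call itself.
while (storeDataDir.exists() && System.currentTimeMillis() - startMs < 120_000) {
  Thread.sleep(100);
}
long elapsedMs = System.currentTimeMillis() - startMs;

// Clearing bytesIngested at deleteRateBytesPerSecond should take at least this long.
long expectedMinMs = bytesIngested / deleteRateBytesPerSecond * 1000L; // 8 seconds with the placeholders
Assert.assertTrue(elapsedMs >= expectedMinMs, "Store deletion finished faster than the configured delete rate allows");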
