Skip to content

Conversation

@armstrong0704
Copy link

What changes were proposed in this pull request?

Fixes HDDS-8655 - Improve listKey performance for OBS buckets

Description

This change optimizes the listKeys operation in Ozone Manager by removing a redundant seek operation during iterator initialization. Previously, the listKeys functionality would create an iterator which implicitly sought to the beginning of the table (seekToFirst), and then immediately sought to the specific start key. This caused a double-seek, which is inefficient for RocksDB, particularly when the initial seek triggers unnecessary block loads.

Changes

  • Framework: Extended the Table interface to support passing a seek key directly when creating an iterator.
  • Implementation: Updated RDBTable and TypedTable to handle this new parameter.
  • RocksDB: Modified RDBStoreByteArrayIterator to check for a seek key in its constructor. If one is provided, it seeks directly to that position, skipping the default behavior of seeking to the first key.
  • Ozone Manager: Updated the listKeys implementation to use this new iterator approach.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8655

How was this patch tested?

I added a new regression test, testIteratorWithSeek, to TestRDBStore to confirm that the iterator starts correctly at the designated key. I also verified thread safety with a new concurrency test, testConcurrentIteratorWithWrites, which runs a writer and a reader in parallel to ensure stability. Existing tests in TestOmMetadataManager also pass.

@armstrong0704 armstrong0704 force-pushed the HDDS-8655-optimize-listKeys branch 4 times, most recently from 3ed3c4a to 2277672 Compare December 22, 2025 00:50
@ivandika3
Copy link
Contributor

ivandika3 commented Dec 22, 2025

Thanks @armstrong0704 for the patch. Please enable the build-branch workflow in your fork.

@armstrong0704 armstrong0704 force-pushed the HDDS-8655-optimize-listKeys branch 3 times, most recently from ab850ad to 2a6cdd0 Compare December 22, 2025 01:56
@armstrong0704 armstrong0704 marked this pull request as draft December 22, 2025 02:00
@armstrong0704 armstrong0704 force-pushed the HDDS-8655-optimize-listKeys branch from 2a6cdd0 to 2d244f6 Compare December 22, 2025 04:28
@armstrong0704 armstrong0704 marked this pull request as ready for review December 22, 2025 04:29
@armstrong0704
Copy link
Author

@ivandika3 thanks for the input. Linters are making additional formatting. Any Guidance there?

@armstrong0704 armstrong0704 force-pushed the HDDS-8655-optimize-listKeys branch from 2d244f6 to 29cb2c7 Compare December 22, 2025 05:47
@ivandika3
Copy link
Contributor

ivandika3 commented Dec 22, 2025

@armstrong0704 Could you disable the auto format in the linter? Afterwards, if you are using IntelliJ IDEA, you can go to "Code Style" and import the https://github.com/apache/ozone/blob/master/hadoop-hdds/dev-support/checkstyle/checkstyle.xml

@sodonnel
Copy link
Contributor

Please revert all the whitespace and comment formatting changes. Its making this diff a lot bigger than it needs to be.

@armstrong0704 armstrong0704 force-pushed the HDDS-8655-optimize-listKeys branch from 29cb2c7 to 09129df Compare December 22, 2025 22:43
@armstrong0704
Copy link
Author

@sodonnel Done.

@ivandika3 ivandika3 requested a review from kerneltime December 27, 2025 03:04
@swamirishi swamirishi self-requested a review January 5, 2026 17:55
@swamirishi
Copy link
Contributor

@vyalamar do you wanna review this patch?

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @armstrong0704 for the patch.


RDBStoreByteArrayIterator(ManagedRocksIterator iterator,
RDBTable table, byte[] prefix, IteratorType type) {
RDBTable table, byte[] prefix, IteratorType type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RDBTable table, byte[] prefix, IteratorType type) {
RDBTable table, byte[] prefix, IteratorType type) {

nit: Please disable Align when multiline in settings (if using IDEA).

Comment on lines +36 to +37
RDBStoreByteArrayIterator(ManagedRocksIterator iterator,
RDBTable table, byte[] prefix, IteratorType type, byte[] seekKey) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please do not format method signature like this. Whenever visibility / return type / method name / other modifiers are changed, we would have to reindent all parameters.

*
* @param beginKey start metadata key
* @param endKey end metadata key
* @param endKey end metadata key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param endKey end metadata key
* @param endKey end metadata key

Comment on lines +590 to +592
/**
* Covert the given key to the {@link RAW} type.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's not change this unnecessarily

Suggested change
/**
* Covert the given key to the {@link RAW} type.
*/
/** Covert the given key to the {@link RAW} type. */

keyIter = getKeyTable(getBucketLayout()).iterator(null, seekKey)) {
readFromRDbStartNs = Time.monotonicNowNanos();
KeyValue< String, OmKeyInfo > kv;
keyIter.seek(seekKey);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this seek now unnecessary?

Comment on lines +426 to +429
assertTrue(iter.hasNext());
assertArrayEquals(getBytesUtf16("a3"), iter.next().getKey());
assertTrue(iter.hasNext());
assertArrayEquals(getBytesUtf16("a5"), iter.next().getKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to suggest adding a helper method to verify the list of keys iterated. Something like:

  private static void assertIterator(String prefix, String seek, Table<byte[], ?> table, List<String> expected)
      throws RocksDatabaseException, CodecException {
    final byte[] prefixBytes = prefix != null ? getBytesUtf16(prefix) : null;
    final byte[] seekBytes = seek != null ? getBytesUtf16(seek) : null;
    try (TableIterator<byte[], ? extends Table.KeyValue<byte[], ?>> iter = table.iterator(prefixBytes, seekBytes)) {
      for (String e : expected) {
        assertTrue(iter.hasNext());
        assertArrayEquals(getBytesUtf16(e), iter.next().getKey());
      }
      assertFalse(iter.hasNext());
    }
  }

Then we can both simplify cases and make sure the iterator stops (like already done in two of the five).

    // Case 1: Seek to existing key, no prefix
    assertIterator(null, "a3", table, asList("a3", "a5", "b2", "b4"));

    // Case 2: Seek to non-existent key (should land on next greater), no prefix
    assertIterator(null, "a2", table, asList("a3", "a5", "b2", "b4"));

    // Case 3: Seek past all keys, no prefix
    assertIterator(null, "z9", table, emptyList());

    // Case 4: Seek with prefix
    assertIterator("b", "b3", table, singletonList("b4"));

    // Case 5: Seek with prefix to start of prefix
    assertIterator("b", "b2", table, asList("b2", "b4"));

It can be reused in testIteratorSeekEdgeCases, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants