Skip to content

Conversation

@CTTY
Copy link
Collaborator

@CTTY CTTY commented Nov 25, 2025

Which issue does this PR close?

What changes are included in this PR?

Are these changes tested?

#[async_trait]
pub trait Storage: Debug + Send + Sync {
// File existence and metadata
async fn exists(&self, path: &str) -> Result<bool>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrating comments from @c-thiel

I know that we use these results everywhere, but I think introducing more specific error types that we can match on for storage operations makes sense. They can implement Into of course.
For example, a RateLimited error that we got from the storage service should be treated differently from NotFound or CredentialsExpired.
With Lakekeeper we are currently using our own trait based IO due to many limitations in iceberg-rust, mainly due to unsupported signing mechanisms, missing refresh mechanisms, intransparent errors and missing extendability.
I would gladly switch to iceberg-rust if we get these solved.
Maybe this can serve as some inspiration: https://github.com/lakekeeper/lakekeeper/blob/b8fcf54c627d48a547ef0baf6863949b68579388/crates/io/src/error.rs#L291

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address @c-thiel 's comments, we have several approaches:

  1. Introduce another set of errors for storage.
  2. Extend current ErrorKind for storage errors.
  3. Extend current ErrorKind, but with another enum, for example
pub enum IoErrorKind {
    FileNotFound,
    CredentialExpired,
}

pub enum ErrorKind {
     // Existing variants
    ...
    Io(IoErrorKind)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this to the Open Questions section of RFC.

I think this is more of a phase 2 problem that we can think and discuss more later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After second think, I think we should make Storage trait serializable rather StorageRegistry. Think about the use case: in a distributed compute engine, the master node uses catalog to load table, and do planning. After that, it will send the scan split to worker node. In this case the scan split should contain Storage, since it will contains things like s3 credential, encryption key.

@CTTY CTTY changed the title rfc: Making Storage a trait rfc: Making Storage a Trait Nov 25, 2025
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CTTY for this pr, generally LGTM! One missing point is, I want the StorageBuilderRegistry to have some built in StorageBuilder registered when user creating a new catalog instance. I currenlty don't have a good solution, one approach would be to have a standalone crate, which loads built in StorageBuilders when StorageBuilderRegistry is initiated. And then we could have catalog crates to depend on it.

#[async_trait]
pub trait Storage: Debug + Send + Sync {
// File existence and metadata
async fn exists(&self, path: &str) -> Result<bool>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address @c-thiel 's comments, we have several approaches:

  1. Introduce another set of errors for storage.
  2. Extend current ErrorKind for storage errors.
  3. Extend current ErrorKind, but with another enum, for example
pub enum IoErrorKind {
    FileNotFound,
    CredentialExpired,
}

pub enum ErrorKind {
     // Existing variants
    ...
    Io(IoErrorKind)
}

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this and really happy that RFC methods been adopted.

@CTTY CTTY requested review from Xuanwo and liurenjie1024 December 30, 2025 05:10
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CTTY for this RFC! It almost aligned with our discussion, I left some comments to discuss.

@CTTY CTTY requested a review from liurenjie1024 January 6, 2026 05:14
@CTTY
Copy link
Collaborator Author

CTTY commented Jan 6, 2026

Hi @liurenjie1024 , thanks for taking a look! I've addressed the comments, PTAL again

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CTTY , I think we are almost done!

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CTTY for this rfc, LGTM!

@liurenjie1024
Copy link
Contributor

cc @Xuanwo @Fokko for another round of review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants