-
Notifications
You must be signed in to change notification settings - Fork 388
rfc: Making Storage a Trait #1885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| #[async_trait] | ||
| pub trait Storage: Debug + Send + Sync { | ||
| // File existence and metadata | ||
| async fn exists(&self, path: &str) -> Result<bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrating comments from @c-thiel
I know that we use these results everywhere, but I think introducing more specific error types that we can match on for storage operations makes sense. They can implement Into of course.
For example, a RateLimited error that we got from the storage service should be treated differently from NotFound or CredentialsExpired.
With Lakekeeper we are currently using our own trait based IO due to many limitations in iceberg-rust, mainly due to unsupported signing mechanisms, missing refresh mechanisms, intransparent errors and missing extendability.
I would gladly switch to iceberg-rust if we get these solved.
Maybe this can serve as some inspiration: https://github.com/lakekeeper/lakekeeper/blob/b8fcf54c627d48a547ef0baf6863949b68579388/crates/io/src/error.rs#L291
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address @c-thiel 's comments, we have several approaches:
- Introduce another set of errors for storage.
- Extend current
ErrorKindfor storage errors. - Extend current
ErrorKind, but with another enum, for example
pub enum IoErrorKind {
FileNotFound,
CredentialExpired,
}
pub enum ErrorKind {
// Existing variants
...
Io(IoErrorKind)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this to the Open Questions section of RFC.
I think this is more of a phase 2 problem that we can think and discuss more later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After second think, I think we should make Storage trait serializable rather StorageRegistry. Think about the use case: in a distributed compute engine, the master node uses catalog to load table, and do planning. After that, it will send the scan split to worker node. In this case the scan split should contain Storage, since it will contains things like s3 credential, encryption key.
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CTTY for this pr, generally LGTM! One missing point is, I want the StorageBuilderRegistry to have some built in StorageBuilder registered when user creating a new catalog instance. I currenlty don't have a good solution, one approach would be to have a standalone crate, which loads built in StorageBuilders when StorageBuilderRegistry is initiated. And then we could have catalog crates to depend on it.
| #[async_trait] | ||
| pub trait Storage: Debug + Send + Sync { | ||
| // File existence and metadata | ||
| async fn exists(&self, path: &str) -> Result<bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address @c-thiel 's comments, we have several approaches:
- Introduce another set of errors for storage.
- Extend current
ErrorKindfor storage errors. - Extend current
ErrorKind, but with another enum, for example
pub enum IoErrorKind {
FileNotFound,
CredentialExpired,
}
pub enum ErrorKind {
// Existing variants
...
Io(IoErrorKind)
}Co-authored-by: Renjie Liu <[email protected]>
This reverts commit e7d67d8.
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this and really happy that RFC methods been adopted.
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CTTY for this RFC! It almost aligned with our discussion, I left some comments to discuss.
|
Hi @liurenjie1024 , thanks for taking a look! I've addressed the comments, PTAL again |
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CTTY , I think we are almost done!
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CTTY for this rfc, LGTM!
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?