Skip to content

Conversation

@sendya
Copy link
Contributor

@sendya sendya commented Dec 22, 2025

This pull request introduces significant improvements to the storage subsystem, focusing on enhanced configuration flexibility, better in-memory storage support, and consistent interface usage. The main changes include adding explicit support for in-memory buckets, unifying the way storage paths are configured, and updating interfaces and implementations for improved type safety and maintainability.

Key changes:

In-Memory Storage Support

  • Added a new memory bucket type with its own implementation (memoryBucket), supporting in-memory object storage with a dedicated LRU cache and indexdb, and ensured only one memory bucket can be used at a time. (storage/bucket/memory/memory.go, storage/builder.go, storage/storage.go) [1] [2] [3] [4] [5] [6] [7] [8]

  • Updated the pebble indexdb backend to support in-memory mode by using a memory-backed filesystem when the db_path is set to "inmemory". (storage/indexdb/pebble/pebble.go, storage/indexdb/indexdb.go) [1] [2] [3]

Configuration Enhancements

  • Introduced a new db_path field in both global storage and per-bucket configurations, allowing explicit control over indexdb storage paths, including support for in-memory stores. (conf/conf.go, config.example.yaml, storage/builder.go) [1] [2] [3] [4] [5]

  • Updated the disk bucket to use the configurable db_path, defaulting to the previous behavior if not specified. (storage/bucket/disk/disk.go)

Interface Consistency and Type Safety

  • Changed the Exist method signatures in storage interfaces and implementations to use the object.IDHash type instead of raw []byte, improving type safety and consistency across buckets. (api/defined/v1/storage/storage.go, storage/bucket/disk/disk.go, storage/bucket/empty/empty.go, storage/bucket/memory/memory.go) [1] [2] [3] [4]

Performance Tuning

  • Increased the default LRU cache size for disk buckets from 100,000 to 1,000,000 entries to improve performance. (storage/bucket/disk/disk.go)

These changes collectively provide more flexible and robust storage configuration, improved in-memory caching capabilities, and a cleaner, safer codebase for storage operations.

Copilot AI review requested due to automatic review settings December 22, 2025 15:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces in-memory storage support and improves storage configuration flexibility. However, the implementation appears to be a work-in-progress (as indicated by the "WIP" prefix in the title) with several incomplete critical components that prevent the memory bucket from functioning correctly.

Key Changes:

  • Added a new memory bucket type with dedicated structures but incomplete storage operations
  • Introduced db_path configuration for explicit indexdb path control, including support for in-memory mode via "inmemory" value
  • Changed the Exist method signature across all bucket implementations from []byte to object.IDHash for improved type safety

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
storage/bucket/memory/memory.go Introduces new memory bucket implementation with indexdb and LRU cache, but core storage operations (Store, Lookup, Remove) are currently stubs that don't perform actual work
storage/storage.go Updates memory bucket from slice to single instance with selection logic that always returns memory bucket when present, bypassing existence checks
storage/builder.go Registers the new memory bucket constructor and adds db_path to configuration merging logic
storage/indexdb/pebble/pebble.go Adds in-memory mode support using vfs.NewMem() when path equals "inmemory" constant
storage/indexdb/indexdb.go Defines TypeInMemory constant for identifying in-memory storage mode
storage/bucket/disk/disk.go Updates Exist signature to use object.IDHash and makes db_path configurable with fallback to default location; increases LRU cache from 100k to 1M entries
storage/bucket/empty/empty.go Updates Exist signature to use object.IDHash for consistency
api/defined/v1/storage/storage.go Changes Exist interface signature from []byte to object.IDHash
conf/conf.go Adds DBPath field to Storage and Bucket configuration structs
config.example.yaml Adds commented examples for db_path configuration at global and bucket levels
server/middleware/caching/processor.go Removes extraneous blank line (formatting cleanup)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

panic("implement me")
// Exist implements [storage.Bucket].
func (m *memoryBucket) Exist(ctx context.Context, id object.IDHash) bool {
return m.cache.Has(id)
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Exist method only checks the cache (which stores Mark objects) but not the indexdb. This could lead to inconsistencies where an object exists in the indexdb but Exist returns false because it's not in the cache. Consider checking both the cache and the indexdb, or ensure that the cache and indexdb are kept in sync. The disk bucket implementation checks the indexdb directly (see storage/bucket/disk/disk.go:242-244).

Suggested change
return m.cache.Has(id)
// Fast path: check the in-memory cache first.
if m.cache.Has(id) {
return true
}
// Fallback: consult the indexdb, which is the authoritative source.
if m.indexdb == nil {
return false
}
exist, err := m.indexdb.Exist(ctx, id)
if err != nil {
log.Errorf("failed to check existence in indexdb for %v: %v", id, err)
return false
}
return exist

Copilot uses AI. Check for mistakes.

func New(config *conf.Bucket, sharedkv storage.SharedKV) (storage.Bucket, error) {
mb := &memoryBucket{
fs: vfs.NewMem(),
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fs field is initialized with vfs.NewMem() but never used in any of the bucket methods. If the intention is to use this virtual filesystem for storing actual file data (similar to how disk bucket stores files), the field should be used in Store/Lookup operations. Otherwise, this field can be removed to simplify the implementation.

Copilot uses AI. Check for mistakes.
cache: lru.New[object.IDHash, storage.Mark](10_000), // in-memory object size
fileMode: fs.FileMode(0o755),
maxSize: 1024 * 1024 * 100, // e.g. 100 MB
stop: make(chan struct{}, 1),
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stop channel is created but never used in the implementation. If this channel is intended for coordinating shutdown or background operations, it should be used in the Close method or related operations. Otherwise, this field can be removed to simplify the implementation.

Copilot uses AI. Check for mistakes.
weight: 100, // default weight
sharedkv: sharedkv,
cache: lru.New[object.IDHash, storage.Mark](10_000), // in-memory object size
fileMode: fs.FileMode(0o755),
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fileMode field is initialized but never used in the implementation. If file permissions are needed for virtual filesystem operations, this should be used when creating directories or files. Otherwise, this field can be removed to simplify the implementation.

Copilot uses AI. Check for mistakes.
//TODO implement me
panic("implement me")
// Store implements [storage.Bucket].
func (m *memoryBucket) Store(ctx context.Context, meta *object.Metadata) error {
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Store method returns nil without actually storing the metadata. This means objects will never be cached in the memory bucket. The implementation should store the metadata in the indexdb and update the cache with a Mark, similar to how the disk bucket implementation works (see storage/bucket/disk/disk.go:271-292). Consider implementing actual storage logic that uses m.indexdb.Set() and m.cache.Set().

Suggested change
func (m *memoryBucket) Store(ctx context.Context, meta *object.Metadata) error {
func (m *memoryBucket) Store(ctx context.Context, meta *object.Metadata) error {
// Do not attempt to store a nil metadata.
if meta == nil {
return nil
}
// Persist metadata to the index database so it can be retrieved later.
if err := m.indexdb.Set(ctx, meta); err != nil {
return err
}

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +113
// n.memoryBucket.Exist(ctx, id.Hash())
return n.memoryBucket
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Select method always returns the memory bucket when it exists and UseAllow() is true, without checking if the object actually exists in that bucket. This means all storage operations will be routed to the memory bucket, even for objects that haven't been cached there yet. The commented-out code suggests there should be a check using Exist() to determine if the object is in the memory bucket before returning it. Without this check, lookups for non-cached objects will fail even though they might exist in other buckets, and stores will only go to the memory bucket. Consider implementing the Exist check or a fallback mechanism to use other buckets when objects aren't found in memory.

Suggested change
// n.memoryBucket.Exist(ctx, id.Hash())
return n.memoryBucket
if hit, _ := n.memoryBucket.Exist(ctx, id.Hash()); hit {
return n.memoryBucket
}

Copilot uses AI. Check for mistakes.
- path: /cache2
type: normal
# db_type: pebble
# db_path: inmemory # index-db path, default: <path>/.indexdb/; use "inmemory" for in-memory store
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment describes using "inmemory" for in-memory indexdb storage via the db_path configuration, which is correct. However, this is different from the memory bucket type. To avoid confusion, consider clarifying in the comment that "inmemory" is for the indexdb backend, not the bucket type. For a fully in-memory bucket, users would set type to "memory" (not shown in this example).

Suggested change
# db_path: inmemory # index-db path, default: <path>/.indexdb/; use "inmemory" for in-memory store
# db_path: inmemory # index-db path, default: <path>/.indexdb/; use "inmemory" for in-memory indexdb backend (not the bucket type); for a fully in-memory bucket set type: "memory" (not shown here)

Copilot uses AI. Check for mistakes.
Comment on lines +122 to 126
if path == indexdb.TypeInMemory {
log.Infof("kvstore used in-memory")
fs = vfs.NewMem()
}
pdb, err := pebble.Open(path, &pebble.Options{
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using in-memory mode (fs = vfs.NewMem()), the path parameter is still passed to pebble.Open(). This might cause issues since the path could be a filesystem path like ".indexdb" which may not be meaningful in a memory-backed filesystem. Consider using a consistent path like "/inmemory" or an empty string when fs is set to vfs.NewMem() to avoid potential path-related issues.

Suggested change
if path == indexdb.TypeInMemory {
log.Infof("kvstore used in-memory")
fs = vfs.NewMem()
}
pdb, err := pebble.Open(path, &pebble.Options{
dbPath := path
if path == indexdb.TypeInMemory {
log.Infof("kvstore used in-memory")
fs = vfs.NewMem()
dbPath = "/inmemory"
}
pdb, err := pebble.Open(dbPath, &pebble.Options{

Copilot uses AI. Check for mistakes.
panic("implement me")
// Allow implements [storage.Bucket].
func (m *memoryBucket) Allow() int {
return int(m.maxSize)
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Allow method should return a percentage (0-100) according to the interface documentation (api/defined/v1/storage/storage.go:59), but it currently returns maxSize which is in bytes (104857600 for 100 MB). This could cause issues if other parts of the system expect a percentage value. Consider returning a proper percentage value or using the allow field for this purpose, similar to how empty bucket returns 100.

Suggested change
return int(m.maxSize)
if m == nil {
return 0
}
return m.weight

Copilot uses AI. Check for mistakes.
panic("implement me")
// Remove implements [storage.Bucket].
func (m *memoryBucket) Remove(ctx context.Context, id *object.ID) error {
return nil
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Remove method returns nil without actually removing the object from the indexdb or cache. This means removed objects will remain in storage and could be retrieved later. The implementation should delete from the indexdb using m.indexdb.Delete() and potentially remove from the cache, similar to how the disk bucket implementation works (see storage/bucket/disk/disk.go:266-268).

Suggested change
return nil
// Ensure the object is removed from the index database so it is no longer retrievable.
return m.indexdb.Delete(ctx, id)

Copilot uses AI. Check for mistakes.
@sendya sendya closed this Jan 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants