First, to be clear: I am 100% supportive of the ideas here, and of the need for a shareable data specification of the kind you describe in the blog post.
Here are some more links for inspiration you might want to consult or mention.
- dandi archive ( https://dandiarchive.org/dandiset ), which hosts n-D data, often in zarr format, but is specialised to neuroscience. There are other "bring your data" services around with metadata requirements, e.g., https://ckan.org/ (which is actually more of a "deploy your own data hub").
- pangeo-forge ( https://pangeo-forge.readthedocs.io/en/latest/index.html ), an attempt at making existing datasets cloud-friendly. I expected better docs there for the "why"; perhaps that design rationale exists elsewhere. It is worth mentioning only because you already spoke of the pangeo experience, and this was to be the next stage.
- many other attempts to standardize metadata description, aside from source-coop (which I find not very complete), e.g., https://specs.frictionlessdata.io/ (rather tabular).
- various OSS catalogs like https://atlas.apache.org and https://www.unitycatalog.io/ , which may be tied to specific tech stacks and data models.
- some big data providers like NASA. Just look at how many fields can be searched in one of my former go-to sites: https://archive.eso.org/eso/eso_archive_main.html . This implies that any catalog model needs to be able not only to store that kind of detail, but also to support specialised search interfaces for practitioners of different fields. They do a good job of linking each file (image) to a program ID with abstract and details (e.g., https://archive.eso.org/wdb/wdb/eso/sched_rep_arc/query?progid=075.D-0156(A) ) and to any associated publications.