Zarr

This community standard defines an open-source specification for storing multi-dimensional arrays of data (also known as data cubes, N-dimensional arrays, ND-arrays, or tensors). Such arrays are ubiquitous in scientific research and engineering. In June 2022, the OGC endorsed Zarr V2.0 as a community standard (https://zarr.readthedocs.io/en/stable/spec/v2.html).

Documents

Document title | Version | OGC Doc No. | Type
Zarr Storage Specification 2.0 Community Standard | 2.0 | 21-050r1 | CS

Multidimensional array data (a.k.a. N-dimensional arrays, ND-arrays, or "tensors") is ubiquitous in scientific research and engineering. Zarr is an open-source specification for the storage of ND-arrays and associated metadata. Zarr stores metadata as JSON text documents (in V2, under keys such as .zarray and .zattrs) and array data as optionally compressed binary chunks. Zarr can store data in any storage system that can be described as a key/value store. In a standard filesystem, the keys are file paths within a directory hierarchy, and the values are the file contents. In a cloud object store (e.g., Amazon S3), the keys are the object keys and the values are the object data. This flexibility allows implementations to experiment with novel storage technologies while maintaining a uniform API for downstream libraries and users.
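The key/value layout described above can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not a substitute for a Zarr implementation: the helper names write_array and read_chunk are hypothetical, a plain dict stands in for the key/value store, and the array shape is assumed to be an exact multiple of the chunk shape. The metadata fields and chunk keys are intended to mirror the Zarr V2 layout (a JSON document under the key .zarray, and one zlib-compressed binary value per chunk under keys such as 0.0 and 0.1).

```python
import json
import struct
import zlib


def write_array(store, rows, chunk_shape):
    """Write a 2-D list of ints into `store` (any mutable mapping) as chunks.

    Hypothetical helper for illustration; a real Zarr library handles many
    more cases (edge chunks, other dtypes, filters, groups).
    """
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    # Simplifying assumption: the shape is a multiple of the chunk shape.
    assert n_rows % c_rows == 0 and n_cols % c_cols == 0

    # Array metadata is a small JSON document stored under a fixed key.
    metadata = {
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [c_rows, c_cols],
        "dtype": "<i4",  # little-endian 32-bit signed integers
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": 0,
        "order": "C",  # row-major layout inside each chunk
        "filters": None,
    }
    store[".zarray"] = json.dumps(metadata).encode("utf-8")

    # Each chunk becomes one compressed binary value, keyed by its grid index.
    for ci in range(n_rows // c_rows):
        for cj in range(n_cols // c_cols):
            values = [
                rows[ci * c_rows + i][cj * c_cols + j]
                for i in range(c_rows)
                for j in range(c_cols)
            ]
            raw = struct.pack(f"<{len(values)}i", *values)
            store[f"{ci}.{cj}"] = zlib.compress(raw, 1)


def read_chunk(store, ci, cj):
    """Read one chunk back as a nested list, using only the stored metadata."""
    meta = json.loads(store[".zarray"])
    c_rows, c_cols = meta["chunks"]
    raw = zlib.decompress(store[f"{ci}.{cj}"])
    flat = struct.unpack(f"<{c_rows * c_cols}i", raw)
    return [list(flat[i * c_cols:(i + 1) * c_cols]) for i in range(c_rows)]


store = {}  # a dict stands in for a directory or an object-store bucket
data = [[r * 4 + c for c in range(4)] for r in range(4)]
write_array(store, data, (2, 2))

print(sorted(store))            # ['.zarray', '0.0', '0.1', '1.0', '1.1']
print(read_chunk(store, 1, 0))  # [[8, 9], [12, 13]]
```

Because the functions only assume a mapping from string keys to bytes, the same logic would apply unchanged to a store backed by files in a directory or by objects in a bucket; only the mapping implementation would differ.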

Zarr arose in genomics research in 2016. It was created by Alistair Miles of Oxford as a library optimized for massively parallel array analytics. It has since grown into a community project with developers and users from fields such as genomics, bioimaging, astronomy, physics, quantitative finance, oceanography, atmospheric science, climate science, and geospatial imaging. Because it can represent very large array datasets in a simple, scalable way and is compatible with cloud object storage, Zarr is well suited to analysis-ready geospatial data in the cloud. A prominent example is the Google Cloud CMIP6 Public Dataset. Although Zarr is not a geospatial-specific format, its rapid growth and adoption in geospatial and related fields led to its proposal as an OGC community standard.