GeoZarr Mini Spec¶

This document specifies the GeoZarr model used in this repository. It's a "mini" version of the official GeoZarr spec that documents the specific subset of the specification that this implementation supports, along with implementation-specific details.

Relationship to Other Documentation¶

This mini spec is referenced by and aligns with:

Architecture - Technical implementation details that follow this specification
GeoZarr Specification Contribution - Our contributions to the official spec based on this implementation
Main Documentation - General library documentation and usage guides

The implementation described in this mini spec addresses specific requirements for Earth observation data processing while maintaining compliance with the broader GeoZarr specification.

Spec conventions¶

Array and Group attributes¶

This document only defines rules for a finite subset of the keys in Zarr array and group attributes. Unless otherwise stated, any external keys in Zarr array and group attributes are consistent with this specification. This means this specification composes with the presence of, e.g., CF metadata, at different levels of the Zarr hierarchy.

Organization¶

GeoZarr defines a Zarr hierarchy, i.e. a particular arrangements of Zarr arrays and groups, and their attributes. This document defines that hierarchy from the bottom-up, starting with arrays and their attributes before moving to higher-level structures, like groups and their attributes.

The GeoZarr specification can be implemented in Zarr V2 and V3. The main difference between the Zarr V2 and Zarr V3 implementations is how the dimension names of an array are specified.

DataArray¶

A DataArray is a Zarr array with named axes. The structure of a DataArray depends on the Zarr format.

This section contains the rules for individual DataArrays. Additional constraints on groups of DataArrays are defined in the section on Datasets

Zarr V2¶

Attributes¶

key	type	required	notes
_ARRAY_DIMENSIONS	array of strings, length matches number of axes of the array	yes	xarray convention for naming axes in Zarr V2

Array metadata¶

Zarr V2 DataArrays must have at least 1 dimension, i.e. scalar Zarr V2 DataArrays are not allowed.

In tabular form:

attribute	constraint	notes
`shape`	at least 1 element	No scalar arrays allowed

Example¶

{
    ".zarray": {
        "zarr_format": 2,
        "dtype": "|u1",
        "shape": [10,11,12],
        "chunks": [10,11,12],
        "filters": null
        "compressor": null
        "order": "C"
        "dimension_separator": "/"
        }
    ".zattrs": {
        "_ARRAY_DIMENSIONS": ["lat", "lon", "time"]
        }

}

Zarr V3¶

Attributes¶

No particular attributes are required for Zarr V3 DataArrays.

Array metadata¶

Zarr V3 DataArrays must have at least 1 dimension, i.e. scalar Zarr V3 DataArrays are not allowed. The dimension_names attribute of a Zarr V3 DataArray must be set, the elements of dimension_names must all be strings, and they must all be unique.

In tabular form:

attribute	constraint	notes
`shape`	at least 1 element	No scalar arrays allowed
`dimension_names`	an array of unique strings	all array axes must be uniquely named.

Example¶

{
    "zarr.json": {
        "zarr_format": 3,
        "node_type": "array",
        "shape": [10,11,12],
        "data_type": "uint8",
        "chunk_key_encoding": {"name": "default", "configuration": {"separator" : "/"}},
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10,11,12]}},
        "codecs": [{"name": "bytes"}],
        "dimension_names": ["lat", "lon", "time"],
        "storage_transformers": [],
        }
}

Dataset¶

A GeoZarr dataset is a Zarr group that contains Zarr arrays that together describe a measured quantity, as well as arbitrary sub-groups.

Attributes¶

There are no required attributes for Datasets but to qualify as a GeoZarr Dataset, the group must contain at least one DataArray with spatial reference information. This DataArray is referenced in the grid_mapping attribute of the dataset and is usually named spatial_ref.

CF Compliance Requirements¶

The implementation enforces CF (Climate and Forecast) metadata conventions compliance:

Grid Mapping: All data variables MUST include a grid_mapping attribute that references a coordinate reference system variable
Standard Names: Data variables MUST include CF-compliant standard_name attributes. The implementation validates these against the official CF standard names table
Coordinate Variables: Coordinate variables (x, y, time, etc.) MUST include appropriate CF standard names:
For projected coordinates: projection_x_coordinate and projection_y_coordinate
For geographic coordinates: longitude and latitude
Units must be specified (m for projected, degrees_east/degrees_north for geographic)
Array Dimensions: All arrays MUST include _ARRAY_DIMENSIONS attributes for Zarr V2 compatibility

More information on spatial reference information can be found in the CF conventions. Another interesting resource is the rioxarray and more specifically the documentation on Coordinate Reference System Management.

Members¶

If any member of a GeoZarr Dataset is an array, then it must comply with the DataArray definition.

If the Dataset contains a DataArray D, then for each dimension name N in the list of D's named dimensions, the Dataset must contain a one-dimensional DataArray named N with a shape that matches the the length of D along the axis named by N. In this case, D is called a "data variable", and the each DataArrays matching a dimension names of D is called a "coordinate variable".

[!Note] These two definitions are not mutually exclusive, as a 1-dimensional DataArray named D with dimension names ["D"] is both a coordinate variable and a data variable.

Examples¶

This example demonstrates the stored representation of a valid Dataset. Notice how the dimension names defined on the DataArray named "data" (i.e., "lat" and "lon") are the names of one-dimensional DataArrays in the same Zarr group as "data".

In this case, "data" is a data variable, and "lat" and "lon" are coordinate variables.

{
    "zarr.json" : {
        "node_type": "group",
        "zarr_format": 3,
        },
    "data/zarr.json" : {
        "zarr_format": 3,
        "node_type": "array",
        "shape": [10,11],
        "data_type": "uint8",
        "chunk_key_encoding": {"name": "default", "configuration": {"separator" : "/"}},
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10,11]}},
        "codecs": [{"name": "bytes"}],
        "dimension_names": ["lat", "lon"],
        "storage_transformers": [],
        },
    "lat/zarr.json": {
        "zarr_format": 3,
        "node_type": "array",
        "shape": [10],
        "data_type": "uint8",
        "chunk_key_encoding": {"name": "default", "configuration": {"separator" : "/"}},
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10]}},
        "codecs": [{"name": "bytes"}],
        "dimension_names": ["lat"],
        "storage_transformers": [],
        },
    "lon/zarr.json": {
        "zarr_format": 3,
        "node_type": "array",
        "shape": [11],
        "data_type": "uint8",
        "chunk_key_encoding": {"name": "default", "configuration": {"separator" : "/"}},
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [11]}},
        "codecs": [{"name": "bytes"}],
        "dimension_names": ["lon"],
        "storage_transformers": [],
        },
}

This example demonstrates the layout of a Dataset with just one DataArray. A single array is only permitted if that array is one dimensional, and the name of that DataArray in the Dataset matches the (single) dimension name defined for that DataArray.

In this case lat is both a coordinate variable and a data variable.

{
    "zarr.json" : {
        "node_type": "group",
        "zarr_format": 3,
        },
    "lat/zarr.json" : {
        "zarr_format": 3,
        "node_type": "array",
        "shape": [10],
        "data_type": "uint8",
        "chunk_key_encoding": {"name": "default", "configuration": {"separator" : "/"}},
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10,11]}},
        "codecs": [{"name": "bytes"}],
        "dimension_names": ["lat"],
        "storage_transformers": [],
    },
}

Multiscale Dataset¶

Downsampling is a process in which a collection of localized data points is resampled on a subset of its original sampling locations.

In the case of arrays, downsampling generally reduces an array's shape along at least one dimension. To downsample the contents of a Dataset D and generate a new Dataset E, all of the coordinate variable - data variable relationships in D must be preserved in E. If D/data is a data variable with dimension names ("a" , "b"), then D/a and D/b are coordinate variables with shapes aligned to the dimensions of D/data. If we downsample D/data and assign the result to E/data, we must also generate (e.g., by more downsampling) coordinate variables E/a and E/b so that E can be a valid Dataset according to the relevant Dataset members rule.

The downsampling transformation is thus well-defined for Datasets. Downsampling is often applied multiple times in a series, e.g. to generate multiple levels of detail for a data variable.

Implementation Approach¶

The implementation uses a pyramid-based downsampling approach with the following characteristics:

Factor-of-2 Downsampling: Each overview level reduces dimensions by a factor of 2 (COG-style downsampling)
Pyramid Generation: Overview levels are created sequentially, with each level generated from the previous level rather than from the native resolution
Minimum Dimension Threshold: Overview generation stops when the smallest dimension falls below a configurable threshold (default: 256 pixels)
Native CRS Preservation: All overview levels maintain the same coordinate reference system as the native data
Consistent Variable Structure: Each overview level contains the same set of variables as the native resolution level

GeoZarr defines a layout for downsampled Datasets (and the original dataset). Given some source Dataset s0, that dataset and all downsampled Datasets s1, s2, ... are stored in a flat layout inside a Multiscale Dataset D. The presence of downsampled Datsets in D is signalled by a special key in the attributes of D.

Attributes¶

The attributes of a Multiscale Dataset function as an entry point to a collection of downsampled Datasets. Accordingly, the attributes of a Multiscale Dataset declare the names of the downsampled datasets it contains, as well as spatial metadata for those datasets.

key	type	required	notes
`"multiscales"`	`MultiscaleMetadata`	yes	this field defines the layout of the multiscale Datasets inside this Dataset

MultiscaleMetadata¶

MultiscaleMetadata is a JSON object that declares the names of the downsampled Datasets inside a Multiscale Dataset, as well as the downsampling method used. This object has the following structure:

key	type	required	notes
`"resampling_method"`	ResamplingMethod	yes	This is a string that declares the resampling method used to create the downsampled datasets.
`"tile_matrix_set"`	TileMatrixSet or string	yes	This object declares the names of the downsampled Datasets. If `"tile_matrix_set"` is a string, it must be the name of a well-known `TileMatrixSet`, which must resolve to a JSON object consistent with the `[TileMatrixSet](#tilematrixset)` definition. For scientific coordinate systems, custom inline TileMatrixSet objects are supported.
`"tile_matrix_limits"`	{`string`: TileMatrixLimit}	no	Optional limits for each tile matrix level

Members¶

All of the members declared in the multiscales attribute must comply with the Dataset definition. All of these Datasets must have the exact same set of member names. The names of the downsampled Datasets are specified by the "id" field of each TileMatrix object in the "tileMatrices" field in the "TileMatrixSet" object in the tile_matrix_set field in the MultiscaleMetadata object in the "multiscales" field in the attributes of the Multiscale Dataset. Or, more compactly, using a path-like JSON query:

attributes.multiscales.tile_matrix_set.tileMatrices[$idx].id

Chunking Requirements for Downsampled Datasets¶

When creating downsampled datasets in a multiscale hierarchy, careful consideration must be given to chunk sizes to ensure optimal performance and storage efficiency. The chunk dimensions should be aligned with the tile dimensions specified in the corresponding TileMatrix definition to enable efficient tile-based access patterns.

Key chunking considerations:

Chunk-Tile Alignment: Chunk sizes should match or be divisible by the tileWidth and tileHeight values defined in the TileMatrix for each zoom level
Consistent Chunking Strategy: All data variables within a zoom level should use the same chunking scheme to maintain spatial coherence
Memory Constraints: Chunk sizes should be chosen to balance I/O efficiency with memory usage, typically keeping individual chunks under 100MB
Decimation Factor Alignment: When downsampling by integer factors (e.g., 2x, 3x), chunk boundaries should align across zoom levels to enable efficient pyramid generation

For example, if a TileMatrix specifies tileWidth: 1024 and tileHeight: 1024, the corresponding data arrays should use chunk shapes of [1024, 1024] or compatible subdivisions like [512, 512].

Extra members¶

A multiscale Dataset should not contain any members that are not explicitly declared in the "multiscales" field for that multiscale Dataset. Any additional Zarr arrays and groups should be considered external to the GeoZarr model.

Custom Coordinate Reference Systems¶

GeoZarr explicitly supports custom TileMatrixSet definitions for arbitrary coordinate reference systems, encouraging preservation of native CRS in Earth observation data. This is particularly useful for scientific projections including UTM zones, polar stereographic, sinusoidal, and other non-web coordinate systems.

For a dataset to be GeoZarr compliant, data variables MUST include a grid_mapping attribute that references a coordinate reference system variable. This grid_mapping variable defines the spatial referencing information and MUST be consistent with the CRS specified in the TileMatrixSet.

Custom TileMatrixSet Example¶

For custom coordinate systems, the tile_matrix_set should be defined as an inline JSON object following the OGC TileMatrixSet v2.0 specification:

{
  "multiscales": {
    "tile_matrix_set": {
      "id": "UTM_Zone_33N_Custom",
      "title": "UTM Zone 33N for Sentinel-2 native resolution",
      "crs": "EPSG:32633", 
      "orderedAxes": ["E", "N"],
      "tileMatrices": [
        {
          "id": "0",
          "scaleDenominator": 35.28,
          "cellSize": 10.0,
          "pointOfOrigin": [299960.0, 9000000.0],
          "tileWidth": 1024,
          "tileHeight": 1024,
          "matrixWidth": 1094,
          "matrixHeight": 1094
        },
        {
          "id": "1", 
          "scaleDenominator": 70.56,
          "cellSize": 20.0,
          "pointOfOrigin": [299960.0, 9000000.0],
          "tileWidth": 512,
          "tileHeight": 512,
          "matrixWidth": 547,
          "matrixHeight": 547
        }
      ]
    },
    "resampling_method": "average"
  }
}

Custom Decimation Factors¶

While standard web mapping assumes quadtree decimation (scaling by factor of 2), custom TileMatrixSets may use alternative decimation factors:

Factor of 2 (quadtree): Standard web mapping approach where each zoom level has 4x more tiles
Factor of 3 (nonary tree): Each zoom level has 9x more tiles, useful for certain scientific gridding schemes
Other integer factors: Application-specific requirements may dictate alternative decimation

Example with factor-of-3 decimation:

{
  "id": "Custom_Nonary_Grid",
  "crs": "EPSG:4326",
  "tileMatrices": [
    {
      "id": "0",
      "matrixWidth": 1,
      "matrixHeight": 1,
      "tileWidth": 256,
      "tileHeight": 256
    },
    {
      "id": "1", 
      "matrixWidth": 3,
      "matrixHeight": 3,
      "tileWidth": 256,
      "tileHeight": 256
    },
    {
      "id": "2",
      "matrixWidth": 9,
      "matrixHeight": 9,
      "tileWidth": 256,
      "tileHeight": 256
    }
  ]
}

Custom CRS Multiscale Dataset Layout Example¶

Here's a complete example of a multiscale dataset using a custom UTM coordinate reference system:

{
  "zarr.json": {
    "node_type": "group",
    "zarr_format": 3,
    "attributes": {
      "multiscales": {
        "tile_matrix_set": {
          "id": "UTM_Zone_33N_Sentinel2",
          "title": "UTM Zone 33N for Sentinel-2 L2A",
          "crs": "EPSG:32633",
          "orderedAxes": ["E", "N"],
          "tileMatrices": [
            {
              "id": "0",
              "scaleDenominator": 35.28,
              "cellSize": 10.0,
              "pointOfOrigin": [299960.0, 9000000.0],
              "tileWidth": 1024,
              "tileHeight": 1024,
              "matrixWidth": 1094,
              "matrixHeight": 1094
            },
            {
              "id": "1",
              "scaleDenominator": 70.56,
              "cellSize": 20.0,
              "pointOfOrigin": [299960.0, 9000000.0],
              "tileWidth": 512,
              "tileHeight": 512,
              "matrixWidth": 547,
              "matrixHeight": 547
            }
          ]
        },
        "resampling_method": "average"
      }
    }
  },
  "0/zarr.json": {
    "node_type": "group",
    "zarr_format": 3
  },
  "0/red/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [1094, 1094],
    "data_type": "uint16",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1024, 1024]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["y", "x"],
    "attributes": {
      "grid_mapping": "spatial_ref"
    }
  },
  "0/nir/zarr.json": {
    "zarr_format": 3,
    "node_type": "array", 
    "shape": [1094, 1094],
    "data_type": "uint16",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1024, 1024]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["y", "x"],
    "attributes": {
      "grid_mapping": "spatial_ref"
    }
  },
  "0/spatial_ref/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [],
    "data_type": "int32",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": []}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": [],
    "attributes": {
      "crs_wkt": "PROJCS[\"WGS 84 / UTM zone 32N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32632\"]]",
    "semi_major_axis": 6378137.0,
    "semi_minor_axis": 6356752.314245179,
    "inverse_flattening": 298.257223563,
    "reference_ellipsoid_name": "WGS 84",
    "longitude_of_prime_meridian": 0.0,
    "prime_meridian_name": "Greenwich",
    "geographic_crs_name": "WGS 84",
    "horizontal_datum_name": "World Geodetic System 1984",
    "projected_crs_name": "WGS 84 / UTM zone 32N",
    "grid_mapping_name": "transverse_mercator",
    "latitude_of_projection_origin": 0.0,
    "longitude_of_central_meridian": 9.0,
    "false_easting": 500000.0,
    "false_northing": 0.0,
    "scale_factor_at_central_meridian": 0.9996,
    "spatial_ref": "PROJCS[\"WGS 84 / UTM zone 32N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32632\"]]",
    "_ARRAY_DIMENSIONS": [],
    "GeoTransform": "300000.0 10.0 0.0 5000040.0 0.0 -10.0"
    }
  },
  "0/x/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [1094],
    "data_type": "float64",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1094]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["x"]
  },
  "0/y/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [1094],
    "data_type": "float64", 
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1094]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["y"]
  },
  "1/zarr.json": {
    "node_type": "group",
    "zarr_format": 3
  },
  "1/red/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [547, 547],
    "data_type": "uint16",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [512, 512]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["y", "x"],
    "attributes": {
      "grid_mapping": "spatial_ref"
    }
  },
  "1/nir/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [547, 547], 
    "data_type": "uint16",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [512, 512]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["y", "x"],
    "attributes": {
      "grid_mapping": "spatial_ref"
    }
  },
  "1/spatial_ref/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [],
    "data_type": "int32",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": []}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": [],
    "attributes": {
      "crs_wkt": "PROJCS[\"WGS 84 / UTM zone 32N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32632\"]]",
    "semi_major_axis": 6378137.0,
    "semi_minor_axis": 6356752.314245179,
    "inverse_flattening": 298.257223563,
    "reference_ellipsoid_name": "WGS 84",
    "longitude_of_prime_meridian": 0.0,
    "prime_meridian_name": "Greenwich",
    "geographic_crs_name": "WGS 84",
    "horizontal_datum_name": "World Geodetic System 1984",
    "projected_crs_name": "WGS 84 / UTM zone 32N",
    "grid_mapping_name": "transverse_mercator",
    "latitude_of_projection_origin": 0.0,
    "longitude_of_central_meridian": 9.0,
    "false_easting": 500000.0,
    "false_northing": 0.0,
    "scale_factor_at_central_meridian": 0.9996,
    "spatial_ref": "PROJCS[\"WGS 84 / UTM zone 32N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32632\"]]",
    "_ARRAY_DIMENSIONS": [],
    "GeoTransform": "300000.0 10.0 0.0 5000040.0 0.0 -10.0"
    }
  },
  "1/x/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [547],
    "data_type": "float64",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [547]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["x"]
  },
  "1/y/zarr.json": {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [547],
    "data_type": "float64",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [547]}},
    "codecs": [{"name": "bytes"}],
    "dimension_names": ["y"]
  }
}

This example demonstrates: - Custom CRS: Uses EPSG:32633 (UTM Zone 33N) instead of web mapping CRS - Scientific Resolution: Native 10m pixel size typical for Sentinel-2 L2A data - Custom Tile Sizes: 1024x1024 for native, 512x512 for overview to match scientific data characteristics - Consistent Structure: Both zoom levels (0 and 1) contain the same variables (red, nir, x, y) - Coordinate Variables: UTM coordinates in meters stored as x and y arrays - Chunk Alignment: Chunk sizes match the tileWidth and tileHeight from the TileMatrix definition

File System Hierarchy Example¶

The same custom CRS multiscale dataset would appear as the following directory structure on disk:

sentinel2_utm33n.zarr/
├── zarr.json                    # Root group with multiscales metadata
├── 0/                          # Native resolution (10m) zoom level
│   ├── zarr.json               # Group metadata for zoom level 0
│   ├── red/                    # Red band data variable
│   │   ├── zarr.json           # Array metadata
│   │   └── c/                  # Chunk directory
│   │       ├── 0/0             # Chunk files (1024x1024 chunks)
│   │       ├── 0/1
│   │       └── ...
│   ├── nir/                    # Near-infrared band data variable
│   │   ├── zarr.json           # Array metadata
│   │   └── c/                  # Chunk directory
│   │       ├── 0/0             # Chunk files (1024x1024 chunks)
│   │       ├── 0/1
│   │       └── ...
│   ├── spatial_ref/            # Spatial reference system variable
│   │   ├── zarr.json           # Array metadata with CRS information
│   │   └── c/                  # Chunk directory
│   │       └── 0               # Single chunk (scalar)
│   ├── x/                      # X coordinate variable (UTM Easting)
│   │   ├── zarr.json           # Array metadata
│   │   └── c/                  # Chunk directory
│   │       └── 0               # Single chunk (1094 elements)
│   └── y/                      # Y coordinate variable (UTM Northing)
│       ├── zarr.json           # Array metadata
│       └── c/                  # Chunk directory
│           └── 0               # Single chunk (1094 elements)
└── 1/                          # Overview level (20m) zoom level
    ├── zarr.json               # Group metadata for zoom level 1
    ├── red/                    # Red band data variable
    │   ├── zarr.json           # Array metadata
    │   └── c/                  # Chunk directory
    │       ├── 0/0             # Chunk files (512x512 chunks)
    │       ├── 0/1
    │       └── ...
    ├── nir/                    # Near-infrared band data variable
    │   ├── zarr.json           # Array metadata
    │   └── c/                  # Chunk directory
    │       ├── 0/0             # Chunk files (512x512 chunks)
    │       ├── 0/1
    │       └── ...
    ├── spatial_ref/            # Spatial reference system variable
    │   ├── zarr.json           # Array metadata with CRS information
    │   └── c/                  # Chunk directory
    │       └── 0               # Single chunk (scalar)
    ├── x/                      # X coordinate variable (UTM Easting)
    │   ├── zarr.json           # Array metadata
    │   └── c/                  # Chunk directory
    │       └── 0               # Single chunk (547 elements)
    └── y/                      # Y coordinate variable (UTM Northing)
        ├── zarr.json           # Array metadata
        └── c/                  # Chunk directory
            └── 0               # Single chunk (547 elements)

Key aspects of this file system layout: - Root metadata: The zarr.json at the root contains the multiscales attribute defining the custom UTM TileMatrixSet - Zoom level groups: Directories 0/ and 1/ correspond exactly to the TileMatrix id values - Consistent variables: Each zoom level contains the same set of variables (red, nir, x, y) - Chunk organization: Data is stored in chunks that align with the tile dimensions specified in the TileMatrixSet - Coordinate preservation: UTM coordinates are maintained at each resolution level

Appendix¶

Definitions¶

TileMatrixLimit¶

key	type	required
`"tileMatrix"`	string	yes
`"minTileCol"`	int	yes
`"minTileRow"`	int	yes
`"maxTileCol"`	int	yes
`"maxTileRow"`	int	yes

TileMatrix¶

key	type	required
`"id"`	string	yes
`"scaleDenominator"`	float	yes
`"cellSize"`	float	yes
`"pointOfOrigin"`	[float, float]	yes
`"tileWidth"`	int	yes
`"tileHeight"`	int	yes
`"matrixWidth"`	int	yes
`"matrixHeight"`	int	yes

TileMatrixSet¶

key	type	required	notes
`"id"`	string	yes
`"title"`	string	no
`"crs"`	string	no
`"supportedCRS"`	string	no
`"orderedAxes"`	[str, str]	no
`"tileMatrices"`	[TileMatrix, ...]	yes	May not be empty

ResamplingMethod¶

This is a string literal defined here.

The implementation defaults to "average" for creating overview levels in multiscale datasets.