Skip to content

Quick Start

Get up and running with EOPF GeoZarr in minutes. This guide shows you how to convert your first EOPF dataset to GeoZarr format.

Prerequisites

  • EOPF GeoZarr library installed (Installation Guide)
  • An EOPF dataset in Zarr format
  • Basic familiarity with Python and command-line tools

Your First Conversion

Command Line (Simplest)

Convert an EOPF dataset to GeoZarr format:

bash eopf-geozarr convert input.zarr output.zarr

That's it! The converter will:

  • Analyze your EOPF dataset structure
  • Apply GeoZarr 0.4 specification compliance
  • Create multiscale overviews
  • Preserve native CRS and scientific accuracy

Python API (More Control)

For programmatic usage with custom parameters:

```python

test: skip

import xarray as xr from eopf_geozarr import create_geozarr_dataset

Load your EOPF DataTree

dt = xr.open_datatree("input.zarr", engine="zarr")

Convert to GeoZarr

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m", "/measurements/r20m", "/measurements/r60m"], output_path="output.zarr", spatial_chunk=4096, min_dimension=256 )

print("Conversion complete!") ```

Working with Cloud Storage

S3 Output

Save directly to AWS S3:

```bash

Set credentials

export AWS_ACCESS_KEY_ID=your_key export AWS_SECRET_ACCESS_KEY=your_secret export AWS_DEFAULT_REGION=us-east-1

Convert to S3

eopf-geozarr convert input.zarr s3://my-bucket/output.zarr ```

S3 Input and Output

```python

Both input and output on S3

dt_geozarr = create_geozarr_dataset( dt_input=xr.open_datatree("s3://input-bucket/data.zarr", engine="zarr"), groups=["/measurements/r10m"], output_path="s3://output-bucket/geozarr.zarr" ) ```

Validation

Verify your GeoZarr dataset meets the specification:

bash eopf-geozarr validate output.zarr

Or in Python:

```python from eopf_geozarr.cli import validate_command import argparse

Create args object

args = argparse.Namespace() args.input_path = "output.zarr" args.verbose = True

validate_command(args) ```

Inspecting Results

Dataset Information

Get detailed information about your converted dataset:

bash eopf-geozarr info output.zarr

Python Inspection

```python import xarray as xr

Open the converted dataset

dt = xr.open_datatree("output.zarr", engine="zarr")

Explore the structure

print(dt)

Check multiscales metadata

print(dt.attrs.get('multiscales', 'No multiscales found'))

Examine resolution levels

Note: Structure depends on converter version (see converter.md for V0 vs V1 differences)

V0 (deprecated): /measurements/r10m/0, /measurements/r10m/1, etc.

V1 (current): /measurements/reflectance/r10m, /measurements/reflectance/r20m, etc.

Example for V0 structure:

if "/measurements/r10m/0" in dt.groups: ds_native = dt["/measurements/r10m/0"].ds print(f"Native shape: {ds_native.dims}")

Example for V1 structure:

if "/measurements/reflectance/r10m" in dt.groups: ds_10m = dt["/measurements/reflectance/r10m"].ds ds_20m = dt["/measurements/reflectance/r20m"].ds print(f"10m resolution: {ds_10m.dims}") print(f"20m resolution: {ds_20m.dims}") ```

Common Patterns

Sentinel-2 Data

For Sentinel-2 L2A data, use the optimized V1 converter (recommended):

```python from eopf_geozarr.s2_optimization.s2_converter import convert_s2_optimized

Recommended: Use V1 optimized converter for Sentinel-2

dt_optimized = convert_s2_optimized( dt_input=dt, output_path="s2_optimized.zarr", spatial_chunk=256, enable_sharding=True ) ```

The V1 converter automatically: - Reuses native resolutions (r10m, r20m, r60m) without duplication - Adds coarser levels (r120m, r360m, r720m) for efficient visualization - Applies variable-aware resampling for different data types

Note: For details on V0 vs V1 differences, see the converter documentation.

Large Datasets with Dask

For processing large datasets efficiently:

bash eopf-geozarr convert large_input.zarr output.zarr --dask-cluster

Or in Python:

```python from dask.distributed import Client

Start Dask client

client = Client('scheduler-address:8786') # Or Client() for local

Process with Dask

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="output.zarr", spatial_chunk=2048 # Smaller chunks for distributed processing )

client.close() ```

Key Features Demonstrated

Your converted dataset now includes:

GeoZarr 0.4 Compliance - Full specification adherence
Native CRS Preservation - No unnecessary reprojection
Multiscale Pyramids - Efficient overview levels
Optimized Chunking - Aligned chunks for performance
CF Conventions - Standard metadata attributes
Cloud-Ready - S3 and other cloud storage support

Next Steps

  • Detailed Usage: See the User Guide for advanced options
  • API Reference: Explore the API Reference for all functions
  • Examples: Check out Examples for specific use cases
  • Architecture: Understand the Architecture behind the conversion

Troubleshooting Quick Fixes

Memory errors with large datasets?

bash eopf-geozarr convert input.zarr output.zarr --spatial-chunk 2048

S3 permission errors?

bash aws sts get-caller-identity # Verify credentials

Validation failures?

bash eopf-geozarr validate output.zarr --verbose # Get detailed error info

For more troubleshooting help, see the FAQ.