Quick Start¶

Get up and running with EOPF GeoZarr in minutes. This guide shows you how to convert your first EOPF dataset to GeoZarr format.

Prerequisites¶

EOPF GeoZarr library installed (Installation Guide)
An EOPF dataset in Zarr format
Basic familiarity with Python and command-line tools

Your First Conversion¶

Command Line (Simplest)¶

Convert an EOPF dataset to GeoZarr format:

eopf-geozarr convert input.zarr output.zarr

That's it! The converter will:

Analyze your EOPF dataset structure
Apply GeoZarr 0.4 specification compliance
Create multiscale overviews
Preserve native CRS and scientific accuracy

Python API (More Control)¶

For programmatic usage with custom parameters:

# test: skip
import xarray as xr
from eopf_geozarr import create_geozarr_dataset

# Load your EOPF DataTree
dt = xr.open_datatree("input.zarr", engine="zarr")

# Convert to GeoZarr
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m", "/measurements/r20m", "/measurements/r60m"],
    output_path="output.zarr",
    spatial_chunk=4096,
    min_dimension=256
)

print("Conversion complete!")

Working with Cloud Storage¶

S3 Output¶

Save directly to AWS S3:

# Set credentials
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-east-1

# Convert to S3
eopf-geozarr convert input.zarr s3://my-bucket/output.zarr

S3 Input and Output¶

# Both input and output on S3
dt_geozarr = create_geozarr_dataset(
    dt_input=xr.open_datatree("s3://input-bucket/data.zarr", engine="zarr"),
    groups=["/measurements/r10m"],
    output_path="s3://output-bucket/geozarr.zarr"
)

Validation¶

Verify your GeoZarr dataset meets the specification:

eopf-geozarr validate output.zarr

Or in Python:

from eopf_geozarr.cli import validate_command
import argparse

# Create args object
args = argparse.Namespace()
args.input_path = "output.zarr"
args.verbose = True

validate_command(args)

Inspecting Results¶

Dataset Information¶

Get detailed information about your converted dataset:

eopf-geozarr info output.zarr

Python Inspection¶

import xarray as xr

# Open the converted dataset
dt = xr.open_datatree("output.zarr", engine="zarr")

# Explore the structure
print(dt)

# Check multiscales metadata
print(dt.attrs.get('multiscales', 'No multiscales found'))

# Examine resolution levels
# Note: Structure depends on converter version (see converter.md for V0 vs V1 differences)
# V0 (deprecated): /measurements/r10m/0, /measurements/r10m/1, etc.
# V1 (current): /measurements/reflectance/r10m, /measurements/reflectance/r20m, etc.

# Example for V0 structure:
if "/measurements/r10m/0" in dt.groups:
    ds_native = dt["/measurements/r10m/0"].ds
    print(f"Native shape: {ds_native.dims}")

# Example for V1 structure:
if "/measurements/reflectance/r10m" in dt.groups:
    ds_10m = dt["/measurements/reflectance/r10m"].ds
    ds_20m = dt["/measurements/reflectance/r20m"].ds
    print(f"10m resolution: {ds_10m.dims}")
    print(f"20m resolution: {ds_20m.dims}")

Common Patterns¶

Sentinel-2 Data¶

For Sentinel-2 L2A data, use the optimized V1 converter (recommended):

from eopf_geozarr.s2_optimization.s2_converter import convert_s2_optimized

# Recommended: Use V1 optimized converter for Sentinel-2
dt_optimized = convert_s2_optimized(
    dt_input=dt,
    output_path="s2_optimized.zarr",
    spatial_chunk=256,
    enable_sharding=True
)

The V1 converter automatically: - Reuses native resolutions (r10m, r20m, r60m) without duplication - Adds coarser levels (r120m, r360m, r720m) for efficient visualization - Applies variable-aware resampling for different data types

Note: For details on V0 vs V1 differences, see the converter documentation.

Large Datasets with Dask¶

For processing large datasets efficiently:

eopf-geozarr convert large_input.zarr output.zarr --dask-cluster

Or in Python:

from dask.distributed import Client

# Start Dask client
client = Client('scheduler-address:8786')  # Or Client() for local

# Process with Dask
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr",
    spatial_chunk=2048  # Smaller chunks for distributed processing
)

client.close()

Key Features Demonstrated¶

Your converted dataset now includes:

✅ GeoZarr 0.4 Compliance - Full specification adherence
✅ Native CRS Preservation - No unnecessary reprojection
✅ Multiscale Pyramids - Efficient overview levels
✅ Optimized Chunking - Aligned chunks for performance
✅ CF Conventions - Standard metadata attributes
✅ Cloud-Ready - S3 and other cloud storage support

Next Steps¶

Detailed Usage: See the User Guide for advanced options
API Reference: Explore the API Reference for all functions
Examples: Check out Examples for specific use cases
Architecture: Understand the Architecture behind the conversion

Troubleshooting Quick Fixes¶

Memory errors with large datasets?

eopf-geozarr convert input.zarr output.zarr --spatial-chunk 2048

S3 permission errors?

aws sts get-caller-identity  # Verify credentials

Validation failures?

eopf-geozarr validate output.zarr --verbose  # Get detailed error info

For more troubleshooting help, see the FAQ.