Skip to content

Examples

Practical examples demonstrating common use cases for the EOPF GeoZarr library.

Basic Examples

Simple Local Conversion

Convert a local EOPF dataset to GeoZarr format:

```python

test: skip

import xarray as xr from eopf_geozarr import create_geozarr_dataset

Load EOPF dataset

dt = xr.open_datatree("sentinel2_l2a.zarr", engine="zarr")

Convert to GeoZarr

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="sentinel2_geozarr.zarr" )

print("Conversion completed successfully!") ```

Command Line Conversion

```bash

Basic conversion

eopf-geozarr convert input.zarr output.zarr

With custom chunk size

eopf-geozarr convert input.zarr output.zarr --spatial-chunk 2048

Validate the result

eopf-geozarr validate output.zarr ```

Sentinel-2 Examples

Multi-Resolution Sentinel-2 Processing

Process all resolution groups from a Sentinel-2 L2A dataset:

```python import xarray as xr from eopf_geozarr import create_geozarr_dataset

Load Sentinel-2 L2A dataset

dt = xr.open_datatree("S2A_MSIL2A_20230615T103031_N0509_R108_T32TQM_20230615T170304.zarr", engine="zarr")

Convert all resolution groups

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=[ "/measurements/r10m", # B02, B03, B04, B08 "/measurements/r20m", # B05, B06, B07, B8A, B11, B12 "/measurements/r60m" # B01, B09, B10 ], output_path="s2_l2a_geozarr.zarr", spatial_chunk=4096, min_dimension=256 )

Inspect the result

print(f"Groups created: {list(dt_geozarr.groups)}") for group_name in dt_geozarr.groups: group = dt_geozarr[group_name] if hasattr(group, 'ds') and group.ds is not None: print(f"{group_name}: {dict(group.ds.dims)}") ```

Sentinel-2 Band Analysis

Note on V0 vs V1: This example shows both V0 (deprecated) and V1 (current) approaches. See converter documentation for structural differences.

Access bands from the consolidated pyramid structure:

```python import xarray as xr import matplotlib.pyplot as plt from eopf_geozarr.s2_optimization.s2_converter import convert_s2_optimized

Convert using V1 optimizer (recommended)

dt_input = xr.open_datatree("s2_l2a_input.zarr", engine="zarr") dt = convert_s2_optimized( dt_input=dt_input, output_path="s2_l2a_v1.zarr", spatial_chunk=256 )

Access data from different resolution levels

ds_10m = dt["/measurements/reflectance/r10m"].ds # Native 10m ds_20m = dt["/measurements/reflectance/r20m"].ds # Native 20m ds_60m = dt["/measurements/reflectance/r60m"].ds # Native 60m ds_120m = dt["/measurements/reflectance/r120m"].ds # Computed 120m

Extract RGB bands for visualization (10m resolution)

red = ds_10m["b04"] # Red band green = ds_10m["b03"] # Green band
blue = ds_10m["b02"] # Blue band

Create RGB composite

rgb = xr.concat([red, green, blue], dim="band")

Plot comparison of different resolutions

fig, axes = plt.subplots(1, 3, figsize=(18, 6))

10m resolution

rgb.plot.imshow(ax=axes[0], robust=True) axes[0].set_title("10m Resolution (Native)")

20m resolution (reused native data)

rgb_20m = xr.concat([ds_20m["b04"], ds_20m["b03"], ds_20m["b02"]], dim="band") rgb_20m.plot.imshow(ax=axes[1], robust=True) axes[1].set_title("20m Resolution (Native)")

60m resolution (reused native data)

rgb_60m = xr.concat([ds_60m["b04"], ds_60m["b03"], ds_60m["b02"]], dim="band") rgb_60m.plot.imshow(ax=axes[2], robust=True) axes[2].set_title("60m Resolution (Native)")

plt.tight_layout() plt.show() ```

V0 Approach (Deprecated - create_geozarr_dataset)

For reference, the V0 structure with nested pyramid levels:

```python import xarray as xr import matplotlib.pyplot as plt

Open V0 converted GeoZarr dataset (deprecated structure)

dt = xr.open_datatree("s2_l2a_v0.zarr", engine="zarr")

Access 10m resolution with nested pyramid levels

ds_10m_native = dt["/measurements/r10m/0"].ds # Level 0: native 10m ds_10m_level1 = dt["/measurements/r10m/1"].ds # Level 1: downsampled to ~20m ds_10m_level2 = dt["/measurements/r10m/2"].ds # Level 2: downsampled to ~40m

Note: This creates redundant data since r10m/1 ≈ r20m/0

```

Migration Note: V0 is deprecated. Use V1 (convert_s2_optimized) for new projects.

Cloud Storage Examples

AWS S3 Integration

Complete workflow with S3 input and output:

```python import os import xarray as xr from eopf_geozarr import create_geozarr_dataset from eopf_geozarr.conversion.fs_utils import validate_s3_access

Configure AWS credentials

os.environ['AWS_ACCESS_KEY_ID'] = 'your_access_key' os.environ['AWS_SECRET_ACCESS_KEY'] = 'your_secret_key' os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'

Define paths

input_path = "s3://sentinel-data/input.zarr" output_path = "s3://processed-data/output.zarr"

Validate S3 access

is_valid, error = validate_s3_access(output_path) if not is_valid: raise RuntimeError(f"S3 access validation failed: {error}")

Load from S3

dt = xr.open_datatree(input_path, engine="zarr")

Convert and save to S3

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m", "/measurements/r20m"], output_path=output_path, spatial_chunk=2048 )

print(f"Successfully converted and saved to {output_path}") ```

S3 with Custom Credentials

Using custom S3 credentials and endpoint:

```python from eopf_geozarr import create_geozarr_dataset from eopf_geozarr.conversion.fs_utils import get_s3_storage_options

Custom S3 configuration

s3_config = { 'key': 'custom_access_key', 'secret': 'custom_secret_key', 'endpoint_url': 'https://s3.custom-provider.com', 'region_name': 'eu-west-1' }

Get storage options

storage_opts = get_s3_storage_options("s3://custom-bucket/output.zarr", **s3_config)

Convert with custom S3 settings

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="s3://custom-bucket/output.zarr", **storage_opts ) ```

Performance Optimization Examples

Large Dataset Processing with Dask

Process large datasets efficiently using Dask:

```python import xarray as xr from dask.distributed import Client, LocalCluster from eopf_geozarr import create_geozarr_dataset

Set up Dask cluster

cluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit='4GB') client = Client(cluster)

try: # Load large dataset dt = xr.open_datatree("large_sentinel2.zarr", engine="zarr")

# Process with optimized chunking for Dask
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m", "/measurements/r20m", "/measurements/r60m"],
    output_path="large_geozarr.zarr",
    spatial_chunk=2048,  # Smaller chunks for distributed processing
    max_retries=5
)

print("Large dataset processing completed!")

finally: client.close() cluster.close() ```

Memory-Efficient Processing

Process datasets with limited memory:

```python from eopf_geozarr import create_geozarr_dataset from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size

Calculate memory-efficient chunk size

data_width, data_height = 10980, 10980 memory_limit_mb = 512 # 512 MB limit

Estimate chunk size for memory constraint

Assuming float32 data (4 bytes per pixel)

pixels_per_mb = (1024 * 1024) // 4 target_chunk = int((pixels_per_mb * memory_limit_mb) ** 0.5)

Align chunk size with data dimensions

optimal_chunk = calculate_aligned_chunk_size(data_width, target_chunk)

print(f"Using chunk size: {optimal_chunk}")

Process with memory-efficient settings

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="memory_efficient.zarr", spatial_chunk=optimal_chunk ) ```

Advanced Use Cases

Custom Metadata Enhancement

Add custom metadata to the converted dataset:

```python import xarray as xr from eopf_geozarr import create_geozarr_dataset

Convert dataset

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="enhanced.zarr" )

Add custom metadata

dt_geozarr.attrs.update({ 'processing_date': '2024-01-15', 'processing_software': 'eopf-geozarr v0.1.0', 'custom_parameter': 'value' })

Add group-specific metadata

for group_name in dt_geozarr.groups: group = dt_geozarr[group_name] if hasattr(group, 'ds') and group.ds is not None: group.ds.attrs['processing_level'] = 'L2A_GeoZarr'

Save enhanced metadata

dt_geozarr.to_zarr("enhanced.zarr", mode="a") ```

Validation and Quality Control

Comprehensive validation workflow:

```python import xarray as xr from eopf_geozarr import create_geozarr_dataset from eopf_geozarr.conversion.utils import validate_existing_band_data

Convert dataset

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="validated.zarr" )

Validate the conversion

dt_check = xr.open_datatree("validated.zarr", engine="zarr")

Check multiscales metadata

multiscales = dt_check.attrs.get('multiscales', []) print(f"Multiscales levels: {len(multiscales)}")

Validate each resolution level

for level in ["0", "1", "2"]: group_path = f"/measurements/r10m/{level}" if group_path in dt_check.groups: ds = dt_check[group_path].ds print(f"Level {level}: {dict(ds.dims)}")

    # Check required attributes
    for var_name in ds.data_vars:
        var = ds[var_name]
        has_dims = '_ARRAY_DIMENSIONS' in var.attrs
        has_grid_mapping = 'grid_mapping' in var.attrs
        print(f"  {var_name}: dims={has_dims}, grid_mapping={has_grid_mapping}")

Validate CRS information

for group_name in dt_check.groups: group = dt_check[group_name] if hasattr(group, 'ds') and group.ds is not None: crs_vars = [v for v in group.ds.data_vars if 'spatial_ref' in v or 'crs' in v] print(f"{group_name} CRS variables: {crs_vars}") ```

Batch Processing

Process multiple datasets in batch:

```python import os from pathlib import Path from eopf_geozarr import create_geozarr_dataset import xarray as xr

def batch_convert_datasets(input_dir: str, output_dir: str, groups: list): """Convert multiple EOPF datasets to GeoZarr format."""

input_path = Path(input_dir)
output_path = Path(output_dir)
output_path.mkdir(exist_ok=True)

# Find all .zarr directories
zarr_files = list(input_path.glob("*.zarr"))

for zarr_file in zarr_files:
    try:
        print(f"Processing {zarr_file.name}...")

        # Load dataset
        dt = xr.open_datatree(str(zarr_file), engine="zarr")

        # Convert to GeoZarr
        output_file = output_path / f"{zarr_file.stem}_geozarr.zarr"
        dt_geozarr = create_geozarr_dataset(
            dt_input=dt,
            groups=groups,
            output_path=str(output_file),
            spatial_chunk=4096
        )

        print(f"✓ Completed {zarr_file.name}")

    except Exception as e:
        print(f"✗ Failed {zarr_file.name}: {e}")

Usage

batch_convert_datasets( input_dir="/data/sentinel2/raw", output_dir="/data/sentinel2/geozarr", groups=["/measurements/r10m", "/measurements/r20m"] ) ```

Integration Examples

STAC Integration

Create STAC items for converted GeoZarr datasets:

```python import json from datetime import datetime import xarray as xr from eopf_geozarr import create_geozarr_dataset

Convert dataset

dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="stac_ready.zarr" )

Extract metadata for STAC

ds = dt_geozarr["/measurements/r10m/0"].ds spatial_ref = ds.get('spatial_ref', ds.get('crs', None))

Create basic STAC item

stac_item = { "stac_version": "1.0.0", "type": "Feature", "id": "sentinel2_geozarr_example", "properties": { "datetime": datetime.now().isoformat(), "platform": "sentinel-2", "instruments": ["msi"], "processing:level": "L2A", "processing:software": "eopf-geozarr" }, "geometry": { "type": "Polygon", "coordinates": [[ # Extract from dataset bounds [float(ds.x.min()), float(ds.y.min())], [float(ds.x.max()), float(ds.y.min())], [float(ds.x.max()), float(ds.y.max())], [float(ds.x.min()), float(ds.y.max())], [float(ds.x.min()), float(ds.y.min())] ]] }, "assets": { "geozarr": { "href": "stac_ready.zarr", "type": "application/vnd+zarr", "roles": ["data"], "title": "GeoZarr Dataset" } } }

Save STAC item

with open("stac_item.json", "w") as f: json.dump(stac_item, f, indent=2) ```

Jupyter Notebook Integration

Interactive exploration in Jupyter:

```python

Cell 1: Setup and conversion

import xarray as xr import matplotlib.pyplot as plt from eopf_geozarr import create_geozarr_dataset

dt = xr.open_datatree("input.zarr", engine="zarr") dt_geozarr = create_geozarr_dataset( dt_input=dt, groups=["/measurements/r10m"], output_path="notebook_example.zarr" )

Cell 2: Interactive visualization

%matplotlib widget import ipywidgets as widgets

def plot_band(band_name, level): ds = dt_geozarr[f"/measurements/r10m/{level}"].ds band_data = ds[band_name]

plt.figure(figsize=(10, 8))
band_data.plot(robust=True, cmap='viridis')
plt.title(f"{band_name} - Level {level}")
plt.show()

Create interactive widgets

band_widget = widgets.Dropdown( options=['b02', 'b03', 'b04', 'b08'], value='b04', description='Band:' )

level_widget = widgets.Dropdown( options=['0', '1', '2'], value='0', description='Level:' )

widgets.interact(plot_band, band_name=band_widget, level=level_widget) ```

These examples demonstrate the flexibility and power of the EOPF GeoZarr library across various use cases, from simple conversions to complex cloud-based workflows.