EOPF GeoZarr Documentation¶
Welcome to the EOPF GeoZarr library documentation. This library provides tools to convert EOPF (Earth Observation Processing Framework) datasets to GeoZarr-spec 0.4 compliant format while maintaining scientific accuracy and optimizing for cloud-native workflows.
Quick Navigation¶
Getting Started¶
- Installation - Install the library and set up your environment
- Quick Start - Convert your first dataset in minutes
- User Guide - Comprehensive usage guide with advanced options
Reference¶
- API Reference - Complete Python API documentation
- Examples - Practical examples for common use cases
- Architecture - Technical architecture and design principles
- GeoZarr Mini Spec - Implementation-specific GeoZarr specification
Support¶
- FAQ - Frequently asked questions and troubleshooting
- GeoZarr Specification - Our contributions to the GeoZarr spec
What is EOPF GeoZarr?¶
The EOPF GeoZarr library bridges the gap between EOPF datasets and the emerging GeoZarr specification, enabling:
✅ Scientific Accuracy - Preserves native CRS and data integrity
✅ Cloud-Native - Optimized for S3 and distributed processing
✅ Performance - Intelligent chunking and multiscale pyramids
✅ Standards Compliant - Full GeoZarr 0.4 and CF conventions support
✅ Production Ready - Robust error handling and validation
Key Features¶
🌍 Native CRS Preservation¶
Maintains original coordinate reference systems (UTM, polar stereographic, etc.) without unnecessary reprojection to Web Mercator, preserving scientific accuracy for Earth observation data.
📊 Multiscale Pyramids¶
Automatically generates overview levels with /2 downsampling, creating efficient multiscale pyramids for visualization and analysis at different resolutions.
🔧 Intelligent Chunking¶
Implements aligned chunking strategy that prevents partial chunks, optimizes storage efficiency, and improves I/O performance for both local and cloud storage.
☁️ Cloud-Native Design¶
Full support for AWS S3 and S3-compatible storage with automatic credential detection, retry logic, and optimized multipart uploads.
📋 Standards Compliance¶
- GeoZarr 0.4 specification compliance
- CF conventions for scientific metadata
_ARRAY_DIMENSIONS
attributes on all arrays- Grid mapping variables with proper CRS information
- Multiscales metadata structure
🚀 Performance Optimized¶
- Dask integration for distributed processing
- Lazy loading for memory efficiency
- Band-by-band processing with validation
- Retry logic for robust network operations
Supported Data¶
Satellite Missions¶
- Sentinel-2 L1C and L2A products (fully supported)
- Sentinel-1 (planned support)
- Extensible architecture for additional missions
Data Formats¶
- Input: EOPF DataTree (Zarr format)
- Output: GeoZarr-compliant Zarr with multiscale structure
Storage Backends¶
- Local filesystems
- AWS S3
- S3-compatible storage (MinIO, DigitalOcean Spaces, etc.)
Quick Example¶
import xarray as xr
from eopf_geozarr import create_geozarr_dataset
# Load EOPF dataset
dt = xr.open_datatree("sentinel2_l2a.zarr", engine="zarr")
# Convert to GeoZarr
dt_geozarr = create_geozarr_dataset(
dt_input=dt,
groups=["/measurements/r10m", "/measurements/r20m", "/measurements/r60m"],
output_path="s3://my-bucket/geozarr.zarr",
spatial_chunk=4096
)
print("Conversion complete!")
Or using the command line:
Architecture Overview¶
The library is organized into focused modules:
conversion/
- Core conversion engine and algorithmsgeozarr.py
- Main conversion functionsfs_utils.py
- Storage backend abstractionutils.py
- Processing utilities and chunkingcli.py
- Command-line interfacedata_api/
- Future data access API (pydantic-zarr integration)
Implementation Highlights¶
Based on our experience and contributions to the GeoZarr specification (see ADR-101), this library implements:
Native CRS Tile Matrix Sets¶
Creates custom tile matrix sets for arbitrary coordinate reference systems, not just Web Mercator, enabling scientific applications that require native projections.
Aligned Chunking Strategy¶
Implements intelligent chunk size calculation that prevents partial chunks and optimizes for both storage efficiency and processing performance.
Hierarchical Data Organization¶
Uses a sibling-based structure (/0
, /1
, /2
) for resolution levels that complies with xarray DataTree requirements while maintaining GeoZarr specification compliance.
Robust Cloud Integration¶
Production-ready S3 integration with credential validation, error handling, and performance optimization for large-scale Earth observation workflows.
Getting Started¶
- Install the library
- Quick Start with your first conversion
- Explore Examples for your specific use case
- Read the User Guide for advanced usage
Community and Support¶
- Documentation: Comprehensive guides and API reference
- GitHub: eopf-explorer/data-model
- Issues: Report bugs and request features
- Contributions: Help improve the library and specification
The EOPF GeoZarr library is actively developed and maintained by Development Seed, with ongoing contributions to the GeoZarr specification to better serve the Earth observation community.