API Reference
Complete reference for the EOPF GeoZarr library's Python API.
Core Functions
create_geozarr_dataset
The main function for converting EOPF datasets to GeoZarr format.
def create_geozarr_dataset(
    dt_input: xr.DataTree,
    groups: List[str],
    output_path: str,
    spatial_chunk: int = 4096,
    min_dimension: int = 256,
    tile_width: int = 256,
    max_retries: int = 3,
    **storage_kwargs
) -> xr.DataTree
Parameters:
- dt_input (xr.DataTree): Input EOPF DataTree to convert
- groups (List[str]): List of group paths to process (e.g., ["/measurements/r10m"])
- output_path (str): Output path for the GeoZarr dataset (local path or S3 URL)
- spatial_chunk (int, optional): Target spatial chunk size. Default: 4096
- min_dimension (int, optional): Minimum dimension size for processing. Default: 256
- tile_width (int, optional): Tile width for multiscale levels. Default: 256
- max_retries (int, optional): Maximum number of retry attempts for operations. Default: 3
- **storage_kwargs: Additional storage options (S3 credentials, etc.)
Returns:
- xr.DataTree: The converted GeoZarr-compliant DataTree
Example:
import xarray as xr
from eopf_geozarr import create_geozarr_dataset
dt = xr.open_datatree("input.zarr", engine="zarr")
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m", "/measurements/r20m"],
    output_path="output.zarr",
    spatial_chunk=2048
)
Conversion Functions
setup_datatree_metadata_geozarr_spec_compliant
Sets up GeoZarr-compliant metadata for a DataTree.
def setup_datatree_metadata_geozarr_spec_compliant(
    dt: xr.DataTree,
    geozarr_groups: Dict[str, xr.Dataset]
) -> None
write_geozarr_group
Writes a single group to GeoZarr format with proper metadata.
def write_geozarr_group(
    group_path: str,
    datasets: Dict[str, xr.Dataset],
    output_path: str,
    spatial_chunk: int = 4096,
    max_retries: int = 3,
    **storage_kwargs
) -> None
create_geozarr_compliant_multiscales
Creates multiscales metadata compliant with the GeoZarr specification.
def create_geozarr_compliant_multiscales(
    datasets: Dict[str, xr.Dataset],
    tile_width: int = 256
) -> List[Dict[str, Any]]
Utility Functions
calculate_aligned_chunk_size
Calculates a chunk size close to the target that divides the data dimension evenly, so that no partial chunk is left at the array edge.
Parameters:
- dimension_size (int): Size of the data dimension
- target_chunk_size (int): Desired chunk size
Returns:
- int: Optimal aligned chunk size
Example:
from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size
# For a 10980x10980 image with target 4096 chunks
chunk_size = calculate_aligned_chunk_size(10980, 4096)
print(chunk_size)  # Prints 3660 (10980 / 3 = 3660)
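The result in the example is consistent with choosing the largest divisor of the dimension that does not exceed the target (no divisor of 10980 lies between 3660 and 4096). A minimal sketch of that assumed rule follows; aligned_chunk_size_sketch is a hypothetical name, and the library's real handling of edge cases (for example, a target larger than the dimension) may differ.

```python
def aligned_chunk_size_sketch(dimension_size: int, target_chunk_size: int) -> int:
    """Return the largest divisor of dimension_size that is <= target_chunk_size.

    Assumed rule (hypothetical): if the target already covers the whole
    dimension, a single chunk spanning the dimension is used.
    """
    if dimension_size <= 0 or target_chunk_size <= 0:
        raise ValueError("sizes must be positive")
    if target_chunk_size >= dimension_size:
        return dimension_size
    # Walk down from the target until an exact divisor is found; 1 always divides.
    for candidate in range(target_chunk_size, 0, -1):
        if dimension_size % candidate == 0:
            return candidate
    return 1  # unreachable, kept for type checkers


print(aligned_chunk_size_sketch(10980, 4096))  # 3660
print(aligned_chunk_size_sketch(5490, 4096))   # 2745
```

Because every chunk has the same size, writes never touch a partially filled edge chunk, which is the usual motivation for aligning chunks this way.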
downsample_2d_array
Downsamples a 2D array by a factor of 2 using mean aggregation.
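A common way to implement factor-of-2 mean downsampling in NumPy is to reshape the array into 2x2 blocks and average each block. The sketch below assumes that technique and drops an odd trailing row or column; downsample_2x_mean_sketch is a hypothetical helper, not the library's function, and its edge and nodata handling are assumptions.

```python
import numpy as np


def downsample_2x_mean_sketch(array: np.ndarray) -> np.ndarray:
    """Halve both axes of a 2D array by averaging non-overlapping 2x2 blocks.

    Hypothetical helper: an odd trailing row/column is dropped, which is one
    possible convention; the library may handle edges differently.
    """
    if array.ndim != 2:
        raise ValueError("expected a 2D array")
    rows = (array.shape[0] // 2) * 2
    cols = (array.shape[1] // 2) * 2
    trimmed = array[:rows, :cols]
    # Reshape to (rows/2, 2, cols/2, 2) so each 2x2 block occupies axes 1 and 3.
    blocks = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))


data = np.arange(16, dtype=float).reshape(4, 4)
print(downsample_2x_mean_sketch(data))  # block means: 2.5, 4.5 / 10.5, 12.5
```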
validate_existing_band_data
Checks whether an existing band in a dataset matches the expected shape and chunk sizes; returns True if it does.
def validate_existing_band_data(
    dataset: xr.Dataset,
    band_name: str,
    expected_shape: Tuple[int, ...],
    expected_chunks: Tuple[int, ...]
) -> bool
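A check matching this signature might compare the band's shape and its stored chunk sizes. The sketch below is hypothetical: it assumes the chunk sizes to compare are the ones xarray records in a variable's encoding["chunks"] when data is opened from Zarr, and band_matches_sketch is an illustrative name, not the library's implementation.

```python
from typing import Tuple

import numpy as np
import xarray as xr


def band_matches_sketch(
    dataset: xr.Dataset,
    band_name: str,
    expected_shape: Tuple[int, ...],
    expected_chunks: Tuple[int, ...],
) -> bool:
    """Hypothetical check mirroring validate_existing_band_data's signature."""
    if band_name not in dataset.data_vars:
        return False
    band = dataset[band_name]
    if tuple(band.shape) != tuple(expected_shape):
        return False
    # Assumption: on-disk chunk sizes live in encoding["chunks"] (set by xarray's Zarr backend).
    chunks = band.encoding.get("chunks")
    return chunks is not None and tuple(chunks) == tuple(expected_chunks)


ds = xr.Dataset({"b02": (("y", "x"), np.zeros((4, 4)))})
ds["b02"].encoding["chunks"] = (2, 2)  # simulate what a Zarr read would record
print(band_matches_sketch(ds, "b02", (4, 4), (2, 2)))  # True
print(band_matches_sketch(ds, "b02", (4, 4), (4, 4)))  # False
```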
File System Functions
Storage Path Utilities
# Path normalization and validation
def normalize_path(path: str) -> str
def is_s3_path(path: str) -> bool
def parse_s3_path(s3_path: str) -> tuple[str, str]
# Storage options
def get_storage_options(path: str, **kwargs: Any) -> Optional[Dict[str, Any]]
def get_s3_storage_options(s3_path: str, **s3_kwargs: Any) -> Dict[str, Any]
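The tuple[str, str] return type of parse_s3_path suggests a (bucket, key) pair. The sketch below shows one plausible way to implement is_s3_path and parse_s3_path with the standard library; the _sketch names are hypothetical, and the real functions' normalisation rules (for example, for trailing slashes) are an assumption.

```python
from urllib.parse import urlparse


def is_s3_path_sketch(path: str) -> bool:
    """True for URLs that use the s3:// scheme."""
    return path.startswith("s3://")


def parse_s3_path_sketch(s3_path: str) -> tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not is_s3_path_sketch(s3_path):
        raise ValueError(f"not an S3 path: {s3_path!r}")
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    if not bucket:
        raise ValueError(f"missing bucket in {s3_path!r}")
    # urlparse keeps the leading "/" of the path; S3 keys do not have one.
    key = parsed.path.lstrip("/")
    return bucket, key


print(parse_s3_path_sketch("s3://my-bucket/data.zarr/measurements"))
# ('my-bucket', 'data.zarr/measurements')
```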
S3 Operations
# S3 store creation and validation
def create_s3_store(s3_path: str, **s3_kwargs: Any) -> str
def validate_s3_access(s3_path: str, **s3_kwargs: Any) -> tuple[bool, Optional[str]]
def s3_path_exists(s3_path: str, **s3_kwargs: Any) -> bool
# S3 metadata operations
def write_s3_json_metadata(
    s3_path: str,
    metadata: Dict[str, Any],
    **s3_kwargs: Any
) -> None
def read_s3_json_metadata(s3_path: str, **s3_kwargs: Any) -> Dict[str, Any]
Zarr Operations
# Zarr group operations
def open_zarr_group(path: str, mode: str = "r", **kwargs: Any) -> zarr.Group
def open_s3_zarr_group(s3_path: str, mode: str = "r", **s3_kwargs: Any) -> zarr.Group
# Metadata consolidation
def consolidate_metadata(output_path: str, **storage_kwargs) -> None
async def async_consolidate_metadata(output_path: str, **storage_kwargs) -> None
Metadata Functions
Coordinate Metadata
Adds coordinate metadata, including:
- _ARRAY_DIMENSIONS attributes
- CF standard names
- Coordinate variable attributes
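As an illustration of these attributes, the sketch below annotates projected x/y coordinates with CF standard names (projection_x_coordinate, projection_y_coordinate), a CF axis attribute and an explicit _ARRAY_DIMENSIONS entry. The helper name, units="m" (a projected CRS in metres, such as UTM) and the exact attribute set are assumptions, not the library's actual output.

```python
import xarray as xr


def add_xy_coordinate_metadata_sketch(ds: xr.Dataset) -> xr.Dataset:
    """Attach CF-style attributes to projected x/y coordinates (hypothetical helper)."""
    ds = ds.copy()  # shallow copy, so the caller's dataset keeps its own attrs
    for dim, axis in (("x", "X"), ("y", "Y")):
        if dim not in ds.coords:
            continue
        ds[dim].attrs.update(
            {
                "_ARRAY_DIMENSIONS": [dim],
                "standard_name": f"projection_{dim}_coordinate",
                "units": "m",  # assumption: projected CRS in metres
                "axis": axis,
            }
        )
    return ds


ds = xr.Dataset(coords={"x": [300005.0, 300015.0], "y": [5000035.0, 5000025.0]})
annotated = add_xy_coordinate_metadata_sketch(ds)
print(annotated["x"].attrs["standard_name"])  # projection_x_coordinate
```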
Grid Mapping
def _setup_grid_mapping(ds: xr.Dataset, grid_mapping_var_name: str) -> None
def _add_geotransform(ds: xr.Dataset, grid_mapping_var: str) -> None
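_add_geotransform presumably stores a GDAL-style GeoTransform, the six-number affine description (top-left x, pixel width, row rotation, top-left y, column rotation, pixel height) that GDAL and rioxarray read from a grid mapping variable; pixel height is negative for north-up rasters. The sketch below derives that string from regularly spaced pixel-centre coordinates. The helper name is hypothetical, and it assumes at least two coordinates per axis, regular spacing and no rotation.

```python
import numpy as np


def geotransform_from_coords_sketch(x: np.ndarray, y: np.ndarray) -> str:
    """Build a GDAL GeoTransform string from 1D pixel-centre coordinates."""
    dx = float(x[1] - x[0])
    dy = float(y[1] - y[0])  # negative when y decreases down the rows (north-up)
    # Shift from the centre of the first pixel to its outer (top-left) corner.
    origin_x = float(x[0]) - dx / 2
    origin_y = float(y[0]) - dy / 2
    # Rotation terms (positions 2 and 4) are zero for an axis-aligned grid.
    return " ".join(str(v) for v in (origin_x, dx, 0.0, origin_y, 0.0, dy))


x = np.array([300005.0, 300015.0, 300025.0])
y = np.array([5000035.0, 5000025.0])
print(geotransform_from_coords_sketch(x, y))
# 300000.0 10.0 0.0 5000040.0 0.0 -10.0
```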
CRS and Tile Matrix
def create_native_crs_tile_matrix_set(
    crs: Any,
    transform: Any,
    width: int,
    height: int,
    tile_width: int = 256
) -> Dict[str, Any]
Creates a tile matrix set in the data's native CRS (rather than Web Mercator).
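As a rough illustration of what a native-CRS tile matrix set contains, the sketch below builds a dictionary using OGC Two Dimensional Tile Matrix Set field names, with one tile matrix per power-of-two level. It is hypothetical: it takes the origin and pixel size as plain numbers instead of the crs/transform objects the real function accepts, assumes a north-up grid in metres (the scale denominator uses the OGC standardised 0.28 mm rendering pixel), and uses the overview level number (0 = native resolution) as the tile matrix id.

```python
import math
from typing import Any, Dict, List


def native_tile_matrix_set_sketch(
    crs_uri: str,
    origin_x: float,
    origin_y: float,
    pixel_size: float,
    width: int,
    height: int,
    num_levels: int,
    tile_width: int = 256,
) -> Dict[str, Any]:
    """Build a TMS-like dict for a north-up grid in metres (hypothetical helper)."""
    matrices: List[Dict[str, Any]] = []
    # Emit levels coarsest first (largest cell size first).
    for level in reversed(range(num_levels)):
        factor = 2**level
        cell_size = pixel_size * factor
        level_width = math.ceil(width / factor)
        level_height = math.ceil(height / factor)
        matrices.append(
            {
                "id": str(level),
                "scaleDenominator": cell_size / 0.00028,  # metres / 0.28 mm pixel
                "cellSize": cell_size,
                "cornerOfOrigin": "topLeft",
                "pointOfOrigin": [origin_x, origin_y],
                "tileWidth": tile_width,
                "tileHeight": tile_width,
                "matrixWidth": math.ceil(level_width / tile_width),
                "matrixHeight": math.ceil(level_height / tile_width),
            }
        )
    return {"crs": crs_uri, "tileMatrices": matrices}


tms = native_tile_matrix_set_sketch(
    "http://www.opengis.net/def/crs/EPSG/0/32632",
    origin_x=300000.0, origin_y=5000040.0, pixel_size=10.0,
    width=10980, height=10980, num_levels=3,
)
for m in tms["tileMatrices"]:
    print(m["id"], m["cellSize"], m["matrixWidth"], m["matrixHeight"])
```

For a 10980 x 10980 grid at 10 m this gives 11, 22 and 43 tiles per row at levels 2, 1 and 0.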
Overview Generation
calculate_overview_levels
Calculates appropriate overview levels based on data dimensions.
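One plausible rule for choosing overview levels, consistent with the min_dimension parameter of create_geozarr_dataset (default 256), is to keep halving the grid while its shorter side stays at or above min_dimension. The sketch below implements that assumed rule; the helper name is hypothetical, and the real function may use a different stopping criterion or return type.

```python
import math
from typing import Dict, List


def overview_levels_sketch(
    width: int, height: int, min_dimension: int = 256
) -> List[Dict[str, int]]:
    """Return successive 2x reductions, stopping before the short side drops below min_dimension.

    Level 0 (native resolution) is always included.
    """
    if width < 1 or height < 1 or min_dimension < 2:
        raise ValueError("width/height must be >= 1 and min_dimension >= 2")
    levels: List[Dict[str, int]] = []
    level, factor = 0, 1
    while True:
        level_width = math.ceil(width / factor)
        level_height = math.ceil(height / factor)
        if level > 0 and min(level_width, level_height) < min_dimension:
            break
        levels.append(
            {"level": level, "factor": factor, "width": level_width, "height": level_height}
        )
        level += 1
        factor *= 2
    return levels


for entry in overview_levels_sketch(10980, 10980):
    print(entry)
```

For a 10980 x 10980 grid this yields six levels, from 10980 down to 344 pixels per side.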
create_overview_dataset_all_vars
Creates an overview dataset in which every variable is downsampled.
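For a Dataset whose variables share y/x dimensions, one way to downsample every variable at once by a factor of 2 is xarray's coarsen(...).mean(). The sketch below assumes dimensions named y and x and trims an odd trailing row or column; the helper name is hypothetical, and the library's own overview code may differ (for example, in how it treats categorical bands or nodata).

```python
import numpy as np
import xarray as xr


def overview_all_vars_sketch(ds: xr.Dataset, factor: int = 2) -> xr.Dataset:
    """Mean-aggregate every variable over factor x factor blocks of (y, x)."""
    # boundary="trim" drops trailing rows/columns that do not fill a whole block;
    # coord_func="mean" sets each new coordinate to the mean of its block's coordinates.
    return ds.coarsen(y=factor, x=factor, boundary="trim", coord_func="mean").mean()


ds = xr.Dataset(
    {
        "b02": (("y", "x"), np.arange(16, dtype=float).reshape(4, 4)),
        "b03": (("y", "x"), np.ones((4, 4))),
    },
    coords={"y": [35.0, 25.0, 15.0, 5.0], "x": [5.0, 15.0, 25.0, 35.0]},
)
overview = overview_all_vars_sketch(ds)
print(overview["b02"].values)  # block means: 2.5, 4.5 / 10.5, 12.5
print(overview["x"].values)    # block centres: 10.0, 30.0
```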
Error Handling
Retry Logic
def write_dataset_band_by_band_with_validation(
    ds: xr.Dataset,
    output_path: str,
    max_retries: int = 3,
    **storage_kwargs
) -> None
Writes a dataset one band at a time, validating each band and retrying failed writes up to max_retries times.
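The retry part of this pattern can be sketched generically: call a write function and, on a transient failure, wait with exponential backoff before trying again, re-raising once max_retries is exhausted. The helper below (write_with_retries_sketch) is hypothetical; which exceptions the library retries on, how it counts retries and how long it waits are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def write_with_retries_sketch(
    write: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call write(), retrying on `retry_on` errors with exponential backoff.

    Assumption: max_retries counts retries after the first attempt, so at most
    1 + max_retries calls are made before the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return write()
        except retry_on:
            attempt += 1
            if attempt > max_retries:
                raise  # out of retries: surface the last error unchanged
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))


calls = {"n": 0}


def flaky_write() -> str:
    """Stand-in for a band write that fails twice with a transient error."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient storage error")
    return "written"


print(write_with_retries_sketch(flaky_write, base_delay=0.0))  # written
```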
Constants and Enums
Coordinate Attributes
Returns standard attributes for X and Y coordinates.
Grid Mapping Detection
Determines if a variable is a grid mapping variable.
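Under the CF conventions, a grid mapping variable carries a grid_mapping_name attribute, and data variables reference it by name through their own grid_mapping attribute. A check along those lines is sketched below; the helper name and the exact rule the library applies are assumptions.

```python
import numpy as np
import xarray as xr


def is_grid_mapping_variable_sketch(ds: xr.Dataset, name: str) -> bool:
    """True if `name` looks like a CF grid mapping variable in `ds` (hypothetical helper)."""
    if name not in ds.variables:
        return False
    if "grid_mapping_name" in ds[name].attrs:
        return True
    # Also accept variables that some data variable references via `grid_mapping`.
    return any(var.attrs.get("grid_mapping") == name for var in ds.data_vars.values())


ds = xr.Dataset(
    {
        "b02": (("y", "x"), np.zeros((2, 2)), {"grid_mapping": "spatial_ref"}),
        "spatial_ref": ((), 0, {"grid_mapping_name": "transverse_mercator"}),
    }
)
print(is_grid_mapping_variable_sketch(ds, "spatial_ref"))  # True
print(is_grid_mapping_variable_sketch(ds, "b02"))          # False
```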
Usage Examples
Basic Conversion
import xarray as xr
from eopf_geozarr import create_geozarr_dataset
# Load and convert
dt = xr.open_datatree("input.zarr", engine="zarr")
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr"
)
Advanced S3 Usage
import xarray as xr
from eopf_geozarr import create_geozarr_dataset
from eopf_geozarr.conversion.fs_utils import (
    validate_s3_access,
    get_s3_storage_options,
)
dt = xr.open_datatree("input.zarr", engine="zarr")
# Validate S3 access before starting the conversion
s3_path = "s3://my-bucket/data.zarr"
is_valid, error = validate_s3_access(s3_path)
if is_valid:
    # Get storage options
    storage_opts = get_s3_storage_options(s3_path)
    # Convert with S3
    dt_geozarr = create_geozarr_dataset(
        dt_input=dt,
        groups=["/measurements/r10m"],
        output_path=s3_path,
        **storage_opts
    )
else:
    raise RuntimeError(f"Cannot access {s3_path}: {error}")
Custom Chunking
import xarray as xr
from eopf_geozarr import create_geozarr_dataset
from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size
dt = xr.open_datatree("input.zarr", engine="zarr")
# Calculate an aligned chunk size for your data
width, height = 10980, 10980
optimal_chunk = calculate_aligned_chunk_size(width, 4096)
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr",
    spatial_chunk=optimal_chunk
)
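Type Hints
The library uses type hints throughout. The signatures on this page use Any, Dict, List, Optional and Tuple from typing, together with xarray types (xr.Dataset, xr.DataTree) and zarr.Group.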
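The signatures on this page rely on names from typing plus xarray types. The snippet below shows the imports those annotations need (add `import zarr` as well if you annotate zarr.Group return values) and applies them to a few example variables; the variable names are illustrative only.

```python
from typing import Any, Dict, List, Optional, Tuple

import xarray as xr

# Annotating your own code with the same types the API signatures use.
groups: List[str] = ["/measurements/r10m", "/measurements/r20m"]
geozarr_groups: Dict[str, xr.Dataset] = {}
storage_kwargs: Dict[str, Any] = {}
expected_chunks: Tuple[int, ...] = (3660, 3660)
access_error: Optional[str] = None
```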
Error Types
Common exceptions you may encounter:
- ValueError: Invalid parameters or data
- FileNotFoundError: Missing input files
- PermissionError: Insufficient permissions for S3 or file operations
- zarr.errors.ArrayNotFoundError: Missing Zarr arrays
For detailed error handling examples, see the FAQ.