curryer.correction.dataio

Helpers for querying and downloading NetCDF data from AWS S3.

All interactions rely on the boto3 S3 client. Callers may either provide an explicit client instance (useful for testing) or rely on the default client, in which case boto3 must be installed and AWS credentials are read from the standard AWS_* environment variables.

Attributes

boto3

Classes

TelemetryLoader

Protocol for mission-specific telemetry loading functions.

ScienceLoader

Protocol for mission-specific science frame loading functions.

GCPLoader

Protocol for mission-specific GCP (Ground Control Point) loading functions.

S3Configuration

Configuration describing how data is organised within an S3 bucket.

Functions

validate_telemetry_output(→ None)

Validate that telemetry loader output has expected structure.

validate_science_output(→ None)

Validate that science loader output has expected structure.

_require_client(→ object)

_iter_dates(→ collections.abc.Iterable[datetime.date])

find_netcdf_objects(→ list[str])

Return S3 object keys for NetCDF files in the given date range.

download_netcdf_objects(→ list[pathlib.Path])

Download the specified S3 objects to destination.

Module Contents

curryer.correction.dataio.boto3 = None
class curryer.correction.dataio.TelemetryLoader

Bases: Protocol

Protocol for mission-specific telemetry loading functions.

Telemetry loaders are responsible for reading spacecraft state data (position, attitude, timing) from mission-specific formats and returning it in a standard DataFrame format.

Standard Signature:

def load_telemetry(tlm_key: str, config) -> pd.DataFrame

Requirements:
  • Accept tlm_key (path or identifier) and config object

  • Return DataFrame with mission-specific telemetry fields

  • Include time fields needed for SPICE kernel creation

  • Include attitude data (quaternions or DCMs)

  • Include position data if creating SPK kernels

Example

def load_clarreo_telemetry(tlm_key: str, config) -> pd.DataFrame:
    # Load from multiple CSV files
    # Convert formats (DCM to quaternion, etc.)
    # Merge and return
    return telemetry_df

__call__(tlm_key: str, config) → pandas.DataFrame

Load telemetry data for a given key.
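
Because the loader protocols are defined by their __call__ signature, any callable with a matching shape should satisfy them structurally, with no subclassing needed. The sketch below illustrates that idea only. TelemetryLoaderLike, load_demo_telemetry, CachedLoader and the column names are hypothetical, and rows are plain dicts purely to keep the example dependency-free; a real loader returns a pandas.DataFrame.

```python
from typing import Any, Protocol

# Stand-in for pandas.DataFrame so the sketch has no third-party imports.
Rows = list[dict[str, float]]


class TelemetryLoaderLike(Protocol):
    """Hypothetical protocol mirroring the documented __call__ shape."""

    def __call__(self, tlm_key: str, config: Any) -> Rows: ...


def load_demo_telemetry(tlm_key: str, config: Any) -> Rows:
    """A plain function satisfies the protocol by signature alone."""
    # Made-up columns: one time field plus an identity quaternion.
    return [{"time_s": 0.0, "q_w": 1.0, "q_x": 0.0, "q_y": 0.0, "q_z": 0.0}]


class CachedLoader:
    """A callable object also satisfies the protocol and can hold state."""

    def __init__(self, inner: TelemetryLoaderLike) -> None:
        self._inner = inner
        self._cache: dict[str, Rows] = {}

    def __call__(self, tlm_key: str, config: Any) -> Rows:
        # Only call the wrapped loader the first time a key is requested.
        if tlm_key not in self._cache:
            self._cache[tlm_key] = self._inner(tlm_key, config)
        return self._cache[tlm_key]


def count_rows(loader: TelemetryLoaderLike, tlm_key: str, config: Any) -> int:
    """Consumer code depends only on the call signature."""
    return len(loader(tlm_key, config))
```

Both load_demo_telemetry and CachedLoader(load_demo_telemetry) can be passed to count_rows, which is the flexibility a Protocol-based interface is meant to provide.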

class curryer.correction.dataio.ScienceLoader

Bases: Protocol

Protocol for mission-specific science frame loading functions.

Science loaders provide frame timing and metadata for the instrument observations that will be geolocated.

Standard Signature:

def load_science(sci_key: str, config) -> pd.DataFrame

Requirements:
  • Accept sci_key (path or identifier) and config object

  • Return DataFrame with frame timing data

  • Must include time field specified in config.geo.time_field

  • Time values should match expected format (e.g., GPS microseconds)

Example

def load_clarreo_science(sci_key: str, config) -> pd.DataFrame:
    # Load frame timestamps
    # Convert to required units (e.g., GPS µs)
    return science_df

__call__(sci_key: str, config) → pandas.DataFrame

Load science frame timing/metadata.

class curryer.correction.dataio.GCPLoader

Bases: Protocol

Protocol for mission-specific GCP (Ground Control Point) loading functions.

GCP loaders retrieve reference imagery or coordinates for ground truth comparison.

Standard Signature:

def load_gcp(gcp_key: str, config) -> Any

Note

This interface is currently a placeholder. The return type and structure will be standardized when GCP loading is fully integrated into the pipeline.

Example

def load_clarreo_gcp(gcp_key: str, config):
    # Load Landsat reference image
    # Or load GCP coordinate database
    return gcp_data

__call__(gcp_key: str, config)

Load GCP reference data.

curryer.correction.dataio.validate_telemetry_output(df: pandas.DataFrame, config) → None

Validate that telemetry loader output has expected structure.

Parameters:
  • df – DataFrame returned by telemetry loader

  • config – CorrectionConfig object

Raises:
  • TypeError – If not a DataFrame

  • ValueError – If DataFrame is empty

Note

Specific column requirements depend on mission and kernel configs. This performs basic structure checks only.

curryer.correction.dataio.validate_science_output(df: pandas.DataFrame, config) → None

Validate that science loader output has expected structure.

Parameters:
  • df – DataFrame returned by science loader

  • config – CorrectionConfig object

Raises:
  • TypeError – If not a DataFrame

  • ValueError – If DataFrame is empty or missing required time field

Example

>>> sci_df = load_science("sci_001", config)
>>> validate_science_output(sci_df, config)

class curryer.correction.dataio.S3Configuration(bucket: str, base_prefix: str)

Configuration describing how data is organised within an S3 bucket.

bucket
base_prefix
date_prefix(date: datetime.date) → str

Return the S3 prefix for date.
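
The key layout produced by date_prefix is not documented here. As a minimal sketch, assuming one folder per day named YYYYMMDD under base_prefix, a configuration object of this shape could look like the following. ExampleS3Layout and the layout itself are hypothetical, not the library's actual behaviour.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleS3Layout:
    """Hypothetical stand-in for S3Configuration."""

    bucket: str
    base_prefix: str

    def date_prefix(self, date: dt.date) -> str:
        # ASSUMED layout: <base_prefix>/YYYYMMDD/ (the real layout may differ).
        base = self.base_prefix.strip("/")
        day = f"{date:%Y%m%d}/"
        return f"{base}/{day}" if base else day


layout = ExampleS3Layout(bucket="example-bucket", base_prefix="L1a/nadir/")
example_prefix = layout.date_prefix(dt.date(2024, 1, 5))
```

Normalising the slashes around base_prefix keeps the result stable whether or not the caller includes a trailing slash.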

curryer.correction.dataio._require_client(client: object | None) → object
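
This helper has no docstring. Going by the module overview (an explicit client or a default one, with boto3 required only in the latter case) and the module-level boto3 = None attribute, a plausible sketch is shown below. The function name require_client, the optional-import pattern and the RuntimeError are assumptions, not the documented behaviour.

```python
try:
    import boto3  # optional dependency
except ImportError:
    # Assumed to match the documented module-level fallback `boto3 = None`.
    boto3 = None


def require_client(client=None):
    """Return the supplied client, or build a default boto3 S3 client."""
    if client is not None:
        # Explicit clients (e.g. test doubles) are passed through untouched.
        return client
    if boto3 is None:
        # Exception type is an assumption; the source does not document it.
        raise RuntimeError("boto3 is required when no S3 client is supplied")
    return boto3.client("s3")
```

Passing a stub client means tests never touch boto3 or AWS credentials.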
curryer.correction.dataio._iter_dates(start: datetime.date, end: datetime.date) → collections.abc.Iterable[datetime.date]
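
This helper is also undocumented. Since find_netcdf_objects treats its date range as inclusive, a reasonable guess is that the helper yields every day from start through end. The sketch below implements that assumed behaviour under the hypothetical name iter_dates.

```python
import datetime as dt
from collections.abc import Iterator


def iter_dates(start: dt.date, end: dt.date) -> Iterator[dt.date]:
    """Yield each date from start through end, inclusive (assumed)."""
    current = start
    while current <= end:
        yield current
        current += dt.timedelta(days=1)


# Spans a leap day: 28 Feb, 29 Feb and 1 Mar 2024.
example_days = list(iter_dates(dt.date(2024, 2, 28), dt.date(2024, 3, 1)))
```

If start is later than end, the generator yields nothing rather than raising.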
curryer.correction.dataio.find_netcdf_objects(config: S3Configuration, start_date: datetime.date, end_date: datetime.date, *, s3_client=None) → list[str]

Return S3 object keys for NetCDF files in the given date range.

Parameters:
  • config (S3Configuration) – Describes the bucket and prefix layout.

  • start_date (datetime.date) – First date of the range to scan for NetCDF files (inclusive).

  • end_date (datetime.date) – Last date of the range to scan for NetCDF files (inclusive).

  • s3_client (boto3 S3 client, optional) – Client instance to use. If omitted, a default client is created.
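
The s3_client parameter makes it possible to exercise listing logic without AWS access. As an illustrative sketch only (not the library's implementation), the code below lists keys under one prefix using the boto3 list_objects_v2 call with continuation tokens, and runs against a small test double. list_netcdf_keys, FakeS3Client and the ".nc" suffix filter are assumptions made for this example.

```python
def list_netcdf_keys(client, bucket: str, prefix: str, suffix: str = ".nc") -> list[str]:
    """Collect keys under `prefix` ending in `suffix`, following pagination."""
    keys: list[str] = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        # "Contents" is absent from the response when nothing matches.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                keys.append(obj["Key"])
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeS3Client:
    """Hypothetical test double that serves canned keys two per page."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def list_objects_v2(self, Bucket, Prefix, ContinuationToken=None):
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        page = {
            "Contents": [{"Key": k} for k in matching[start:start + 2]],
            "IsTruncated": start + 2 < len(matching),
        }
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(start + 2)
        return page


fake = FakeS3Client([
    "data/20240101/a.nc",
    "data/20240101/b.txt",
    "data/20240101/c.nc",
    "data/20240102/d.nc",
])
found = list_netcdf_keys(fake, "example-bucket", "data/20240101/")
```

A date-range search would repeat this once per day in the range, using the prefix that S3Configuration.date_prefix returns for each date.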

curryer.correction.dataio.download_netcdf_objects(config: S3Configuration, object_keys: collections.abc.Iterable[str], destination: os.PathLike[str] | str, *, s3_client=None) → list[pathlib.Path]

Download the specified S3 objects to destination.

Parameters:
  • config (S3Configuration) – Describes the bucket hosting the objects.

  • object_keys (iterable of str) – S3 object keys to download.

  • destination (path-like) – Directory where the files should be stored. It is created if needed.

  • s3_client (boto3 S3 client, optional) – Client instance to use. If omitted, a default client is created.
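
A download loop of this kind can be sketched with the boto3 download_file(Bucket, Key, Filename) call. The version below is a hedged illustration rather than the library's code: download_keys and FakeS3Client are hypothetical, and naming each local file after the last path segment of its key is an assumption.

```python
import tempfile
from pathlib import Path, PurePosixPath


def download_keys(client, bucket: str, object_keys, destination) -> list[Path]:
    """Download each key into `destination`, creating the directory if needed."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for key in object_keys:
        # ASSUMED naming: the local file takes the key's final path segment.
        local = dest / PurePosixPath(key).name
        client.download_file(bucket, key, str(local))
        paths.append(local)
    return paths


class FakeS3Client:
    """Hypothetical test double that writes a marker instead of real data."""

    def download_file(self, Bucket, Key, Filename):
        Path(Filename).write_text(f"{Bucket}/{Key}")


with tempfile.TemporaryDirectory() as tmp:
    saved = download_keys(
        FakeS3Client(),
        "example-bucket",
        ["data/20240101/a.nc"],
        Path(tmp) / "nested" / "dir",
    )
    saved_names = [p.name for p in saved]
    saved_text = saved[0].read_text()
```

With this naming, two keys that share a file name would overwrite each other, so a real implementation may need to preserve more of the key path.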