curryer.correction.dataio
=========================

.. py:module:: curryer.correction.dataio

.. autoapi-nested-parse::

   Helpers for querying and downloading NetCDF data from AWS S3.

   All interactions rely on the boto3 S3 client. Callers may either provide an
   explicit client instance (useful for testing) or rely on the default client,
   in which case boto3 must be installed and AWS credentials are read from the
   standard ``AWS_*`` environment variables.


Attributes
----------

.. autoapisummary::

   curryer.correction.dataio.boto3


Classes
-------

.. autoapisummary::

   curryer.correction.dataio.TelemetryLoader
   curryer.correction.dataio.ScienceLoader
   curryer.correction.dataio.GCPLoader
   curryer.correction.dataio.S3Configuration


Functions
---------

.. autoapisummary::

   curryer.correction.dataio.validate_telemetry_output
   curryer.correction.dataio.validate_science_output
   curryer.correction.dataio._require_client
   curryer.correction.dataio._iter_dates
   curryer.correction.dataio.find_netcdf_objects
   curryer.correction.dataio.download_netcdf_objects


Module Contents
---------------

.. py:data:: boto3
   :value: None


.. py:class:: TelemetryLoader

   Bases: :py:obj:`Protocol`


   Protocol for mission-specific telemetry loading functions.

   Telemetry loaders are responsible for reading spacecraft state data
   (position, attitude, timing) from mission-specific formats and returning
   it in a standard DataFrame format.

   Standard Signature::

       def load_telemetry(tlm_key: str, config) -> pd.DataFrame

   Requirements:

   - Accept tlm_key (path or identifier) and config object
   - Return DataFrame with mission-specific telemetry fields
   - Include time fields needed for SPICE kernel creation
   - Include attitude data (quaternions or DCMs)
   - Include position data if creating SPK kernels

   .. rubric:: Example

   .. code-block:: python

      def load_clarreo_telemetry(tlm_key: str, config) -> pd.DataFrame:
          # Load from multiple CSV files
          # Convert formats (DCM to quaternion, etc.)
          # Merge and return
          return telemetry_df


   .. py:method:: __call__(tlm_key: str, config) -> pandas.DataFrame

      Load telemetry data for a given key.



.. py:class:: ScienceLoader

   Bases: :py:obj:`Protocol`


   Protocol for mission-specific science frame loading functions.

   Science loaders provide frame timing and metadata for the instrument
   observations that will be geolocated.

   Standard Signature::

       def load_science(sci_key: str, config) -> pd.DataFrame

   Requirements:

   - Accept sci_key (path or identifier) and config object
   - Return DataFrame with frame timing data
   - Must include time field specified in config.geo.time_field
   - Time values should match expected format (e.g., GPS microseconds)

   .. rubric:: Example

   .. code-block:: python

      def load_clarreo_science(sci_key: str, config) -> pd.DataFrame:
          # Load frame timestamps
          # Convert to required units (e.g., GPS µs)
          return science_df


   .. py:method:: __call__(sci_key: str, config) -> pandas.DataFrame

      Load science frame timing/metadata.



.. py:class:: GCPLoader

   Bases: :py:obj:`Protocol`


   Protocol for mission-specific GCP (Ground Control Point) loading functions.

   GCP loaders retrieve reference imagery or coordinates for ground truth
   comparison.

   Standard Signature::

       def load_gcp(gcp_key: str, config) -> Any

   .. note::

      This interface is currently a placeholder. The return type and
      structure will be standardized when GCP loading is fully integrated
      into the pipeline.

   .. rubric:: Example

   .. code-block:: python

      def load_clarreo_gcp(gcp_key: str, config):
          # Load Landsat reference image
          # Or load GCP coordinate database
          return gcp_data


   .. py:method:: __call__(gcp_key: str, config)

      Load GCP reference data.



.. py:function:: validate_telemetry_output(df: pandas.DataFrame, config) -> None

   Validate that telemetry loader output has the expected structure.

   :param df: DataFrame returned by telemetry loader
   :param config: CorrectionConfig object

   :raises TypeError: If ``df`` is not a DataFrame
   :raises ValueError: If the DataFrame is empty

   .. note::

      Specific column requirements depend on mission and kernel configs.
      This performs basic structure checks only.


.. py:function:: validate_science_output(df: pandas.DataFrame, config) -> None

   Validate that science loader output has the expected structure.

   :param df: DataFrame returned by science loader
   :param config: CorrectionConfig object

   :raises TypeError: If ``df`` is not a DataFrame
   :raises ValueError: If the DataFrame is empty or missing the required time field

   .. rubric:: Example

   >>> sci_df = load_science("sci_001", config)
   >>> validate_science_output(sci_df, config)


.. py:class:: S3Configuration(bucket: str, base_prefix: str)

   Configuration describing how data is organised within an S3 bucket.


   .. py:attribute:: bucket


   .. py:attribute:: base_prefix


   .. py:method:: date_prefix(date: datetime.date) -> str

      Return the S3 prefix for ``date``.



.. py:function:: _require_client(client: object | None) -> object

.. py:function:: _iter_dates(start: datetime.date, end: datetime.date) -> collections.abc.Iterable[datetime.date]

.. py:function:: find_netcdf_objects(config: S3Configuration, start_date: datetime.date, end_date: datetime.date, *, s3_client=None) -> list[str]

   Return S3 object keys for NetCDF files in the given date range.

   :param config: Describes the bucket and prefix layout.
   :type config: S3Configuration
   :param start_date: Inclusive start of the date range to scan for NetCDF files.
   :type start_date: datetime.date
   :param end_date: Inclusive end of the date range to scan for NetCDF files.
   :type end_date: datetime.date
   :param s3_client: Client instance to use. If omitted, a default client is created.
   :type s3_client: boto3 S3 client, optional


.. py:function:: download_netcdf_objects(config: S3Configuration, object_keys: collections.abc.Iterable[str], destination: os.PathLike[str] | str, *, s3_client=None) -> list[pathlib.Path]

   Download the specified S3 objects to ``destination``.

   :param config: Describes the bucket hosting the objects.
   :type config: S3Configuration
   :param object_keys: S3 object keys to download.
   :type object_keys: iterable of str
   :param destination: Directory where the files should be stored. It is created if needed.
   :type destination: path-like
   :param s3_client: Client instance to use. If omitted, a default client is created.
   :type s3_client: boto3 S3 client, optional
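
The date-range listing described for ``find_netcdf_objects`` can be sketched as below. This is a minimal illustration, not the module's implementation: the per-day prefix layout (``base_prefix/YYYY/MM/DD/``), the ``.nc`` suffix filter, and the names ``iter_dates``, ``date_prefix``, ``list_netcdf_keys``, and ``FakeS3Client`` are all assumptions made for the example. The only boto3 behaviour relied on is the ``list_objects_v2`` call and its ``Contents``, ``IsTruncated``, and ``NextContinuationToken`` response fields, which the stub client imitates so the sketch runs without AWS credentials.

```python
import datetime
from collections.abc import Iterator


def iter_dates(start: datetime.date, end: datetime.date) -> Iterator[datetime.date]:
    """Yield every date from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += datetime.timedelta(days=1)


def date_prefix(base_prefix: str, date: datetime.date) -> str:
    """Build a per-day prefix. The YYYY/MM/DD layout is an assumption."""
    return f"{base_prefix.rstrip('/')}/{date:%Y/%m/%d}/"


def list_netcdf_keys(client, bucket: str, base_prefix: str,
                     start: datetime.date, end: datetime.date) -> list[str]:
    """Collect keys ending in .nc under each daily prefix, following pagination."""
    keys = []
    for day in iter_dates(start, end):
        kwargs = {"Bucket": bucket, "Prefix": date_prefix(base_prefix, day)}
        while True:
            page = client.list_objects_v2(**kwargs)
            # "Contents" is absent from the response when nothing matches.
            for item in page.get("Contents", []):
                if item["Key"].endswith(".nc"):
                    keys.append(item["Key"])
            if not page.get("IsTruncated"):
                break
            kwargs["ContinuationToken"] = page["NextContinuationToken"]
    return keys


class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client, for offline testing."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def list_objects_v2(self, Bucket, Prefix, ContinuationToken=None):
        matches = [k for k in self._keys if k.startswith(Prefix)]
        return {"Contents": [{"Key": k} for k in matches], "IsTruncated": False}


client = FakeS3Client([
    "l1a/2024/01/01/a.nc",
    "l1a/2024/01/02/b.nc",
    "l1a/2024/01/02/notes.txt",
    "l1a/2024/01/05/c.nc",
])
found = list_netcdf_keys(
    client, "example-bucket", "l1a",
    datetime.date(2024, 1, 1), datetime.date(2024, 1, 2),
)
```

Passing a stub like this through the ``s3_client`` keyword is the testing use case the module description mentions for explicit client instances.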
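
The download step described for ``download_netcdf_objects`` might look roughly like the following. It is a sketch under stated assumptions rather than the module's code: storing each file flat under the destination using the last component of its key is a guess, and ``download_keys`` and ``FakeS3Client`` are hypothetical names. The stub mirrors the ``download_file(Bucket, Key, Filename)`` method of the boto3 S3 client so the example runs offline.

```python
import tempfile
from pathlib import Path, PurePosixPath


class FakeS3Client:
    """Hypothetical stand-in mimicking boto3's download_file(Bucket, Key, Filename)."""

    def __init__(self, objects: dict[str, bytes]):
        self._objects = objects  # maps object key -> payload

    def download_file(self, Bucket, Key, Filename):
        Path(Filename).write_bytes(self._objects[Key])


def download_keys(client, bucket: str, keys, destination) -> list[Path]:
    """Download each key into destination, creating the directory if needed."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for key in keys:
        # Assumption: files are stored flat, named after the last key component.
        target = dest / PurePosixPath(key).name
        client.download_file(bucket, key, str(target))
        paths.append(target)
    return paths


object_keys = ["l1a/2024/01/01/a.nc", "l1a/2024/01/02/b.nc"]
client = FakeS3Client({object_keys[0]: b"first", object_keys[1]: b"second"})

with tempfile.TemporaryDirectory() as tmp:
    # The nested destination does not exist yet and is created on demand.
    saved = download_keys(client, "example-bucket", object_keys, Path(tmp) / "nested" / "out")
    names = [p.name for p in saved]
    contents = [p.read_bytes() for p in saved]
```

Flattening keys to their base names would overwrite files that share a name across dates; a real implementation may preserve more of the key path.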