Transforms data and writes parquet files and a manifest to the GCS ingest folder. Each ingest workflow produces parquet files for its tables, plus a manifest that tracks provenance back to the source archive.

Usage

write_ingest_outputs(
  data_info,
  provider,
  dataset,
  gcs_bucket = "calcofi-db",
  compression = "snappy"
)

Arguments

data_info

Output from read_csv_files()

provider

Data provider (e.g., "swfsc.noaa.gov")

dataset

Dataset name (e.g., "calcofi-db")

gcs_bucket

GCS bucket for ingest outputs (default: "calcofi-db")

compression

Parquet compression method (default: "snappy")

Value

List with:

  • gcs_base: Base GCS path for this ingest

  • parquet_paths: Named list of GCS paths to parquet files

  • manifest_path: GCS path to manifest.json

  • manifest: The manifest data as a list
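
The following is a minimal sketch of how the returned list might be inspected. It does not call write_ingest_outputs() or touch GCS. Instead it builds a hypothetical stand-in list with the four documented elements. The bucket name, folder layout, table names, and the shape of manifest$tables are all assumptions made for illustration, not the function's actual output.

```r
# Hypothetical stand-in for the list returned by write_ingest_outputs();
# the bucket, folder layout, and table names are invented for illustration.
base <- "gs://example-bucket/ingest/example-provider/example-dataset"

result <- list(
  gcs_base      = base,
  parquet_paths = list(
    table_a = file.path(base, "table_a.parquet"),
    table_b = file.path(base, "table_b.parquet")),
  manifest_path = file.path(base, "manifest.json"),
  # assumption: the real manifest may hold richer per-table metadata
  manifest      = list(tables = c("table_a", "table_b")))

# confirm the four documented elements are present
expected <- c("gcs_base", "parquet_paths", "manifest_path", "manifest")
stopifnot(all(expected %in% names(result)))

# tabulate the parquet paths by table name
paths <- data.frame(
  table = names(result$parquet_paths),
  path  = unlist(result$parquet_paths, use.names = FALSE))
print(paths)

# check that every parquet path sits under the base path
# (true for this mock by construction; an assumption for real output)
all(startsWith(paths$path, result$gcs_base))
```

Flattening parquet_paths into a data frame keeps the table names next to their paths, which may be convenient when passing them on to a downstream reader.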

Examples

if (FALSE) { # \dontrun{
d <- read_csv_files(
  provider     = "swfsc.noaa.gov",
  dataset      = "calcofi-db",
  metadata_dir = "metadata")

result <- write_ingest_outputs(
  data_info  = d,
  provider   = "swfsc.noaa.gov",
  dataset    = "calcofi-db")

# check manifest
result$manifest$tables
} # }