Transforms the data and writes Parquet files plus a manifest to the GCS ingest folder. Each ingest workflow produces Parquet files for its tables and a manifest that tracks provenance back to the source archive.
Usage
write_ingest_outputs(
  data_info,
  provider,
  dataset,
  gcs_bucket = "calcofi-db",
  compression = "snappy"
)
Arguments
- data_info
Output from read_csv_files()
- provider
Data provider (e.g., "swfsc.noaa.gov")
- dataset
Dataset name (e.g., "calcofi-db")
- gcs_bucket
GCS bucket for ingest outputs (default: "calcofi-db")
- compression
Parquet compression method (default: "snappy")
Value
A list with:
- gcs_base: Base GCS path for this ingest
- parquet_paths: Named list of GCS paths to the Parquet files
- manifest_path: GCS path to manifest.json
- manifest: The manifest data as a list
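As a rough illustration of the documented return value, the sketch below builds a mock result list by hand and inspects it the way downstream code might. Only the four element names (gcs_base, parquet_paths, manifest_path, manifest) come from this page; the gs:// URLs, bucket layout, and table names (table_a, table_b) are hypothetical placeholders, not what write_ingest_outputs() actually produces.

```r
# Hypothetical mock of the list returned by write_ingest_outputs().
# URLs, folder layout, and table names are illustrative assumptions only.
base <- "gs://calcofi-db/ingest/swfsc.noaa.gov/calcofi-db"
result <- list(
  gcs_base      = base,
  parquet_paths = list(
    table_a = file.path(base, "table_a.parquet"),
    table_b = file.path(base, "table_b.parquet")
  ),
  manifest_path = file.path(base, "manifest.json"),
  manifest      = list(tables = c("table_a", "table_b"))  # assumed field contents
)

# Names of parquet_paths identify the tables that were written
tables <- names(result$parquet_paths)

# Check that every Parquet path sits under the base ingest path
under_base <- startsWith(unlist(result$parquet_paths), result$gcs_base)

# Tabulate each table with its output location
summary_df <- data.frame(
  table = tables,
  path  = unlist(result$parquet_paths, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(summary_df)
```

With a real result, the same names()/unlist() pattern applies to whatever tables the ingest actually wrote.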
Examples
if (FALSE) { # \dontrun{
d <- read_csv_files(
  provider     = "swfsc.noaa.gov",
  dataset      = "calcofi-db",
  metadata_dir = "metadata")
result <- write_ingest_outputs(
  data_info = d,
  provider  = "swfsc.noaa.gov",
  dataset   = "calcofi-db")
# check manifest
result$manifest$tables
} # }