Reads Parquet files from multiple ingest outputs and integrates them into the Working DuckLake. Tables that share a name across ingests are combined into a single table (e.g., cruise tables from different sources). Provenance columns (_ingest_provider, _ingest_dataset) are added to track origin.
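The combine-and-tag behavior described above can be illustrated with a minimal base R sketch. This is not the package's implementation. The ingest structure (a list with provider, dataset, and tables elements), the helper add_provenance(), and the provider and dataset values are all hypothetical, chosen only to show how same-named tables might be stacked with provenance columns attached.

```r
# Hypothetical ingest results: the provider/dataset/tables layout is an
# assumption for illustration, not the documented return value of
# write_ingest_outputs().
ingest_a <- list(
  provider = "swfsc",
  dataset  = "ichthyo",
  tables   = list(cruise = data.frame(cruise_key = c("2001-01", "2001-04"))))
ingest_b <- list(
  provider = "sio",
  dataset  = "bottle",
  tables   = list(cruise = data.frame(cruise_key = "2001-07")))

# Hypothetical helper: tag every row of a table with its origin.
add_provenance <- function(tbl, provider, dataset) {
  tbl[["_ingest_provider"]] <- provider
  tbl[["_ingest_dataset"]]  <- dataset
  tbl
}

# Tag the same-named table from each ingest, then stack the rows.
tagged <- lapply(list(ingest_a, ingest_b), function(x) {
  add_provenance(x$tables$cruise, x$provider, x$dataset)
})
cruise <- do.call(rbind, tagged)
```

In this sketch the combined cruise table keeps one row per source row, and the two provenance columns record which ingest each row came from.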

Usage

integrate_to_working_ducklake(
  ingests,
  gcs_bucket = "calcofi-db",
  ducklake_path = "ducklake/working"
)

Arguments

ingests

A list of ingest results (each a list returned by write_ingest_outputs()), or a tibble returned by list_ingest_outputs()

gcs_bucket

GCS bucket (default: "calcofi-db")

ducklake_path

Path to Working DuckLake within bucket (default: "ducklake/working")

Value

A list containing the Working DuckLake path and information about the integrated tables

Examples

if (FALSE) { # \dontrun{
# from targets pipeline
result <- integrate_to_working_ducklake(
  ingests = list(ingest_swfsc, ingest_bottle))

# from existing GCS ingests
existing <- list_ingest_outputs()
result <- integrate_to_working_ducklake(existing)
} # }