High-level function to ingest all tables from a dataset into the Working DuckLake. This wraps transform_data() and ingest_to_working() into a single operation with proper provenance tracking.

Usage

ingest_dataset(con, d, mode = "replace", verbose = TRUE)

Arguments

con

DuckDB connection from get_working_ducklake()

d

Data object from read_csv_files()

mode

Insert mode: "replace" (default) or "append"

verbose

Print progress messages (default: TRUE)

Value

Tibble with ingestion statistics for each table:

  • tbl: Original table name

  • tbl_new: New table name after redefinition

  • gcs_path: Source file path for provenance

  • rows_input: Number of rows ingested

  • rows_after: Total rows in table after ingestion

  • ingested_at: Timestamp of ingestion
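The shape of the returned statistics can be sketched with a plain data frame. This is a hypothetical illustration only — the table names, paths, and row counts below are invented, and the real function returns a tibble built from the actual ingestion:

```r
# Illustrative sketch of the statistics returned by ingest_dataset().
# Column names follow the Value section above; all values are made up.
stats <- data.frame(
  tbl         = c("tbl_a", "tbl_b"),          # hypothetical table names
  tbl_new     = c("tbl_a", "tbl_b"),
  gcs_path    = c("gs://.../a.csv", "gs://.../b.csv"),  # placeholder paths
  rows_input  = c(1200L, 350L),
  rows_after  = c(1200L, 350L),
  ingested_at = Sys.time()
)

# Sanity check: a table can never hold fewer rows than were just ingested.
# With mode = "replace", rows_after equals rows_input; with mode = "append",
# rows_after may exceed it.
stopifnot(all(stats$rows_after >= stats$rows_input))
```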

Examples

if (FALSE) { # \dontrun{
# read and ingest dataset
d <- read_csv_files(
  provider     = "swfsc.noaa.gov",
  dataset      = "calcofi-db",
  dir_data     = "~/My Drive/projects/calcofi/data-public",
  metadata_dir = "metadata")

con <- get_working_ducklake()
stats <- ingest_dataset(con, d, mode = "replace")
save_working_ducklake(con)
close_duckdb(con)
} # }