Creates a sidecar metadata.json file alongside the parquet outputs,
documenting every table and column. DuckDB COMMENT ON does not propagate
to parquet via COPY TO, so the sidecar file provides the metadata externally.
Usage
build_metadata_json(
con,
d_tbls_rd,
d_flds_rd,
metadata_derived_csv = NULL,
output_dir,
tables = NULL,
set_comments = TRUE,
provider = NULL,
dataset = NULL,
workflow_url = NULL
)
Arguments
- con
DuckDB connection
- d_tbls_rd
Table redefinition data frame (with columns tbl_new, tbl_description)
- d_flds_rd
Field redefinition data frame (with columns tbl_new, fld_new, fld_description, units)
- metadata_derived_csv
Path to CSV with derived table/column metadata (columns: table, column, name_long, units, description_md)
- output_dir
Directory to write metadata.json
- tables
Character vector of table names to include. If NULL, uses all tables from DuckDB.
- set_comments
If TRUE, also sets DuckDB COMMENT ON for tables/columns
- provider
Data provider identifier (e.g. "swfsc.noaa.gov")
- dataset
Dataset identifier (e.g. "calcofi-db")
- workflow_url
URL to the rendered workflow page
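As an illustration of the file that metadata_derived_csv points to, the sketch below writes a small CSV with the five columns listed above and reads it back. The table name, column names, and descriptions are made up for the example, and the file goes to a temporary directory rather than a real metadata folder.

```r
# Hypothetical derived-metadata rows; only the five column names
# (table, column, name_long, units, description_md) come from the docs.
derived <- data.frame(
  table          = c("cruise", "cruise"),
  column         = c("n_sites", "duration_days"),
  name_long      = c("Number of sites", "Cruise duration"),
  units          = c("count", "days"),
  description_md = c("Sites sampled during the cruise",
                     "Days between cruise start and end"),
  stringsAsFactors = FALSE
)

# Write to a temporary location and read back to check the header
csv_path <- file.path(tempdir(), "metadata_derived.csv")
write.csv(derived, csv_path, row.names = FALSE)

derived_back <- read.csv(csv_path, stringsAsFactors = FALSE)
names(derived_back)
```

The resulting path is what would be passed as metadata_derived_csv.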
Details
Metadata is assembled from three sources:
- Table/field redefinition files (d_tbls_rd, d_flds_rd)
- A derived metadata CSV for workflow-created tables/columns
- Auto-generated stubs for any remaining undocumented columns
If set_comments is TRUE, the function also sets DuckDB COMMENT ON for tables and columns.
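For readers unfamiliar with DuckDB comments, the sketch below shows what setting and reading them looks like from R. It assumes the DBI and duckdb packages are installed and a DuckDB version that supports COMMENT ON; the cruise table and its comment text are hypothetical and are not taken from the function's own code.

```r
library(DBI)

# In-memory DuckDB database with a small hypothetical table
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "CREATE TABLE cruise (cruise_key VARCHAR, date_start DATE)")

# Attach comments to the table and to one column
dbExecute(con, "COMMENT ON TABLE cruise IS 'One row per research cruise'")
dbExecute(con, "COMMENT ON COLUMN cruise.date_start IS 'Start date of the cruise'")

# Comments live in the DuckDB catalog and can be read back from it
tbl_comments <- dbGetQuery(con,
  "SELECT table_name, comment FROM duckdb_tables()")
col_comments <- dbGetQuery(con,
  "SELECT column_name, comment FROM duckdb_columns()
   WHERE table_name = 'cruise'")

dbDisconnect(con, shutdown = TRUE)
```

Because these comments are stored in the database catalog, they stay behind when tables are exported with COPY TO, which is why the same descriptions are also written to metadata.json.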
Examples
if (FALSE) { # \dontrun{
build_metadata_json(
con = con,
d_tbls_rd = d$d_tbls_rd,
d_flds_rd = d$d_flds_rd,
metadata_derived_csv = "metadata/swfsc.noaa.gov/calcofi-db/metadata_derived.csv",
output_dir = "data/parquet/swfsc.noaa.gov_calcofi-db",
tables = DBI::dbListTables(con),
provider = "swfsc.noaa.gov",
dataset = "calcofi-db",
workflow_url = "https://calcofi.io/workflows/ingest_swfsc.noaa.gov_calcofi-db.html")
} # }