Merge Per-Ingest metadata.json into a Release-Level Sidecar
Source: R/wrangle.R
merge_metadata_json.Rd
Combines per-ingest metadata.json files (produced by
build_metadata_json()) into a single release-level metadata.json.
Adds release-only tables and columns from CSV registries plus optional
dataset.csv and measurement_type.csv blocks. Emits schema
version "1.1" alongside catalog.json and
relationships.json in a frozen release directory.
Usage
merge_metadata_json(
  paths,
  output_path,
  release_version = NULL,
  release_tables_csv = NULL,
  release_columns_csv = NULL,
  measurement_type_csv = NULL,
  dataset_csv = NULL,
  ingest_yaml = NULL,
  table_rows = NULL
)
Arguments
- paths
Character vector of paths to per-ingest metadata.json files.
- output_path
Path for the merged output file.
- release_version
Optional release version string (e.g. "v2026.05.14") written to the top-level release_version field.
- release_tables_csv
Optional path to a CSV with columns table, name_long, description_md, provider, dataset describing tables built inside release_database.qmd that have no per-ingest metadata.json (e.g. cruise_summary, _spatial).
- release_columns_csv
Optional path to a CSV with columns table, column, name_long, units, description_md for release-only columns.
- measurement_type_csv
Optional path to metadata/measurement_type.csv. When supplied, populates the measurement_types block with one entry per canonical type.
- dataset_csv
Optional path to metadata/dataset.csv. Deprecated fallback for the datasets block; superseded by ingest_yaml. When both are supplied, ingest_yaml wins.
- ingest_yaml
Optional named list from read_ingest_yaml() (keyed by provider_dataset). When supplied, the datasets block and the erd_legend are built from each ingest's calcofi YAML (authoritative source) rather than dataset_csv.
- table_rows
Optional named numeric vector (table name → release-final row count, e.g. from freeze stats). Used as the denominator when computing per-dataset contribution percentages.
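As an illustration of the table_rows shape and its role as a denominator, the sketch below builds a named numeric vector and derives contribution percentages from it. The per-dataset counts, the dataset_rows data frame and the pct column are hypothetical; the arithmetic the function performs internally may differ.

```r
# Release-final row counts, keyed by table name (the shape table_rows expects).
table_rows <- c(cruise = 1200, bottle = 50000)

# Hypothetical per-dataset row counts, for illustration only.
dataset_rows <- data.frame(
  table   = c("cruise", "cruise", "bottle"),
  dataset = c("swfsc_ichthyo", "calcofi_bottle", "calcofi_bottle"),
  n_rows  = c(300, 900, 50000),
  stringsAsFactors = FALSE
)

# Each dataset's share of its table, using table_rows as the denominator.
dataset_rows$pct <- 100 * dataset_rows$n_rows /
  unname(table_rows[dataset_rows$table])

print(dataset_rows)
```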
Details
Conflict rule: when the same table or table.column key
appears in multiple per-ingest files, the last path wins, but a warning
lists the duplicates so genuine drift between ingests is surfaced.
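The conflict rule can be pictured with a minimal sketch, assuming each per-ingest file has already been parsed into a named list keyed by table. The helper merge_last_wins() and the sample entries are hypothetical and are not part of the package; they only mimic the "last path wins, with a warning" behavior described above.

```r
# Hypothetical helper: merge named lists in order; later entries overwrite
# earlier ones, and any key seen more than once is reported in a warning.
merge_last_wins <- function(entries) {
  keys <- unlist(lapply(entries, names), use.names = FALSE)
  dups <- unique(keys[duplicated(keys)])
  if (length(dups) > 0) {
    warning("Duplicate keys (last path wins): ",
            paste(dups, collapse = ", "), call. = FALSE)
  }
  merged <- list()
  for (entry in entries) {
    merged[names(entry)] <- entry  # overwrite existing keys, append new ones
  }
  merged
}

# Two made-up per-ingest table blocks that both define "cruise".
ingest_a <- list(cruise = list(name_long = "Cruise (from A)"),
                 site   = list(name_long = "Site"))
ingest_b <- list(cruise = list(name_long = "Cruise (from B)"))

# Capture the warning text so it can be inspected alongside the result.
warned <- NULL
merged <- withCallingHandlers(
  merge_last_wins(list(ingest_a, ingest_b)),
  warning = function(w) {
    warned <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(merged$cruise$name_long)
print(warned)
```

Because later paths overwrite earlier ones, the order of paths determines which definition survives, so the most authoritative ingest would go last.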
Examples
if (FALSE) { # \dontrun{
merge_metadata_json(
  paths = c(
    "data/parquet/swfsc_ichthyo/metadata.json",
    "data/parquet/calcofi_bottle/metadata.json",
    "data/parquet/calcofi_ctd-cast/metadata.json",
    "data/parquet/calcofi_dic/metadata.json"),
  output_path = "data/releases/v2026.05.14/metadata.json",
  release_version = "v2026.05.14",
  release_tables_csv = "metadata/release_tables.csv",
  release_columns_csv = "metadata/release_columns.csv",
  measurement_type_csv = "metadata/measurement_type.csv",
  dataset_csv = "metadata/dataset.csv")
} # }