Validates that CSV files match their redefinition metadata before they are ingested into the database. This function is designed to be called from Quarto notebooks. If mismatches are detected and halt_on_fail = TRUE, it stops further notebook evaluation by setting the knitr chunk option eval = FALSE.
Usage
check_data_integrity(
  d,
  dataset_name = "Dataset",
  halt_on_fail = TRUE,
  display_format = "DT",
  verbose = TRUE
)
Arguments
- d
List returned by read_csv_files(), containing the CSV and redefinition data
- dataset_name
Name of dataset for display purposes (e.g., "NOAA CalCOFI Database")
- halt_on_fail
Logical; if TRUE, set the knitr chunk option eval = FALSE when the check fails, so subsequent chunks are not evaluated (default: TRUE)
- display_format
Format for displaying changes: "DT" (DataTable), "kable", or "print" (default: "DT")
- verbose
Logical, print detailed messages (default: TRUE)
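The three display_format values correspond to three common R rendering paths. The following is a minimal sketch, assuming the changes can be shown as a data frame (the real detect_csv_changes() object may be structured differently). show_changes() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: dispatch on display_format.
# Assumes `changes` is a data frame; DT and knitr are only needed for their formats.
show_changes <- function(changes, display_format = c("DT", "kable", "print")) {
  display_format <- match.arg(display_format)  # errors on unsupported formats
  switch(display_format,
    DT    = DT::datatable(changes, rownames = FALSE),  # interactive HTML table
    kable = knitr::kable(changes),                     # static markdown table
    print = {                                          # plain console output
      print(changes)
      invisible(changes)
    }
  )
}

show_changes(data.frame(table = "ship", field = "ship_code"), "print")
```

Using match.arg() keeps "DT" as the default while rejecting a misspelled format before anything is rendered.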
Value
A list with:
- passed: Logical indicating whether the integrity check passed
- changes: Full changes object from detect_csv_changes()
- n_changes: Number of changes detected
- message: Markdown-formatted status message (character string)
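As an illustration of how the four documented fields relate, here is a minimal sketch. It assumes, purely for illustration, that the changes are a data frame with one row per detected mismatch; the real changes object from detect_csv_changes() may differ. build_integrity_result() is a hypothetical name, and the message wording is invented.

```r
# Hypothetical sketch of assembling the documented return value.
# Assumes one row of `changes` per detected mismatch (an illustration only).
build_integrity_result <- function(changes, dataset_name = "Dataset") {
  n_changes <- nrow(changes)
  passed <- n_changes == 0
  message <- if (passed) {
    sprintf("**%s**: all CSV files match their redefinitions.", dataset_name)
  } else {
    sprintf("**%s**: %d change(s) detected between CSV files and redefinitions.",
            dataset_name, n_changes)
  }
  list(passed = passed, changes = changes, n_changes = n_changes, message = message)
}
```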
Details
The function:
- Detects changes between CSV files and redefinitions using detect_csv_changes()
- Prints summary statistics of the detected changes
- Displays a table of changes, if any exist, in the format chosen by display_format
- Returns a status suitable for notebook control flow
When called from a Quarto notebook chunk with output: asis, the function's markdown messages are rendered as formatted text, and chunk evaluation can be controlled through knitr options (see halt_on_fail).
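The halting mechanism described above can be sketched with knitr's chunk-option defaults. This is a minimal illustration, not the package's code; emit_status() is a hypothetical helper that assumes the pass/fail status and message have already been computed.

```r
# Hypothetical helper illustrating the halt_on_fail mechanism; not the package's code.
emit_status <- function(passed, message, halt_on_fail = TRUE) {
  # In an `output: asis` chunk, cat() output is rendered as markdown
  cat(message, "\n\n")
  if (!passed && halt_on_fail) {
    # Changes the default for all subsequent chunks in this document
    knitr::opts_chunk$set(eval = FALSE)
  }
  invisible(passed)
}
```

Because knitr::opts_chunk$set() changes defaults rather than stopping R, a later chunk that explicitly sets eval: true will still run.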
Examples
if (FALSE) { # \dontrun{
# In a Quarto notebook chunk with output: asis
d_noaa <- read_csv_files("swfsc.noaa.gov", "calcofi-db")
integrity_check <- check_data_integrity(
  d = d_noaa,
  dataset_name = "NOAA CalCOFI Database"
)

# Continue only if the check passed
if (!integrity_check$passed) {
  stop("Data integrity check failed")
}
} # }