Validates that CSV files match their redefinition metadata before database ingestion. This function is designed to be called from Quarto notebooks and, when halt_on_fail is TRUE, halts further notebook evaluation if mismatches are detected.

Usage

check_data_integrity(
  d,
  dataset_name = "Dataset",
  halt_on_fail = TRUE,
  display_format = "DT",
  verbose = TRUE
)

Arguments

d

List output from read_csv_files() containing CSV and redefinition data

dataset_name

Name of dataset for display purposes (e.g., "NOAA CalCOFI Database")

halt_on_fail

Logical; if TRUE, sets the knitr option eval = FALSE when the check fails (default: TRUE)

display_format

Format for displaying changes: "DT" (DataTable), "kable", or "print" (default: "DT")

verbose

Logical, print detailed messages (default: TRUE)

Value

List with:

  • passed: Logical indicating if integrity check passed

  • changes: Full changes object from detect_csv_changes()

  • n_changes: Number of changes detected

  • message: Character string with markdown-formatted message
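
As a rough illustration of how the returned list might be consumed, the sketch below builds a mock result by hand and branches on it. Only the field names come from the list above; the values, the message text, and the helper summarise_result() are invented for illustration and are not part of the package.

```r
# Mock of the documented return structure; all values are illustrative only.
mock_result <- list(
  passed    = FALSE,
  changes   = NULL,   # stands in for the object from detect_csv_changes()
  n_changes = 3L,
  message   = "**Data integrity check failed**: 3 change(s) detected."
)

# Hypothetical helper: condense a result into a one-line status string.
summarise_result <- function(res) {
  status <- if (isTRUE(res$passed)) "PASS" else "FAIL"
  sprintf("[%s] %d change(s) detected", status, res$n_changes)
}

status_line <- summarise_result(mock_result)
cat(status_line, "\n")

# In an `output: asis` chunk, the markdown message could be emitted directly:
cat(mock_result$message, "\n")
```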

Details

The function:

  1. Detects changes between CSV files and redefinitions using detect_csv_changes()

  2. Prints summary statistics of detected changes

  3. Displays a table of changes, rendered according to display_format, if any exist

  4. Returns appropriate status for notebook control flow

When called from a Quarto notebook in an output: asis chunk, this function will render markdown messages and can control chunk evaluation via knitr options.
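
The control flow described above can be sketched roughly as follows. This is not the package's implementation: gate_on_changes() is a hypothetical stand-in that takes a change count directly, and it assumes the halting mechanism is a call to knitr::opts_chunk$set(eval = FALSE), which changes the default for chunks evaluated afterwards.

```r
# Hypothetical gate: a sketch of the idea, not the package's actual code.
gate_on_changes <- function(n_changes, halt_on_fail = TRUE) {
  passed <- n_changes == 0

  if (!passed && halt_on_fail && requireNamespace("knitr", quietly = TRUE)) {
    # Chunks evaluated after this point default to eval = FALSE
    knitr::opts_chunk$set(eval = FALSE)
  }

  # Markdown-formatted message, suitable for an `output: asis` chunk
  msg <- if (passed) {
    "**Integrity check passed.**"
  } else {
    sprintf("**Integrity check failed:** %d change(s) detected.", n_changes)
  }

  list(passed = passed, n_changes = n_changes, message = msg)
}

res_ok  <- gate_on_changes(0)
res_bad <- gate_on_changes(2, halt_on_fail = FALSE)
cat(res_bad$message, "\n")
```

Setting halt_on_fail = FALSE in such a design leaves knitr options untouched, so the caller decides what to do with a failed result.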

Examples

if (FALSE) { # \dontrun{
# In a Quarto notebook chunk with output: asis
d_noaa <- read_csv_files("swfsc.noaa.gov", "calcofi-db")
integrity_check <- check_data_integrity(
  d = d_noaa,
  dataset_name = "NOAA CalCOFI Database"
)

# Continue only if check passed
if (!integrity_check$passed) {
  stop("Data integrity check failed")
}
} # }