Compares MD5 hashes across the archive timestamps of a given provider/dataset. When multiple archives have identical content, the earliest is kept and the rest are removed.

Usage

cleanup_duplicate_archives(
  provider,
  dataset,
  gcs_bucket = "calcofi-files-public",
  archive_prefix = "archive",
  dry_run = TRUE
)

Arguments

provider

Data provider (e.g., "swfsc.noaa.gov")

dataset

Dataset name (e.g., "calcofi-db")

gcs_bucket

GCS bucket name

archive_prefix

Archive folder prefix

dry_run

If TRUE (default), only reports what would be removed; nothing is deleted

Value

Tibble of removed (or would-be-removed) archive timestamps

Examples

if (FALSE) { # \dontrun{
# preview what would be removed
cleanup_duplicate_archives("swfsc.noaa.gov", "calcofi-db")

# actually remove duplicates
cleanup_duplicate_archives("swfsc.noaa.gov", "calcofi-db", dry_run = FALSE)
} # }
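
The keep-the-earliest rule from the description can be illustrated on an in-memory table, without touching GCS. This is only a sketch of the idea, not the function's actual implementation: the `archives` data frame, its column names, the timestamp format, and the `signature_of()` helper are all hypothetical, and it assumes timestamps sort chronologically as strings.

```r
# Hypothetical listing: one row per archived file, with its archive
# timestamp and MD5 hash (names and formats are assumptions).
archives <- data.frame(
  timestamp = c("2024-01-01_000000", "2024-01-01_000000",
                "2024-02-01_000000", "2024-02-01_000000",
                "2024-03-01_000000", "2024-03-01_000000"),
  file      = rep(c("a.csv", "b.csv"), 3),
  md5       = c("h1", "h2", "h1", "h2", "h1", "h3"),
  stringsAsFactors = FALSE
)

# Collapse each archive into one content signature built from its
# file/hash pairs, sorted by file name so row order does not matter.
signature_of <- function(d) {
  d <- d[order(d$file), ]
  paste(d$file, d$md5, sep = "=", collapse = ";")
}
signatures <- vapply(split(archives, archives$timestamp),
                     signature_of, character(1))

# Order archives from earliest to latest, then flag every archive whose
# signature already appeared in an earlier one.
signatures <- signatures[order(names(signatures))]
to_remove  <- names(signatures)[duplicated(signatures)]
to_remove
```

Here the second archive repeats the first and is flagged, while the third differs in one file's hash and is kept.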