calcofi4db 2.6.2

Invert consolidation, pipeline exclusions, and missing species corrections

  • consolidate_ichthyo_tables() gains invert_tbl parameter — folds Ed Weber’s inverts.csv into the unified ichthyo table with life_stage = "invert".
  • build_targets_list() gains exclude parameter — skip targets by name (e.g., exclude = "ingest_calcofi_ctd-cast"). Excluded targets are also stripped from other targets’ dependency lists. Normalizes hyphens to underscores for matching.
  • apply_data_corrections() adds 6 missing invert species (including Market squid, Doryteuthis opalescens) sourced from ERDDAP erdCalCOFIinvcnt. Dynamically matches columns to avoid errors when gbif_id hasn’t been added yet.
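The exclusion behaviour described for build_targets_list() can be sketched in base R. This is a minimal illustration, not the package's implementation: drop_excluded() and normalize_name() are hypothetical helpers, and the target names other than ingest_calcofi_ctd-cast are made up for the example.

```r
# Hypothetical helpers (not part of calcofi4db): drop excluded targets and
# strip them from the remaining targets' dependency lists.
normalize_name <- function(x) gsub("-", "_", x, fixed = TRUE)

drop_excluded <- function(deps, exclude) {
  # deps: named list mapping target name -> character vector of dependencies
  excl <- normalize_name(exclude)
  keep <- !(normalize_name(names(deps)) %in% excl)
  # for every surviving target, remove dependencies on excluded targets
  lapply(deps[keep], function(d) d[!(normalize_name(d) %in% excl)])
}

# Illustrative dependency graph (names are assumptions)
deps <- list(
  ingest_calcofi_ctd_cast = character(0),
  ingest_calcofi_bottle   = character(0),
  release_database        = c("ingest_calcofi_ctd_cast", "ingest_calcofi_bottle")
)

pruned <- drop_excluded(deps, exclude = "ingest_calcofi_ctd-cast")
names(pruned)            # excluded target is gone
pruned$release_database  # and no longer listed as a dependency
```

Normalizing both sides before matching is what lets the hyphenated form in exclude match an underscore-named target.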

calcofi4db 2.6.1

Sorted parquet output with ST_Hilbert spatial ordering

  • sort_by parameter: write_parquet_outputs() gains a sort_by named list to specify row ordering per table. Sorted row groups enable predicate pushdown (min/max statistics skip irrelevant chunks).
  • Hilbert spatial sort: Use "hilbert:lon_col,lat_col" syntax in sort_by to order rows by ST_Hilbert() curve position, which clusters spatially nearby records for fast bounding-box queries.
  • paste0() in COPY TO: SQL construction in write_parquet_outputs() uses paste0() instead of glue::glue() to prevent cli {variable} interpolation errors when propagating through targets.
  • sort_by in manifest.json: Sort specifications recorded alongside partition_by for downstream consumers.
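As a rough sketch of how a sort_by entry might be turned into an ORDER BY clause and a COPY TO statement built with paste0(): the helper sort_spec_to_order_by(), the table and column names, and the output path are all hypothetical, and the exact SQL that write_parquet_outputs() emits (including how it calls ST_Hilbert()) is an assumption here.

```r
# Hypothetical parser for one sort_by entry; returns an ORDER BY expression.
sort_spec_to_order_by <- function(spec) {
  if (length(spec) == 1 && startsWith(spec, "hilbert:")) {
    cols <- trimws(strsplit(sub("^hilbert:", "", spec), ",", fixed = TRUE)[[1]])
    stopifnot(length(cols) == 2)
    # Assumed SQL form: ST_Hilbert() applied to a point built from lon/lat
    paste0("ST_Hilbert(ST_Point(", cols[1], ", ", cols[2], "))")
  } else {
    # plain column sort: one or more column names
    paste(spec, collapse = ", ")
  }
}

# Illustrative sort_by list (table and column names are assumptions)
sort_by <- list(
  casts  = "hilbert:lon_dec,lat_dec",
  bottle = c("cast_id", "depth_m")
)
order_by <- vapply(sort_by, sort_spec_to_order_by, character(1))

# Build the COPY statement with paste0(), so no {braces} are interpolated
copy_sql <- paste0(
  "COPY (SELECT * FROM casts ORDER BY ", order_by[["casts"]], ") ",
  "TO 'parquet/casts.parquet' (FORMAT PARQUET)"
)
copy_sql
```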

calcofi4db 2.6.0

Native GEOMETRY storage via DuckDB v1.5 — removes spatial workaround

  • storage_compatibility_version = 'latest': get_duckdb_con() now sets this in the default config, enabling DuckDB v1.5’s native built-in GEOMETRY type. This fixes the “Buffer overflow” / “Skipping beyond end of binary data” spatial serialization bug that occurred with the old v0.10.2 storage format.
  • Removed geom_wkb workaround: assign_grid_key() no longer refreshes grid geometry from a stored WKB column, because native GEOMETRY storage is reliable.
  • Requires duckdb >= 1.5.1: Added minimum version constraint in DESCRIPTION to ensure the native GEOMETRY type is available.
  • Avoid glue in spatial.R: assign_grid_key() uses paste0() instead of glue::glue() to prevent cli from intercepting {variable} patterns in error messages propagated through targets.

calcofi4db 2.5.6 (superseded)

Grid geometry refresh workaround for DuckDB spatial bug (removed in 2.6.0)

calcofi4db 2.5.5

Server-side GCS copy for archives & sync_to_gcs replaces put_gcs_file loops

  • Server-side archive copy: .sync_to_gcs_archive() now checks _sync/{provider}/{dataset}/ on GCS before uploading from local. If a file exists with matching MD5, it uses copy_gcs_file() for an instant server-side copy, with no local I/O or GD mount needed.
  • copy_gcs_file(src, dst): New helper for server-side GCS-to-GCS copy via gcloud storage cp.
  • Bottle & DIC uploads: Replaced put_gcs_file() loops in QMDs with sync_to_gcs() for hash-based deduplication (idempotent re-renders).
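The MD5-based decision between a server-side copy and a local upload might look roughly like the sketch below. choose_upload_action() is a hypothetical helper, not a calcofi4db function, and it assumes the staged object's MD5 is available as a hex string in the same encoding that tools::md5sum() returns.

```r
# Hypothetical decision helper: compare the local file's MD5 against the MD5
# of an already-staged object (assumed to be hex-encoded, NA if absent).
choose_upload_action <- function(local_path, staged_md5) {
  local_md5 <- unname(tools::md5sum(local_path))
  if (!is.na(staged_md5) && identical(local_md5, staged_md5)) {
    "server_side_copy"   # same bytes already on GCS: copy bucket-to-bucket
  } else {
    "upload_from_local"  # missing or different: upload the local file
  }
}

# Usage with a throwaway file
f <- tempfile(fileext = ".csv")
writeLines("cruise,station", f)
staged <- unname(tools::md5sum(f))        # pretend this hash came from GCS

choose_upload_action(f, staged)           # hashes match
choose_upload_action(f, NA_character_)    # nothing staged yet
```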

calcofi4db 2.5.4

Consolidated sync_to_gcs() with archive mode, exclude patterns & GCS logging

  • Unified sync function: sync_to_gcs() gains archive, exclude, and log_to_gcs parameters. When archive = TRUE, it creates timestamped immutable snapshots (replacing sync_to_gcs_archive() internals). When archive = FALSE (default), it runs in standard mirror mode.
  • Exclude patterns: New exclude parameter accepts glob patterns (e.g., c(".DS_Store", "*.tmp")) to skip files during sync.
  • GCS action logging: log_to_gcs = TRUE writes a timestamped JSON log to gs://{bucket}/{prefix}/_logs/sync_YYYY-MM-DD_HHMMSS.json documenting every upload, skip, and delete.
  • Richer results: Sync results tibble now includes size and reason columns (e.g., “checksum match”, “new file”, “crc32c changed”).
  • sync_to_gcs_archive() deprecated: Now a thin wrapper calling sync_to_gcs(archive = TRUE). Existing callers work unchanged.
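A minimal base-R sketch of glob-based exclusion and the timestamped log name follows. filter_excluded() is a hypothetical helper; it assumes patterns are matched against file base names, which may differ from how sync_to_gcs() actually applies exclude.

```r
# Hypothetical filter: drop files whose base name matches any glob pattern.
filter_excluded <- function(paths, exclude) {
  if (length(exclude) == 0) return(paths)
  rx <- utils::glob2rx(exclude)  # convert globs to anchored regular expressions
  hit <- vapply(
    basename(paths),
    function(f) any(vapply(rx, grepl, logical(1), x = f)),
    logical(1)
  )
  paths[!hit]
}

paths <- c("data/bottle.parquet", "data/.DS_Store", "data/scratch.tmp")
kept  <- filter_excluded(paths, exclude = c(".DS_Store", "*.tmp"))
kept

# Log file name in the sync_YYYY-MM-DD_HHMMSS.json pattern
log_name <- sprintf("sync_%s.json", format(Sys.time(), "%Y-%m-%d_%H%M%S"))
log_name
```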

calcofi4db 2.5.3

DuckDB driver lifecycle, idempotent ingestion & defensive ALTER TABLE

calcofi4db 2.5.2

VIEWs for dependencies, GCS server-side copy, crc32c sync & spatial consolidation

  • VIEW-based dependency loading: load_prior_tables() gains an as_view parameter that creates VIEWs instead of TABLEs for zero-copy parquet reads. Dependency tables are no longer duplicated across ingests.
  • calcofi.modifies frontmatter: New YAML field declares which dependency tables an ingest modifies (e.g., ship). parse_qmd_frontmatter() parses it; build_release_table_registry() discovers _new delta sidecars from the filesystem.
  • GCS server-side copy for releases: release_database.qmd copies parquet from ingest/ to releases/ on GCS via gcloud storage cp instead of re-uploading from local. Only derived/merged tables are exported locally.
  • crc32c hash comparison: sync_to_gcs() uses gcloud storage ls --json for crc32c hashes; list_gcs_files() returns a crc32c column. Unchanged files are skipped entirely.
  • Stale file cleanup: sync_to_gcs() gains a delete_stale parameter to remove orphaned GCS files after partition key or table renames.
  • export_parquet(): New helper using DuckDB native COPY TO PARQUET. It handles GEOMETRY columns (as WKB) and is preferred over arrow::write_parquet().
  • build_release_table_registry(): Auto-discovers table-to-ingest mapping from manifests with canonical source marking for duplicates.
  • Archive listing fix: get_latest_archive_timestamp() uses non-recursive gcloud storage ls instead of the recursive --json scan that was hanging on large archives.
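To illustrate the VIEW-versus-TABLE distinction behind as_view, the sketch below builds the kind of DuckDB statement that could back each mode. load_sql() is a hypothetical helper and the parquet path is made up; the SQL that load_prior_tables() actually runs is not shown in this changelog.

```r
# Hypothetical SQL builder: a VIEW reads the parquet file lazily on each query,
# while a TABLE copies its rows into the database.
load_sql <- function(tbl, parquet_path, as_view = TRUE) {
  kind <- if (as_view) "VIEW" else "TABLE"
  paste0(
    "CREATE OR REPLACE ", kind, " ", tbl,
    " AS SELECT * FROM read_parquet('", parquet_path, "')"
  )
}

sql_view  <- load_sql("ship", "parquet/ship.parquet")
sql_table <- load_sql("ship", "parquet/ship.parquet", as_view = FALSE)
sql_view
sql_table
```

With a DBI connection to DuckDB, either string could be passed to DBI::dbExecute(); identifiers and paths are pasted unquoted here, so this form is only suitable for trusted inputs.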

calcofi4db 2.5.1

Mismatch tracking, supplemental table support, targets integration & bug fix

calcofi4db 2.5.0

Simplified provider/dataset naming, taxonomy & workflow improvements

calcofi4db 2.4.0

Use _uuid over _id, smarter sync with GCS

  • Revert from integer _id to _uuid as the preferred unique identifiers for the SWFSC ichthyo db
  • Use smarter synchronization with GCS, based on MD5 hash checks and modification-time file naming

calcofi4db 2.3.0

Addition of ship, taxonomy functions

Added helper functions for processing:

calcofi4db 2.2.1

Addition of spatial, parquet, viz helper functions

calcofi4db 2.2.0

Improvements to cloud plan functions

The workflow ingest_swfsc.noaa.gov_calcofi-db.qmd now fully automates ingestion of the CalCOFI database from the SWFSC NOAA archive to parquet files in Google Cloud Storage. Many new functions were added.

calcofi4db 2.1.0

Addition of functions for phase 2 of cloud plan

  • Added ducklake and freeze functions. Updated documentation with concepts.

calcofi4db 1.2.0

Addition of functions for phase 1 of cloud plan

calcofi4db 1.1.0

Addition of CalCOFI Bottle Database

calcofi4db 1.0.0

Initial production release with NOAA CalCOFI Database

  • Complete NOAA CalCOFI Database ingestion with spatial features
  • Add synchronized versioning system for package and database
  • Create master ingestion workflow with integrity checks
  • Implement comprehensive metadata management

calcofi4db 0.1.1