R package for accessing and visualizing CalCOFI data. Connect directly to the CalCOFI database via DuckDB or use the CalCOFI API.
Install
This package lives on GitHub and is not yet on CRAN, so run the following to install or update it:
remotes::install_github("calcofi/calcofi4r")
Then load the package:
library(calcofi4r)
Quick Start
Connect to CalCOFI Database
Access the CalCOFI integrated database directly via DuckDB:
# connect to latest frozen release
con <- cc_get_db()
# list available tables
cc_list_tables()
#> [1] "bottle" "bottle_measurement" "cast_condition"
#> [4] "casts" "cruise" "grid"
#> [7] "ichthyo" "lookup" "measurement_type"
#> ...
# query with SQL
DBI::dbGetQuery(con, "SELECT COUNT(*) FROM ichthyo")
Read Data with Convenience Functions
# read ichthyoplankton (larvae) data
ichthyo <- cc_read_ichthyo()
# read bottle samples
bottles <- cc_read_bottle()
# read cast data
casts <- cc_read_casts()
# read species taxonomy
species <- cc_read_species()
# get measurement types
cc_list_measurement_types()
# filter while reading (uses dplyr syntax, returns lazy table)
anchovy <- cc_read_ichthyo(species_id == 19, collect = FALSE)
Version Control
Access specific database versions for reproducibility:
# list available versions
cc_list_versions()
#>       version release_date tables total_rows size_mb is_latest
#> 1 v2026.05.19   2026-05-19     28  133022102  5503.6      TRUE
# connect to specific version
con <- cc_get_db(version = "v2026.05.19")
# get release information
cc_db_info("v2026.05.19")
# view release notes
cc_release_notes("v2026.05.19")
Execute Custom Queries
# run SQL queries
results <- cc_query("
  SELECT species_id, COUNT(*) AS n
  FROM ichthyo
  GROUP BY species_id
  ORDER BY n DESC
  LIMIT 10")
# describe table schema (descriptions + units from metadata.json sidecar)
cc_describe_table("ichthyo")
cc_describe_table("casts")
CalCOFI API Functions
The package also provides functions for the CalCOFI API at api.calcofi.io:
# get available variables
get_variables()
# get cruise information
get_cruises()
# get interpolated raster
get_raster(
  variable = "ctdcast_bottle.t_deg_c",
  cruise_id = "2020-01-05-C-33RL",
  out_tif = "temperature.tif")
# get time series summary
get_timeseries(
  variable = "ctdcast_bottle.t_deg_c",
  aoi_wkt = "POLYGON((-121 33, -119 33, -119 35, -121 35, -121 33))",
  depth_m_min = 0,
  depth_m_max = 100,
  time_step = "year")
Package Data
The package includes small lookup and example datasets:
# CalCOFI sampling grid
cc_grid
cc_grid_ctrs
cc_grid_zones
# example bottle data
cc_bottle
# station locations
stations
# geographic places
cc_places
Data Architecture
CalCOFI data is stored in frozen DuckLake releases on Google Cloud Storage:
gs://calcofi-db/ducklake/releases/
├── v2026.05.19/
│   ├── catalog.json         # table list, row counts, total_size
│   ├── relationships.json   # primary + foreign keys
│   ├── metadata.json        # table/column descriptions, units, datasets, measurement types
│   ├── RELEASE_NOTES.md
│   └── parquet/
│       ├── bottle.parquet
│       ├── casts.parquet
│       ├── ichthyo.parquet
│       ├── species.parquet
│       └── ...
├── versions.json
└── latest.txt → v2026.05.19
Data is accessed directly via DuckDB’s httpfs extension; no download is required for queries.
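As a rough illustration of what direct access through httpfs can look like without the package helpers, the sketch below queries one release Parquet file from plain DuckDB. The HTTPS URL pattern (storage.googleapis.com in front of the bucket path) and the helper names release_parquet_url() and count_remote_rows() are assumptions made for this example, not part of calcofi4r, and the bucket may require different access settings.

```r
# Hypothetical helper: build an HTTPS URL for one table in a release.
# ASSUMPTION: the gs://calcofi-db bucket is publicly readable over HTTPS.
release_parquet_url <- function(version, table) {
  sprintf(
    "https://storage.googleapis.com/calcofi-db/ducklake/releases/%s/parquet/%s.parquet",
    version, table)
}

# Hypothetical helper: count rows in a remote Parquet file
# without downloading the whole file first.
count_remote_rows <- function(url) {
  con <- DBI::dbConnect(duckdb::duckdb())           # in-memory DuckDB
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE))  # always close the connection
  DBI::dbExecute(con, "INSTALL httpfs")             # fetch the extension if missing
  DBI::dbExecute(con, "LOAD httpfs")
  sql <- sprintf("SELECT COUNT(*) AS n FROM read_parquet('%s')", url)
  DBI::dbGetQuery(con, sql)$n
}

url <- release_parquet_url("v2026.05.19", "ichthyo")
# count_remote_rows(url)  # needs network access plus the DBI and duckdb packages
```

With httpfs, DuckDB generally fetches only the parts of a Parquet file that a query needs, which is why an aggregate like this can stay cheap even on a large table. For everyday use, cc_get_db() and cc_query() remain the simpler route.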
See also
- CalCOFI Schema: per-release ERD, tables, columns (units + descriptions), datasets, and measurement-type registry. Uses the same metadata.json sidecar that powers cc_describe_table() and cc_db_catalog().
- CalCOFI Query: browser-only DuckDB-WASM playground against the public release Parquet.
- CalCOFI Docs: data access, helpers, portals, API.
Code of Conduct
This is an open-source project, so your input is very welcome. Please note that the calcofi4r project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.