CalCOFI Query
Pick a query on the left, fill the form, click Run. The SQL runs in your browser via DuckDB-WASM against the public CalCOFI release Parquet on Google Cloud Storage. No server, no credentials, no install.
What runs where
βββββββββββββββ β WASM bundle (cached after first load, ~5 MB)
β β ββββββββββββββββββββββββββββββΊ jsDelivr CDN
β Your browserβ
β (DuckDB-WASMβ β‘ Parquet range requests (no auth, CORS open)
β in a workerβ ββββββββββββββββββββββββββββββΊ storage.googleapis.com
β thread) β β³ /calcofi-db/ducklake/releases/β¦
β β ββββββββββββββββββββββββββββββ
βββββββββββββββ β’ Arrow result rows
Nothing else. No CalCOFI server is involved β DuckDB-WASM does the SQL planning, the HTTP range requests, and the joins, all on your machine.
Speed expectations
| Phase | Time |
|---|---|
| First click β fetch & instantiate the DuckDB-WASM bundle from jsDelivr | ~3β6 s |
| First query β fetch Parquet footers + first row groups over HTTPS | ~10β20 s |
| Subsequent clicks β same query / same tables, warm caches | sub-second for Browse / Quick facts |
| Bio β Env Matching queries (complex CTEs, joins across 5 tables) | ~5β15 s warm |
Your browser caches the WASM bundle; DuckDBβs httpfs caches Parquet ranges
in memory across queries. The bigger second-and-onwards cost is the query
itself β wide scans of ctd_thin (5.5 M rows) or bottle_measurement
(11 M rows) take a few seconds even when warm.
Caveats
- Bundle weight β the WASM bundle is ~5 MB, downloaded once and cached. Donβt try this on cellular if you can help it.
- Match windows β the Bio β Env Matching queries default to relaxed (5 km / 72 hr). Tighter windows (2 km / 6 hr) return fewer rows; see Matching Helpers.
- Env data ends 2021-05 β CTD-bottle environmental observations stop in May 2021, while net-tow biological data runs later. So bioβenv matches against dates after 2021-05 return zero rows. The recurring worked example uses Q1 2018 for this reason.
- Reproducibility β every query takes a
versionparameter (defaultv2026.05.20). Pin the version explicitly (e.g.v2026.05.14) for archival reproducibility β everyread_parquet()URL in the emitted SQL then carries that version.
The same query, everywhere
| Where | How |
|---|---|
| R on your laptop | calcofi4r::cc_match_ichthyo_by_name(...) β emits & runs the same SQL; attr(d, "sql") hands it back |
| Python on a notebook server | duckdb.connect().sql(open("query.sql").read()).df() |
| shell on the command line | duckdb < query.sql |
| your web browser, no install | This page. Run any query, click Copy SQL, paste it anywhere |
The SQL is byte-identical across all four β verified in
CalCOFI/docs
and the
bio-env-matching vignette.
Where to read more
- Schema browser β ERD, table/column descriptions, units, measurement-type registry for every release. Use it to pick which columns to project before writing a query.
- Data Access β direct DuckDB + GCS Parquet querying (R, Python, this app)
- Matching Helpers β the
calcofi4rR-package wrappers calcofi4rreference β the R API- Bio β Env Matching vignette β the worked example, faceted maps + scatter, the 2014β2019 marine heatwave
- API β replacement reference β old Plumber endpoints β this app
- How to add a new query β drop a
.mdin the right folder - Source for this app β github.com/CalCOFI/query
Bio β Env Matching Β· by scientific name
Net-tow ichthyoplankton matched to CTD-bottle measurements by scientific
name. Replaces the retired /ichthyodata Plumber
endpoint.
Mirrors calcofi4r::cc_match_ichthyo_by_name() β
the SQL emitted here is character-identical to its attr(d, "sql"). See
the Matching Helpers chapter for the
windows, join methods, and version pinning.
Default form returns 13 rows β the same Q1 2018 sardine-larva worked example used across the docs and the vignette.
Bio β Env Matching Β· by taxon
Net-tow ichthyoplankton matched to CTD-bottle measurements by WoRMS
taxon ID and every descendant. Replaces the retired
/itis_ichthyodata Plumber endpoint, swapping the
dead ITIS path regex for a recursive walk of
taxon.parentNameUsageID over the WoRMS authority.
Look up a worms_id in Browse β species, or paste one from
WoRMS. Default 125724 is the genus
Engraulis β its subtree currently resolves to E. mordax (northern
anchovy) in the CalCOFI taxonomy.
Bio β Env Matching Β· custom (bio + env)
Power-user mode. The engine (cc_match_bio_env)
just needs two SELECT sub-queries with the right column contract β drop
in your own and the temporal interval + spatial ST_Distance_Sphere join
runs on top.
Contract.
bio must yield bio_id (unique per observation), bio_datetime
(TIMESTAMP), bio_lon, bio_lat (decimal degrees) and bio_value
(DOUBLE). Any extras are carried through to the output as grouping
keys.
env must yield exactly env_id, env_datetime, env_lon, env_lat,
env_value, env_depth_m, measurement_type.
Default form is the worked example pre-filled β click Run and you get the same 13 sardine-larva rows.
Bio β Env Matching Β· zooplankton biomass
Net-tow displacement-volume biomass matched to CTD-bottle measurements.
Biomass comes from the net.totalplankton / net.smallplankton columns
(mL per net haul); the env side is the same bottle_measurement β bottle
β casts chain as the other bioβenv queries.
Replaces the retired
/zooplankton_biomass Plumber endpoint. Mirrors
calcofi4r::cc_match_zooplankton_biomass().
Browse Β· cruises
One row per CalCOFI cruise, filtered by the cast datetime_utc falling in
your date range. Columns: cruise_key (YYYY-MM-NODC natural key β see
Database), date span, ship name, number
of casts.
cruise.parquet itself only has 691 rows; the count(*) lookup against
casts.parquet makes the date-window filter meaningful and adds n_casts.
Browse Β· measurement types
The measurement_type registry β every variable that can appear as a
measurement_type value in bottle_measurement, ctd_thin, dic_*,
ichthyo etc. Each row has units and description. Use this list to
pick the right env_var for the bioβenv queries.
Browse Β· species
species.parquet is the projectβs taxonomic registry β 1150 rows β with
scientific_name, common_name, and the three taxonomic-authority IDs
(worms_id, itis_id, gbif_id). Substring filter on either name. Take
the worms_id of a hit and feed it to Bio β Env Matching β by taxon
to get the matching subtree.
Datasets Β· bottle measurements
bottle_measurement β bottle β casts for one measurement_type over a
depth + date window. Most CalCOFI oceanographic variables live here:
temperature, salinity, oxygen, nutrients, chlorophyll-a, sigma-theta,
dynamic height, pH, β¦
The env_var dropdown is populated on first focus from
measurement_type.parquet so it always reflects the release youβre
querying.
Datasets Β· ichthyoplankton
Single-table ichthyoplankton β net-tow counts joined to species, net,
tow, and site, no environmental match. std_tally is the standardized
catch-per-effort: std_haul_factor Γ tally / prop_sorted.
For the same observations matched to CTD-bottle measurements (one
env_value per row), see Bio β Env Matching β by name.
Quick facts Β· release facts
Headline counts and date ranges from the pinned release. A 9-row scorecard:
- cruises Β· casts Β· species Β· ichthyo_rows Β· bottle_measurements β table-level row counts.
- bio_date_start / bio_date_end β net-tow temporal coverage (
tow.time_start). - env_date_start / env_date_end β CTD-bottle env coverage (
casts.datetime_utc). The gap between this and the bio coverage is why the bioβenv matching examples use 2018 dates.
Each read_parquet() URL is bound to version β change it and the whole scorecard refreshes against that release. See available releases on GCS (one folder per release).
Spatial Β· casts in bbox
CTD casts whose decimal lon/lat fall inside the bounding box and whose
datetime_utc falls in the date range. Pasted straight from the CalCOFI
sampling grid: defaults (-125 to -117 lon, 30 to 38 lat) cover the full
historical pattern from Pt. Conception south to Baja and out to ~Sta. 60.
For richer spatial filtering (polygon, transect, distance to shore) use
the SQL shell with ST_Within / ST_Distance_Sphere.
SQL shell Β· shell
Free-form DuckDB SQL against the release Parquet β same engine as the other
queries, no form-driven template. The httpfs and spatial extensions are
already loaded.
Useful for:
- Ad-hoc queries the other forms donβt cover
- Trying out a SQL pattern before turning it into a permanent query (drop a
.mdfile in the right_queries/subfolder) - Joining tables that arenβt paired in the named queries β e.g.
taxa_rank,cast_condition,_spatial,dic_*
For arbitrary SQL with no UI at all, shell.duckdb.org is DuckDBβs official WASM shell. Same engine, no CalCOFI context.
Temporal Β· casts in date range
CTD casts whose datetime_utc falls in your date window β globally, no
spatial filter. Defaults to all of 2018. Pair this with the Browse β
cruises query to see one row per cruise instead of one per cast.