CalCOFI Query

release v2026.05.20 Ready β€” pick a query and click Run.

CalCOFI Query

Pick a query on the left, fill the form, click Run. The SQL runs in your browser via DuckDB-WASM against the public CalCOFI release Parquet on Google Cloud Storage. No server, no credentials, no install.

What runs where

   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β‘  WASM bundle (cached after first load, ~5 MB)
   β”‚             β”‚ ─────────────────────────────►  jsDelivr CDN
   β”‚ Your browserβ”‚
   β”‚ (DuckDB-WASMβ”‚    β‘‘ Parquet range requests (no auth, CORS open)
   β”‚  in a workerβ”‚ ─────────────────────────────►  storage.googleapis.com
   β”‚   thread)   β”‚                                  ↳ /calcofi-db/ducklake/releases/…
   β”‚             β”‚ ◄─────────────────────────────
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β‘’ Arrow result rows

Nothing else. No CalCOFI server is involved β€” DuckDB-WASM does the SQL planning, the HTTP range requests, and the joins, all on your machine.

Speed expectations

Phase Time
First click β€” fetch & instantiate the DuckDB-WASM bundle from jsDelivr ~3–6 s
First query β€” fetch Parquet footers + first row groups over HTTPS ~10–20 s
Subsequent clicks β€” same query / same tables, warm caches sub-second for Browse / Quick facts
Bio ↔ Env Matching queries (complex CTEs, joins across 5 tables) ~5–15 s warm

Your browser caches the WASM bundle; DuckDB’s httpfs caches Parquet ranges in memory across queries. The bigger second-and-onwards cost is the query itself β€” wide scans of ctd_thin (5.5 M rows) or bottle_measurement (11 M rows) take a few seconds even when warm.

Caveats

  • Bundle weight β€” the WASM bundle is ~5 MB, downloaded once and cached. Don’t try this on cellular if you can help it.
  • Match windows β€” the Bio ↔ Env Matching queries default to relaxed (5 km / 72 hr). Tighter windows (2 km / 6 hr) return fewer rows; see Matching Helpers.
  • Env data ends 2021-05 β€” CTD-bottle environmental observations stop in May 2021, while net-tow biological data runs later. So bio↔env matches against dates after 2021-05 return zero rows. The recurring worked example uses Q1 2018 for this reason.
  • Reproducibility β€” every query takes a version parameter (default v2026.05.20). Pin the version explicitly (e.g. v2026.05.14) for archival reproducibility β€” every read_parquet() URL in the emitted SQL then carries that version.

The same query, everywhere

Where How
R on your laptop calcofi4r::cc_match_ichthyo_by_name(...) β€” emits & runs the same SQL; attr(d, "sql") hands it back
Python on a notebook server duckdb.connect().sql(open("query.sql").read()).df()
shell on the command line duckdb < query.sql
your web browser, no install This page. Run any query, click Copy SQL, paste it anywhere

The SQL is byte-identical across all four β€” verified in CalCOFI/docs and the bio-env-matching vignette.

Where to read more