CalCOFI NOAA Fish Larvae Sizes on ERDDAP

Published

2025-03-13

1 Data

Goal: Read the “CalCOFI NOAA Fish Larvae Sizes” dataset from ERDDAP, using the following source and naming conventions:

  • erddap: coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCalCOFIlrvsiz.html
    ERDDAP - CalCOFI NOAA Fish Larvae Sizes - Data Access Form
    • {provider}: coastwatch.pfeg.noaa.gov
    • {dataset}: erdCalCOFIlrvsiz
  • workflow: ingest_{provider}_{dataset}.qmd
  • data: Google Drive calcofi/data/{provider}/{dataset}
    • data: {dataset}.csv
    • metadata: {dataset}_info.csv
  • database definitions: ingest/{provider}/{dataset}
    • tables: tbls_redefine.csv
    • columns: flds_redefine.csv
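
As a minimal sketch of how the `{provider}` and `{dataset}` placeholders above expand into concrete names, the conventions can be written as a small function. `resolve_paths()` is a hypothetical helper for illustration, not part of the workflow, and the paths are relative to the Google Drive and repository roots, which are assumed rather than resolved here.

```r
# Hypothetical helper (not part of the original workflow): expand the
# {provider} / {dataset} naming conventions into concrete relative paths.
resolve_paths <- function(provider, dataset) {
  list(
    workflow   = sprintf("ingest_%s_%s.qmd", provider, dataset),
    data_dir   = file.path("calcofi", "data", provider, dataset),
    data_csv   = sprintf("%s.csv", dataset),
    meta_csv   = sprintf("%s_info.csv", dataset),
    db_def_dir = file.path("ingest", provider, dataset))
}

p <- resolve_paths("coastwatch.pfeg.noaa.gov", "erdCalCOFIlrvsiz")
p$workflow    # workflow notebook name
p$db_def_dir  # folder holding tbls_redefine.csv and flds_redefine.csv
```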
Code
librarian::shelf(
  dplyr, DT, glue, here, readr, rerddap, stringr, tibble)

# variables
dir_data     <- "/Users/bbest/Library/CloudStorage/GoogleDrive-ben@ecoquants.com/My Drive/projects/calcofi/data" 
dir_provider <- "coastwatch.pfeg.noaa.gov"
ds_url       <- "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCalCOFIlrvsiz.html"

# extract ERDDAP url and dataset ID from the dataset URL
ed_pattern  <- "(https://.*)/tabledap/([A-Za-z0-9]+)\\.html"
ed_url      <- str_replace(ds_url, ed_pattern, "\\1") # "https://coastwatch.pfeg.noaa.gov/erddap"
ed_id       <- str_replace(ds_url, ed_pattern, "\\2") # "erdCalCOFIlrvsiz"
dir_dataset <- glue("{dir_data}/{dir_provider}")
d_csv       <- glue("{dir_dataset}/{ed_id}.csv")
m_csv       <- glue("{dir_dataset}/{ed_id}_info.csv")

# create the provider folder (and any missing parents) on first run
if (!dir.exists(dir_dataset))
  dir.create(dir_dataset, recursive = TRUE)

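# fetch dataset metadata (variables, attributes) from the ERDDAP server
ed_info <- info(ed_id, url = ed_url)

# download the full table once, then reuse the local CSV on later runs
if (!file.exists(d_csv)){
  d <- tabledap(ed_info)
  write_csv(d, d_csv)
} else {
  d <- read_csv(d_csv)
}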
dim(d)
[1] 242645     23
Code
head(d) |> 
  datatable()
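
The URL parsing step in the chunk above can be checked in isolation. The following is a sketch that swaps in base R `sub()` and `grepl()` for `stringr::str_replace()`, so it runs without extra packages. The `stopifnot()` guard is an addition, not part of the original code: `sub()` returns its input unchanged when the pattern does not match, so a malformed URL would otherwise pass through silently.

```r
ds_url     <- "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCalCOFIlrvsiz.html"
ed_pattern <- "(https://.*)/tabledap/([A-Za-z0-9]+)\\.html"

# Fail early if the URL does not look like a tabledap data access form
stopifnot(grepl(ed_pattern, ds_url))

ed_url <- sub(ed_pattern, "\\1", ds_url)  # first capture group: ERDDAP base URL
ed_id  <- sub(ed_pattern, "\\2", ds_url)  # second capture group: dataset ID

ed_url
ed_id
```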

2 Metadata

Code
# flatten the ERDDAP metadata into one table, caching it as a CSV
if (!file.exists(m_csv)){
  d_m <- ed_info$alldata |> 
    bind_rows() |> 
    tibble()
  write_csv(d_m, m_csv)
} else {
  d_m <- read_csv(m_csv)
}

d_m |> 
  datatable()
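
Both the data and metadata chunks follow the same pattern: fetch once, write a CSV, and read that CSV on later runs. A minimal sketch of that pattern as a reusable function is shown here. `read_or_fetch()` and `fake_fetch()` are hypothetical names for illustration, and base R `read.csv()` / `write.csv()` stand in for the `readr` functions so the example runs offline without extra packages.

```r
# Hypothetical helper: return the cached CSV if present; otherwise call
# fetch() once and cache its result at `path`.
read_or_fetch <- function(path, fetch) {
  if (!file.exists(path)) {
    d <- fetch()
    write.csv(d, path, row.names = FALSE)
  } else {
    d <- read.csv(path)
  }
  d
}

# Stand-in for a remote download (e.g. a tabledap() call); counts its calls
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(id = 1:3, len_mm = c(2.5, 3.0, 4.1))
}

tmp <- tempfile(fileext = ".csv")
d1  <- read_or_fetch(tmp, fake_fetch)  # no file yet: fetches and caches
d2  <- read_or_fetch(tmp, fake_fetch)  # file exists: reads the cache
calls
```

One consequence of this design is that the cache never refreshes on its own: to pick up new records from the server, the cached CSV has to be deleted before re-running.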

3 Google Drive

Output CSVs in Google Drive:

calcofi / data / coastwatch.pfeg.noaa.gov /

4 TODO: load into database

See: