Data
Goal: Read in the “CalCOFI NOAA Fish Larvae Sizes” dataset from ERDDAP with the following:
- erddap: coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCalCOFIlrvsiz.html
  - {provider}: coastwatch.pfeg.noaa.gov
  - {dataset}: erdCalCOFIlrvsiz
- workflow: ingest_{provider}_{dataset}.qmd
- data: Google Drive calcofi/data/{provider}/{dataset}
  - data: {dataset}.csv
  - metadata: {dataset}_info.csv
- database definitions: ingest/{provider}/{dataset}
  - tables: tbls_redefine.csv
  - columns: flds_redefine.csv
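As a rough illustration of how the {provider} and {dataset} placeholders map onto these paths, here is a minimal base R sketch. The helper name `erddap_paths()`, its default `dir_data`, and the list element names are hypothetical and not part of the original workflow; it also assumes the URL has the form `https://{provider}/erddap/tabledap/{dataset}.html`.

```r
# Hypothetical helper (not part of the original workflow): derive the
# {provider} and {dataset} placeholders from an ERDDAP tabledap URL and
# build the file paths listed above. Uses base R only.
erddap_paths <- function(ds_url, dir_data = "calcofi/data") {
  # assumed URL shape: https://{provider}/erddap/tabledap/{dataset}.html
  pattern <- "^https?://([^/]+)/erddap/tabledap/([A-Za-z0-9_]+)\\.html$"
  if (!grepl(pattern, ds_url))
    stop("Not a recognized ERDDAP tabledap URL: ", ds_url)

  provider <- sub(pattern, "\\1", ds_url)
  dataset  <- sub(pattern, "\\2", ds_url)
  dir_dataset <- file.path(dir_data, provider, dataset)

  list(
    provider = provider,
    dataset  = dataset,
    workflow = paste0("ingest_", provider, "_", dataset, ".qmd"),
    data_csv = file.path(dir_dataset, paste0(dataset, ".csv")),
    info_csv = file.path(dir_dataset, paste0(dataset, "_info.csv")),
    dir_defs = file.path("ingest", provider, dataset))
}

p <- erddap_paths(
  "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCalCOFIlrvsiz.html")
p$workflow
p$data_csv
```

Keeping the path logic in one function means a second ERDDAP dataset would only need a different URL, though the actual workflow may prefer to set these variables explicitly.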
# load packages, installing any that are missing
librarian::shelf(
  dplyr, DT, glue, here, readr, rerddap, stringr, tibble)
# variables
dir_data <- "/Users/bbest/Library/CloudStorage/GoogleDrive-ben@ecoquants.com/My Drive/projects/calcofi/data"
dir_provider <- "coastwatch.pfeg.noaa.gov"
ds_url <- "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCalCOFIlrvsiz.html"
# extract ERDDAP url and dataset ID from the dataset URL
ed_pattern <- "(https://.*)/tabledap/([A-Za-z0-9]+)\\.html"
ed_url <- str_replace(ds_url, ed_pattern, "\\1") # "https://coastwatch.pfeg.noaa.gov/erddap"
ed_id <- str_replace(ds_url, ed_pattern, "\\2") # "erdCalCOFIlrvsiz"
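# dataset folder follows the layout above: {dir_data}/{provider}/{dataset}
dir_dataset <- glue("{dir_data}/{dir_provider}/{ed_id}")
d_csv <- glue("{dir_dataset}/{ed_id}.csv")
m_csv <- glue("{dir_dataset}/{ed_id}_info.csv")
# create the folder, including any missing parent folders
if (!dir.exists(dir_dataset))
  dir.create(dir_dataset, recursive = TRUE)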
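# fetch dataset metadata from the ERDDAP server
ed_info <- info(ed_id, url = ed_url)
# download the data once, then reuse the cached CSV on later runs
if (!file.exists(d_csv)){
  d <- tabledap(ed_info)
  write_csv(d, d_csv)
} else {
  d <- read_csv(d_csv)
}
# number of rows and columns
dim(d)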
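The code above defines `m_csv` but never writes the metadata file. One possible way to do so is sketched below, assuming `ed_info$alldata` (as returned by `rerddap::info()`) is a named list of data frames, one per variable, that all share the same columns. The helper `flatten_info()` and the stand-in list `alldata_demo` are hypothetical; the demo values are illustrative only, not real metadata, so the sketch runs without network access.

```r
# Hypothetical helper: stack the per-variable metadata tables into one
# data frame. Assumes every element of `alldata` has the same columns.
flatten_info <- function(alldata) {
  m <- do.call(rbind, alldata)
  rownames(m) <- NULL  # drop the "variable.1"-style row names added by rbind
  m
}

# Stand-in for ed_info$alldata (illustrative values, not real metadata)
alldata_demo <- list(
  NC_GLOBAL = data.frame(
    row_type = "attribute", variable_name = "NC_GLOBAL",
    attribute_name = "title", data_type = "String",
    value = "demo title", stringsAsFactors = FALSE),
  time = data.frame(
    row_type = c("variable", "attribute"), variable_name = "time",
    attribute_name = c("", "units"), data_type = c("double", "String"),
    value = c("", "seconds since 1970-01-01T00:00:00Z"),
    stringsAsFactors = FALSE))

m <- flatten_info(alldata_demo)

# Write to a temporary file here; the workflow would write to m_csv instead
m_csv_demo <- file.path(tempdir(), "demo_info.csv")
write.csv(m, m_csv_demo, row.names = FALSE)
```

In the workflow itself this would presumably reduce to something like `write_csv(flatten_info(ed_info$alldata), m_csv)`, guarded by `file.exists(m_csv)` in the same way as the data download.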
TODO: load into database
See: