Ch. 1 Process

Figure 1. CalCOFI data workflow.

The original raw data, most often in tabular format [e.g., comma-separated values (*.csv)], are ingested into the database by R scripts. These scripts use functions and lookup data tables in the R package calcofi4r, whose functions are organized into Read, Analyze and Visualize concepts. The database will be spatially enabled by PostGIS for summarizing any and all data by Areas of Interest (AoIs), whether pre-defined (e.g., sanctuaries, MPAs, counties) or arbitrary new areas.

The application programming interface (API) provides a programming-language-agnostic public interface that renders subsets of data and custom visualizations from a set of documented input parameters. Its outputs feed interactive applications (Apps) built with Shiny (or any other web application framework) and reports built with R Markdown (or any other report templating framework).

Finally, R scripts will publish metadata (as Ecological Metadata Language, EML) and data packages (e.g., in Darwin Core format) for discovery on a variety of data portals oriented around slicing tabular or gridded data (ERDDAP), biogeographic analysis (OBIS), long-term archiving (DataONE, NCEI) or metadata discovery (InPort).
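The ingest-then-query pattern above can be sketched in a few lines. This is a minimal illustration only. It assumes Python's built-in `sqlite3` as a stand-in for the PostgreSQL/PostGIS database and for the R ingest scripts. The table, the columns, the toy rows, the lookup dictionary and the `ingest`/`query_subset` functions are all hypothetical and are not part of calcofi4r or the CalCOFI API. The point is the shape of the flow: raw CSV rows are resolved against a lookup table on the way in, and consumers then request subsets through a small, documented set of parameters rather than writing SQL themselves.

```python
import csv
import io
import sqlite3

# Toy raw CSV (hypothetical stations and counts, not real CalCOFI data).
RAW_CSV = """station,species_code,year,count
st1,ENG,2019,12
st1,SAR,2019,3
st2,ENG,2020,7
"""

# Hypothetical lookup table resolving codes to names, standing in for the
# lookup tables distributed with calcofi4r.
SPECIES_LOOKUP = {"ENG": "Engraulis mordax", "SAR": "Sardinops sagax"}


def ingest(conn: sqlite3.Connection, raw_csv: str) -> None:
    """Load raw CSV rows into a table, resolving species codes via the lookup."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS larvae "
        "(station TEXT, species TEXT, year INTEGER, count INTEGER)"
    )
    rows = [
        (r["station"], SPECIES_LOOKUP[r["species_code"]], int(r["year"]), int(r["count"]))
        for r in csv.DictReader(io.StringIO(raw_csv))
    ]
    conn.executemany("INSERT INTO larvae VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def query_subset(conn: sqlite3.Connection, species: str, year_min: int, year_max: int):
    """Return (station, year, count) rows for one species in a year range,
    mimicking an API endpoint driven by documented input parameters."""
    cur = conn.execute(
        "SELECT station, year, count FROM larvae "
        "WHERE species = ? AND year BETWEEN ? AND ? ORDER BY year, station",
        (species, year_min, year_max),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ingest(conn, RAW_CSV)
    print(query_subset(conn, "Engraulis mordax", 2019, 2020))
    # prints [('st1', 2019, 12), ('st2', 2020, 7)]
```

Parameterized queries (`?` placeholders) keep the public interface limited to documented inputs. In the real system, the same subset logic would sit behind the web API, so Shiny apps and R Markdown reports could call it over HTTP regardless of their language.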

  • ERDDAP: great for gridded or tabular data, but does not aggregate on the server or clip to a specific area of interest
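ERDDAP returns the subsetted rows but leaves clipping to an AoI and aggregation to the client. PostGIS would instead do both on the server, e.g. with a spatial join using `ST_Contains` plus `GROUP BY`. The client-side work can be sketched as follows. This is a minimal pure-Python illustration, assuming the rows have already been downloaded as `(lon, lat, value)` tuples and the AoI is a simple polygon given as a vertex list. The function names, the rectangular AoI and the toy values are hypothetical. The ray-casting test treats longitude/latitude as planar coordinates, a rough approximation that PostGIS geography types avoid.

```python
from statistics import mean
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from (x, y) toward +x
    crosses a polygon edge; an odd number of crossings means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarize_in_aoi(
    rows: Iterable[Tuple[float, float, float]], aoi: Sequence[Point]
) -> Tuple[int, Optional[float]]:
    """Clip rows to the AoI, then aggregate: returns (count, mean) or (0, None)."""
    values = [v for lon, lat, v in rows if point_in_polygon(lon, lat, aoi)]
    return (len(values), mean(values)) if values else (0, None)


if __name__ == "__main__":
    # Hypothetical rectangular AoI (lon, lat vertices) and toy observations.
    aoi = [(-121.0, 33.0), (-119.0, 33.0), (-119.0, 35.0), (-121.0, 35.0)]
    rows = [(-120.0, 34.0, 10.0), (-119.5, 33.5, 20.0), (-118.0, 34.0, 99.0)]
    print(summarize_in_aoi(rows, aoi))  # prints (2, 15.0)
```

Doing this on the client means every consumer must download all candidate rows first. That cost is the reason for summarizing by AoI inside the PostGIS-enabled database instead.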