Reads CSV files from a directory and prepares them for ingestion into a database. This function is the primary entry point for the CalCOFI data ingestion workflow. The steps it performs are listed under Details.

Usage

read_csv_files(
  provider,
  dataset,
  dir_data = "~/My Drive/projects/calcofi/data",
  url_gdata =
    "https://drive.google.com/drive/u/0/folders/1xxdWa4mWkmfkJUQsHxERTp9eBBXBMbV7",
  use_gdrive = TRUE,
  email = "ben@ecoquants.com"
)

Arguments

provider

Data provider (e.g., "swfsc.noaa.gov")

dataset

Dataset name (e.g., "calcofi-db")

dir_data

Path to the local copy of the CalCOFI base data folder, with CSVs under the provider/dataset directory. Default: "~/My Drive/projects/calcofi/data"

url_gdata

URL of the CalCOFI base data folder in Google Drive (with CSVs under the provider/dataset directory), used to look up metadata on the CSVs. Default: "https://drive.google.com/drive/u/0/folders/1xxdWa4mWkmfkJUQsHxERTp9eBBXBMbV7"

use_gdrive

Whether to query Google Drive for metadata. Default: TRUE

email

Google Drive authentication email (if use_gdrive=TRUE). Default: "ben@ecoquants.com"

Value

A list containing:

d_csv

List with CSV data including:

  • data: tibble with columns (tbl, csv, data, nrow, ncol, flds)

  • tables: summary of tables (tbl, nrow, ncol)

  • fields: summary of fields (tbl, fld, type)

d_gdata

Google Drive metadata (if use_gdrive=TRUE) including file names, IDs, modification times, and web links

d_tbls_rd

Table redefinition data frame with columns: tbl_old, tbl_new, tbl_description

d_flds_rd

Field redefinition data frame with columns: tbl_old, tbl_new, fld_old, fld_new, order_old, order_new, type_old, type_new, fld_description, notes, mutation

workflow_info

Information about the workflow including workflow name, QMD file path, and URL

paths

List of file paths used in the workflow

Details

  1. Reads all CSV files from the specified provider/dataset directory

  2. Extracts metadata about tables and fields from the CSV files

  3. Creates or reads redefinition files for table and field transformations

  4. Optionally queries Google Drive for file metadata (creation dates, etc.)
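
Steps 1 and 2 can be illustrated with a minimal base R sketch. This is not the function's actual implementation: the demo directory, the CSV contents, and the object names (dir_csv, lst_data, tables, fields) are assumptions made for illustration, and the real function returns tibbles rather than base data frames.

```r
# Hypothetical demo directory standing in for <dir_data>/<provider>/<dataset>
dir_csv <- file.path(tempdir(), "swfsc.noaa.gov", "calcofi-db")
dir.create(dir_csv, recursive = TRUE, showWarnings = FALSE)

# Two small made-up CSVs so the sketch is self-contained
write.csv(data.frame(cruise_id = 1:3, ship = c("A", "B", "C")),
          file.path(dir_csv, "cruise.csv"), row.names = FALSE)
write.csv(data.frame(net_id = 1:2, vol = c(10.5, 12.1)),
          file.path(dir_csv, "net.csv"), row.names = FALSE)

# Step 1: read every CSV, naming each table after its file
csvs <- list.files(dir_csv, pattern = "\\.csv$", full.names = TRUE)
lst_data <- lapply(csvs, read.csv, stringsAsFactors = FALSE)
names(lst_data) <- tools::file_path_sans_ext(basename(csvs))

# Step 2a: summarize tables (tbl, nrow, ncol)
tables <- data.frame(
  tbl  = names(lst_data),
  nrow = vapply(lst_data, nrow, integer(1)),
  ncol = vapply(lst_data, ncol, integer(1)),
  row.names = NULL)

# Step 2b: summarize fields (tbl, fld, type), one row per column
fields <- do.call(rbind, lapply(names(lst_data), function(tbl) {
  data.frame(
    tbl  = tbl,
    fld  = names(lst_data[[tbl]]),
    type = vapply(lst_data[[tbl]], function(x) class(x)[1], character(1)),
    row.names = NULL)
}))
```

The tables and fields summaries built here mirror the column layout described for d_csv under Value.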

The function returns a comprehensive data structure containing:

  • Raw CSV data and metadata (d_csv)

  • Table redefinitions (d_tbls_rd) for renaming/describing tables

  • Field redefinitions (d_flds_rd) for renaming/typing/transforming fields

  • Google Drive metadata if requested (d_gdata)

  • Workflow information and file paths
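
As a rough sketch of how a field redefinition table might be applied downstream, the example below renames and retypes the columns of one table. The raw table, the redefinition rows, and the coerce lookup are all hypothetical. Only the column names tbl_old, tbl_new, fld_old, fld_new, and type_new come from the d_flds_rd description above, and the package may apply redefinitions differently.

```r
# Hypothetical raw table as it might arrive from a CSV
d_raw <- data.frame(
  CruiseID = c("1", "2"),
  ShipNm   = c("A", "B"),
  stringsAsFactors = FALSE)

# Hypothetical slice of a field redefinition table (subset of d_flds_rd columns)
d_flds_rd <- data.frame(
  tbl_old  = "Cruise",
  tbl_new  = "cruise",
  fld_old  = c("CruiseID", "ShipNm"),
  fld_new  = c("cruise_id", "ship_name"),
  type_new = c("integer", "character"),
  stringsAsFactors = FALSE)

# Assumed mapping from type names to base R coercion functions
coerce <- list(
  integer   = as.integer,
  numeric   = as.numeric,
  character = as.character)

# Select the redefinitions for this table, then rename and retype each field
rd <- d_flds_rd[d_flds_rd$tbl_old == "Cruise", ]
d_new <- d_raw[, rd$fld_old, drop = FALSE]
names(d_new) <- rd$fld_new
for (i in seq_len(nrow(rd))) {
  fld <- rd$fld_new[i]
  d_new[[fld]] <- coerce[[rd$type_new[i]]](d_new[[fld]])
}
```

Keeping the renames and types in a data frame rather than in code means they can be reviewed and edited as a CSV before ingestion.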

Examples

if (FALSE) { # \dontrun{
# Basic usage
d <- read_csv_files(
  provider = "swfsc.noaa.gov",
  dataset  = "calcofi-db")

# Access the raw CSV data
d$d_csv$data

# Check table redefinitions
d$d_tbls_rd

# Check field redefinitions
d$d_flds_rd

# Without Google Drive metadata
d <- read_csv_files(
  provider = "swfsc.noaa.gov",
  dataset  = "calcofi-db",
  use_gdrive = FALSE)
} # }