5 Database
5.1 Database naming conventions
We’re converging on a shared set of naming conventions. Here are the current recommendations:
5.1.1 Name tables
- Table names are plural and use all lower case.
5.1.2 Name columns
To name columns, use snake_case (i.e., lower-case with underscores) to avoid having to quote identifiers in SQL statements. (TIP: Use `janitor::clean_names()` to convert a table.)

Unique identifiers are suffixed with:

- `*_id` for unique integer keys;
- `*_key` for unique string keys;
- `*_seq` for auto-incrementing sequence integer keys.
Suffix with units where applicable (e.g., `*_m` for meters, `*_km` for kilometers, `degc` for degrees Celsius). See the units vignette.

Set the geometry column to `geom` (used by the PostGIS spatial extension). If the table has multiple geometry columns, use `geom` for the default geometry column and `geom_{type}` for additional geometry columns (e.g., `geom_point`, `geom_line`, `geom_polygon`).
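Putting these conventions together, a table definition might look like the following sketch (the table and column names here are hypothetical, for illustration only):

```sql
CREATE TABLE stations (                  -- plural, all lower case
  station_seq  serial PRIMARY KEY,       -- auto-incrementing sequence integer key
  site_id      integer NOT NULL,         -- unique integer key
  grid_key     text    NOT NULL,         -- unique string key
  depth_m      numeric,                  -- meters
  temp_degc    numeric,                  -- degrees Celsius
  geom         geometry(Point, 4326)     -- default PostGIS geometry column
);
```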
5.2 Use Unicode for text
The default character encoding for PostgreSQL is Unicode (`UTF8`), which allows for international characters, accents, and special characters. Improper encoding can badly garble even basic text.

Logging into the server, we can see this with the following command:
```
docker exec -it postgis psql -l

                                  List of databases
        Name        | Owner | Encoding |  Collate   |   Ctype    | Access privileges
--------------------+-------+----------+------------+------------+-------------------
 gis                | admin | UTF8     | en_US.utf8 | en_US.utf8 | =Tc/admin        +
                    |       |          |            |            | admin=CTc/admin  +
                    |       |          |            |            | ro_user=c/admin
 lter_core_metabase | admin | UTF8     | en_US.utf8 | en_US.utf8 | =Tc/admin        +
                    |       |          |            |            | admin=CTc/admin  +
                    |       |          |            |            | rw_user=c/admin
 postgres           | admin | UTF8     | en_US.utf8 | en_US.utf8 |
 template0          | admin | UTF8     | en_US.utf8 | en_US.utf8 | =c/admin         +
                    |       |          |            |            | admin=CTc/admin
 template1          | admin | UTF8     | en_US.utf8 | en_US.utf8 | =c/admin         +
                    |       |          |            |            | admin=CTc/admin
 template_postgis   | admin | UTF8     | en_US.utf8 | en_US.utf8 |
(6 rows)
```
Use Unicode (`utf-8` in Python or `UTF8` in PostgreSQL) encoding for all database text values to support international characters and documentation (e.g., tabs and other characters needed for markdown conversion).
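A quick way to see the damage a codec mismatch causes is to decode UTF-8 bytes with the wrong codec (a minimal sketch; the sample string is arbitrary):

```python
s = "Año"                       # contains a non-ASCII character (ñ)
raw = s.encode("utf-8")         # UTF-8 bytes: b'A\xc3\xb1o'

print(raw.decode("utf-8"))      # Año  -- round-trips correctly
print(raw.decode("latin-1"))    # AÃ±o -- classic mojibake from a codec mismatch
```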
In Python, use `pandas` to read (`read_csv()`) and write (`to_csv()`) with UTF-8 encoding (i.e., `encoding='utf-8'`):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost:5432/dbname')

# read from a csv file
df = pd.read_csv('file.csv', encoding='utf-8')

# write to PostgreSQL
# (to_sql() takes no encoding argument; the connection handles encoding)
df.to_sql('table_name', engine, if_exists='replace', index=False,
          method='multi', chunksize=1000)

# read from PostgreSQL
df = pd.read_sql('SELECT * FROM table_name', engine)

# write to a csv file with UTF-8 encoding
df.to_csv('file.csv', index=False, encoding='utf-8')
```
In R, use `readr` to read (`read_csv()`) and write (`write_excel_csv()`) to force UTF-8 encoding:

```r
library(readr)
library(DBI)
library(RPostgres)

# connect to PostgreSQL
con <- dbConnect(
  RPostgres::Postgres(),
  dbname = "dbname", host = "localhost", port = 5432,
  user = "user", password = "password")

# read from a csv file
df <- read_csv('file.csv', locale = locale(encoding = 'UTF-8'))  # explicit
df <- read_csv('file.csv')                                       # implicit (UTF-8 is the default)

# write to PostgreSQL
dbWriteTable(con, 'table_name', df, overwrite = TRUE)

# read from PostgreSQL
df <- dbReadTable(con, 'table_name')

# write to a csv file with UTF-8 encoding
# (write_excel_csv() always writes UTF-8, with a byte-order mark so Excel detects it)
write_excel_csv(df, 'file.csv')
```
5.3 Describe tables and columns
Use the `COMMENT` clause to add descriptions to tables and columns, either through the GUI pgadmin.calcofi.io (by right-clicking on the table or column and selecting `Properties`) or with SQL. For example:

```sql
COMMENT ON TABLE public.aoi_fed_sanctuaries IS
  'areas of interest (`aoi`) polygons for federal **National Marine Sanctuaries**; loaded by _workflow_ [load_sanctuaries](https://calcofi.io/workflows/load_sanctuaries.html)';
```
Note the use of markdown for including links and formatting (e.g., bold, code, italics), such that the above SQL will render like so:

> areas of interest (`aoi`) polygons for federal **National Marine Sanctuaries**; loaded by _workflow_ [load_sanctuaries](https://calcofi.io/workflows/load_sanctuaries.html)

It is especially helpful to link to any workflows responsible for ingesting or updating the input data.
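The stored descriptions can also be read back with plain SQL via PostgreSQL’s `obj_description()` and `col_description()` functions (a sketch using the table from the example above; the column ordinal is arbitrary):

```sql
-- description of a table
SELECT obj_description('public.aoi_fed_sanctuaries'::regclass, 'pg_class');

-- description of a column, addressed by its ordinal position (here the 1st)
SELECT col_description('public.aoi_fed_sanctuaries'::regclass, 1);
```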
These descriptions can be viewed in the CalCOFI API api.calcofi.io as CSV tables (see code in calcofi/api: `plumber.R`):

- api.calcofi.io/db_tables fields:
  - `schema`: (only “public” so far)
  - `table_type`: “table”, “view”, or “materialized view” (none yet)
  - `table`: name of table
  - `table_description`: description of table (possibly in markdown)
- api.calcofi.io/db_columns fields:
  - `schema`: (only “public” so far)
  - `table_type`: “table”, “view”, or “materialized view” (none yet)
  - `table`: name of table
  - `column`: name of column
  - `column_type`: data type of column
  - `column_description`: description of column (possibly in markdown)

Fetch and display these descriptions in an interactive table with `calcofi4r::cc_db_catalog()`.
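Outside of `calcofi4r`, the same endpoints can be read directly as CSV (a sketch, assuming the endpoints listed above return CSV by default):

```r
library(readr)

# read table and column descriptions from the CalCOFI API
d_tables  <- read_csv("https://api.calcofi.io/db_tables")
d_columns <- read_csv("https://api.calcofi.io/db_columns")
```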
5.4 Relationships between tables
TODO:
add calcofi/apps: db to show latest tables, columns and relationships
5.5 Spatial Tips
- Use `ST_Subdivide()` when running spatial joins on large polygons.
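For example, a large polygon layer can be subdivided once into an indexed helper table and joined against points (a sketch with hypothetical table and column names; 255 is a commonly used maximum-vertices setting):

```sql
-- split large polygons into pieces with at most 255 vertices each
CREATE TABLE aoi_subdivided AS
SELECT aoi_id, ST_Subdivide(geom, 255) AS geom
FROM aoi_fed_sanctuaries;

CREATE INDEX aoi_subdivided_geom_idx ON aoi_subdivided USING gist (geom);

-- the spatial join now tests many small pieces instead of one huge polygon,
-- so the GiST index prunes far more candidates
SELECT p.*, a.aoi_id
FROM points p
JOIN aoi_subdivided a ON ST_Intersects(a.geom, p.geom);
```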