rerddap is a general-purpose R client for working with ERDDAP servers. ERDDAP is a server built on top of OPeNDAP that serves some NOAA data. You can get gridded data (griddap), which lets you query gridded datasets, or table data (tabledap), which lets you query tabular datasets. The two services are similar in how we interface with them, but there are some differences too. We try to provide a similar interface to both data types in rerddap.
rerddap supports the NetCDF format, which is the default when using the griddap() function. NetCDF is a binary file format and has a much smaller footprint on your disk than csv. The binary file format means it’s harder to inspect, but the ncdf4 package makes it easy to pull data out of a NetCDF file and write data back into one. Note that the file extension for NetCDF files is .nc. Whether you choose NetCDF or csv won’t make much of a difference for small files, but it will for large files.
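To give a rough feel for the binary versus text footprint, the sketch below writes the same numbers once as raw binary and once as csv, using base R only. It is an analogy, not NetCDF itself: a real .nc file also carries metadata and may use compression, so actual ratios will differ. The file names and the sample size are arbitrary choices for the illustration.

```r
# Illustration only: raw binary vs csv for the same 10,000 doubles.
set.seed(1)
values <- runif(10000)

bin_file <- tempfile(fileext = ".bin")
csv_file <- tempfile(fileext = ".csv")

# Each double takes exactly 8 bytes in the binary file.
writeBin(values, bin_file)

# The csv stores every value as text, one per line, plus a header row.
write.csv(data.frame(value = values), csv_file, row.names = FALSE)

sizes <- c(binary = file.size(bin_file), csv = file.size(csv_file))
sizes
```

The gap grows with the number of values, which is why the choice matters mainly for large downloads.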
Downloaded data files are cached in a single hidden directory, ~/.rerddap, on your machine. It’s hidden so that you don’t accidentally delete the data, but you can still easily delete the data if you like.
When you use the griddap() or tabledap() functions, we construct an MD5 hash from the base URL and any query parameters, so each query is cached separately. Once we have the hash, we look in ~/.rerddap for a matching hash. If there’s a match, we use that file on disk; if there is no match, we make an HTTP request for the data to the ERDDAP server you specify.
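The lookup described above can be sketched in base R. This is not rerddap's internal code: cache_key() and cache_lookup() are hypothetical helpers, the way the URL and parameters are joined into one string is an assumption, and a temporary directory stands in for the real cache location.

```r
# Hypothetical helper: derive a cache key from a base URL and query parameters.
# tools::md5sum() hashes files, so the string is written to a temp file first.
cache_key <- function(base_url, params) {
  params <- params[order(names(params))]  # stable order gives a stable key
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(paste0(base_url, "?", query), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: return the cached file path, or NULL on a cache miss.
cache_lookup <- function(key, cache_dir, ext = ".nc") {
  path <- file.path(cache_dir, paste0(key, ext))
  if (file.exists(path)) path else NULL
}

cache_dir <- tempfile("cache-demo-")  # stand-in for the real cache directory
dir.create(cache_dir)

key <- cache_key("https://upwell.pfeg.noaa.gov/erddap",
                 list(dataset = "erdMBchla1day", latitude = "14,15"))

first <- cache_lookup(key, cache_dir)  # miss: nothing has been stored yet
file.create(file.path(cache_dir, paste0(key, ".nc")))  # pretend download
second <- cache_lookup(key, cache_dir)  # hit: path of the stored file

# Changing any parameter produces a different key, hence a separate cache entry.
other_key <- cache_key("https://upwell.pfeg.noaa.gov/erddap",
                       list(dataset = "erdMBchla1day", latitude = "14,16"))
```

Because the key depends on every query parameter, repeating an identical request reuses the file on disk, while even a small change to the query triggers a new download.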
You can get a data.frame of ERDDAP servers using the function servers(). The list of ERDDAP servers is drawn from the Awesome ERDDAP page maintained by the Irish Marine Institute. If you know of more ERDDAP servers, follow the instructions on that page to add the server.
Stable version from CRAN
install.packages("rerddap")
Or, the development version from GitHub
remotes::install_github("ropensci/rerddap")
library("rerddap")
First, you likely want to search for data, specifying either griddap or tabledap
ed_search(query = 'size', which = "table")
#> # A tibble: 41 × 2
#> title datas…¹
#> <chr> <chr>
#> 1 CCE Prey Size and Hard Part Size Regressions mmtdPr…
#> 2 CCE Teleost Prey Size and Hard Part Size Measurements mmtdTe…
#> 3 CalCOFI Larvae Sizes erdCal…
#> 4 Seabird Prey Size cciea_…
#> 5 CCE Non-Teleost Prey Size and Hard Part Size Measurements mmtdNo…
#> 6 Channel Islands, Kelp Forest Monitoring, Size and Frequency, Natural… erdCin…
#> 7 File Names from the AWS S3 noaa-goes16 Bucket awsS3N…
#> 8 File Names from the AWS S3 noaa-goes17 Bucket awsS3N…
#> 9 PacIOOS Beach Camera 001: Waikiki, Oahu, Hawaii BEACHC…
#> 10 PacIOOS Beach Camera 003: Waimea Bay, Oahu, Hawaii BEACHC…
#> # … with 31 more rows, and abbreviated variable name ¹dataset_id
ed_search(query = 'size', which = "grid")
#> # A tibble: 54 × 2
#> title datas…¹
#> <chr> <chr>
#> 1 Audio data from a local source. testGr…
#> 2 Main Hawaiian Islands Multibeam Bathymetry Synthesis: 50-m Bathymetry hmrg_b…
#> 3 Main Hawaiian Islands Multibeam Bathymetry Synthesis: 50-m Bathymetr… hmrg_b…
#> 4 Coastal Upwelling Transport Index (CUTI), Daily erdCUT…
#> 5 SST smoothed frontal gradients FRD_SS…
#> 6 Coastal Upwelling Transport Index (CUTI), Monthly erdCUT…
#> 7 SST smoothed frontal gradients, Lon0360 FRD_SS…
#> 8 Biologically Effective Upwelling Transport Index (BEUTI), Daily erdBEU…
#> 9 Biologically Effective Upwelling Transport Index (BEUTI), Monthly erdBEU…
#> 10 monthly mean psi from the NCEP Reanalysis (psi.mon.ltm), 0001 noaa_p…
#> # … with 44 more rows, and abbreviated variable name ¹dataset_id
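There is now a convenience function to search over a list of ERDDAP servers, designed to work with the function servers(). Its arguments are shown here as undefined placeholders, which is why evaluating the line as written raises an error: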
global_search(query, server_list, which_service)
#> Error in check_arg(query, "character"): object 'query' not found
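The error above only reflects that query, server_list and which_service were never defined. A minimal sketch of how the pieces might fit together follows. The wrapper name search_all_servers() is hypothetical, and two details are assumptions to check against the package documentation: that the data.frame returned by servers() has a url column, and that which_service takes "griddap" or "tabledap".

```r
# Hypothetical wrapper: nothing is requested until the function is called.
search_all_servers <- function(query, which_service = "tabledap") {
  if (!requireNamespace("rerddap", quietly = TRUE)) {
    stop("the rerddap package is required")
  }
  # Assumption: servers() returns a data.frame with a `url` column.
  server_list <- rerddap::servers()$url
  rerddap::global_search(query, server_list, which_service)
}

# Usage (needs network access, so it is left commented out):
# search_all_servers("size", which_service = "griddap")
```

Querying every known server can be slow, so it may be worth subsetting the server list before passing it on.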
Then you can get information on a single dataset
info('erdCalCOFIlrvsiz')
#> <ERDDAP info> erdCalCOFIlrvsiz
#> Base URL: https://upwell.pfeg.noaa.gov/erddap
#> Dataset Type: tabledap
#> Variables:
#> calcofi_species_code:
#> Range: 19, 946
#> common_name:
#> cruise:
#> itis_tsn:
#> larvae_10m2:
...
First, get information on a dataset to see time range, lat/long range, and variables.
(out <- info('erdMBchla1day'))
#> <ERDDAP info> erdMBchla1day
#> Base URL: https://upwell.pfeg.noaa.gov/erddap
#> Dataset Type: griddap
#> Dimensions (range):
#> time: (2006-01-01T12:00:00Z, 2022-09-28T12:00:00Z)
#> altitude: (0.0, 0.0)
#> latitude: (-45.0, 65.0)
#> longitude: (120.0, 320.0)
#> Variables:
#> chlorophyll:
#> Units: mg m-3
Then query for gridded data using the griddap() function
(res <- griddap(out,
  time = c('2015-01-01','2015-01-03'),
  latitude = c(14, 15),
  longitude = c(125, 126)
))
#> <ERDDAP griddap> erdMBchla1day
#> Path: [~/Library/Caches/R/rerddap/4d844aa48552049c3717ac94ced5f9b8.nc]
#> Last updated: [2022-09-30 09:34:02]
#> File size: [0.03 mb]
#> Dimensions (dims/vars): [4 X 1]
#> Dim names: time, altitude, latitude, longitude
#> Variable names: Chlorophyll Concentration in Sea Water
#> data.frame (rows/columns): [5043 X 5]
#> # A tibble: 5,043 × 5
#> longitude latitude altitude time chlorophyll
#> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 125 14 0 2015-01-01T12:00:00Z NA
#> 2 125. 14 0 2015-01-01T12:00:00Z NA
#> 3 125. 14 0 2015-01-01T12:00:00Z NA
#> 4 125. 14 0 2015-01-01T12:00:00Z NA
#> 5 125. 14 0 2015-01-01T12:00:00Z NA
#> 6 125. 14 0 2015-01-01T12:00:00Z NA
#> 7 125. 14 0 2015-01-01T12:00:00Z NA
#> 8 125. 14 0 2015-01-01T12:00:00Z NA
#> 9 125. 14 0 2015-01-01T12:00:00Z NA
#> 10 125. 14 0 2015-01-01T12:00:00Z NA
#> # … with 5,033 more rows
The output of griddap() is a list that you can explore further. Get the summary
res$summary
#> $filename
#> [1] "~/Library/Caches/R/rerddap/4d844aa48552049c3717ac94ced5f9b8.nc"
#>
#> $writable
#> [1] FALSE
#>
#> $id
#> [1] 65536
#>
#> $error
#> [1] FALSE
#>
#> $safemode
#> [1] FALSE
#>
...
Get the dimension variables
names(res$summary$dim)
#> [1] "time" "altitude" "latitude" "longitude"
Get the data.frame (beware: you may want to just look at the head of the data.frame if it is large)
head(res$data)
#> longitude latitude altitude time chlorophyll
#> 1 125.000 14 0 2015-01-01T12:00:00Z NA
#> 2 125.025 14 0 2015-01-01T12:00:00Z NA
#> 3 125.050 14 0 2015-01-01T12:00:00Z NA
#> 4 125.075 14 0 2015-01-01T12:00:00Z NA
#> 5 125.100 14 0 2015-01-01T12:00:00Z NA
#> 6 125.125 14 0 2015-01-01T12:00:00Z NA
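As the printed rows show, the time column arrives as a character string and chlorophyll can be NA. The sketch below rebuilds the first three rows of that output by hand (a stand-in for res$data, so it runs without a download) and shows one way to parse the timestamps and count missing values with base R. The names grid_df, missing_share and observed are illustrative only.

```r
# Stand-in for res$data: the first three rows printed by head(res$data).
grid_df <- data.frame(
  longitude   = c(125.000, 125.025, 125.050),
  latitude    = c(14, 14, 14),
  altitude    = c(0, 0, 0),
  time        = rep("2015-01-01T12:00:00Z", 3),
  chlorophyll = c(NA_real_, NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 strings (trailing Z means UTC) into date-times.
grid_df$time <- as.POSIXct(grid_df$time,
                           format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Fraction of rows without a chlorophyll value.
missing_share <- mean(is.na(grid_df$chlorophyll))

# Keep only the rows that carry a measurement (none in these three rows).
observed <- grid_df[!is.na(grid_df$chlorophyll), ]
```

Checking the share of missing values before any aggregation helps avoid summaries that are silently based on very few cells.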
For tabular data, again start by getting information on the dataset
(out <- info('erdCalCOFIlrvsiz'))
#> <ERDDAP info> erdCalCOFIlrvsiz
#> Base URL: https://upwell.pfeg.noaa.gov/erddap
#> Dataset Type: tabledap
#> Variables:
#> calcofi_species_code:
#> Range: 19, 946
#> common_name:
#> cruise:
#> itis_tsn:
#> larvae_10m2:
...
(dat <- tabledap('erdCalCOFIlrvsiz', fields=c('latitude','longitude','larvae_size',
  'scientific_name'), 'time>=2011-01-01', 'time<=2011-12-31'))
#> <ERDDAP tabledap> erdCalCOFIlrvsiz
#> Path: [~/Library/Caches/R/rerddap/db7389db5b5b0ed9c426d5c13bc43d18.csv]
#> Last updated: [2022-09-30 09:34:05]
#> File size: [0.05 mb]
#> # A tibble: 1,304 × 4
#> latitude longitude larvae_size scientific_name
#> <chr> <chr> <chr> <chr>
#> 1 32.956665 -117.305 4.5 Engraulis mordax
#> 2 32.91 -117.4 5.0 Merluccius productus
#> 3 32.511665 -118.21167 2.0 Merluccius productus
#> 4 32.511665 -118.21167 3.0 Merluccius productus
#> 5 32.511665 -118.21167 5.5 Merluccius productus
#> 6 32.511665 -118.21167 6.0 Merluccius productus
#> 7 32.511665 -118.21167 2.8 Merluccius productus
#> 8 32.511665 -118.21167 3.0 Sardinops sagax
#> 9 32.511665 -118.21167 5.0 Sardinops sagax
#> 10 32.511665 -118.21167 2.5 Engraulis mordax
#> # … with 1,294 more rows
Since both griddap() and tabledap() give back data.frames, it’s easy to do downstream manipulation. For example, we can use dplyr to filter, summarize, group, and sort:
library("dplyr")
dat$larvae_size <- as.numeric(dat$larvae_size)
dat %>%
  group_by(scientific_name) %>%
  summarise(mean_size = mean(larvae_size)) %>%
  arrange(desc(mean_size))
#> # A tibble: 7 × 2
#> scientific_name mean_size
#> <chr> <dbl>
#> 1 Anoplopoma fimbria 23.3
#> 2 Engraulis mordax 9.26
#> 3 Sardinops sagax 7.28
#> 4 Merluccius productus 5.48
#> 5 Tactostoma macropus 5
#> 6 Scomber japonicus 3.4
#> 7 Trachurus symmetricus 3.29
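The tabledap() output above prints every column as character, which is why larvae_size is converted by hand before summarising. When several columns need converting, base R's type.convert() is one option. The sketch below rebuilds the first three printed rows as a stand-in for dat (so it runs offline) and repeats the grouping with base R instead of dplyr. The names tab_df, converted and mean_size are illustrative.

```r
# Stand-in for dat: the first three rows of the tabledap() output, all character.
tab_df <- data.frame(
  latitude        = c("32.956665", "32.91", "32.511665"),
  longitude       = c("-117.305", "-117.4", "-118.21167"),
  larvae_size     = c("4.5", "5.0", "2.0"),
  scientific_name = c("Engraulis mordax", "Merluccius productus",
                      "Merluccius productus"),
  stringsAsFactors = FALSE
)

# Convert each column to its natural type; as.is = TRUE keeps text as character.
converted <- utils::type.convert(tab_df, as.is = TRUE)
sapply(converted, class)

# Base R equivalent of group_by() + summarise(): mean size per species.
mean_size <- aggregate(larvae_size ~ scientific_name, data = converted, FUN = mean)

# Sort from largest to smallest mean size.
mean_size <- mean_size[order(-mean_size$larvae_size), ]
mean_size
```

With only three rows the means are not meaningful in themselves; the point is the conversion step, which leaves the numeric columns ready for any downstream tool.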