The portalr package started out as a series of scripts
used by Portal researchers to quickly and consistently summarize the Portal data. It has
since developed into a formalized package that handles all the
quirky data manipulation under the hood. The researchers who are
currently collecting data at Portal now actively use the
package in their own work. Some examples of how we are using the package
are included below.
First, we need to install the
portalr package if we
haven’t done so already. We’ll also load
dplyr, tidyr, and ggplot2 for data manipulation and plotting.
# devtools::install_github("weecology/portalr")
library(portalr)
library(dplyr)
library(tidyr)
library(ggplot2)
One thing we can do with the Portal rodent data is look at the ratio of biomass between the control plots and the kangaroo rat exclosure plots and how it has changed through time.
Because we are going to compare biomass between plot types, we need
to know the biomass on each plot. To get this, we can set
level = "plot".
biomass_data <- portalr::summarize_rodent_data(path = "repo",
                                               level = "plot",
                                               output = "biomass",
                                               time = "date")
Note that the
path argument in the
summarize_rodent_data function has been set to “repo”.
While you can choose to download all of the Portal data onto your local
computer and then load the data into R, you can also get the data
directly from the GitHub repository by setting
path = "repo", as we’ve done here.
The resulting data frame has columns for the date, treatment, plot number, and each species.
Let’s select only the rows we want:
Based on information about the plot treatment switches, which can be found in the Readme file in the PortalData repo, we want to select censuses in the years 1988-2014. To make the data a bit easier to work with, we can also split the date column into three separate columns and then filter on the year column.
Since we are comparing control plots and exclosure plots, we can also filter for only those treatment types.
biomass_data <- biomass_data %>%
  # split the date column into year, month, and day
  separate(col = censusdate, into = c("year", "month", "day"), sep = "-") %>%
  filter(year >= 1988 & year < 2015, # filter by year
         treatment == "control" | treatment == "exclosure") # filter by treatment type
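One detail worth knowing about the split step: separate() returns the new year, month, and day columns as text unless asked to convert them. Comparing text years against numbers happens to work for four-digit years, but a numeric year is usually easier to filter and plot. The sketch below uses a few made-up census dates (not real Portal data) to show the difference; convert = TRUE is a standard tidyr argument, and the toy_dates data frame is purely hypothetical.

```r
library(tidyr)

# Made-up census dates for illustration only; not real Portal data.
toy_dates <- data.frame(
  censusdate = c("1987-12-05", "1988-01-23", "2015-02-14"),
  stringsAsFactors = FALSE
)

# Default behaviour: the new columns are character vectors.
as_text <- separate(toy_dates, col = censusdate,
                    into = c("year", "month", "day"), sep = "-")

# With convert = TRUE, the new columns are converted to integers.
as_int <- separate(toy_dates, col = censusdate,
                   into = c("year", "month", "day"), sep = "-",
                   convert = TRUE)

class(as_text$year)  # "character"
class(as_int$year)   # "integer"

# Numeric years can be filtered without relying on text comparison.
kept <- subset(as_int, year >= 1988 & year < 2015)
```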
We can get the total biomass for each plot per census by summing the mass of all the species per row. From there, we will sum by year for each treatment type. Then, we can create the exclosure:control ratio.
# compute total biomass per year and treatment
biomass_total <- biomass_data %>%
  gather(species, biomass, BA:SO) %>%
  group_by(year, treatment) %>%
  summarize(totals = sum(biomass, na.rm = TRUE))

# make a column with the exclosure:control ratio
biomass_ratio <- biomass_total %>%
  spread(treatment, totals) %>%
  mutate(EX_to_CO_ratio = exclosure / control) %>%
  ungroup()
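gather() and spread() still work, but tidyr now recommends pivot_longer() and pivot_wider() for the same reshaping. As a minimal sketch of the same long-then-wide pattern, here is an equivalent pipeline run on a tiny made-up data frame. The species columns BA, DM, and SO and all of the values are invented for illustration and are not real Portal data.

```r
library(dplyr)
library(tidyr)

# Made-up biomass values (grams) for illustration only; not real Portal data.
toy_biomass <- data.frame(
  year      = c(1990, 1990, 1991, 1991),
  treatment = c("control", "exclosure", "control", "exclosure"),
  plot      = c(4, 3, 4, 3),
  BA        = c(0, 8, 0, 16),
  DM        = c(400, 0, 300, 0),
  SO        = c(NA, 92, 100, 134),
  stringsAsFactors = FALSE
)

toy_ratio <- toy_biomass %>%
  # long format: one row per plot, year, and species
  pivot_longer(cols = BA:SO, names_to = "species", values_to = "biomass") %>%
  group_by(year, treatment) %>%
  # total biomass per year and treatment, ignoring missing values
  summarize(totals = sum(biomass, na.rm = TRUE), .groups = "drop") %>%
  # wide format: one column per treatment
  pivot_wider(names_from = treatment, values_from = totals) %>%
  mutate(EX_to_CO_ratio = exclosure / control)

toy_ratio
```

Setting .groups = "drop" in summarize() returns an ungrouped result, so no separate ungroup() call is needed at the end.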
We can finally plot the data!
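A minimal sketch of how such a plot could be drawn with ggplot2, assuming a data frame with a numeric year column and the EX_to_CO_ratio column. The values below are made up for illustration, and the styling choices are assumptions rather than the original figure's.

```r
library(ggplot2)

# Made-up ratios for illustration only; not real Portal results.
toy_ratio <- data.frame(
  year           = 1988:1993,
  EX_to_CO_ratio = c(0.30, 0.35, 0.28, 0.40, 0.45, 0.38)
)

ratio_plot <- ggplot(toy_ratio, aes(x = year, y = EX_to_CO_ratio)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Exclosure:control biomass ratio") +
  theme_classic()

ratio_plot
```

To use this with the biomass_ratio data frame built above, the year column would probably need converting first (for example with as.integer(year)), since separate() stores it as text by default. Shaded bars marking particular periods could be added with annotate("rect", ...).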
Before the mid-1990s, biomass on the kangaroo rat exclosures never
went above 50% of the biomass found in the control plots; the small
granivores just couldn’t keep up with the larger kangaroo rats. When a
larger pocket mouse, Chaetodipus baileyi, showed up in the
system (first gray bar), it was found primarily in the kangaroo rat
exclosures. This increased the biomass ratio to above 80% of that found
in the control plots. As C. baileyi left the system (second
gray bar), the ratio returned to levels similar to those before its arrival.
While the rodent community data is the most frequently used data
from the Portal Project, we can also use
portalr to get
plant or ant data from the site. We’ve been running some multivariate
statistics on plant composition in the plots and looking for differences
between rodent treatment types. This is how we get the data we need.
Our site has two rainy periods and, therefore, two communities of
annual plants. Let’s say we want to take a look at the abundance of the
winter annuals in the system, and we want to see if it differs
by treatment type. We can use the
summarize_plant_data function to get the appropriate data.
plant_data <- portalr::summarize_plant_data(path = 'repo',
                                            level = 'plot',
                                            type = 'winter-annual',
                                            correct_sp = TRUE,
                                            unknowns = FALSE,
                                            shape = 'flat',
                                            output = 'abundance')
What do some of these arguments mean? As above,
path = 'repo' pulls the data directly from the online
repository. level = 'plot' indicates that we want the data
to be summarized at the plot level rather than across the entire site,
and type = 'winter-annual' will give us only annual species
that can be found in the winter months. Sometimes in the past, we have
misidentified a species of plant;
correct_sp = TRUE goes
through the data and corrects the species name to what we now know to be
the correct species. Other times, we just don’t know what a species is,
so it is recorded as unknown; if we don’t want those to be included, we use
unknowns = FALSE. Finally, output = 'abundance' returns the abundance of each
species. Unlike with the rodent data, however, we’ve asked
for a flat shape, so the data will be returned in long format.
Wait a second! Didn’t we ask for winter annuals? So why are the first few rows all from a summer plant census?
As it turns out, while we have two annual communities of plants, some of the annuals in our system can be found in both summer AND winter. Hypothetically, someone might want to know about any and all annual plants that could be found in the winter: how many of them show up in the summer, for example?
To get the data we want (just the winter annuals recorded in the winter season), we need one more quick line of code.
plant_data_winter <- dplyr::filter(plant_data, season == 'winter')
That’s more like it! Now we have the data we need to run some multivariate statistics, or whatever else you might want to do with it.