`stgam`

The introductory vignette (*‘Introduction to space-time GAMS with
stgam’*) demonstrated how to construct varying
coefficient models using GAMs with Gaussian Process smooths (splines).
The SVC, TVC and STVC models all had a similar form, with each covariate
specified as a fixed parametric term and in space, time or space-time GP
smooth. By way of example the SVC model is repeated below, the the TVC
and STVC had a specified the covariates and GPs in a similar way.

```
stvc.gam = gam(privC ~ 0 +
Intercept + s(X, Y, year, bs = 'gp', by = Intercept) +
unemp + s(X, Y, year, bs = "gp", by = unemp) +
pubC + s(X, Y, year, bs = "gp", by = pubC),
data = productivity)
```

The `stvc.gam`

model above is specified in a way that
assumes the presence of some spatio-temporal dependencies (or
interactions) between the target and the predictor variables. However
this assumption may be incorrect and the model may be incorrectly
specified.

This vignette

- Reflects on the assumptions associated with different model forms or specifications.
- Constructs different varying coefficient models and evaluates them using BIC.
- Constructs probabilities to determine the best model or best set of competing models.
- Undertakes Bayesian Model Averaging to combine the highly probable models.

The code below loads the packages and data:

```
# load the packages
library(stgam)
library(cols4all) # for nice shading in graphs and maps
library(cowplot) # for managing plots
library(dplyr) # for data manipulation
library(ggplot2) # for plotting and mapping
library(glue) # for model construction
library(mgcv) # for GAMs
library(sf) # for spatial data
library(doParallel) # for parallelising operations
library(purrr) # for model construction
library(tidyr) # for model construction
# load the data
data(productivity)
data(us_data)
```

How should you specify your varying coefficient model?

The STCV model specified above included space and time in a single
smooth (spline) for each the covariates. This is to assume that that
spatial and temporal processes *do* interact and that the
*temporal trends* in predictor-target variable relationships will
*vary with location*. But is this assumption correct? If you have
a specific a hypothesis that you are testing or working under a
particular theory related to how space and time interact in the process
you are examining, then you can simply specify how the covariates
interact with target variable. However, more commonly we are seeking
inference (understanding) about *how* processes interact in space
and time.

In this vignette, these interactions are explored using a data driven (rather than theoretical) approach. A series of different models are created, each with different assumptions, and they are evaluated to determine which one of them is, or which set of them are, the most probable.

Multiple models can be created to explore all possible combinations of interaction and then to select the best model, through a probability based measure like BIC (see Comber et al. (submitted) for a full treatment of this). There are a total 6 possible way that each covariate could be specified in the model:

- It is omitted.
- It is included as a parametric response with no spline.
- It is included in a spline with location.
- It is included a spline with time.
- It is included in a single spline with location and time.
- It is included in 2 separate splines with location and time.

The intercept can be treated similarly, but without it being absent (i.e. 5 options).

To investigate STVC model form, a series of models can be specified, with each combination of the 6 permutations for each covariate, plus 5 states for the intercept. The best model(s) can be determined by quantifying the likelihood (probability) of each of model being the correct model. This can be approximated using the Bayesian Information Criterion (BIC) (Schwarz 1978) as described in full in Comber et al. (submitted), and if the probabilities for multiple models are high, then the models can be combined using a Bayesian Model Averaging (BMA) approach. BMA is described in the context of spatial modelling in Fragoso, Bertoli, and Louzada (2018) and summarised in Brunsdon, Harris, and Comber (2023), but in brief, if a number of competing models exist with at least one quantity of interest that all have in common, and the likelihoods of each of them being the correct model is known (e.g. from BIC ), then a posterior distribution of the quantity of interest can be obtained by averaging them using the likelihoods as weights. In this way it allows competing models, treating space and time in different ways, to be combined.

To create STVC multiple models, the code below first defines a grid
of numbers for each covariate form. This is passed to a function to
create the formula specifying each model, with different terms and
smooths, which in turn is passed to the `gam`

function.

```
# define intercept term
productivity <- productivity |> mutate(Intercept = 1)
# define grid of combinations (nrow = 180)
terms_gr = expand.grid(intcp = 1:5, unemp = 1:6, pubC = 1:6)
# examine a random slice
terms_gr |> slice_sample(n = 6)
#> intcp unemp pubC
#> 1 1 3 3
#> 2 1 1 5
#> 3 3 5 6
#> 4 4 4 2
#> 5 5 5 3
#> 6 4 4 5
```

A function is defined to create the equations: here this is bespoke
to the covariate names and number in the `productvity`

data:

```
# define a function to make the equations
makeform_prod <- function(intcp, unemp, pubC, bs='gp') {
#coords <- c("X,Y", "X2,Y2")[coords]
intx <- c("",
glue("+s(year,bs='{bs}',by=Intercept)"),
glue("+s(X,Y,bs='{bs}',by=Intercept)"),
glue("+s(X,Y,bs='{bs}',by=Intercept) + s(year,bs='{bs}',by=Intercept)"),
glue("+s(X,Y,year,bs='{bs}',by=Intercept)"))[intcp]
unempx <- c("",
"+ unemp",
glue("+s(year,bs='{bs}',by=unemp)"),
glue("+s(X,Y,bs='{bs}',by=unemp)"),
glue("+s(X,Y,bs='{bs}',by=unemp) + s(year,bs='{bs}',by=unemp)"),
glue("+s(X,Y,year,bs='{bs}',by=unemp)"))[unemp]
pubCx <- c("",
"+ pubC",
glue("+s(year,bs='{bs}',by=pubC)"),
glue("+s(X,Y,bs='{bs}',by=pubC)"),
glue("+s(X,Y,bs='{bs}',by=pubC) + s(year,bs='{bs}',by=pubC)"),
glue("+s(X,Y,year,bs='{bs}',by=pubC)"))[pubC]
return(formula(glue("privC~Intercept-1{unempx}{intx}{pubCx}")))
}
```

To see how this works, the code below passes some numbers to it.

```
makeform_prod(intcp = 5, unemp = 2, pubC = 4, bs='gp')
#> privC ~ Intercept - 1 + unemp + s(X, Y, year, bs = "gp", by = Intercept) +
#> s(X, Y, bs = "gp", by = pubC)
#> <environment: 0x7faf97c6cc38>
```

Next a function to undertake the analysis and record the BIC, return
the indices and the formula is defined. Not that this has the
`terms_gr`

object defined above embedded in it, taking just
an index of the grid row number as input:

```
do_gam = function(i){
f <- makeform_prod(intcp = terms_gr$intcp[i],
unemp = terms_gr$unemp[i],
pubC = terms_gr$pubC[i],
bs='gp')
m = gam(f,data=productivity)
bic = BIC(m)
index = data.frame(intcp = terms_gr$intcp[i],
unemp = terms_gr$unemp[i],
pubC = terms_gr$pubC[i])
f = paste0('privC~', as.character(f)[3] )
return(data.frame(index, bic, f))
#return(bic)
}
```

This can be tested:

Finally, this can be put in a loop to evaluate all of the potential space-time models:

```
t1 = Sys.time()
res_gam <- NULL
for(i in 1:nrow(terms_gr)) {
res.i = do_gam(i)
res_gam = rbind(res_gam, res.i)
}
Sys.time() - t1
#> Time difference of 2.520804 mins
```

For more complex problems, you could parallelise the loop if you have
a large multivariate analyses. The output is the same as the
`for`

loop above, it is just created more quickly.

```
# set up the parallelisation
library(doParallel)
cl <- makeCluster(detectCores()-1)
registerDoParallel(cl)
# do the parallel loop
t1 = Sys.time()
res_gam <-
foreach(i = 1:nrow(terms_gr),
.combine = 'rbind',
.packages = c("glue", "mgcv", "purrr")) %dopar% {
do_gam(i)
}
Sys.time() - t1
# release the cores
stopCluster(cl)
# have a look
head(res_gam)
```

Having generated STVC multiple models and recorded the BIC values for them, it is possible to generate probabilities for each model and evaluate them. The logics and supporting equations for this evaluations of each mode are detailed in Comber et al. (submitted). The results need to be sorted, the best 10 models identified, their structures extracted and then their relative probabilities calculated:

```
# sort the results
mod_comp <- tibble(
res_gam) |>
rename(BIC = bic) |>
arrange(BIC)
# transpose the indices to to model terms
# rank and return the top 10 results
int_terms <- \(x) c("Fixed","s_T", "s_S", "s_T + S_S", "s_ST")[x]
var_terms <- \(x) c("---", "Fixed","s_T", "s_S", "s_T + s_S", "s_ST")[x]
mod_comp_tab <-
mod_comp |>
slice_head(n = 10) |>
mutate(across(unemp:pubC,var_terms)) |>
mutate(intcp = int_terms(intcp)) |>
rename(`Intercept` = intcp,
`Unemployment.` = unemp,
`Public Captial` = pubC) |>
mutate(Rank = 1:n()) |>
relocate(Rank) |>
select(-f)
# determine the relative probabilities
# ie relative to the top ranked model
p1_vec = NULL
for(i in 2:10) {
p1 = exp(-(mod_comp_tab$BIC[i]-mod_comp_tab$BIC[1])/2)
p1 = p1/(1+p1)
p1_vec = c(p1_vec, p1)
}
mod_comp_tab$`Pr(M)` = c("--", paste0(format(round(p1_vec*100, digits=1), nsmall = 1), "%"))
```

The results can be examined:

```
mod_comp_tab
#> # A tibble: 10 × 6
#> Rank Intercept Unemployment. `Public Captial` BIC `Pr(M)`
#> <int> <chr> <chr> <chr> <dbl> <chr>
#> 1 1 s_ST --- s_ST 17821. "--"
#> 2 2 s_T + S_S --- s_ST 17821. "50.0%"
#> 3 3 s_S --- s_ST 17821. "50.0%"
#> 4 4 s_ST Fixed s_ST 17821. "50.0%"
#> 5 5 s_T + S_S Fixed s_ST 17821. "50.0%"
#> 6 6 s_S Fixed s_ST 17821. "50.0%"
#> 7 7 s_ST s_T s_ST 17821. "49.9%"
#> 8 8 s_T + S_S s_T s_ST 17821. "49.4%"
#> 9 9 s_S s_T s_ST 17821. "49.4%"
#> 10 10 s_T s_T + s_S s_ST 17831. " 0.4%"
```

The results are sorted and suggest that nine of the models are highly probable, each with probabilities of better than the best ranked model of >10%. These are candidates for Bayesian Model Averaging. The are some commonalities in the specification of the 9 models:

- It does not matter whether the Intercept varies over space or time or together either interdependently or simultaneously;
- Unemployment can be absent, fixed or in a temporal smooth.
- Public Capital interacts simultaneously over space and time.

`stgam`

functionsThe process above had a number of stages: - a grid was defined with
indices for each variable defining how it is specified in each model -
this was used to create formula specifying each variable in different
ways - each model was constructed and the BIC calculated using a
`for`

loop or a parallelised approach - the probability for
each model was determined and the top 10 models returned

The `stgam`

package has generic function that wrap these
operations and the code below applies them to the STVC problem. The
`evaluate_models`

function creates and evaluates the
different models (it may take a minute or so to run):

```
stvc_res_gam = evaluate_models(input_data = productivity,
target_var = "privC",
covariates = c("unemp", "pubC"),
coords_x = "X",
coords_y = "Y",
STVC = TRUE,
time_var = "year")
```

The results can be compared with the `res_gam`

object
created earlier:

The model probabilities can be extracted using the
`gam_model_probs`

function, here again suggesting that nine
space time models are equally as probable:

```
stvc_mods = gam_model_probs(stvc_res_gam, n = 10)
stvc_mods
#> # A tibble: 10 × 7
#> Rank Intercept unemp pubC BIC f `Pr(M)`
#> <int> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 1 s_ST s_T s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, year, … --
#> 2 2 s_T + S_S s_T s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, bs = \… 0.500
#> 3 3 s_S s_T s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, bs = \… 0.500
#> 4 4 s_ST --- s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, year, … 0.500
#> 5 5 s_T + S_S --- s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, bs = \… 0.500
#> 6 6 s_S --- s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, bs = \… 0.500
#> 7 7 s_ST Fixed s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, year, … 0.500
#> 8 8 s_T + S_S Fixed s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, bs = \… 0.500
#> 9 9 s_S Fixed s_ST 17821. "privC ~ Intercept - 1 + s(X, Y, bs = \… 0.500
#> 10 10 s_T s_T + s_S s_ST 17831. "privC ~ Intercept - 1 + s(year, bs = \… 0.004
```

Of particular interest are the forms of the covariates in in each high ranking model:

- The Intercept is variously in a spatial, temporal, spatial plus temporal or space-time smooth.
- Unemployment is either absent, fixed or in a temporal smooth.
- Public capital (
`pubC`

) is always specified with a space-time smooth.

We should expect to see these trends when the models are averaged.
Note, also the function has here calculated the *relative
probabilities* for the STVC models because the individual BIC values
resulted in probabilities that were too close to zero to be machine
encodable.

The functions can be applied to a SVC problem:

```
svc_res_gam = evaluate_models(input_data = productivity |> filter(year == "1970"),
target_var = "privC",
covariates = c("unemp", "pubC"),
coords_x = "X",
coords_y = "Y",
STVC = FALSE,
time_var = NULL)
# head(svc_res_gam)
svc_mods = gam_model_probs(svc_res_gam, n = 10)
svc_mods
#> # A tibble: 10 × 7
#> Rank Intercept unemp pubC BIC f `Pr(M|D)`
#> <int> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 1 Fixed Fixed s_S 1060. "privC ~ Intercept - 1 + unemp + s(X, Y, b… 0.314
#> 2 2 Fixed --- s_S 1060. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0.314
#> 3 3 s_S Fixed s_S 1062. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0.125
#> 4 4 s_S --- s_S 1062. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0.124
#> 5 5 Fixed s_S s_S 1062. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0.12
#> 6 6 s_S s_S s_S 1069. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0.004
#> 7 7 s_S --- Fixed 1108. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0
#> 8 8 Fixed --- Fixed 1109. "privC ~ Intercept - 1 + pubC" 0
#> 9 9 Fixed s_S Fixed 1110. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0
#> 10 10 s_S Fixed Fixed 1111. "privC ~ Intercept - 1 + s(X, Y, bs = \"gp… 0
```

For the SVC models, the results indicate that 5 models are highly
likely (greater than 10% probability) to be the best model, all
specifying a Public capital (`pubC`

) smooth that varies
locally over space, with Unemployment (`unemp`

) either
removed or globally constant (fixed), and the Intercept either fixed or
varying with location. Interesting the SVC model in which all three
covariates are specified with a spatial GP smooth (the SVC created in
the first vignette) is the 6th ranked model and with a probability of
less than 1/1000 of being the correct model.

If only one model was highly probable, then this could be specified. The code below does this for the top SVC model:

For both the SVC and STVC cases, a number of models were highly
probable. It is possible to combine these models using the probabilities
as weights combine (average) the coefficient estimates, under a Bayesian
Model Averaging approach (see Comber et al.
(submitted) for details). The code below applies the
`do_bma`

function to the summary tables generated above. The
code can be used to construct the BMA coefficients from absolute or
relative probabilities.

```
# SVC with absolute probabilities
svc_bma <- do_bma(model_table = svc_mods,
terms = c("Intercept", "unemp", "pubC"),
thresh = 0.1,
relative = FALSE,
input_data = productivity |> filter(year == "1970"))
# STVC with relative probabilities
stvc_bma <- do_bma(model_table = stvc_mods,
terms = c("Intercept", "unemp", "pubC"),
thresh = 0.1,
relative = TRUE,
input_data = productivity)
```

The results can be joined back to the spatial layer, in this case
`us_data`

to be mapped:

```
# join
svc_bma_sf <-
us_data |> select(GEOID) |>
left_join(productivity |>
filter(year == "1970") |> select(GEOID, year) |>
cbind(svc_bma)) |>
relocate(geometry, .after = last_col())
#> Joining with `by = join_by(GEOID)`
# map
tit =expression(paste(""*beta[`Public Capital`]*" "))
ggplot(data = svc_bma_sf, aes(fill=pubC)) +
geom_sf() +
scale_fill_continuous_c4a_div(palette="brewer.blues",name=tit) +
coord_sf() +
theme_void()
```

The variations in the averaged BMA STVC coefficient estimates can be
mapped in a similar way by linking to the `us_data`

spatial
layer, and here we see the nature of the temporal variation in the
Unemployment and spatio-temporal variation in Public capital covariates
as expected.

```
# link the data
stvc_bma_sf <-
us_data |> select(GEOID) |>
left_join(productivity |>
select(GEOID, year) |>
cbind(stvc_bma)) |>
relocate(geometry, .after = last_col())
# create the plots
tit =expression(paste(""*beta[`Unemployment`]*""))
p1 = stvc_bma_sf |>
ggplot() + geom_sf(aes(fill = unemp), col = NA) +
scale_fill_binned_c4a_seq(palette="scico.lajolla", name = tit) +
facet_wrap(~year) +
theme_bw() + xlab("") + ylab("") +
theme(
strip.background =element_rect(fill="white"),
strip.text = element_text(size = 8, margin = margin()),
legend.position = c(.7, .1),
legend.direction = "horizontal",
legend.key.width = unit(1.15, "cm"),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
p2 = stvc_bma_sf |>
ggplot() + geom_sf(aes(fill = pubC), col = NA) +
scale_fill_binned_c4a_seq(palette="scico.lajolla", name = tit) +
facet_wrap(~year) +
theme_bw() + xlab("") + ylab("") +
theme(
strip.background =element_rect(fill="white"),
strip.text = element_text(size = 8, margin = margin()),
legend.position = c(.7, .1),
legend.direction = "horizontal",
legend.key.width = unit(1.15, "cm"),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
plot_grid(p1, p2, nrow = 2)
```

The key and substantive methodological point in this vignette is the need to consider the nature of the spatial and temporal interactions (dependencies) between the target and predictor variables.

Model form (and thus the nature of the space-time process) should not
be assumed and hence the provision of functions to create , evaluate and
rank multiple models in the `stgam`

package. Most approaches
to SVC and STVC modelling implicitly assume specific spatial and
space-time dependencies.

The approach presented din this vignettes represents a fundamental
difference in philosophy to (spatial and) space-time modelling. The
method is essentially about model comparison to *test* for space
time dependency.

Brunsdon, Chris, Paul Harris, and Alexis Comber. 2023. *Smarter Than
Your Average Model? Bayesian Model Averaging as a Spatial Analysis
Tool*. 12th International Conference on Geographic Information
Science.

Comber, Alexis, Paul Harris, Naru Tsutsumida, Jennie Gray, and Chris
Brunsdon. submitted. “Where, When and How? Specifying Spatially
and Temporally Varying Coefficient Models Using Generalized Additive
Models with Gaussian Process Splines.” *International Journal
of Geographical Information Science*, submitted.

Fragoso, Tiago M, Wesley Bertoli, and Francisco Louzada. 2018.
“Bayesian Model Averaging: A Systematic Review and Conceptual
Classification.” *International Statistical Review* 86
(1): 1–28.

Schwarz, Gideon. 1978. “Estimating the Dimension of a
Model.” *The Annals of Statistics*, 461–64.