Gustave (Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation) is an R package that provides a **toolkit for analytical variance estimation in survey sampling**. Apart from the implementation of standard variance estimators (Sen-Yates-Grundy, Deville-Tillé), its main feature is to help the methodologist produce easy-to-use variance estimation “wrappers”, where systematic operations (linearization, domain estimation) are handled in a consistent and transparent way for the end user.

As gustave is not yet available on CRAN, for now the simplest way to install it on your computer is to use `devtools::install_github()`

:

```
install.packages("devtools")
devtools::install_github("martinchevalier/gustave")
```

In this example, we define a variance estimation wrapper adapted to the example data inspired by the Information and communication technology (ICT) survey. The subset of the (simulated) ICT survey has the following features:

- stratified one-stage sampling design of 650 firms;
- 612 responding firms, non-response correction through reweighting in homogeneous response groups based on economic sub-sector and turnover;
- calibration on margins (number of firms and turnover broken down by economic sub-sector).

In this context, the variance estimation *function* can be defined as follows:

```
# Definition of the variance function
variance_function <- function(y){
# Calibration
y <- rescal(y, x = x)
# Non-response
y <- add0(y, rownames = ict_sample$firm_id)
var_nr <- var_pois(y, pik = ict_sample$response_prob_est, w = ict_sample$w_sample)
# Sampling
y <- y / ict_sample$response_prob_est
var_sampling <- var_srs(y, pik = 1 / ict_sample$w_sample, strata = ict_sample$division)
var_sampling + var_nr
}
# With x the calibration variables matrix
library(gustave)
x <- as.matrix(ict_survey[
order(ict_survey$firm_id),
c(paste0("N_", 58:63), paste0("turnover_", 58:63))
])
# Test of the variance function
y <- as.matrix(ict_survey$speed_quanti)
rownames(y) <- ict_survey$firm_id
variance_function(y)
```

The next step is the definition of a variance *wrapper*, which is easier to use than the variance function:

```
variance_wrapper <- define_variance_wrapper(
variance_function = variance_function,
reference_id = ict_survey$firm_id,
default = list(id = "firm_id", weight = "w_calib"),
objects_to_include = c("x", "ict_sample")
)
```

**Note** The objects `x`

and `ict_sample`

are embedded within the function `variance_wrapper()`

(`variance_wrapper`

is a closure)

```
# Better display of results
variance_wrapper(ict_survey, speed_quanti)
# Mean linearization
variance_wrapper(ict_survey, mean(speed_quanti))
# Ratio linearization
variance_wrapper(ict_survey, ratio(turnover, employees))
# Discretization of qualitative variables
variance_wrapper(ict_survey, speed_quali)
# On-the-fly recoding
variance_wrapper(ict_survey, speed_quali == "Between 2 and 10 Mbs")
# 1-domain estimation
variance_wrapper(ict_survey, speed_quanti, where = division == "58")
# Multiple domains estimation
variance_wrapper(ict_survey, speed_quanti, by = division)
# Multiple variables at a time
variance_wrapper(ict_survey, speed_quanti, big_data)
variance_wrapper(ict_survey, speed_quanti, mean(big_data))
# Flexible syntax for where and by arguments
# (similar to the aes() function in ggplot2)
variance_wrapper(ict_survey, where = division == "58",
mean(speed_quanti), mean(big_data * 100)
)
variance_wrapper(ict_survey, where = division == "58",
mean(speed_quanti), mean(big_data * 100, where = division == "61")
)
variance_wrapper(ict_survey, where = division == "58",
mean(speed_quanti), mean(big_data * 100, where = NULL)
)
```