Introduction

First, we apply labels on the well-known mtcars dataset:

library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
mpg = "Miles/(US) gallon",
cyl = "Number of cylinders",
disp = "Displacement (cu.in.)",
hp = "Gross horsepower",
drat = "Rear axle ratio",
wt = "Weight (1000 lbs)",
qsec = "1/4 mile time",
vs = "Engine",
vs = c("V-engine" = 0,
"Straight engine" = 1),
am = "Transmission",
am = c("Automatic" = 0,
"Manual"=1),
gear = "Number of forward gears",
carb = "Number of carburetors"
)

Table construction consists of at least of three functions chained with magrittr pipe operator: %>%. At first we need to specify variables for which statistics will be computed with tab_cells. Secondary, we calculate statistics with one of tab_stat_* functions. And last, we finalize table creation with tab_pivot: dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot(). We can split our statistics by columns with tab_cols or by rows with tab_rows. After that we can sort table with tab_sort_asc, drop empty rows/columns with drop_rc and transpose with tab_transpose. Generally, resulting table is just a data.frame so we can use arbitrary operations on it. Statistic is always calculated on the last cells, column/row variables, weight, missing values and subgroup. To define new cell/column/row variables we can call appropriate function one more time. tab_pivot defines how we combine different statistics and where statistic labels will appear - inside/outside rows/columns.

Simple column percent

mtcars %>%
tab_cells(cyl) %>%
tab_cols(vs) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Simple column percent")
  Engine Simple column percent Number of cylinders 4 5.6 71.4 6 16.7 28.6 8 77.8 #Total cases 18 14

Split by columns and rows

mtcars %>%
tab_cells(cyl) %>%
tab_cols(vs) %>%
tab_rows(am) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Split by columns and rows")
    Engine Split by columns and rows Transmission Automatic Number of cylinders 4 42.9 6 57.1 8 100.0 #Total cases 12 7 Manual Number of cylinders 4 16.7 100.0 6 50.0 8 33.3 #Total cases 6 7

Multiple banners, table is sorted by total

mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), vs, am) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_sort_desc() %>%
tab_caption("Multiple banners, table is sorted by total")
  #Total     Engine     Transmission Multiple banners, table is sorted by total Number of cylinders 8 43.8 77.8 63.2 15.4 4 34.4 5.6 71.4 15.8 61.5 6 21.9 16.7 28.6 21.1 23.1 #Total cases 32 18 14 19 13

Nested banners

mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), vs %nest% am) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Nested banners")
  #Total     Engine Nested banners Number of cylinders 4 34.4 16.7 42.9 100 6 21.9 50.0 57.1 8 43.8 100 33.3 #Total cases 32 12 6 7 7

Multiple nested banners

mtcars %>%
tab_cells(carb) %>%
tab_cols(total(), list(cyl, vs) %nest% am) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Multiple nested banners")
  #Total     Number of cylinders     Engine     4     6 Multiple nested banners Number of carburetors 1 21.9 33.3 50 50 42.9 57.1 2 31.2 66.7 50 33.3 33.3 16.7 28.6 42.9 3 9.4 25.0 25.0 4 31.2 50 66.7 41.7 50 41.7 50.0 28.6 6 3.1 33.3 16.7 8 3.1 50 16.7 #Total cases 32 3 8 4 3 12 2 12 6 7 7

Multiple variable and multiple summary statistics

mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
tab_pivot() %>%
tab_caption("Multiple variable and multiple summary statistics")
  #Total     Transmission Multiple variable and multiple summary statistics Miles/(US) gallon Mean 20.1 17.1 24.4 Std. dev. 6.0 3.8 6.2 Valid N 32.0 19.0 13.0 Displacement (cu.in.) Mean 230.7 290.4 143.5 Std. dev. 123.9 110.2 87.2 Valid N 32.0 19.0 13.0 Gross horsepower Mean 146.7 160.3 126.8 Std. dev. 68.6 53.9 84.1 Valid N 32.0 19.0 13.0 Weight (1000 lbs) Mean 3.2 3.8 2.4 Std. dev. 1.0 0.8 0.6 Valid N 32.0 19.0 13.0 1/4 mile time Mean 17.8 18.2 17.4 Std. dev. 1.8 1.8 1.8 Valid N 32.0 19.0 13.0

Multiple variable and multiple summary statistics - statistic lables in columns

mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_fun(Mean = w_mean, "Valid N" = w_n, method = list) %>%
tab_pivot() %>%
tab_caption("Multiple variable and multiple summary statistics - statistic lables in columns")
  #Total     Transmission   Mean Multiple variable and multiple summary statistics - statistic lables in columns Miles/(US) gallon 20.1 32 17.1 19 24.4 13 Displacement (cu.in.) 230.7 32 290.4 19 143.5 13 Gross horsepower 146.7 32 160.3 19 126.8 13 Weight (1000 lbs) 3.2 32 3.8 19 2.4 13 1/4 mile time 17.8 32 18.2 19 17.4 13

Filter dataset and exclude empty columns

mtcars %>%
tab_subgroup(am == 0) %>%
tab_cells(cyl) %>%
tab_cols(total(), vs %nest% am) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
drop_empty_columns() %>%
tab_caption("Filter dataset and exclude empty columns")
  #Total     Engine Filter dataset and exclude empty columns Number of cylinders 4 15.8 42.9 6 21.1 57.1 8 63.2 100 #Total cases 19 12 7

Total at the top of the table

mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), vs) %>%
tab_rows(am) %>%
tab_stat_cpct(total_row_position = "above",
total_label = c("number of cases", "row %"),
total_statistic = c("u_cases", "u_rpct")) %>%
tab_pivot() %>%
tab_caption("Total at the top of the table")
    #Total     Engine Total at the top of the table Transmission Automatic Number of cylinders #number of cases 19 12 7 #row % 100 63.2 36.8 4 15.8 42.9 6 21.1 57.1 8 63.2 100.0 Manual Number of cylinders #number of cases 13 6 7 #row % 100 46.2 53.8 4 61.5 16.7 100.0 6 23.1 50.0 8 15.4 33.3

Three different statistics in each cell - stat. labels in rows

mtcars %>%
tab_cells(am) %>%
tab_cols(total(), vs) %>%
tab_total_row_position("none") %>%
tab_stat_cpct(label = "col %") %>%
tab_stat_rpct(label = "row %") %>%
tab_stat_tpct(label = "table %") %>%
tab_pivot(stat_position = "inside_rows") %>%
tab_caption("Three different statistics in each cell - stat. labels in rows")
    #Total Three different statistics in each cell - stat. labels in rows Transmission Automatic col % 59.4 66.7 50.0 row % 100.0 63.2 36.8 table % 59.4 37.5 21.9 Manual col % 40.6 33.3 50.0 row % 100.0 46.2 53.8 table % 40.6 18.8 21.9

Three different statistics in each cell - stat. labels in columns

mtcars %>%
tab_cells(am) %>%
tab_cols(total(), vs) %>%
tab_total_row_position("none") %>%
tab_stat_cpct(label = "col %") %>%
tab_stat_rpct(label = "row %") %>%
tab_stat_tpct(label = "table %") %>%
tab_pivot(stat_position = "inside_columns") %>%
tab_caption("Three different statistics in each cell - stat. labels in columns")
  #Total     Engine   col %     row %     table % Three different statistics in each cell - stat. labels in columns Transmission Automatic 59.4 100 59.4 66.7 63.2 37.5 50 36.8 21.9 Manual 40.6 100 40.6 33.3 46.2 18.8 50 53.8 21.9

Stacked statistics

mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), am) %>%
tab_stat_mean() %>%
tab_stat_se() %>%
tab_stat_valid_n() %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Stacked statistics")
  #Total     Transmission Stacked statistics Number of cylinders Mean 6.2 6.9 5.1 S. E. 0.3 0.4 0.4 Valid N 32.0 19.0 13.0 4 34.4 15.8 61.5 6 21.9 21.1 23.1 8 43.8 63.2 15.4 #Total cases 32 19 13

mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), am) %>%
tab_row_label("#Summary statistics") %>%
tab_stat_mean() %>%
tab_stat_se() %>%
tab_stat_valid_n() %>%
tab_row_label("#Column percent") %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Stacked statistics with section headings")
  #Total     Transmission Stacked statistics with section headings #Summary statistics Number of cylinders Mean 6.2 6.9 5.1 S. E. 0.3 0.4 0.4 Valid N 32.0 19.0 13.0 #Column percent Number of cylinders 4 34.4 15.8 61.5 6 21.9 21.1 23.1 8 43.8 63.2 15.4 #Total cases 32 19 13

Stacked statistics - different statistics for different variables

mtcars %>%
tab_cols(total(), am) %>%
tab_cells(mpg, hp, qsec) %>%
tab_stat_mean() %>%
tab_cells(cyl, carb) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_caption("Stacked statistics - different statistics for different variables")
  #Total     Transmission Stacked statistics - different statistics for different variables Miles/(US) gallon Mean 20.1 17.1 24.4 Gross horsepower Mean 146.7 160.3 126.8 1/4 mile time Mean 17.8 18.2 17.4 Number of cylinders 4 34.4 15.8 61.5 6 21.9 21.1 23.1 8 43.8 63.2 15.4 #Total cases 32 19 13 Number of carburetors 1 21.9 15.8 30.8 2 31.2 31.6 30.8 3 9.4 15.8 4 31.2 36.8 23.1 6 3.1 7.7 8 3.1 7.7 #Total cases 32 19 13

Linear regression by groups

mtcars %>%
tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>%
tab_cols(total(), am) %>%
tab_stat_fun_df(
function(x){
frm = reformulate(".", response = as.name(names(x)[1]))
model = lm(frm, data = x)
sheet('Coef.' = coef(model),
confint(model)
)
}
) %>%
tab_pivot() %>%
tab_caption("Linear regression by groups")
  #Total     Transmission   Coef.     2.5 %     97.5 % Linear regression by groups (Intercept) 27.3 9.6 45.1 21.8 -1.9 45.5 13.3 -21.9 48.4 Displacement (cu.in.) 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.1 0.1 Gross horsepower 0.0 -0.1 0.0 0.0 -0.1 0.0 0.0 0.0 0.1 Weight (1000 lbs) -4.6 -7.2 -2.0 -2.3 -5.0 0.4 -7.7 -12.5 -2.9 1/4 mile time 0.5 -0.4 1.5 0.4 -0.7 1.6 1.6 -0.2 3.4

Subtotals

mtcars %>%
tab_cells(mpg) %>%
tab_cols(total(), vs) %>%
tab_rows(subtotal(cyl, 1:2, 3:4, "5 and more" = 5 %thru% hi)) %>%
tab_stat_mean() %>%
tab_pivot() %>%
tab_caption("Subtotals in rows")
    #Total     Engine Subtotals in rows Number of cylinders 1 Miles/(US) gallon Mean 2 Miles/(US) gallon Mean TOTAL 1/2 Miles/(US) gallon Mean 3 Miles/(US) gallon Mean 4 Miles/(US) gallon Mean 26.7 26.0 26.7 TOTAL 3/4 Miles/(US) gallon Mean 26.7 26.0 26.7 6 Miles/(US) gallon Mean 19.7 20.6 19.1 8 Miles/(US) gallon Mean 15.1 15.1 5 and more Miles/(US) gallon Mean 16.6 16.1 19.1

Subtotals at the bottom of the table

mtcars %>%
tab_cells(mpg, qsec) %>%
tab_cols(total(), vs) %>%
tab_rows(subtotal(cyl, 1:2, 3:4, "TOTAL 5 and more" = 5 %thru% hi, position = "bottom")) %>%
tab_stat_mean() %>%
tab_pivot() %>%
tab_caption("Subtotals at the bottom of the table")
    #Total     Engine Subtotals at the bottom of the table Number of cylinders 1 Miles/(US) gallon Mean 1/4 mile time Mean 2 Miles/(US) gallon Mean 1/4 mile time Mean 3 Miles/(US) gallon Mean 1/4 mile time Mean 4 Miles/(US) gallon Mean 26.7 26.0 26.7 1/4 mile time Mean 19.1 16.7 19.4 6 Miles/(US) gallon Mean 19.7 20.6 19.1 1/4 mile time Mean 18.0 16.3 19.2 8 Miles/(US) gallon Mean 15.1 15.1 1/4 mile time Mean 16.8 16.8 TOTAL 1/2 Miles/(US) gallon Mean 1/4 mile time Mean TOTAL 3/4 Miles/(US) gallon Mean 26.7 26.0 26.7 1/4 mile time Mean 19.1 16.7 19.4 TOTAL 5 and more Miles/(US) gallon Mean 16.6 16.1 19.1 1/4 mile time Mean 17.2 16.7 19.2

Nets

Net, contrary to subtotal, remove original categories.

mtcars %>%
tab_cells(mpg) %>%
tab_cols(total(), vs) %>%
tab_rows(net(cyl, 1:2, 3:4, "NET 5 and more" = 5 %thru% hi, prefix = "NET ")) %>%
tab_stat_mean() %>%
tab_pivot() %>%
tab_caption("Nets in rows, custom prefix")
    #Total     Engine Nets in rows, custom prefix Number of cylinders NET 1/2 Miles/(US) gallon Mean NET 3/4 Miles/(US) gallon Mean 26.7 26.0 26.7 NET 5 and more Miles/(US) gallon Mean 16.6 16.1 19.1

Nets with complex grouping

mtcars %>%
tab_cells(net(mpg, "Low mpg" = less(mean(mpg)), "High mpg" = greater_or_equal(mean(mpg)))) %>%
tab_cols(total(), am) %>%
tab_stat_cases() %>%
tab_pivot() %>%
tab_caption("Nets with complex grouping")
  #Total     Transmission Nets with complex grouping Miles/(US) gallon Low mpg 18 15 3 High mpg 14 4 10 #Total cases 32 19 13

Significance testing on column percent

Letters marks cells which are significantly greater than cells in the appropriate columns. - and + marks values which are lower/greater than values in the first column. Significance testing on column percent should be applied on the result of tab_stat_cpct with total row.

mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), vs) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
significance_cpct(compare_type = c("first_column", "subtable"), sig_level = 0.05) %>%
tab_caption("Significance testing on column percent")
  #Total     Engine Significance testing on column percent Number of cylinders 4 34.4 5.6 - 71.4 + A 6 21.9 16.7 28.6 8 43.8 77.8 + #Total cases 32 18 14

Significance testing on means

Significance testing on means should be applied on the result of tab_stat_mean_sd_n.

mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_mean_sd_n() %>%
tab_pivot() %>%
significance_means(compare_type = c("first_column", "subtable")) %>%
tab_caption("Significance testing on means")
  #Total     Transmission Significance testing on means Miles/(US) gallon Mean 20.1 17.1 - 24.4 + A Std. dev. 6.0 3.8 6.2 Unw. valid N 32.0 19.0 13.0 Displacement (cu.in.) Mean 230.7 290.4 B 143.5 - Std. dev. 123.9 110.2 87.2 Unw. valid N 32.0 19.0 13.0 Gross horsepower Mean 146.7 160.3 126.8 Std. dev. 68.6 53.9 84.1 Unw. valid N 32.0 19.0 13.0 Weight (1000 lbs) Mean 3.2 3.8 + B 2.4 - Std. dev. 1.0 0.8 0.6 Unw. valid N 32.0 19.0 13.0 1/4 mile time Mean 17.8 18.2 17.4 Std. dev. 1.8 1.8 1.8 Unw. valid N 32.0 19.0 13.0

Multiple-response variables with weighting

Here we load data with multiple-responce questions. mrset means that we treat set of variables as multiple response varibale with category encoding. For dichotomy encoding use mdset.

data(product_test)
codeframe_likes = num_lab("
1 Liked everything
2 Disliked everything
3 Chocolate
4 Appearance
5 Taste
6 Stuffing
7 Nuts
8 Consistency
98 Other
")

set.seed(1)
product_test = product_test %>%
let(
# recode age by groups
age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
wgt = runif(.N, 0.25, 4),
wgt = wgt/sum(wgt)*.N
)  %>%
apply_labels(
age_cat = "Age",
age_cat = c("18 - 25" = 1, "26 - 35" = 2),
a1_1 = "Likes. VSX123",
b1_1 = "Likes. SDF456",
a1_1 = codeframe_likes,
b1_1 = codeframe_likes
)

product_test %>%
tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>%
tab_cols(total(), age_cat) %>%
tab_weight(wgt) %>%
tab_stat_cpct() %>%
tab_sort_desc() %>%
tab_pivot() %>%
tab_caption("Multiple-response variables with weighting")
  #Total     Age Multiple-response variables with weighting Likes. VSX123 Nuts 68.0 82.5 53.7 Taste 39.3 48.4 30.3 Chocolate 34.5 31.9 37.0 Appearance 33.6 26.7 40.4 Stuffing 31.5 23.6 39.3 Consistency 12.9 7.7 18.0 Disliked everything 0.7 1.4 Liked everything Other Hard to answer #Total cases 150 70 80 Likes. SDF456 Nuts 63.3 62.1 64.4 Taste 31.6 35.2 28.0 Appearance 30.3 28.8 31.8 Chocolate 27.2 27.9 26.4 Stuffing 21.5 14.3 28.7 Consistency 8.2 1.7 14.7 Disliked everything 2.9 3.3 2.5 Other 1.0 1.9 Liked everything Hard to answer #Total cases 150 70 80

Side-by-side variables comparison

To make side-by-side comparison we use “|” to suppress variable labels and put these labels to the statistic labels. Statistics labels we place in columns with tab_pivot.

product_test %>%
tab_cols(total(), age_cat) %>%
tab_weight(wgt) %>%
# '|' is needed to prevent automatic labels creation from argument
tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>%
tab_stat_cpct(label = var_lab(a1_1)) %>%
tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>%
tab_stat_cpct(label = var_lab(b1_1)) %>%
tab_pivot(stat_position = "inside_columns") %>%
tab_caption("Side-by-side variables comparison")
  #Total     Age   Likes. VSX123 Side-by-side variables comparison Liked everything Disliked everything 0.7 2.9 3.3 1.4 2.5 Chocolate 34.5 27.2 31.9 27.9 37.0 26.4 Appearance 33.6 30.3 26.7 28.8 40.4 31.8 Taste 39.3 31.6 48.4 35.2 30.3 28.0 Stuffing 31.5 21.5 23.6 14.3 39.3 28.7 Nuts 68.0 63.3 82.5 62.1 53.7 64.4 Consistency 12.9 8.2 7.7 1.7 18.0 14.7 Other 1.0 1.9 Hard to answer #Total cases 150 150 70 70 80 80

Multiple tables in the loop with knitr

To make the task more practical we will create table with means for variables which have more than 6 unique elements. For other variables we will calculate column percent table. Note that you need to set results='asis' in the chunk options.

# here we specify dataset and banner
banner = mtcars %>%
tab_cols(total(), am)

for(each in colnames(mtcars)){
# note ..$which is used for indirect reference to variable # specify variable curr_table = banner %>% tab_cells(..$each)
# calculate statistics
if(length(unique(mtcars[[each]]))>6){
curr_table = curr_table %>%
tab_stat_mean_sd_n() %>%
tab_pivot() %>%
significance_means()
} else {
curr_table = curr_table %>%
tab_stat_cpct() %>%
tab_pivot() %>%
significance_cpct()
}
# finalize table
curr_table %>%
tab_caption("Variable name: ", each) %>%
htmlTable() %>%
print()
}
  #Total     Transmission Variable name: mpg Miles/(US) gallon Mean 20.1 17.1 24.4 A Std. dev. 6.0 3.8 6.2 Unw. valid N 32.0 19.0 13.0
  #Total     Transmission Variable name: cyl Number of cylinders 4 34.4 15.8 61.5 A 6 21.9 21.1 23.1 8 43.8 63.2 B 15.4 #Total cases 32 19 13
  #Total     Transmission Variable name: disp Displacement (cu.in.) Mean 230.7 290.4 B 143.5 Std. dev. 123.9 110.2 87.2 Unw. valid N 32.0 19.0 13.0
  #Total     Transmission Variable name: hp Gross horsepower Mean 146.7 160.3 126.8 Std. dev. 68.6 53.9 84.1 Unw. valid N 32.0 19.0 13.0
  #Total     Transmission Variable name: drat Rear axle ratio Mean 3.6 3.3 4.0 A Std. dev. 0.5 0.4 0.4 Unw. valid N 32.0 19.0 13.0
  #Total     Transmission Variable name: wt Weight (1000 lbs) Mean 3.2 3.8 B 2.4 Std. dev. 1.0 0.8 0.6 Unw. valid N 32.0 19.0 13.0
  #Total     Transmission Variable name: qsec 1/4 mile time Mean 17.8 18.2 17.4 Std. dev. 1.8 1.8 1.8 Unw. valid N 32.0 19.0 13.0
  #Total     Transmission Variable name: vs Engine V-engine 56.2 63.2 46.2 Straight engine 43.8 36.8 53.8 #Total cases 32 19 13
  #Total     Transmission Variable name: am Transmission Automatic 59.4 100.0 Manual 40.6 100.0 #Total cases 32 19 13
  #Total     Transmission Variable name: gear Number of forward gears 3 46.9 78.9 4 37.5 21.1 61.5 A 5 15.6 38.5 #Total cases 32 19 13
  #Total     Transmission Variable name: carb Number of carburetors 1 21.9 15.8 30.8 2 31.2 31.6 30.8 3 9.4 15.8 4 31.2 36.8 23.1 6 3.1 7.7 8 3.1 7.7 #Total cases 32 19 13