Tutorial: tbl_regression

Daniel D. Sjoberg

Last Updated: November 29, 2018

Introduction

This vignette will walk a reader through the tbl_regression() function, and the various functions available to modify and make additions to an existing formatted regression table. tbl_regression() uses broom::tidy() to perform the initial model formatting, and can accommodate many different model types (e.g. lm(), glm(), survival::coxph(), survival::survreg() and more). If your class of model is not supported , please request support. In some cases, it is simple to support a new class of model.

To start, a quick note on the {magrittr} package’s pipe function, %>%. By default the pipe operator puts whatever is on the left hand side of %>% into the first argument of the function on the right hand side. The pipe function can be used to make the code relating to tbl_regression() easier to use, but it is not required. Here are a few examples of how %>% translates into typical R notation.

x %>% f() is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
y %>% f(x, .) is equivalent to f(x, y)
z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z)

Setup

Before going through the tutorial, install {gtsummary} and {gt}.

install.packages("gtsummary")
remotes::install_github("rstudio/gt")

library(gtsummary)
library(dplyr)

Basic Usage

We will use the trial data set throughout this example. This set contains data from 200 patients who received one of two types of chemotherapy (Drug A or Drug B). The outcome is a binary tumor response. Each variable in the data frame has been assigned an attribute label (i.e. attr(trial$trt, "label") == "Chemotherapy Treatment") with the labelled package.

trt      Chemotherapy Treatment
age      Age, yrs
marker   Marker Level, ng/mL
stage    T Stage
grade    Grade
response Tumor Response

The default output from tbl_regression() is meant to be publication ready. Let’s start by creating a regression model table from the trial data set included in the {gtsummary} package. We will predict tumor response using age, stage, and grade using a logistic regression model.

# build logistic regression model
m1 = glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

# format results into data frame
tbl_regression(m1, exponentiate = TRUE)
Characteristic OR1 95% CI1 p-value
Age, yrs 1.02 1.00, 1.04 0.092
T Stage
T1
T2 0.57 0.23, 1.34 0.2
T3 0.91 0.37, 2.22 0.8
T4 0.76 0.31, 1.85 0.6
Grade
I
II 0.84 0.38, 1.85 0.7
III 1.05 0.49, 2.25 >0.9

1 OR = Odds Ratio, CI = Confidence Interval

Because the variables in the data set were labelled, the labels were carried through into the {gtsummary} output table. Had the data not been labelled, the default is to display the variable name. The model was recognized as logistic regression with coefficients exponentiated, so the header displayed “OR” for odds ratio. The correct reference group has also been added to the table.

Customize Output

There are four primary ways to customize the output of the regression model table.

  1. Modify tbl_regression() function input arguments
  2. Add additional data/information to a summary table with add_*() functions
  3. Modify summary table appearance with the {gtsummary} functions
  4. Modify table appearance with {gt} package functions

Modifying function arguments

The tbl_regression() function includes many input options for modifying the appearance.

label         modify the variable labels printed in the table.  
exponentiate  exponentiate model coefficients.  
include       names of variables to include in output. Default is all variables.  
show_yesno    show both levels of yes/no variables. Default prints on single row.  
conf.level    confidence level of confidence interval.  
intercept     logical argument indicates whether to include the intercept in output.  
estimate_fun  function to round and format coefficient estimates.  
pvalue_fun    function to round and format p-values.  

Functions to add information

The {gtsummary} package has built-in functions for adding to results from tbl_regression(). The following functions add columns and/or information to the regression table.

add_global_p()  adds the global p-value for a categorical variables   
add_nevent()    adds the number of observed events to the results object    

{gtsummary} functions to format table

The {gtsummary} package comes with functions specifically made to modify and format summary tables.

bold_labels()       bold variable labels  
bold_levels()       bold variable levels  
italicize_labels()  italicize variable labels  
italicize_levels()  italicize variable levels  
bold_p()            bold significant p-values  

{gt} functions to format table

The {gt} package is packed with many great functions for modifying table output—too many to list here. Review the package’s website for a full listing. https://gt.rstudio.com/index.html

To use the {gt} package functions with {gtsummary} tables, the regression table must first be converted into a {gt} object. To this end, use the as_gt() function after modifications have been completed with {gtsummary} functions.

m1 %>%
  tbl_regression(exponentiate = TRUE) %>%
  as_gt() %>%
  <gt functions>

Example

There are formatting options available, such as adding bold and italics to text. In the example below,
- Variable labels are bold
- Levels of categorical levels are italicized
- Global p-values for T Stage and Grade are reported - P-values less than 0.10 are bold
- Large p-values are rounded to two decimal places
- Coefficients are exponentiated to give odds ratios
- Odds ratios are rounded to 2 or 3 significant figures

Characteristic OR1 95% CI1 p-value
Age, yrs 1.020 0.997, 1.043 0.092
T Stage 0.60
T1
T2 0.567 0.234, 1.342
T3 0.908 0.367, 2.220
T4 0.765 0.310, 1.854
Grade 0.85
I
II 0.841 0.379, 1.849
III 1.045 0.486, 2.246

1 OR = Odds Ratio, CI = Confidence Interval

Report Results Inline

Reproducible reports are an important part of good practices. We often need to report the results from a table in the text of an R markdown report. Inline reporting has been made simple with inline_text().

Let’s first create a regression model table.

tbl_m1 <- tbl_regression(m1, exponentiate = TRUE)
tbl_m1
Characteristic OR1 95% CI1 p-value
Age, yrs 1.02 1.00, 1.04 0.092
T Stage
T1
T2 0.57 0.23, 1.34 0.2
T3 0.91 0.37, 2.22 0.8
T4 0.76 0.31, 1.85 0.6
Grade
I
II 0.84 0.38, 1.85 0.7
III 1.05 0.49, 2.25 >0.9

1 OR = Odds Ratio, CI = Confidence Interval

To report the result for age, use the following commands inline.

`r inline_text(tbl_m1, variable = "age")`

Here’s how the line will appear in your report.

1.02 (95% CI 1.00, 1.04; p=0.092)

It is reasonable that you’ll need to modify the text. To do this, use the pattern argument. The pattern argument syntax follows glue::glue() format with referenced R objects being inserted between curly brackets. The default is pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})". You have access the to following fields within the pattern argument.

{estimate}   primary estimate (e.g. model coefficient, odds ratio)
{conf.low}   lower limit of confidence interval
{conf.high}  upper limit of confidence interval
{p.value}    p-value
{conf.level} confidence level of interval
{N}          number of observations

Age was not significantly associated with tumor response `r inline_text(tbl_m1, variable = "age", pattern = "(OR {estimate}; 95% CI {conf.low}, {conf.high}; {p.value})")`.

Age was not significantly associated with tumor response (OR 1.02; 95% CI 1.00, 1.04; p=0.092).

If you’re printing results from a categorical variable, include the level argument, e.g. inline_text(tbl_m1, variable = "stage", level = "T3") resolves to “0.91 (95% CI 0.37, 2.22; p=0.8)”.

The inline_text function has arguments for rounding the p-value (pvalue_fun) and the coefficients and confidence interval (estimate_fun). These default to the same rounding performed in the table, but can be modified when reporting inline.

Advanced Customization

When you print the output from the tbl_regression() function into the R console or into an R markdown, there are default printing functions that are called in the background: print.tbl_regression() and knit_print.tbl_regression(). The true output from tbl_regression() is a named list, but when you print the object, a formatted version of .$table_body is displayed. All formatting and modifications are made using the {gt} package by default.

tbl_regression(m1) %>% names()
#> [1] "table_body"   "table_header" "n"            "model_obj"    "inputs"      
#> [6] "call_list"    "gt_calls"     "kable_calls"

These are the additional data stored in the tbl_regression() output list.

table_body   data frame with summary statistics  
n            N included in model  
model_obj    the model object passed to `tbl_regression`  
call_list    named list of each function called on the `tbl_regression` object  
inputs       inputs from the `tbl_regression()` function call  
gt_calls     named list of {gt} function calls  
kable_calls  named list of function calls to be applied before knitr::kable()  

gt_calls is a named list of saved {gt} function calls. The {gt} calls are run when the object is printed to the console or in an R markdown document. Here’s an example of the first few calls saved with tbl_regression():

tbl_regression(m1) %>% purrr::pluck("gt_calls") %>% head(n = 5)
#> $gt
#> gt::gt(data = x$table_body)
#> 
#> $cols_align
#> gt::cols_align(align = 'center') %>% gt::cols_align(align = 'left', columns = gt::vars(label))
#> 
#> $fmt_missing
#> gt::fmt_missing(columns = gt::everything(), missing_text = '')
#> 
#> $fmt_missing_ref
#> gt::fmt_missing(columns = gt::vars(estimate, ci), rows = row_ref == TRUE, missing_text = '---')
#> 
#> $tab_style_text_indent
#> gt::tab_style(style = gt::cell_text(indent = gt::px(10), align = 'left'),locations = gt::cells_data(columns = gt::vars(label), rows = row_type != 'label'))

The {gt} functions are called in the order they appear, always beginning with the gt() function.

If the user does not want a specific {gt} function to run, any {gt} call can be excluded in the as_gt() function by specifying the exclude argument. For example, the tbl_regression() call creates many named {gt} function calls: gt, cols_align, fmt_missing, fmt_missing_ref, tab_style_text_indent, cols_label, cols_hide, fmt, tab_footnote. Any one of these can be excluded. In this example, the default footnote will be excluded from the output.

tbl_regression(m1, exponentiate = TRUE) %>%
  as_gt(exclude = "tab_footnote")
Characteristic OR 95% CI p-value
Age, yrs 1.02 1.00, 1.04 0.092
T Stage
T1
T2 0.57 0.23, 1.34 0.2
T3 0.91 0.37, 2.22 0.8
T4 0.76 0.31, 1.85 0.6
Grade
I
II 0.84 0.38, 1.85 0.7
III 1.05 0.49, 2.25 >0.9

Univariate Regression

The tbl_uvregression() produces a table of univariate regression results. The function is a wrapper for tbl_regression(), and as a result, accepts nearly identical function arguments. The function’s results can be modified in similar ways to tbl_regression() and the results reported inline similarly to tbl_regression().

trial %>%
  select(-death, -ttdeath, -stage) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = function(x) style_pvalue(x, digits = 2)
  ) %>%
  # overrides the default that shows p-values for each level
  add_global_p() %>%
  # adjusts global p-values for multiple testing (default method: FDR)
  add_q() %>%
  # bold p-values under a given threshold (default 0.05)
  bold_p() %>%
  # now bold q-values under the threshold of 0.10
  bold_p(t = 0.10, q = TRUE) %>%
  bold_labels()
Covariate N OR1 95% CI1 p-value q-value2
Chemotherapy Treatment 193 0.53 0.71
Drug A
Drug B 1.21 0.66, 2.24
Age, yrs 183 1.02 1.00, 1.04 0.091 0.20
Marker Level, ng/mL 183 1.35 0.94, 1.93 0.10 0.20
Grade 193 0.93 0.93
I
II 0.95 0.45, 2.00
III 1.10 0.52, 2.29

1 OR = Odds Ratio, CI = Confidence Interval

2 False discovery rate correction for multiple testing

Setting Default Options

The {gtsummary} regression functions and their related functions have sensible defaults for rounding and formatting results. If you, however, would like to change the defaults there are a few options. The default options can be changed in a single script with addition an options() command in the script. The defaults can also be set on the project- or user-level R profile, .Rprofile.

help("Rprofile")

usethis::edit_r_profile()

The following parameters are available to be set:

Description Example Functions
Formatting and rounding p-values options(gtsummary.pvalue_fun = function(x) gtsummary::style_pvalue(x, digits = 2)) add_p(), tbl_regression(), tbl_uvregression()
Formatting and rounding for regression coefficients options(gtsummary.tbl_regression.estimate_fun = function(x) gtsummary::style_sigfig(x, digits = 3)) tbl_regression(), tbl_uvregression()
Set level for limits options(gtsummary.conf.level = 0.90) tbl_regression(), tbl_uvregression()
Print tables with gt or kable options(gtsummary.print_engine = "kable") options(gtsummary.print_engine = "gt") All tbl_*() functions

When setting default rounding/formatting functions, set the default to a function object rather than an evaluated function. For example, if you want to round estimates to 3 significant figures use,

options(gtsummary.tbl_regression.estimate_fun = function(x) sigfig(x, digits = 3))