The package name `brea` stands for *Bayesian recurrent events analysis*. This software is capable of implementing discrete time-to-event models with a variety of features beyond just recurrent events, including competing risks, time-varying covariates, GAM-style incorporation of nonlinear covariate effects, random effects (frailties), and multiple states.

This tutorial contains a detailed introduction to the use of the `brea` R package. Our goal is to help the user understand how to format data and input parameters for the `brea_mcmc()` function, as well as how to extract and interpret relevant results from its output. The advanced features mentioned in the preceding paragraph (such as multistate models) will be covered in a future tutorial.

To make best use of this tutorial (and the `brea` software), the user should already be familiar with both basic discrete time survival modeling and Bayesian inference via simulation (MCMC); we refer the user to (Allison 1982) and (Gelman et al. 2013), respectively, for detailed expositions of these topics. We do, however, provide brief refreshers on these topics in the next sections.

Our main goal in this tutorial is to implement a simple Cox proportional hazards model, namely the leukemia remission times example from (Cox 1972). Specifically, we first illustrate how to fit this model with the classical frequentist partial likelihood approach using the `coxph()` function from the `survival` package. Second, we show how to reformulate the data for discrete time analysis and fit the model using frequentist simple logistic regression with the `glm()` function. Finally, we adopt a Bayesian approach for this discrete time model and obtain inferences using the `brea_mcmc()` function from the `brea` package. Results from these three approaches will be compared and contrasted.

Most popular time-to-event models and corresponding software (e.g., the `survival` package in R) assume that the times of event occurrence are *continuous*, meaning an event can occur at any instant in some time span. For example, we may be interested in the exact number of seconds from rocket liftoff until the rocket payload achieves orbit (or experiences a competing risk of a malfunction).

*Discrete* timing of events, on the other hand, can arise in
two ways. First, event occurrence may only be possible at a discrete
(i.e., finite or listable) collection of time points. For example, we
may be interested in the number of academic terms before a student
graduates (or experiences a competing risk of dropout). In this example,
the event of interest (graduation) may only occur at the conclusion of
\(t=1\) semester, \(t=2\) semesters, \(t=3\) semesters, and so on; it is not
possible for graduation to happen at \(t=3.14159\) semesters (whereas a rocket
could malfunction at exactly \(t=3.14159\) seconds after liftoff).

The second way discrete survival times arise is when continuous time
is partitioned into intervals, and only the identity of the event’s
interval is recorded. For example, a car accident event may occur at an
exact instant in time, but a car insurance company modeling the time
between accidents may only record the day of the accident. This special
case of discrete time-to-event data is called *grouped time*, and
it is responsible for the occurrence of tied observations in nominally
continuous survival data.

With continuous time, the most common approach for analyzing event
times is to model the *hazard rate* of event occurrence. Let
\(T\) denote a continuous event time
variable, taking some value in the interval \((0,\infty)\). The *continuous time
hazard rate* \(\lambda(t)\) is
defined to be the “instantaneous risk” of event occurrence at time \(t\) given the event of interest has not yet
occurred prior to \(t\): \[
\lambda(t)=\lim_{dt\rightarrow 0} \frac{P(t\leq T \leq t+dt)}{dt\cdot
P(T\geq t)}
\] The *continuous time Cox proportional hazards model*
then models these hazard rates with: \[
\lambda(t)=\lambda_0(t)\text{exp}(\beta_1X_1+\cdots+\beta_MX_M)=\text{exp}(f_0(t)+\beta_1X_1+\cdots+\beta_MX_M)
\] where \(\lambda_0(t)\) is an
arbitrary function of time \(t\) called
a *baseline hazard* (and \(f_0(t)=\text{log}(\lambda_0(t))\) is this
same baseline hazard on a log scale) and \(X_1,\ldots,X_M\) are covariate values (Cox 1972). Taking
logarithms of both sides of this equation yields an equivalent form of
our continuous time Cox model: \[
\text{log}(\lambda(t)) = \eta(t) = f_0(t)+\beta_1X_1+\cdots+\beta_MX_M
\] where \(\eta(t)\) is the
*linear predictor* incorporating both the time effects and other
covariate effects.

As with continuous time, with discrete time we will also model the
*hazard rate* of event occurrence. Let \(T\) now denote the discrete time of event
occurrence, and we assume the possible discrete time points are labeled
with the positive integers. So for example, if the event of interest
occurs at the third discrete time point, then \(T=3\). If there is only a single event type
(i.e., no competing risks), we define the *discrete time hazard
rate* \(\lambda(t)\) to be the
probability of event occurrence at time \(t\) given the event has not already
occurred at an earlier time point: \[\lambda(t)=P(T=t|T\geq t)\] Unlike with
the continuous time hazard rate (which may take any value between 0 and
\(\infty\)), the discrete time hazard
rate is a probability and is thus bounded between 0 and 1. Hence, we
relate \(\lambda(t)\) to a linear
predictor \(\eta(t)\) encapsulating
covariate effects (including the effect of time \(t\)) using the *logit link function*
(just as in logistic regression): \[
\text{log}\left(\frac{\lambda(t)}{1-\lambda(t)}\right) = \eta(t) =
f_0(t)+\beta_1X_1+\cdots+\beta_MX_M
\] Here, for the discrete time model, proportionality is now
technically in the *odds* of the hazard. However, for simplicity
we will still refer to this as the *discrete time Cox proportional
hazards model*.

The discrete Cox model is recognizable as being identical to
*logistic regression*. However, the difference here is that we
must include one logistic regression observation *for each discrete
time point \(t\) each sample member was
at risk for event occurrence* in order to model the aggregate risk
experienced by each sample member. To illustrate, suppose we have the
following data set:

ID | T | \(\delta\) | age | gender |
---|---|---|---|---|
01 | 3 | 1 | 24 | M |
02 | 2 | 0 | 27 | F |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
50 | 5 | 1 | 33 | F |

where ID is a subject id number, \(T\) is *study time* (the time of
either event occurrence or right censoring), \(\delta\) is an event indicator (1 for event
occurrence, 0 for right censoring), and age and gender are covariates.
For instance, the first study subject did *not* experience the
event of interest at times \(t=1\) or
\(t=2\) but *did* experience the
event of interest at time \(t=3\).

To fit the discrete Cox model to this data, we must first expand the
data set to *person-time format*, with *one row of data for
each discrete time point each sample member was at risk for event
occurrence*. The above data set would become:

ID | t | Y | age | gender |
---|---|---|---|---|
01 | 1 | 0 | 24 | M |
01 | 2 | 0 | 24 | M |
01 | 3 | 1 | 24 | M |
02 | 1 | 0 | 27 | F |
02 | 2 | 0 | 27 | F |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
50 | 1 | 0 | 33 | F |
50 | 2 | 0 | 33 | F |
50 | 3 | 0 | 33 | F |
50 | 4 | 0 | 33 | F |
50 | 5 | 1 | 33 | F |

where \(t\) is the discrete time for that specific person-time observation, and \(Y\) is an outcome indicator (\(Y=0\) for no event occurring at that person-time point, and \(Y=1\) for event occurrence). Note that when \(t=T\) (i.e., it is the last discrete time point of observation for a sample member), \(Y=\delta\). For example, study subject 02 was observed for two discrete time points, and because her \(\delta\) value is 0, her \(Y\) value at her second person-time point of observation is also 0. For all three study subjects depicted here, we assumed their ages remained constant across all discrete time points of observation. However, if we knew otherwise, we could easily calculate potentially distinct age values for each person-time observation.

## The `brea` Package

Once a data set has been expanded into person-time format, it may be fit with conventional logistic regression software (e.g., the `glm()` function with the `family=binomial` option; see the example later in this tutorial). The `brea_mcmc()` function from the `brea` package is based on logistic regression, but has one additional requirement: *all* predictor variables (including time \(t\)) must be *specified* in a *categorical* manner, either by giving positive integer values \(k\) (where \(k=1,\ldots,K\) for some \(K\)) or by using an R `factor` data type. The logistic regression model implementing the discrete Cox model then becomes:

\[ \text{log}\left(\frac{\lambda(t)}{1-\lambda(t)}\right) = \eta(t) = \beta_0 + \beta_1[X_1(t)] + \beta_2[X_2(t)] + \cdots + \beta_M[X_M(t)] \] Here each predictor value \(X_m(t)\) takes a positive integer value \(k\) between \(1\) and \(K_m\), and so the effects of the \(m^\text{th}\) predictor are modeled on the logit scale by a collection of \(K_m\) parameters \(\beta_m[1]\), \(\beta_m[2]\), …, \(\beta_m[K_m]\).

Note that while the *specification* of covariate values must be *categorical*, *quantitative* covariates are still fully supported. To include a quantitative covariate, the user must first discretize the covariate (e.g., using the `cut()` function). This effectively means the potentially arbitrary smooth relationship between the covariate and the hazard will be approximated by a *step function*. To increase the amount of smoothness, a finer discretization (with a larger number \(K_m\) of categories) may be used. *Gaussian Markov random field (GMRF)* prior distributions are then used for the parameter values \(\beta_m[1]\), \(\beta_m[2]\), …, \(\beta_m[K_m]\) to ensure regularity/smoothness of the functional relationship even if the discretization is fine. See (King and Weiss 2021) and the practical example later in this tutorial for more details.

In Bayesian statistics, information and uncertainty about model parameters is encapsulated by probability distributions for those parameters. We briefly discuss these distributions, and then describe how to obtain inferential results by numerically sampling from the posterior distribution.

The probability distribution we use to capture knowledge we have about model parameters *prior* to seeing the sample data is called the *prior distribution*. For example, if we had reason to believe that a model parameter \(\mu\) was equal to 5, then we could use a prior distribution for \(\mu\) with a mean of 5. Furthermore, if we were roughly 95% sure (prior to seeing the sample data) that \(\mu\) is between 1 and 9, then we could use a \(N(5,2^2)\) prior distribution for \(\mu\) (i.e., a normal distribution with a mean of 5 and a standard deviation of 2), since roughly 95% of this distribution's probability lies between 1 and 9:
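These two facts can be verified with base R's normal distribution functions; a minimal sketch:

```
# probability that a N(5, 2^2) random variable falls between 1 and 9
pnorm(9, mean = 5, sd = 2) - pnorm(1, mean = 5, sd = 2)
# central 95% interval of the N(5, 2^2) distribution
qnorm(c(0.025, 0.975), mean = 5, sd = 2)
```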

`## [1] 0.9544997`

`## [1] 1.080072 8.919928`

With our modeling approach, we will generally use *noninformative priors* that specify little, if any, prior information about the model parameters (other than, for example, the fact that adjacent step heights in a step function approximation of a smooth curve are close together). We will illustrate how to do this in the practical example, and users of the `brea` package will generally *not* have to be concerned with developing prior distribution specifications beyond “default” noninformative priors.

The *posterior distribution* combines the information about
the model parameters from the prior distribution with the information
from the sample data to produce a *single probability
distribution* that encapsulates our most up-to-date state of
knowledge about the parameters. The posterior distribution is the
primary “output” or “result” or “inference” in a Bayesian analysis.

Mathematically, the posterior distribution is completely determined
by the combination of the prior distribution, statistical model, and
sample data. In all but the simplest cases, this probability
distribution doesn’t have some simple form that we can determine using
algebra or calculus. Instead, the only practical way to learn about this
distribution is to *take a random sample from the posterior
distribution*. Note that here we are talking about using computer
simulation algorithms to draw a random sample from the posterior
probability distribution of the *model parameters*, and this
sample is completely different from the *sample data* that we are
analyzing.

The most common way of selecting a random sample from the posterior distribution is via a numerical algorithm called *Markov chain Monte Carlo*, or *MCMC* for short. This is the algorithm used by the `brea_mcmc()` function we will use later. One important difference between a random sample drawn using an MCMC algorithm and a *simple random sample (SRS)* is that the values from an MCMC sample are *not independent of each other*. Specifically, MCMC generates values for the sample as a *sequence*, and successive elements of the sequence tend to be more similar to each other than independently drawn values (i.e., the sequence of sampled values exhibits positive *autocorrelation*).

As a result, for a given sample size \(S\), there is usually much less information in a sample drawn using MCMC than in an independent SRS of the same size. So while we may draw a sample of size \(S\) using MCMC, the *effective sample size* (the size of an independent SRS that would contain the same amount of information as the MCMC sample) is often 10 or even 100 times smaller than \(S\). In the practical example later, we will show how to calculate the effective sample size using the `effectiveSize()` function from the `coda` package; for publication-quality inferences, we recommend choosing the MCMC sample size \(S\) so that the effective sample size is at least 1,000 (which may necessitate choosing \(S=10,000\) or even \(S=100,000\)).

Once we have obtained a sample from the posterior distribution, it is
very easy to obtain the types of inferential results the user is
familiar with from fitting frequentist models. Specifically, we may
obtain a *point estimate* of a specific model parameter by
calculating the mean or median of the MCMC sample for that parameter.
Similarly, we may obtain a *confidence interval* (CI, which
Bayesians often call *credible intervals*) for a model parameter
by calculating *quantiles* of the sampled values of that
parameter. For example, we could find the endpoints of a 95% CI by
calculating the \(2.5^\text{th}\)
percentile and \(97.5^\text{th}\)
percentile of the posterior MCMC sample for the desired parameter. We
illustrate this in the practical example below.

We now apply both classical frequentist models and our `brea` model to the first data set ever used for a Cox proportional hazards model, namely the leukemia remission data from (Cox 1972). This data came from a clinical trial of acute leukemia patients with two treatment groups: the drug 6-mercaptopurine (6-MP) and a placebo control. There were \(n=42\) patients in total, with 21 assigned to each group. The outcome of interest was the duration of leukemia remission (i.e., the *event of interest* was the *end* of cancer remission, so *lower* hazard of event occurrence is better). These durations were measured in whole weeks, so time here is discrete. The data appear below:

Treatment Group | Observed Duration of Remission (weeks) (* denotes right-censored observations) |
---|---|
Group 0 (6-MP) | 6*, 6, 6, 6, 7, 9*, 10*, 10, 11*, 13, 16, 17*, 19*, 20*, 22, 23, 25*, 32*, 32*, 34*, 35* |
Group 1 (control) | 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23 |

We will first prepare and analyze this data with a continuous time Cox proportional hazards model via the `survival` package. Then we will use a discrete time Cox model approach, implemented first with vanilla logistic regression and finally with the `brea` package.

## `coxph()`

We begin by creating separate variables for the study time variable `time` (time of event occurrence or right censoring), the event/censoring indicator variable `event` (1 for event occurrence, 0 for right censoring), and the treatment group variable `treat` (0 for treatment, 1 for control):

```
time <- c(6,6,6,6,7,9,10,10,11,13,16,17,19,20,22,23,25,32,32,34,35,
          1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
event <- c(0,1,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,0,0,0,
           1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
treat <- c(rep(0,21),rep(1,21))
```

Next, we load the `survival` package and define our outcome variable `y` to be a survival object of class `Surv`:
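A minimal sketch of this step, using the `time` and `event` vectors created above:

```
library(survival)

# survival outcome: observed time plus event indicator (1 = event, 0 = censored)
y <- Surv(time, event)
```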

Finally, we fit the Cox proportional hazards model and summarize the results:
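A call along the following lines reproduces the summary shown below; the object name `fit_cph` is illustrative:

```
fit_cph <- coxph(y ~ treat)  # Cox PH model with treatment group as predictor
summary(fit_cph)
```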

```
## Call:
## coxph(formula = y ~ treat)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## treat 1.5721 4.8169 0.4124 3.812 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## treat 4.817 0.2076 2.147 10.81
##
## Concordance= 0.69 (se = 0.041 )
## Likelihood ratio test= 16.35 on 1 df, p=5e-05
## Wald test = 14.53 on 1 df, p=1e-04
## Score (logrank) test = 17.25 on 1 df, p=3e-05
```

The estimated coefficient \(\hat\beta_1\) of the treatment variable `treat` is 1.572. Since this value represents a log hazard rate ratio, we exponentiate it to obtain the hazard ratio estimate 4.817. Hence, we estimate the risk of remission ending is 4.82 times greater in the placebo group than in the 6-MP group. We may also obtain and interpret confidence intervals for `treat`'s coefficient \(\beta_1\) or its exponentiated version:
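Assuming the illustrative `fit_cph` object from the sketch above:

```
confint(fit_cph)       # 95% CI for the log hazard ratio
exp(confint(fit_cph))  # 95% CI for the hazard ratio
```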

```
## 2.5 % 97.5 %
## treat 0.7638424 2.380408
```

```
## 2.5 % 97.5 %
## treat 2.146508 10.80931
```

We can be 95% sure that the hazard of cancer remission ending is between 2.15 and 10.81 times higher for patients who receive placebo than for patients who receive the 6-MP treatment. Thus, we have statistically significant evidence that the 6-MP treatment is highly effective at maintaining remission.

Note that `coxph()` treats the event times as *continuous*, despite the fact that the event times have been grouped into whole weeks (and are thus *discrete*). This grouping has resulted in numerous *tied observations*, which are sample members with the exact same observed event time (e.g., four members of the control group all experienced the event of interest at time \(t=8\)). With truly continuous data, ties would be impossible, so the partial likelihood fitting method must be adjusted to allow for tied data. Several such adjustments are available in `coxph()`, including the `efron` method (the default), the `breslow` method, and the `exact` method, which yield different though qualitatively similar results:

```
## Call:
## coxph(formula = y ~ treat, ties = "efron")
##
## coef exp(coef) se(coef) z p
## treat 1.5721 4.8169 0.4124 3.812 0.000138
##
## Likelihood ratio test=16.35 on 1 df, p=5.261e-05
## n= 42, number of events= 30
```

```
## Call:
## coxph(formula = y ~ treat, ties = "breslow")
##
## coef exp(coef) se(coef) z p
## treat 1.5092 4.5231 0.4096 3.685 0.000229
##
## Likelihood ratio test=15.21 on 1 df, p=9.615e-05
## n= 42, number of events= 30
```

```
## Call:
## coxph(formula = y ~ treat, ties = "exact")
##
## coef exp(coef) se(coef) z p
## treat 1.6282 5.0949 0.4331 3.759 0.00017
##
## Likelihood ratio test=16.25 on 1 df, p=5.544e-05
## n= 42, number of events= 30
```

In summary, the classic Cox proportional hazards model yields hazard rate ratio estimates of 4.52, 4.82, and 5.09, depending on the method for handling ties.

Here we will expand the data into *person-time format*, with one observation for each discrete time point each patient was at risk for event occurrence. We begin by setting the total number of person-time observations `N` and creating matrices of height `N` to store the predictor values `x` (with first column subject id, second column time, and third column treatment group) and the binary response variable `y`:

```
N <- sum(time) # total number of person-time observations
x <- matrix(0,N,3) # columns id, time, and treatment group
colnames(x) <- c("id","t","treat")
y <- matrix(0,N,1) # only one column since only one competing risk
```

Next we iterate through the 42 observations in the original (unexpanded) data set, expanding each one and adding the corresponding rows to `x` and `y`:

```
next_row <- 1 # next available row in the person-time matrices
for (i in seq_along(time)) {
  rows <- next_row:(next_row + time[i] - 1)   # person-time rows to fill
  x[rows,1] <- rep(i,time[i])                 # subject id number is constant for each person
  x[rows,2] <- seq_len(time[i])               # discrete time is integers 1,2,...,time[i]
  x[rows,3] <- rep(treat[i],time[i])          # treatment group is constant
  y[rows,1] <- c(rep(0,time[i] - 1),event[i]) # outcome is 0's until study time
  next_row <- next_row + time[i]              # increment the next available row pointer
}
```

Let’s view the data from the first two study subjects in both the original and expanded forms to confirm this worked as intended:
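One way to produce these views; the data frame name `original_data` is illustrative, while `expanded_data` is the person-time data frame referenced throughout the remainder of this tutorial:

```
# original (unexpanded) data, one row per subject
original_data <- data.frame(id = seq_along(time), time, event, treat)
original_data[1:2, ]

# person-time (expanded) data built from the x and y matrices above
expanded_data <- data.frame(x, y = y[, 1])
expanded_data[1:12, ]
```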

```
## id time event treat
## 1 1 6 0 0
## 2 2 6 1 0
```

```
## id t treat y
## 1 1 1 0 0
## 2 1 2 0 0
## 3 1 3 0 0
## 4 1 4 0 0
## 5 1 5 0 0
## 6 1 6 0 0
## 7 2 1 0 0
## 8 2 2 0 0
## 9 2 3 0 0
## 10 2 4 0 0
## 11 2 5 0 0
## 12 2 6 0 1
```

We can see that the data from the first two members of the original sample were correctly expanded into person-time format: the first six person-time rows are for subject number 1, while the second six are for subject number 2 (since both of the first two subjects were observed for six discrete time points each). In addition, the only event observed for these two subjects was at the sixth discrete time point (`t=6`) for the second subject (`id=2`), so that is the only row of the person-time data from the first two subjects where `y=1`.

## `glm()`

Before we apply logistic regression to the expanded person-time data set, we must decide how to model the effect of discrete time \(t\) on the hazard of event occurrence. Recall that the discrete time Cox proportional hazards model includes an arbitrary function of time \(f_0(t)\) (known as a baseline hazard):

\[
\text{log}\left(\frac{\lambda(t)}{1-\lambda(t)}\right) = \eta(t) =
f_0(t)+\beta_1X_1+\cdots+\beta_MX_M
\] Since here we will only be using standard logistic regression software via the `glm()` function (as opposed to a *generalized additive model*, which would allow inclusion of arbitrary smooth functions \(f()\) inside the linear predictor \(\eta(t)\)), we must use an alternative way of modeling the effect of discrete time \(t\) on the hazard of event occurrence. Two readily available choices would be to include a polynomial function of discrete time \(t\) (e.g., a linear or quadratic function) or to group discrete time \(t\) into intervals and treat it categorically. We consider each of these in turn.

With linear modeling of the effect of time \(t\), our model becomes: \[
\text{log}\left(\frac{\lambda(t)}{1-\lambda(t)}\right) = \beta_0 +
\beta_1X_1(t) + \beta_2X_2
\] where \(X_1(t)=t\) is discrete time and \(X_2\) is treatment group (0 for 6-MP, 1 for placebo). We may fit this logistic regression model to our person-time data set using the `glm()` function with the `family=binomial` option:
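A call of the following form produces the summary below; the object name `linear_fit` is illustrative (chosen to match the `quadratic_fit` and `cubic_fit` names used shortly):

```
linear_fit <- glm(y ~ t + treat, family = binomial, data = expanded_data)
summary(linear_fit)
```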

```
##
## Call:
## glm(formula = y ~ t + treat, family = binomial, data = expanded_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.01627 0.50971 -7.880 3.29e-15 ***
## t 0.02768 0.02745 1.008 0.313
## treat 1.77337 0.44334 4.000 6.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 231.84 on 540 degrees of freedom
## Residual deviance: 213.32 on 538 degrees of freedom
## AIC: 219.32
##
## Number of Fisher Scoring iterations: 6
```

We may then obtain a point estimate of the hazard rate ratio (the exponentiated coefficient of the treatment variable \(X_2\)) using:
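Assuming the illustrative `linear_fit` object:

```
exp(coef(linear_fit)["treat"])  # hazard (odds) ratio estimate for treatment
```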

```
## treat
## 5.890668
```

Note that technically this is an *odds ratio* estimate, not a
*hazard ratio* estimate, since the discrete Cox model sets the
log of the *odds* of the hazard equal to the linear predictor
\(\eta(t)\). Odds ratios will generally
tend to be more extreme (further from 1) than corresponding risk/hazard
ratios, though when the probabilities are small (as they are for most
discrete survival models), the differences are modest. We will
informally still interpret odds ratio estimates from discrete Cox models
as though they were hazard rate ratios.

We may also try quadratic and cubic formulations for the time effect:

```
quadratic_fit <- glm(y ~ poly(t,2) + treat,family=binomial,data=expanded_data)
exp(coef(quadratic_fit)["treat"])
```

```
## treat
## 5.550794
```

```
cubic_fit <- glm(y ~ poly(t,3) + treat,family=binomial,data=expanded_data)
exp(coef(cubic_fit)["treat"])
```

```
## treat
## 5.530024
```

Next, we consider grouping discrete time \(t\) into non-overlapping intervals and treating the variable as categorical. This is equivalent to approximating the baseline hazard \(f_0(t)\) with a *step function*. When doing frequentist inference with this approach, we must have at least one observed event in each interval of \(t\) values, or else the corresponding parameter estimate for that interval will be \(-\infty\) (since the hazard appears to be zero in that interval). Our data set only contains 30 observed events, and the last time of event occurrence is \(t=23\). With such limited data, we follow the suggestion of (Cox 1972) of dividing the variable's range into intervals of length 10 (with the exception of the last interval, which will have length 15 in order to include the last time of observation):
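A sketch of this grouping using the `cut()` function; the breaks are chosen to match the intervals appearing in the output below:

```
# group discrete time into the intervals (0,10], (10,20], and (20,35]
expanded_data$t_cat <- cut(expanded_data$t, breaks = c(0, 10, 20, 35))
```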

By doing a cross tabulation of the discrete time variable `t` in the expanded person-time data set with the new grouped version `t_cat`, we can verify that all discrete time observations have been grouped into the correct intervals:
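For example:

```
with(expanded_data, table(t, t_cat))
```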

```
## t_cat
## t (0,10] (10,20] (20,35]
## 1 42 0 0
## 2 40 0 0
## 3 38 0 0
## 4 37 0 0
## 5 35 0 0
## 6 33 0 0
## 7 29 0 0
## 8 28 0 0
## 9 24 0 0
## 10 23 0 0
## 11 0 21 0
## 12 0 18 0
## 13 0 16 0
## 14 0 15 0
## 15 0 15 0
## 16 0 14 0
## 17 0 13 0
## 18 0 11 0
## 19 0 11 0
## 20 0 10 0
## 21 0 0 9
## 22 0 0 9
## 23 0 0 7
## 24 0 0 5
## 25 0 0 5
## 26 0 0 4
## 27 0 0 4
## 28 0 0 4
## 29 0 0 4
## 30 0 0 4
## 31 0 0 4
## 32 0 0 4
## 33 0 0 2
## 34 0 0 2
## 35 0 0 1
```

Indeed, we observe for example that all observations with `t` values between 1 and 10 have been correctly assigned to the interval `(0,10]`. Our discrete time Cox model with a step function baseline hazard is now as follows: \[
\text{log}\left(\frac{\lambda(t)}{1-\lambda(t)}\right) = \beta_0 +
\beta_1X_1(t) + \beta_2X_2(t) + \beta_3X_3
\] where \(X_1(t)\) is a dummy/indicator variable for the *second* interval \((10,20]\) (i.e., \(X_1(t)=1\) if \(t\in (10,20]\) and \(X_1(t)=0\) otherwise), \(X_2(t)\) is a dummy/indicator variable for the *third* interval \((20,35]\) (so the first interval \((0,10]\) is the reference group), and \(X_3\) is the treatment group variable. We may now fit this model:
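A call of the following form produces the summary below; the object name `step_fit` is illustrative:

```
step_fit <- glm(y ~ t_cat + treat, family = binomial, data = expanded_data)
summary(step_fit)
```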

```
##
## Call:
## glm(formula = y ~ t_cat + treat, family = binomial, data = expanded_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.9850 0.4311 -9.244 < 2e-16 ***
## t_cat(10,20] 0.3326 0.4526 0.735 0.462
## t_cat(20,35] 0.9364 0.6306 1.485 0.138
## treat 1.8372 0.4459 4.120 3.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 231.84 on 540 degrees of freedom
## Residual deviance: 212.16 on 537 degrees of freedom
## AIC: 220.16
##
## Number of Fisher Scoring iterations: 6
```

Our estimated hazard rate ratio \(\text{exp}(\hat \beta_3)\) for the treatment effect is:
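Assuming the illustrative `step_fit` object:

```
exp(coef(step_fit)["treat"])
```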

```
## treat
## 6.279227
```

We may also obtain a 95% CI for the hazard rate ratio as follows:
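For example (the profiling messages below are produced by `confint()`):

```
confint(step_fit)                  # 95% CIs on the logit scale
exp(confint(step_fit)["treat", ])  # 95% CI for the treatment hazard ratio
```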

`## Waiting for profiling to be done...`

```
## 2.5 % 97.5 %
## (Intercept) -4.9127924 -3.212701
## t_cat(10,20] -0.6038059 1.193752
## t_cat(20,35] -0.4157477 2.121095
## treat 1.0011180 2.768136
```

`## Waiting for profiling to be done...`

```
## 2.5 % 97.5 %
## 2.721323 15.928911
```

Hence, using this model we estimate that the hazard of remission ending is 6.28 times higher in the placebo group compared to the 6-MP group. In addition, we can be 95% sure the hazard is between 2.72 and 15.93 times higher among placebo patients when compared to patients receiving the active treatment.

Note that the effect estimate here is substantially larger in magnitude than for any of the other formulations we have considered thus far. This is due to our step function approximation of the baseline hazard (the time effect) being too crude (only having three steps). With our Bayesian approach in the next section, we will see how to fix this.

## `brea_mcmc()`

The function `brea_mcmc()` from the `brea` package is used to fit a Bayesian version of our discrete time Cox proportional hazards model using a Markov chain Monte Carlo (MCMC) algorithm, as described earlier. Here we will illustrate how to use this function and interpret its output. We will end by comparing our results with both the frequentist continuous and discrete time Cox models.

The `brea_mcmc()` function has the following function signature:
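The signature is approximately as follows; consult `?brea_mcmc` in your installed version for the authoritative argument list:

```
brea_mcmc(x, y, priors = NULL, S = 1000, B = 100, n = NULL, K = NULL,
          store_re = FALSE)
```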

The only required arguments the user must specify are `x` and `y`, which specify, respectively, the predictor values and event occurrence status for each person-time observation of the expanded data set. More specifically, `y` is a vector of length `N` or a matrix of height `N` with a single column (or with multiple columns if there are competing risks), where `N` is the number of person-time observations. Each entry of `y` is either 0 (if the event of interest did not occur at that person-time observation) or 1 (if the event of interest did occur). The `y` variable we created earlier meets these requirements without further modification.

The `x` variable used to specify covariates at the person-time level may be either a matrix or a data frame with `N` rows. *All* predictors must use a categorical formulation (though often the categories represent steps in a step function approximation of an arbitrary smooth function of a numerical predictor). The categories of each categorical predictor are represented either by positive whole number values or by R `factor` variables (factors are only possible if `x` is a data frame; if `x` is a matrix, then `x` must be numeric).

We will now construct a numeric matrix `x` that meets the above requirements. First, we will use the same grouping of discrete time `t` into three intervals as used in the previous section. We will also change the coding of the treatment indicator to take values 1 and 2 instead of 0 and 1:

```
x_brea <- matrix(0,N,2)
x_brea[,1] <- cut(expanded_data$t,c(0,10,20,35),labels=FALSE) # grouped time t
x_brea[,2] <- expanded_data$treat + 1 # treatment group
```

Typically, users will also specify the `priors` argument, which determines *both* the covariate type (categorical fixed effects, quantitative covariates with step function approximations of the effects, or random effects) and any parameters for its corresponding prior distribution. However, for this first `brea` model, we will be treating both predictors as categorical fixed effects and using corresponding noninformative priors. Since this is the default specification (when `NULL` is supplied for the `priors` argument), we do not have to provide this argument.

We may then load the `brea` package and obtain a sample from the posterior distribution of the model parameters (“fitting” the model in the Bayesian sense) as follows:
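A minimal sketch of this step; the object name `fit` matches the code used below:

```
library(brea)

set.seed(1234)               # for reproducibility of the MCMC run
fit <- brea_mcmc(x_brea, y)  # default settings: S = 1,000 posterior samples
```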

Note that since the MCMC algorithm uses random number generation, we set the seed of the random number generator to ensure our results are reproducible.

The structure of the R object returned by `brea_mcmc()` is below:
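For example:

```
str(fit)
```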

```
## List of 4
## $ b_0_s: num [1, 1:1000] -2.77 -3.1 -2.98 -2.95 -2.48 ...
## $ b_m_s:List of 2
## ..$ : num [1, 1:3, 1:1000] -0.2452 -0.059 0.3043 0.0227 0.2089 ...
## ..$ : num [1, 1:2, 1:1000] -0.795 0.795 -0.735 0.735 -0.879 ...
## $ s_m_s:List of 2
## ..$ : NULL
## ..$ : NULL
## $ b_m_a:List of 2
## ..$ : int [1:3] 450 446 483
## ..$ : int [1:2] 457 470
```

This object is a list with two components of interest to us (as well as a few that do not concern us now): `b_0_s` contains the posterior sampled values of the intercept parameter \(\beta_0\), and `b_m_s` contains posterior sampled values of the parameters \(\beta_m[k]\), where \(\beta_m[k]\) represents the effect of the \(k^\text{th}\) covariate value (category) of the \(m^\text{th}\) predictor on the hazard of event occurrence. Specifically, the \(s^\text{th}\) sampled value (post burn-in) of \(\beta_m[k]\) is stored in `b_m_s[[m]][1,k,s]` (the 1 index in `[1,k,s]` denotes that we are examining the *first* competing risk, which must be specified even in cases like this where there is only one competing risk). In particular, for our model in question, `m=2` represents the treatment variable (since treatment is the *second* column of the `x_brea` matrix we supplied), `k=1` represents the 6-MP treatment, and `k=2` represents the placebo control group. Hence, MCMC samples of the treatment effect (on the linear predictor scale) may be calculated as follows:

```
b_treatment <- fit$b_m_s[[2]][1,1,]
b_control <- fit$b_m_s[[2]][1,2,]
d <- b_control - b_treatment # sampled values of treatment effect on logit scale
```

We may then obtain point estimates and a standard error estimate of the treatment effect on the linear predictor (logit) scale, and compare them to the results from fitting a frequentist version of this model using `glm()` earlier:
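For example, using base R summaries of the sampled values `d` (and the illustrative `step_fit` object from earlier for the frequentist comparison):

```
mean(d)    # posterior mean of the treatment effect
median(d)  # posterior median
sd(d)      # posterior standard deviation (standard error)

summary(step_fit)  # frequentist fit of the same model, for comparison
```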

`## [1] 1.873343`

`## [1] 1.841043`

`## [1] 0.439673`

```
##
## Call:
## glm(formula = y ~ t_cat + treat, family = binomial, data = expanded_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.9850 0.4311 -9.244 < 2e-16 ***
## t_cat(10,20] 0.3326 0.4526 0.735 0.462
## t_cat(20,35] 0.9364 0.6306 1.485 0.138
## treat 1.8372 0.4459 4.120 3.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 231.84 on 540 degrees of freedom
## Residual deviance: 212.16 on 537 degrees of freedom
## AIC: 220.16
##
## Number of Fisher Scoring iterations: 6
```

As we can see, “fitting” the Bayesian version of this discrete Cox model gives almost exactly the same results as fitting the frequentist version using `glm()`.

As described earlier, with our Bayesian approach, we obtain
inferences by taking a sample from the posterior distribution of the
relevant model parameters using an MCMC algorithm. How accurately this
posterior sample describes the posterior distribution depends on two
things: how much *serial correlation* (*autocorrelation*)
is present in the MCMC sample, and the size of the MCMC sample \(S\) (which is the length of the MCMC
chain). We may visualize the sequence of sampled values using a
*trace plot*, which plots the MCMC iteration number along the
horizontal axis and the sampled parameter value along the vertical
axis:

```
par(cex=0.66,mgp=c(1.75,0.5,0),mai=c(0.4,0.4,0.1,0.1))
plot(d,type="l",xlab="MCMC Iteration Number s",
ylab="Sampled Value of Treatment Effect Parameter")
```

From the plot, the serial correlation does not seem too severe. As
mentioned earlier though, it’s useful to calculate an *effective
sample size* estimate, which is the size of an independent sample
that contains the same information as contained in our MCMC sample.
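Using the `coda` package mentioned earlier:

```
library(coda)

effectiveSize(d)  # effective sample size of the MCMC sample
```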

```
## var1
## 207.7105
```

The `brea_mcmc()` function uses a default posterior sample size of \(S=1,000\), and this sample produced an effective sample size of approximately 208, meaning our efficiency was around 20%. As mentioned earlier, we typically should seek effective sample sizes over 1,000, and since the MCMC run is already very fast, we can increase the chain length to \(S=10,000\) (and also increase the initial portion of the chain discarded as *burn-in* to \(B=1,000\)):

```
set.seed(1234)
fit_10k <- brea_mcmc(x_brea,y,S=10000,B=1000)
d <- fit_10k$b_m_s[[2]][1,2,] - fit_10k$b_m_s[[2]][1,1,]
effectiveSize(d)
```

```
## var1
## 1695.931
```
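Recomputing the point estimates and standard error; note that the final value, matching the output shown, exponentiates the posterior median to give a hazard ratio estimate:

```
mean(d)         # posterior mean
median(d)       # posterior median
sd(d)           # posterior standard deviation
exp(median(d))  # hazard ratio estimate
```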

`## [1] 1.847461`

`## [1] 1.834604`

`## [1] 0.4576245`

`## [1] 6.262654`

Our effective sample size estimate is now comfortably over 1,000, though our estimates are not substantially changed (and are still quite close to the corresponding frequentist estimates, as we would expect).

We hypothesized above that our step function approximation of the time effect, which only used three steps, was too crude to adequately model the time effect. However, with the frequentist approach, we were prevented from including a more fine-grained step function by the limited amount of data available. With our Bayesian approach, we can circumvent this restriction by using a prior distribution that mathematically “ties together” adjacent steps, which allows us to estimate step heights even if there is little to no data directly influencing each step height. One such prior distribution is the *Gaussian Markov random field (GMRF)*, and the `brea` package includes this as a possible prior type.

First, we will re-group the discrete time variable \(t\) into intervals of length 3 instead of intervals of length 10:
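A sketch of this re-grouping; breaks at 0, 3, 6, …, 36 yield 12 intervals covering the observed times 1 through 35:

```
# replace the 3-interval grouping with 12 intervals of length 3
x_brea[,1] <- cut(expanded_data$t, breaks = seq(0, 36, by = 3), labels = FALSE)
```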

Next, we must explicitly set the `priors` argument of the `brea_mcmc()` function; `priors` is a list with one element for each covariate (column of `x`). Each element, in turn, is a list whose first element specifies the covariate type (`gmrf` for quantitative covariates via GMRF, `cat` for categorical fixed effects, or `re` for random effects). The `gmrf` option requires two additional numerical parameters; specification of these is technical, and we recommend values of 3 and 0.01 as a “default” noninformative prior specification that should work well in most cases. Similarly, the `cat` option requires one additional numerical parameter, and a value of 4 results in a fairly noninformative prior distribution. We may now rerun the MCMC as follows:

```
set.seed(1234)
priors <- list(list("gmrf",3,0.01),list("cat",4))
fit_gmrf <- brea_mcmc(x_brea,y,priors,S=10000,B=1000)
d_gmrf <- fit_gmrf$b_m_s[[2]][1,2,] - fit_gmrf$b_m_s[[2]][1,1,]
effectiveSize(d_gmrf)
```

```
## var1
## 2058.032
```

The effective sample size for this MCMC run is once again well over 1,000, so we may now proceed to giving practical inferences.

The posterior mean, median, and standard deviation (standard error) of the treatment effect parameter on the linear predictor scale are as follows:
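As before:

```
mean(d_gmrf)    # posterior mean
median(d_gmrf)  # posterior median
sd(d_gmrf)      # posterior standard deviation
```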

`## [1] 1.674498`

`## [1] 1.669348`

`## [1] 0.4155349`

Exponentiating, we find the following hazard ratio estimate:
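Again exponentiating the posterior median:

```
exp(median(d_gmrf))  # hazard ratio estimate
```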

`## [1] 5.308705`

We estimate the hazard of cancer remission ending is 5.31 times higher for patients receiving placebo than for patients receiving 6-MP. We may also calculate a 95% CI for the hazard rate ratio by calculating the \(2.5^\text{th}\) and \(97.5^\text{th}\) percentiles of the posterior sample of exponentiated treatment effect parameters:
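For example:

```
quantile(exp(d_gmrf), c(0.025, 0.975))
```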

```
## 2.5% 97.5%
## 2.412046 12.557243
```

Hence, based on these results, we are 95% confident that 6-MP treatment reduces the hazard of remission ending by a factor of between 2.41 and 12.56.

We have obtained inferential results from a total of nine different model fits to the leukemia remission data from (Cox 1972). First, we fit a classic frequentist continuous time Cox proportional hazards model with three different options for the handling of tied observations (the Efron, Breslow, and exact methods). Second, we fit a frequentist discrete time Cox model with four different ways of modeling the effect of discrete time \(t\): linear, quadratic, and cubic polynomials, and a step function with three steps. Finally, we obtained inferences from a Bayesian discrete time Cox model using the `brea` package with two different time effect formulations: a step function with 3 steps and a step function with 12 steps (the latter using a GMRF prior distribution for the step heights).

For all nine of these model fits, we obtained point estimates and standard errors of the treatment effect parameter on the linear predictor scale. We also obtained corresponding hazard rate ratio estimates comparing the rate of remission cessation in the placebo control group to the rate in the 6-MP treatment group. These results are summarized in the table below:

Model | Point Estimate | Standard Error | Hazard Ratio Estimate |
---|---|---|---|
Continuous CPH - Efron Ties | 1.57 | 0.41 | 4.82 |
Continuous CPH - Breslow Ties | 1.51 | 0.41 | 4.52 |
Continuous CPH - Exact Ties | 1.63 | 0.43 | 5.09 |
Discrete CPH - Linear Time | 1.77 | 0.44 | 5.89 |
Discrete CPH - Quadratic Time | 1.71 | 0.43 | 5.55 |
Discrete CPH - Cubic Time | 1.71 | 0.43 | 5.53 |
Discrete CPH - 3-Step Time | 1.84 | 0.45 | 6.28 |
`brea` - 3-Step Time | 1.83 | 0.46 | 6.26 |
`brea` - 12-Step Time | 1.67 | 0.42 | 5.31 |

While the results of all nine model fits were qualitatively similar, there were clear patterns regarding which results were most similar to others. For example, the frequentist and Bayesian discrete time Cox models with the 3-step formulation for the time effect yielded almost identical results, which makes sense considering these are essentially the same model fit in drastically different ways.

In addition, the continuous time Cox model fit with the *exact* method of handling ties performed quite similarly to the `brea` model with a flexible (12-step) formulation for the baseline hazard function. Again, this makes sense, as the models are quite similar, with the main difference being that we expect the log *odds ratio* estimate for the `brea` model (1.67) to be slightly further from the null value of 0 than the corresponding log *hazard ratio* from the continuous CPH model (1.63), which is what was observed.

Allison, Paul D. 1982. “Discrete-Time Methods for the Analysis of
Event Histories.” *Sociological Methodology* 13: 61–98.

Cox, D. R. 1972. “Regression Models and Life-Tables.”
*Journal of the Royal Statistical Society, Series B* 34 (2):
187–220.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D.
B. Rubin. 2013. *Bayesian Data Analysis, Third Edition*. Chapman
& Hall/CRC Texts in Statistical Science. Taylor & Francis.

King, Adam J, and Robert E Weiss. 2021. “A General Semiparametric
Bayesian Discrete-Time Recurrent Events Model.”
*Biostatistics* 22 (2): 266–82. https://doi.org/10.1093/biostatistics/kxz029.