The package provides datasets (internal `.rda`

and in
CSV-format in
`/extdata/`

) which support users in a black-box
performance qualification (PQ) of their software installations.^{1} Users can
analyze data imported from
CSV- and
Excel-files.

The methods^{2} given by the European Medicines Agency
(EMA) for reference-scaling according to the Guideline on the
Investigation of Bioequivalence^{3} are implemented. Health Canada’s approach^{4} –
requiring a mixed-effects model – is approximated by intra-subject
contrasts. Direct widening of the limits for *C*_{max}
(as a special case of
ABEL)
recommended by the
GCC^{5} is
supported as well.

Potential influence of outliers on the variability of the reference
can be assessed by box plots of studentized and standardized residuals
as suggested at a joint
EGA/EMA
symposium.^{6}

`method.A()`

A linear model of log-transformed
PK responses and effects

*sequence*, *subject*(*sequence*),
*period*, *treatment*

where all effects are fixed (*i.e.*, evaluated by an
ANOVA). Estimated via function
`lm()`

of package `stats`

.

```
<- lm(log(PK) ~ sequence + subject%in%sequence + period + treatment,
modA data = data)
```

`method.B()`

A linear model of log-transformed
PK responses and effects

*sequence*, *subject*(*sequence*),
*period*, *treatment*

where *subject*(*sequence*) is a random effect and all
others are fixed.

Three options are provided

- Estimated via function
`lmer()`

of package`lmerTest`

.

```
<- lmer(log(PK) ~ sequence + period + treatment + (1|subject),
modB data = data)
```

- Employs Satterthwaite’s approximation
^{7}of the degrees of freedom`method.B(..., option = 1)`

, which is equivalent to SAS’`DDFM=SATTERTHWAITE`

, Phoenix WinNonlin’s`Degrees of Freedom Satterthwaite`

, and Stata’s`dfm=Satterthwaite`

. - Note that this is the only available approximation in SPSS.

- Estimated via function
`lme()`

of package`nlme`

.

```
<- lme(log(PK) ~ sequence + period + treatment, random = ~1|subject,
modB data = data)
```

- Employs degrees of freedom equivalent to SAS’
`DDFM=CONTAIN`

(implicitly preferred by the EMA), Phoenix WinNonlin’s`Degrees of Freedom Residual`

, STATISTICA’s`GLM containment`

, and Stata’s`dfm=anova`

. - To comply with the
EMA’s
Q&A document,
`method.B(..., option = 2)`

is the default (*i.e.*, if the argument`option`

is missing).

- Estimated via function
`lmer()`

of package`lmerTest`

.

```
<- lmer(log(PK) ~ sequence + period + treatment + (1|subject),
modB data = data)
```

- Employs the Kenward-Roger approximation
^{8}of the degrees of freedom`method.B(..., option = 3)`

, which is equivalent to Stata’s`dfm=Kenward Roger (EIM)`

and SAS’`DDFM=KENWARDROGER(FIRSTORDER)`

,*i.e.*, based on the*expected*information matrix. - Note that SAS with
`DDFM=KENWARDROGER`

and JMP calculate Satterthwaite’s [*sic*] degrees of freedom and apply the Kackar-Harville correction^{9}*i.e.*, based on the*observed*information matrix.

`ABE()`

Average Bioequivalence, where the model is identical to Method A. By default the conventional acceptance
range of 80.00 – 125.00% is used. Tighter limits (90.00 – 111.11%) for
narrow therapeutic index drugs
(EMA and others) or wider
limits (75.00 – 133.33% for *C*_{max} according to the
guideline of South Africa^{10}) can be specified by the arguments
`theta1`

(lower limit) and/or `theta2`

(upper
limit).

The hypotheses in the confidence interval inclusion approach are
\[\small{H_{0}:
\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\ni
\left\{L,U\right\}\:vs\:H_{1}:L<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<U}\]
where the null hypothesis is **in**equivalence. In Average
Bioequivalence (ABE) the limits \(\small{\left\{L,U\right\}}\) are
*fixed*, whereas in
ABEL
they can be *expanded* based on the
within-subject variability of the reference treatment.

Details about the reference datasets and their designs:

```
help("data", package = "replicateBE")
::data ?replicateBE
```

Both the test and the reference treatments are administered at least once.

`TRTR | RTRT`

`TRRT | RTTR`

`TTRR | RRTT`

`TRTR | RTRT | TRRT | RTTR`

`TRRT | RTTR | TTRR | RRTT`

The test treatment is administered at least once to ½ of the subjects and the reference treatment at least once to the respective other ½ of the subjects.

`TRT | RTR`

`TRR | RTT`

The test and reference treatments are administered once to ½ of the
subjects (for the estimation of the
CI), *i.e.*, the first
group of subjects follows a conventional
2×2×2
trial. In the second group the test and reference treatments are
administered at least once to ¼ of the subjects, respectively (for the
estimation of *CV*_{wT} and
*CV*_{wR}).

`TR | RT | TT | RR`

Although supported, Balaam’s design^{11} is *not
recommended* due to its poor power characteristics.

The test treatment is administered once and the reference treatment at least once.

`TRR | RTR | RRT`

`TRR | RTR`

The latter is the so-called extra-reference design^{12} which is *not
recommended* since it is biased in the presence of period
effects.

Columns must have the headers `subject`

,
`period`

, `sequence`

, `treatment`

,
`PK`

, and/or `logPK`

.^{13} Any order of columns
is acceptable. Uppercase and mixed case headers will be internally
converted to lowercase headers.

Variable | Format |
---|---|

`subject` |
Integers or any combination of alphanumeric characters
(`A-Z` , `a-z` , `-` , `_` ,
`#` , `0-9` ) |

`period` |
Integers |

`sequence` |
Numbers or literal sequences not listed in the tested designs are not accepted (e.g.,
).`ABAB` |

`treatment` |
The Test treatment must be coded and
the Reference `T` (both uppercase).`R` |

`PK` |
Real positive numbers of PK responses. |

`logPK` |
Real numbers of already log_{e}-transformed PK responses
(optional and rarely needed). |

Relevant data are used for the estimation of *CV*_{wR}
(and *CV*_{wT} in full replicate designs) and BE,
*i.e.*, the datasets might be different (see the example below). It is good practice to state that in
the Statistical Analysis Plan.

If a subject drops out from the study in a higher period, data of
repeated administrations will still be used for the estimation of
*CV*_{w}, although data of the other treatment might be
missing. Examples for the estimation of *CV*_{wR}
(missing values are denoted by `·`

):

`RTRT`

| `RTR·`

`TRRT`

| `TRR·`

`RRTT`

| `RRT·`

or `RR··`

`RRT`

| `RR··`

If a subject drops out from the study in a higher period, data with
at least one administration of the Test and Reference will be used in
the assessment of BE. Examples (missing values are denoted by
`·`

):

`TRTR`

| `TRT·`

or | `TR··`

`RTRT`

| `RTR·`

or | `RT··`

`TRRT`

| `TRR·`

or | `TR··`

`RTTR`

| `RTT·`

or | `RT··`

`TTRR`

| `TTR.`

`TRT`

| `TR·`

`RTR`

| `RT·`

`TRR`

| `TR·`

`RTR`

| `RT·`

`RTT`

| `RT·`

16 subjects enrolled in the study. In sequence `RTRT`

one
dropout in the 2^{nd} period and one in the 4^{th}
period. In sequence `TRTR`

one dropout in the 3^{rd}
period and one in the 4^{th}.

`1 RTR. 5 RTRT 9 TRTR 13 RTRT`

`2 RTRT 6 TR·· 10 TRTR 14 TRT·`

`3 RTRT 7 RTRT 11 RTRT 15 TRTR`

`4 TRTR 8 R··· 12 TRTR 16 TRTR`

We obtain these datasets:

Dataset | Purpose | included | excluded |
---|---|---|---|

#1 | Estimation of CV_{wR} |
13 who received 2 treatments `R` |
6, 8, 14 |

#2 | Assessment of BE | 15 who received ≥1 treatment `T` and ≥1
treatment `R` |
8 |

#3 | Estimation of CV_{wT} |
13 who received 2 treatments `T` |
1, 6, 8 |

Datasets #1 and #2 are required for
ABEL,
whereas for ABE only dataset
#2 is required.

Formerly all three were required for the
WHO’s reference-scaling
of *AUC* (see below).

The EMA proposed a
linear model of log-transformed PK
responses of the reference treatment

*sequence*, *subject*(*sequence*),
*period*

where all effects are fixed. Estimated via function `lm()`

of
package `stats`

:

```
<- lm(log(PK) ~ sequence + subject%in%sequence + period,
modCV data = data[data$treatment = "R", ])
```

In full replicate designs the same model is run with
`data = data[data$treatment = "T", ]`

for informational
purposes.

Special conditions for the sample size in three period full replicate designs:

The question raised asks if it is possible to use a design where subjects are randomised to receive treatments in the order of TRT or RTR.

The CHMP bioequivalence guideline requires that at least 12 patients are needed to provide data for a bioequivalence study to be considered valid, and to estimate all the key parameters. Therefore, if a 3-period replicate design, where treatments are given in the order TRT or RTR, is to be used to justify widening of a confidence interval for C_{max}then it is considered that at least 12 patients would need to provide data from the RTR arm. This implies a study with at least 24 patients in total would be required if equal number of subjects are allocated to the 2 treatment sequences.— Q&A document

^{14}

If less than twelve subjects are eligible in sequence
`RTR`

of a `TRT | RTR`

design (and in analogy in
sequence `TRR`

of a `TRR | RTT`

design), the user
is notified about the ‘uncertain’ estimate of *CV*_{wR}.
However, in a sufficiently powered study such a case is extremely unlikely.

Let us explore the 95% confidence interval of the *CV*:

```
# Estimate sample sizes of full replicate designs (theta0 0.90,
# target power 0.80) and CI of the CV with package PowerTOST
<- 0.30
CV <- c("2x2x4", "2x2x3") # 4- and 3-period full replicate designs
design <- data.frame(design = rep(design, 2), n = c(rep(NA, 2), rep(12, 2)),
res df = NA, CV = CV, lower = NA, upper = NA)
# 95% confidence interval of the CV
for (i in 1:nrow(res)) {
if (is.na(res$n[i])) {
$n[i] <- PowerTOST::sampleN.scABEL(CV = CV,
resdesign = res$design[i],
details = FALSE,
print = FALSE)[["Sample size"]]
}if (i > 2 ) {
.1 <- res$n[i-2] / 2 # no dropouts in one sequence
n.2 <- 12 # only 12 eligible subjects in the other
n$n[i] <- n.1 + n.2
res
}if (res$design[i] == "2x2x4") {
$df[i] <- 3 * res$n[i] - 4
reselse {
} $df[i] <- 2 * res$n[i] - 3
res
}5:6] <- round(PowerTOST::CVCL(CV = CV,
res[i, df = res$df[i],
side = "2-sided",
alpha = 0.05), 3)
}print(res, row.names = FALSE)
# design n df CV lower upper
# 2x2x4 34 98 0.3 0.262 0.351
# 2x2x3 50 97 0.3 0.262 0.352
# 2x2x4 29 83 0.3 0.259 0.357
# 2x2x3 37 71 0.3 0.256 0.362
# Rows 1-2: Sample sizes for target power
# Rows 3-4: Only 12 eligible subjects in one sequence
```

It is incomprehensible why a four period full replicate design is
considered by the EMA to
give a more ‘reliable’ estimate of the *CV* than a three period
full replicate design.

The EMA’s models
assume equal [*sic*] intra-subject variances of Test and
Reference (like in
2×2×2
trials) – even if proven false in one of the full replicate designs
(were *both* *CV*_{wT} and
*CV*_{wR} can be estimated). Hence, among
biostatisticians they are called ‘crippled models’ because the
replicative nature of the study is ignored.

It should be noted that by default `ANOVA()`

produces a
‘Type I’ table, *i.e.*, all tests are performed against the
residual error. Starting with version 1.1.0 ‘Type III’ is employed in
order to obtain the correct test for carryover, where *subject*
has to be tested against *subject*(*sequence*).^{15}

```
library(replicateBE) # attach the library to run example scripts
method.A(data = rds01, print = FALSE, verbose=TRUE)
#
# Data set DS01: Method A by lm()
# ───────────────────────────────────
# Type III Analysis of Variance Table
#
# Response: log(PK)
# Df Sum Sq Mean Sq F value Pr(>F)
# sequence 1 0.0077 0.007652 0.00268 0.9588496
# period 3 0.6984 0.232784 1.45494 0.2278285
# treatment 1 1.7681 1.768098 11.05095 0.0010405
# sequence:subject 75 214.1296 2.855061 17.84467 < 2.22e-16
# Residuals 217 34.7190 0.159995
#
# treatment T – R:
# Estimate Std. Error t value Pr(>|t|)
# 0.14547400 0.04650870 3.12788000 0.00200215
# 217 Degrees of Freedom
```

In ‘Type I’ the *F* value of *sequence* would be \(\small{\frac{0.007652}{0.159995}=0.04783}\)
with \(\small{p(F_{q=0.04783,\,df_1=1,\,df_2=217})\sim
0.827}\), whereas in ‘Type III’ it is \(\small{\frac{0.007652}{2.855061}=0.00268}\)
with \(\small{p(F_{q=0.00268,\,df_1=1,\,df_2=75})\sim
0.959}\).

The nested structure *subject*(*sequence*) of the methods leads to an over-specified model.^{16} The
simple model

*sequence*, *subject*, *period*,
*treatment*

gives identical estimates of the residual variance and the treatment
effect and hence, its confidence interval.

The same holds true for the EMA’s model to estimate
*CV*_{wR}. The simple model

*subject*, *period*

gives an identical estimate of the residual variance.

Reference-scaling is acceptable for *C*_{max}
(immediate release products: BE
Guideline) and *C*_{max}, *C*_{max,ss},
*C*_{τ,ss}, _{partial}*AUC* (modified
release products^{17}). The intention to expand the limits has
to be stated in the protocol and – contrary to the
FDA’s
RSABE – a
clinical justification provided.

Those HVDP for which a wider difference in C_{max}is considered clinically irrelevant based on a sound clinical justification can be assessed with a widened acceptance range. The request for widened interval must be prospectively specified in the protocol.— CHMP Bioequivalence Guideline

The limits can be expanded based on *CV*_{wR}.

- \(\small{CV_\textrm{wR}\leq
30\%}\)

Lower cap,*i.e.*, no scaling, conventional limits:

\(\small{\left\{{L,U}\right\} = 80.00 - 125.00\%}\) - \(\small{30\%<CV_\textrm{wR}\leq
50\%}\) (EMA), \(\small{30\%<CV_\textrm{wR}\leq
\sim57.4\%}\) (Health Canada)

Expanded limits based on \(s_\textrm{wR}\):

\(\small{\left\{L,\,U\right\} = 100\cdot \exp(\mp 0.760\cdot{s_\textrm{wR}})}\) - \(\small{CV_\textrm{wR}>50\%}\)
(EMA), \(\small{CV_\textrm{wR}>\sim57.4\%}\)

Upper cap,*i.e.*, applying \(\small{s^*_\textrm{wR}=\sqrt{\log_{e}(CV_\textrm{cap}^{2}+1)}}\) in the expansion formula or

\(\small{\left\{L,U\right\} = 69.84 - 143.19\%}\) (EMA)

\(\small{\left\{L,U\right\} = 66.7 - 150.0\%}\) (Health Canada).

The special case recommended by the Gulf Cooperation Council (Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, the United Arab Emirates) is:

- \(\small{CV_\textrm{wR}\leq
30\%}\)

Lower cap,*i.e.*, no scaling, conventional limits:

\(\small{\left\{{L,U}\right\} = 80.00 - 125.00\%}\)

- \(\small{CV_\textrm{wR}>30\%}\)

Widened limits:

\(\small{\left\{{L,U}\right\} = 75.00 - 133.33\%}\)

In reference-scaling a so-called *mixed* (a.k.a.
*aggregate*) criterion is applied. In order to pass
ABEL,

- the 90% confidence interval has to lie entirely within the
acceptance range \(\small{\left\{L,U\right\}}\)
*and* - the point estimate (PE) has to lie within 80.00 – 125.00%.

To avoid discontinuities due to double rounding, expanded limits are calculated in full numeric precision and only the confidence interval is rounded according to the guidelines.

```
# Calculate limits with package PowerTOST
<- c(30, 50, 57.382)
CV <- data.frame(CV = CV,
df reg1 = "EMA", L1 = NA, U1 = NA, cap1 = "",
reg2 = "HC", L2 = NA, U2 = NA, cap2 = "",
reg3 = "GCC", L3 = NA, U3 = NA, cap3 = "",
stringsAsFactors = FALSE)
for (i in seq_along(CV)) {
3:4] <- sprintf("%.3f",
df[i, ::scABEL(CV[i]/100,
PowerTOSTregulator = df$reg1[i])*100)
7:8] <- sprintf("%.1f",
df[i, ::scABEL(CV[i]/100,
PowerTOSTregulator = df$reg2[i])*100)
11:12] <- sprintf("%.3f",
df[i, ::scABEL(CV[i]/100,
PowerTOSTregulator = df$reg3[i])*100)
}$cap3[df$CV <= 30] <- df$cap2[df$CV <= 30] <- df$cap1[df$CV <= 30] <- "lower"
df$cap1[df$CV >= 50 & df$reg1 == "EMA"] <- "upper"
df$cap2[df$CV >= 57.382 & df$reg2 == "HC"] <- "upper"
dfnames(df) <- c("CV(%)", rep(c("reg", "L(%)", "U(%)", "cap"), 3))
print(df, row.names = FALSE)
# CV(%) reg L(%) U(%) cap reg L(%) U(%) cap reg L(%) U(%) cap
# 30.000 EMA 80.000 125.000 lower HC 80.0 125.0 lower GCC 80.000 125.000 lower
# 50.000 EMA 69.837 143.191 upper HC 69.8 143.2 GCC 75.000 133.333
# 57.382 EMA 69.837 143.191 upper HC 66.7 150.0 upper GCC 75.000 133.333
```

The SAS code provided by the
EMA in the
Q&A document does not
specify how the degrees of freedom should be calculated in Method B.
Hence, the default in `PROC MIXED`

, namely
`DDFM=CONTAIN`

is applied, *i.e.*,
`method.B(..., option = 2`

). For incomplete data (missing
periods) Satterthwaite’s approximation of the degrees of freedom,
*i.e.*, `method.B(..., option = 1)`

or Kenward-Roger
`method.B(..., option = 3)`

might be a better choice – if
stated as such in the
SAP.

For background about approximations in different software packages see
the electronic Supplementary
Material of Schütz *et al.*

The EMA seemingly prefers Method A:

A simple linear mixed model, which assumes identical within-subject variability (Method B), may be acceptable as long as results obtained with the two methods do not lead to different regulatory decisions. However, in borderline cases […] additional analysis using Method A might be required.— Q&A document (January 2011 and later revisions)

The half-width of the confidence interval in log-scale allows a
comparison of methods (B *v.s.* A) where a higher value
*might* point towards a more conservative decision.^{18} In the
provided example datasets – with one exception – the conclusion of
BE (based on the mixed criterion)
agrees between Method A and Method B.

However, for the highly incomplete dataset 14 Method A is
*liberal* (passing by
ANOVA but failing by the
random effects model):

```
# Compare Method B acc. to the GL with Method A for all reference data sets.
<- substr(grep("rds", unname(unlist(data(package = "replicateBE"))),
ds value = TRUE), start = 1, stop = 5)
for (i in seq_along(ds)) {
<- method.A(print = FALSE, details = TRUE, data = eval(parse(text = ds[i])))$BE
A <- method.B(print = FALSE, details = TRUE, data = eval(parse(text = ds[i])))$BE
B <- paste0("A ", A, ", B ", B, " - ")
r cat(paste0(ds[i], ":"), r)
if (A == B) {
cat("Methods agree.\n")
else {
} if (A == "fail" & B == "pass") {
cat("Method A is conservative.\n")
else {
} cat("Method B is conservative.\n")
}
}
}# rds01: A pass, B pass - Methods agree.
# rds02: A pass, B pass - Methods agree.
# rds03: A pass, B pass - Methods agree.
# rds04: A fail, B fail - Methods agree.
# rds05: A pass, B pass - Methods agree.
# rds06: A pass, B pass - Methods agree.
# rds07: A pass, B pass - Methods agree.
# rds08: A pass, B pass - Methods agree.
# rds09: A pass, B pass - Methods agree.
# rds10: A pass, B pass - Methods agree.
# rds11: A pass, B pass - Methods agree.
# rds12: A fail, B fail - Methods agree.
# rds13: A fail, B fail - Methods agree.
# rds14: A pass, B fail - Method B is conservative.
# rds15: A fail, B fail - Methods agree.
# rds16: A fail, B fail - Methods agree.
# rds17: A fail, B fail - Methods agree.
# rds18: A fail, B fail - Methods agree.
# rds19: A fail, B fail - Methods agree.
# rds20: A fail, B fail - Methods agree.
# rds21: A fail, B fail - Methods agree.
# rds22: A pass, B pass - Methods agree.
# rds23: A pass, B pass - Methods agree.
# rds24: A pass, B pass - Methods agree.
# rds25: A pass, B pass - Methods agree.
# rds26: A fail, B fail - Methods agree.
# rds27: A pass, B pass - Methods agree.
# rds28: A pass, B pass - Methods agree.
# rds29: A pass, B pass - Methods agree.
# rds30: A fail, B fail - Methods agree.
```

Exploring dataset 14:

```
<- method.A(print = FALSE, details = TRUE, data = rds14)
A <- method.B(print = FALSE, details = TRUE, data = rds14, option = 1)
B1 <- method.B(print = FALSE, details = TRUE, data = rds14) # apply default option
B2 <- method.B(print = FALSE, details = TRUE, data = rds14, option = 3)
B3 # Rounding of CI according to the GL
17:21] <- round(A[17:21], 2) # all effects fixed
A[17:21] <- round(B1[17:21], 2) # Satterthwaite's df
B1[17:21] <- round(B2[17:21], 2) # df acc. to Q&A
B2[17:21] <- round(B3[17:21], 2) # Kenward-Roger df
B3[<- c(2, 10, 17:25)
cs <- rbind(A[cs], B1[cs], B2[cs], B3[cs])
df names(df)[c(1, 3:6, 11)] <- c("Meth.", "L(%)", "U(%)",
"CL.lo(%)", "CL.hi(%)", "hw")
c(2, 11)] <- signif(df[, c(2, 11)], 5)
df[, print(df[order(df$hw, df$BE, decreasing = c(FALSE, TRUE),
method = "radix"), ], row.names = FALSE)
# Meth. DF L(%) U(%) CL.lo(%) CL.hi(%) PE(%) CI GMR BE hw
# A 197.44 69.84 143.19 69.21 121.27 91.62 fail pass fail 0.28043
# A 192.00 69.84 143.19 69.21 121.28 91.62 fail pass fail 0.28046
# A 195.99 69.84 143.19 69.21 121.28 91.62 fail pass fail 0.28052
# A 192.00 69.84 143.19 69.99 123.17 92.85 pass pass pass 0.28261
```

All variants of Method B are more conservative than Method A. Before
rounding the confidence interval, `option = 2`

with 192
degrees of freedom would be more conservative (lower
CL 69.21029) than
`option = 1`

with 197.44 degrees of freedom (lower
CL 69.21286). Given the
incompleteness of this data set (four missings in period 2, twelve in
period 3, and 19 in period 4), Satterthwaite’s or Kenward-Roger degrees
of freedom are probably the better choice.

For detailed comparisons between methods based on simulations see the
electronic Supplementary
Material of Schütz *et al.*

It is an open issue how outliers should be handled.

The applicant should justify that the calculated intra-subject variability is a reliable estimate and that it is not the result of outliers.— CHMP Bioequivalence Guideline

The author ‘suggested’ box plots as a mere joke [*sic*] at the
EGA/EMA
symposium, being aware of their nonparametric nature and the
EMA’s reluctance towards
robust methods. Alas, this *joke* was included in the
Q&A document.

[…] a study could be acceptable if the bioequivalence requirements are met both including the outlier subject (using the scaled average bioequivalence approach and the within-subject CV with this subject) and after exclusion of the outlier (using the within-subject CV without this subject).

An outlier test is not an expectation of the medicines agencies but outliers could be shown by a box plot. This would allow the medicines agencies to compare the data between them.— EGA/EMA Q&A document

With the additional argument `ola = TRUE`

in
`method.A()`

and `method.B()`

an outlier analysis
is performed, where the default `fence = 2`

.^{19}

Results differ slightly depending on software’s algorithms to calculate the median and quartiles. Example with the nine types implemented in R:

```
# Compare different types with some random data
set.seed(123456)
<- rnorm(48)
x <- c(25, 50, 75)/100
p <- matrix(data = "", nrow = 9, ncol = 4,
q dimnames = list(paste("type =", 1:9),
c("1st quart.", "median", "3rd quart.",
"software / default")))
for (i in 1:9) {
1:3] <- sprintf("%.5f", quantile(x, prob = p, type = i))
q[i,
}c(2, 4, 6:8), 4] <- c("SAS, Stata", "SciPy", "Phoenix, Minitab, SPSS",
q["R, S, MATLAB, Octave, Excel", "Maple")
print(as.data.frame(q))
# 1st quart. median 3rd quart. software / default
# type = 1 -0.75397 0.05761 0.96782
# type = 2 -0.74451 0.06358 0.96984 SAS, Stata
# type = 3 -0.75397 0.05761 0.96782
# type = 4 -0.75397 0.05761 0.96782 SciPy
# type = 5 -0.74451 0.06358 0.96984
# type = 6 -0.74924 0.06358 0.97084 Phoenix, Minitab, SPSS
# type = 7 -0.73978 0.06358 0.96883 R, S, MATLAB, Octave, Excel
# type = 8 -0.74609 0.06358 0.97017 Maple
# type = 9 -0.74569 0.06358 0.97009
```

Note the differences even in the medians.

- Box plots of studentized
^{20}and standarized^{21}model residuals are constructed.^{22} - Potential outliers are flagged based on the argument
`fence`

provided by the user. - With the additional argument
`verbose = TRUE`

detailed information is shown in the console.

Example for the reference dataset 01:

```
Outlier analysis
(externally) studentized residuals
Limits (2×IQR whiskers): -1.717435, 1.877877
Outliers:
subject sequence stud.res
45 RTRT -6.656940
52 RTRT 3.453122
standarized (internally studentized) residuals
Limits (2×IQR whiskers): -1.69433, 1.845333
Outliers:
subject sequence stand.res
45 RTRT -5.246293
52 RTRT 3.214663
```

If based on *studentized* residuals outliers are detected,
additionally to the expanded limits based on the complete reference
data, tighter limits are calculated based on *CV*_{wR}
after exclusion of outliers and BE
assessed with the new limits. Note that *standardized* residuals
are given for informational purposes only and *not* used for
exclusion of outliers.

Output for the reference dataset 01 (re-ordered for clarity):

```
CVwR : 46.96% (reference-scaling applicable)
swR : 0.44645
Expanded limits : 71.23% ... 140.40% [100exp(±0.760·swR)]
Assessment based on original CVwR 46.96%
────────────────────────────────────────
Confidence interval: 107.11% ... 124.89% pass
Point estimate : 115.66% pass
Mixed (CI & PE) : pass
╟────────┼─────────────────────┼───────■────────◊─────────■───────────────╢
Outlier fence : 2×IQR of studentized residuals.
Recalculation due to presence of 2 outliers (subj. 45|52)
─────────────────────────────────────────────────────────
CVwR (outl. excl.) : 32.16% (reference-scaling applicable)
swR (recalculated) : 0.31374
Expanded limits : 78.79% ... 126.93% [100exp(±0.760·swR)]
Assessment based on recalculated CVwR 32.16%
────────────────────────────────────────────
Confidence interval: pass
Point estimate : pass
Mixed (CI & PE) : pass
╟┼─────────────────────┼───────■────────◊─────────■─╢
```

Note that the PE and its
CI are not affected since the
entire data are used and therefore, these values not reported in the
second analysis (only the conclusion of the assessment).

The ‘ASCII line plot’ is given for informational purposes since its
resolution is only ~0.5%. The filled squares `■`

are the
lower and upper 90% confidence limits, the rhombus `◊`

the
point estimate, the vertical lines `│`

at 100% and the
PE restriction (80.00 – 125.00%),
and the double vertical lines `║`

the expanded limits. The
PE and
CI take presedence over other
symbols. In this case the upper limit of the
PE restriction is not visible.

Since both analyses arrive at the same conclusion, the study should be
acceptable according to the
Q&A document.

An assessment of outliers is *not* required for Brazil’s
ANVISA.

Health Canada^{23} accepts exclusion of outliers
(irrespective of the treatment) if

- the procedure is stated in the protocol and followed
*before*the results of the analysis are summarised into confidence intervals (*i.e.*, regardless of whether results meet the standard), - no more than 5% of the subjects may be considered to be outliers, unless there are 20 or fewer subjects, in which case only one subject may be removed,
- it is recommended that a simple outlier test such as a studentised residual being greater than 3 be used, and
- the subject in question should be identified as an outlier for all parameters, for either the test or reference product, upon which the bioequivalence decision is to be based.

Note that assessing of outliers of the test treatment is currently not
supported. In order to comply with conditions #1 and #3 use (here for
the internal dataset 01 with a fence narrower than the default):

```
B1 <- method.B(ola = TRUE, verbose = TRUE, print = FALSE,
details = FALSE, fence = 1.5, data = rds01,
option = 1)
```

You will get

```
Outlier analysis
(externally) studentized residuals
Limits (1.5×IQR whiskers): -1.631514, 1.553557
Outliers:
subject sequence stud.res
41 RTRT 1.877877
45 RTRT -6.656940
46 TRTR -1.717435
52 RTRT 3.453122
standarized (internally studentized) residuals
Limits (1.5×IQR whiskers): -1.612749, 1.53832
Outliers:
subject sequence stand.res
41 RTRT 1.845333
45 RTRT -5.246293
46 TRTR -1.694330
52 RTRT 3.214663
```

There are 73 subjects in the dataset with two treatments of the reference. Hence, while observing condition #2, we are allowed to delete 73×5%~3 outliers. Delete subject 45 and 52 with residuals outside ±3 from the dataset. Run the analysis on the reduced dataset as usual.

It must be mentioned that Health Canada’s approach might increase the
Type I Error.^{24}

The EMA’s approach of
reference-scaling for highly variable drugs / drug products is currently
recommended in other jurisdictions as well (*i.e.*, the
WHO; the
ASEAN,
Australia, Belarus, Brazil, Chile, the East African Community, Egypt,
the EEU, New Zealand, and
the Russian Federation). Health Canada accepts
ABEL
only for *AUC* with an upper cap of scaling at ~57.4% (maximum
expansion to 66.7 – 150.0%) and *might* require a true
mixed-effects model. Whether Method B is acceptable is unclear.

If degrees of freedom are approximated (Satterthwaite,
Kenward-Roger), the SAP
and statistical report should always specify which method will be / was
used (see above) in order to allow recalculation
in other software. This package uses the *expected* information
matrix.^{25}

The estimated *CV*_{wR} is always uncertain (the
degree of uncertainty depends on the *CV*_{wR} itself,
the design, and – to a minor degree – the sample size), which might lead
to an inflation of the type I error (*i.e.*, if
ABEL
is falsely applied although the *true* – but unknown –
*CV*_{wR} is lower than its estimate).^{26}^{,} ^{27}

Use the optional argument `method.A(..., adjust = TRUE)`

to
iteratively adjust \(\small{\alpha}\)
to control the type I error.^{28}

If you want to apply the most conservative approach of Molins *et
al.*^{29} (which corrects for
*CV*_{wR} 30% instead of the observed one), get the data
frame of results with

` x <- method.A(..., details = TRUE, print = FALSE)`

.

Adjust \(\small{\alpha}\) in package PowerTOST and
call `method.A()`

again:

` design <- "2x2x4" # your design`

` n <- as.integer(strsplit(x[[6]], "|", fixed = TRUE)[[1]]) # subjects / sequence`

` y <- PowerTOST::scABEL.ad(CV = 0.3, n = n, design = design, print = FALSE)`

` method.A(..., alpha = y$alpha.adj)`

In a pilot phase the
WHO accepted
reference-scaling for *AUC* (four period full replicate studies
were mandatory in order to assess the variability associated with each
product).^{30} It was not evident how this assessment
should have been done.

In PBE and
IBE the
*s*_{wT}/*s*_{wR} ratio was assessed and
‘similar’ variability concluded for a ratio within 0.667 – 1.500.
However, the power of comparing variabilities in a study designed to
compare means is low. This was one of the reasons why
PBE and
IBE were not implemented
in regulatory practice. An alternative approach is given by the
FDA where
variabilities are considered ‘comparable’ if the upper confidence limit
of *σ*_{wT}/*σ*_{wR} is ≤2.5.^{31}

However, in 2021 the requirement of comparing variabilities was lifted
by the WHO.^{32}

Results of all reference datasets agree with ones obtained in SAS (9.4), Phoenix WinNonlin (6.4–8.3.4), STATISTICA (13), SPSS (22.0), Stata (15.0), and JMP (10.0.2).

- Helmut Schütz (Author)
- Michael Tomashevskiy (Contributor)
- Detlew Labes (Contributor)

GPL-3 Helmut Schütz 2022-05-02

Program offered for Use without any Guarantees and Absolutely No Warranty. No Liability is accepted for any Loss and Risk to Public Health Resulting from Use of this R-Code.

Schütz H, Tomashevskiy M, Labes D, Shitova A, González-de la Parra M, Fuglsang A.

*Reference Datasets for Studies in a Replicate Design Intended for Average Bioequivalence with Expanding Limits.*AAPS J. 2020; 22(2): Article 44. doi:10.1208/s12248-020-0427-6.↩︎European Medicines Agency. EMA/582648/2016

*Annex I.*London. 21 September 2016.↩︎European Medicines Agency. Committee for Medicinal Products for Human Use.

*Guideline on the Investigation of Bioequivalence.*CPMP/EWP/QWP/1401/98 Rev. 1/Corr **. London. 20 January 2010.↩︎Health Canada. Guidance Document. Conduct and Analysis of Comparative Bioavailability Studies. Ottawa. 2018/06/08.↩︎

Executive Board of the Health Ministers’ Council for GCC States.

*The GCC Guidelines for Bioequivalence.*Version 3.0. May 2021.↩︎European Generic Medicines Association.

*Revised EMA Bioequivalence Guideline. Questions & Answers.*3^{rd}EGA Symposium on Bioequivalence. London. 1 June 2010.↩︎Satterthwaite FE.

*An Approximate Distribution of Estimates of Variance Components.*Biometr Bull. 1946; 2(6): 110–4. doi:10.2307/3002019.↩︎Kenward MG, Roger JH.

*Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood.*Biometrics. 1997; 53(3): 983–97. doi:10.2307/2533558.↩︎Kackar RN, Harville DA.

*Approximations for Standard Errors of Estimators of Fixed and Random Effects in Mixed Linear Models.*J Am Stat Assoc. 1984; 79(388): 853–62. doi:10.1080/01621459.1984.10477102.↩︎Medicines Control Council. Registration of Medicines.

*Biostudies.*Pretoria. June 2015.↩︎Balaam LN.

*A Two-Period Design with t*Biometrics. 1968; 24(1): 61–73. doi:10.2307/2528460.↩︎^{2}Experimental Units.Chow, SC, Shao J, Wang H.

*Individual bioequivalence testing under 2×3 designs.*Stat Med. 2002; 21(5): 629–48. doi:10.1002/sim.1056.↩︎Napierian logarithm (base

*e*). The decadic logarithm (base 10) is not supported).↩︎European Medicines Agency.

*Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party (PKWP).*EMA/618604/2008. London. June 2015 (and later revisons).↩︎Chow S-C, Liu J-p.

*Design and Analysis of Bioavailability and Bioequivalence Studies.*Boca Raton: CRC Press; 3^{rd}ed. 2009. Chapter 3.2.↩︎It contradicts the law of parsimony. Such a nesting is superfluous since in comparative bioavailability trials subjects are uniquely coded. If, say, subject

`1`

is allocated to sequence`TRTR`

there is not yet ‘another’ subject`1`

allocated to sequence`RTRT`

. This explains the many lines in SAS`PROC GML`

given with`.`

and in Phoenix WinNonlin as`not estimable`

.↩︎European Medicines Agency, Committee for Medicinal Products for Human Use.

*Guideline on the pharmacokinetic and clinical evaluation of modified release dosage forms.*EMA/CHMP/EWP/280/96. London. 20 November 2014.↩︎Of course, only if point estimates are identical.↩︎

The fences are given by the lowest value still within

*m*×IQR of the lower quartile, and the highest value still within*m*×IQR of the upper quartile, where IQR is the interquartile range (difference between the 3^{rd}and 1^{st}quartiles). Values outside fences are considered outliers. Decreasing the multiplier*m*to*e.g.*, 1.5 might result in many outliers, whereas increasing the multiplier in only a few.

Different methods exist to calculate quartiles (nine ‘types’ are available in R, where the default is`type = 7`

). R’s default is used by S, MATLAB, Octave, and Excel. Phoenix WinNonlin, Minitab, and SPSS use`type = 6`

, ScyPy uses`type = 4`

, whereas the default in SAS and Stata is`type = 2`

(though others are available as well).↩︎*Externally*studentized: \(\small{\widehat{\sigma}_{(i)}^2={1 \over n-m-1}\sum_{\begin{smallmatrix}j = 1\\j \ne i\end{smallmatrix}}^n \widehat{\varepsilon\,}_j^{\,2}}\)↩︎*Internally*studentized: \(\small{\widehat{\sigma}^2={1 \over n-m}\sum_{j=1}^n \widehat{\varepsilon\,}_j^{\,2}}\)↩︎Both are available in SAS and R, whereas only the latter in

*e.g.*, Phoenix WinNonlin. In general the former are slightly more restrictive. Which one will be used has to be stated in the SAP.↩︎Health Canada.

*Guidance Document: Conduct and Analysis of Comparative Bioavailability Studies. 2.3.5. Outlier Consideration.*Ottawa. 2018/06/08.↩︎Fuglsang A.

*Increased Patient’s Risk Associated with the Canadian Bioequivalence Guidance Due to Outlier Removal.*Ther Innov Reg Sci. 2022; 56: 168–72. doi:10.1007/s43441-021-00344-2.↩︎For Satterthwaite only available in Phoenix WinNonlin, SPSS, and Stata; as an option in SAS by

`SCORING=1`

. For Kenward-Roger default option`EIM`

in Stata. The*observed*information matrix is the only available in JMP and default`SCORING=0`

in SAS; option`OIM`

in Stata.↩︎Wonnemann M, Frömke C, Koch A.

*Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs.*Pharm Res. 2015; 32(1): 135–43. doi:10.1007/s11095-014-1450-z.↩︎Muñoz J, Alcaide D, Ocaña J.

*Consumer’s risk in the EMA and FDA regulatory approaches for bioequivalence in highly variable drugs.*Stat Med. 2016; 35(12): 1933–43. doi:10.1002/sim.6834.↩︎Labes D, Schütz H.

*Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control.*Pharm Res. 2016; 33(11): 2805–14. doi:10.1007/s11095-016-2006-1.↩︎Molins E, Cobo E, Ocaña J.

*Two-Stage Designs Versus European Scaled Average Designs in Bioequivalence Studies for Highly Variable Drugs: Which to Choose?*Stat Med. 2017: 36(30); 4777–88. doi:10.1002/sim.7452.↩︎World Health Organization.

*Application of reference-scaled criteria for AUC in bioequivalence studies conducted for submission to PQTm.*Geneva. 22 November 2018.↩︎U.S. Food and Drug Administration, Center for Drug Evaluation and Research.

*Draft Guidance for Industry. Bioequivalence Studies with Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA.*August 2021.↩︎World Health Organization.

*Application of reference-scaled criteria for AUC in bioequivalence studies conducted for submission to PQT/MED.*Geneva. 02 July 2021.↩︎