Pre-processing of clinical data for clinical data review report

Laure Cougnaud, Michela Pasetto

July 14, 2021

This vignette demonstrates the functionalities of the clinDataReview package for annotating, filtering, transforming and processing the data.

Utility functions to automate standard pre-processing steps of the data are available in the package.

Note that these functions are mainly useful in combination with the parameters specified in the ‘config’ files of the clinical data reports (see the dedicated reporting vignette).

For this vignette, we will use example data available in the clinUtils package.

library(clinDataReview)
library(pander)

1 Data format

The input dataset for the clinical data review should be a data.frame containing clinical data. Such data are typically imported from SAS data files (sas7bdat) or SAS transport files (xpt).
Multiple datasets can be imported at once with the clinUtils::loadDataADaMSDTM function.

The variable labels stored in the SAS datasets are also used for titles and captions.

For demonstration purposes, a few ADaM datasets are included in the clinUtils package as the dataADaMCDISCP01 object, together with the corresponding variable labels.

library(clinUtils)

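data(dataADaMCDISCP01)
# variable labels are stored in the 'labelVars' attribute of the data object
labelVars <- attr(dataADaMCDISCP01, "labelVars")

# laboratory (chemistry), subject-level and adverse event datasets
dataLB <- dataADaMCDISCP01$ADLBC
dataDM <- dataADaMCDISCP01$ADSL
dataAE <- dataADaMCDISCP01$ADAE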
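The variable labels are used to build readable titles, captions and messages. The base R sketch below illustrates such a lookup, assuming that the labels are stored as a named character vector (names are variable codes, values are labels). The `labelVarsExample` vector and the `lookupLabel` helper are hypothetical; the labels themselves are the ones reported by transformData later in this vignette. When no label is available, the variable code is returned instead.

```r
# Hypothetical subset of variable labels: a named character vector
# (names = variable codes, values = labels)
labelVarsExample <- c(
  USUBJID = "Unique Subject Identifier",
  VISIT = "Visit Name",
  PARAMCD = "Parameter Code",
  LBSTRESN = "Numeric Result/Finding in Standard Units",
  LBNRIND = "Reference Range Indicator"
)

# Hypothetical helper: return the label of each variable,
# falling back to the variable code when no label is available
lookupLabel <- function(vars, labels) {
  res <- unname(labels[vars])
  res[is.na(res)] <- vars[is.na(res)]
  res
}

lookupLabel(c("USUBJID", "ETHNIC"), labelVarsExample)
# [1] "Unique Subject Identifier" "ETHNIC"
```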

2 Annotate data

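The annotateData function adds variables (metadata) from another domain/dataset to a dataset. In the example below, the laboratory data are annotated with the ethnicity (ETHNIC) and treatment arm (ARM) of each subject, taken from the subject-level dataset (ADSL) and matched by subject identifier (USUBJID).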

dataLBAnnot <- annotateData(
    data = dataLB, 
    annotations = list(data = dataDM, vars = c("ETHNIC", "ARM")), 
    verbose = TRUE
)
## Data annotated with variable(s): ETHNIC ('ETHNIC'), ARM ('ARM') from the 'custom' dataset based on the variable(s):  USUBJID ('USUBJID').
pander(
    head(dataLBAnnot), 
    caption = paste("Laboratory parameters annotated with",
        "demographics information with the `annotateData` function"
    )
)
Laboratory parameters annotated with demographics information with the annotateData function (continued below)
STUDYID SUBJID USUBJID TRTP TRTPN
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
Table continues below
TRTA TRTAN TRTSDT TRTEDT AGE AGEGR1
Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65
Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65
Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65
Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65
Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65
Xanomeline High Dose 81 2013-08-23 2014-02-20 57 <65
Table continues below
AGEGR1N RACE RACEN SEX COMP24FL DSRAEFL SAFFL AVISIT
1 WHITE 1 M Y Y Baseline
1 WHITE 1 M Y Y Baseline
1 WHITE 1 M Y Y Baseline
1 WHITE 1 M Y Y Baseline
1 WHITE 1 M Y Y Baseline
1 WHITE 1 M Y Y Baseline
Table continues below
AVISITN ADY ADT VISIT VISITNUM
0 -9 2013-08-14 SCREENING 1 1
0 -9 2013-08-14 SCREENING 1 1
0 -9 2013-08-14 SCREENING 1 1
0 -9 2013-08-14 SCREENING 1 1
0 -9 2013-08-14 SCREENING 1 1
0 -9 2013-08-14 SCREENING 1 1
Table continues below
PARAM PARAMCD PARAMN PARCAT1 AVAL BASE CHG
Sodium (mmol/L) SODIUM 18 CHEM 139 139 NA
Potassium (mmol/L) K 19 CHEM 4 4 NA
Chloride (mmol/L) CL 20 CHEM 109 109 NA
Bilirubin (umol/L) BILI 21 CHEM 8.55 8.55 NA
Alkaline Phosphatase (U/L) ALP 22 CHEM 88 88 NA
Gamma Glutamyl Transferase (U/L) GGT 23 CHEM 43 43 NA
Table continues below
A1LO A1HI R2A1LO R2A1HI BR2A1LO BR2A1HI ANL01FL ALBTRVAL
132 147 1.053 0.9456 1.053 0.9456 81.5
3.4 5.4 1.176 0.7407 1.176 0.7407 4.1
94 112 1.16 0.9732 1.16 0.9732 62
3 21 2.85 0.4071 2.85 0.4071 22.95
31 110 2.839 0.8 2.839 0.8 77
10 61 4.3 0.7049 4.3 0.7049 48.5
Table continues below
ANRIND BNRIND ABLFL AENTMTFL LBSEQ LBNRIND LBSTRESN DATASET
N N Y 26 NORMAL 139 ADLBC
N N Y 19 NORMAL 4 ADLBC
N N Y 11 NORMAL 109 ADLBC
N N Y 6 NORMAL 8.55 ADLBC
N N Y 2 NORMAL 88 ADLBC
N N Y 15 NORMAL 43 ADLBC
ETHNIC ARM
NOT HISPANIC OR LATINO Xanomeline High Dose
NOT HISPANIC OR LATINO Xanomeline High Dose
NOT HISPANIC OR LATINO Xanomeline High Dose
NOT HISPANIC OR LATINO Xanomeline High Dose
NOT HISPANIC OR LATINO Xanomeline High Dose
NOT HISPANIC OR LATINO Xanomeline High Dose
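Conceptually, this annotation resembles a left join on the subject identifier. The base R sketch below reproduces the idea with merge on toy data; it is an illustration under this assumption, not the implementation of annotateData. The `lab` and `subj` datasets and their values are hypothetical.

```r
# Toy laboratory and subject-level datasets (hypothetical values)
lab <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  PARAMCD = c("ALT", "BILI", "ALT"),
  LBSTRESN = c(34, 8.55, 20)
)
subj <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  ETHNIC = c("NOT HISPANIC OR LATINO", "HISPANIC OR LATINO", "NOT HISPANIC OR LATINO"),
  ARM = c("Xanomeline High Dose", "Placebo", "Placebo")
)

# Left join: keep every laboratory record and add the ETHNIC and ARM
# of the matching subject (subjects absent from 'lab' are not added)
labAnnot <- merge(
  x = lab,
  y = subj[, c("USUBJID", "ETHNIC", "ARM")],
  by = "USUBJID",
  all.x = TRUE,
  sort = FALSE
)
labAnnot
```

Note that merge does not guarantee the original row order, so the records may need to be re-sorted afterwards.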

3 Filter data

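The filterData function filters the records of a dataset based on the values of a variable. With `rev = TRUE`, the selection is reversed: the records with the specified value are removed instead of retained. In the example below, the records of placebo subjects are excluded.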

dataLBAnnotTreatment <- filterData(
    data = dataLBAnnot, 
    filters = list(var = "ARM", value = "Placebo", rev = TRUE), 
    verbose = TRUE
)
## 354 records with ARM ('ARM') %in% 'Placebo' are filtered in data.
pander(
    unique(dataLBAnnotTreatment[, c("USUBJID", "ARM")]), 
    caption = paste("Subset of laboratory parameters filtered",
        "to exclude placebo patients"
    )
)
Subset of laboratory parameters filtered to exclude placebo patients
  USUBJID ARM
1 01-701-1148 Xanomeline High Dose
397 01-701-1192 Xanomeline Low Dose
793 01-701-1211 Xanomeline Low Dose
1363 01-718-1371 Xanomeline High Dose
1615 01-718-1427 Xanomeline High Dose
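The effect of the `var`, `value` and `rev` parameters can be sketched in base R as below. The `filterByValue` helper and the toy data are hypothetical and only mimic the behaviour shown above; they are not the package implementation.

```r
# Hypothetical helper: keep the records where 'var' is in 'value',
# or remove them instead when rev = TRUE
filterByValue <- function(data, var, value, rev = FALSE) {
  isMatching <- data[[var]] %in% value  # NA is treated as non-matching
  if (rev) isMatching <- !isMatching
  data[isMatching, , drop = FALSE]
}

# Toy annotated data (hypothetical values)
labAnnot <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S3"),
  ARM = c("Xanomeline High Dose", "Placebo",
    "Xanomeline Low Dose", "Xanomeline Low Dose")
)

labNoPlacebo <- filterByValue(labAnnot, var = "ARM", value = "Placebo", rev = TRUE)
unique(labNoPlacebo[, c("USUBJID", "ARM")])
```

Because `%in%` never returns NA, records with a missing ARM are removed by the direct filter and kept by the reversed one.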

4 Transform data

The transformData function converts data to a different format.

In the example below, only the alanine aminotransferase (ALT) and bilirubin (BILI) parameters are kept, and the laboratory data are converted from a long format, with one record per subject, visit and endpoint, to a wide format, with one record per subject and visit. The results of each endpoint are stored in separate columns (e.g. LBSTRESN.ALT and LBSTRESN.BILI).

eDishData <- transformData(
    data = subset(dataLB, PARAMCD %in% c("ALT", "BILI")),
    transformations = list(
        type = "pivot_wider",
        varsID = c("USUBJID", "VISIT"), 
        varsValue = c("LBSTRESN", "LBNRIND"),
        varPivot = "PARAMCD"
    ),
    verbose = TRUE,
    labelVars = labelVars
)
## Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : some constant variables
## (AVISIT,AVISITN,PARAM,PARAMN,AVAL,BASE,CHG,A1LO,A1HI,R2A1LO,R2A1HI,BR2A1LO,BR2A1HI,ANL01FL,ALBTRVAL,LBSEQ) are really varying
## Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : multiple rows match for PARAMCD=BILI: first taken
## Warning in reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : multiple rows match for PARAMCD=ALT: first taken
## Data is converted to a wide format with variables: 'Numeric Result/Finding in Standard Units', 'Reference Range Indicator' for different: 'Parameter Code' by 'Unique Subject Identifier', 'Visit Name' pivoted to different columns.
pander(head(eDishData))
Table continues below
  STUDYID SUBJID USUBJID TRTP TRTPN
4 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
40 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
76 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
112 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
148 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
184 CDISCPILOT01 1148 01-701-1148 Xanomeline High Dose 81
Table continues below
  TRTA TRTAN TRTSDT TRTEDT AGE
4 Xanomeline High Dose 81 2013-08-23 2014-02-20 57
40 Xanomeline High Dose 81 2013-08-23 2014-02-20 57
76 Xanomeline High Dose 81 2013-08-23 2014-02-20 57
112 Xanomeline High Dose 81 2013-08-23 2014-02-20 57
148 Xanomeline High Dose 81 2013-08-23 2014-02-20 57
184 Xanomeline High Dose 81 2013-08-23 2014-02-20 57
Table continues below
  AGEGR1 AGEGR1N RACE RACEN SEX COMP24FL DSRAEFL SAFFL
4 <65 1 WHITE 1 M Y Y
40 <65 1 WHITE 1 M Y Y
76 <65 1 WHITE 1 M Y Y
112 <65 1 WHITE 1 M Y Y
148 <65 1 WHITE 1 M Y Y
184 <65 1 WHITE 1 M Y Y
Table continues below
  AVISIT AVISITN ADY ADT VISIT VISITNUM
4 Baseline 0 -9 2013-08-14 SCREENING 1 1
40 Week 2 2 14 2013-09-05 WEEK 2 4
76 Week 4 4 28 2013-09-19 WEEK 4 5
112 Week 6 6 42 2013-10-03 WEEK 6 7
148 Week 8 8 57 2013-10-18 WEEK 8 8
184 Week 12 12 87 2013-11-17 WEEK 12 9
Table continues below
  PARAM PARAMN PARCAT1 AVAL BASE CHG A1LO
4 Bilirubin (umol/L) 21 CHEM 8.55 8.55 NA 3
40 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0 3
76 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0 3
112 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0 3
148 Bilirubin (umol/L) 21 CHEM 8.55 8.55 0 3
184 Bilirubin (umol/L) 21 CHEM 6.84 8.55 -1.71 3
Table continues below
  A1HI R2A1LO R2A1HI BR2A1LO BR2A1HI ANL01FL ALBTRVAL
4 21 2.85 0.4071 2.85 0.4071 22.95
40 21 2.85 0.4071 2.85 0.4071 22.95
76 21 2.85 0.4071 2.85 0.4071 22.95
112 21 2.85 0.4071 2.85 0.4071 22.95
148 21 2.85 0.4071 2.85 0.4071 22.95
184 21 2.28 0.3257 2.85 0.4071 Y 24.66
Table continues below
  ANRIND BNRIND ABLFL AENTMTFL LBSEQ DATASET LBSTRESN.BILI
4 N N Y 6 ADLBC 8.55
40 N N 43 ADLBC 8.55
76 N N 78 ADLBC 8.55
112 N N 108 ADLBC 8.55
148 N N 138 ADLBC 8.55
184 N N 168 ADLBC 6.84
  LBNRIND.BILI LBSTRESN.ALT LBNRIND.ALT
4 NORMAL 34 NORMAL
40 NORMAL 41 NORMAL
76 NORMAL 35 NORMAL
112 NORMAL 31 NORMAL
148 NORMAL 31 NORMAL
184 NORMAL 39 NORMAL
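The warnings above appear to come from base R's stats::reshape (reshapeWide is one of its internal helpers): variables not listed as identifiers or values are taken from the first record, and when several records match the same subject, visit and endpoint, only the first one is kept. The sketch below performs a similar long-to-wide conversion directly with stats::reshape on a hypothetical toy dataset; it is an illustration of the pivot, not necessarily how transformData is implemented.

```r
# Toy long-format data: one record per subject, visit and endpoint
labLong <- data.frame(
  USUBJID = rep(c("S1", "S2"), each = 4),
  VISIT = rep(rep(c("SCREENING 1", "WEEK 2"), each = 2), times = 2),
  PARAMCD = rep(c("ALT", "BILI"), times = 4),
  LBSTRESN = c(34, 8.55, 41, 8.55, 20, 5.13, 22, 6.84),
  LBNRIND = "NORMAL"
)

# Wide format: one record per subject and visit, with one set of
# value columns per endpoint (LBSTRESN.ALT, LBNRIND.ALT, ...)
labWide <- reshape(
  labLong,
  direction = "wide",
  idvar = c("USUBJID", "VISIT"),
  timevar = "PARAMCD",
  v.names = c("LBSTRESN", "LBNRIND")
)
labWide
```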

5 Process data

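The processData function executes several of the pre-processing steps described in the previous sections in a single call. The steps are applied in the order in which they are listed: here, the annotation comes first, so that the ARM variable is available for the subsequent filtering step. The result is identical to the one obtained by calling annotateData and filterData separately.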

dataLBAnnotTreatment2 <- processData(
    data = dataLB,
    processing = list(
        list(annotate = list(data = dataDM, vars = c("ETHNIC", "ARM"))),
        list(filter = list(var = "ARM", value = "Placebo", rev = TRUE))
    ),
    verbose = TRUE
)

identical(dataLBAnnotTreatment, dataLBAnnotTreatment2)
## [1] TRUE
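The idea of chaining processing steps can be sketched in base R with Reduce, where each step takes a data.frame and returns a data.frame. The `steps` list and the toy data are hypothetical and only illustrate why the order of the steps matters; this is not the processData implementation.

```r
# Toy datasets (hypothetical values)
subj <- data.frame(
  USUBJID = c("S1", "S2"),
  ARM = c("Xanomeline High Dose", "Placebo")
)
lab <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  PARAMCD = c("ALT", "BILI", "ALT")
)

# Hypothetical processing steps, executed in the order of the list
steps <- list(
  # 1. annotate with the treatment arm (left join on subject identifier)
  function(data) merge(data, subj, by = "USUBJID", all.x = TRUE, sort = FALSE),
  # 2. remove placebo records (requires ARM, added by step 1)
  function(data) data[!(data$ARM %in% "Placebo"), , drop = FALSE]
)

# Apply the steps sequentially: the output of a step is the input of the next
labProcessed <- Reduce(function(data, step) step(data), steps, lab)
labProcessed
```

With the two steps swapped, the filter would run before ARM exists and silently return no records, which illustrates why the annotation is listed first.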

6 Appendix

6.1 Session info

R version 4.1.0 (2021-05-18)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_GB.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_GB.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_GB.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_GB.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: clinUtils(v.0.0.1), clinDataReview(v.1.1.0), pander(v.0.6.4) and knitr(v.1.33)

loaded via a namespace (and not attached): tidyselect(v.1.1.1), xfun(v.0.24), bslib(v.0.2.5.1), purrr(v.0.3.4), haven(v.2.4.1), V8(v.3.4.2), colorspace(v.2.0-2), vctrs(v.0.3.8), generics(v.0.1.0), htmltools(v.0.5.1.1), viridisLite(v.0.4.0), yaml(v.2.2.1), utf8(v.1.2.1), plotly(v.4.9.4.1), rlang(v.0.4.11), jquerylib(v.0.1.4), pillar(v.1.6.1), glue(v.1.4.2), lifecycle(v.1.0.0), plyr(v.1.8.6), stringr(v.1.4.0), munsell(v.0.5.0), gtable(v.0.3.0), htmlwidgets(v.1.5.3), evaluate(v.0.14), forcats(v.0.5.1), crosstalk(v.1.1.1), curl(v.4.3.2), fansi(v.0.5.0), Rcpp(v.1.0.7), scales(v.1.1.1), DT(v.0.18), jsonvalidate(v.1.1.0), jsonlite(v.1.7.2), ggplot2(v.3.3.5), hms(v.1.1.0), digest(v.0.6.27), stringi(v.1.6.2), bookdown(v.0.22), dplyr(v.1.0.7), grid(v.4.1.0), tools(v.4.1.0), magrittr(v.2.0.1), sass(v.0.4.0), lazyeval(v.0.2.2), tibble(v.3.1.2), crayon(v.1.4.1), tidyr(v.1.1.3), pkgconfig(v.2.0.3), ellipsis(v.0.3.2), data.table(v.1.14.0), rmarkdown(v.2.9), httr(v.1.4.2), R6(v.2.5.0) and compiler(v.4.1.0)