The R package `DQAstats` provides core functionalities to perform data quality assessment (DQA) of electronic health record (EHR) data.
Currently implemented features are:
The tool provides one main function, `dqa()`, to create a comprehensive PDF document that presents all statistics and results of the data quality assessment.
Currently supported input data formats / databases:

- CSV files (via `data.table`)
- PostgreSQL (via `RPostgres`)
- JDBC-based databases (via `RJDBC`)

`DQAstats` can be installed directly from CRAN with:

```r
install.packages("DQAstats")
```
You can install the latest development version of `DQAstats` with:

```r
install.packages("remotes")
remotes::install_github("miracum/dqa-dqastats")
```
Note: A working LaTeX installation is a prerequisite for using this software (it can be set up, e.g., with the R package `tinytex`).
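Before running the DQA, it can help to check from within R whether a LaTeX engine is available at all. The following is a minimal sketch, assuming the PDF report is rendered with `pdflatex` (the R Markdown default; DQAstats may use a different engine). `ensure_latex()` is a hypothetical helper, not part of DQAstats; `tinytex::install_tinytex()` is the installer provided by the `tinytex` package.

```r
# Hypothetical helper (not part of DQAstats): report whether pdflatex is
# on the PATH and, only if install = TRUE, install TinyTeX via tinytex.
# Assumption: the report is rendered with pdflatex.
ensure_latex <- function(install = FALSE) {
  # Sys.which() returns "" when the executable cannot be found
  has_latex <- unname(nzchar(Sys.which("pdflatex")))
  if (!has_latex && install) {
    if (!requireNamespace("tinytex", quietly = TRUE)) {
      install.packages("tinytex")
    }
    tinytex::install_tinytex()
    # Re-check; a restart of the R session may be needed before the
    # newly installed binaries are found on the PATH
    has_latex <- unname(nzchar(Sys.which("pdflatex")))
  }
  has_latex
}

if (!ensure_latex()) {
  message("No pdflatex found; consider running ensure_latex(install = TRUE).")
}
```

Installation is opt-in here so that merely checking for LaTeX never triggers a large download.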
:bulb: If you want to run this in a Dockerized environment, you can use the `rocker/verse` image, which already has TeX installed.
The configuration of databases, be it CSV files or SQL-based databases, is done with environment variables, which can be set using the base R function `Sys.setenv()`. A detailed description of which environment variables need to be set for the specific databases can be found here.
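Because a missing or empty environment variable is easy to overlook, one option is to verify the configuration before calling `dqa()`. A minimal sketch: `check_env_vars()` is a hypothetical helper (not part of DQAstats), the variable name follows the CSV example further down, and the path is a placeholder.

```r
# Hypothetical helper (not part of DQAstats): stop early if any of the
# given environment variables is unset or empty.
check_env_vars <- function(vars) {
  values <- Sys.getenv(vars)  # returns "" for unset variables
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Set the path for a CSV-based source system (placeholder path).
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = "/path/to/csv_files")
check_env_vars("EXAMPLECSV_SOURCE_PATH")
```

The exact variable names each database type expects are listed in the configuration description referenced above.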
The following minimal working example shows how to apply the DQA tool to data. Example data and a corresponding metadata repository (MDR) are provided with the R package `DQAstats` (a working LaTeX installation is a prerequisite for using this software, e.g. via the R package `tinytex`; please refer to the `DQAstats` wiki for further installation instructions).
```r
# Load library DQAstats:
library(DQAstats)

# Set environment vars to demo files paths:
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = system.file("demo_data",
                                                  package = "DQAstats"))
Sys.setenv("EXAMPLECSV_TARGET_PATH" = system.file("demo_data",
                                                  package = "DQAstats"))

# Set path to utilities folder where to find the mdr and template files:
utils_path <- system.file("demo_data/utilities",
                          package = "DQAstats")

# Execute the DQA and generate a PDF report:
results <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = utils_path,
  mdr_filename = "mdr_example_data.csv",
  output_dir = "output/",
  parallel = FALSE
)

# The PDF report is stored at "./output/"
```
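Once `dqa()` has finished, the generated report can be located from R. A minimal sketch, assuming the report is written as a `.pdf` file somewhere below the `output_dir` used in the example ("output/"); the search is recursive in case the report ends up in a subfolder.

```r
# Look for PDF files below the output directory passed to dqa().
# list.files() returns character(0) if the directory does not exist.
reports <- list.files("output", pattern = "\\.pdf$",
                      full.names = TRUE, recursive = TRUE)
if (length(reports) == 0) {
  message("No PDF report found in 'output/' yet.")
} else {
  print(reports)
}
```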
You can test the package without installing anything except Docker. :bulb: For further details, see the wiki: https://github.com/miracum/dqa-dqastats/wiki/Deployment.
L.A. Kapsner, J.M. Mang, S. Mate, S.A. Seuchter, A. Vengadeswaran, F. Bathelt, N. Deppenwiese, D. Kadioglu, D. Kraska, and H.-U. Prokosch, Linking a Consortium-Wide Data Quality Assessment Tool with the MIRACUM Metadata Repository, Appl Clin Inform. 12 (2021) 826–835. doi:10.1055/s-0041-1733847.
```bibtex
@article{kapsner2021,
  title = {Linking a {{Consortium}}-{{Wide Data Quality Assessment Tool}} with the {{MIRACUM Metadata Repository}}},
  author = {Kapsner, Lorenz A. and Mang, Jonathan M. and Mate, Sebastian and Seuchter, Susanne A. and Vengadeswaran, Abishaa and Bathelt, Franziska and Deppenwiese, Noemi and Kadioglu, Dennis and Kraska, Detlef and Prokosch, Hans-Ulrich},
  year = {2021},
  month = aug,
  journal = {Applied Clinical Informatics},
  volume = {12},
  number = {04},
  pages = {826--835},
  issn = {1869-0327},
  doi = {10.1055/s-0041-1733847},
  language = {en}
}
```