`ddml` is an implementation of double/debiased machine
learning estimators as proposed by Chernozhukov et al. (2018). The key
feature of `ddml` is the straightforward estimation of
nuisance parameters using (short-)stacking (Wolpert, 1992), which allows
for multiple machine learners to increase robustness to the underlying
data generating process.

`ddml` is the sister R package to our Stata package, mirroring
its key features while also leveraging R to simplify estimation with
user-provided machine learners and/or sparse matrices. See also Ahrens et al. (2023) for
additional discussion of the supported causal models and the benefits of
(short-)stacking.

Install the latest development version from GitHub (requires devtools package):

```
if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("thomaswiemann/ddml", dependencies = TRUE)
```

Install the latest public release from CRAN:

`install.packages("ddml")`

To illustrate `ddml` on a simple example, consider the
included random subsample of 5,000 observations from the data of Angrist
& Evans (1998). The data contains information on the labor supply of
mothers, their children, as well as demographic data. See `?AE98` for details.

```
# Load ddml and set seed
library(ddml)
set.seed(75523)
# Construct variables from the included Angrist & Evans (1998) data
y = AE98[, "worked"]
D = AE98[, "morekids"]
Z = AE98[, "samesex"]
X = AE98[, c("age","agefst","black","hisp","othrace","educ")]
```

`ddml_late` estimates the local average treatment effect
(LATE) using double/debiased machine learning (see `?ddml_late`). Since the statistical properties of machine
learners depend heavily on the underlying (unknown!) structure of the
data, adaptive combination of multiple machine learners can increase
robustness. In the below snippet, `ddml_late` estimates the
LATE with short-stacking based on three base learners:

- linear regression (see `?ols`)
- lasso (see `?mdl_glmnet`)
- gradient boosting (see `?mdl_xgboost`)

```
# Estimate the local average treatment effect using short-stacking with base
# learners ols, lasso (glmnet), and xgboost.
late_fit_short <- ddml_late(y, D, Z, X,
                            learners = list(list(fun = ols),
                                            list(fun = mdl_glmnet),
                                            list(fun = mdl_xgboost,
                                                 args = list(nrounds = 100,
                                                             max_depth = 1))),
                            ensemble_type = 'nnls1',
                            shortstack = TRUE,
                            sample_folds = 10,
                            silent = TRUE)
summary(late_fit_short)
#> LATE estimation results:
#>
#>         Estimate Std. Error   t value  Pr(>|t|)
#> nnls1 -0.2105019   0.195529 -1.076576 0.2816698
```
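To give some intuition for what short-stacking does, here is a minimal base-R sketch on simulated data. It is not `ddml`'s internal code: the data, the two base learners (`linear` and `cubic`), and the closed-form weight calculation are all assumptions made for illustration. The idea shown is that each learner produces cross-fitted (out-of-fold) predictions, and a single set of non-negative weights summing to one is then fit on those predictions.

```r
# Illustrative sketch only; all names and data below are hypothetical.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- sin(2 * x) + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x = x)

# Two simple base learners: a linear fit and a cubic polynomial fit
learners <- list(
  linear = function(train) lm(y ~ x, data = train),
  cubic  = function(train) lm(y ~ poly(x, 3), data = train)
)

# Cross-fitting: each observation is predicted by models fit on the other folds
K <- 5
fold <- sample(rep(1:K, length.out = n))
oos <- matrix(NA_real_, n, length(learners),
              dimnames = list(NULL, names(learners)))
for (k in 1:K) {
  test <- fold == k
  for (j in seq_along(learners)) {
    fit <- learners[[j]](dat[!test, ])
    oos[test, j] <- predict(fit, newdata = dat[test, ])
  }
}

# Short-stacking step: one set of weights from the cross-fitted predictions.
# With two learners, non-negative weights that sum to one reduce to a
# one-dimensional least squares problem clipped to [0, 1].
d <- oos[, "linear"] - oos[, "cubic"]
w_linear <- sum((y - oos[, "cubic"]) * d) / sum(d^2)
w_linear <- min(max(w_linear, 0), 1)
weights <- c(linear = w_linear, cubic = 1 - w_linear)

# Compare mean squared errors of the base learners and the stacked predictor
y_hat <- drop(oos %*% weights)
mse <- c(colMeans((y - oos)^2), stacked = mean((y - y_hat)^2))
print(weights)
print(mse)
```

By construction, the stacked predictor's mean squared error on the cross-fitted predictions is no larger than that of either base learner, since putting all weight on one learner is always a feasible choice.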

Check out our articles to learn more:

- `vignette("ddml")` is a more detailed introduction to `ddml`
- `vignette("stacking")` discusses computational benefits of short-stacking
- `vignette("new_ml_wrapper")` shows how to write user-provided base learners
- `vignette("sparse")` illustrates support of sparse matrices (see `?Matrix`)
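As a rough idea of what a user-provided base learner could look like, the sketch below defines a hypothetical ridge regression learner in base R. It rests on an assumption not confirmed here: that a learner is a function of `y` and `X` returning a classed fitted object, paired with an S3 `predict` method that accepts `newdata`. The name `mdl_ridge` and its `lambda` argument are invented for this example; `vignette("new_ml_wrapper")` remains the reference for the actual requirements.

```r
# Hypothetical user-written base learner: ridge regression with a fixed penalty.
# ASSUMPTION: the learner interface is f(y, X, ...) plus predict(object, newdata).
mdl_ridge <- function(y, X, lambda = 1, ...) {
  X <- as.matrix(X)
  Xc <- cbind(1, X)                      # add an intercept column
  P <- diag(c(0, rep(lambda, ncol(X))))  # leave the intercept unpenalized
  coef <- solve(crossprod(Xc) + P, crossprod(Xc, y))
  structure(list(coef = coef), class = "mdl_ridge")
}

# S3 predict method returning a numeric vector of fitted values
predict.mdl_ridge <- function(object, newdata, ...) {
  drop(cbind(1, as.matrix(newdata)) %*% object$coef)
}

# Standalone check on simulated data (no ddml call involved)
set.seed(1)
X <- matrix(rnorm(200 * 3), 200, 3)
y <- drop(1 + X %*% c(2, -1, 0.5) + rnorm(200))
fit <- mdl_ridge(y, X, lambda = 0.1)
y_hat <- predict(fit, newdata = X)
```

If these interface assumptions hold, such a learner would presumably be supplied in the same form as the built-in ones, for example `list(fun = mdl_ridge, args = list(lambda = 0.1))` as an element of the `learners` list.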

For additional applied examples, see our case studies:

- `vignette("example_401k")` revisits the effect of 401k participation on retirement savings
- `vignette("example_BLP95")` considers flexible demand estimation with endogenous prices

`ddml` is built to easily (and quickly) estimate common
causal parameters with multiple machine learners. With its support for
short-stacking, sparse matrices, and easy-to-learn syntax, we hope
`ddml` is a useful complement to `DoubleML`,
the expansive R and Python package. `DoubleML` supports many advanced features such as multiway
clustering and stacking.

Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2023). “ddml: Double/debiased machine learning in Stata.” https://arxiv.org/abs/2301.09397

Angrist J, Evans W (1998). “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” American Economic Review, 88(3), 450-477.

Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C B, Newey W, Robins J (2018). “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal, 21(1), C1-C68.

Wolpert D H (1992). “Stacked generalization.” Neural Networks, 5(2), 241-259.