Travis build status Coverage status

One issue with different functions available in R that do the same thing is that they can have different interfaces and arguments. For example, to fit a random forest classification model, we might have:

# From randomForest
rf_1 <- randomForest(x, y, mtry = 12, ntree = 2000, importance = TRUE)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 12, 
  num.trees = 2000, 
  importance = 'impurity'
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 12,
  num.trees = 2000
)

Note that the model syntax is very different and that the argument names (and formats) are also different. This is a pain if you go between implementations.

In this example,

The idea of parsnip is to:

Using the example above, the parsnip approach would be

rand_forest(mtry = 12, trees = 2000) %>%
  set_engine("ranger", importance = 'impurity') %>%
  fit(y ~ ., data = dat)

The engine can be easily changed and the mode can be determined when fit is called. To use Spark, the change is simple:

rand_forest(mtry = 12, trees = 2000) %>%
  set_engine("spark") %>%
  fit(y ~ ., data = dat)

To install it, use:

require(devtools)
install_github("tidymodels/parsnip")