VARMA: Predictability of the commodity prices


It is recommended to read the following vignettes first:


in this vignette, we will predict the commodity prices and discuss the prediction power of the VARMA modeling. Given a model, we design a pseudo-out-of-sample simulation with the following options:

measureOptions <- GetMeasureOptions(
  simFixSize = 24,
  typesIn = NULL, typesOut = c("direction", "scaledRmse", "crps"),
  horizons = as.integer(c(1))

This means, that there are 24 out-of-sample experiments. In each iteration, we predict the direction of next month’s change and summarize the results by calculating the rate of correct predictions. We also calculate the prediction error and CRPS in one horizon-ahead prediction and summarize the results by averaging.


We use World Bank (2022) data set, which is updated monthly based on the IMF’s Primary Commodity Price System:

pcpdata <- Data_Pcp()

While the data set starts from , variables have different starting periods. We focus on the nominal prices, however, we use the (lags of) US and China CPI as two potential predictors. The final data has variables (including CPI series). Since data is monthly, we define the following variable:

seasonsCount <- 12


Since the last two variables are the US and China CPIs, the number of targets is:

numTargets <- ncol(pcpdata$data) - 2

We use the logarithm of data:

y <- sapply(pcpdata$data, function(v) {
  r <- log(v)
  r[r <= 0] <- NA

and we define intercept and linear trend as two potential exogenous predictors:

n <- nrow(y)
x <- as.matrix(data.frame(Intercept =, n), Trend = c(1:n)))
newX <- as.matrix(data.frame(Intercept = rep(1, 24), Trend = c((n + 1):(n + 24))))

We are not able to load the data in this vignette, because it needs an external data set and this is not available in this package. But, a part of the data set is saved in the ldt package, and we load:

y = ldt::vig_data$pcp$y
x = ldt::vig_data$pcp$x
newX = ldt::vig_data$pcp$newX
numTargets <- 3

If you have downloaded the data set files, do not run this code. Note that we also reset the number of targets.

We search VARMA models with at most 2 number of variables:

ySizes <- as.integer(c(1:2))

and we create two groups of exogenous variables:

xGroups <- list(as.integer(c(1)), as.integer(c(1, 2)))

We also define the following maximum values for the parameters of the VARMA:

maxParams <- c(2, 1, 0, 2, 1, 0)

Note that there is no MA part in the selected models. Next, we ask ldt to exclude models with unusual predictions:

checkItems <- GetModelCheckItems(
  estimation = TRUE, prediction = TRUE,
  predictionBoundMultiplier = 4

Finally, we search for the best models:

pcpRes <- VarmaSearch(
  y = y, x = x, numTargets = numTargets,
  ySizes = ySizes, xGroups = xGroups, maxParams = maxParams,
  seasonsCount = seasonsCount, maxHorizon = 1, newX = newX,
  interpolate = TRUE,
  measureOptions = measureOptions,
  modelCheckItems = checkItems,
  searchItems = GetSearchItems(model = TRUE, bestK = 1),
  searchOptions = GetSearchOptions(printMsg = FALSE)

The following table shows, given a performance measure, which price index is better predicted:

Psuedo out-of-sample scores and ranks
direction scaledRmse crps
Commodities for Index: Al… 0.8333 (1) 0.0063 (1) 0.0188 (1)
Non-Fuel Price Index, 201… 0.7917 (2) 0.01 (2) 0.0288 (2)
All Commodity Price Index… 0.7917 (3) 0.0108 (3) 0.0312 (3)

Each value is the best score that is found for the variables in the row. The numbers in the parentheses indicate the rank of the values.


World Bank. 2022. “Primary Commodity Prices (Excel Database).”