Controls

The {fHMM} package supports a range of hidden Markov model specifications, including different data transformations, state-dependent distributions, and a hierarchical model structure. This vignette [1] outlines which specifications are possible and how to set them.

library("fHMM")

The set_controls function

The {fHMM} philosophy is to start the modeling process by setting all data, model, and estimation specifications. This is done by defining a named list of controls and passing it to the set_controls() function. The function checks the specifications and returns an fHMM_controls object which stores all specifications and thereby provides required information for other {fHMM} functionalities.

Example specifications

For demonstration, we list example specifications using data from the Deutscher Aktienindex DAX [2] (Janßen and Rudolph 1992):

download_data(symbol = "^GDAXI", file = "dax.csv")
#> Download successful.
#> * symbol: ^GDAXI
#> * from: 1987-12-30
#> * to: 2023-02-13
#> * path: C:\Users\Lennart\AppData\Local\Temp\Rtmp42gnFG\Rbuild186c5f42744c\fHMM\vignettes\dax.csv

HMMs for empirical data

The following lines of code specify a 3-state HMM with state-dependent t-distributions on the data in the file dax.csv. The dates are provided in the column called Date and the data in the column called Close. The logreturns = TRUE line transforms the index data to log-returns. The runs = 50 line sets the number of numerical optimization runs to 50.

controls <- list(
  states = 3,
  sdds   = "t",
  data   = list(file        = "dax.csv",
                date_column = "Date",
                data_column = "Close",
                logreturns  = TRUE),
  fit    = list(runs        = 50)
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: FALSE 
#> * data type: empirical 
#> * number of states: 3 
#> * sdds: t() 
#> * number of runs: 50
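As a rough illustration of the logreturns = TRUE transformation, the sketch below computes log-returns in base R as differences of consecutive log prices. The price vector is made up for illustration, and whether {fHMM} computes the transformation in exactly this way internally is an assumption.

```r
# Hypothetical closing prices (not taken from dax.csv).
close <- c(100, 102, 101, 105)

# Log-return at time t: log(close[t]) - log(close[t - 1]).
# The result has one element fewer than the price series.
log_returns <- diff(log(close))

# Log-returns are additive: their sum equals the log-return over the whole span.
total_log_return <- sum(log_returns)
```

The additivity shown in the last line is one reason log-returns are a common choice for financial time series.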

Simulated HMM data

The following specifies a 2-state HMM with state-dependent Gamma distributions, where the expected values for states 1 and 2 are fixed to 0.5 and 2, respectively. The model will be fitted to 500 data points (horizon = 500), which are simulated from this model specification.

controls <- list(
  states  = 2,
  sdds    = "gamma(mu = 0.5|2)",
  horizon = 500
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: FALSE 
#> * data type: simulated 
#> * number of states: 2 
#> * sdds: gamma(mu = 0.5|2) 
#> * number of runs: 100
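To make the simulated setting more concrete, the following base R sketch draws 500 observations from a 2-state HMM with Gamma state-dependent distributions whose means are 0.5 and 2. It does not use {fHMM}: the helper gamma_shape_rate(), the standard deviations, and the transition probability matrix are all assumptions chosen for illustration, and the package's internal simulation may differ.

```r
# Hypothetical helper: convert mean (mu) and standard deviation (sigma) of a
# Gamma distribution to the shape/rate parameters used by rgamma().
# Follows from mean = shape / rate and variance = shape / rate^2.
gamma_shape_rate <- function(mu, sigma) {
  stopifnot(mu > 0, sigma > 0)
  list(shape = mu^2 / sigma^2, rate = mu / sigma^2)
}

set.seed(1)
horizon <- 500
mu      <- c(0.5, 2)           # state-dependent means, as fixed in the controls
sigma   <- c(0.2, 0.5)         # assumed standard deviations
tpm     <- matrix(c(0.95, 0.05,
                    0.05, 0.95), nrow = 2, byrow = TRUE)  # assumed transitions

# Simulate the hidden Markov chain.
states <- integer(horizon)
states[1] <- sample(2, 1)
for (t in 2:horizon) {
  states[t] <- sample(2, 1, prob = tpm[states[t - 1], ])
}

# Draw each observation from the Gamma distribution of its state.
pars <- gamma_shape_rate(mu[states], sigma[states])
obs  <- rgamma(horizon, shape = pars$shape, rate = pars$rate)
```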

Hierarchical HMMs

Hierarchical HMMs are specified analogously, except that additional parameters become available (for example period, see below) and some parameters can now be specified separately for the coarse-scale and the fine-scale layer.

controls <- list(
  hierarchy = TRUE,
  horizon   = c(100, 10),
  sdds      = c("t(df = 1)", "t(df = Inf)"),
  period    = "m"
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: TRUE 
#> * data type: simulated 
#> * number of states: 2 2 
#> * sdds: t(df = 1) t(df = Inf) 
#> * number of runs: 100
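The two fixed degrees of freedom in this example select well-known special cases of the t-distribution: df = 1 gives the Cauchy distribution and df = Inf gives the normal distribution. The base R check below illustrates this for the standard densities only (location 0, scale 1); how {fHMM} applies mu and sigma on top of this is not shown here.

```r
x <- seq(-3, 3, by = 0.5)

# t-distribution with infinite degrees of freedom equals the standard normal.
same_as_normal <- isTRUE(all.equal(dt(x, df = Inf), dnorm(x)))

# t-distribution with one degree of freedom equals the standard Cauchy.
same_as_cauchy <- isTRUE(all.equal(dt(x, df = 1), dcauchy(x)))
```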

The help page of the set_controls() function provides an overview of all possible specifications.

?set_controls
set_controls R Documentation

Set and validate controls

Arguments

controls

A list of controls, see below.

Either none, all, or selected parameters can be specified. Unspecified parameters are set to default values (see the values in brackets below).

If hierarchy = TRUE, parameters marked with a (*) must be a vector of length 2, where the first entry corresponds to the coarse-scale and the second entry to the fine-scale layer.

  • hierarchy (FALSE): A logical, set to TRUE for a hierarchical HMM.

  • states (*) (2): An integer, the number of states of the underlying Markov chain.

  • sdds (*) (“t(df = Inf)”): A character, specifying the state-dependent distribution. One of “t” (the t-distribution), or “gamma” (the gamma distribution), or “lnorm” (the log-normal distribution). You can fix the parameters (mean mu, standard deviation sigma, degrees of freedom df) of these distributions via, e.g., “t(df = Inf)” or “gamma(mu = 0, sigma = 1)”. To fix different values of a parameter for different states, separate by "|", e.g. “t(mu = -1|1)”.

  • horizon (*) (100): A numeric, specifying the length of the time horizon. The first entry of horizon is ignored if data is specified.

  • period (“m”): Only relevant if hierarchy = TRUE and horizon[2] = NA. In this case, a character which specifies a flexible, periodic fine-scale time horizon and can be one of

    • “w” for a week,

    • “m” for a month,

    • “q” for a quarter,

    • “y” for a year.

  • data (NA): A list of controls specifying the data. If data = NA, data gets simulated (default). Otherwise:

    • file (*): Either:

      • A data.frame, which must have a column named date_column (with dates) and data_column (with financial data). If hierarchy = TRUE, this data.frame is used for both the coarse- and the fine-scale layer. To have different data sets for these layers, file can be a list of two data.frames.

      • A character, the path to a .csv-file with financial data, which must have a column named date_column (with dates) and data_column (with financial data).

    • date_column (*) (“Date”): A character, the name of the column in file with dates. Can be NA in which case consecutive integers are used as time points.

    • data_column (*) (“Close”): A character, the name of the column in file with financial data.

    • from (NA): A character of the format “YYYY-MM-DD”, setting a lower data limit. No lower limit if from = NA. Ignored if controls$data$date_column is NA.

    • to (NA): A character of the format “YYYY-MM-DD”, setting an upper data limit. No upper limit if to = NA. Ignored if controls$data$date_column is NA.

    • logreturns (*) (FALSE): A logical, if TRUE the data is transformed to log-returns.

    • merge (function(x) mean(x)): Only relevant if hierarchy = TRUE. In this case, a function with one argument x, which merges a numeric vector of fine-scale data x into one coarse-scale observation. For example,

      • merge = function(x) mean(x) defines the mean of the fine-scale data as the coarse-scale observation,

      • merge = function(x) mean(abs(x)) for the mean of the absolute values,

      • merge = function(x) sum(abs(x)) for the sum of the absolute values,

      • merge = function(x) (tail(x,1)-head(x,1))/head(x,1) for the relative change of the first to the last fine-scale observation.

  • fit: A list of controls specifying the model fitting:

    • runs (100): An integer, setting the number of randomly initialized optimization runs from which the best one is selected as the final model.

    • origin (FALSE): A logical, if TRUE the optimization is initialized at the true parameter values. Only for simulated data. If origin = TRUE, this sets runs = 1 and accept = 1:5.

    • accept (1:3): An integer (vector), specifying which optimization runs are accepted based on the output code of nlm.

    • gradtol (1e-6): A positive numeric value, passed on to nlm.

    • iterlim (200): A positive integer, passed on to nlm.

    • print.level (0): One of 0, 1, and 2 to control the verbosity of the optimization, passed on to nlm.

    • steptol (1e-6): A positive numeric value, passed on to nlm.
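The interplay of runs and accept can be sketched in base R with a toy objective that has one local and one global minimum. This is not the {fHMM} fitting routine: the objective function, the range of the random starting values, and the selection logic are assumptions for illustration. Only nlm() and its arguments gradtol, iterlim, steptol, and print.level are taken as given.

```r
set.seed(1)

# Toy objective with a local minimum near x = 0.96 and the global one near x = -1.04.
objective <- function(x) (x^2 - 1)^2 + 0.3 * x

runs   <- 10     # number of randomly initialized optimization runs
accept <- 1:3    # nlm output codes that count as acceptable

# One nlm() call per random starting value (assumed range).
starts <- runif(runs, min = -2, max = 2)
fits <- lapply(starts, function(start) {
  nlm(objective, p = start,
      gradtol = 1e-6, iterlim = 200, steptol = 1e-6, print.level = 0)
})

# Keep only runs whose output code is accepted, then pick the lowest minimum.
accepted <- Filter(function(fit) fit$code %in% accept, fits)
stopifnot(length(accepted) > 0)
minima <- vapply(accepted, function(fit) fit$minimum, numeric(1))
best   <- accepted[[which.min(minima)]]
```

Runs started in the wrong basin end in the local minimum, which is why selecting the best of several accepted runs is more reliable than a single run.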

x

An object of class fHMM_controls.

...

Currently not used.
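The merge functions listed under data can be tried out in base R. The sketch below applies them to a made-up fine-scale series and then aggregates by calendar month. Grouping with format() and tapply() is an assumption about what a monthly period amounts to, not the package's own implementation.

```r
# Hypothetical fine-scale observations spanning two calendar months.
dates <- as.Date(c("2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02"))
x     <- c(-0.02, 0.01, 0.03, -0.01)

# The four merge functions named in the help page.
merge_mean     <- function(x) mean(x)
merge_mean_abs <- function(x) mean(abs(x))
merge_sum_abs  <- function(x) sum(abs(x))
merge_rel      <- function(x) (tail(x, 1) - head(x, 1)) / head(x, 1)

# Each function maps a numeric vector to a single coarse-scale observation.
merged_all <- c(mean     = merge_mean(x),
                mean_abs = merge_mean_abs(x),
                sum_abs  = merge_sum_abs(x),
                rel      = merge_rel(x))

# Assumed monthly grouping: one coarse-scale observation per calendar month.
month_id <- format(dates, "%Y-%m")
coarse   <- tapply(x, month_id, merge_sum_abs)
```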

References

Janßen, B., and B. Rudolph. 1992. “Der Deutsche Aktienindex DAX.” Knapp Verlag.

  1. This vignette was built using R .4 with the {fHMM} 1.1.0 package.

  2. The download_data() function is explained in the vignette on data management.