sgmcmc: Getting Started

Jack Baker

The goal of sgmcmc is to make it as easy as possible for users to run stochastic gradient MCMC (SGMCMC) algorithms. SGMCMC are algorithms which enable MCMC to scale more easily to large datasets, as traditional MCMC can run very slowly as dataset sizes grow.

sgmcmc implements a lot of the popular stochastic gradient MCMC methods including SGLD, SGHMC and SGNHT. The package uses automatic differentiation, so all the differentiation needed for the methods is calculated automatically. Control variate methods can be used in order to improve the efficiency of the methods as proposed in the recent publication. This package is designed to be user friendly. In order to execute these algorithms, users only need to specify the data; the log-likelihood function and log-prior density; the parameter starting values; and a few tuning parameters.

To enable as much flexibility as possible, the data and parameter starting points fed to the functions in sgmcmc are specified as lists. This allows users to specify multiple parameters and datasets. It also allows the user to easily reference these quantities in the log-likelihood function and log-prior density, and to set different stepsizes for different paramters (essential when parameters are on different scales).

Specifying the Data

As we mentioned earlier, the datasets you wish to use are specified as a list. Suppose we have datasets we have already obtained or created X and Y, we would specify the whole dataset for our session as

dataset = list("X" = X, "Y" = Y)

You can specify as many datasets as you like, the most important thing is that your naming is consistent, so you use the same names in your log-likelihood and log-prior functions, and in your list of stepsizes.

The functions assume that each observation is located on the first axis of the object. Suppose Y is a 2d matrix, then the observation \(Y_i\) should be located at dataset$Y[i,]. Similarly if X is a 3d array, observation \(X_i\) should be located at dataset$X[i,,].

Specifying the Parameters

Again parameters are specified as a list, the names are what we will refer to them as in the logLik and logPrior functions introduced below, the values are the desired starting points. Suppose my model depends on two parameter vectors theta1 and theta2, and we want to start both from 0. If we assume both are length 3, this could be specified like this

params = list("theta1" = rep(0, 3), "theta2" = rep(0, 3))

Specifying the Log Likelihood and Log Prior

The log-likelihood function is specified as a function of the dataset and params, which have the same names as the lists you just specified. The only difference is that the objects inside the lists will have automatically been converted to TensorFlow objects for you. The dataset list will contain TensorFlow placeholders. The params list will contain TensorFlow variables. The logLik function should be a function that takes these lists as input and returns what the log-likelihood value should be given the current parameter and data values. It should do this using TensorFlow operations. More details about how TensorFlow works in R can be found here.

Specifying the logLik and logPrior functions regularly requires specifying specific distributions. TensorFlow already has a number of distributions implemented in the TensorFlow Probability package. All of the distributions implemented in TensorFlow Probability are located in tf$distributions, a list is given on the TensorFlow Probability website. More complex distributions can be specified by coding up the logLik and logPrior functions by hand, examples of this, as well as using various distribution functions, are given in the other tutorials.

The logPrior function is specified in exactly the same way except that the function should only take params as input, as a prior should be independent of the dataset. You do not have to specify the prior at all, and this leads to the algorithm using a uniform, uninformative prior.

Suppose we want to simulate from the mean of a multivariate Normal density with each component of the mean having a Student-T prior, we would specify this as follows

library(MASS)
# Simulate and declare dataset
dataset = list("X" = mvrnorm(10^4, c(0, 0), diag(2)))
# Simulate random starting point
params = list("theta" = rnorm(2))

# Declare log likelihood
logLik = function(params, dataset) {
    # Declare distribution, assuming Sigma known and constant
    SigmaDiag = c( 1, 1 )
    distn = tf$distributions$MultivariateNormalDiag(params$theta, SigmaDiag)
    # Return sum of log pdf
    return(tf$reduce_sum(distn$log_prob(dataset$X)))
}

# Declare log prior
logPrior = function(params) {
    # Declare prior distribution
    distn = tf$distributions$StudentT(3, 0, 1)
    # Apply log prior componentwise and return sum
    return(tf$reduce_sum(distn$log_prob(params$theta)))
}

Most of the time just specifying the constants in these functions, such as SigmaDiag as R objects will be fine. But occassionally there are issues when these constants get automatically converted to tf$float64 objects by TensorFlow rather than tf$float32. If you run into errors involving tf$float64 then force the constants to be input as tf$float32 by using SigmaDiag = tf$constant( c( 1, 1 ), dtype = tf$float32 ).

Specifying the Tuning Parameters

The only other input that needs to be specified to set any of the standard stochastic gradient MCMC methods running is the stepsize. This is normally defined as a list with an entry for each parameter name as follows

stepsize = list( "theta = 1e-5" )

A short hand is just to set stepsize = 1e-5 which just sets the stepsize of each parameter to be 1e-5. This shorthand can be used for any of the tuning constants.

So to get sgld working for the multivariate Normal log-likelihood function we specified, and an uninformative uniform prior, we can simply run

sgld( logLik, dataset, params, stepsize )

similarly for sghmc and sgnht.

To use the Student-t prior we specified, and set a minibatch size of 500, we’d use

sgld( logLik, dataset, params, stepsize, logPrior = logPrior, minibatchSize = 500 )

For more details of the other optional arguments see the main documentation in the reference.

All of the control variate methods sgldcv, sghmccv and sgnhtcv require an extra input optStepsize, which is the stepsize tuning constant for the initial optimization to find the MAP estimates. This argument is simply a small numeric value, such as 0.1, and may require some tuning, which we talk about below. To run sgldcv on the multivariate Normal model, with specified prior we’d use

optStepsize = 0.1
sgldcv( logLik, dataset, params, stepsize, optStepsize, logPrior = logPrior, minibatchSize = 500 )

similar for sghmccv and sgnhtcv.

Most of the time, parameters need tuning, we suggest doing this using cross validation. You can roughly check the algorithm is converging by inspection by checking that the Log-posterior estimate output by the algorithm settles down eventually (it should decrease at first unless the chain converges very quickly).

Next Steps

We suggest for more details you read the worked examples in the Articles section, these cover a variety of models (which will be expanded as the package matures): - Multivariate Gaussian - Gaussian Mixture - Logistic Regression

The SGMCMC algorithms can also be run step by step, which allows custom storage of parameters using test functions, or sequential estimates. Useful if your chain is too large to fit into memory! This requires a better knowledge of TensorFlow. An example of this is given in the neural network vignette.

Full details of the API can be found here.