RNGforGPD

Hesen Li, Ruizhe Chen, Hai Nguyen, Yu-che Chung, Ran Gao, Hakan Demirtas

2018-04-09

Introduction

This vignette presents some ideas behind the generalized Poisson distribution and gives examples of using the functions in this package (RNGforGPD).

Functions and Comments

GenUniGpois

We choose among different data generation methods according to the parameter values, because some methods do not apply, or perform poorly, when the rate parameter (theta) or the dispersion parameter (lambda) of the generalized Poisson distribution falls within certain ranges. For example, the normal approximation method does not work well for theta < 10.

GenUniGpois(2, 0.9, 100, method = "Branching")
#> [1] "Specified theta is 2, empirical theta is 1.837232, specified lambda is 0.9, empirical lambda is 0.888518."
GenUniGpois(5, -0.4, 100, method = "Inversion")
#> [1] "Specified theta is 5, empirical theta is 4.692421, specified lambda is -0.4, empirical lambda is -0.364076."
GenUniGpois(12, 0.5, 100, method = "Normal-Approximation")
#> [1] "Specified theta is 12, empirical theta is 11.990429, specified lambda is 0.5, empirical lambda is 0.508388."
data = GenUniGpois(10, 0.4, 10, method = "Chop-Down", details = FALSE)
data = GenUniGpois(3, 0.9, 10000, method = "Build-Up", details = FALSE)
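
To give a feel for what one of these methods does, here is a minimal base R sketch of a branching-type generator. It is not the package's code: `rgpd_branching` is a hypothetical name, and the sketch assumes the standard construction in which a Poisson(theta) number of initial individuals each produce Poisson(lambda) offspring, with the total number of individuals taken as the generalized Poisson variate. This only makes sense for 0 <= lambda < 1.

```r
# Hypothetical helper (not part of RNGforGPD): branching-type generator.
# Assumes 0 <= lambda < 1 so that every family tree dies out eventually.
rgpd_branching <- function(n, theta, lambda) {
  stopifnot(theta > 0, lambda >= 0, lambda < 1)
  vapply(seq_len(n), function(i) {
    y <- rpois(1, theta)          # size of the initial generation
    x <- y                        # running total over all generations
    while (y > 0) {
      y <- rpois(1, lambda * y)   # offspring of the current generation
      x <- x + y
    }
    as.numeric(x)
  }, numeric(1))
}

set.seed(1)
draws <- rgpd_branching(20000, theta = 2, lambda = 0.3)

# Compare empirical moments with theta / (1 - lambda) and theta / (1 - lambda)^3
c(empirical.mean = mean(draws), theoretical.mean = 2 / (1 - 0.3))
c(empirical.var = var(draws), theoretical.var = 2 / (1 - 0.3)^3)
```

Because the loop only uses `rpois`, the sketch needs no probability table, but it cannot handle negative lambda, which is one reason other methods are needed.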

ComputeCorrGpois

From a practical perspective, correlation bounds among variables are typically narrower than the theoretical limits of −1 and 1, because the marginal distributions may impose tighter lower and upper bounds. A simple sorting technique can be used to obtain approximate correlation bounds, and this approach works regardless of the data type or distribution (Demirtas and Hedeker, 2011).

Adopting that sorting technique, we wrote ComputeCorrGpois, which computes the lower and upper correlation bounds between a pair of generalized Poisson variables. This function is also an integral part of ValidCorrGpois, which examines whether the values of a pairwise correlation matrix fall within the limits imposed by the marginal distributions.

ComputeCorrGpois(c(3,2,5,4),c(0.3,0.2,0.5,0.6))
#> ............
#> $min
#>            [,1]       [,2]       [,3]       [,4]
#> [1,]         NA -0.8452837 -0.8514745 -0.8061459
#> [2,] -0.8452837         NA -0.8362640 -0.7891064
#> [3,] -0.8514745 -0.8362640         NA -0.7965090
#> [4,] -0.8061459 -0.7891064 -0.7965090         NA
#> 
#> $max
#>           [,1]      [,2]      [,3]      [,4]
#> [1,]        NA 0.9835599 0.9934694 0.9872709
#> [2,] 0.9835599        NA 0.9867457 0.9818189
#> [3,] 0.9934694 0.9867457        NA 0.9947975
#> [4,] 0.9872709 0.9818189 0.9947975        NA
ComputeCorrGpois(c(4,5),c(-0.45,-0.11))
#> ..
#> $min
#>            [,1]       [,2]
#> [1,]         NA -0.9484967
#> [2,] -0.9484967         NA
#> 
#> $max
#>           [,1]      [,2]
#> [1,]        NA 0.9545338
#> [2,] 0.9545338        NA
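
The sorting technique itself can be sketched in a few lines of base R. This is an illustration rather than the package's implementation: `corr_bounds_by_sorting` is a hypothetical name, and the two samples below are stand-in count variables (Poisson and negative binomial), chosen only because the technique does not depend on the distribution.

```r
# Hypothetical helper (not part of RNGforGPD): approximate correlation
# bounds for two samples of equal length by sorting.
corr_bounds_by_sorting <- function(x, y) {
  stopifnot(length(x) == length(y))
  c(lower = cor(sort(x), sort(y, decreasing = TRUE)),  # opposite orderings
    upper = cor(sort(x), sort(y)))                     # matching orderings
}

set.seed(2)
x <- rpois(1e5, lambda = 3)            # stand-in marginal 1
y <- rnbinom(1e5, size = 2, mu = 4)    # stand-in marginal 2

bounds <- corr_bounds_by_sorting(x, y)
bounds
```

Pairing the sorted samples in the same order gives the largest attainable sample correlation for these marginals, and pairing them in opposite orders gives the smallest, so any other pairing of the same values falls between the two.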

ValidCorrGpois

This function checks the validity of a pairwise correlation matrix: positive definiteness, symmetry, correctness of the dimensions, and whether each correlation falls within its correlation bounds. It ensures that the supplied correlation matrix is valid for simulating multivariate generalized Poisson data using GenMVGpois.

ValidCorrGpois(matrix(c(1, 0.9, 0.9, 1), byrow = TRUE, nrow = 2), c(0.5, 0.5), c(0.1, 0.105))
#> ..
#> ..
#> ...
#> [1] TRUE
ValidCorrGpois(matrix(c(1, 0.9, 0.9, 1), byrow = TRUE, nrow = 2), c(3, 2), c(-0.3, -0.2))
#> Warning in vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 =
#> Xm2, : some quantities such as z, residuals, SEs may be inaccurate due to
#> convergence at a half-step
#> ..
#> ..
#> ...
#> [1] TRUE

QuantileGpois

This function computes quantiles of the generalized Poisson distribution. When lambda is negative the support is truncated at m, the largest integer for which theta + m * lambda > 0; we guarantee at least five classes in that case by forcing m >= 4.

QuantileGpois(0.98,1,-0.2,details = TRUE)
#> x = 0, P(X = x) = 0.3678794 ,P(X <= x) = 0.3678794 
#> x = 0, P(X = x) = 0.3678794 , P(X <= x) = 0.3678794 
#> x= 1 , P(X = x) = 0.449329 , P(X <= x) = 0.8172084 
#> x= 2 , P(X = x) = 0.1646435 , P(X <= x) = 0.9818519 
#> When lambda is negative, we need to account for truncation error
#> The adjusted CDF are: 0.3746792 0.8323133 1
#> [1] 2
QuantileGpois(0.80,2,0.025,details = FALSE)
#> [1] 3
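
The idea behind the quantile computation can be sketched in base R as follows. The names `gpd_pmf` and `gpd_quantile` are hypothetical, and the handling of negative lambda (renormalizing the probabilities over the whole truncated support) is an assumption made for this sketch; the package's own truncation adjustment may differ in detail.

```r
# Hypothetical helpers (not part of RNGforGPD).
# PMF: P(X = x) = theta * (theta + lambda*x)^(x-1) * exp(-theta - lambda*x) / x!
gpd_pmf <- function(x, theta, lambda) {
  exp(log(theta) + (x - 1) * log(theta + lambda * x) -
        theta - lambda * x - lgamma(x + 1))
}

gpd_quantile <- function(p, theta, lambda) {
  stopifnot(p > 0, p < 1, theta > 0, lambda < 1)
  if (lambda < 0) {
    # Truncated support: largest m with theta + m * lambda > 0
    m <- 0
    while (theta + lambda * (m + 1) > 0) m <- m + 1
  } else {
    # Assumed cutoff far in the right tail (mean + 20 standard deviations)
    m <- ceiling(theta / (1 - lambda) + 20 * sqrt(theta / (1 - lambda)^3)) + 20
  }
  support <- 0:m
  probs <- gpd_pmf(support, theta, lambda)
  cdf <- cumsum(probs) / sum(probs)   # renormalize to absorb truncation error
  support[which(cdf >= p)[1]]         # smallest x with P(X <= x) >= p
}

q_neg <- gpd_quantile(0.98, 1, -0.2)    # 2
q_pos <- gpd_quantile(0.80, 2, 0.025)   # 3
```

For these two parameter settings the sketch returns the same quantiles as the QuantileGpois calls shown above.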

CorrNNGpois

This function uses the method proposed by Yahav and Shmueli (2012). They found that the relationship between the desired correlation and the actual correlation for generalized Poisson variables can be approximated by an exponential function. Following their simple, empirically based approximation, we can correct the actual correlation to the desired correlation. Note that some desired correlations might not be feasible.

CorrNNGpois(c(0.1,10), c(0.1, 0.2),0.5)
#> [1] 0.7983861
CorrNNGpois(c(0.1,10), c(-0.01, -0.02),0.5)
#> [1] 0.8232106
CorrNNGpois(c(4,2.3), c(-0.32,-0.3),0.7)
#> [1] 0.7515057
CorrNNGpois(c(14,10), c(-0.8, -0.3),0.99)
#> The actual correlation, 1.019962, is not feasible!
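
As a rough illustration of how such an exponential correction can work, the base R sketch below assumes the relationship has the form r = a * exp(b * rho) + c, where rho is the correlation of the underlying normal variables and r is the resulting correlation of the counts, with a, b, and c fixed by requiring the curve to pass through (-1, mincor), (0, 0), and (1, maxcor). The function name `intermediate_corr` and the bound values are made up for the example; in practice the bounds would come from the marginal distributions.

```r
# Hypothetical helper (not part of RNGforGPD).
# Assumed model: r = a * exp(b * rho) + c, with c = -a, fitted through
# (-1, mincor), (0, 0) and (1, maxcor). Returns rho for a desired r.
intermediate_corr <- function(target, mincor, maxcor) {
  stopifnot(mincor < 0, maxcor > 0, maxcor + mincor != 0)
  a <- -maxcor * mincor / (maxcor + mincor)
  b <- log((maxcor + a) / a)
  rho <- log((target + a) / a) / b      # invert target = a * exp(b * rho) - a
  if (is.na(rho) || abs(rho) > 1) NA_real_ else rho
}

# Illustrative bounds only, not computed from any particular marginals
mincor <- -0.85
maxcor <- 0.98

rho_ok  <- intermediate_corr(0.50, mincor, maxcor)   # feasible target
rho_bad <- intermediate_corr(0.99, mincor, maxcor)   # above maxcor, so NA
c(rho_ok = rho_ok, rho_bad = rho_bad)
```

A target outside the interval between mincor and maxcor would require a normal correlation beyond 1 in absolute value, which is why the sketch returns NA in that case.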

CmatStarGpois

This function computes the intermediate correlation values for Poisson-Poisson and Poisson-Normal pairs, and constructs an overall intermediate correlation matrix. It takes the target correlation matrix and returns the intermediate matrix of pairwise correlations.

The output of this function is important because it is one of the input arguments of the main data generating function, GenMVGpois. The intermediate correlation matrix leads to the target correlation matrix once the inverse CDF transformation is applied to samples generated from a multivariate normal distribution.

lambda.vec = c(-0.2, 0.2, -0.3)
theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342)
N = diag(3)
N[lower.tri(N)] = M
TV = N + t(N)
diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec)
#> ......
#> ......
#> ......
#> .........
cstar
#>          [,1]      [,2]      [,3]
#> [1,] 1.000000 0.3946680 0.2942150
#> [2,] 0.394668 1.0000000 0.3594702
#> [3,] 0.294215 0.3594702 1.0000000
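
The role of the intermediate matrix can be sketched in base R: draw correlated standard normals, map them to uniforms with `pnorm`, and push each column through the inverse CDF of its marginal. Everything named here is an assumption for illustration: `gpd_inverse_cdf` is a hypothetical helper restricted to lambda >= 0, and `cstar_demo` is a made-up intermediate matrix rather than output from CmatStarGpois.

```r
# Hypothetical helper (not part of RNGforGPD): inverse CDF of a generalized
# Poisson distribution at probabilities u, via a lookup table. lambda >= 0 only.
gpd_inverse_cdf <- function(u, theta, lambda) {
  stopifnot(theta > 0, lambda >= 0, lambda < 1)
  m <- ceiling(theta / (1 - lambda) + 20 * sqrt(theta / (1 - lambda)^3)) + 20
  x <- 0:m
  pmf <- exp(log(theta) + (x - 1) * log(theta + lambda * x) -
               theta - lambda * x - lgamma(x + 1))
  cdf <- cumsum(pmf) / sum(pmf)
  pmin(findInterval(u, cdf), m)   # number of CDF steps at or below u
}

set.seed(3)
n <- 10000
theta_demo  <- c(1, 3, 4)
lambda_demo <- c(0.2, 0.2, 0.3)
# Made-up intermediate correlation matrix, for illustration only
cstar_demo <- matrix(c(1.00, 0.40, 0.30,
                       0.40, 1.00, 0.35,
                       0.30, 0.35, 1.00), nrow = 3, byrow = TRUE)

z <- matrix(rnorm(n * 3), ncol = 3) %*% chol(cstar_demo)  # correlated normals
u <- pnorm(z)                                             # uniform marginals
sim <- sapply(1:3, function(j) gpd_inverse_cdf(u[, j], theta_demo[j], lambda_demo[j]))

colMeans(sim)   # compare with theta_demo / (1 - lambda_demo)
cor(sim)        # compare with cstar_demo
```

The correlations of the transformed counts generally differ from those of the normal draws, which is why an intermediate matrix has to be computed from the target matrix before this step.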

GenMVGpois

GenMVGpois (the engine function) is the most important function in this package (RNGforGPD). It depends on all the other functions in this package and on three external packages: mvtnorm, corpcor, and VGAM. The major difference between generating univariate and multivariate generalized Poisson variables is that the multivariate case must account for pairwise correlations between variables. These correlations can be verified using ValidCorrGpois and corrected by CorrNNGpois.

sample.size = 10000; no.gpois = 3
lambda.vec = c(0.2, 0.2, 0.3); theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342); N = diag(3); N[lower.tri(N)] = M
TV = N + t(N); diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec)
#> ......
#> ......
#> ......
#> .........
data = GenMVGpois(sample.size, no.gpois, cstar, theta.vec, lambda.vec, details = FALSE)
apply(data, 2, mean) # empirical means
#> [1] 1.2533 3.7448 5.6884
theta.vec / (1 - lambda.vec) # theoretical means
#> [1] 1.250000 3.750000 5.714286
apply(data, 2, var) # empirical variances
#> [1]  2.053945  5.739047 11.370242
theta.vec / (1 - lambda.vec)^3 # theoretical variances
#> [1]  1.953125  5.859375 11.661808
cor(data) # empirical correlation matrix
#>           [,1]      [,2]      [,3]
#> [1,] 1.0000000 0.3549812 0.2584236
#> [2,] 0.3549812 1.0000000 0.3243298
#> [3,] 0.2584236 0.3243298 1.0000000
TV # specified correlation matrix
#>       [,1]  [,2]  [,3]
#> [1,] 1.000 0.352 0.265
#> [2,] 0.352 1.000 0.342
#> [3,] 0.265 0.342 1.000

Citations

Demirtas, H. (2017). On accurate and precise generation of generalized Poisson variates. , , 489-499.

Yahav, I. and Shmueli, G. (2012). On generating multivariate Poisson data in management science applications. , , 91-102.

Amatya, A. and Demirtas, H. (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. , , 3129-3139.

Amatya, A. and Demirtas, H. (2017). PoisNor: An R package for generation of multivariate data with Poisson and normal marginals. , , 2241-2253.

Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. , , 104-109.