# LikertMakeR

LikertMakeR synthesises and correlates Likert-scale and related rating-scale data. You decide the mean and standard deviation, and (optionally) the correlations among vectors, and the package will generate data with those same predefined properties.

The package generates a column of values that simulate the same properties as a rating scale. If multiple columns are generated, then you can use LikertMakeR to rearrange the values so that the new variables are correlated exactly in accord with a user-predefined correlation matrix.

## Purpose

The package should be useful for teaching in the Social Sciences, and for scholars who wish to “replicate” rating-scale data for further analysis and visualisation when only summary statistics have been reported.

I was prompted to write the functions in LikertMakeR after reviewing too many journal article submissions where authors presented questionnaire results with only means and standard deviations (often only the means), with no understanding of the real distributions. I hope this tool will help researchers, teachers, and reviewers to think more carefully about rating-scale distributions, and about the effects of variance, boundaries, and the number of items in a scale.

## Rating scale properties

A Likert scale is the mean, or sum, of several ordinal rating-scale items. The items are bipolar (usually “agree-disagree”) responses to propositions that are judged to be moderately-to-highly correlated and that capture various facets of a construct.

Rating scales, such as Likert scales, are neither continuous nor unbounded.

For example, a 5-point Likert scale constructed from, say, five items (questions) will have a summed range between 5 (all rated ‘1’) and 25 (all rated ‘5’), with all integers in between, and a mean range of ‘1’ to ‘5’ in intervals of 1/5 = 0.20. A 7-point Likert scale constructed from eight items will have a summed range between 8 (all rated ‘1’) and 56 (all rated ‘7’), with all integers in between, and a mean range of ‘1’ to ‘7’ in intervals of 1/8 = 0.125.
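The arithmetic above can be checked with a few lines of base R. This is only a sketch: `scale_values()` is a hypothetical helper written for this illustration and is not part of LikertMakeR.

```r
# Hypothetical helper (not part of LikertMakeR): enumerate every value that
# the sum and the mean of a multi-item rating scale can take.
scale_values <- function(lowerbound, upperbound, items) {
  sums <- seq(lowerbound * items, upperbound * items)  # all integer sums
  list(sums = sums, means = sums / items, step = 1 / items)
}

five_by_five <- scale_values(1, 5, items = 5)
range(five_by_five$sums)    # 5 25
five_by_five$step           # 0.2

seven_by_eight <- scale_values(1, 7, items = 8)
range(seven_by_eight$sums)  # 8 56
seven_by_eight$step         # 0.125
```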

Rating-scale boundaries define the minimum and maximum of any scale value. If the mean is close to one boundary, then data points will cluster near that boundary and the data will always be skewed.

# Using LikertMakeR


library(devtools)

install_github("WinzarH/LikertMakeR")

library(LikertMakeR)

## Generate synthetic rating-scale data

To synthesise a rating scale with LikertMakeR, the user must input the following parameters:

• n: sample size

• mean: desired mean

• sd: desired standard deviation

• lowerbound: desired lower bound

• upperbound: desired upper bound

• items: number of items making the scale - default = 1

• seed: optional seed for reproducibility

LikertMakeR offers two different functions for synthesising a rating scale: lfast() and lexact().

### lfast()

lfast() draws a random sample from a scaled Beta distribution. It is very fast but does not guarantee an exact mean and standard deviation. It is recommended for relatively large sample sizes.
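To make the idea of a "scaled Beta distribution" concrete, here is a minimal base-R sketch. It is an assumption about how such a sampler could work, not the source code of lfast(): `beta_rating_sketch()` is a hypothetical name, and fitting the Beta parameters by the method of moments is my choice for the illustration.

```r
# Illustrative sketch only: NOT the source code of lfast().
# beta_rating_sketch() is a hypothetical name. It matches a Beta distribution
# to the requested mean and sd by the method of moments, rescales the draws
# to the scale boundaries, and rounds them to the 1/items grid.
beta_rating_sketch <- function(n, mean, sd, lowerbound, upperbound, items = 1) {
  width <- upperbound - lowerbound
  m <- (mean - lowerbound) / width  # mean rescaled to [0, 1]
  v <- (sd / width)^2               # variance rescaled to [0, 1]
  k <- m * (1 - m) / v - 1          # must be positive for a valid Beta
  if (k <= 0) stop("sd is too large for this mean and these boundaries")
  draws <- rbeta(n, shape1 = m * k, shape2 = (1 - m) * k)
  round((lowerbound + draws * width) * items) / items
}

set.seed(42)
x <- beta_rating_sketch(
  n = 512, mean = 2, sd = 1,
  lowerbound = 1, upperbound = 5, items = 4
)
c(mean = mean(x), sd = sd(x))  # close to, but not exactly, 2 and 1
```

Because the values are random draws that are then rounded, the sample moments only approximate the requested ones, which is why a larger n helps.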

#### lfast() example

##### a four-item, five-point Likert scale

## a four-item, five-point Likert scale

x <- lfast(
  n = 512,
  mean = 2.0,
  sd = 1.0,
  lowerbound = 1,
  upperbound = 5,
  items = 4
)

##### an 11-point likelihood-of-purchase scale

## an 11-point likelihood-of-purchase scale

x <- lfast(256, 2, 2, 0, 10)

### lexact()

lexact() attempts to produce a vector with exact first and second moments. It uses the Differential Evolution algorithm in the DEoptim package to find appropriate values within the desired constraints. The DEoptim package is described in Mullen, Ardia, Gil, Windover, & Cline (2011) doi:10.18637/jss.v040.i06.

If feasible, lexact() should produce data with moments that are correct to two decimal places. Infeasible cases occur when the requested standard deviation is too large for the combination of mean, n-items, and scale boundaries.
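A rough way to see when a request is infeasible: for any values bounded by a lower bound a and an upper bound b with mean m, the population variance cannot exceed (b - m)(m - a) (the Bhatia-Davis inequality). The sketch below turns that into a quick check. `max_feasible_sd()` is a hypothetical helper, not a LikertMakeR function; it gives a necessary condition only and ignores the discreteness of the scale, so lexact() may still fail on a request that passes it.

```r
# Hypothetical helper (not part of LikertMakeR): upper limit on the sample
# standard deviation of n values bounded by [lowerbound, upperbound] whose
# mean is `mean`. Uses the Bhatia-Davis inequality,
#   population variance <= (upperbound - mean) * (mean - lowerbound),
# rescaled by n / (n - 1) because sd() divides by n - 1.
max_feasible_sd <- function(n, mean, lowerbound, upperbound) {
  sqrt((upperbound - mean) * (mean - lowerbound) * n / (n - 1))
}

# An 11-point scale (0 to 10) with mean 2 and requested sd 1.8
max_feasible_sd(64, 2, 0, 10) >= 1.8   # TRUE: the request passes this check

# A mean very close to the lower boundary leaves little room for spread
max_feasible_sd(64, 1.2, 1, 5) >= 1.0  # FALSE: an sd of 1 cannot be reached
```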

#### lexact() example #1

##### a four-item, five-point Likert scale

x <- lexact(
  n = 64,
  mean = 2.5,
  sd = 1.0,
  lowerbound = 1,
  upperbound = 5,
  items = 4
)
#>
#> ***** summary of DEoptim object *****
#> best member   :  6 12 18 14 11 12 9 12 5 11 19 6 12 8 6 4 14 8 17 11 12 9 19 12 12 6 12 6 6 13 4 12 12 7 7 10 13 9 5 12 4 4 9 8 6 9 14 8 5 5 8 17 8 11 18 12 7 12 12 17 6 10 6 11
#> best value    :  0
#> after         :  32 generations
#> fn evaluated  :  21120 times
#> *************************************

#### lexact() example #2

##### 11-point likelihood-of-purchase scale

x <- lexact(64, 2, 1.8, 0, 10)
#>
#> ***** summary of DEoptim object *****
#> best member   :  0 6 1 1 2 0 1 1 3 1 3 3 1 5 4 1 4 1 0 4 1 5 3 4 1 0 2 1 2 1 2 0 0 0 1 6 4 0 0 0 2 1 0 2 3 1 6 1 0 2 1 6 3 1 3 1 2 1 2 1 5 2 1 6
#> best value    :  0.0028
#> after         :  640 generations
#> fn evaluated  :  410240 times
#> *************************************

## Correlating vectors of synthetic rating scales

LikertMakeR offers another function, lcor(), which rearranges the values in the columns of a data-set so that they are correlated at a specified level. It does not change the values - it swaps their positions within each column so that univariate statistics do not change, but their correlations with other vectors do.

lcor() systematically selects pairs of values in a column and swaps their places, and checks to see if this swap improves the correlation matrix. If the revised data-frame produces a correlation matrix closer to the target correlation matrix, then the swap is retained. Otherwise, the values are returned to their original places. This process is iterated across each column.
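The swap-and-check loop described above can be sketched naively in base R. This is an illustration of the idea under my own assumptions, not the source code of lcor(): `swap_sketch()`, its `sweeps` argument, and the sum-of-squared-differences error measure are all hypothetical choices.

```r
# Naive illustration of the swap idea; NOT the source code of lcor().
# swap_sketch() and its arguments are hypothetical names.
swap_sketch <- function(data, target, sweeps = 2) {
  x <- as.matrix(data)
  err <- function(m) sum((cor(m) - target)^2)  # distance from the target
  best <- err(x)
  for (s in seq_len(sweeps)) {
    for (col in seq_len(ncol(x))) {
      for (i in seq_len(nrow(x) - 1)) {
        for (j in (i + 1):nrow(x)) {
          x[c(i, j), col] <- x[c(j, i), col]    # try swapping two values
          trial <- err(x)
          if (trial < best) {
            best <- trial                        # keep the improvement
          } else {
            x[c(i, j), col] <- x[c(j, i), col]  # undo the swap
          }
        }
      }
    }
  }
  as.data.frame(x)
}

set.seed(1)
start <- data.frame(
  a = sample(1:5, 30, replace = TRUE),
  b = sample(1:5, 30, replace = TRUE)
)
target <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)
result <- swap_sketch(start, target)

round(cor(result), 2)                        # off-diagonal moves towards 0.6
sapply(result, mean) == sapply(start, mean)  # column means are untouched
```

Since values are only ever moved within a column, each column keeps exactly the same set of values, so its mean and standard deviation cannot change.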

To create the desired correlated data, the user must supply the following two objects:

• data: a starter data set of rating-scales

• target: the target correlation matrix

### lcor() example

Let’s generate some data: three 5-point Likert scales, each with five items.

##### generate uncorrelated synthetic data

n <- 32

x1 <- lexact(n, 2.5, 0.75, 1, 5, 5)
#>
#> ***** summary of DEoptim object *****
#> best member   :  17 11 10 10 22 12 9 13 14 14 20 15 7 15 21 11 9 11 10 15 7 9 11 13 9 10 14 13 14 10 9 15
#> best value    :  0.00072
#> after         :  320 generations
#> fn evaluated  :  102720 times
#> *************************************
x2 <- lexact(n, 3.0, 1.50, 1, 5, 5)
#>
#> ***** summary of DEoptim object *****
#> best member   :  25 22 7 10 20 10 24 23 7 6 10 7 6 6 16 22 9 11 6 21 25 7 24 20 9 20 6 24 24 10 24 19
#> best value    :  0.00289
#> after         :  320 generations
#> fn evaluated  :  102720 times
#> *************************************
x3 <- lexact(n, 3.5, 1.00, 1, 5, 5)
#>
#> ***** summary of DEoptim object *****
#> best member   :  16 18 13 19 19 6 10 18 24 11 19 23 18 19 24 17 12 22 22 22 21 21 12 21 20 18 17 25 18 17 13 5
#> best value    :  0.10399
#> after         :  320 generations
#> fn evaluated  :  102720 times
#> *************************************

mydat3 <- cbind(x1, x2, x3) |> data.frame()

The first 10 observations from this data-frame are:

#>        x1  x2  x3
#> par1  3.4 5.0 3.2
#> par2  2.2 4.4 3.6
#> par3  2.0 1.4 2.6
#> par4  2.0 2.0 3.8
#> par5  4.4 4.0 3.8
#> par6  2.4 2.0 1.2
#> par7  1.8 4.8 2.0
#> par8  2.6 4.6 3.6
#> par9  2.8 1.4 4.8
#> par10 2.8 1.2 2.2

Mean values:

#>  x1  x2  x3
#> 2.5 3.0 3.5

Standard deviations:

#>    x1    x2    x3
#> 0.750 1.500 1.001

The means and standard deviations are close to the requested values. The synthetic variables have low correlations with one another:

#>      x1    x2    x3
#> x1 1.00  0.07  0.17
#> x2 0.07  1.00 -0.08
#> x3 0.17 -0.08  1.00

##### a target correlation matrix

## describe a target correlation matrix

tgt3 <- matrix(
  c(
    1.00, 0.80, 0.75,
    0.80, 1.00, 0.90,
    0.75, 0.90, 1.00
  ),
  nrow = 3
)

So now we have a data-frame with desired first and second moments, and a target correlation matrix.

##### applying the lcor() function

## apply lcor function

new3 <- lcor(mydat3, tgt3)

The result is a new data frame with correlations close to our desired correlation matrix:

#>      x1   x2   x3
#> x1 1.00 0.80 0.75
#> x2 0.80 1.00 0.85
#> x3 0.75 0.85 1.00
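One way to quantify "close" is the largest absolute difference between the achieved and target correlations. The sketch below re-enters the two matrices printed above so that it runs on its own. In a live session the same check would be `max(abs(cor(new3) - tgt3))`.

```r
# Target and achieved correlation matrices, copied from the example above
target <- matrix(c(
  1.00, 0.80, 0.75,
  0.80, 1.00, 0.90,
  0.75, 0.90, 1.00
), nrow = 3)
achieved <- matrix(c(
  1.00, 0.80, 0.75,
  0.80, 1.00, 0.85,
  0.75, 0.85, 1.00
), nrow = 3)

abs_diff <- abs(achieved - target)
max(abs_diff)                                     # 0.05, for the x2-x3 pair
which(abs_diff == max(abs_diff), arr.ind = TRUE)  # cells [2,3] and [3,2]
```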

## Alternative methods & packages

LikertMakeR is intended for synthesising & correlating rating-scale data with means, standard deviations, and correlations as close as possible to predefined parameters. If you don’t need your data to be close to exact, then other options may be faster or more flexible.

Different approaches include:

• sampling from a truncated normal distribution. Data are sampled from a normal distribution, then truncated to suit the rating-scale boundaries, and rounded to the discrete values that we see in rating scales. See Heinz (2021) for an excellent and short example.

• sampling with a predetermined probability distribution

• for example, the following code will generate a vector of values with approximately the given probabilities.

n <- 128
sample(1:5, n, replace = TRUE,
  prob = c(0.1, 0.2, 0.4, 0.2, 0.1)
)
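The truncated-normal approach in the first bullet above can also be sketched in base R. This is not the code from Heinz (2021): `truncated_normal_sketch()` is a hypothetical helper, and simple rejection sampling is my assumption about how to do the truncation without extra packages.

```r
# Hypothetical helper (not from any package): sample a rating scale from a
# normal distribution truncated at the scale boundaries. Draws that fall
# outside the boundaries are rejected and redrawn, then values are rounded.
truncated_normal_sketch <- function(n, mean, sd, lowerbound, upperbound) {
  out <- numeric(0)
  while (length(out) < n) {
    draws <- rnorm(n, mean, sd)
    out <- c(out, draws[draws >= lowerbound & draws <= upperbound])
  }
  round(out[seq_len(n)])
}

set.seed(123)
y <- truncated_normal_sketch(128, mean = 2, sd = 1, lowerbound = 1, upperbound = 5)
table(y)
c(mean = mean(y), sd = sd(y))  # truncation typically shifts both away from 2 and 1
```

With a mean this close to the lower boundary, discarding the lower tail tends to raise the sample mean and shrink the standard deviation. That is the trade-off for the speed and simplicity of this approach.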

## References

Grønneberg, S., Foldnes, N., & Marcoulides, K. M. (2022). covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas. Journal of Statistical Software, 102(3), 1–45. doi:10.18637/jss.v102.i03

Heinz, A. (2021). Simulating Correlated Likert-Scale Data In R: 3 Simple Steps (blog post). https://glaswasser.github.io/simulating-correlated-likert-scale-data/

Lalovic, M. (2021). responsesR: Simulate Likert scale item responses. https://github.com/markolalovic/responsesR

Mullen, K. M., Ardia, D., Gil, D. L., Windover, D., & Cline, J. (2011). DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26. doi:10.18637/jss.v040.i06

Touloumis, A. (2016). Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal, 8(2), 79–91. doi:10.32614/RJ-2016-034