Using Plausible Values with Multilevel Models Using R

Last updated on Mar 7, 2024 12 min read wemix, plausible values

Syntax to accompany the article:

Huang, F. (2024). Using plausible values when fitting multilevel models with large-scale assessment data using R. Large-scale Assessments in Education.

When fitting multilevel models using large scale assessments such as PISA or TIMSS, it is important to account for:

the use of weights at different levels and
the presence of multiple plausible values.

I am often asked how do you run this analysis in R. The following example shows how this can be done (read the article for more details). The use of plausible values in inherently a missing data problem– no one child in these LSAs takes the whole battery of tests (if any did, a child would need more than 10 hours to complete the assessment– which is not feasible). Respondents only complete some of the tests– not the whole thing. See Rutkowski et al (2010) for a guide on this. Note: there are other ways of doing this too (such as using the EdSurvey package [Bailey et al.]).

1. Load in packages and the `mixPV` functions

The mixPV functions are loaded in using the source function:

library(WeMix) #for mixed models with weights at different levels
library(modelsummary) #for output tables
library(broom) #needed for output tables
source("https://raw.githubusercontent.com/flh3/pubdata/main/mixPV/mixPV.R")

Note: this function is just a wrapper that uses the mix function in the WeMix package (Bailey et al.). The mix function does all the hard work when fitting the models!

2. Load in the dataset (`pisa2012`).

The dataset is available in the MLMusingR package (v0.3.2).

## Import/load the data (I like using rio::import)
# pisa2012 <- rio::import("https://github.com/flh3/pubdata/raw/main/mixPV/pisa2012.rds")
data(pisa2012, package = 'MLMusingR') #data is in the MLMusingR package

## Inspect the data
## install.packages("lme4", type = "source") #if there are issues with Matrix
head(pisa2012)

   pv1math  pv2math  pv3math  pv4math  pv5math  escs schoolid           st29q03
1 364.7063 362.3695 348.3487 348.3487 399.7585 -0.60  0000001             Agree
2 416.0383 422.2698 410.5858 422.2698 393.4491  0.89  0000001          Disagree
3 410.0405 371.0936 421.7246 386.6724 424.0614 -1.99  0000001    Strongly agree
4 289.8504 333.4709 315.5554 317.1133 352.1654 -0.35  0000001             Agree
5 383.4008 423.1267 414.5583 369.3800 379.5062  0.63  0000001          Disagree
6 398.2785 397.4996 384.2577 378.0262 489.4142  0.62  0000001 Strongly disagree
  st04q01 w_fstuwt w_fschwt    sc14q02      pwt1     noise1
1  Female 766.2523   83.671 Not at all  9.157920 -1.2070657
2    Male 808.8219   83.671 Not at all  9.666693  0.2774292
3    Male 845.6264   83.671 Not at all 10.106565  1.0844412
4    Male 808.8219   83.671 Not at all  9.666693 -2.3456977
5    Male 808.8219   83.671 Not at all  9.666693  0.4291247
6  Female 766.2523   83.671 Not at all  9.157920  0.5060559

summary(pisa2012)

    pv1math         pv2math         pv3math         pv4math     
 Min.   :225.4   Min.   :206.0   Min.   :224.7   Min.   :190.5  
 1st Qu.:420.7   1st Qu.:420.1   1st Qu.:421.3   1st Qu.:421.5  
 Median :479.2   Median :480.8   Median :480.1   Median :481.5  
 Mean   :484.1   Mean   :484.0   Mean   :484.5   Mean   :485.4  
 3rd Qu.:544.8   3rd Qu.:544.6   3rd Qu.:546.4   3rd Qu.:546.0  
 Max.   :822.0   Max.   :808.0   Max.   :783.7   Max.   :775.9  
                                                                
    pv5math           escs           schoolid                      st29q03    
 Min.   :188.9   Min.   :-3.8000   Length:3136        Strongly agree   : 387  
 1st Qu.:422.4   1st Qu.:-0.4825   Class :character   Agree            :1037  
 Median :480.4   Median : 0.3050   Mode  :character   Disagree         :1231  
 Mean   :485.2   Mean   : 0.1990                      Strongly disagree: 481  
 3rd Qu.:544.4   3rd Qu.: 0.9300                      N/A              :   0  
 Max.   :778.6   Max.   : 3.1200                      Invalid          :   0  
                                                      Missing          :   0  
    st04q01        w_fstuwt         w_fschwt                 sc14q02    
 Female :1570   Min.   : 134.5   Min.   :  21.97   Not at all    :2276  
 Male   :1566   1st Qu.: 560.5   1st Qu.:  49.06   Very little   : 525  
 N/A    :   0   Median : 664.8   Median :  81.40   To some extent: 272  
 Invalid:   0   Mean   : 713.2   Mean   : 143.93   A lot         :  63  
 Missing:   0   3rd Qu.: 780.7   3rd Qu.: 149.22   N/A           :   0  
                Max.   :2597.9   Max.   :1942.54   Invalid       :   0  
                                                   Missing       :   0  
      pwt1            noise1         
 Min.   : 1.229   Min.   :-3.396064  
 1st Qu.: 4.680   1st Qu.:-0.663848  
 Median : 8.944   Median : 0.022201  
 Mean   : 9.547   Mean   : 0.008473  
 3rd Qu.:13.617   3rd Qu.: 0.671990  
 Max.   :37.018   Max.   : 3.195901

dim(pisa2012)

[1] 3136   14

Read the article for a description of the predictors or use ?pisa2012 for the codebook. NOTE: do not use this dataset on its own for running any kind of substantive analysis.

3. The model can be fit using the `mix` function in the WeMix package.

However, the example below just uses one plausible value (pv). I just show this to illustrate how using all the pvs can have slightly different results compared to just using one pv. Take note: when running these analysis– do NOT just use one pv– that is incorrect (despite what you may have read in some other peer-reviewed manuscript) for a variety of reasons. Read the documentation of the LSA.

## just using one plausible value
nopv <- mix(pv1math ~
 st29q03 + sc14q02 + st04q01 + escs + (1|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)
summary(nopv)

Call:
mix(formula = pv1math ~ st29q03 + sc14q02 + st04q01 + escs + 
    (1 | schoolid), data = pisa2012, weights = c("w_fstuwt", 
    "w_fschwt"))

Variance terms:
 Level    Group        Name Variance Std. Error Std.Dev.
     2 schoolid (Intercept)     1414      327.5    37.60
     1 Residual                 5265      152.5    72.56
Groups:
 Level    Group n size mean wgt sum wgt
     2 schoolid    157    192.1   30161
     1      Obs   3136    713.2 2236615

Fixed Effects:
                         Estimate Std. Error t value
(Intercept)               486.804      7.778  62.588
st29q03Agree              -11.108      5.700  -1.949
st29q03Disagree           -19.255      5.456  -3.529
st29q03Strongly disagree  -41.542      6.864  -6.052
sc14q02Very little        -21.340     17.060  -1.251
sc14q02To some extent     -11.782     12.821  -0.919
sc14q02A lot              -26.913      7.657  -3.515
st04q01Male                 9.508      2.986   3.184
escs                       25.568      2.117  12.075

lnl= -12789991.99 
Intraclass Correlation= 0.2117

## can also do it this way (using conditional weights)
# nopv1 <- mix(pv1math ~
#  st29q03 + sc14q02 + st04q01 + escs + (1|schoolid), 
#  weights = c('pwt1', 'w_fschwt'), 
#  data = pisa2012, cWeights = TRUE)
# summary(nopv1)

4. Fit the model using several plausible values

The multiple plausible values are explicitly specified in the formula. A basic random intercepts model can be fit using:

## using 5 plausible values
m0 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
 st29q03 + sc14q02 + st04q01 + escs + (1|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)

Analyzing plausible value: pv1math 
Analyzing plausible value: pv2math 
Analyzing plausible value: pv3math 
Analyzing plausible value: pv4math 
Analyzing plausible value: pv5math

summary(m0)

Results of multilevel analyses with 5 plausible values.
Number of observations: 3136 

Estimates for random effects: 
                     estimate std.error statistic   df Pr(>t)    
schoolid.(Intercept)  1397.96    327.26      4.27 9027 <2e-16 ***
Residual              5295.23    158.43     33.42 2666 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates for fixed effects: 
                         estimate std.error statistic   df Pr(>t)    
(Intercept)                489.51      8.41     58.22  248 <2e-16 ***
st29q03Agree               -11.29      5.99     -1.88  527 0.0601 .  
st29q03Disagree            -19.53      6.05     -3.23  118 0.0016 ** 
st29q03Strongly disagree   -39.95      7.02     -5.69  335 <2e-16 ***
sc14q02Very little         -22.93     16.69     -1.37  759 0.1699    
sc14q02To some extent      -17.61     11.77     -1.50  204 0.1361    
sc14q02A lot               -30.05      7.75     -3.88  315 0.0001 ***
st04q01Male                  8.40      3.10      2.71  174 0.0075 ** 
escs                        25.88      2.11     12.29 2578 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# summary_all(m0) #shows all results with multiple PVs

A random slope model can also be fit (adding a random slope for escs):

m1 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
 st29q03 + sc14q02 + st04q01 + escs + (escs|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)

Analyzing plausible value: pv1math 
Analyzing plausible value: pv2math 
Analyzing plausible value: pv3math 
Analyzing plausible value: pv4math 
Analyzing plausible value: pv5math

## a boundary(singular) fit message suggests that the 
## variance components estimated are not different from zero
## the results however do show variation
summary(m1)

Results of multilevel analyses with 5 plausible values.
Number of observations: 3136 

Estimates for random effects: 
                     estimate std.error statistic     df Pr(>t)    
schoolid.(Intercept)  1380.68    325.42      4.24 2151.9 <2e-16 ***
schoolid.escs          324.39     63.89      5.08  350.5 <2e-16 ***
Residual              5021.04    106.04     47.35   81.6 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates for fixed effects: 
                         estimate std.error statistic   df Pr(>t)    
(Intercept)                486.44      8.27     58.84  245 <2e-16 ***
st29q03Agree               -10.54      5.89     -1.79  370 0.0746 .  
st29q03Disagree            -17.40      6.08     -2.86  108 0.0050 ** 
st29q03Strongly disagree   -36.88      7.05     -5.23  250 <2e-16 ***
sc14q02Very little         -23.18     15.78     -1.47  706 0.1422    
sc14q02To some extent      -13.75     11.88     -1.16  480 0.2478    
sc14q02A lot               -35.74      8.59     -4.16  258 <2e-16 ***
st04q01Male                  8.88      3.09      2.88  192 0.0045 ** 
escs                        27.12      2.44     11.11 1412 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#summary_all(m1)

Differences between the two models can be tested using a likelihood ratio test: is the full model an improvement over the reduced model (w/random slope vs random intercepts only?). See Grund et al. (2023) article in Psych Methods about different ways an LRT can be performed when using imputed data.

lrtPV(m1, m0) #full vs reduced

         F df1      df2        r     pv
1 98.27685   2 2.650988 441.3453 0.0033

Just testing this with random noise too: noise1 (random variable that is not associated with the outcome)– differences should not be statistically significant (i.e., adding the variable does not improve the model).

m2 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
 st29q03 + sc14q02 + st04q01 + escs + noise1 + (1|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)

Analyzing plausible value: pv1math 
Analyzing plausible value: pv2math 
Analyzing plausible value: pv3math 
Analyzing plausible value: pv4math 
Analyzing plausible value: pv5math

summary(m2) #ns

Results of multilevel analyses with 5 plausible values.
Number of observations: 3136 

Estimates for random effects: 
                     estimate std.error statistic   df Pr(>t)    
schoolid.(Intercept)  1394.17    325.01      4.29 8527 <2e-16 ***
Residual              5295.00    158.46     33.41 2716 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates for fixed effects: 
                         estimate std.error statistic   df Pr(>t)    
(Intercept)               489.496     8.397    58.294  247 <2e-16 ***
st29q03Agree              -11.303     5.998    -1.884  526 0.0601 .  
st29q03Disagree           -19.562     6.060    -3.228  118 0.0016 ** 
st29q03Strongly disagree  -39.961     7.022    -5.691  335 <2e-16 ***
sc14q02Very little        -22.883    16.710    -1.369  777 0.1713    
sc14q02To some extent     -17.504    11.784    -1.485  204 0.1390    
sc14q02A lot              -30.078     7.702    -3.905  306 0.0001 ***
st04q01Male                 8.418     3.101     2.715  170 0.0073 ** 
escs                       25.895     2.114    12.249 2627 <2e-16 ***
noise1                      0.590     1.397     0.422 1941 0.6729    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtPV(m2, m0) #ns too

         F df1      df2        r     pv
1 3.032867   1 4.233374 34.77259 0.1526

5. Showing results side by side

I like building up models over several phases. This just illustrates how you can do that. This output should still be fixed up (see article) and read the modelsummary documentation.

modelsummary(list("ri" = m0, 'rs' = m1),
             stars = TRUE)

	ri	rs
(Intercept)	489.507***	486.441***
	(8.408)	(8.268)
st29q03Agree	−11.286+	−10.536+
	(5.989)	(5.894)
st29q03Disagree	−19.534**	−17.404**
	(6.050)	(6.079)
st29q03Strongly disagree	−39.949***	−36.881***
	(7.025)	(7.050)
sc14q02Very little	−22.927	−23.178
	(16.690)	(15.777)
sc14q02To some extent	−17.605	−13.748
	(11.766)	(11.881)
sc14q02A lot	−30.050***	−35.744***
	(7.748)	(8.593)
st04q01Male	8.395**	8.881**
	(3.103)	(3.088)
escs	25.883***	27.123***
	(2.105)	(2.441)
schoolid.(Intercept)	1397.963***	1380.681***
	(327.262)	(325.420)
Residual	5295.229***	5021.039***
	(158.428)	(106.040)
schoolid.escs		324.388***
		(63.893)
Nobs	3136	3136
N.pv	5	5
AICbar	25592462.6532503	25504198.0201192
BICbar	25592529.2109875	25504276.6792632
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001

NOTE: After this initial post, I received a request on how the mix output can be used with the modelsummary package. To get that to work (at least for two level models), use this:

source("https://raw.githubusercontent.com/flh3/pubdata/main/mixPV/wemix_modelsummary.R")

NOTE: that mix function does not show the p-values (like lmer). The stars (in modelsummary) – if just using the mix function (not mivPV) are based on p-values using a z statistic (assuming infinite degrees of freedom) which is likely not correct for higher-level predictors (as is done when using Mplus) – so be warned about that.

6. Comparing results using the `EdSurvey` package

NOTE: This can take a while to run if downloading the datasets using the function.

The article also shows the results when using SAS proc glimmix.

library(EdSurvey)
# downloadPISA(years = 2012, root = "C:/", cache = TRUE)

pisa12 <- readPISA(path = "C:/EdSurveyData/PISA/2012/", 
                   countries = c("usa"))
dat <- EdSurvey::getData(data = pisa12, 
 varnames = c('st04q01', 'escs', 'sc14q02', 'st29q03',
 'math', 'w_fstuwt', 'w_fschwt', 'schoolid'),
 omittedLevels = TRUE, addAttributes = TRUE)

m1 <- mixed.sdf(formula = math ~ st29q03 + sc14q02 +
  st04q01 + escs + (1|schoolid), 
  weightVar=c("w_fstuwt", "w_fschwt"), 
  data = dat, weightTransformation = FALSE)
summary(m1)

# > summary(m1)
# Call:
#   mixed.sdf(formula = math ~ st29q03 + sc14q02 + st04q01 + escs + 
#               (1 | schoolid), data = dat, weightVars = c("w_fstuwt", "w_fschwt"), 
#             weightTransformation = FALSE)
# 
# Formula: math ~ st29q03 + sc14q02 + st04q01 + escs + (1 |
#                                                         schoolid)
# 
# Plausible Values: 5
# Number of Groups:
#   Level    Group n size mean wgt sum wgt
# 2 schoolid    157    192.1   30161
# 1      Obs   3136    713.2 2236615
# 
# Variance terms:
#   Level    Group        Name Variance Std. Error Std.Dev.
# 2 schoolid (Intercept)     1398      327.5    37.39
# 1 Residual                 5295      152.5    72.77
# 
# Fixed Effects:
#   Estimate Std. Error t value
# (Intercept)               489.507      8.408  58.218
# st29q03Agree              -11.286      5.989  -1.884
# st29q03Disagree           -19.534      6.050  -3.229
# st29q03Strongly disagree  -39.949      7.025  -5.687
# sc14q02Very little        -22.927     16.690  -1.374
# sc14q02To some extent     -17.605     11.766  -1.496
# sc14q02A lot              -30.050      7.748  -3.879
# st04q01Male                 8.395      3.103   2.706
# escs                       25.883      2.105  12.293
# 
# Intraclass Correlation= 0.209

m2 <- mixed.sdf(formula = math ~ st29q03 + sc14q02 +
   st04q01 + escs + (escs|schoolid), 
   weightVar = c("w_fstuwt", "w_fschwt"), 
   data = dat, weightTransformation = FALSE)
summary(m2)

# > summary(m2)
# Call:
#   mixed.sdf(formula = math ~ st29q03 + sc14q02 + st04q01 + escs + 
#   (escs | schoolid), data = dat, weightVars = c("w_fstuwt", 
#  "w_fschwt"), weightTransformation = FALSE)
# 
# Formula: math ~ st29q03 + sc14q02 + st04q01 + escs + (escs |
#                                                         schoolid)
# 
# Plausible Values: 5
# Number of Groups:
#   Level    Group n size mean wgt sum wgt
# 2 schoolid    157    192.1   30161
# 1      Obs   3136    713.2 2236615
# 
# Variance terms:
#   Level    Group        Name Variance Std. Error Std.Dev. Corr1
# 2 schoolid (Intercept)   1380.7     303.89    37.16      
# 2 schoolid        escs    324.4      63.53    18.01  0.22
# 1 Residual               5021.0     137.82    70.86      
# 
# Fixed Effects:
#   Estimate Std. Error t value
# (Intercept)               486.441      8.268  58.836
# st29q03Agree              -10.536      5.894  -1.788
# st29q03Disagree           -17.404      6.079  -2.863
# st29q03Strongly disagree  -36.881      7.050  -5.231
# sc14q02Very little        -23.178     15.777  -1.469
# sc14q02To some extent     -13.748     11.881  -1.157
# sc14q02A lot              -35.744      8.593  -4.160
# st04q01Male                 8.881      3.088   2.876
# escs                       27.123      2.441  11.113
# 
# Intraclass Correlation= 0.253

— END

regression

Using Plausible Values with Multilevel Models Using R

1. Load in packages and the mixPV functions

2. Load in the dataset (pisa2012).

3. The model can be fit using the mix function in the WeMix package.