Using Plausible Values with Multilevel Models Using R

Syntax to accompany the article:

Huang, F. (2024). Using plausible values when fitting multilevel models with large-scale assessment data using R. Large-scale Assessments in Education.

When fitting multilevel models using large scale assessments such as PISA or TIMSS, it is important to account for:

the use of weights at different levels and
the presence of multiple plausible values.

I am often asked how do you run this analysis in R. The following example shows how this can be done (read the article for more details). The use of plausible values in inherently a missing data problem– no one child in these LSAs takes the whole battery of tests (if any did, a child would need more than 10 hours to complete the assessment– which is not feasible). Respondents only complete some of the tests– not the whole thing. See Rutkowski et al (2010) for a guide on this. Note: there are other ways of doing this too (such as using the EdSurvey package [Bailey et al.]).

1. Load in packages and the `mixPV` functions

The mixPV functions are loaded in using the source function:

library(WeMix) #for mixed models with weights at different levels
library(modelsummary) #for output tables
library(broom) #needed for output tables
source("https://raw.githubusercontent.com/flh3/pubdata/main/mixPV/mixPV.R")

Note: this function is just a wrapper that uses the mix function in the WeMix package (Bailey et al.). The mix function does all the hard work when fitting the models!

2. Load in the dataset (`pisa2012`).

The dataset is available in the MLMusingR package (v0.3.2).

## Import/load the data (I like using rio::import)
# pisa2012 <- rio::import("https://github.com/flh3/pubdata/raw/main/mixPV/pisa2012.rds")
data(pisa2012, package = 'MLMusingR') #data is in the MLMusingR package

## Inspect the data
## install.packages("lme4", type = "source") #if there are issues with Matrix
head(pisa2012)

  pv1math pv2math pv3math pv4math pv5math  escs schoolid           st29q03
1     365     362     348     348     400 -0.60  0000001             Agree
2     416     422     411     422     393  0.89  0000001          Disagree
3     410     371     422     387     424 -1.99  0000001    Strongly agree
4     290     333     316     317     352 -0.35  0000001             Agree
5     383     423     415     369     380  0.63  0000001          Disagree
6     398     397     384     378     489  0.62  0000001 Strongly disagree
  st04q01 w_fstuwt w_fschwt    sc14q02  pwt1 noise1
1  Female      766     83.7 Not at all  9.16 -1.207
2    Male      809     83.7 Not at all  9.67  0.277
3    Male      846     83.7 Not at all 10.11  1.084
4    Male      809     83.7 Not at all  9.67 -2.346
5    Male      809     83.7 Not at all  9.67  0.429
6  Female      766     83.7 Not at all  9.16  0.506

summary(pisa2012)

    pv1math       pv2math       pv3math       pv4math       pv5math   
 Min.   :225   Min.   :206   Min.   :225   Min.   :190   Min.   :189  
 1st Qu.:421   1st Qu.:420   1st Qu.:421   1st Qu.:422   1st Qu.:422  
 Median :479   Median :481   Median :480   Median :482   Median :480  
 Mean   :484   Mean   :484   Mean   :485   Mean   :485   Mean   :485  
 3rd Qu.:545   3rd Qu.:545   3rd Qu.:546   3rd Qu.:546   3rd Qu.:544  
 Max.   :822   Max.   :808   Max.   :784   Max.   :776   Max.   :779  
                                                                      
      escs         schoolid                      st29q03        st04q01    
 Min.   :-3.80   Length:3136        Strongly agree   : 387   Female :1570  
 1st Qu.:-0.48   Class :character   Agree            :1037   Male   :1566  
 Median : 0.30   Mode  :character   Disagree         :1231   N/A    :   0  
 Mean   : 0.20                      Strongly disagree: 481   Invalid:   0  
 3rd Qu.: 0.93                      N/A              :   0   Missing:   0  
 Max.   : 3.12                      Invalid          :   0                 
                                    Missing          :   0                 
    w_fstuwt       w_fschwt              sc14q02          pwt1     
 Min.   : 134   Min.   :  22   Not at all    :2276   Min.   : 1.2  
 1st Qu.: 560   1st Qu.:  49   Very little   : 525   1st Qu.: 4.7  
 Median : 665   Median :  81   To some extent: 272   Median : 8.9  
 Mean   : 713   Mean   : 144   A lot         :  63   Mean   : 9.5  
 3rd Qu.: 781   3rd Qu.: 149   N/A           :   0   3rd Qu.:13.6  
 Max.   :2598   Max.   :1943   Invalid       :   0   Max.   :37.0  
                               Missing       :   0                 
     noise1     
 Min.   :-3.40  
 1st Qu.:-0.66  
 Median : 0.02  
 Mean   : 0.01  
 3rd Qu.: 0.67  
 Max.   : 3.20

dim(pisa2012)

[1] 3136   14

Read the article for a description of the predictors or use ?pisa2012 for the codebook. NOTE: do not use this dataset on its own for running any kind of substantive analysis.

3. The model can be fit using the `mix` function in the WeMix package.

However, the example below just uses one plausible value (pv). I just show this to illustrate how using all the pvs can have slightly different results compared to just using one pv. Take note: when running these analysis– do NOT just use one pv– that is incorrect (despite what you may have read in some other peer-reviewed manuscript) for a variety of reasons. Read the documentation of the LSA.

## just using one plausible value
nopv <- mix(pv1math ~
 st29q03 + sc14q02 + st04q01 + escs + (1|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)
summary(nopv)

Call:
mix(formula = pv1math ~ st29q03 + sc14q02 + st04q01 + escs + 
    (1 | schoolid), data = pisa2012, weights = c("w_fstuwt", 
    "w_fschwt"))

Variance terms:
 Level    Group        Name Variance Std. Error Std.Dev.
     2 schoolid (Intercept)     1414        328     37.6
     1 Residual                 5265        152     72.6
Groups:
 Level    Group n size mean wgt sum wgt
     2 schoolid    157      192   30161
     1      Obs   3136      713 2236615

Fixed Effects:
                         Estimate Std. Error t value
(Intercept)                486.80       7.78   62.59
st29q03Agree               -11.11       5.70   -1.95
st29q03Disagree            -19.26       5.46   -3.53
st29q03Strongly disagree   -41.54       6.86   -6.05
sc14q02Very little         -21.34      17.06   -1.25
sc14q02To some extent      -11.78      12.82   -0.92
sc14q02A lot               -26.91       7.66   -3.51
st04q01Male                  9.51       2.99    3.18
escs                        25.57       2.12   12.07

lnl= -12789991.99 
Intraclass Correlation= 0.212

## can also do it this way (using conditional weights)
# nopv1 <- mix(pv1math ~
#  st29q03 + sc14q02 + st04q01 + escs + (1|schoolid), 
#  weights = c('pwt1', 'w_fschwt'), 
#  data = pisa2012, cWeights = TRUE)
# summary(nopv1)

4. Fit the model using several plausible values

The multiple plausible values are explicitly specified in the formula. A basic random intercepts model can be fit using:

## using 5 plausible values
m0 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
 st29q03 + sc14q02 + st04q01 + escs + (1|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)

Analyzing plausible value: pv1math 
Analyzing plausible value: pv2math 
Analyzing plausible value: pv3math 
Analyzing plausible value: pv4math 
Analyzing plausible value: pv5math

summary(m0)

Results of multilevel analyses with 5 plausible values.
Number of observations: 3136 

Estimates for random effects: 
                     estimate std.error statistic   df Pr(>t)    
schoolid.(Intercept)  1397.96    327.26      4.27 9027 <2e-16 ***
Residual              5295.23    158.43     33.42 2666 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates for fixed effects: 
                         estimate std.error statistic   df Pr(>t)    
(Intercept)                489.51      8.41     58.22  248 <2e-16 ***
st29q03Agree               -11.29      5.99     -1.88  527 0.0601 .  
st29q03Disagree            -19.53      6.05     -3.23  118 0.0016 ** 
st29q03Strongly disagree   -39.95      7.02     -5.69  335 <2e-16 ***
sc14q02Very little         -22.93     16.69     -1.37  759 0.1699    
sc14q02To some extent      -17.61     11.77     -1.50  204 0.1361    
sc14q02A lot               -30.05      7.75     -3.88  315 0.0001 ***
st04q01Male                  8.40      3.10      2.71  174 0.0075 ** 
escs                        25.88      2.11     12.29 2578 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# summary_all(m0) #shows all results with multiple PVs

A random slope model can also be fit (adding a random slope for escs):

m1 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
 st29q03 + sc14q02 + st04q01 + escs + (escs|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)

Analyzing plausible value: pv1math 
Analyzing plausible value: pv2math 
Analyzing plausible value: pv3math 
Analyzing plausible value: pv4math 
Analyzing plausible value: pv5math

## a boundary(singular) fit message suggests that the 
## variance components estimated are not different from zero
## the results however do show variation
summary(m1)

Results of multilevel analyses with 5 plausible values.
Number of observations: 3136 

Estimates for random effects: 
                     estimate std.error statistic     df Pr(>t)    
schoolid.(Intercept)  1380.68    325.42      4.24 2151.8 <2e-16 ***
schoolid.escs          324.39     63.89      5.08  350.5 <2e-16 ***
Residual              5021.04    106.04     47.35   81.6 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates for fixed effects: 
                         estimate std.error statistic   df Pr(>t)    
(Intercept)                486.44      8.27     58.84  245 <2e-16 ***
st29q03Agree               -10.54      5.89     -1.79  370 0.0746 .  
st29q03Disagree            -17.40      6.08     -2.86  108 0.0050 ** 
st29q03Strongly disagree   -36.88      7.05     -5.23  250 <2e-16 ***
sc14q02Very little         -23.18     15.78     -1.47  706 0.1422    
sc14q02To some extent      -13.75     11.88     -1.16  480 0.2478    
sc14q02A lot               -35.74      8.59     -4.16  258 <2e-16 ***
st04q01Male                  8.88      3.09      2.88  192 0.0045 ** 
escs                        27.12      2.44     11.11 1412 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#summary_all(m1)

Differences between the two models can be tested using a likelihood ratio test: is the full model an improvement over the reduced model (w/random slope vs random intercepts only?). See Grund et al. (2023) article in Psych Methods about different ways an LRT can be performed when using imputed data.

lrtPV(m1, m0) #full vs reduced

     F df1  df2   r     pv
1 98.3   2 2.65 441 0.0033

Just testing this with random noise too: noise1 (random variable that is not associated with the outcome)– differences should not be statistically significant (i.e., adding the variable does not improve the model).

m2 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
 st29q03 + sc14q02 + st04q01 + escs + noise1 + (1|schoolid), 
 weights = c('w_fstuwt', 'w_fschwt'), 
 data = pisa2012)

Analyzing plausible value: pv1math 
Analyzing plausible value: pv2math 
Analyzing plausible value: pv3math 
Analyzing plausible value: pv4math 
Analyzing plausible value: pv5math

summary(m2) #ns

Results of multilevel analyses with 5 plausible values.
Number of observations: 3136 

Estimates for random effects: 
                     estimate std.error statistic   df Pr(>t)    
schoolid.(Intercept)  1394.17    325.01      4.29 8527 <2e-16 ***
Residual              5295.00    158.46     33.41 2716 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates for fixed effects: 
                         estimate std.error statistic   df Pr(>t)    
(Intercept)               489.496     8.397    58.294  247 <2e-16 ***
st29q03Agree              -11.303     5.998    -1.884  526 0.0601 .  
st29q03Disagree           -19.562     6.060    -3.228  118 0.0016 ** 
st29q03Strongly disagree  -39.961     7.022    -5.691  335 <2e-16 ***
sc14q02Very little        -22.883    16.710    -1.369  777 0.1713    
sc14q02To some extent     -17.504    11.784    -1.485  204 0.1390    
sc14q02A lot              -30.078     7.702    -3.905  306 0.0001 ***
st04q01Male                 8.418     3.101     2.715  170 0.0073 ** 
escs                       25.895     2.114    12.249 2627 <2e-16 ***
noise1                      0.590     1.397     0.422 1941 0.6729    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lrtPV(m2, m0) #ns too

     F df1  df2    r    pv
1 3.03   1 4.23 34.8 0.153

5. Showing results side by side

I like building up models over several phases. This just illustrates how you can do that. This output should still be fixed up (see article) and read the modelsummary documentation.

modelsummary(list("ri" = m0, 'rs' = m1),
             stars = TRUE, output = 'gt')

	ri	rs
(Intercept)	489.507***	486.441***
	(8.408)	(8.268)
st29q03Agree	-11.286+	-10.536+
	(5.989)	(5.894)
st29q03Disagree	-19.534**	-17.404**
	(6.050)	(6.079)
st29q03Strongly disagree	-39.949***	-36.881***
	(7.025)	(7.050)
sc14q02Very little	-22.927	-23.178
	(16.690)	(15.777)
sc14q02To some extent	-17.605	-13.748
	(11.766)	(11.881)
sc14q02A lot	-30.050***	-35.744***
	(7.748)	(8.593)
st04q01Male	8.395**	8.881**
	(3.103)	(3.088)
escs	25.883***	27.123***
	(2.105)	(2.441)
schoolid.(Intercept)	1397.963***	1380.681***
	(327.262)	(325.420)
Residual	5295.229***	5021.039***
	(158.428)	(106.040)
schoolid.escs		324.388***
		(63.893)
Nobs	3136	3136
N.pv	5	5
AICbar	25592462.6532503	25504198.0201192
BICbar	25592529.2109875	25504276.6792632
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001

NOTE: After this initial post, I received a request on how the mix output can be used with the modelsummary package. To get that to work (at least for two level models), use this:

source("https://raw.githubusercontent.com/flh3/pubdata/main/mixPV/wemix_modelsummary.R")

NOTE: that mix function does not show the p-values (like lmer). The stars (in modelsummary) – if just using the mix function (not mivPV) are based on p-values using a z statistic (assuming infinite degrees of freedom) which is likely not correct for higher-level predictors (as is done when using Mplus) – so be warned about that.

6. Comparing results using the `EdSurvey` package

NOTE: This can take a while to run if downloading the datasets using the function.

The article also shows the results when using SAS proc glimmix.

library(EdSurvey)
# downloadPISA(years = 2012, root = "C:/", cache = TRUE)

pisa12 <- readPISA(path = "C:/EdSurveyData/PISA/2012/", 
                   countries = c("usa"))
dat <- EdSurvey::getData(data = pisa12, 
 varnames = c('st04q01', 'escs', 'sc14q02', 'st29q03',
 'math', 'w_fstuwt', 'w_fschwt', 'schoolid'),
 omittedLevels = TRUE, addAttributes = TRUE)

m1 <- mixed.sdf(formula = math ~ st29q03 + sc14q02 +
  st04q01 + escs + (1|schoolid), 
  weightVar=c("w_fstuwt", "w_fschwt"), 
  data = dat, weightTransformation = FALSE)
summary(m1)

# > summary(m1)
# Call:
#   mixed.sdf(formula = math ~ st29q03 + sc14q02 + st04q01 + escs + 
#               (1 | schoolid), data = dat, weightVars = c("w_fstuwt", "w_fschwt"), 
#             weightTransformation = FALSE)
# 
# Formula: math ~ st29q03 + sc14q02 + st04q01 + escs + (1 |
#                                                         schoolid)
# 
# Plausible Values: 5
# Number of Groups:
#   Level    Group n size mean wgt sum wgt
# 2 schoolid    157    192.1   30161
# 1      Obs   3136    713.2 2236615
# 
# Variance terms:
#   Level    Group        Name Variance Std. Error Std.Dev.
# 2 schoolid (Intercept)     1398      327.5    37.39
# 1 Residual                 5295      152.5    72.77
# 
# Fixed Effects:
#   Estimate Std. Error t value
# (Intercept)               489.507      8.408  58.218
# st29q03Agree              -11.286      5.989  -1.884
# st29q03Disagree           -19.534      6.050  -3.229
# st29q03Strongly disagree  -39.949      7.025  -5.687
# sc14q02Very little        -22.927     16.690  -1.374
# sc14q02To some extent     -17.605     11.766  -1.496
# sc14q02A lot              -30.050      7.748  -3.879
# st04q01Male                 8.395      3.103   2.706
# escs                       25.883      2.105  12.293
# 
# Intraclass Correlation= 0.209

m2 <- mixed.sdf(formula = math ~ st29q03 + sc14q02 +
   st04q01 + escs + (escs|schoolid), 
   weightVar = c("w_fstuwt", "w_fschwt"), 
   data = dat, weightTransformation = FALSE)
summary(m2)

# > summary(m2)
# Call:
#   mixed.sdf(formula = math ~ st29q03 + sc14q02 + st04q01 + escs + 
#   (escs | schoolid), data = dat, weightVars = c("w_fstuwt", 
#  "w_fschwt"), weightTransformation = FALSE)
# 
# Formula: math ~ st29q03 + sc14q02 + st04q01 + escs + (escs |
#                                                         schoolid)
# 
# Plausible Values: 5
# Number of Groups:
#   Level    Group n size mean wgt sum wgt
# 2 schoolid    157    192.1   30161
# 1      Obs   3136    713.2 2236615
# 
# Variance terms:
#   Level    Group        Name Variance Std. Error Std.Dev. Corr1
# 2 schoolid (Intercept)   1380.7     303.89    37.16      
# 2 schoolid        escs    324.4      63.53    18.01  0.22
# 1 Residual               5021.0     137.82    70.86      
# 
# Fixed Effects:
#   Estimate Std. Error t value
# (Intercept)               486.441      8.268  58.836
# st29q03Agree              -10.536      5.894  -1.788
# st29q03Disagree           -17.404      6.079  -2.863
# st29q03Strongly disagree  -36.881      7.050  -5.231
# sc14q02Very little        -23.178     15.777  -1.469
# sc14q02To some extent     -13.748     11.881  -1.157
# sc14q02A lot              -35.744      8.593  -4.160
# st04q01Male                 8.881      3.088   2.876
# escs                       27.123      2.441  11.113
# 
# Intraclass Correlation= 0.253

— END

Using Plausible Values with Multilevel Models Using R

1. Load in packages and the mixPV functions

2. Load in the dataset (pisa2012).

3. The model can be fit using the mix function in the WeMix package.