Using Plausible Values with Multilevel Models Using R
Syntax to accompany the article:
Huang, F. (2024). Using plausible values when fitting multilevel models with large-scale assessment data using R. Large-scale Assessments in Education.
When fitting multilevel models using large scale assessments such as PISA or TIMSS, it is important to account for:
- the use of weights at different levels and
- the presence of multiple plausible values.
I am often asked how do you run this analysis in R. The following example shows how this can be done (read the article for more details). The use of plausible values in inherently a missing data problem– no one child in these LSAs takes the whole battery of tests (if any did, a child would need more than 10 hours to complete the assessment– which is not feasible). Respondents only complete some of the tests– not the whole thing. See Rutkowski et al (2010) for a guide on this. Note: there are other ways of doing this too (such as using the EdSurvey
package [Bailey et al.]).
1. Load in packages and the mixPV
functions
The mixPV
functions are loaded in using the source
function:
library(WeMix) #for mixed models with weights at different levels
library(modelsummary) #for output tables
library(broom) #needed for output tables
source("https://raw.githubusercontent.com/flh3/pubdata/main/mixPV/mixPV.R")
Note: this function is just a wrapper that uses the mix
function in the WeMix
package (Bailey et al.). The mix
function does all the hard work when fitting the models!
2. Load in the dataset (pisa2012
).
The dataset is available in the MLMusingR
package (v0.3.2).
## Import/load the data (I like using rio::import)
# pisa2012 <- rio::import("https://github.com/flh3/pubdata/raw/main/mixPV/pisa2012.rds")
data(pisa2012, package = 'MLMusingR') #data is in the MLMusingR package
## Inspect the data
## install.packages("lme4", type = "source") #if there are issues with Matrix
head(pisa2012)
pv1math pv2math pv3math pv4math pv5math escs schoolid st29q03
1 365 362 348 348 400 -0.60 0000001 Agree
2 416 422 411 422 393 0.89 0000001 Disagree
3 410 371 422 387 424 -1.99 0000001 Strongly agree
4 290 333 316 317 352 -0.35 0000001 Agree
5 383 423 415 369 380 0.63 0000001 Disagree
6 398 397 384 378 489 0.62 0000001 Strongly disagree
st04q01 w_fstuwt w_fschwt sc14q02 pwt1 noise1
1 Female 766 83.7 Not at all 9.16 -1.207
2 Male 809 83.7 Not at all 9.67 0.277
3 Male 846 83.7 Not at all 10.11 1.084
4 Male 809 83.7 Not at all 9.67 -2.346
5 Male 809 83.7 Not at all 9.67 0.429
6 Female 766 83.7 Not at all 9.16 0.506
summary(pisa2012)
pv1math pv2math pv3math pv4math pv5math
Min. :225 Min. :206 Min. :225 Min. :190 Min. :189
1st Qu.:421 1st Qu.:420 1st Qu.:421 1st Qu.:422 1st Qu.:422
Median :479 Median :481 Median :480 Median :482 Median :480
Mean :484 Mean :484 Mean :485 Mean :485 Mean :485
3rd Qu.:545 3rd Qu.:545 3rd Qu.:546 3rd Qu.:546 3rd Qu.:544
Max. :822 Max. :808 Max. :784 Max. :776 Max. :779
escs schoolid st29q03 st04q01
Min. :-3.80 Length:3136 Strongly agree : 387 Female :1570
1st Qu.:-0.48 Class :character Agree :1037 Male :1566
Median : 0.30 Mode :character Disagree :1231 N/A : 0
Mean : 0.20 Strongly disagree: 481 Invalid: 0
3rd Qu.: 0.93 N/A : 0 Missing: 0
Max. : 3.12 Invalid : 0
Missing : 0
w_fstuwt w_fschwt sc14q02 pwt1
Min. : 134 Min. : 22 Not at all :2276 Min. : 1.2
1st Qu.: 560 1st Qu.: 49 Very little : 525 1st Qu.: 4.7
Median : 665 Median : 81 To some extent: 272 Median : 8.9
Mean : 713 Mean : 144 A lot : 63 Mean : 9.5
3rd Qu.: 781 3rd Qu.: 149 N/A : 0 3rd Qu.:13.6
Max. :2598 Max. :1943 Invalid : 0 Max. :37.0
Missing : 0
noise1
Min. :-3.40
1st Qu.:-0.66
Median : 0.02
Mean : 0.01
3rd Qu.: 0.67
Max. : 3.20
dim(pisa2012)
[1] 3136 14
Read the article for a description of the predictors or use ?pisa2012
for the codebook. NOTE: do not use this dataset on its own for running any kind of substantive analysis.
3. The model can be fit using the mix
function in the WeMix package.
However, the example below just uses one plausible value (pv). I just show this to illustrate how using all the pvs can have slightly different results compared to just using one pv. Take note: when running these analysis– do NOT just use one pv– that is incorrect (despite what you may have read in some other peer-reviewed manuscript) for a variety of reasons. Read the documentation of the LSA.
## just using one plausible value
nopv <- mix(pv1math ~
st29q03 + sc14q02 + st04q01 + escs + (1|schoolid),
weights = c('w_fstuwt', 'w_fschwt'),
data = pisa2012)
summary(nopv)
Call:
mix(formula = pv1math ~ st29q03 + sc14q02 + st04q01 + escs +
(1 | schoolid), data = pisa2012, weights = c("w_fstuwt",
"w_fschwt"))
Variance terms:
Level Group Name Variance Std. Error Std.Dev.
2 schoolid (Intercept) 1414 328 37.6
1 Residual 5265 152 72.6
Groups:
Level Group n size mean wgt sum wgt
2 schoolid 157 192 30161
1 Obs 3136 713 2236615
Fixed Effects:
Estimate Std. Error t value
(Intercept) 486.80 7.78 62.59
st29q03Agree -11.11 5.70 -1.95
st29q03Disagree -19.26 5.46 -3.53
st29q03Strongly disagree -41.54 6.86 -6.05
sc14q02Very little -21.34 17.06 -1.25
sc14q02To some extent -11.78 12.82 -0.92
sc14q02A lot -26.91 7.66 -3.51
st04q01Male 9.51 2.99 3.18
escs 25.57 2.12 12.07
lnl= -12789991.99
Intraclass Correlation= 0.212
## can also do it this way (using conditional weights)
# nopv1 <- mix(pv1math ~
# st29q03 + sc14q02 + st04q01 + escs + (1|schoolid),
# weights = c('pwt1', 'w_fschwt'),
# data = pisa2012, cWeights = TRUE)
# summary(nopv1)
4. Fit the model using several plausible values
The multiple plausible values are explicitly specified in the formula. A basic random intercepts model can be fit using:
## using 5 plausible values
m0 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
st29q03 + sc14q02 + st04q01 + escs + (1|schoolid),
weights = c('w_fstuwt', 'w_fschwt'),
data = pisa2012)
Analyzing plausible value: pv1math
Analyzing plausible value: pv2math
Analyzing plausible value: pv3math
Analyzing plausible value: pv4math
Analyzing plausible value: pv5math
summary(m0)
Results of multilevel analyses with 5 plausible values.
Number of observations: 3136
Estimates for random effects:
estimate std.error statistic df Pr(>t)
schoolid.(Intercept) 1397.96 327.26 4.27 9027 <2e-16 ***
Residual 5295.23 158.43 33.42 2666 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimates for fixed effects:
estimate std.error statistic df Pr(>t)
(Intercept) 489.51 8.41 58.22 248 <2e-16 ***
st29q03Agree -11.29 5.99 -1.88 527 0.0601 .
st29q03Disagree -19.53 6.05 -3.23 118 0.0016 **
st29q03Strongly disagree -39.95 7.02 -5.69 335 <2e-16 ***
sc14q02Very little -22.93 16.69 -1.37 759 0.1699
sc14q02To some extent -17.61 11.77 -1.50 204 0.1361
sc14q02A lot -30.05 7.75 -3.88 315 0.0001 ***
st04q01Male 8.40 3.10 2.71 174 0.0075 **
escs 25.88 2.11 12.29 2578 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary_all(m0) #shows all results with multiple PVs
A random slope model can also be fit (adding a random slope for escs
):
m1 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
st29q03 + sc14q02 + st04q01 + escs + (escs|schoolid),
weights = c('w_fstuwt', 'w_fschwt'),
data = pisa2012)
Analyzing plausible value: pv1math
Analyzing plausible value: pv2math
Analyzing plausible value: pv3math
Analyzing plausible value: pv4math
Analyzing plausible value: pv5math
## a boundary(singular) fit message suggests that the
## variance components estimated are not different from zero
## the results however do show variation
summary(m1)
Results of multilevel analyses with 5 plausible values.
Number of observations: 3136
Estimates for random effects:
estimate std.error statistic df Pr(>t)
schoolid.(Intercept) 1380.68 325.42 4.24 2151.8 <2e-16 ***
schoolid.escs 324.39 63.89 5.08 350.5 <2e-16 ***
Residual 5021.04 106.04 47.35 81.6 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimates for fixed effects:
estimate std.error statistic df Pr(>t)
(Intercept) 486.44 8.27 58.84 245 <2e-16 ***
st29q03Agree -10.54 5.89 -1.79 370 0.0746 .
st29q03Disagree -17.40 6.08 -2.86 108 0.0050 **
st29q03Strongly disagree -36.88 7.05 -5.23 250 <2e-16 ***
sc14q02Very little -23.18 15.78 -1.47 706 0.1422
sc14q02To some extent -13.75 11.88 -1.16 480 0.2478
sc14q02A lot -35.74 8.59 -4.16 258 <2e-16 ***
st04q01Male 8.88 3.09 2.88 192 0.0045 **
escs 27.12 2.44 11.11 1412 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#summary_all(m1)
Differences between the two models can be tested using a likelihood ratio test: is the full model an improvement over the reduced model (w/random slope vs random intercepts only?). See Grund et al. (2023) article in Psych Methods about different ways an LRT can be performed when using imputed data.
lrtPV(m1, m0) #full vs reduced
F df1 df2 r pv
1 98.3 2 2.65 441 0.0033
Just testing this with random noise too: noise1
(random variable that is not associated with the outcome)– differences should not be statistically significant (i.e., adding the variable does not improve the model).
m2 <- mixPV(pv1math + pv2math + pv3math + pv4math + pv5math ~
st29q03 + sc14q02 + st04q01 + escs + noise1 + (1|schoolid),
weights = c('w_fstuwt', 'w_fschwt'),
data = pisa2012)
Analyzing plausible value: pv1math
Analyzing plausible value: pv2math
Analyzing plausible value: pv3math
Analyzing plausible value: pv4math
Analyzing plausible value: pv5math
summary(m2) #ns
Results of multilevel analyses with 5 plausible values.
Number of observations: 3136
Estimates for random effects:
estimate std.error statistic df Pr(>t)
schoolid.(Intercept) 1394.17 325.01 4.29 8527 <2e-16 ***
Residual 5295.00 158.46 33.41 2716 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimates for fixed effects:
estimate std.error statistic df Pr(>t)
(Intercept) 489.496 8.397 58.294 247 <2e-16 ***
st29q03Agree -11.303 5.998 -1.884 526 0.0601 .
st29q03Disagree -19.562 6.060 -3.228 118 0.0016 **
st29q03Strongly disagree -39.961 7.022 -5.691 335 <2e-16 ***
sc14q02Very little -22.883 16.710 -1.369 777 0.1713
sc14q02To some extent -17.504 11.784 -1.485 204 0.1390
sc14q02A lot -30.078 7.702 -3.905 306 0.0001 ***
st04q01Male 8.418 3.101 2.715 170 0.0073 **
escs 25.895 2.114 12.249 2627 <2e-16 ***
noise1 0.590 1.397 0.422 1941 0.6729
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtPV(m2, m0) #ns too
F df1 df2 r pv
1 3.03 1 4.23 34.8 0.153
5. Showing results side by side
I like building up models over several phases. This just illustrates how you can do that. This output should still be fixed up (see article) and read the modelsummary
documentation.
modelsummary(list("ri" = m0, 'rs' = m1),
stars = TRUE, output = 'gt')
ri | rs | |
---|---|---|
(Intercept) | 489.507*** | 486.441*** |
(8.408) | (8.268) | |
st29q03Agree | -11.286+ | -10.536+ |
(5.989) | (5.894) | |
st29q03Disagree | -19.534** | -17.404** |
(6.050) | (6.079) | |
st29q03Strongly disagree | -39.949*** | -36.881*** |
(7.025) | (7.050) | |
sc14q02Very little | -22.927 | -23.178 |
(16.690) | (15.777) | |
sc14q02To some extent | -17.605 | -13.748 |
(11.766) | (11.881) | |
sc14q02A lot | -30.050*** | -35.744*** |
(7.748) | (8.593) | |
st04q01Male | 8.395** | 8.881** |
(3.103) | (3.088) | |
escs | 25.883*** | 27.123*** |
(2.105) | (2.441) | |
schoolid.(Intercept) | 1397.963*** | 1380.681*** |
(327.262) | (325.420) | |
Residual | 5295.229*** | 5021.039*** |
(158.428) | (106.040) | |
schoolid.escs | 324.388*** | |
(63.893) | ||
Nobs | 3136 | 3136 |
N.pv | 5 | 5 |
AICbar | 25592462.6532503 | 25504198.0201192 |
BICbar | 25592529.2109875 | 25504276.6792632 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 |
NOTE: After this initial post, I received a request on how the mix
output can be used with the modelsummary
package. To get that to work (at least for two level models), use this:
source("https://raw.githubusercontent.com/flh3/pubdata/main/mixPV/wemix_modelsummary.R")
NOTE: that mix
function does not show the p-values (like lmer
). The stars (in modelsummary
) – if just using the mix
function (not mivPV
) are based on p-values using a z statistic (assuming infinite degrees of freedom) which is likely not correct for higher-level predictors (as is done when using Mplus) – so be warned about that.
6. Comparing results using the EdSurvey
package
NOTE: This can take a while to run if downloading the datasets using the function.
The article also shows the results when using SAS proc glimmix.
library(EdSurvey)
# downloadPISA(years = 2012, root = "C:/", cache = TRUE)
pisa12 <- readPISA(path = "C:/EdSurveyData/PISA/2012/",
countries = c("usa"))
dat <- EdSurvey::getData(data = pisa12,
varnames = c('st04q01', 'escs', 'sc14q02', 'st29q03',
'math', 'w_fstuwt', 'w_fschwt', 'schoolid'),
omittedLevels = TRUE, addAttributes = TRUE)
m1 <- mixed.sdf(formula = math ~ st29q03 + sc14q02 +
st04q01 + escs + (1|schoolid),
weightVar=c("w_fstuwt", "w_fschwt"),
data = dat, weightTransformation = FALSE)
summary(m1)
# > summary(m1)
# Call:
# mixed.sdf(formula = math ~ st29q03 + sc14q02 + st04q01 + escs +
# (1 | schoolid), data = dat, weightVars = c("w_fstuwt", "w_fschwt"),
# weightTransformation = FALSE)
#
# Formula: math ~ st29q03 + sc14q02 + st04q01 + escs + (1 |
# schoolid)
#
# Plausible Values: 5
# Number of Groups:
# Level Group n size mean wgt sum wgt
# 2 schoolid 157 192.1 30161
# 1 Obs 3136 713.2 2236615
#
# Variance terms:
# Level Group Name Variance Std. Error Std.Dev.
# 2 schoolid (Intercept) 1398 327.5 37.39
# 1 Residual 5295 152.5 72.77
#
# Fixed Effects:
# Estimate Std. Error t value
# (Intercept) 489.507 8.408 58.218
# st29q03Agree -11.286 5.989 -1.884
# st29q03Disagree -19.534 6.050 -3.229
# st29q03Strongly disagree -39.949 7.025 -5.687
# sc14q02Very little -22.927 16.690 -1.374
# sc14q02To some extent -17.605 11.766 -1.496
# sc14q02A lot -30.050 7.748 -3.879
# st04q01Male 8.395 3.103 2.706
# escs 25.883 2.105 12.293
#
# Intraclass Correlation= 0.209
m2 <- mixed.sdf(formula = math ~ st29q03 + sc14q02 +
st04q01 + escs + (escs|schoolid),
weightVar = c("w_fstuwt", "w_fschwt"),
data = dat, weightTransformation = FALSE)
summary(m2)
# > summary(m2)
# Call:
# mixed.sdf(formula = math ~ st29q03 + sc14q02 + st04q01 + escs +
# (escs | schoolid), data = dat, weightVars = c("w_fstuwt",
# "w_fschwt"), weightTransformation = FALSE)
#
# Formula: math ~ st29q03 + sc14q02 + st04q01 + escs + (escs |
# schoolid)
#
# Plausible Values: 5
# Number of Groups:
# Level Group n size mean wgt sum wgt
# 2 schoolid 157 192.1 30161
# 1 Obs 3136 713.2 2236615
#
# Variance terms:
# Level Group Name Variance Std. Error Std.Dev. Corr1
# 2 schoolid (Intercept) 1380.7 303.89 37.16
# 2 schoolid escs 324.4 63.53 18.01 0.22
# 1 Residual 5021.0 137.82 70.86
#
# Fixed Effects:
# Estimate Std. Error t value
# (Intercept) 486.441 8.268 58.836
# st29q03Agree -10.536 5.894 -1.788
# st29q03Disagree -17.404 6.079 -2.863
# st29q03Strongly disagree -36.881 7.050 -5.231
# sc14q02Very little -23.178 15.777 -1.469
# sc14q02To some extent -13.748 11.881 -1.157
# sc14q02A lot -35.744 8.593 -4.160
# st04q01Male 8.881 3.088 2.876
# escs 27.123 2.441 11.113
#
# Intraclass Correlation= 0.253
— END