Reassessing weights in large-scale assessments and multilevel models

Abstract

When analyzing large-scale assessments (LSAs) that use complex sampling designs, it is important to account for unequal selection probabilities through sampling weights. However, the use of these weights in multilevel models has been widely debated, particularly regarding their application at different levels of the model, and no consensus has been reached on how best to apply them. To address this, we conducted a Monte Carlo simulation, modeling a typical LSA population with known true values for the variables of interest. Drawing repeated samples from this population, we generated weights under a stratified two-stage cluster design in which clusters (schools) were selected with probability proportional to size (PPS) from designated explicit strata. We examined both class-level and student-level sampling structures and applied a nonresponse model at both the school and student levels. For each sample drawn, we assessed bias and coverage rates across models that applied weights at both levels, at level 2 only, at level 1 only, and not at all. Our findings show that applying only level-2 weights produced the most precise estimates, whereas models with no weights or with only rescaled level-1 weights yielded the highest bias. Using level-1 and level-2 weights together was acceptable, although variance components were slightly underestimated. Moreover, when weights do not vary within clusters, scaling the level-1 weights mirrors using only the level-2 weights. An applied example using TIMSS data supports these findings. This study contributes to the literature by identifying the least biased weighting approaches under complex sampling scenarios and by offering practical guidance on using weights in multilevel models. We provide the R syntax for both the simulation and the applied example for reproducibility.
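As a minimal sketch of the four weighting specifications compared above, the following R code uses the WeMix package, which fits weighted multilevel models and accepts one weight variable per level (level 1 listed first). The data frame `dat` and all variable names (`math`, `ses`, `school_id`, `wt_student`, `wt_school`) are hypothetical, and the cluster-size rescaling shown is one common scaling method; this is an illustration under those assumptions, not the exact syntax accompanying the paper.

```r
library(WeMix)
library(dplyr)

# Cluster-size rescaling of level-1 weights (one common method):
# within each school, weights are scaled to sum to the cluster sample size.
dat <- dat %>%
  group_by(school_id) %>%
  mutate(wt_student_sc = wt_student * n() / sum(wt_student)) %>%
  ungroup()

# A constant weight of 1 effectively removes weighting at that level.
dat$one <- 1

# Weights at both levels (level-1 weight first, then level-2 weight).
m_both <- mix(math ~ ses + (1 | school_id), data = dat,
              weights = c("wt_student_sc", "wt_school"))

# Level-2 weights only.
m_l2 <- mix(math ~ ses + (1 | school_id), data = dat,
            weights = c("one", "wt_school"))

# Rescaled level-1 weights only.
m_l1 <- mix(math ~ ses + (1 | school_id), data = dat,
            weights = c("wt_student_sc", "one"))

# Unweighted model.
m_none <- mix(math ~ ses + (1 | school_id), data = dat,
              weights = c("one", "one"))

summary(m_both)
```

Comparing the fixed effects and variance components across `m_both`, `m_l2`, `m_l1`, and `m_none` reproduces, in miniature, the kind of contrast the simulation evaluates.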

Publication
In Large-scale Assessments in Education.