Using cluster robust standard errors to analyze nested data with a few clusters (Korea)


In education, data are often clustered (e.g., students within schools), and various methods (e.g., multilevel modeling, generalized estimating equations) have been developed over the years to account for these nonindependent data structures. Ignoring the clustering is well known to produce erroneous statistical inferences (e.g., inflated type I error rates) because standard errors are misestimated and overly liberal degrees of freedom are used. One alternative for analyzing clustered data is to use cluster-robust standard errors (CRSEs; CR0) (Liang & Zeger, 1986). CRSEs are widely used in some disciplines (e.g., econometrics) but are not common in educational research. A limitation of CRSEs is that, although they work well with a large number of clusters, they are known to underestimate standard errors when the number of clusters is limited (e.g., < 50). This is of particular importance when analyzing data from cluster randomized controlled trials (CRTs), where a limited number of clusters is common. Over 20 years ago, Bell and McCaffrey (2002) proposed an adjustment to the traditional CRSEs, the bias-reduced linearization (or CR2) estimator, used together with Satterthwaite (1946) degrees of freedom (df) adjustments. However, the CR2 has not seen much use in the applied literature due to its limited accessibility. Using Monte Carlo simulations (conducted in R), we evaluated the CR2 estimator under conditions often found in educational research, with both continuous and binary outcomes as well as cross-classified data structures. The number of clusters, the intraclass correlation coefficient, and group size (among other factors) were manipulated. Coverage probabilities, type I error rates, and power were assessed. The CR2 results (with and without df adjustments) were compared with those from the traditional CR0 CRSEs and from multilevel models (MLMs).
Findings show that the traditional CRSEs (i.e., CR0) were problematic with only a few clusters, whereas the CR2 results were comparable to those estimated using multilevel models, making the CR2 a viable alternative when only a few clusters are present. To extend its use among applied researchers, we also provide a free SPSS add-on that can compute these CRSEs.
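
To make the difference between the CR0 and CR2 estimators concrete, the following is a minimal sketch in Python with NumPy. It is not the R simulation code or the SPSS add-on described above. It assumes an ordinary least squares fit with an independence working model, and it omits the Satterthwaite df adjustment. The function names (inv_sqrt_sym, cluster_robust_vcov) and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np


def inv_sqrt_sym(m, tol=1e-10):
    """Symmetric inverse square root of a positive semidefinite matrix.

    Eigenvalues at or below `tol` are treated as zero (pseudo-inverse).
    """
    vals, vecs = np.linalg.eigh(m)
    inv_root = np.zeros_like(vals)
    keep = vals > tol
    inv_root[keep] = 1.0 / np.sqrt(vals[keep])
    return (vecs * inv_root) @ vecs.T


def cluster_robust_vcov(X, y, cluster, kind="CR2"):
    """Return OLS coefficients and a CR0 or CR2 sandwich covariance matrix."""
    if kind not in ("CR0", "CR2"):
        raise ValueError("kind must be 'CR0' or 'CR2'")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)

    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        idx = cluster == g
        Xg, eg = X[idx], resid[idx]
        if kind == "CR2":
            # Rescale the cluster residuals by (I - H_gg)^(-1/2), where H_gg
            # is the block of the hat matrix belonging to cluster g.
            Hgg = Xg @ bread @ Xg.T
            eg = inv_sqrt_sym(np.eye(len(eg)) - Hgg) @ eg
        score = Xg.T @ eg
        meat += np.outer(score, score)
    return beta, bread @ meat @ bread


# Illustrative data: 10 clusters of 20 units, cluster-level treatment,
# random cluster intercepts, and no true treatment effect.
rng = np.random.default_rng(2023)
n_clusters, n_per = 10, 20
cluster = np.repeat(np.arange(n_clusters), n_per)
treat = np.repeat(np.arange(n_clusters) % 2, n_per).astype(float)
y = np.repeat(rng.normal(0.0, 0.5, n_clusters), n_per) + rng.normal(size=cluster.size)
X = np.column_stack([np.ones(cluster.size), treat])

for kind in ("CR0", "CR2"):
    beta, vcov = cluster_robust_vcov(X, y, cluster, kind)
    print(kind, "standard errors:", np.sqrt(np.diag(vcov)))
```

One way to see what the correction does: in an intercept-only model with G equally sized clusters, the CR2 variance in this sketch equals the CR0 variance multiplied by G / (G - 1), so the adjustment matters most when G is small and fades as G grows.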

23rd International Conference on Education Research
October 11-13, 2023, Seoul National Univ, Hoam Faculty House