In a recent article in Multivariate Behavioral Research, we (Huang, Wiedermann, & Zhang, 2022; HWZ; doi: 10.1080/00273171.2022.2077290) discuss robust standard errors that can be used with mixed models to account for violations of homogeneity. These robust standard errors have been around for years, though they are not always provided in statistical software. They can be computed using the CR2 package or the clubSandwich package. This page shows how to compute the traditional Liang and Zeger (1986) robust standard errors (CR0) and the CR2 estimator; see Bell and McCaffrey (2002) as well as McCaffrey, Bell, and Botts (2001) (BM and MBB).
NOTE: Cluster robust standard errors can also be computed using our SPSS add-on available here: https://github.com/flh3/CR2. The add-on, however, does not work with mixed models.
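If needed, both packages are available on CRAN and can be installed with:
install.packages(c("CR2", "clubSandwich"))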
d1 <- rio::import(file = 'https://github.com/flh3/evalclass/raw/main/ccrem/ccrem.sav')
set.seed(123)
d1 <- d1[sample(nrow(d1)),] #jumble up the data to test functions...
## c1 is the clustering variable
Inspecting the intraclass correlation coefficient.
library(lmerTest)
library(CR2)
m0 <- lmer(y ~ 1 + (1|c1),
data = d1)
performance::icc(m0)
# Intraclass Correlation Coefficient
Adjusted ICC: 0.095
Conditional ICC: 0.095
dplyr::n_distinct(d1$c1)
[1] 30
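As a quick check, the adjusted ICC can also be computed by hand from the variance components of m0 (for a random-intercept-only model, this is \(\tau_{00} / (\tau_{00} + \sigma^2)\)):
vc <- as.data.frame(VarCorr(m0)) #intercept variance, then residual variance
vc$vcov[1] / sum(vc$vcov) #tau00 / (tau00 + sigma2); should match the ICC above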
Use a basic random intercepts model.
m2 <- lmer(y ~ W1 + X1 + (1|c1), # + (1|c2), for CCREM, ignore for now
data = d1)
Using elements from the m2 object, we can reconstruct the results “manually”. The fixed effects are: \[(X'V^{-1}X)^{-1}X'V^{-1}y.\]
Vm <- CR2::getV(m2) #V matrix, install the package.
X <- model.matrix(m2) #X matrix
y <- m2@resp$y #the outcome
dat <- m2@frame #the dataframe
Vinv <- solve(Vm) #can be slow, don't really do this but the data are small anyway
solve(t(X) %*% Vinv %*% X) %*% t(X) %*% Vinv %*% y #manual, slow
[,1]
(Intercept) 10.706834
W1 0.339460
X1 1.733624
fixef(m2) #fixed effects / regression coefficients
(Intercept) W1 X1
10.706834 0.339460 1.733624
The model-based variance covariance matrix is \[(X'V^{-1}X)^{-1}.\] Standard errors are the square root of the diagonal of the matrix.
solve(t(X) %*% Vinv %*% X)
(Intercept) W1 X1
(Intercept) 3.218505e-02 -0.0047134529 1.409252e-05
W1 -4.713453e-03 0.0340070281 -2.443940e-05
X1 1.409252e-05 -0.0000244394 4.081615e-03
vcov(m2) #same
3 x 3 Matrix of class "dpoMatrix"
(Intercept) W1 X1
(Intercept) 3.218505e-02 -0.0047134529 1.409252e-05
W1 -4.713453e-03 0.0340070281 -2.443940e-05
X1 1.409252e-05 -0.0000244394 4.081615e-03
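Taking the square roots of the diagonal gives the model-based standard errors (these reappear in the comparison table at the end):
sqrt(diag(solve(t(X) %*% Vinv %*% X))) #model-based standard errors
(Intercept)          W1          X1 
 0.17940193  0.18440995  0.06388752 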
The regression coefficients and standard errors are the same.
Now to compute robust standard errors
The cluster robust standard errors (CRSEs) were originally proposed by Liang and Zeger (1986) and take the form:
\[\hat{Var}(\hat{\beta})=(X'V^{-1}X)^{-1} \sum_{c=1}^{C}X_{c}'\hat{\Omega}_cX_{c}(X'V^{-1}X)^{-1}\]
These are sandwich estimators: the bread is \((X'V^{-1}X)^{-1}\) and the meat is \(\sum_{c=1}^{C}X_{c}'\hat{\Omega}_cX_{c}\). The CR0 and CR2 estimators differ only in how \(\hat{\Omega}_c\) (and thus the meat) is formulated.
Computing the CR0 standard errors
For the CR0, the meat is:
\[\sum_{c=1}^{C}X_{c}^{'}V^{-1}_c\hat{e}_c\hat{e}'_cV^{-1}_c X_{c}\]
For both CR0 and CR2, we need the residuals…
rr <- y - X %*% fixef(m2) #marginal residuals
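As a sanity check, the same marginal residuals can be obtained using lme4's predict() with the random effects zeroed out (re.form = NA):
head(cbind(rr, y - predict(m2, re.form = NA))) #the two columns should match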
Compute the matrices needed for the CR0 standard errors.
cnames <- names(table(dat$c1)) #cluster names
## to get Xc per cluster
Xlist <- function(x){
  sel <- which(dat$c1 == x)
  X[sel, , drop = FALSE]
}
Xl <- lapply(cnames, Xlist)
## inverse of the V matrix per cluster
Vlist <- function(x){
  sel <- which(dat$c1 == x)
  solve(Vm[sel, sel]) #can use chol2inv but using solve for transparency
}
Vl <- lapply(cnames, Vlist)
names(Xl) <- names(Vl) <- cnames #lapply returns an unnamed list; add the names back
## This is to compute the bread portion
addx <- function(x){
  t(Xl[[x]]) %*% Vl[[x]] %*% Xl[[x]] #Xc' x Vinvc x Xc
}
mm <- lapply(cnames, addx) #getting this per cluster
br <- solve(Reduce("+", mm)) #adding the matrices together: bread matrix
## residuals in a list -- for the meat
Rl <- split(rr, dat$c1) #has names; this is easier for vectors
mt_c <- function(x){
  t(Rl[[x]]) %*% Vl[[x]] %*% Xl[[x]] #each cluster's part of the meat: ec' x Vinvc x Xc
}
u <- t(sapply(cnames, mt_c))
mt <- t(u) %*% u #the full meat
clvc.cr0 <- br %*% mt %*% br #variance covariance matrix
clvc.cr0 #manual computation of robust vcov matrix
(Intercept) W1 X1
(Intercept) 0.031325170 -0.008068560 0.005728414
W1 -0.008068560 0.019267187 0.001174817
X1 0.005728414 0.001174817 0.015422219
Compare these results to those estimated using the clubSandwich package (Pustejovsky & Tipton, 2018), which is much faster and more flexible.
library(clubSandwich)
vcovCR(m2, d1$c1, type = "CR0") #the same
(Intercept) W1 X1
(Intercept) 0.031325170 -0.008068560 0.005728414
W1 -0.008068560 0.019267187 0.001174817
X1 0.005728414 0.001174817 0.015422219
The traditional Liang and Zeger (1986) robust standard errors are the same. The standard errors are the square roots of the diagonals of the matrix:
sqrt(diag(clvc.cr0))
(Intercept) W1 X1
0.1769892 0.1388063 0.1241862
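With the robust variance matrix in hand, Wald-type t statistics are simply the estimates divided by their robust standard errors (a quick sketch; the robust_mixed output below handles the degrees of freedom properly):
round(cbind(est = fixef(m2), se = sqrt(diag(clvc.cr0)),
  t = fixef(m2) / sqrt(diag(clvc.cr0))), 2)
              est   se     t
(Intercept) 10.71 0.18 60.49
W1           0.34 0.14  2.45
X1           1.73 0.12 13.96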
You can also compare this to the empirical standard errors computed by SAS (using the empirical option in proc mixed).
Another quick way to do this is to use the robust_mixed function in the CR2 package, which shows a comparison of the model-based and the robust standard errors.
CR2::robust_mixed(m2, type = 'CR0')
Standard error type = CR0
Degrees of freedom = Satterthwaite
Estimate mb.se robust.se t.stat df Pr(>t)
(Intercept) 10.707 0.179 0.177 60.494 26.3 <2e-16 ***
W1 0.339 0.184 0.139 2.446 9.8 0.035 *
X1 1.734 0.064 0.124 13.960 27.8 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By default, the robust_mixed function uses the CR2 estimator (shown next), though for comparative purposes, I specify CR0 here (as this is what SAS and other software compute).
Note that the LZ (i.e., CR0) standard errors are well known to underestimate the level-2 standard errors when there are few clusters (e.g., < 40). Instead, the CR2 adjustment of BM and MBB has been recommended (Huang & Li, 2022).
Computing the CR2 standard errors
For the CR2 estimator, adjustment matrices are needed (see the HWZ article for more detail). The meat matrix is:
\[\sum_{c=1}^{C}X_{c}^{'}V^{-1}_c[I_c-H_c]^{-1/2}\hat{e}_c\hat{e}'_c[I_c-H_c]^{-1/2}V^{-1}_c X_{c}\]
The \(H_c\) matrix can be computed using the bread matrix, which we conveniently already computed earlier:
\[H_c = X_c(X'V^{-1}X)^{-1}X'_cV^{-1}_c.\]
Note: instead of the \([I_c-H_c]^{-1/2}\) adjustment matrix (the inverse of the symmetric square root; the CR2 package is needed to compute this), MBB indicated that we can use: \[U_c'[U_c(I_c-H_c)V_cU_c']^{-1/2}U_c\] where \(U_c\) is an upper triangular matrix such that \(V_c = U_c'U_c\); \(U_c\) is referred to as the Cholesky root of the matrix \(V_c\). For example:
tt <- matrix(c(1, .3, .2, #test matrix
               .3, 1, .4,
               .2, .4, 1), nrow = 3)
tt # full matrix
[,1] [,2] [,3]
[1,] 1.0 0.3 0.2
[2,] 0.3 1.0 0.4
[3,] 0.2 0.4 1.0
chol(tt) #chol matrix; upper triangle
[,1] [,2] [,3]
[1,] 1 0.3000000 0.2000000
[2,] 0 0.9539392 0.3564168
[3,] 0 0.0000000 0.9126703
t(chol(tt)) %*% chol(tt) #obtain original matrix
[,1] [,2] [,3]
[1,] 1.0 0.3 0.2
[2,] 0.3 1.0 0.4
[3,] 0.2 0.4 1.0
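The \([\,\cdot\,]^{-1/2}\) operation itself (the inverse of the symmetric square root) can be sketched using an eigendecomposition; this is the kind of computation that CR2::MatSqrtInverse carries out (mat_sqrt_inv below is just an illustrative helper, assuming a symmetric positive definite input):
mat_sqrt_inv <- function(A){ #inverse of the symmetric square root
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}
round(mat_sqrt_inv(tt) %*% mat_sqrt_inv(tt) %*% tt, 10) #A^{-1/2} A^{-1/2} A = I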
The adjustment matrix is more complicated, but it can all be computed with the following syntax:
tXs <- function(s) {
  sel <- which(dat$c1 == s)
  Vmc <- Vm[sel, sel] #V matrix per cluster
  Ijj <- diag(nrow(Xl[[s]])) #identity matrix
  Hjj <- Xl[[s]] %*% br %*% t(Xl[[s]]) %*% Vl[[s]] #Hc matrix
  IHjj <- Ijj - Hjj #identity less hat matrix
  #MatSqrtInverse(Ijj - Hjj) #the direct adjustment; also valid
  V3 <- chol(Vmc) #Cholesky root, per MBB; see footnote 5 in HWZ; upper triangle
  Bi <- V3 %*% IHjj %*% Vmc %*% t(V3)
  t(V3) %*% MatSqrtInverse(Bi) %*% V3 #need the CR2 package loaded
}
tX <- lapply(cnames, tXs) #these are the adjustment matrices
names(tX) <- cnames
cc2 <- function(x){
  Rl[[x]] %*% tX[[x]] %*% Vl[[x]] %*% Xl[[x]] #r x Adj x Vinv x Xc
}
u <- t(sapply(cnames, cc2))
mt2 <- t(u) %*% u #the full meat matrix
clvc.cr2 <- br %*% mt2 %*% br #variance covariance matrix
clvc.cr2 #manual computation of robust vcov matrix for CR2
(Intercept) W1 X1
(Intercept) 0.033001573 -0.008084060 0.006009505
W1 -0.008084060 0.021781321 0.001545199
X1 0.006009505 0.001545199 0.015970182
vcovCR(m2, d1$c1, type = "CR2") #the same
(Intercept) W1 X1
(Intercept) 0.033001573 -0.008084060 0.006009505
W1 -0.008084060 0.021781321 0.001545199
X1 0.006009505 0.001545199 0.015970182
The manual computation and the output provided by clubSandwich are the same.
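As with CR0, the CR2 standard errors are the square roots of the diagonal (these reappear in the comparison table below):
sqrt(diag(clvc.cr2))
(Intercept)          W1          X1 
  0.1816634   0.1475850   0.1263732 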
Comparing different standard errors
Also, we can ask “Are random slopes warranted?” (They should be since I simulated the dataset to have random slopes…)
m3 <- lmer(y ~ W1 + X1 + (X1|c1), #add random slope
data = d1)
anova(m2, m3) #RS warranted, large X2
refitting model(s) with ML (instead of REML)
Data: d1
Models:
m2: y ~ W1 + X1 + (1 | c1)
m3: y ~ W1 + X1 + (X1 | c1)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
m2 5 6970.9 6997.4 -3480.4 6960.9
m3 7 6923.6 6960.8 -3454.8 6909.6 51.283 2 7.312e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results indicate that RS should be included (\(\chi^2(2) = 51.28\), \(p < .001\)).
To compare the various model-based random intercept, CR0, CR2, and random slope standard errors:
data.frame(
mb_ri = sqrt(diag(vcov(m2))), #model-based, X1 too low, should have a random slope
cr0 = sqrt(diag(clvc.cr0)),
cr2 = sqrt(diag(clvc.cr2)),
mb_rs = sqrt(diag(vcov(m3)))
)
mb_ri cr0 cr2 mb_rs
(Intercept) 0.17940193 0.1769892 0.1816634 0.1766116
W1 0.18440995 0.1388063 0.1475850 0.1670175
X1 0.06388752 0.1241862 0.1263732 0.1304942
We see that the model-based RI SEs are too low for the level-1 (X1) variable. All the other estimators correct for that (due to heterogeneity). In this way, robust standard errors can also be used as a diagnostic of whether random slopes are needed.
The CR2 standard errors can also be computed using:
CR2::robust_mixed(m2)
Standard error type = CR2
Degrees of freedom = Satterthwaite
Estimate mb.se robust.se t.stat df Pr(>t)
(Intercept) 10.707 0.179 0.182 58.938 26.3 <2e-16 ***
W1 0.339 0.184 0.148 2.300 8.9 0.047 *
X1 1.734 0.064 0.126 13.718 27.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
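For comparison, clubSandwich offers similar CR2-based tests with Satterthwaite degrees of freedom through its coef_test function; results should closely match those from robust_mixed:
clubSandwich::coef_test(m2, vcov = "CR2", cluster = d1$c1, test = "Satterthwaite")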
References
Bell, R., & McCaffrey, D. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28, 169-182.
Huang, F., & Li, X. (2022). Using cluster-robust standard errors when analyzing group-randomized trials with few clusters. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01627-0
Huang, F., Zhang, B., & Li, X. (in press). Using robust standard errors for the analysis of binary outcomes with a small number of clusters. Journal of Research on Educational Effectiveness.
Huang, F., Wiedermann, W., & Zhang, B. (2022). Accounting for heteroskedasticity resulting from between-group differences in multilevel models. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2022.2077290
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13-22.
McCaffrey, D., Bell, R., & Botts, C. (2001, August). Generalizations of bias reduced linearization. Paper presented at the Annual Meeting of the American Statistical Association, Atlanta, GA.
Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672-683.