# Robust standard errors in mixed models

In a recent article in Multivariate Behavioral Research, we (Huang, Wiedermann, and Zhang; HWZ; doi: 10.1080/00273171.2022.2077290) discuss a robust standard error that can be used with mixed models that accounts for violations of homogeneity. Note that these robust standard errors have been around for years though are not always provided in statistical software. These can also be computed using the CR2 package or the clubSandwich package. This page shows how to compute the traditional Liang and Zeger (1986) robust standard errors (CR0) and the CR2 estimator- see Bell and McCaffrey (2002) as well as McCaffrey, Bell, and Botts (2001) (BM and MBB).

NOTE: Cluster robust standard errors can also be computed using our SPSS add-on available here: https://github.com/flh3/CR2. This doesn’t work though with mixed models.

d1 <- rio::import(file = 'https://github.com/flh3/evalclass/raw/main/ccrem/ccrem.sav')
set.seed(123)
d1 <- d1[sample(nrow(d1)),] #jumble up the data to test functions...
## c1 is the clustering variable

Inspecting the intraclass correlation coefficient.

library(lmerTest)
library(CR2)
m0 <- lmer(y ~ 1 + (1|c1),
data = d1)
performance::icc(m0)
# Intraclass Correlation Coefficient

Conditional ICC: 0.095
dplyr::n_distinct(d1$c1) [1] 30 Use a basic random intercepts model. m2 <- lmer(y ~ W1 + X1 + (1|c1), # + (1|c2), for CCREM, ignore for now data = d1) Using elements for from the m2 object, reconstruct the results “manually”. The fixed effects are: $(X'V^{-1}X)^{-1}X'V^{-1}y.$ Vm <- CR2::getV(m2) #V matrix, install the package. X <- model.matrix(m2) #X matrix y <- m2@resp$y #the outcome
dat <- m2@frame #the dataframe
Vinv <- solve(Vm) #can be slow, don't really do this but the data are small anyway

solve(t(X) %*% Vinv %*% X) %*% t(X) %*% Vinv %*% y #manual, slow
                 [,1]
(Intercept) 10.706834
W1           0.339460
X1           1.733624
fixef(m2) #fixed effects / regression coefficients
(Intercept)          W1          X1
10.706834    0.339460    1.733624 

The model-based variance covariance matrix is $(X'V^{-1}X)^{-1}.$ Standard errors are the square root of the diagonal of the matrix.

solve(t(X) %*% Vinv %*% X)
              (Intercept)            W1            X1
(Intercept)  3.218505e-02 -0.0047134529  1.409252e-05
W1          -4.713453e-03  0.0340070281 -2.443940e-05
X1           1.409252e-05 -0.0000244394  4.081615e-03
vcov(m2) #same
3 x 3 Matrix of class "dpoMatrix"
(Intercept)            W1            X1
(Intercept)  3.218505e-02 -0.0047134529  1.409252e-05
W1          -4.713453e-03  0.0340070281 -2.443940e-05
X1           1.409252e-05 -0.0000244394  4.081615e-03

The regression coefficients and standard errors are the same.

## Now to compute robust standard errors

The cluster robust standard errors (CRSEs) were originally proposed by Liang and Zeger (1986). These can be shown based on:

$\hat{Var}(\hat{\beta})=(X'V^{-1}X)^{-1} \sum_{c=1}^{C}X_{c}^{'}\hat{\Omega_c}X_{c}(X'V^{-1}X)^{-1}$ The formulation of $\hat{\Omega}$ differs for the CR0 and the CR2 estimators. The sandwich standard errors– using the bread $(X'V^{-1}X)^{-1}$ components and the meat, $\sum_{c=1}^{C}X_{c}^{'}\hat{\Omega_c}X_{c}$, differ based on the meat portion.

### Computing the CR0 standard errors

For the CR0, the meat is:

$\sum_{c=1}^{C}X_{c}^{'}V^{-1}_c\hat{e}_c\hat{e}'_cV^{-1}_c X_{c}$

For both CR0 and CR2, we need the residuals…

rr <- y - X %*% fixef(m2) #marginal residuals

Compute the matrices needed for the CR0 standard errors.

Xl <- Vl <- Rl <- list()
cnames <- names(table(dat$c1)) ## to get Xc Xlist <- function(x){ sel <- which(dat$c1 == x)
Xl[[x]] <- X[sel,]
}

Xl <- lapply(cnames, Xlist)
## lists become numeric? just add the names back later...

Vlist <- function(x){
sel <- which(dat$c1 == x) Vl[[x]] <- solve(Vm[sel, sel]) #can use chol2inv but using solve for transparency } Vl <- lapply(cnames, Vlist) names(Xl) <- names(Vl) <- cnames ## adding names back? ## This is to compute the bread portion addx <- function(x){ t(Xl[[x]]) %*% Vl[[x]] %*% Xl[[x]] #(Xc' x Vinvc x Xc) } mm <- lapply(cnames, addx) #getting this per cluster br <- solve(Reduce("+", mm)) #adding the matrices together: bread matrix ## residuals in a list -- for the meat Rl <- split(rr, dat$c1) #has names; this is easier for vectors

mt <- function(x){
t(Rl[[x]]) %*% Vl[[x]] %*% Xl[[x]]  #compute part of the the meat matrix
}

u <- t(sapply(cnames, mt))

mt <- t(u) %*% u #the full meat
clvc.cr0 <- br %*% mt %*% br #variance covariance matrix
clvc.cr0 #manual computation of robust vcov matrix
             (Intercept)           W1          X1
(Intercept)  0.031325170 -0.008068560 0.005728414
W1          -0.008068560  0.019267187 0.001174817
X1           0.005728414  0.001174817 0.015422219

Compare these results to the one estimated using the clubSandwich package (see Pustejovsky and Tipton– which is much faster and more flexible).

library(clubSandwich)
vcovCR(m2, d1$c1, type = "CR0") #the same  (Intercept) W1 X1 (Intercept) 0.031325170 -0.008068560 0.005728414 W1 -0.008068560 0.019267187 0.001174817 X1 0.005728414 0.001174817 0.015422219 The traditional Liang and Zeger (1986) robust standard errors are the same. The standard errors are the square roots of the diagonals of the matrix: sqrt(diag(clvc.cr0)) (Intercept) W1 X1 0.1769892 0.1388063 0.1241862  You can also compare this to the empirical standard errors computed by SAS (using the empirical option in proc mixed): Another quick way to do this is to use the robust_mixed function in the CR2 package. This shows a comparison with the model-based and the robust standard errors. CR2::robust_mixed(m2, type = 'CR0')  Standard error type = CR0 Degrees of freedom = Satterthwaite Estimate mb.se robust.se t.stat df Pr(>t) (Intercept) 10.707 0.179 0.177 60.494 26.3 <2e-16 *** W1 0.339 0.184 0.139 2.446 9.8 0.035 * X1 1.734 0.064 0.124 13.960 27.8 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 By default, the robust_mixed function uses the CR2 estimator (shown next), though for comparative purposes, I indicate CR0 (as this is what SAS and what other software compute). Note that the LZ (i.e., CR0) SE are well known to underestimate the level-2 SEs when there are a few clusters (e.g., < 40). Instead, the CR2 adjustment of BM and MBB have been recommended (Huang & Li, 2022). ### Computing the CR2 standard errors Now, for the CR2 adjustment matrices which will be needed (see HWZ paper for more explanations), the meat matrix is: $\sum_{c=1}^{C}X_{c}^{'}V^{-1}_c[I_c-H_c]^{-1/2}\hat{e}_c\hat{e}'_c[I_c-H_c]^{-1/2}V^{-1}_c X_{c}$ The $H_c$ matrix can be computed using (note the bread matrix is used, so good we computed that already earlier): $H_c = X_c(X'V^{-1}X)^{-1}X'_cV^{-1}_c.$ Note: instead of the $[I_c-H_c]^{-1/2}$ adjustment matrix (which is the inverse of the symmetric square root; need the CR2 package to compute this), MBB indicated that we can use: $U_c'[U_c(I_c-H_c)V_cU_c']^{-1/2}U_c$ where $U_c$ is an upper triangular matrix such that $V_c = U_c'U_c$ so that $U_c$ is referred to as the Cholesky root of the matrix $V_c$. For example: tt = matrix(c(1, .3, .2, #test matrix .3, 1, .4, .2, .4, 1), nrow = 3) tt # full matrix  [,1] [,2] [,3] [1,] 1.0 0.3 0.2 [2,] 0.3 1.0 0.4 [3,] 0.2 0.4 1.0 chol(tt) #chol matrix; upper triangle  [,1] [,2] [,3] [1,] 1 0.3000000 0.2000000 [2,] 0 0.9539392 0.3564168 [3,] 0 0.0000000 0.9126703 t(chol(tt)) %*% chol(tt) #obtain original matrix  [,1] [,2] [,3] [1,] 1.0 0.3 0.2 [2,] 0.3 1.0 0.4 [3,] 0.2 0.4 1.0 Now the adjustment matrix is more complicated. However, this can all be done using syntax: tXs <- function(s) { sel <- which(dat$c1 == s)
Vmc <- Vm[sel, sel] #V matrix per cluster
Ijj <- diag(nrow(Xl[[s]])) #Identity matrix
Hjj <- Xl[[s]] %*% br %*% t(Xl[[s]]) %*% Vl[[s]]
IHjj <- Ijj - Hjj #identity less Hat matrix
#MatSqrtInverse(Ijj - Hjj) #early adjustment / valid
V3 <- chol(Vmc) #based on MBB; #see footnote 5 in HWZ; upper triangle
Bi <- V3 %*% IHjj %*% Vmc %*% t(V3)
t(V3) %*% MatSqrtInverse(Bi) %*% V3 #need the CR2 package installed
}

tX <- lapply(cnames, tXs) #these are the adjustment matrices
names(tX) <- cnames

cc2 <- function(x){
Rl[[x]] %*% tX[[x]] %*% Vl[[x]] %*% Xl[[x]] # r x Adj x Vinv x Xc
}

u <- t(sapply(cnames, cc2))

mt2 <- t(u) %*% u #the full meat matrix
clvc.cr2 <- br %*% mt2 %*% br #variance covariance matrix

clvc.cr2 #manual computation of robust vcov matrix for CR2
             (Intercept)           W1          X1
(Intercept)  0.033001573 -0.008084060 0.006009505
W1          -0.008084060  0.021781321 0.001545199
X1           0.006009505  0.001545199 0.015970182
vcovCR(m2, d1\$c1, type = "CR2") #the same
             (Intercept)           W1          X1
(Intercept)  0.033001573 -0.008084060 0.006009505
W1          -0.008084060  0.021781321 0.001545199
X1           0.006009505  0.001545199 0.015970182

The manual computation and the output provided by clubSandwich are the same.

### Comparing different standard errors

Also, we can ask “Are random slopes warranted?” (They should be since I simulated the dataset to have random slopes…)

m3 <- lmer(y ~ W1 + X1 + (X1|c1), #add random slope
data = d1)
anova(m2, m3) #RS warranted, large X2
refitting model(s) with ML (instead of REML)
Data: d1
Models:
m2: y ~ W1 + X1 + (1 | c1)
m3: y ~ W1 + X1 + (X1 | c1)
npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
m2    5 6970.9 6997.4 -3480.4   6960.9
m3    7 6923.6 6960.8 -3454.8   6909.6 51.283  2  7.312e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results indicate that RS should be included ($\chi^2$ is huge).

To compare the various model-based random intercept, CR0, CR2, and random slope standard errors:

data.frame(
mb_ri = sqrt(diag(vcov(m2))), #model-based, X1 too low, should have a random slope
cr0 = sqrt(diag(clvc.cr0)),
cr2 = sqrt(diag(clvc.cr2)),
mb_rs = sqrt(diag(vcov(m3)))
)
                 mb_ri       cr0       cr2     mb_rs
(Intercept) 0.17940193 0.1769892 0.1816634 0.1766116
W1          0.18440995 0.1388063 0.1475850 0.1670175
X1          0.06388752 0.1241862 0.1263732 0.1304942

We see that the model-based RI SEs are too low for the level-1 (X1) variable. All the other models correct for that (due to heterogeneity). In that way, robust standard errors can be used as well as a diagnostic if random slopes are needed too.

The CR2 standard errors can also be computed using:

CR2::robust_mixed(m2)

Standard error type = CR2
Degrees of freedom = Satterthwaite

Estimate  mb.se robust.se t.stat   df Pr(>t)
(Intercept)   10.707  0.179     0.182 58.938 26.3 <2e-16 ***
W1             0.339  0.184     0.148  2.300  8.9  0.047 *
X1             1.734  0.064     0.126 13.718 27.7 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### References

Bell, R., & McCaffrey, D. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28, 169-182.

Huang, F., & Li, X. (2021). Using cluster-robust standard errors when analyzing group-randomized trials with few clusters. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01627-0

Huang, F., Zhang, B., & Li, X. (in press). Using robust standard errors for the analysis of binary outcomes with a small number of clusters. Journal of Research on Educational Effectiveness.

Huang, F., Wiedermann, W., & Zhang, B. (2022). Accounting for heteroskedasticity resulting from between-group differences in multilevel models. Multivariate Behavioral Research. doi: 10.1080/00273171.2022.2077290.

Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13-22.

McCaffrey, D., Bell, R., & Botts, C. (2001, August). Generalizations of biased reduced linearization. Paper presented at the Annual Meeting of the American Statistical Association, Atlanta, GA.

Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672-683.