Getting GEE estimates using iteratively reweighted least squares (IRLS)

I show this in a recent JEBS article on using Generalized Estimating Equations (GEEs). Shown below are annotated syntax and examples.

Huang, F. (2021). Analyzing cross-sectionally clustered data using generalized estimating equations. Journal of Educational and Behavioral Statistics. https://doi.org/10.3102/10769986211017480

In the original draft of the paper, I had a section showing how much more widely used mixed models (i.e., MLMs, HLMs) are compared to GEEs, but I was asked to remove it (to save space). I thought the usage gap was interesting, so I am including it here:

In psychology, studies using mixed models outnumber those using GEEs by a ratio of 15:1 (Bauer & Sterba, 2011).

Citations in JEBS:

  • In the Journal of Educational and Behavioral Statistics (JEBS), an article by Singer (1998) on how to use multilevel models has over 3,300 citations (as of 2020.06.11, per Google Scholar).

  • In the same journal, Ghisletta & Spini (2004) provided an introduction to GEEs. This article has 329 citations. They wrote (p. 431):

Although GEEs are widely applied in biological, pharmacological, and closely related disciplines, their application in educational and social sciences remains relatively scarce.

The two articles were published only six years apart, yet the Singer article has been cited over 10 times as much! Using average citations per year, it is 7.5 times as much.

Solving GEEs using IRLS

This example is only for an identity link using either an independence or exchangeable working correlation matrix.
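For reference, here is what a 4 × 4 exchangeable working correlation matrix looks like in R (a quick sketch using an arbitrary \(\alpha\) of .3):

alpha <- .3
R <- matrix(alpha, nrow = 4, ncol = 4) #alpha in every cell
diag(R) <- 1 #1s on the diagonal
R #same correlation for every pair of observations within a cluster
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.3  0.3  0.3
[2,]  0.3  1.0  0.3  0.3
[3,]  0.3  0.3  1.0  0.3
[4,]  0.3  0.3  0.3  1.0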

  1. Initial estimates for \(\beta\) are obtained using a standard GLM (i.e., with an independence correlation structure), and residuals are computed. If an independence working correlation structure is specified, these are also the final estimates; go to Step 7.

  2. Using the residuals, \(\alpha\) is estimated. An example of how this is computed with an exchangeable correlation matrix is shown in Hardin and Hilbe (2013, pp. 65–67), and we show this in our syntax as well.

  3. The working exchangeable correlation matrix is updated to include \(\hat{\alpha}\) in the off-diagonals. The correlation matrix is converted to a covariance matrix, \(V_j\), based on the variance of the outcome. An overall block-diagonal matrix \(V\) is formed which has the same \(V_j\) for every cluster.

  4. When using an identity link, new coefficients are reestimated using \(\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y\) (see the short sketch after this list).

  5. New residuals are obtained using the updated \(\beta\).

  6. Steps 2–5 are repeated until a convergence criterion is met (e.g., for continuous outcomes, minimal change in the sum of squared residuals).

  7. Compute and apply cluster-robust/empirical standard errors based on formulas also developed by Liang and Zeger (1986). With an independence working correlation, the CRSEs can be written as \((X'X)^{-1}\hat{\Omega}(X'X)^{-1}\), where \(\hat{\Omega}=\sum^{J}_{j=1}X_j'\hat{u}_j\hat{u}_j'X_j\) and \(\hat{u}_j\) are the residuals from the observations in cluster \(j\). (With a non-independence working correlation, \(X'X\) is replaced by \(X'V^{-1}X\) and \(V_j^{-1}\) enters \(\hat{\Omega}\), as in the syntax further below.)
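As a minimal sketch of the Step 4 update (using made-up data; when \(V\) is the identity matrix, i.e., an independence working structure, the GLS solution reduces to plain OLS):

set.seed(1234) #made-up data for illustration only
X <- cbind(1, rnorm(8)) #design matrix with an intercept
y <- 2 + 3 * X[, 2] + rnorm(8) #outcome
Vinv <- solve(diag(8)) #V^-1; identity = independence working structure
betas <- solve(t(X) %*% Vinv %*% X) %*% t(X) %*% Vinv %*% y #the Step 4 update
cbind(gls = as.numeric(betas), ols = as.numeric(coef(lm(y ~ X[, 2])))) #columns match when V = I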

ICC estimation based on residuals

This example shows how to compute the exchangeable correlation coefficient (ICC) based on residuals. This is a different approach from using the between- and within-group variance components in a mixed model. It comes from the second edition of Hardin and Hilbe's (2013) book on generalized estimating equations. Crespi et al. (2011) and Wu et al. (2012) show that there are several approaches to computing the ICC/exchangeable correlation coefficient.

First, create a small toy dataset with two clusters (NG = 2) of four observations each:

NG <- 2
id <- c(1,1,1,1,2,2,2,2)
t <- c(1,2,3,4,1,2,3,4) 
y <- c(4,5,6,7,5,6,7,8) 
x <- c(0,1,0,1,0,1,0,1)
dat <- data.frame(id, t, y, x)
  • The scale parameter (\(\phi\)) must be estimated first.
  • NG = number of groups (clusters).
  • GS = group size; the number of observations in the cluster.

\[ \hat{\phi} = \frac{1}{n}\sum^{NG}_{i=1}\sum^{GS}_{t=1}\text{res}^2_{it} \] where \(n\) is the total number of observations. This is just the mean of the squared (Pearson) residuals. A basic OLS regression is used:

m1 <- lm(y ~ x, data = dat)
res <- resid(m1) #vector
dat$res <- res #save it into the original dataset
(scalep = mean(res^2)) #scale parameter
[1] 1.25
dat
  id t y x  res
1  1 1 4 0 -1.5
2  1 2 5 1 -1.5
3  1 3 6 0  0.5
4  1 4 7 1  0.5
5  2 1 5 0 -0.5
6  2 2 6 1 -0.5
7  2 3 7 0  1.5
8  2 4 8 1  1.5

The exchangeable correlation coefficient (also called compound symmetry, common correlation, or equal correlation) can be estimated using formula 3.28 in Hardin and Hilbe (2013). This can be thought of as the intraclass correlation coefficient and is at times written as \(\rho\) or \(\alpha\).

\[ \begin{aligned} \hat{\rho} &= \hat{\phi}^{-1}\frac{1}{12}\sum^{NG}_{i=1}\sum^{GS}_{t=1}\sum_{t'>t}\text{res}_{it}\text{res}_{it'} \\ &= \frac{1}{1.25}\frac{1}{12}\{[-1.5(-1.5+.5+.5) - 1.5(.5+.5) + .5(.5)] + [-.5(-.5+1.5+1.5)-.5(1.5+1.5)+1.5(1.5)]\} \\ &= .8\frac{1}{12}([-.5] + [-.5]) \\ &= -.06667 \end{aligned} \] So we need the residuals per group:

rpg <- split(res, dat$id)
nm <- names(table(dat$id))
r1 <- rpg[[1]] #residuals from first group
egeg <- r1 %*% t(r1) #quicker way to do this, orig a loop
(forg1 = sum(egeg[lower.tri(egeg)]))
[1] -0.5
r2 <- rpg[[2]] #residuals from second group
egeg2 <- r2 %*% t(r2) #quicker way to do this, orig a loop
(forg2 = sum(egeg2[lower.tri(egeg2)]))
[1] -0.5
(1/scalep) * (1/12) * (forg1 + forg2) #putting it all together
[1] -0.06666667

The 12 is based on the number of residual products we multiplied and added together. In a 4 x 4 matrix, there are 6 unique off-diagonal pairs (the lower triangle). Since we had two groups, this is 12 in total.
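As a quick check:

choose(4, 2) * 2 #6 unique pairs per 4 x 4 matrix, times two groups
[1] 12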

Putting this together in a function:

geticc <- function(data, cluster, r){
  
  scalep <- mean(r^2) #scale (dispersion) parameter
  rpg <- split(r, data[,cluster]) #residuals per group
  nm <- names(table(data[,cluster])) #names of clusters
  
  ### p. 63 Hardin and Hilbe
  ### get ICC based on residuals per cluster
  ### right now, need the data to be sorted by cluster
  
  multresid <- function(x){
    r2 <- rpg[[x]] #extract resid per group (rpg)
    egeg <- r2 %*% t(r2) #e %*% t(e)
    sum(egeg[lower.tri(egeg)]) #sum of products, lower triangle only
  }
  
  tst <- sum(sapply(nm, multresid)) #sum of residual cross products over all groups
  ns <- sapply(rpg, length) #number of observations per group
  den <- sum((ns * (ns - 1)) / 2) #how many unique products were added
  icc.model <- (tst / den) * (1 / scalep) #the icc
  return(icc.model)
  
}

Using the function:

geticc(dat, 'id', res)
[1] -0.06666667

Comparing the output with that of the geeglm function in the geepack package (\(\alpha\) and \(\rho\) are at times used interchangeably):

library(geepack)
test <- geeglm(y ~ x, id = id, corstr = 'exchangeable', family = gaussian, data = dat)
summary(test)

Call:
geeglm(formula = y ~ x, family = gaussian, data = dat, id = id, 
    corstr = "exchangeable")

 Coefficients:
            Estimate Std.err Wald Pr(>|W|)    
(Intercept)   5.5000  0.3536  242   <2e-16 ***
x             1.0000  0.0000  Inf   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate   Std.err
(Intercept)     1.25 2.453e-17
  Link = identity 

Estimated Correlation Parameters:
      Estimate   Std.err
alpha -0.06667 8.378e-18
Number of clusters:   2  Maximum cluster size: 4 
test$geese$alpha #this is the common correlation
   alpha 
-0.06667 

Example with a homemade GEE function

This syntax is not written for speed but for transparency (i.e., hopefully, the steps outlined above can be seen).

# HOMEGROWN GEE FUNCTION using iterative reweighted least squares (IRLS)
jebsgee <- function(fml, data, cluster, corstr = 'independence'){
  ## extract data
  tmp <- cbind(data, cluster = data[,cluster]) #dataframe with cluster
  tmp <- tmp[order(tmp$cluster), ] #sorting by cluster
  fml <- formula(fml)
  df <- model.frame(fml, tmp)
  X <- model.matrix(fml, df)
  y <- model.response(df)

  if (sum(is.na(df)) > 0) stop("You have missing data.")
  
  gpsz <- table(data[, cluster]) #how many per group; group size
  NG <- length(gpsz) #how many groups
  maxsize <- max(gpsz) #what's the biggest group size
  
  CS <- c('independence', 'exchangeable')
  cs <- pmatch(corstr, CS, -1) #allow to match corstr by keywords
  if (cs == -1) stop("Currently can only use an independence or exchangeable correlation structure")
  
  corstr <- CS[cs] #put in the whole word
  
  # STEP #1
  firstrun <- glm(fml, data = df) #just a regular glm
  
  # STEP #2
  r <- resid(firstrun, 'pearson') #get residuals
  betas <- coef(firstrun) #get initial coefficients
  
  if (corstr == 'exchangeable') {

  ### setup iterations need for exchangeable structure
    dev <- 0
    delta.dev <- 1
    tol <- 1e-5 #can make this bigger or smaller
    maxiter <- 50 #number of iterations, can make this bigger
    i = 1 #starting at iteration 1
    
    cat("Iteration: ") 
    while(abs(delta.dev) > tol & i < maxiter){ #stop when the change in deviance is small
      cat(i, "::")
      
      # after iteration, this is STEP #5
      r <- y - X %*% betas #residuals / use Pearson if non-identity link
      
      icc <- geticc(data = tmp, 'cluster', r) #compute new iccs #going back to step #2
      results <- exchR(icc, maxsize = maxsize, y = y, NG, gpsz) #compute new weight matrix STEP3
      vm2 <- results$vm2
      # STEP #4
      betas <- solve(t(X) %*% vm2 %*% X) %*% t(X) %*% vm2 %*% y #update betas
      dev0 <- dev #get prior dev
      dev <- sum((y - X %*% betas)^2) #new deviance
      delta.dev <- dev - dev0 #change in deviance
      i = i + 1 #add one to the iteration
    }
    
    cat("\nFinal alpha:", icc, "\n")
  }
  
  ### STEP #7: computing Liang and Zeger SEs
  re <- as.numeric(y - X %*% betas) #get residuals
  k <- ncol(X) #how many predictors (including intercept)
  cdata <- data.frame(cluster = tmp$cluster, r = re) #data with cluster and residuals
  gs <- names(table(cdata$cluster)) #names of the clusters

  u <- matrix(NA, nrow = NG, ncol = k) #empty matrix
  gpsv <- tmp$cluster
  
  if (corstr == 'independence') wcv <- vm2 <- diag(nrow(X)) #if independence
  if (corstr == 'exchangeable') wcv <- results$wcv
  
  for(i in 1:NG){
      nj <- nrow(df[gpsv == gs[i], ]) #number of observations in cluster i
      u[i,] <- t(cdata$r[cdata$cluster == gs[i]]) %*% 
        solve(wcv[1:nj, 1:nj]) %*% X[gpsv == gs[i], 1:k]
  }
  
  mt <- crossprod(u) #t(u) %*% u :: meat
  br <- solve(t(X) %*% vm2 %*% X) #bread matrix
  clvc <- br %*% mt %*% br #LZ robust vcov matrix
  
  ### putting it all together
  
  se <- as.numeric(sqrt(diag(clvc))) #standard error
  b <- as.numeric(betas) #betas
  wald <- (b / se)^2 #wald
  pv <- pchisq(wald, 1, lower.tail = F) #p value
  stars <- cut(pv, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1), 
               labels = c("***", "**", "*  ", ".  ", " "), 
               include.lowest = TRUE)
  
  res <- data.frame(estimates = b, se, wald, pv = round(pv, 4), s = stars)
  row.names(res) <- colnames(X) #getting the names of the coefficients
  
  cat("Working correlation structure:", corstr, "\n")
  print(res) #output results
}

## only for creating an exchangeable R matrix
exchR <- function(icc, maxsize, y, NG, gpsz, ...){
  wr1 <- matrix(icc, nrow = maxsize, ncol = maxsize)
  diag(wr1) <- 1
  ## converting to a covariance matrix
  wcv <- wr1 * var(y) #save it, used later when getting the robust SEs
  wcl <- list() #empty list
  for (i in 1:NG){ #making several covariance matrices
    GS <- gpsz[i]  #depending on how many units per cluster
    tmp <- wcv[1:GS, 1:GS]
    wcl[[i]] <- tmp
  }
  vm2 <- solve(Matrix::bdiag(wcl)) #create block diagonal (this is V^-1)
  return(list(vm2 = vm2, wcv = wcv)) #return the inverse of the variance matrix
}
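For instance, with the toy dat created earlier (a small sketch; the -.0667 is the \(\alpha\) computed above):

results <- exchR(icc = -.0667, maxsize = 4, y = dat$y, NG = 2, gpsz = table(dat$id))
dim(results$vm2) #8 x 8 block-diagonal V^-1 (two 4 x 4 blocks)
[1] 8 8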

Now using the commonly used High School and Beyond (HSB) dataset.

library(mlmRev)
data(Hsb82)
summary(geeglm(mAch ~ sector + meanses + cses + sector * cses, 
        id = school, 
        corstr = 'ex', 
        data = Hsb82)) #using an existing package

Call:
geeglm(formula = mAch ~ sector + meanses + cses + sector * cses, 
    data = Hsb82, id = school, corstr = "ex")

 Coefficients:
                    Estimate Std.err   Wald Pr(>|W|)    
(Intercept)           12.128   0.174 4878.1  < 2e-16 ***
sectorCatholic         1.225   0.308   15.8  6.9e-05 ***
meanses                5.333   0.334  254.3  < 2e-16 ***
cses                   2.782   0.159  307.1  < 2e-16 ***
sectorCatholic:cses   -1.349   0.233   33.5  6.9e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     39.1   0.671
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.0558  0.0099
Number of clusters:   160  Maximum cluster size: 67 

Using my function:

jebsgee(mAch ~ sector + meanses + cses + sector * cses, 
        data = Hsb82, 
        cluster = 'school',
        corstr = 'ex') #using my function
Iteration: 1 ::2 ::3 ::4 ::
Final alpha: 0.0558 
Working correlation structure: exchangeable 
                    estimates    se   wald    pv   s
(Intercept)             12.13 0.174 4878.1 0e+00 ***
sectorCatholic           1.23 0.308   15.8 1e-04 ***
meanses                  5.33 0.334  254.3 0e+00 ***
cses                     2.78 0.159  307.1 0e+00 ***
sectorCatholic:cses     -1.35 0.233   33.6 0e+00 ***

Results are the same.

Here are the results if an independence working correlation matrix is used:

summary(geeglm(mAch ~ sector + meanses + cses + sector * cses, 
        id = school, 
        corstr = 'ind', 
        data = Hsb82)) #using an existing package

Call:
geeglm(formula = mAch ~ sector + meanses + cses + sector * cses, 
    data = Hsb82, id = school, corstr = "ind")

 Coefficients:
                    Estimate Std.err   Wald Pr(>|W|)    
(Intercept)           12.116   0.170 5087.1  < 2e-16 ***
sectorCatholic         1.280   0.299   18.3  1.9e-05 ***
meanses                5.164   0.334  239.0  < 2e-16 ***
cses                   2.782   0.159  307.1  < 2e-16 ***
sectorCatholic:cses   -1.349   0.233   33.5  6.9e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = independence 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     39.1   0.671
Number of clusters:   160  Maximum cluster size: 67 
jebsgee(mAch ~ sector + meanses + cses + sector * cses, 
        data = Hsb82, 
        cluster = 'school',
        corstr = 'ind') #using my function
Working correlation structure: independence 
                    estimates    se   wald pv   s
(Intercept)             12.12 0.170 5087.1  0 ***
sectorCatholic           1.28 0.299   18.3  0 ***
meanses                  5.16 0.334  239.0  0 ***
cses                     2.78 0.159  307.1  0 ***
sectorCatholic:cses     -1.35 0.233   33.6  0 ***
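With an independence working correlation and an identity link, the point estimates are just OLS and the standard errors are the plain Liang and Zeger (1986) CRSEs. As a sketch of another way to check this (assuming the sandwich package is installed), the SEs above should match those from sandwich::vcovCL with no small-sample adjustment:

library(sandwich)
m <- lm(mAch ~ sector + meanses + cses + sector * cses, data = Hsb82)
sqrt(diag(vcovCL(m, cluster = Hsb82$school, type = "HC0", cadjust = FALSE))) #plain LZ CRSEs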

Testing using the small mtcars dataset in R:

mtcars2 <- mtcars[order(mtcars$cyl), ] #data must be sorted by cluster for geeglm (jebsgee sorts internally)
jebsgee(mpg ~ wt + am + qsec + hp + vs, 
        data = mtcars, 
        cluster = 'cyl', 
        corstr = 'ex')
Iteration: 1 ::2 ::3 ::4 ::5 ::6 ::7 ::8 ::9 ::10 ::11 ::
Final alpha: -0.0284 
Working correlation structure: exchangeable 
            estimates      se   wald     pv   s
(Intercept)    17.122 4.91011 12.160 0.0005 ***
wt             -3.260 0.70955 21.105 0.0000 ***
am              2.929 0.40431 52.496 0.0000 ***
qsec            0.817 0.20428 15.999 0.0001 ***
hp             -0.016 0.00962  2.753 0.0971 .  
vs              0.174 0.32731  0.282 0.5956    
summary(geeglm(mpg ~ wt + am + qsec + hp + vs, 
        id = cyl, 
        corstr = 'ex', 
        data = mtcars2))

Call:
geeglm(formula = mpg ~ wt + am + qsec + hp + vs, data = mtcars2, 
    id = cyl, corstr = "ex")

 Coefficients:
            Estimate  Std.err  Wald Pr(>|W|)    
(Intercept) 17.12208  4.91011 12.16  0.00049 ***
wt          -3.25966  0.70954 21.10  4.3e-06 ***
am           2.92939  0.40431 52.50  4.3e-13 ***
qsec         0.81711  0.20428 16.00  6.3e-05 ***
hp          -0.01596  0.00962  2.75  0.09710 .  
vs           0.17373  0.32731  0.28  0.59558    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)        5    1.29
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha  -0.0284  0.0775
Number of clusters:   3  Maximum cluster size: 14 

References

Bauer, D. J., & Sterba, S. K. (2011). Fitting multilevel models with ordinal outcomes: Performance of alternative specifications and methods of estimation. Psychological Methods, 16(4), 373–390. https://doi.org/10.1037/a0025813

Crespi, C. M., Wong, W. K., & Wu, S. (2011). A new dependence parameter approach to improve the design of cluster randomized trials with binary outcomes. Clinical Trials: Journal of the Society for Clinical Trials, 8(6), 687–698. https://doi.org/10.1177/1740774511423851

Ghisletta, P., & Spini, D. (2004). An introduction to generalized estimating equations and an application to assess selectivity effects in a longitudinal study on very old individuals. Journal of Educational and Behavioral Statistics, 29(4), 421–437.

Hardin, J., & Hilbe, J. (2013). Generalized estimating equations (2nd ed.). CRC Press.

Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23(4), 323–355.

Wu, S., Crespi, C. M., & Wong, W. K. (2012). Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials. Contemporary Clinical Trials, 33(5), 869–880.

– END
