Analyzing cross-sectionally clustered data using generalized estimating equations


Clustered data are common in the social and behavioral sciences. One approach designed specifically for clustered data, yet seldom used in education, is generalized estimating equations (GEEs). We provide background on GEEs, discuss why they are appropriate for analyzing clustered data, and present worked examples with both continuous and binary outcomes. We compare results from GEEs, multilevel models (MLMs), and ordinary least squares (OLS) regression to highlight similarities and differences among the approaches. Detailed walkthroughs are provided in both R and SPSS.

Journal of Educational and Behavioral Statistics

NOTES: The appendix materials can be found here. The preprint link above is actually for the appendix.
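To make the core idea concrete, here is a minimal Python sketch (not the paper's R or SPSS code) of the simplest case: a linear GEE with an independence working correlation. Its point estimates equal the OLS estimates, but its standard errors come from the cluster-level "sandwich" estimator, which remains valid when observations within a cluster are correlated. The simulated data (hypothetical: 40 clusters of 25, a cluster-level predictor, and a shared cluster effect) and the function names are assumptions for illustration only. Real analyses would usually use an exchangeable or other working correlation and a dedicated package such as `geepack` in R or `statsmodels` in Python.

```python
import math
import random
from collections import defaultdict


def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def fit_gee_independence(x, y, cluster):
    """Linear GEE (identity link, independence working correlation) for y ~ 1 + x.

    Returns (beta, naive_se, robust_se) for the slope, where naive_se is the usual
    OLS standard error and robust_se is the cluster-level sandwich standard error.
    """
    n = len(y)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    bread_inv = inv2([[n, sx], [sx, sxx]])  # (X'X)^{-1} with X = [1, x]
    xty = [sum(y), sum(a * b for a, b in zip(x, y))]
    beta = [bread_inv[i][0] * xty[0] + bread_inv[i][1] * xty[1] for i in range(2)]
    resid = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]

    # Naive OLS variance: sigma^2 (X'X)^{-1}, which assumes independent observations.
    sigma2 = sum(r * r for r in resid) / (n - 2)
    naive_se = math.sqrt(sigma2 * bread_inv[1][1])

    # Meat: sum over clusters of u_c u_c', where u_c = X_c' e_c (cluster score).
    score = defaultdict(lambda: [0.0, 0.0])
    for xi, ri, c in zip(x, resid, cluster):
        score[c][0] += ri
        score[c][1] += xi * ri
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for u in score.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += u[i] * u[j]

    robust = matmul2(matmul2(bread_inv, meat), bread_inv)  # sandwich A^{-1} B A^{-1}
    return beta, naive_se, math.sqrt(robust[1][1])


def simulate(n_clusters=40, size=25, slope=0.5, tau=1.0, sigma=1.0, seed=1):
    """Hypothetical clustered data: a cluster-level predictor plus a shared cluster effect."""
    rng = random.Random(seed)
    x, y, g = [], [], []
    for c in range(n_clusters):
        xc = rng.gauss(0, 1)    # predictor measured at the cluster level (e.g., school)
        uc = rng.gauss(0, tau)  # shared cluster effect -> within-cluster correlation
        for _ in range(size):
            x.append(xc)
            y.append(slope * xc + uc + rng.gauss(0, sigma))
            g.append(c)
    return x, y, g


if __name__ == "__main__":
    x, y, g = simulate()
    beta, naive_se, robust_se = fit_gee_independence(x, y, g)
    print(f"slope = {beta[1]:.3f}, OLS SE = {naive_se:.3f}, GEE robust SE = {robust_se:.3f}")
```

With a positive intraclass correlation and a cluster-level predictor, the robust standard error should come out noticeably larger than the naive OLS one; that understatement of uncertainty by OLS is what motivates GEEs (and MLMs) for clustered data.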

In the original draft of the paper, I had a section showing how much more widely used mixed models (i.e., MLMs, HLMs) are than GEEs, but I was asked to remove it. I thought the usage comparison was interesting, so I am including it here:

  • In psychology, studies using mixed models outnumber studies using GEEs by a ratio of about 15:1 (Bauer & Sterba, 2011)

Citations in JEBS:

  • In the Journal of Educational and Behavioral Statistics (JEBS), Singer's (1998) article on fitting multilevel models with SAS PROC MIXED has over 3,300 citations (Google Scholar, as of 2020.06.11)
  • In the same journal, Ghisletta & Spini (2004) published an introduction to GEEs; it has 329 citations. They wrote (p. 431):

Although GEEs are widely applied in biological, pharmacological, and closely related disciplines, their application in educational and social sciences remains relatively scarce.

The Singer article is six years older, but it has been cited more than 10 times as often! Even on an average-citations-per-year basis, it has been cited about 7.5 times as often.
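The two ratios can be reproduced with a short Python sketch. It assumes the stated counts are exact (3,300 is really a lower bound for Singer) and measures each article's age as 2020 minus its publication year; both are simplifying assumptions, and `citation_ratios` is a hypothetical helper.

```python
def citation_ratios(cites_a, year_a, cites_b, year_b, as_of):
    """Return (raw ratio, per-year ratio) of article A's citations to article B's."""
    raw = cites_a / cites_b
    per_year_a = cites_a / (as_of - year_a)  # average citations per year since publication
    per_year_b = cites_b / (as_of - year_b)
    return raw, per_year_a / per_year_b


raw, per_year = citation_ratios(3300, 1998, 329, 2004, 2020)
print(f"raw ratio: {raw:.1f}, per-year ratio: {per_year:.1f}")
# prints "raw ratio: 10.0, per-year ratio: 7.3"
```

Because the Singer count is a lower bound, 7.3 is a lower bound on the per-year ratio as well; a count slightly above 3,300 moves it toward the roughly 7.5 quoted above.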


Ghisletta, P., & Spini, D. (2004). An introduction to generalized estimating equations and an application to assess selectivity effects in a longitudinal study on very old individuals. Journal of Educational and Behavioral Statistics, 29(4), 421-437.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23(4), 323-355.