The presence of clustered data is common in the socio-behavioral sciences. One approach that specifically deals with clustered data but has seen little use in education is the generalized estimating equations (GEEs) approach. We provide a background on GEEs, discuss why it is appropriate for the analysis of clustered data, and provide worked examples using both continuous and binary outcomes. Comparisons are made between GEEs, MLMs, and ordinary least squares results to highlight similarities and differences between the approaches. Detailed walkthroughs are provided using both R and SPSS.
NOTES: the materials in the appendix can be found here. The preprint link above is actually for the appendix.
In the original paper draft, I had a section which showed how much more widely used mixed models (i.e., MLMs, HLMs) were compared to GEEs but was asked to remove that. I thought the usage was interesting so I am including it here:
Citations in JEBS:
Although GEEs are widely applied in biological, pharmacological, and closely related disciplines, their application in educational and social sciences remains relatively scarce.
There is a difference of 6 years but the Singer article has been cited over 10 times more! If using average citations per year, 7.5 times more.
References:
Ghisletta, P., & Spini, D. (2004). An introduction to generalized estimating equations and an application to assess selectivity effects in a longitudinal study on very old individuals. Journal of Educational and Behavioral Statistics, 29(4), 421-437.
Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23(4), 323-355.