Posts

More efficient CR2 cluster-robust standard errors

I have written before about the CR2 standard error variant and how it can be used to account for clustering when using basic OLS (or GLM) regression. Some articles on the topic:

🎉 Job postings 2025-2026

List of Research, Statistics, and Evaluation job postings (that I’ve seen) as of September 2025.

Postings for (2025-2026):

As of 2025.09

—END

🎉 Job postings 2024-2025

List of Research, Statistics, and Evaluation job postings (that I’ve seen) as of July 2025.

Postings for (2024-2025):

As of 2025.07

  • Post-doctoral Associate (5 openings!) Rutgers University, New Jersey.
  • Post-doctoral Associate (w/Guanglei Hong), University of Chicago, Chicago, IL

As of 2025.06

Plausible Values as Predictors

Although the mixPV function was introduced as a way to analyze large-scale assessments using multiple plausible values (PVs), the function only works if the plausible values are used as the outcome (i.e., they are the Y variable, on the left-hand side [LHS] of the equation). However, there are times when the PV is the predictor of interest. This case still has to be analyzed properly (i.e., do not simply average the plausible values).
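As a rough illustration of analyzing each plausible value separately rather than averaging them, here is a minimal sketch that fits one regression per PV and pools the results with Rubin's rules. This is not the mixPV approach. The data are simulated, the PVs are crude stand-ins (latent score plus noise) used only to show the pooling mechanics, and all object names (`pvs`, `pooled_est`, etc.) are hypothetical.

```r
set.seed(123)
n <- 500
m <- 5                                  # number of plausible values (assumed)
theta <- rnorm(n)                       # latent proficiency (simulated)
y <- 0.5 * theta + rnorm(n)             # outcome related to proficiency

# Stand-in plausible values: one column per PV (illustration only)
pvs <- sapply(1:m, function(i) theta + rnorm(n, sd = 0.5))

# Fit the same model once per plausible value
fits <- lapply(1:m, function(i) lm(y ~ pvs[, i]))

# Slope estimate and its sampling variance from each fit
est <- unname(sapply(fits, function(f) coef(f)[2]))
se2 <- unname(sapply(fits, function(f) vcov(f)[2, 2]))

# Rubin's rules: pooled estimate, within and between variance
pooled_est <- mean(est)
within     <- mean(se2)
between    <- var(est)
total_var  <- within + (1 + 1 / m) * between
pooled_se  <- sqrt(total_var)

c(estimate = pooled_est, se = pooled_se)
```

The pooled standard error includes the between-PV variance, which is exactly the part that gets lost if the plausible values are averaged into a single predictor first.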

Correlation and causation revisited

Statistics students are taught that correlation does not equal causation. Just because two variables (e.g., x and y) are related to each other does not necessarily mean that one causes the other (e.g., x causes y). The correlation coefficients (i.e., ρ) for ρ(x, y) and ρ(y, x) are the same and do not provide information on the directionality of the effect (e.g., x → y or x ← y). It could also be that the variables are related due to a third variable z which causes both (i.e., a confounder).
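A small simulation can make both points concrete. In this sketch (simulated data, with coefficients chosen arbitrarily for illustration), x and y are both caused by z but not by each other: the correlation is identical in either direction, and it largely disappears once z is partialled out.

```r
set.seed(42)
n <- 5000
z <- rnorm(n)               # confounder
x <- 0.7 * z + rnorm(n)     # x depends on z only
y <- 0.7 * z + rnorm(n)     # y depends on z only (not on x)

# The correlation is symmetric: it carries no directional information
r_xy <- cor(x, y)
r_yx <- cor(y, x)
all.equal(r_xy, r_yx)

# Partial correlation of x and y after removing z from both
r_partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

c(r_xy = r_xy, r_partial = r_partial)
```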

Working with missing data in large-scale assessments (without plausible values)

This is the syntax for accounting for missing data/imputing data with large-scale assessments (without plausible values). This is Appendix A and accompanies the article:

Huang, F., & Keller, B. (2025). Working with missing data in large-scale assessments. Large-scale Assessments in Education. doi: 10.1186/s40536-025-00248-9

Working with missing data in large-scale assessments (with plausible values)

This is the syntax for accounting for missing data/imputing data with large-scale assessments (with plausible values). This accompanies the article:

Huang, F., & Keller, B. (2025). Working with missing data in large-scale assessments. Large-scale Assessments in Education. doi: 10.1186/s40536-025-00248-9

Weights with Multilevel Models

This is an applied example regarding the use of weights in multilevel models when using large-scale assessments. It uses the Germany TIMSS dataset. This accompanies the article:

Atasever, U., Huang, F., & Rutkowski, L. (2025). Reassessing weights in large-scale assessments and multilevel models. Large-scale Assessments in Education. doi: 10.1186/s40536-025-00245-y

Selecting the proper weights in LSAs with multilevel models

A common question with the use of large-scale assessments (LSAs) is related to the use of weights. Another issue is how to specify these weights properly.

Software such as SAS and Mplus, when weights are specified at two levels, requires conditional weights at level 1 if the level-2 weight is specified (alternatively, you can use the level-2 weights alone; see Mang et al., 2021, and the bottom part of this post).
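To show what a conditional level-1 weight looks like, here is a minimal sketch. It assumes the total student weight is the product of the school weight and the within-school (conditional) student weight, so the conditional weight is recovered by division. The data frame and the variable names (`totwgt`, `schwgt`, `w1_cond`) are hypothetical; actual variable names differ by assessment.

```r
# Hypothetical data: five students in two schools
dat <- data.frame(
  school = c(1, 1, 1, 2, 2),
  totwgt = c(30, 30, 45, 20, 40),   # total (unconditional) student weight
  schwgt = c(10, 10, 10, 5, 5)      # level-2 (school) weight
)

# Conditional level-1 weight: student weight given that the school was selected
dat$w1_cond <- dat$totwgt / dat$schwgt

dat
```

The `w1_cond` column would then go in as the level-1 weight and `schwgt` as the level-2 weight in software that expects conditional weights.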

Fun with PCA using Images

Creating grayscale images using PCA

The following links were helpful:

  1. https://cran.r-project.org/web/packages/imager/vignettes/gettingstarted.html
  2. https://stats.stackexchange.com/questions/229092/how-to-reverse-pca-and-reconstruct-original-variables-from-several-principal-com

library(tidyverse) #for ggplot, %>%
library(imager) #to read in the jpg

image1 <- load.image("c:/data/snorlax_g2.jpg")

The image can be downloaded from: https://github.com/flh3/pubdata/blob/main/miscdata/snorlax_g2.jpg
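The basic idea of rebuilding a grayscale image from a few principal components can be sketched with base R alone. The example below uses a small simulated matrix in place of the actual image (so it runs without the jpg), treats rows as observations and columns as variables, and uses a hypothetical helper, `reconstruct()`, that multiplies the first k component scores by their loadings and adds the column means back.

```r
set.seed(1)
# Stand-in "image": a 60 x 40 matrix of gray values (a gradient plus noise)
img <- outer(seq(0, 1, length.out = 60), seq(0, 1, length.out = 40)) +
  matrix(rnorm(60 * 40, sd = 0.05), 60, 40)

# PCA on the pixel matrix (columns centered, not scaled)
pca <- prcomp(img, center = TRUE, scale. = FALSE)

# Rebuild the matrix from the first k principal components
reconstruct <- function(pca, k) {
  scores   <- pca$x[, 1:k, drop = FALSE]
  loadings <- pca$rotation[, 1:k, drop = FALSE]
  approx   <- scores %*% t(loadings)
  sweep(approx, 2, pca$center, "+")   # add the column means back
}

mse <- function(a, b) mean((a - b)^2)

img_k1   <- reconstruct(pca, 1)
img_k5   <- reconstruct(pca, 5)
img_full <- reconstruct(pca, ncol(img))   # all components: recovers the original

c(k1 = mse(img, img_k1), k5 = mse(img, img_k5), full = mse(img, img_full))
```

Using more components lowers the reconstruction error, and using all of them returns the original matrix. A reconstructed matrix can be viewed with something like `image(img_k5, col = gray.colors(256))`.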