Posts

Extracting the V matrix for MLMs

Notes to self (and anyone else who might find this useful). With the general linear mixed models (to simplify, I am just omitting super/subscripts):

Y=Xβ+Zu+eY = X\beta + Zu + e

where we assume uMVN(0,G)u \sim MVN(0, G) and eMVN(0,R)e \sim MVN(0, R). VV is:

🎉 Comparing coefficients across logistic regression models

ROUGH NOTES: [let me know if you spot any errors– there might be a couple!] Often, in randomized control trial where individuals are randomly assigned to treatment and control conditions, covariates are included to improve precision by reducing error and improving statistical power. However, when binary outcomes are used (e.g., patient recovers or not), there are several additional concerns that have gone unnoticed by many applied researchers.

Prior problem behavior and suspensions: A replication

This post accompanies the article (this was originally submitted as an Issue Brief which is why the Discussion section and Introduction are so short):

Huang, F. (2020). Prior problem behaviors do not account for the racial suspension gap. Educational Researcher. Advance online publication.

🎉 Why does centering reduce multicollinearity?

Centering often reduces the correlation between the individual variables (x1, x2) and the product term (x1 ×\times x2). In the example below, r(x1, x1x2) = .80. With the centered variables, r(x1c, x1x2c) = -.15.

🎉 Principal Components Analysis using R

[Rough notes: Let me know if there are corrections]

Principal components analysis (PCA) is a convenient way to reduce high-dimensional data into a smaller number number of ‘components.’ PCA has been referred to as a data reduction/compression technique (i.e., dimensionality reduction). PCA is often used as a means to an end and is not the end in itself. For example, instead of performing a regression with six (and highly correlated) variables, we may be able to compress the data into one or two meaningful components instead and use these in our models instead of the original six variables. Using less variables reduces possible problems associated with multicollinearity. This decreases the problems of redundancy.

🎉 Applied example for alternatives to logistic regression

Introduction

Logistic regression is often used to analyze experiments with binary outcomes (e.g., pass vs fail) and binary predictors (e.g., treatment vs control). Although appropriate, there are other possible models that can be run that may provide easier to interpret results.