Logistic regression
In an earlier post, I showed how to fit a logistic regression manually using iteratively reweighted least squares (IRLS). This is an alternative method using Newton-Raphson and the Fisher scoring algorithm. For further details, you can look here as well.
library(MLMusingR)
data(suspend)
m1 <- glm(sus ~ male + gpa * frpl + fight + frmp.c * pminor.c,
          data = suspend, family = binomial)

### extracting raw components
dat <- model.frame(m1)        # data used to fit the model
fml <- formula(m1)            # model formula
X <- model.matrix(fml, dat)   # design matrix
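With these pieces, the scoring iterations can be computed by hand. Below is a minimal sketch (not the post's full code): starting from zero, each step updates the coefficients by the inverse of the expected information, X'WX, times the score, X'(y - mu), where mu holds the fitted probabilities and W = mu(1 - mu). With the logit (canonical) link, the observed and expected information coincide, so Newton-Raphson and Fisher scoring give identical steps.

### minimal Fisher scoring sketch using the components above
y <- m1$y                          # the 0/1 outcome used by glm()
b <- rep(0, ncol(X))               # starting values
for (i in 1:25) {
  mu <- as.vector(plogis(X %*% b)) # fitted probabilities
  W  <- mu * (1 - mu)              # variance function weights
  b  <- b + solve(t(X) %*% (X * W), t(X) %*% (y - mu))
}
cbind(manual = as.vector(b), glm = coef(m1))  # estimates should match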
Logistic regression is a modeling technique that has attracted a lot of attention, especially from folks interested in classification, machine learning, and prediction using binary outcomes. One of the neat things about using R is that users can revisit commonly used procedures and figure out how they work.
What follows are some notes on logistic regression (this is not about interpreting results). Even though I've written about how other alternatives might be simpler than logistic regression, and about the challenges of comparing coefficients across models, it is interesting to see how the procedure works.
ROUGH NOTES: [let me know if you spot any errors; there might be a couple!] Often, in randomized controlled trials where individuals are randomly assigned to treatment and control conditions, covariates are included to improve precision by reducing error and improving statistical power. However, when outcomes are binary (e.g., the patient recovers or not), several additional concerns arise that have gone unnoticed by many applied researchers.
Take a simulated example where the true data generating process is as follows (to keep things simple, the intercept is zero and both parameters are set to 1):
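A minimal way to simulate data along these lines, assuming a logit link, a randomized binary treatment, and a standard normal covariate (the covariate's distribution and the sample size are illustrative assumptions), is:

### simulated data: logit(p) = 0 + 1*treat + 1*x
set.seed(1234)
n <- 1000
treat <- rbinom(n, 1, 0.5)   # randomized treatment assignment
x <- rnorm(n)                # baseline covariate (illustrative)
p <- plogis(1 * treat + 1 * x)
y <- rbinom(n, 1, p)         # binary outcome
coef(glm(y ~ treat, family = binomial))      # unadjusted
coef(glm(y ~ treat + x, family = binomial))  # covariate-adjusted

Comparing the adjusted and unadjusted coefficients for treat illustrates a known subtlety: with a logit link the two estimate different quantities (conditional vs. marginal effects) even under randomization, since the odds ratio is non-collapsible.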
Introduction

Logistic regression is often used to analyze experiments with binary outcomes (e.g., pass vs. fail) and binary predictors (e.g., treatment vs. control). Although logistic regression is appropriate, other models can be run that may provide results that are easier to interpret.
In addition, some of these models may be quicker to run. Some may say that this point is moot given the availability of computing power today, but if you've ever tried to fit a hierarchical generalized linear model with a logit link and a binary outcome, you know that in R (using glmer or nlme) this can take quite a long time (and cross your fingers that you don't run into convergence issues).
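For instance, a linear probability model with random intercepts (one such alternative) can be fit with lmer and is typically much faster than the corresponding logit model fit with glmer; its treatment coefficient is also directly interpretable as a difference in proportions. Here is a sketch with simulated clustered data (the grouping structure and effect sizes are illustrative assumptions, not the post's example):

### illustrative clustered data
library(lme4)
set.seed(456)
n_g <- 50                          # number of groups (e.g., schools)
n_i <- 30                          # individuals per group
school <- rep(1:n_g, each = n_i)
u <- rnorm(n_g, 0, 0.5)[school]    # group random intercepts
treat <- rbinom(n_g * n_i, 1, 0.5)
y <- rbinom(n_g * n_i, 1, plogis(-0.5 + 0.8 * treat + u))

system.time(m_logit <- glmer(y ~ treat + (1 | school), family = binomial))
system.time(m_lpm <- lmer(y ~ treat + (1 | school)))  # usually much faster
fixef(m_lpm)["treat"]  # difference in proportions (percentage points)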