Introduction
Logistic regression is often used to analyze experiments with binary outcomes (e.g., pass vs fail) and binary predictors (e.g., treatment vs control). Although appropriate, there are other possible models that can be run that may provide easier to interpret results.
In addition, some of these models may be quicker to run. Some may say that this point is moot given the availability of computing power today but if you’ve ever tried to run a hierarchical generalized linear model with a logit link function and a binary outcome, you know that when using R (using glmer
or nlme
) this may take quite a long time (and cross your fingers that you don’t have convergence issues).
The following code replicates the example (see the manuscript for details) in the article:
Huang, F. (2019). Alternatives to logistic regression models with binary outcomes. Journal of Experimental Education. doi: 10.1080/00220973.2019.1699769
Data are based on the article of Huang and Cornell (2015). Using an online survey, investigators tested for the presence of the question-order effect. Based on a random number generated when the students took the survey, they were placed in either the treatment (n = 1037) or control (n = 963) condition. Students in the treatment condition were asked four specific types bullying questions (i.e., verbal, physical, social, cyber) and then were asked a general bullying question (“I have been bullied in the past year”). Students in the control condition were asked the general bullying question first and then the specific bullying questions. We hypothesized that students who were asked the specific bullying questions first would report overall higher bullying vs the control group.
1. Examining cross tabs
Load in the required packages:
library(dplyr) #just for the pipe, %>%
library(logbin) #to run a log binomial model with a function
library(summarytools) #for nicer crosstabs
library(jtools) #for easier exp, confints, and adjusted standard errors
Load and examine the data frame:
#specify the location on website
dat <- url("http://faculty.missouri.edu/huangf/data/jxe2019/jxe.rdata")
load(dat) #load in data from the website
summary(jxe) #name of the data.frame
## abully tord female gl race
## Min. :0.000 Min. :0.0000 Min. :0.0000 9th :446 w:1278
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 10th:530 b: 287
## Median :0.000 Median :1.0000 Median :1.0000 11th:517 h: 170
## Mean :0.126 Mean :0.5185 Mean :0.5005 12th:507 a: 79
## 3rd Qu.:0.000 3rd Qu.:1.0000 3rd Qu.:1.0000 o: 186
## Max. :1.000 Max. :1.0000 Max. :1.0000
head(jxe)
## abully tord female gl race
## 6910 0 0 1 10th w
## 8394 0 0 0 9th w
## 7293 0 1 1 9th w
## 8491 0 1 0 10th w
## 4374 1 1 0 10th w
## 1594 0 1 1 10th b
dim(jxe)
## [1] 2000 5
Review some crosstabs. All computations are based on the table below. tord
is the treatment variable. abully
indicates if the respondent had been bullied.
tt <- ctable(jxe$abully, jxe$tord, prop = 'c')
tt
## Cross-Tabulation, Column Proportions
## abully * tord
## Data Frame: jxe
##
## -------- ------ -------------- --------------- ---------------
## tord 0 1 Total
## abully
## 0 863 ( 89.6%) 885 ( 85.3%) 1748 ( 87.4%)
## 1 100 ( 10.4%) 152 ( 14.7%) 252 ( 12.6%)
## Total 963 (100.0%) 1037 (100.0%) 2000 (100.0%)
## -------- ------ -------------- --------------- ---------------
tt$cross_table #crosstabs
## tord
## abully 0 1 Total
## 0 863 885 1748
## 1 100 152 252
## Total 963 1037 2000
tt$proportions #proportions
## 0 1 Total
## 0 0.8961578 0.8534233 0.874
## 1 0.1038422 0.1465767 0.126
## Total 1.0000000 1.0000000 1.000
(14.66 / 85.34) / (10.38 / 89.62) #OR = 1.48
## [1] 1.483163
14.66 - 10.38 #risk difference: 4.28
## [1] 4.28
14.66 / 10.38 #risk ratio: 1.41
## [1] 1.412331
2. Models without covariates
For comparability, see how the results above map on to the first set of regressions without covariates:
tab6.log1 <- glm(abully ~ tord, data = jxe, family = binomial)
summ(tab6.log1, exp = T)$coef %>% round(3)
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.116 0.094 0.143 -20.404 0.000
## tord 1.482 1.132 1.940 2.865 0.004
tab6.lpm1 <- glm(abully ~ tord, data = jxe) #risk difference
summ(tab6.lpm1, confint = T, robust = 'HC3')$coef %>% round(3)
## Est. 2.5% 97.5% t val. p
## (Intercept) 0.104 0.085 0.123 10.553 0.000
## tord 0.043 0.014 0.072 2.896 0.004
tab6.poi1 <- glm(abully ~ tord, data = jxe, family = poisson) #risk ratio
summ(tab6.poi1, robust = 'HC3')$coef %>% round(3)
## Est. S.E. z val. p
## (Intercept) -2.265 0.095 -23.900 0.000
## tord 0.345 0.121 2.852 0.004
If you want to run a log-binomial model
logb <- update(tab6.lpm1, family = binomial(link = 'log'))
#logb <- logbin(abully ~ tord, data = jxe)
summ(logb, exp = T, digits = 3, robust = 'HC3')$coef %>% round(3)
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.104 0.086 0.125 -23.900 0.000
## tord 1.412 1.114 1.789 2.852 0.004
3. Models with covariates
Compare these to the results in Table 6 in the article with the covariates:
tab6.log2 <- glm(abully ~ tord + female + gl + race, data = jxe, family = binomial)
summ(tab6.log2, exp = T)$coef %>% round(3)
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.149 0.105 0.211 -10.706 0.000
## tord 1.487 1.134 1.951 2.866 0.004
## female 1.184 0.906 1.547 1.239 0.215
## gl10th 0.749 0.525 1.068 -1.597 0.110
## gl11th 0.633 0.438 0.914 -2.439 0.015
## gl12th 0.515 0.348 0.761 -3.333 0.001
## raceb 0.646 0.411 1.015 -1.895 0.058
## raceh 1.227 0.780 1.929 0.885 0.376
## racea 1.512 0.826 2.769 1.340 0.180
## raceo 1.165 0.747 1.815 0.674 0.501
tab6.lpm2 <- update(tab6.log2, family = gaussian)
summ(tab6.lpm2, confint = T, robust = 'HC3')$coef %>% round(3)
## Est. 2.5% 97.5% t val. p
## (Intercept) 0.138 0.098 0.178 6.707 0.000
## tord 0.042 0.013 0.071 2.873 0.004
## female 0.018 -0.011 0.047 1.231 0.218
## gl10th -0.038 -0.083 0.008 -1.631 0.103
## gl11th -0.055 -0.100 -0.011 -2.440 0.015
## gl12th -0.074 -0.117 -0.030 -3.331 0.001
## raceb -0.040 -0.077 -0.003 -2.109 0.035
## raceh 0.024 -0.033 0.081 0.825 0.409
## racea 0.051 -0.035 0.138 1.161 0.246
## raceo 0.017 -0.037 0.071 0.625 0.532
tab6.poi2 <- update(tab6.log2, family = poisson)
summ(tab6.poi2, exp = T, robust = 'HC3')$coef %>% round(3)
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.128 0.096 0.172 -13.815 0.000
## tord 1.411 1.114 1.787 2.852 0.004
## female 1.157 0.917 1.458 1.232 0.218
## gl10th 0.785 0.581 1.060 -1.581 0.114
## gl11th 0.678 0.494 0.929 -2.421 0.015
## gl12th 0.563 0.401 0.793 -3.295 0.001
## raceb 0.678 0.449 1.024 -1.845 0.065
## raceh 1.190 0.811 1.746 0.888 0.374
## racea 1.417 0.858 2.339 1.363 0.173
## raceo 1.140 0.781 1.664 0.678 0.498
tab6.logbin <- update(tab6.log2, family = binomial(link = 'log'))
## using logbinomial, adding `em` to speed up the estimation
## tab6.logbin <- logbin(abully ~ tord + female + gl + race,
## data = jxe, method = 'em')
summ(tab6.logbin, exp = T, confint = T, robust = 'HC3')$coef %>% round(3)
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.128 0.096 0.171 -13.869 0.000
## tord 1.412 1.115 1.788 2.866 0.004
## female 1.165 0.924 1.468 1.291 0.197
## gl10th 0.780 0.578 1.052 -1.628 0.104
## gl11th 0.672 0.491 0.921 -2.470 0.013
## gl12th 0.565 0.402 0.794 -3.288 0.001
## raceb 0.674 0.446 1.019 -1.872 0.061
## raceh 1.187 0.809 1.741 0.876 0.381
## racea 1.422 0.863 2.342 1.382 0.167
## raceo 1.142 0.782 1.666 0.687 0.492
The pattern of results are similar. The Poisson, log-binomial, and linear probability models however may provide results that are easier to understand (especially if communicating results to a lay audience who do not understand odds ratios).
References
Huang, F., & Cornell, D. (2015). Order and definitional effects in bullying surveys: Results from an experimental study. Psychological Assessment, 27, 1484-1493. doi: http://dx.doi.org/10.1037/pas0000149
– END