## Introduction

Logistic regression is often used to analyze experiments with binary outcomes (e.g., pass vs fail) and binary predictors (e.g., treatment vs control). Although appropriate, there are other possible models that can be run that may provide easier to interpret results.

In addition, some of these models may be quicker to run. Some may say that this point is moot given the availability of computing power today but if you’ve ever tried to run a hierarchical generalized linear model with a logit link function and a binary outcome, you know that when using R (using `glmer`

or `nlme`

) this may take quite a long time (and cross your fingers that you don’t have convergence issues).

The following code replicates the example (see the manuscript for details) in the article:

Huang, F. (2019). Alternatives to logistic regression models with binary outcomes. Journal of Experimental Education. doi: 10.1080/00220973.2019.1699769

Data are based on the article of Huang and Cornell (2015). Using an online survey, investigators tested for the presence of the question-order effect. Based on a random number generated when the students took the survey, they were placed in either the treatment (n = 1037) or control (n = 963) condition. Students in the treatment condition were asked four specific types bullying questions (i.e., verbal, physical, social, cyber) and then were asked a general bullying question (“I have been bullied in the past year”). Students in the control condition were asked the general bullying question first and then the specific bullying questions. We hypothesized that students who were asked the specific bullying questions first would report overall higher bullying vs the control group.

## 1. Examining cross tabs

Load in the required packages:

```
library(dplyr) #just for the pipe, %>%
library(logbin) #to run a log binomial model with a function
library(summarytools) #for nicer crosstabs
library(jtools) #for easier exp, confints, and adjusted standard errors
```

Load and examine the data frame:

```
#specify the location on website
dat <- url("http://faculty.missouri.edu/huangf/data/jxe2019/jxe.rdata")
load(dat) #load in data from the website
summary(jxe) #name of the data.frame
```

```
## abully tord female gl race
## Min. :0.000 Min. :0.0000 Min. :0.0000 9th :446 w:1278
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 10th:530 b: 287
## Median :0.000 Median :1.0000 Median :1.0000 11th:517 h: 170
## Mean :0.126 Mean :0.5185 Mean :0.5005 12th:507 a: 79
## 3rd Qu.:0.000 3rd Qu.:1.0000 3rd Qu.:1.0000 o: 186
## Max. :1.000 Max. :1.0000 Max. :1.0000
```

`head(jxe)`

```
## abully tord female gl race
## 6910 0 0 1 10th w
## 8394 0 0 0 9th w
## 7293 0 1 1 9th w
## 8491 0 1 0 10th w
## 4374 1 1 0 10th w
## 1594 0 1 1 10th b
```

`dim(jxe)`

`## [1] 2000 5`

Review some crosstabs. All computations are based on the table below. `tord`

is the treatment variable. `abully`

indicates if the respondent had been bullied.

```
tt <- ctable(jxe$abully, jxe$tord, prop = 'c')
tt
```

```
## Cross-Tabulation, Column Proportions
## abully * tord
## Data Frame: jxe
##
## -------- ------ -------------- --------------- ---------------
## tord 0 1 Total
## abully
## 0 863 ( 89.6%) 885 ( 85.3%) 1748 ( 87.4%)
## 1 100 ( 10.4%) 152 ( 14.7%) 252 ( 12.6%)
## Total 963 (100.0%) 1037 (100.0%) 2000 (100.0%)
## -------- ------ -------------- --------------- ---------------
```

`tt$cross_table #crosstabs`

```
## tord
## abully 0 1 Total
## 0 863 885 1748
## 1 100 152 252
## Total 963 1037 2000
```

`tt$proportions #proportions`

```
## 0 1 Total
## 0 0.8961578 0.8534233 0.874
## 1 0.1038422 0.1465767 0.126
## Total 1.0000000 1.0000000 1.000
```

`(14.66 / 85.34) / (10.38 / 89.62) #OR = 1.48 `

`## [1] 1.483163`

`14.66 - 10.38 #risk difference: 4.28`

`## [1] 4.28`

`14.66 / 10.38 #risk ratio: 1.41`

`## [1] 1.412331`

## 2. Models without covariates

For comparability, see how the results above map on to the first set of regressions without covariates:

```
tab6.log1 <- glm(abully ~ tord, data = jxe, family = binomial)
summ(tab6.log1, exp = T)$coef %>% round(3)
```

```
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.116 0.094 0.143 -20.404 0.000
## tord 1.482 1.132 1.940 2.865 0.004
```

```
tab6.lpm1 <- glm(abully ~ tord, data = jxe) #risk difference
summ(tab6.lpm1, confint = T, robust = 'HC3')$coef %>% round(3)
```

```
## Est. 2.5% 97.5% t val. p
## (Intercept) 0.104 0.085 0.123 10.553 0.000
## tord 0.043 0.014 0.072 2.896 0.004
```

```
tab6.poi1 <- glm(abully ~ tord, data = jxe, family = poisson) #risk ratio
summ(tab6.poi1, robust = 'HC3')$coef %>% round(3)
```

```
## Est. S.E. z val. p
## (Intercept) -2.265 0.095 -23.900 0.000
## tord 0.345 0.121 2.852 0.004
```

If you want to run a log-binomial model

```
logb <- update(tab6.lpm1, family = binomial(link = 'log'))
#logb <- logbin(abully ~ tord, data = jxe)
summ(logb, exp = T, digits = 3, robust = 'HC3')$coef %>% round(3)
```

```
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.104 0.086 0.125 -23.900 0.000
## tord 1.412 1.114 1.789 2.852 0.004
```

## 3. Models with covariates

Compare these to the results in Table 6 in the article with the covariates:

```
tab6.log2 <- glm(abully ~ tord + female + gl + race, data = jxe, family = binomial)
summ(tab6.log2, exp = T)$coef %>% round(3)
```

```
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.149 0.105 0.211 -10.706 0.000
## tord 1.487 1.134 1.951 2.866 0.004
## female 1.184 0.906 1.547 1.239 0.215
## gl10th 0.749 0.525 1.068 -1.597 0.110
## gl11th 0.633 0.438 0.914 -2.439 0.015
## gl12th 0.515 0.348 0.761 -3.333 0.001
## raceb 0.646 0.411 1.015 -1.895 0.058
## raceh 1.227 0.780 1.929 0.885 0.376
## racea 1.512 0.826 2.769 1.340 0.180
## raceo 1.165 0.747 1.815 0.674 0.501
```

```
tab6.lpm2 <- update(tab6.log2, family = gaussian)
summ(tab6.lpm2, confint = T, robust = 'HC3')$coef %>% round(3)
```

```
## Est. 2.5% 97.5% t val. p
## (Intercept) 0.138 0.098 0.178 6.707 0.000
## tord 0.042 0.013 0.071 2.873 0.004
## female 0.018 -0.011 0.047 1.231 0.218
## gl10th -0.038 -0.083 0.008 -1.631 0.103
## gl11th -0.055 -0.100 -0.011 -2.440 0.015
## gl12th -0.074 -0.117 -0.030 -3.331 0.001
## raceb -0.040 -0.077 -0.003 -2.109 0.035
## raceh 0.024 -0.033 0.081 0.825 0.409
## racea 0.051 -0.035 0.138 1.161 0.246
## raceo 0.017 -0.037 0.071 0.625 0.532
```

```
tab6.poi2 <- update(tab6.log2, family = poisson)
summ(tab6.poi2, exp = T, robust = 'HC3')$coef %>% round(3)
```

```
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.128 0.096 0.172 -13.815 0.000
## tord 1.411 1.114 1.787 2.852 0.004
## female 1.157 0.917 1.458 1.232 0.218
## gl10th 0.785 0.581 1.060 -1.581 0.114
## gl11th 0.678 0.494 0.929 -2.421 0.015
## gl12th 0.563 0.401 0.793 -3.295 0.001
## raceb 0.678 0.449 1.024 -1.845 0.065
## raceh 1.190 0.811 1.746 0.888 0.374
## racea 1.417 0.858 2.339 1.363 0.173
## raceo 1.140 0.781 1.664 0.678 0.498
```

```
tab6.logbin <- update(tab6.log2, family = binomial(link = 'log'))
## using logbinomial, adding `em` to speed up the estimation
## tab6.logbin <- logbin(abully ~ tord + female + gl + race,
## data = jxe, method = 'em')
summ(tab6.logbin, exp = T, confint = T, robust = 'HC3')$coef %>% round(3)
```

```
## exp(Est.) 2.5% 97.5% z val. p
## (Intercept) 0.128 0.096 0.171 -13.869 0.000
## tord 1.412 1.115 1.788 2.866 0.004
## female 1.165 0.924 1.468 1.291 0.197
## gl10th 0.780 0.578 1.052 -1.628 0.104
## gl11th 0.672 0.491 0.921 -2.470 0.013
## gl12th 0.565 0.402 0.794 -3.288 0.001
## raceb 0.674 0.446 1.019 -1.872 0.061
## raceh 1.187 0.809 1.741 0.876 0.381
## racea 1.422 0.863 2.342 1.382 0.167
## raceo 1.142 0.782 1.666 0.687 0.492
```

The pattern of results are similar. The Poisson, log-binomial, and linear probability models however may provide results that are easier to understand (especially if communicating results to a lay audience who do not understand odds ratios).

**References**

Huang, F., & Cornell, D. (2015). Order and definitional effects in bullying surveys: Results from an experimental study. *Psychological Assessment, 27,* 1484-1493. doi: http://dx.doi.org/10.1037/pas0000149

– END