Poisson and Negative Binomial Regression using R

A few years ago, I published an article on using Poisson, negative binomial, and zero inflated models in analyzing count data (see Pick Your Poisson). The abstract of the article indicates:

School violence research is often concerned with infrequently occurring events such as counts of the number of bullying incidents or fights a student may experience. Analyzing count data using ordinary least squares regression may produce improbable predicted values, and as a result of regression assumption violations, result in higher Type I errors. Count data are optimally analyzed using Poisson-based regression techniques such as Poisson or negative binomial regression. We apply these techniques to an example study of bullying in a statewide sample of 290 high schools and explain how Poisson-based analyses, although less familiar to many researchers, can produce findings that are more accurate and reliable, and are easier to interpret in real-world contexts.

At the time, I had included syntax using both SAS and SPSS. This post extends the original article by now including R syntax. For the specifics on the whys and hows, consult the article (see Example 2). I show two examples: the first using data from Long (1990) and the second using Huang and Cornell (2012).

Example 1: Long (1990)

This replicates the SAS results where Long (1990) examined the relationship of gender (fem), marital status (mar), number of children (kid5), prestige of graduate program (phd), and the number of articles (ment) published by the individual’s mentor on the number of articles published by the scientist (art: the outcome).

library(dplyr) #basic data management & %>%
library(MASS) #for negative bin regression
library(stargazer) #for combining output
library(ggplot2) #for graphing
library(pscl) #for zero inflated models & predprob
library(reshape2) #convert wide to tall
library(summarytools) #freq

articles <- rio::import("http://faculty.missouri.edu/huangf/data/poisson/articles.csv")
head(articles)
##   fem      ment  phd mar kid5 art
## 1   0  7.999999 1.38   1    2   3
## 2   0  7.000000 4.29   0    0   0
## 3   0 47.000008 3.85   0    0   4
## 4   0 19.000002 3.59   1    1   1
## 5   0  0.000000 1.81   1    0   1
## 6   0  6.000000 3.59   1    1   1
dim(articles)
## [1] 915   6
freq(articles$art)
## Frequencies  
## articles$art  
## 
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           0    275     30.05          30.05     30.05          30.05
##           1    246     26.89          56.94     26.89          56.94
##           2    178     19.45          76.39     19.45          76.39
##           3     84      9.18          85.57      9.18          85.57
##           4     67      7.32          92.90      7.32          92.90
##           5     27      2.95          95.85      2.95          95.85
##           6     17      1.86          97.70      1.86          97.70
##           7     12      1.31          99.02      1.31          99.02
##           8      1      0.11          99.13      0.11          99.13
##           9      2      0.22          99.34      0.22          99.34
##          10      1      0.11          99.45      0.11          99.45
##          11      1      0.11          99.56      0.11          99.56
##          12      2      0.22          99.78      0.22          99.78
##          16      1      0.11          99.89      0.11          99.89
##          19      1      0.11         100.00      0.11         100.00
##        <NA>      0                               0.00         100.00
##       Total    915    100.00         100.00    100.00         100.00
mean(articles$art) #average number of articles
## [1] 1.692896

Based on the frequency count, 275 scientists had no publications (0) or 30.05%. If we compare this to predicted probability based on the mean: \(P(art = x) = \frac{\lambda^x e^{-\lambda}}{x!}\) where \(x = 0\).

1.69^0 * exp(-1.69) / factorial(0) #probability of zero articles is 18.5%
## [1] 0.1845195
#which is quite different from the 30.05%
fr <- table(articles$art) %>% data.frame
names(fr) <- c('articles', 'freq')
fr$articles <- as.numeric(as.character(fr$articles)) #convert factor to numeric
ggplot(fr, aes(x = articles, y = freq)) +
  geom_col() +
  theme_bw() +
  lims(y = c(0, 300)) + 
  geom_line() + 
  labs(x = "Number of articles", y = "Frequency") +
  geom_text(aes(x = articles, y = freq, label = freq, vjust = -1)) +
  theme(axis.title.y = element_text(angle = 0)) 

With a Poisson distribution, the mean and the variances are both equal (\(\mu = \sigma^2\)): a condition (i.e., equidispersion) I am not sure how often occurs in reality. So a negative binomial should be more flexible as it does not have the assumption of equidispersion. With many zeroes, a zero inflated model should fit even better

The following code shows how to estimate different types of models:

### Run the models
linear <- glm(art ~ fem + ment + phd + mar + kid5, data = articles) #identity link, OLS
pois <- glm(art ~ fem + ment + phd + mar + kid5, "poisson", data = articles) #Poisson
negb <- glm.nb(art ~ fem + ment + phd + mar + kid5, data = articles) #negative binomial
#in the MASS package
zinb <- zeroinfl(art ~ fem + ment + phd + mar + kid5 | fem + ment + phd + mar + kid5,
               dist = "negbin", data = articles) #zero inflated nb, in the pscl package

Of course, to interpet the results with the log or logit links, we need to exponentiate the coefficients to get the incidence rate ratios. The jtools package provides an easy way to do this:

library(jtools)
summ(pois, exp = T) #could exp the coefficients but this is
## MODEL INFO:
## Observations: 915
## Dependent Variable: art
## Type: Generalized linear model
##   Family: poisson 
##   Link function: log 
## 
## MODEL FIT:
## <U+03C7>²(5) = 183.03, p = 0.00
## Pseudo-R² (Cragg-Uhler) = 0.19
## Pseudo-R² (McFadden) = 0.05
## AIC = 3314.11, BIC = 3343.03 
## 
## Standard errors: MLE
## ------------------------------------------------------------
##                     exp(Est.)   2.5%   97.5%   z val.      p
## ----------------- ----------- ------ ------- -------- ------
## (Intercept)              1.36   1.11    1.66     2.96   0.00
## fem                      0.80   0.72    0.89    -4.11   0.00
## ment                     1.03   1.02    1.03    12.73   0.00
## phd                      1.01   0.96    1.07     0.49   0.63
## mar                      1.17   1.04    1.32     2.53   0.01
## kid5                     0.83   0.77    0.90    -4.61   0.00
## ------------------------------------------------------------
#useful because we get the 95% CI as well
summ(negb, exp = T) #difference in marital status
## MODEL INFO:
## Observations: 915
## Dependent Variable: art
## Type: Generalized linear model
##   Family: Negative Binomial(2.2644) 
##   Link function: log 
## 
## MODEL FIT:
## <U+03C7>²() = , p = 
## Pseudo-R² (Cragg-Uhler) = 0.10
## Pseudo-R² (McFadden) = 0.03
## AIC = 3135.92, BIC = 3169.65 
## 
## Standard errors: MLE
## ------------------------------------------------------------
##                     exp(Est.)   2.5%   97.5%   z val.      p
## ----------------- ----------- ------ ------- -------- ------
## (Intercept)              1.29   0.99    1.69     1.86   0.06
## fem                      0.81   0.70    0.93    -2.98   0.00
## ment                     1.03   1.02    1.04     9.05   0.00
## phd                      1.02   0.95    1.09     0.43   0.67
## mar                      1.16   0.99    1.37     1.83   0.07
## kid5                     0.84   0.76    0.93    -3.34   0.00
## ------------------------------------------------------------
summary(zinb) #more complex, has two parts to it. 
## 
## Call:
## zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | fem + ment + 
##     phd + mar + kid5, data = articles, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2942 -0.7601 -0.2909  0.4448  6.4155 
## 
## Count model coefficients (negbin with log link):
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.4167466  0.1435964   2.902  0.00371 ** 
## fem         -0.1955076  0.0755926  -2.586  0.00970 ** 
## ment         0.0247862  0.0034927   7.097 1.28e-12 ***
## phd         -0.0006998  0.0362697  -0.019  0.98461    
## mar          0.0975826  0.0844520   1.155  0.24789    
## kid5        -0.1517321  0.0542061  -2.799  0.00512 ** 
## Log(theta)   0.9763578  0.1354696   7.207 5.71e-13 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -0.19161    1.32280  -0.145  0.88483   
## fem          0.63587    0.84890   0.749  0.45382   
## ment        -0.88227    0.31622  -2.790  0.00527 **
## phd         -0.03773    0.30801  -0.123  0.90250   
## mar         -1.49944    0.93866  -1.597  0.11017   
## kid5         0.62841    0.44277   1.419  0.15583   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 2.6548 
## Number of iterations in BFGS optimization: 27 
## Log-likelihood: -1550 on 13 Df
tmp <- summary(zinb) #doing this manually
tmp$coefficients$count[-7, 1] %>% exp()
## (Intercept)         fem        ment         phd         mar        kid5 
##   1.5170181   0.8224171   1.0250959   0.9993005   1.1025025   0.8592185
tmp$coefficients$zero[-7, 1] %>% exp()
## (Intercept)         fem        ment         phd         mar        kid5 
##   0.8256318   1.8886654   0.4138409   0.9629702   0.2232558   1.8746261

Note: the logit portion predicts zeroes (so coefficients may seem like they are in the opposite direction). Interpret as ORs. Again, consult the article for interpretation of results. Can also consult the AIC values to help with model selection:

tmp <- data.frame(OLS = AIC(linear), pois = AIC(pois), 
                  negb = AIC(negb), zinb = AIC(zinb))
knitr::kable(tmp, align = "l")
OLS pois negb zinb
3702.186 3314.113 3135.917 3125.982
#zinb has the lowest, close to negb

Plotting results

I replicate the graph found in the SAS post for plotted results. The predicted probabilities are based on the probability density function. Can use the predprob function in the pscl package

po.p <- predprob(pois) %>% colMeans
po.nb <- predprob(negb) %>% colMeans
po.zinb <- predprob(zinb) %>% colMeans
df <- data.frame(x = 0:max(articles$art), Poisson = po.p, 
                 NegBin = po.nb, Zinb = po.zinb)

obs <- table(articles$art) %>% prop.table() %>% data.frame #Observed
names(obs) <- c("x", 'Observed')

p1 <- predict(linear) %>% round() %>% table %>% prop.table %>% data.frame #for OLS
names(p1) <- c('x', 'OLS')

tmp <- merge(p1, obs, by = 'x', all = T)
tmp$x <- as.numeric(as.character(tmp$x))

comb <- merge(tmp, df, by = 'x', all = T)
comb[is.na(comb)] <- 0

comb2 <- comb[1:11, ] #just for the first 11 results, including zero

mm <- melt(comb2, id.vars = 'x', value.name = 'prob', variable.name = 'Model')
mm <- filter(mm, Model != "OLS") #can include the linear model too if you want
#the SAS note does not, so I am not including it

ggplot(mm, aes(x = x, y = prob, group = Model, col = Model)) +
  geom_line(aes(lty = Model), lwd = 1) +
  theme_bw() +
  labs(x = "Number of articles", y = 'Probability',
       title = "Models for number of published articles") +
  scale_color_manual(values = c('black', 'blue', 'red', 'green')) +
  scale_linetype_manual(values = c('solid', 'dotted', 'dotted', 'dotted')) +
  theme(legend.position=c(.75, .65), axis.title.y = element_text(angle = 0))

Plot shows that the negative binomial and the zero inflated nb almost overlap with the observed observations (a good approximation).

Example 2: Bullying in school (Huang & Cornell, 2012)

This data is not available for download though (w/c is why I started of with the Long example).

x <- rio::import("C:/Users/huangf/Box Sync/Bully/Bully/CountJK/MASTERDATA7-09.sav")
x2 <- dplyr::select(x, NEWBULLY, TOTAL_F_R, 
                    DIVERSITYINDEX, PROPNONWHITE, PD1000, siz1000
                    )

library(tableone)
t1 <- CreateTableOne(data = x2)
print(t1)
##                             
##                              Overall    
##   n                           290       
##   NEWBULLY (mean (SD))       5.42 (5.49)
##   TOTAL_F_R (mean (SD))      0.30 (0.16)
##   DIVERSITYINDEX (mean (SD)) 0.37 (0.21)
##   PROPNONWHITE (mean (SD))   0.34 (0.26)
##   PD1000 (mean (SD))         0.94 (1.26)
##   siz1000 (mean (SD))        1.21 (0.69)
cov(x2) %>% round(3)
##                NEWBULLY TOTAL_F_R DIVERSITYINDEX PROPNONWHITE PD1000 siz1000
## NEWBULLY         30.148    -0.212          0.491        0.329  1.650   2.131
## TOTAL_F_R        -0.212     0.025         -0.006        0.012  0.014  -0.046
## DIVERSITYINDEX    0.491    -0.006          0.044        0.035  0.104   0.084
## PROPNONWHITE      0.329     0.012          0.035        0.067  0.181   0.066
## PD1000            1.650     0.014          0.104        0.181  1.597   0.364
## siz1000           2.131    -0.046          0.084        0.066  0.364   0.473

Visualize the outcome variable:

freq(x2$NEWBULLY)
## Frequencies  
## x2$NEWBULLY  
## Label: New raw "bullying" without threatening, USE  
## Type: Numeric  
## 
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           0     42     14.48          14.48     14.48          14.48
##           1     31     10.69          25.17     10.69          25.17
##           2     39     13.45          38.62     13.45          38.62
##           3     19      6.55          45.17      6.55          45.17
##           4     33     11.38          56.55     11.38          56.55
##           5     19      6.55          63.10      6.55          63.10
##           6     18      6.21          69.31      6.21          69.31
##           7     14      4.83          74.14      4.83          74.14
##           8     16      5.52          79.66      5.52          79.66
##           9     11      3.79          83.45      3.79          83.45
##          10     10      3.45          86.90      3.45          86.90
##          11      4      1.38          88.28      1.38          88.28
##          12      1      0.34          88.62      0.34          88.62
##          13      3      1.03          89.66      1.03          89.66
##          14      4      1.38          91.03      1.38          91.03
##          15      4      1.38          92.41      1.38          92.41
##          16      1      0.34          92.76      0.34          92.76
##          17      3      1.03          93.79      1.03          93.79
##          18      4      1.38          95.17      1.38          95.17
##          19      5      1.72          96.90      1.72          96.90
##          20      3      1.03          97.93      1.03          97.93
##          21      1      0.34          98.28      0.34          98.28
##          22      1      0.34          98.62      0.34          98.62
##          23      2      0.69          99.31      0.69          99.31
##          25      1      0.34          99.66      0.34          99.66
##          28      1      0.34         100.00      0.34         100.00
##        <NA>      0                               0.00         100.00
##       Total    290    100.00         100.00    100.00         100.00
moments::skewness((x2$NEWBULLY)) #positively skewed
## [1] 1.51982
fr <- table(x2$NEWBULLY) %>% data.frame
names(fr) <- c('bullyinc', 'freq')
fr$bullyinc <- as.numeric(as.character(fr$bullyinc)) #convert factor to numeric
ggplot(fr, aes(x = bullyinc, y = freq)) +
  geom_col() +
  theme_bw() +
  lims(y = c(0, 45)) + 
  geom_line() + 
  labs(x = "Bullying incidents", y = "Frequency") +
  geom_text(aes(x = bullyinc, y = freq, label = freq, vjust = -1)) +
  theme(axis.title.y = element_text(angle = 0)) 

Run the models:

m1 <- glm(NEWBULLY ~ . , data = x2)
m2 <- glm(NEWBULLY ~ . , family = poisson, data = x2) 
m3 <- glm(NEWBULLY ~ . , family = quasipoisson, data = x2) #for overdispersion
m4 <- glm.nb(NEWBULLY ~ ., data = x2)
m5 <- zeroinfl(NEWBULLY ~ . | siz1000, dist = "negbin", data = x2)

We can use jtools again but just showing raw output all together. Note the Poisson model has standard errors that are too low (compare to the quasi Poisson and the negative binomial).

stargazer(m1, m2, m3, m4, m5,  type = 'text', star.cutoffs = c(.05, .01, .001), 
          no.space = T, digits = 2)
## 
## ==================================================================================
##                                         Dependent variable:                       
##                   ----------------------------------------------------------------
##                                               NEWBULLY                            
##                    normal  Poisson  glm: quasipoisson    negative    zero-inflated
##                                        link = log        binomial     count data  
##                     (1)      (2)           (3)             (4)            (5)     
## ----------------------------------------------------------------------------------
## TOTAL_F_R           1.19    -0.26         -0.26           -0.42          -0.21    
##                    (2.39)   (0.25)       (0.47)           (0.46)        (0.46)    
## DIVERSITYINDEX     5.48**  1.09***       1.09**           0.90*          0.89*    
##                    (2.06)   (0.22)       (0.41)           (0.39)        (0.37)    
## PROPNONWHITE       -2.07    -0.09         -0.09            0.01          -0.09    
##                    (1.85)   (0.20)       (0.38)           (0.35)        (0.36)    
## PD1000             0.003    -0.01         -0.01           -0.02          -0.01    
##                    (0.26)   (0.02)       (0.04)           (0.05)        (0.05)    
## siz1000           3.93***  0.64***       0.64***         0.72***        0.65***   
##                    (0.56)   (0.05)       (0.10)           (0.10)        (0.11)    
## Constant           -1.00   0.46***        0.46*           0.45*         0.56**    
##                    (1.11)   (0.11)       (0.21)           (0.21)        (0.21)    
## ----------------------------------------------------------------------------------
## Observations        290      290           290             290            290     
## Log Likelihood    -846.38  -892.53                       -742.12        -736.73   
## theta                                                 2.18*** (0.29)              
## Akaike Inf. Crit. 1,704.77 1,797.06                      1,496.23                 
## ==================================================================================
## Note:                                                *p<0.05; **p<0.01; ***p<0.001
#have to exponentiate results to get the IRR
#does not show the logit results for the ZI model

Graph the results:

po.p <- predprob(m2) %>% colMeans
po.nb <- predprob(m4) %>% colMeans
po.zinb <- predprob(m5) %>% colMeans

df <- data.frame(x = 0:max(x2$NEWBULLY), Poisson = po.p, 
                 NegBin = po.nb, Zinb = po.zinb)

obs <- table(x2$NEWBULLY) %>% prop.table() %>% data.frame
names(obs) <- c("x", 'Observed')

p1 <- predict(m1) %>% round() %>% table %>% prop.table %>% data.frame
names(p1) <- c('x', 'OLS')
p1 <- p1[-1, ] #there is a negative value

tmp <- merge(p1, obs, by = 'x', all = T)
tmp$x <- as.numeric(as.character(tmp$x))

comb <- merge(tmp, df, by = 'x', all = T)
comb[is.na(comb)] <- 0
mm <- melt(comb, id.vars = 'x', value.name = 'prob', variable.name = 'Model')


ggplot(mm, aes(x = x, y = prob, group = Model, col = Model)) +
  geom_line(aes(lty = Model), lwd = 1) +
  theme_bw() +
  labs(x = "Number of bullying incidents", y = 'Probability',
  title = "Comparison of models") +
  scale_color_manual(values = c('brown', 'black', 'blue', 'green', 'red')) +
  scale_linetype_manual(values = c('dotted', 'solid', 'dotted', 'dotted', 'dashed')) +
  theme(legend.position=c(.65, .65), legend.background = element_rect(), 
        axis.title.y = element_text(angle = 0))

Other notes to self: Ignore these

pdf.pois <- function(x){
  tmp <- data.frame()
  for (i in 0:10){
    pp <- x^i * exp(-x) / factorial(i)
    tmp <- rbind(tmp, data.frame(i, pp))
  }
  return(round(tmp, 3))
}

dpois(0, 1.69)
## [1] 0.1845195
dpois(1, 1.69)
## [1] 0.311838
pdf.pois(1.69)
##     i    pp
## 1   0 0.185
## 2   1 0.312
## 3   2 0.264
## 4   3 0.148
## 5   4 0.063
## 6   5 0.021
## 7   6 0.006
## 8   7 0.001
## 9   8 0.000
## 10  9 0.000
## 11 10 0.000
nn <- matrix(NA, nrow = nobs(pois), ncol = max(articles$art) + 1)
pp <- fitted(pois)
for (i in 0:max(articles$art)){
  nn[, i+1] <- dpois(i, pp)
}
head(nn)
##             [,1]       [,2]       [,3]      [,4]       [,5]       [,6]
## [1,] 0.255039989 0.34847004 0.23806340 0.1084248 0.03703614 0.01012075
## [2,] 0.180268815 0.30885567 0.26458215 0.1511034 0.06472160 0.02217758
## [3,] 0.008802787 0.04166084 0.09858384 0.1555222 0.18400942 0.17417179
## [4,] 0.106495003 0.23851230 0.26709290 0.1993988 0.11164627 0.05000987
## [5,] 0.197692617 0.32046801 0.25974604 0.1403531 0.05687955 0.01844083
## [6,] 0.200521047 0.32220445 0.25886487 0.1386511 0.05569741 0.01789932
##             [,7]         [,8]         [,9]        [,10]        [,11]
## [1,] 0.002304723 0.0004498606 7.683253e-05 1.166433e-05 1.593738e-06
## [2,] 0.006332831 0.0015500112 3.319555e-04 6.319348e-05 1.082698e-05
## [3,] 0.137383425 0.0928846765 5.494926e-02 2.889529e-02 1.367524e-02
## [4,] 0.018667495 0.0059726844 1.672096e-03 4.161024e-04 9.319266e-05
## [5,] 0.004982226 0.0011537709 2.337889e-04 4.210906e-05 6.826055e-06
## [6,] 0.004793546 0.0011003489 2.210100e-04 3.945854e-05 6.340341e-06
##             [,12]        [,13]        [,14]        [,15]        [,16]
## [1,] 1.979618e-07 2.254018e-08 2.369033e-09 2.312066e-10 2.106038e-11
## [2,] 1.686357e-06 2.407705e-07 3.173181e-08 3.883308e-09 4.435530e-10
## [3,] 5.883693e-03 2.320473e-03 8.447748e-04 2.855753e-04 9.010257e-05
## [4,] 1.897451e-05 3.541367e-06 6.101114e-07 9.760288e-08 1.457313e-08
## [5,] 1.005938e-06 1.358890e-07 1.694475e-08 1.962011e-09 2.120335e-10
## [6,] 9.261717e-07 1.240172e-07 1.532887e-08 1.759355e-09 1.884664e-10
##             [,17]        [,18]        [,19]        [,20]
## [1,] 1.798471e-12 1.445479e-13 1.097227e-14 7.890416e-16
## [2,] 4.749638e-11 4.786814e-12 4.556265e-13 4.108567e-14
## [3,] 2.665170e-05 7.419657e-06 1.950828e-06 4.859295e-07
## [4,] 2.039926e-09 2.687492e-10 3.343922e-11 3.941705e-12
## [5,] 2.148220e-11 2.048444e-12 1.844785e-13 1.573933e-14
## [6,] 1.892716e-11 1.788991e-12 1.597008e-13 1.350595e-14
library(dplyr)
colMeans(nn) %>% round(2)
##  [1] 0.21 0.31 0.24 0.13 0.06 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [16] 0.00 0.00 0.00 0.00 0.00
predprob(pois) %>% colMeans %>% round(2)
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 0.21 0.31 0.24 0.13 0.06 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
##   16   17   18   19 
## 0.00 0.00 0.00 0.00
#identical

colMeans(nn) %>% plot(., xlab = 'Number of articles')
colMeans(nn) %>% lines

Simulating:

set.seed(123)
sim.p <- rpois(915, 1.693)
sim.p %>% table
## .
##   0   1   2   3   4   5   6   7 
## 173 288 239 139  48  21   6   1
sim.p %>% table %>% prop.table %>% round(2)
## .
##    0    1    2    3    4    5    6    7 
## 0.19 0.31 0.26 0.15 0.05 0.02 0.01 0.00
sim.p %>% table %>% barplot() #quick and dirty

Dispersion tests:

AER::dispersiontest(m2) #tests for equidispersion
## 
##  Overdispersion test
## 
## data:  m2
## z = 5.638, p-value = 8.601e-09
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion 
##   3.455817
sqrt(3.4558) 
## [1] 1.858978
#indicates how much larger the poisson standard should be

Compare standard errors in models 2 and 3 in example 2.

References:

Huang, F., & Cornell, D. (2012). Pick your Poisson: Regression models for count data in school violence research. Journal of School Violence, 11, 187-206. doi: 10.1080/15388220.2012.682010

Long, J. S. (1990). The origins of sex differences in science. Social Forces, 68, 1297-1315.

##END

Next
Previous
comments powered by Disqus