In our module on regression diagnostics, I mentioned (1) that with clustered data, standard errors may be misestimated, often too low, which increases the chance of making a Type I error (i.e., declaring a result statistically significant when it is not). In our ANCOVA session, I also noted (2) that covariates are helpful because they lower the (standard) error in the model and so increase power.
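To make point (2) concrete, here is a minimal sketch, assuming a two-group randomized design with one covariate `z` that predicts the outcome but is unrelated to treatment. The sample size, effect sizes, and variable names are hypothetical. Because `z` soaks up outcome variance, the residual error shrinks, and so does the standard error of the treatment effect.

```r
# Hypothetical two-group design: does adding a covariate shrink the SE?
set.seed(42)
n   <- 200
trt <- rep(0:1, each = n / 2)           # treatment indicator, 100 per group
z   <- rnorm(n)                         # covariate, unrelated to treatment
y   <- 0.3 * trt + 1.0 * z + rnorm(n)   # assumed true treatment effect = 0.3

m_anova  <- lm(y ~ trt)                 # covariate left out
m_ancova <- lm(y ~ trt + z)             # covariate included (ANCOVA)

se_anova  <- coef(summary(m_anova))["trt", "Std. Error"]
se_ancova <- coef(summary(m_ancova))["trt", "Std. Error"]
round(c(without_covariate = se_anova, with_covariate = se_ancova), 3)
```

With these assumed values the residual variance drops from about 2 to about 1, so the SE for `trt` should fall by roughly a factor of sqrt(2).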
This example illustrates why omitted variable bias (OVB) is an issue. OVB plagues much of the analysis that relies on secondary or observational data.
Because such data already exist, they may lack relevant characteristics that were never collected and therefore cannot be included in the model. To illustrate how OVB may affect regression results, we examine some simulated data.
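The standard omitted-variable result shows where the bias comes from. Suppose the true model includes two predictors, but the analyst regresses Y on X1 alone because X2 was never collected. The slope from that short regression converges to the true slope plus a bias term that is nonzero whenever X2 both affects Y and is correlated with X1:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon
$$

$$
\operatorname{plim} \tilde{\beta}_1 = \beta_1 + \beta_2 \, \delta_1,
\qquad
\delta_1 = \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}
$$

The bias disappears only if $\beta_2 = 0$ or $\operatorname{Cov}(X_1, X_2) = 0$.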
Create some correlated data
library(stargazer) # to create simpler regression output
library(gendata) # to simulate data
# 1. create two correlated variables X1 and X2 (r = .
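The correlation value in the code is cut off, so the following is only a sketch under stated assumptions: r = .5, both true slopes equal to 1, and the correlated pair generated in base R (rather than with gendata) so the block runs on its own. It fits the correctly specified model and the short model that omits X2, then compares the slopes on X1.

```r
# Minimal OVB simulation in base R (assumed: r = .5, beta1 = beta2 = 1).
set.seed(123)
n  <- 10000
r  <- 0.5                                   # assumed correlation between X1 and X2
X1 <- rnorm(n)
X2 <- r * X1 + sqrt(1 - r^2) * rnorm(n)     # Var(X2) = 1, Cor(X1, X2) is about r
Y  <- 1 * X1 + 1 * X2 + rnorm(n)            # true beta1 = 1, beta2 = 1

full  <- lm(Y ~ X1 + X2)   # correctly specified model
short <- lm(Y ~ X1)        # X2 omitted (e.g., never collected)

round(c(full = coef(full)[["X1"]], short = coef(short)[["X1"]]), 2)
# With these assumptions the full slope should be near 1 and the short slope
# near 1 + 1 * 0.5 = 1.5: X1 picks up part of the effect of the omitted X2.
```

Printing `stargazer(full, short, type = "text")` would show the two models side by side if stargazer is installed.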
Researchers may want to simulate data from a two-level model (also called a hierarchical linear model or a random effects model). The following code shows how to generate such data and compares the results of multilevel modeling (MLM) with those of ordinary least squares (OLS).
1. Simulate the data
set.seed(1234) # for reproducibility
nG <- 20 # number of groups
nJ <- 30 # cluster size
W1 <- 2 # level-2 coefficient
X1 <- 3 # level-1 coefficient
tmp2 <- rnorm(nG) # generate 20 random numbers, m = 0, sd = 1
l2 <- rep(tmp2, each = nJ) # all units in a level-2 group share the same value
group <- gl(nG, k = nJ) # create the cluster variable
tmp2 <- rnorm(nG) # error term for level 2
err2 <- rep(tmp2, each = nJ) # all units in a level-2 group share the same value
l1 <- rnorm(nG * nJ) # total sample size is nG * nJ
err1 <- rnorm(nG * nJ) # error term for level 1
# putting it all together
y <- W1 * l2 + X1 * l1 + err2 + err1
dat <- data.
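The data-frame line above is truncated, so here is a self-contained sketch that regenerates the same data (same seed and draw order) and assumes the data frame holds `y`, `l1`, `l2`, and `group`. It compares the naive OLS standard errors with cluster-robust (CR0) standard errors computed by hand in base R. Note that the cluster-robust correction is a different technique from MLM; the random-intercept model is fitted only if the lme4 package happens to be installed. Because every unit in a group shares the same `l2` value and the same level-2 error, the OLS standard error for `l2` should come out noticeably too small.

```r
# Regenerate the two-level data (same seed and draw order as above).
set.seed(1234)
nG <- 20; nJ <- 30                         # 20 groups of 30 units
W1 <- 2;  X1 <- 3                          # level-2 and level-1 coefficients
l2    <- rep(rnorm(nG), each = nJ)         # level-2 predictor
group <- gl(nG, k = nJ)                    # cluster ID
err2  <- rep(rnorm(nG), each = nJ)         # level-2 (group) error
l1    <- rnorm(nG * nJ)                    # level-1 predictor
err1  <- rnorm(nG * nJ)                    # level-1 error
y     <- W1 * l2 + X1 * l1 + err2 + err1
dat   <- data.frame(y, l1, l2, group)      # assumed column set

ols    <- lm(y ~ l2 + l1, data = dat)
se_ols <- coef(summary(ols))[, "Std. Error"]

# CR0 sandwich: (X'X)^-1 [sum over groups of X_g' u_g u_g' X_g] (X'X)^-1
X     <- model.matrix(ols)
u     <- residuals(ols)
bread <- solve(crossprod(X))
meat  <- Reduce(`+`, lapply(split(seq_len(nrow(X)), dat$group), function(idx) {
  s <- crossprod(X[idx, , drop = FALSE], u[idx])   # score vector for one cluster
  s %*% t(s)
}))
se_cr <- sqrt(diag(bread %*% meat %*% bread))
names(se_cr) <- colnames(X)

round(cbind(ols = se_ols, cluster_robust = se_cr), 3)

# Optional: random-intercept multilevel model, only if lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  mlm <- lme4::lmer(y ~ l2 + l1 + (1 | group), data = dat)
  print(coef(summary(mlm)))
}
```

With only 20 clusters, CR0 standard errors tend to be somewhat too small themselves; packages such as sandwich offer small-sample adjustments.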