The other day in class, while talking about instances (e.g., analyzing clustered data or heteroskedastic residuals) where the standard errors of a regression model need to be adjusted, a student asked: how do we know what the ‘true’ standard error should be in the first place? We need that benchmark to tell whether an estimated standard error is too high or too low.
This short simulation illustrates that, over repeated sampling from a specified population, the standard deviation of the estimated regression coefficients can be treated as the true standard error.
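A minimal sketch of that idea in base R follows. The population model, sample size, and number of replications are assumptions chosen for illustration, not values from the original post. With homoskedastic errors, the standard deviation of the slope estimates across samples should land close to the average model-based standard error.

```r
set.seed(123)        # arbitrary seed, for reproducibility
n_reps <- 2000       # number of repeated samples (assumed)
n      <- 100        # observations per sample (assumed)
b0 <- 1; b1 <- 0.5   # hypothetical population coefficients

est <- numeric(n_reps)   # slope estimate from each sample
se  <- numeric(n_reps)   # model-based standard error from each sample

for (i in seq_len(n_reps)) {
  x   <- rnorm(n)
  y   <- b0 + b1 * x + rnorm(n)   # homoskedastic errors, sd = 1
  fit <- lm(y ~ x)
  est[i] <- coef(fit)["x"]
  se[i]  <- summary(fit)$coefficients["x", "Std. Error"]
}

empirical_se  <- sd(est)    # SD of the estimates over repeated samples
mean_model_se <- mean(se)   # average of the analytic standard errors
c(empirical = empirical_se, model = mean_model_se)
```

If the errors were made heteroskedastic or clustered, the two numbers would be expected to diverge, and the empirical value would serve as the benchmark for judging any adjustment.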
This post illustrates why omitted variable bias (OVB) is an issue. The problem plagues much of the analysis that relies on secondary or observational data: the data already exist, and relevant characteristics may never have been collected. To illustrate how OVB may affect regression results, we examine some simulated data.
Create some correlated data

library(stargazer) #to create simpler regression output
library(gendata) #to simulate data
#1 create two correlated variables X1 and X2 (r = .
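As a rough, self-contained sketch of the omitted-variable idea, the following uses base R only rather than `gendata`. The correlation `r`, the coefficients, and the sample size are all assumed values for illustration. Leaving out a predictor that is correlated with `x1` shifts the `x1` coefficient by roughly the omitted coefficient times `r`.

```r
set.seed(42)   # arbitrary seed, for reproducibility
n <- 10000     # large sample so the bias stands out from noise (assumed)
r <- 0.5       # assumed correlation between x1 and x2

x1 <- rnorm(n)
x2 <- r * x1 + sqrt(1 - r^2) * rnorm(n)   # unit variance, cor(x1, x2) near r
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)      # hypothetical true model

full    <- lm(y ~ x1 + x2)   # correctly specified model
omitted <- lm(y ~ x1)        # x2 left out

b_full    <- unname(coef(full)["x1"])      # should sit near the true value 2
b_omitted <- unname(coef(omitted)["x1"])   # should sit near 2 + 3 * r
c(full = b_full, omitted = b_omitted, expected_biased = 2 + 3 * r)
```

The size of the shift depends on both the strength of the omitted effect and its correlation with the included predictor, so setting `r` to 0 in this sketch should make the two estimates agree.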
Researchers may want to simulate a two-level model (also known as a hierarchical linear model, a random effects model, etc.). The following code illustrates how to generate the data and compares two analytic techniques: multilevel modeling (MLM) and ordinary least squares (OLS).
1. Simulate the data

set.seed(1234) #for reproducibility
nG <- 20 #number of groups
nJ <- 30 #cluster size
W1 <- 2 #level 2 coeff
X1 <- 3 #level 1 coeff
tmp2 <- rnorm(nG) #generate 20 random numbers, m = 0, sd = 1
l2 <- rep(tmp2, each = nJ) #all units in l2 have the same value
group <- gl(nG, k = nJ) #creating cluster variable
tmp2 <- rnorm(nG) #error term for level 2
err2 <- rep(tmp2, each = nJ) #all units in l2 have the same value
l1 <- rnorm(nG * nJ) #total sample size is nG * nJ
err1 <- rnorm(nG * nJ) #level 1
#putting it all together
y <- W1 * l2 + X1 * l1 + err2 + err1
dat <- data.
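A self-contained sketch of the MLM versus OLS comparison might look like the following. It regenerates the data with the same seed and settings as above. The `data.frame()` call and the use of `nlme::lme` are assumptions made for this illustration; the original post may build the data set differently or fit the model with another package.

```r
library(nlme)   # assumed package choice; included with standard R installs

set.seed(1234)
nG <- 20; nJ <- 30   # number of groups, cluster size
W1 <- 2;  X1 <- 3    # level 2 and level 1 coefficients

l2    <- rep(rnorm(nG), each = nJ)   # level 2 predictor, constant in a group
group <- gl(nG, k = nJ)              # cluster identifier
err2  <- rep(rnorm(nG), each = nJ)   # level 2 error, constant in a group
l1    <- rnorm(nG * nJ)              # level 1 predictor
err1  <- rnorm(nG * nJ)              # level 1 error
y     <- W1 * l2 + X1 * l1 + err2 + err1
dat   <- data.frame(y, group, l2, l1)   # hypothetical assembly of the data

ols <- lm(y ~ l2 + l1, data = dat)                          # ignores clustering
mlm <- lme(y ~ l2 + l1, random = ~ 1 | group, data = dat)   # random intercept

se_ols <- summary(ols)$coefficients[, "Std. Error"]
se_mlm <- summary(mlm)$tTable[, "Std.Error"]
round(cbind(OLS = se_ols, MLM = se_mlm), 3)
```

Because `l2` varies only between groups, OLS treats all `nG * nJ` rows as independent and would be expected to report a much smaller standard error for `l2` than the random-intercept model does.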