This section illustrates why omitted variable bias (OVB) is an issue. OVB plagues much of the analysis that relies on secondary or observational data: because the data already exist, there may be relevant characteristics that were never collected and therefore remain unobserved.
To illustrate how OVB may affect regression results, we examine some simulated data.
Create some correlated data
library(stargazer) # to create simpler regression output
library(gendata)   # to simulate data
#1 create two correlated variables X1 and X2 (r = .
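As a rough, self-contained sketch of the OVB simulation, the R code below uses only base R rather than `gendata`. Everything in it is an illustrative assumption and not the original author's setup: the sample size (1000), the target correlation between X1 and X2 (0.6, chosen arbitrarily), and the true model y = 1 + 2*X1 + 3*X2 + e. It fits the correctly specified model, the model that omits X2, and an auxiliary regression of X2 on X1, then checks the standard in-sample OLS identity: the slope on X1 in the short model equals the long-model slope on X1 plus the long-model slope on X2 times the auxiliary slope.

```r
# Minimal OVB sketch in base R.
# Hypothetical assumptions: n = 1000, cor(x1, x2) about 0.6,
# true model y = 1 + 2*x1 + 3*x2 + e.
set.seed(123)
n <- 1000
r <- 0.6                                   # hypothetical target correlation

x1 <- rnorm(n)
x2 <- r * x1 + sqrt(1 - r^2) * rnorm(n)    # x2 has variance 1 and cov(x1, x2) = r
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)       # true data-generating process

long  <- lm(y ~ x1 + x2)  # correctly specified model
short <- lm(y ~ x1)       # x2 omitted
aux   <- lm(x2 ~ x1)      # omitted variable regressed on the included one

b_long  <- coef(long)
b_short <- unname(coef(short)["x1"])
delta   <- unname(coef(aux)["x1"])

# OVB identity (exact for OLS in the sample):
# short slope on x1 = long slope on x1 + (long slope on x2) * delta
ovb_check <- unname(b_long["x1"]) + unname(b_long["x2"]) * delta

# The short slope should land near 2 + 3 * 0.6 = 3.8, far from the true 2
round(c(short = b_short, long = unname(b_long["x1"]), identity = ovb_check), 3)
```

If `stargazer` is installed, `stargazer(short, long, type = "text")` prints the two models side by side, which makes the inflated X1 coefficient in the short model easy to see.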
Variance/Covariance
To start off, the sample variance formula is:
\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n - 1} \]
First of all, \((x_i - \overline{x})\) is a deviation score: the deviation of each observation from the mean. Summing the raw deviations always gives zero, so each deviation is squared before they are added together. The numerator of the formula is therefore called the sum of squared deviations, which is literally what it is.
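To make the formula concrete, here is a small R sketch that computes the sample variance step by step on a made-up vector (the data are hypothetical). It shows that the raw deviations sum to zero, builds the sum of squared deviations, divides by n - 1, and compares the result with base R's `var()`, which uses the same n - 1 denominator.

```r
# Step-by-step sample variance on a hypothetical data vector.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # mean is 5

deviations <- x - mean(x)        # deviation scores (x_i - xbar)
sum(deviations)                  # 0: the raw deviations cancel out

ss <- sum(deviations^2)          # sum of squared deviations: 32
s2 <- ss / (length(x) - 1)       # divide by n - 1 = 7

s2                               # 32 / 7, about 4.571
var(x)                           # base R's var() gives the same value
```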