Fun with residuals

Nov 21, 2022 · 2 min read

Random stuff II: Plotting residuals

I was poking around my old teaching files and found one I didn’t recognize:

dat <- read.table("https://raw.githubusercontent.com/flh3/pubdata/main/Stefanski_2007/mizzo_1_data_yx1x5.txt")
head(dat)

##       V1       V2      V3       V4      V5       V6
## 1 -0.224  0.00546  0.3803  0.01351  0.2092  0.14671
## 2  0.844  0.10737 -0.0265  0.04586  0.0130 -0.02719
## 3  1.062  0.09112  0.1813  0.05017 -0.1887 -0.01208
## 4 -1.042  0.44049  0.2460  0.00542 -0.2129  0.10152
## 5  0.157 -0.17051  0.1476  0.08363 -0.0953 -0.00785
## 6 -0.135  0.06160 -0.8041 -0.02595  0.2917 -0.07838

dim(dat)

## [1] 3785    6

Turns out it was an old data file I had used in class when discussing regression diagnostics. We often talk about the assumption of homoskedasticity of the residuals, and we depict it graphically by plotting the fitted values on the x axis and the residuals on the y axis. If all is well, we are told, we should not see any discernible pattern.
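For reference, here is a minimal sketch of what a well-behaved residual plot looks like. It uses simulated data, not the dataset above, and the seed, sample size, and coefficients are arbitrary assumptions chosen only for illustration.

```r
# Simulated example (assumed values, not from the dataset above)
set.seed(123)                  # arbitrary seed for reproducibility
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)      # errors have constant variance

m0 <- lm(y ~ x)

# Fitted values on the x axis, residuals on the y axis
plot(fitted(m0), resid(m0),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)         # dashed reference line at zero
```

With constant error variance, the points should scatter evenly around the dashed line, with no funnel or curve.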

So this is a dataset of 3,785 observations and 6 variables. We can predict the first variable (V1) using all the other variables in the dataset (V2 to V6).
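One way to set up that regression is sketched below. The helper name `fit_first_on_rest` is hypothetical (it is not part of any package), and it simply assumes the outcome is the first column of the data frame, as described above.

```r
# Hypothetical helper: regress the first column on all remaining columns
fit_first_on_rest <- function(d) {
  # Builds a formula such as V1 ~ V2 + V3 + V4 + V5 + V6
  f <- reformulate(names(d)[-1], response = names(d)[1])
  lm(f, data = d)
}

# Usage with the data read in earlier (left commented out because it
# needs the `dat` object from above):
# m1 <- fit_first_on_rest(dat)
# plot(fitted(m1), resid(m1), xlab = "Fitted values", ylab = "Residuals")
```

For a one-off fit, `lm(V1 ~ ., data = dat)` should give the same model; the helper just makes the "first column on everything else" rule explicit.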