Computing the point estimates and standard errors with mixed models using matrices

1. Estimating the OLS model using matrices

As a starting point, begin with an OLS model and estimate the results using matrices. The standard ordinary least squares (OLS) model can be written as: \(y = XB + \varepsilon\), where \(X\) is a design matrix, \(B\) is a column vector of coefficients, and \(\varepsilon\) is random error.

Of interest is \(B\), which is obtained through: \(B = (X'X)^{-1}(X'y)\). For example (compare the results using matrices with those from the coef function):

ols1 <- lm(mpg ~ wt + am, data = mtcars) #toy example 
X <- model.matrix(ols1)
y <- mtcars$mpg
(betas <- solve(t(X) %*% X) %*% t(X) %*% y)
##                    [,1]
## (Intercept) 37.32155131
## wt          -5.35281145
## am          -0.02361522
coef(ols1)
## (Intercept)          wt          am 
## 37.32155131 -5.35281145 -0.02361522

The variance/covariance matrix of the betas, \(Var(\hat{\beta})\), is \(\sigma^2(X'X)^{-1}\); the square roots of its diagonal elements are the standard errors. \(\sigma^2\) is based on the residuals, and the residuals are merely the observed values less the predicted values:

modres <- residuals(ols1)
modres2 <- y - X %*% betas #manual computation
## compare
head(modres)
##         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
##        -2.2735701        -0.9086032        -2.0794135         1.2877375 
## Hornet Sportabout           Valiant 
##        -0.2078799        -0.7008237
head(modres2)
##                         [,1]
## Mazda RX4         -2.2735701
## Mazda RX4 Wag     -0.9086032
## Datsun 710        -2.0794135
## Hornet 4 Drive     1.2877375
## Hornet Sportabout -0.2078799
## Valiant           -0.7008237

\(\sigma^2\) is the sum of squared residuals divided by \(N - k\), where \(k\) is the number of predictors (including the intercept).

sigma(ols1)^2 #just using a function
## [1] 9.597231
n <- nobs(ols1)
k <- ols1$rank
sum(modres^2) / (n - k) #manual computation
## [1] 9.597231

The sum of squared residuals can be computed using \(u'u\), where \(u\) contains the residuals. (We use as.numeric to convert the result into a scalar that we can use to multiply later on.)

u <- modres2
(sigma2 <- as.numeric(t(u) %*% u / (n - k))) #manual computation using matrices
## [1] 9.597231

Using \(\sigma^2\), we can obtain the variance/covariance matrix:

(vcm <- sigma2 * solve(t(X) %*% X)) #manual computation
##             (Intercept)         wt         am
## (Intercept)    9.330816 -2.3417207 -3.6849277
## wt            -2.341721  0.6213282  0.8436983
## am            -3.684928  0.8436983  2.3890195
vcov(ols1) #compare when using a function
##             (Intercept)         wt         am
## (Intercept)    9.330816 -2.3417207 -3.6849277
## wt            -2.341721  0.6213282  0.8436983
## am            -3.684928  0.8436983  2.3890195

The square roots of the diagonal elements are the standard errors. We can compare these to the results from the lm function.

se <- sqrt(diag(vcm))
tstat <- betas / se
pval <-  2 * pt(-abs(tstat), df = n - k)
data.frame(betas, se, tstat, pval)
##                   betas        se       tstat         pval
## (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
## wt          -5.35281145 0.7882438 -6.79080719 1.867415e-07
## am          -0.02361522 1.5456453 -0.01527855 9.879146e-01
summary(ols1)$coefficients #compare results
##                Estimate Std. Error     t value     Pr(>|t|)
## (Intercept) 37.32155131  3.0546385 12.21799285 5.843477e-13
## wt          -5.35281145  0.7882438 -6.79080719 1.867415e-07
## am          -0.02361522  1.5456453 -0.01527855 9.879146e-01

2. Using matrices to reconstruct the mixed model coefficients

Note, the procedure outlined below does not estimate the coefficients (that is done using an iterative process). The steps below merely reconstruct the mixed model coefficients given the already estimated elements.

For mixed models (a.k.a. multilevel models, hierarchical linear models, random coefficient models), this can be expressed as: \(y = X\beta + Zb + \varepsilon\). The difference is the presence of the random effects term \(Zb\). \(Z\) is an \(n \times q\) design matrix of random effects, where \(q\) is the number of random effects, and \(b\) is a \(q \times 1\) vector of random effects. Sometimes \(b\) is shown as \(u\), but I used \(u\) already to represent the residuals, so I didn't want to mix that up.

Using the bdf dataset in the mlmRev package, we can use the lmer function in lme4. There are 2,287 students from 131 schools.

library(lme4)
data(bdf, package = 'mlmRev')
mlm1 <- lmer(aritPOST ~ aritPRET + sex + 
             schoolSES + (sex | schoolNR), data = bdf)

With a random intercept model, there is only one random effect per cluster (q = 1). With a random coefficient model (in this case for sex), we have two (q = 2). We just specified in this example that the slope for sex can randomly vary across schools. NOTE: the code below (as written) only works if the original data are sorted by the clustering variable and there are no missing data. That's not an assumption of mixed models, just a limitation of my presentation/syntax. If you need to sort your data by the cluster id, you can use dat <- dat[order(dat$cluster), ] (see the sketch below).
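A minimal sketch of those pre-checks applied to bdf (hypothetical here, since bdf is already complete and sorted by school):

anyNA(bdf[, c('aritPOST', 'aritPRET', 'sex', 'schoolSES', 'schoolNR')]) #expect FALSE
bdf <- bdf[order(bdf$schoolNR), ] #sort rows by the clustering variable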

# The data are sorted by cluster already
X <- model.matrix(mlm1) #the design matrix
B <- fixef(mlm1) #fixed effects from lme4, just as a check; not used in the manual computation
y <- mlm1@resp$y #outcome
Z <- getME(mlm1, 'Z') #sparse Z design matrix
b <- getME(mlm1, 'b') #random effects
Gname <- names(getME(mlm1, 'l_i')) #name of clustering variable
js <- table(mlm1@frame[, Gname]) #how many observations in each cluster
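As a quick sanity check (not part of the reconstruction proper), the pieces extracted above should reproduce the conditional fitted values:

head(as.numeric(X %*% B + Z %*% b)) #fixed part plus predicted random effects
head(fitted(mlm1)) #should match the manual reconstruction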

\(Z\) is a sparse matrix (mostly zeroes; with the data sorted by cluster, the nonzero entries form blocks, one per cluster).
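A quick look at its size and sparsity pattern (the column count follows from 131 schools times 2 random effects per school):

dim(Z) #2287 x 262: one column per random effect
image(Z) #with sorted data, the nonzeroes fall in blocks by school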

qq <- getME(mlm1, 'q') #columns in Z matrix
NG <- ngrps(mlm1)
nre <- getME(mlm1, 'p_i') #number of random effects per cluster (qq / NG)
inde <- cumsum(js) #cumulative observations per group, used as a row index
cols <- seq(1, (NG * nre), by = nre) #starting column in Z for each cluster
#for a random intercept model, cols is just the sequence from 1 to NG
G <- bdiag(VarCorr(mlm1)) #G matrix

\(G\) is the variance/covariance matrix of the random effects (estimated using lmer). The following is used to create the \(V\) matrix. There are probably other (better and faster) ways to do this (one sketch appears after \(V\) is assembled below), but I think this is the most “transparent” approach and it works. The coefficients in actuality are estimated using an iterative process (which is different from the approach using OLS).

ml <- list() #empty list to store matrices

The ml object is an empty list to store the matrices. The following creates a matrix of \(ZGZ' + r\) for each cluster. \(G\) is the covariance matrix for the random effects; this matrix is used several times (and is just one matrix).

G
## 2 x 2 sparse Matrix of class "dgCMatrix"
##             (Intercept)       sex1
## (Intercept)   4.9241939 -0.3289331
## sex1         -0.3289331  0.9752748

For a random intercept model, \(G\) is a single number only. With one random slope, this becomes a \(2 \times 2\) matrix. \(r\) is a diagonal matrix with the residual variance (sigma(mlm1)^2) on the diagonal.

for (i in 1:NG){
  if (i == 1) { #first cluster starts at row 1
    st <- 1
  } else {
    st <- inde[i - 1] + 1
  }
  end <- st + js[i] - 1 #last row for cluster i
  nc <- cols[i] #first column of Z for cluster i
  ncend <- cols[i] + (nre - 1) #last column of Z for cluster i
  Zi <- Z[st:end, nc:ncend] #depends on how many obs in a cluster and
  #how many rand effects
  ml[[i]] <- Zi %*% G %*% t(Zi) + diag(sigma(mlm1)^2, nrow = js[i]) #ZGZ' + r
}

Each cluster/group has its own matrix. For example, cluster 2 has 7 observations and cluster 3 has 5 observations, so their matrices are \(7 \times 7\) and \(5 \times 5\).
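Just to confirm those counts using objects already created:

js[2:3] #observations in the 2nd and 3rd clusters: 7 and 5
dim(ml[[2]]) #7 x 7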

Elements of matrices 2 and 3 in the list (clusters 2 and 3):

ml[[2]] #for the second cluster
## 7 x 7 sparse Matrix of class "dgCMatrix"
##           26        27        28        29        30        31        32
## 26 24.930147  4.595261  4.595261  5.241602  4.595261  5.241602  4.595261
## 27  4.595261 24.612739  4.924194  4.595261  4.924194  4.595261  4.924194
## 28  4.595261  4.924194 24.612739  4.595261  4.924194  4.595261  4.924194
## 29  5.241602  4.595261  4.595261 24.930147  4.595261  5.241602  4.595261
## 30  4.595261  4.924194  4.924194  4.595261 24.612739  4.595261  4.924194
## 31  5.241602  4.595261  4.595261  5.241602  4.595261 24.930147  4.595261
## 32  4.595261  4.924194  4.924194  4.595261  4.924194  4.595261 24.612739
ml[[3]] #for the third cluster
## 5 x 5 sparse Matrix of class "dgCMatrix"
##           33        34        35        36        37
## 33 24.612739  4.924194  4.595261  4.595261  4.595261
## 34  4.924194 24.612739  4.595261  4.595261  4.595261
## 35  4.595261  4.595261 24.930147  5.241602  5.241602
## 36  4.595261  4.595261  5.241602 24.930147  5.241602
## 37  4.595261  4.595261  5.241602  5.241602 24.930147

We can use the bdiag function, which accepts a list of matrices as input, to create a block diagonal matrix. This ml list is used to create the \(V\) matrix.

Vm <- bdiag(ml) #makes a block diagonal weighting matrix
image(Vm) #shows that this is a sparse matrix

\(Vm\) is a block diagonal (sparse) matrix (the \(V\) matrix). This matrix is used when computing the gamma coefficients and standard errors.
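As noted earlier, there are faster ways to assemble \(V\). A minimal sketch, assuming the column ordering of \(Z\) described above (random effects grouped by cluster), builds it in one step and checks it against the loop version:

GG <- kronecker(Diagonal(NG), G) #G repeated along the block diagonal, one block per school
Vm2 <- Z %*% GG %*% t(Z) + Diagonal(nrow(Z)) * sigma(mlm1)^2 #ZGZ' + r in one step
max(abs(Vm - Vm2)) #should be essentially zero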

If using nlme, you can extract the per-cluster matrices using the getVarCov function to put together the \(V\) matrix. This is actually simpler:

### THIS IS JUST IF THE MODEL IS ESTIMATED USING LME
### NOT RUN IN THIS EXAMPLE

ml <- list()
for (j in 1:NG){
  test <- getVarCov(mlm1, individuals = j, type = 'marginal')
  ml[[j]] <- test[[1]]
}
     
Vm <- bdiag(ml)

For the fixed effects: \((X'V^{-1}X)^{-1} X'V^{-1}y\).

gammas <- solve(t(X) %*% solve(Vm) %*% X) %*% (t(X) %*% solve(Vm) %*% y)
gammas #manually computed using existing matrices
## 4 x 1 Matrix of class "dgeMatrix"
##            [,1]
## [1,]  1.2277854
## [2,]  1.1250565
## [3,] -0.8269814
## [4,]  0.2660345
fixef(mlm1) #using the mlm object
## (Intercept)    aritPRET        sex1   schoolSES 
##   1.2277854   1.1250565  -0.8269814   0.2660345

The variance/covariance matrix (used to get the standard errors) is: \((X'V^{-1}X)^{-1}\).

mlm.vc <- solve(t(X) %*% solve(Vm) %*% X)
mlm.vc #manual computation
## 4 x 4 Matrix of class "dgeMatrix"
##              [,1]          [,2]          [,3]          [,4]
## [1,]  0.966855981 -0.0083507665 -0.0195023895 -0.0437988235
## [2,] -0.008350766  0.0008909363  0.0001694822 -0.0001181135
## [3,] -0.019502389  0.0001694822  0.0449218913 -0.0001316506
## [4,] -0.043798823 -0.0001181135 -0.0001316506  0.0024375222
vcov(mlm1) #automatic
## 4 x 4 Matrix of class "dpoMatrix"
##              (Intercept)      aritPRET          sex1     schoolSES
## (Intercept)  0.966855981 -0.0083507665 -0.0195023895 -0.0437988235
## aritPRET    -0.008350766  0.0008909363  0.0001694822 -0.0001181135
## sex1        -0.019502389  0.0001694822  0.0449218913 -0.0001316506
## schoolSES   -0.043798823 -0.0001181135 -0.0001316506  0.0024375222

Putting this together:

gammas <- as.numeric(gammas)
mlm.se <- as.numeric(sqrt(diag(mlm.vc)))
tstat <- gammas / mlm.se
data.frame(vars = names(fixef(mlm1)), estimate = gammas, se = mlm.se, t = tstat)
##          vars   estimate         se         t
## 1 (Intercept)  1.2277854 0.98328835  1.248652
## 2    aritPRET  1.1250565 0.02984856 37.692159
## 3        sex1 -0.8269814 0.21194785 -3.901816
## 4   schoolSES  0.2660345 0.04937127  5.388447
summary(mlm1)$coef
##               Estimate Std. Error   t value
## (Intercept)  1.2277854 0.98328835  1.248652
## aritPRET     1.1250565 0.02984856 37.692159
## sex1        -0.8269814 0.21194785 -3.901816
## schoolSES    0.2660345 0.04937127  5.388447

Results are the same.
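Note that summary(mlm1) reports no p-values, because the appropriate denominator degrees of freedom in a mixed model are not straightforward. If rough p-values are wanted, one common sketch treats the t statistics as z statistics (a normal approximation; use with care):

2 * pnorm(-abs(tstat)) #rough normal-approximation p-values for the fixed effects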

END OF NOTES
