# R

## Robust standard errors in mixed models

In a recent article in Multivariate Behavioral Research, we (Huang, Wiedermann, and Zhang; HWZ; doi: 10.108000273171.2022.2077290) discuss a robust standard error that can be used with mixed models that accounts for violations of homogeneity. Note that these robust standard errors have been around for years though are not always provided in statistical software. These can also be computed using the CR2 package or the clubSandwich package. This page shows how to compute the traditional Liang and Zeger (1986) robust standard errors (CR0) and the CR2 estimator- see Bell and McCaffrey (2002) as well as McCaffrey, Bell, and Botts (2001) (BM and MBB).

## Logistic regression from scratch (Newton Raphson and Fisher Scoring)

In an earlier post, I had shown this using iteratively reweighted least squares (IRLS). This is just an alternative method using Newton Raphson and the Fisher scoring algorithm. For further details, you can look here as well. library(MLMusingR) data(suspend) m1 <- glm(sus ~ male + gpa * frpl + fight + frmp.c * pminor.c, data = suspend, family = binomial) ### extracting raw components dat <- model.frame(m1) fml <- formula(m1) X <- model.

## Getting GEE estimates using iteratively reweighted least squares (IRLS)

I show this in a recent JEBS article on using Generalized Estimating Equations (GEEs). Shown below is some annotated syntax and examples. Huang, F. (2021). Analyzing cross-sectionally clustered data using generalized estimating equations. Journal of Educational and Behavioral Statistics. doi: 10.310210769986211017480 In the original paper draft, I had a section which showed how much more widely used mixed models (i.e., MLMs, HLMs) were compared to GEEs but was asked to remove that (to save space).

## Using REML and ML for estimation

More notes to self… Obtaining estimates of the unknown parameters in multilevel models is often done by optimizing a likelihood function. The estimates are the values that maximize the likelihood function given certain distributional assumptions. The likelihood function differs depending on whether maximum (ML) or restricted maximum (REML) likelihood is used. For ML, the log likelihood function to be maximized is: [ \ell{ML}(\theta)=-0.5n \times ln(2\pi) -0.5 \times \sum{i}{ln(det(V_i))} - 0.

## Extracting the V matrix for MLMs

Notes to self (and anyone else who might find this useful). With the general linear mixed models (to simplify, I am just omitting super/subscripts): [Y = X\beta + Zu + e] where we assume (u \sim MVN(0, G)) and (e \sim MVN(0, R)). (V) is: [V = ZGZ^T + R] Software estimates (V) iteratively and maximizes the likelihood (the function of which depends on whether ML or REML is used).

## Computing the point estimates and standard errors with mixed models using matrices

1. Estimating the OLS model using matrices As a starting point, just begin with using an OLS model and estimating the results using different matrices. The standard ordinary least squares model (OLS) can be written as: (y = XB + \varepsilon) where (X) is a design matrix, (B) is a column vector of coefficents, plus some random error (\varepsilon). Of interest is (B) which is obtained through: (B = (X'X)^{-1}(X'y)).

## Logistic regression (from scratch) using matrices

Logistic regression is a modeling technique that has attracted a lot of attention, especially from folks interested in classification, machine learning, and prediction using binary outcomes. One of the neat things about using R is that users can revisit commonly used procedures and figure out how they work. What follows are some logistic regression notes (this is not on interpreting results). Even though I’ve written about how other alternatives might be simpler than logistic regression or that there are challenges when comparing coefficients across models, it is interesting to see how the procedure works.

## Fun with PCA using Images

Creating grayscale images using PCA The following links were helpful: 1. https://cran.r-project.org/web/packages/imager/vignettes/gettingstarted.html 2. https://stats.stackexchange.com/questions/229092/how-to-reverse-pca-and-reconstruct-original-variables-from-several-principal-com library(tidyverse) #for ggplot, %>% library(imager) #to read in the jpg image1 <- load.image("C:/Users/huangf/Box Sync/Fun/snorlax/01_data/snorlax_g2.jpg") Can download the image from: http://faculty.missouri.edu/huangf/data/snorlax_g2.jpg Somehow is not reading directly from the website… To view the image plot(image1) ## NOTE 0,0 is in the upper left corner image1 ## Image. Width: 500 pix Height: 550 pix Depth: 1 Colour channels: 1 dim(image1) ## [1] 500 550 1 1 #500 x 550 pixels xwidth = dim(image1)[1] #just saving these yheight = dim(image1)[2] dat <- as.

## Principal Components Analysis using R

[Rough notes: Let me know if there are corrections] Principal components analysis (PCA) is a convenient way to reduce high-dimensional data into a smaller number number of ‘components.’ PCA has been referred to as a data reduction/compression technique (i.e., dimensionality reduction). PCA is often used as a means to an end and is not the end in itself. For example, instead of performing a regression with six (and highly correlated) variables, we may be able to compress the data into one or two meaningful components instead and use these in our models instead of the original six variables.

## Multiple imputation in R (with regression output, clustering, and weights)

The Problem There are several guides on using multiple imputation in R. However, analyzing imputed models with certain options (i.e., with clustering, with weights) is a bit more challenging. More challenging even (at least for me), is getting the results to display a certain way that can be used in publications (i.e., showing regressions in a hierarchical fashion or multiple models side by side) that can be exported to MS Word.