# R

## Computing the point estimates and standard errors with mixed models using matrices

1. Estimating the OLS model using matrices

As a starting point, begin with an OLS model and estimate the results using matrices. The standard ordinary least squares (OLS) model can be written as \(y = XB + \varepsilon\), where \(X\) is a design matrix, \(B\) is a column vector of coefficients, and \(\varepsilon\) is random error. Of interest is \(B\), which is obtained through \(B = (X'X)^{-1}(X'y)\).
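The formula for \(B\) can be checked directly in R. The following is a minimal sketch on simulated data; the variable names (`x1`, `x2`) and the true coefficient values are assumptions made for illustration, not part of the original post. It also computes the usual OLS standard errors from \(\hat{\sigma}^2 (X'X)^{-1}\) and compares both with `lm()`.

```r
set.seed(123)

# Simulated data (hypothetical values, for illustration only)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix with a leading column of 1s for the intercept
X <- cbind(1, x1, x2)

# Point estimates: B = (X'X)^{-1} (X'y)
XtX_inv <- solve(t(X) %*% X)
B <- XtX_inv %*% (t(X) %*% y)

# Standard errors: sqrt of the diagonal of sigma^2 * (X'X)^{-1}
e      <- y - X %*% B                    # residuals
sigma2 <- sum(e^2) / (n - ncol(X))       # residual variance
se     <- sqrt(diag(sigma2 * XtX_inv))

# Compare with lm()
fit <- lm(y ~ x1 + x2)
cbind(B_matrix = as.vector(B), B_lm = coef(fit),
      se_matrix = unname(se), se_lm = summary(fit)$coefficients[, "Std. Error"])
```

The matrix results and the `lm()` results should agree up to numerical precision.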

## Fun with PCA using Images

Creating grayscale images using PCA

The following links were helpful:

1. https://cran.r-project.org/web/packages/imager/vignettes/gettingstarted.html
2. https://stats.stackexchange.com/questions/229092/how-to-reverse-pca-and-reconstruct-original-variables-from-several-principal-com

```r
library(tidyverse) #for ggplot, %>%
library(imager) #to read in the jpg
image1 <- load.image("C:/Users/huangf/Box Sync/Fun/snorlax/01_data/snorlax_g2.jpg")
```

You can download the image from: http://faculty.missouri.edu/huangf/data/snorlax_g2.jpg. Somehow, it is not reading directly from the website…

To view the image:

```r
plot(image1) ## NOTE 0,0 is in the upper left corner
image1
## Image. Width: 500 pix Height: 550 pix Depth: 1 Colour channels: 1
dim(image1)
## [1] 500 550   1   1
#500 x 550 pixels
xwidth = dim(image1)[1] #just saving these
yheight = dim(image1)[2]
dat <- as.
```
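The idea of reversing a PCA to rebuild an image from a few components (the topic of the second link) can be sketched without the image file. The example below is not the post's code: it uses a small synthetic matrix as a hypothetical stand-in for grayscale pixel intensities, and the helper names `reconstruct()` and `mse()` are made up for illustration.

```r
set.seed(123)

# Hypothetical stand-in for a grayscale image: 50 x 40 pixel intensities
img <- outer(sin(seq(0, pi, length.out = 50)),
             cos(seq(0, pi, length.out = 40))) +
  matrix(rnorm(50 * 40, sd = 0.05), nrow = 50, ncol = 40)

# PCA treating each column as a variable (centered, not scaled)
pca <- prcomp(img, center = TRUE, scale. = FALSE)

# Rebuild the data from the first k components:
# scores[, 1:k] %*% t(loadings[, 1:k]), then add the column means back
reconstruct <- function(pca, k) {
  approx <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  sweep(approx, 2, pca$center, "+")
}

# Mean squared reconstruction error
mse <- function(a, b) mean((a - b)^2)

full <- reconstruct(pca, ncol(pca$x))  # all components: recovers the original
low  <- reconstruct(pca, 2)            # two components: compressed version

mse(full, img)
mse(low, img)
```

Using all components recovers the original matrix up to numerical precision, while a small `k` gives a compressed approximation.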

## Principal Components Analysis using R

[Rough notes: Let me know if there are corrections]

Principal components analysis (PCA) is a convenient way to reduce high-dimensional data into a smaller number of ‘components.’ PCA has been referred to as a data reduction/compression technique (i.e., dimensionality reduction). PCA is often used as a means to an end and is not the end in itself. For example, instead of performing a regression with six highly correlated variables, we may be able to compress the data into one or two meaningful components and use these in our models in place of the original six variables.
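The six-variable example can be sketched as follows. The data are simulated under an assumed single common factor, so the variable names (`x1` to `x6`, `f`) and all numeric values are hypothetical and only meant to show the workflow with `prcomp()`.

```r
set.seed(1)

# Simulate six highly correlated predictors driven by one common factor
# (hypothetical data, for illustration only)
n <- 300
f <- rnorm(n)
X <- sapply(1:6, function(j) 0.8 * f + rnorm(n, sd = 0.6))
colnames(X) <- paste0("x", 1:6)
y <- 0.5 * f + rnorm(n)

# PCA on the standardized predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 2)

# Use the first component score in place of the six original variables
pc1 <- pca$x[, 1]
fit <- lm(y ~ pc1)
summary(fit)$coefficients
```

Note that the sign of a component is arbitrary, so the sign of the `pc1` coefficient carries no meaning on its own.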

## Installing R and RStudio on a Pixel Slate

Instructions on how to install R and RStudio on a Pixel Slate (or a Chromebook)

The Google Pixel Slate allows users to run Linux applications. Getting R and RStudio up and running is (relatively) straightforward (after a little bit of trial and error). First, enable Linux by going to the settings of Chrome and then turning on the Linux virtual machine (see below). Click install. Afterward, the Linux terminal should appear.

## Simple Labelled Barchart

Sometimes, a simple side-by-side/comparative bar plot (with labels) is all that is needed to get your point across. For that, Excel can easily plot that in a few seconds with minimum fuss (see figure below). Now replicating that in R seems pretty straightforward. However, several small details require some manual specification. First, let’s provide some data to plot:

```r
x <- c(-1, 0, 1)
Black <- c(23.6, 21.4, 19.4)
White <- c(15.
```
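As a rough sketch of the kind of manual specification involved, the base R example below draws a grouped bar plot with value labels. It is not the post's code: the `Black` values come from the excerpt above, but the second series (`Comparison`) uses made-up numbers purely for illustration, since the original values are cut off.

```r
x <- c(-1, 0, 1)
Black <- c(23.6, 21.4, 19.4)
Comparison <- c(15.0, 14.0, 13.0)  # hypothetical values, for illustration only

# One row per group, one column per value of x
vals <- rbind(Black, Comparison)

# beside = TRUE gives side-by-side bars; barplot() returns the bar midpoints
mids <- barplot(vals, beside = TRUE, names.arg = x,
                ylim = c(0, max(vals) * 1.2),   # headroom for the labels
                legend.text = rownames(vals),
                xlab = "x", ylab = "Value")

# Place each value just above its bar
text(x = mids, y = vals, labels = format(vals, nsmall = 1), pos = 3)
```

The midpoints returned by `barplot()` are what make the label placement work; they have the same shape as `vals`.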

## Instrumental variables within an SEM framework

Earlier this year, I wrote an article on using instrumental variables (IV) to analyze data from randomized experiments with imperfect compliance (read the manuscript for full details; link updated; it’s open access). In the article, I described the steps of IV estimation and the logic behind it. The sample code using two stage least squares regression (the correct analysis) is shown below (see article for specifics):

```r
library(ivpack)
dat <- read.csv('http://faculty.missouri.edu/huangf/data/pubdata/pare/ivexample.csv')
head(dat)
##   assign takeup y
## 1      0      0 0
## 2      0      0 0
## 3      0      0 0
## 4      0      0 0
## 5      0      0 0
## 6      0      0 0
tail(dat)
##     assign takeup  y
## 195      1      1  9
## 196      1      1 10
## 197      1      1 10
## 198      1      1 12
## 199      1      1 11
## 200      1      1  9
summary(dat)
## assign takeup y
## Min.
```
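To illustrate the logic of two stage least squares without the `ivpack` package or the article's data, here is a minimal base R sketch on simulated data. The data-generating values (70% compliance, a treatment effect of 2) are assumptions chosen for illustration; only the column names mirror the dataset above.

```r
set.seed(42)

# Simulated experiment with one-sided noncompliance (hypothetical values)
n <- 200
sim <- data.frame(assign = rep(c(0, 1), each = n / 2))  # random assignment (instrument)
complier   <- rbinom(n, 1, 0.7)                         # would take up if assigned
sim$takeup <- sim$assign * complier                     # actual treatment received
sim$y      <- 2 * sim$takeup + rnorm(n)                 # outcome

# Stage 1: predict take-up from assignment
stage1 <- lm(takeup ~ assign, data = sim)
sim$takeup_hat <- fitted(stage1)

# Stage 2: regress the outcome on predicted take-up
stage2  <- lm(y ~ takeup_hat, data = sim)
iv_2sls <- unname(coef(stage2)["takeup_hat"])

# With a single binary instrument this equals the Wald estimator
iv_wald <- cov(sim$y, sim$assign) / cov(sim$takeup, sim$assign)

c(two_stage = iv_2sls, wald = iv_wald)
```

The point estimate from the manual second stage is correct, but its standard errors are not, because they ignore that `takeup_hat` is itself estimated. That is one reason to use a dedicated IV routine for the actual analysis.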

## Multiple imputation in R (with regression output, clustering, and weights)

The Problem

There are several guides on using multiple imputation in R. However, analyzing imputed models with certain options (i.e., with clustering, with weights) is a bit more challenging. Even more challenging (at least for me) is getting the results to display in a way that can be used in publications (e.g., showing regressions in a hierarchical fashion or multiple models side by side) and exported to MS Word.

## Class Example: Standard Errors Too Small

In our module on regression diagnostics, I mentioned 1) that at times (with clustered data) standard errors may be misestimated and may be too low, resulting in a greater chance of making a Type I error (i.e., claiming that results are statistically significant when they should not be). In our ANCOVA session, I also indicated that 2) covariates are helpful because they help to lower the (standard) error in the model and increase power.
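The first point can be illustrated with a small simulation. This is a sketch under assumed settings (30 clusters of 20, a cluster-level predictor with a true effect of zero, and equal between- and within-cluster variance); none of these values come from the class example itself.

```r
set.seed(2024)

n_clusters <- 30   # hypothetical number of clusters
n_per      <- 20   # hypothetical cluster size

one_rep <- function() {
  x <- rep(rnorm(n_clusters), each = n_per)   # cluster-level predictor
  u <- rep(rnorm(n_clusters), each = n_per)   # cluster random effect
  y <- u + rnorm(n_clusters * n_per)          # true effect of x is zero
  fit <- summary(lm(y ~ x))$coefficients      # OLS ignoring the clustering
  c(est = fit["x", "Estimate"],
    se  = fit["x", "Std. Error"],
    p   = fit["x", "Pr(>|t|)"])
}

res <- t(replicate(500, one_rep()))

# The OLS standard error should approximate the SD of the estimates
# across replications; with clustered data it is too small
c(empirical_sd = sd(res[, "est"]),
  mean_ols_se  = mean(res[, "se"]),
  type1_rate   = mean(res[, "p"] < .05))
```

If the OLS standard errors were accurate, `mean_ols_se` would be close to `empirical_sd` and the Type I error rate would be near .05.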