---
title: "350.week9b"
author: "William Revelle"
date: "05/15/2023"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

```{r}
library(psych)
library(psychTools)

```
# Exploratory and Confirmatory Factor Analysis

*Exploratory factor analysis* (EFA) is a technique to better understand the complexity of your data.  *Confirmatory factor analysis* (CFA) allows you to test hypotheses about the structure of your data.  Both are useful procedures.  We have previously discussed EFA; today we talk about CFA.

# The `lavaan` package is both user friendly and very powerful

`lavaan` has been described as the gateway drug to R.  It is a powerful system for doing *latent variable analysis*.  It has a very helpful web page associated with it and an active user community.

You must first install `lavaan` from CRAN (for example, with `install.packages("lavaan")`).  Once installed, make it active:
  
```{r}
library(lavaan)  #we will use this one today
```
Like most packages, `lavaan` has built-in example data sets.  We will use the one from the help page for `lavaan`. The data come from Holzinger and Swineford, who examined the performance of elementary school children on a number of cognitive tests.  The example uses 9 of the variables.  The complete data set and a description of it are in the `holzinger.swineford` dataset in the `psychTools` package.

See the much longer discussion in the class notes.  

## lavaan syntax is very straightforward:

    Regression
      y ~ f1 + f2 + x1 + x2
      f1 ~ f2 + f3
      f2 ~ f3 + x1 + x2

    Latent variables
      f1 =~ y1 + y2 + y3
      f2 =~ y4 + y5 + y6
      f3 =~ y7 + y8 + y9 + y10

    Variances and covariances
      y1 ~~ y1
      y1 ~~ y2
      f1 ~~ f2

    Intercepts
      y1 ~ 1
      f1 ~ 1


We pass the model to `lavaan` by quoting it as a string:

      model <- ' here is some lavaan code '
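
As a small sketch of how the operators combine in a single string, the made-up model below defines two latent variables, a regression between them, and one residual covariance.  The names `f1`, `f2`, and `y1` to `y6` are placeholders, not variables in any data set.  `lavaanify()` parses the string into a parameter table without needing data, which is a quick way to check that the syntax says what you meant.

```{r}
library(lavaan)

# hypothetical model: placeholder names, used only to check the syntax
model <- ' f1 =~ y1 + y2 + y3     # latent variable definitions
           f2 =~ y4 + y5 + y6
           f2 ~ f1                # regression among the latent variables
           y1 ~~ y4               # residual covariance
         '

ptable <- lavaanify(model)        # parse the string; no data are needed
ptable[, c("lhs", "op", "rhs")]   # one row per parameter named in the string
```

Each row of the table is one parameter, with the operator in the `op` column.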
      
Let's do this for the 9 variables in the `HolzingerSwineford1939` data set.


```{r}
# The Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- lavaan(HS.model, data=HolzingerSwineford1939,
              auto.var=TRUE, auto.fix.first=TRUE,
              auto.cov.lv.x=TRUE)
summary(fit, fit.measures=TRUE)
lavaan.diagram(fit) #show the results
```
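
The three `auto.*` arguments passed to `lavaan()` above are among the defaults that the `cfa()` wrapper turns on for you, so the same model can be fit with less typing.  A minimal sketch, assuming the default settings of a recent `lavaan` release (the object names `fit.long` and `fit.short` are just for illustration):

```{r}
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# explicit call: ask for variances, marker loadings, and factor covariances
fit.long <- lavaan(HS.model, data = HolzingerSwineford1939,
                   auto.var = TRUE, auto.fix.first = TRUE,
                   auto.cov.lv.x = TRUE)

# wrapper call: cfa() sets the same options by default
fit.short <- cfa(HS.model, data = HolzingerSwineford1939)

# the two fits should agree on chi square and degrees of freedom
rbind(long  = fitMeasures(fit.long,  c("chisq", "df")),
      short = fitMeasures(fit.short, c("chisq", "df")))
```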

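This solution sets the scale of each factor by fixing the loading of its first indicator to 1 (`auto.fix.first=TRUE`); the factor variances are then estimated.  Alternatively, we can free those loadings and fix the variance of each factor to be 1.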

```{r}
# The Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- lavaan(HS.model, data=HolzingerSwineford1939,
              auto.var=TRUE, auto.fix.first=FALSE,  #do not also fix the first loadings
              auto.cov.lv.x=TRUE, std.lv=TRUE)      #note this specification: factor variances are fixed to 1
summary(fit, fit.measures=TRUE)
lavaan.diagram(fit) #show the results
```
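
The two ways of setting the scale of the factors are different parameterizations of the same model, so the overall fit and the standardized loadings should not change.  The sketch below checks this with the `cfa()` wrapper; `fit.marker` and `fit.stdlv` are names introduced here for illustration.

```{r}
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# first loading of each factor fixed to 1 (the cfa() default)
fit.marker <- cfa(HS.model, data = HolzingerSwineford1939)
# factor variances fixed to 1 instead
fit.stdlv  <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)

# overall fit under the two identification choices
measures <- c("chisq", "df", "cfi", "rmsea")
round(rbind(marker = fitMeasures(fit.marker, measures),
            std.lv = fitMeasures(fit.stdlv,  measures)), 3)

# standardized loadings under the two identification choices
std.marker <- standardizedSolution(fit.marker)
std.stdlv  <- standardizedSolution(fit.stdlv)
is.loading <- std.marker$op == "=~"
data.frame(factor = std.marker$lhs[is.loading],
           item   = std.marker$rhs[is.loading],
           marker = round(std.marker$est.std[is.loading], 3),
           std.lv = round(std.stdlv$est.std[std.stdlv$op == "=~"], 3))
```

Only the unstandardized estimates depend on which constraint is used to identify the model.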


## Compare the CFA solution to the EFA solution from `psych`

```{r efa}
f3 <- fa(HolzingerSwineford1939[7:15], 3) #specify we want three factors
f3  #show the results
fa.diagram(f3) #show the results, suppress small loadings and cross loadings
fa.diagram(f3,simple=FALSE)  #don't suppress cross loadings (those above the default cut of .3)
fa.diagram(f3,cut =.1) #lower the cut to .1; cross loadings are still suppressed
```
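
The diagrams allow a visual comparison.  One way to put a number on the similarity is a congruence coefficient (the cosine between columns of two loading matrices).  The sketch below computes it by hand rather than relying on a packaged function; the helper `congruence()` and the object names are introduced here for illustration, and the default oblimin rotation in `fa()` assumes the `GPArotation` package is installed.

```{r}
library(psych)
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit.cfa <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)
f3 <- fa(HolzingerSwineford1939[7:15], 3)   # EFA with three factors

cfa.load <- unclass(lavInspect(fit.cfa, "std")$lambda)  # 9 x 3 standardized CFA loadings
efa.load <- unclass(f3$loadings)                        # 9 x 3 EFA pattern loadings

# congruence: cross products of the columns, scaled by the column lengths
congruence <- function(A, B) {
  crossprod(A, B) / sqrt(outer(colSums(A^2), colSums(B^2)))
}

round(congruence(cfa.load, efa.load), 2)    # rows are CFA factors, columns are EFA factors
```

The EFA factors are not necessarily in the same order as the CFA factors, so look for the largest value in each row rather than along the diagonal.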

## Fixing parameters

Unlike EFA, CFA allows us to fix certain parameters to particular values.

Try the HS problem with orthogonal factors, that is, with the covariances among the factors fixed to 0.

```{r orthogonal}
 fit.HS.ortho <- cfa(HS.model, data=HolzingerSwineford1939, orthogonal=TRUE)
summary(fit.HS.ortho)
```
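
The `orthogonal=TRUE` argument is a shortcut.  The same constraints can be written directly in the model string by premultiplying a parameter by the value it should be fixed to, here `0*` for each factor covariance.  A sketch (the names `HS.fixed.model` and `fit.HS.fixed` are made up for this example):

```{r}
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

HS.fixed.model <- ' visual  =~ x1 + x2 + x3
                    textual =~ x4 + x5 + x6
                    speed   =~ x7 + x8 + x9
                    # fix each factor covariance to zero
                    visual  ~~ 0*textual
                    visual  ~~ 0*speed
                    textual ~~ 0*speed
                  '

fit.HS.fixed <- cfa(HS.fixed.model, data = HolzingerSwineford1939)
fit.HS.ortho <- cfa(HS.model, data = HolzingerSwineford1939, orthogonal = TRUE)

# both specifications should give the same chi square and degrees of freedom
rbind(fixed.in.syntax = fitMeasures(fit.HS.fixed, c("chisq", "df")),
      orthogonal.arg  = fitMeasures(fit.HS.ortho, c("chisq", "df")))
```

Writing the constraints in the string is more flexible, because any single parameter can be fixed to any value, not just all of the factor covariances to 0.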

Because the orthogonal model is nested within the correlated factors model, we can test whether the two differ in fit (they do).

```{r}
anova(fit,fit.HS.ortho)
```
The benefit of having fewer parameters to estimate was outweighed by the loss of fit.
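
The same trade-off can be seen in information criteria, which penalize each free parameter.  A sketch that refits both models with `cfa()` so the chunk stands alone (`fit.oblique` is a name introduced here):

```{r}
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit.oblique  <- cfa(HS.model, data = HolzingerSwineford1939)                     # correlated factors
fit.HS.ortho <- cfa(HS.model, data = HolzingerSwineford1939, orthogonal = TRUE)  # uncorrelated factors

# number of free parameters, fit, and information criteria side by side
measures <- c("npar", "chisq", "df", "aic", "bic")
round(rbind(oblique    = fitMeasures(fit.oblique,  measures),
            orthogonal = fitMeasures(fit.HS.ortho, measures)), 2)
```

The correlated factors model estimates three more parameters (the factor covariances).  Smaller AIC and BIC values indicate the better balance of fit and parsimony.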