---
title: "350.week9b"
author: "William Revelle"
date: "05/15/2023"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

```{r}
library(psych)
library(psychTools)
```

# Exploratory and Confirmatory Factor Analysis

*Exploratory factor analysis* (EFA) is a technique to better understand the complexity of your data. *Confirmatory factor analysis* (CFA) allows you to test hypotheses about the structure of your data. Both are useful procedures. We have previously discussed EFA; today we talk about CFA.

# The `lavaan` package is both user friendly and very powerful

`lavaan` has been described as the gateway drug to R. It is a powerful system for doing *latent variable analysis*. It has a very helpful web page associated with it and an active user community.

You must first install `lavaan` from CRAN (`install.packages("lavaan")`). Once installed, make it active:

```{r}
library(lavaan)  #we will use this one today
```

Like most packages, `lavaan` has built-in example data sets. We will use the example from the help page for `lavaan`. The data come from Holzinger and Swineford, who examined the performance of elementary school children on a number of cognitive tests. The example uses 9 of the variables. The complete data set and a description of it are in the `holzinger.swineford` dataset in the `psychTools` package. See the much longer discussion in the class notes.

## lavaan syntax is very straightforward:

Regression

    y ~ f1 + f2 + x1 + x2
    f1 ~ f2 + f3
    f2 ~ f3 + x1 + x2

Latent variables

    f1 =~ y1 + y2 + y3
    f2 =~ y4 + y5 + y6
    f3 =~ y7 + y8 + y9 + y10

Variances and covariances

    y1 ~~ y1
    y1 ~~ y2
    f1 ~~ f2

Intercepts

    y1 ~ 1
    f1 ~ 1

We enter the code into `lavaan` by quoting it as a string:

    model =' here is some lavaan code'

Let's do this for the 9 variables in the `HolzingerSwineford1939` data set.
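To see what `lavaan` makes of such a string, one option is to pass it to `lavaanify()`, which expands model syntax into a parameter table without needing any data. The sketch below is illustrative only: `toy.model`, `pt`, and the variables `y1` to `y6` are made-up names, and the three `auto.*` arguments are set to match the `lavaan()` call used with the Holzinger and Swineford data.

```{r}
library(lavaan)

# hypothetical two-factor model; y1..y6 are placeholder variable names
toy.model <- ' f1 =~ y1 + y2 + y3
               f2 =~ y4 + y5 + y6
               f1 ~~ f2 '

# expand the model string into a parameter table (no data required)
pt <- lavaanify(toy.model, auto.var=TRUE, auto.fix.first=TRUE,
                auto.cov.lv.x=TRUE)

# one row per parameter: left side, operator, right side, free/fixed status
pt[, c("lhs", "op", "rhs", "free", "ustart")]
```

Rows with `free` equal to 0 are fixed rather than estimated; with `auto.fix.first=TRUE` that should be the first loading on each factor, with `ustart` holding the fixed value.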
```{r}
# The Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- lavaan(HS.model, data=HolzingerSwineford1939,
              auto.var=TRUE, auto.fix.first=TRUE,
              auto.cov.lv.x=TRUE)
summary(fit, fit.measures=TRUE)
lavaan.diagram(fit)  #show the results
```

This solution fixes the loading of the first variable on each factor to be 1. Alternatively, we can fix the variances of the factors to be 1 by adding `std.lv=TRUE`.

```{r}
# The Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- lavaan(HS.model, data=HolzingerSwineford1939,
              auto.var=TRUE, auto.fix.first=TRUE,
              auto.cov.lv.x=TRUE, std.lv=TRUE)  #note this specification
summary(fit, fit.measures=TRUE)
lavaan.diagram(fit)  #show the results
```

## Compare the CFA solution to the EFA solution from `psych`

```{r efa}
f3 <- fa(HolzingerSwineford1939[7:15], 3)  #columns 7:15 are x1 to x9; ask for three factors
f3  #show the results
fa.diagram(f3)  #show the results, suppress small loadings
fa.diagram(f3, simple=FALSE)  #don't suppress cross loadings
fa.diagram(f3, cut=.1)  #lower the cut for drawing loadings to .1 (the default is .3)
```

## Fixing parameters

Unlike EFA, CFA allows us to fix certain parameters to particular values. Try the HS problem with orthogonal factors, that is, with the covariances between the factors fixed to 0:

```{r orthogonal}
fit.HS.ortho <- cfa(HS.model, data=HolzingerSwineford1939, orthogonal=TRUE)
summary(fit.HS.ortho)
```

Because these are nested models, we can test whether they differ (they do):

```{r}
anova(fit, fit.HS.ortho)
```

The benefit of having fewer parameters to estimate was outweighed by the loss of fit.
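The `orthogonal=TRUE` shortcut is one way to fix parameters. The same constraint can also be written directly in the model string by pre-multiplying a term with a constant. The following is a minimal sketch of that idea, not a replacement for the analysis above; the object names `HS.free`, `HS.fixed`, `fit.free`, and `fit.fixed` are introduced here purely for illustration.

```{r}
library(lavaan)

# the three-factor model with freely estimated factor covariances
HS.free <- ' visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9 '

# the same model, with each factor covariance fixed to 0 by writing 0* in front
HS.fixed <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9
              visual  ~~ 0*textual
              visual  ~~ 0*speed
              textual ~~ 0*speed '

fit.free  <- cfa(HS.free,  data=HolzingerSwineford1939)
fit.fixed <- cfa(HS.fixed, data=HolzingerSwineford1939)

# fixing three covariances frees up three degrees of freedom
fitMeasures(fit.free,  c("chisq", "df", "cfi", "rmsea"))
fitMeasures(fit.fixed, c("chisq", "df", "cfi", "rmsea"))

# chi-square difference test for the nested models
anova(fit.free, fit.fixed)
```

Any other constant can be used in place of 0, and the same pre-multiplication works for loadings (for example `1*x1`), so this form is the more general way to test a specific hypothesized value.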