library(psych)
library(psychTools)
Exploratory factor analysis (EFA) is a technique to better understand the complexity of your data. Confirmatory Factor Analysis allows you to test hypotheses about the structure of your data. Both are useful procedures. We have previously discussed EFA, today we talk about CFA.
lavaan
package is both user friendly and very
powerfullavaan
has been described as the gateway drug to R. It
is a powerful system for doing latent variable analysis. It has
a very helfpul web page associated with it and an active user
community.
You must first go to CRAN and install lavaan. Once installed, make it active:
library(lavaan) #we will use this one today
## This is lavaan 0.6-15
## lavaan is FREE software! Please report any bugs.
##
## Attaching package: 'lavaan'
## The following object is masked from 'package:psych':
##
## cor2cov
Like most packages, lavaan
has built in example data
sets. We will use the one from the help menu for lavaan
.
The data come from Holzinger and Swineford who examined the performance
of elementary school children on a number of cognitive tests. The
example is for 9 of the variables. The complete data set and a
description of it is in the holzinger.swineford
dataset in
the psychTools
package.
See the much longer discussion in the class notes.
Regression
y ~ f1 + f2 + x1 + x2
f1 ~ f2 + f3
f2 ~ f3 + x1 + x2
Latent variables
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10
Variances and covariances
y1 ~~ y1
y1 ~~ y2
f1 ~~ f2
Intercepts
y1 ~ 1
f1 ~ 1
We enter the code into lavaan
by quoting it as a
string:
model =' here is some lavaan code'
Lets do this for the 9 variables in the HolzingerSwineford data set.
# The Holzinger and Swineford (1939) example
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- lavaan(HS.model, data=HolzingerSwineford1939,
auto.var=TRUE, auto.fix.first=TRUE,
auto.cov.lv.x=TRUE)
summary(fit, fit.measures=TRUE)
## lavaan 0.6.15 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 85.306
## Degrees of freedom 24
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.931
## Tucker-Lewis Index (TLI) 0.896
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3737.745
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7517.490
## Bayesian (BIC) 7595.339
## Sample-size adjusted Bayesian (SABIC) 7528.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.092
## 90 Percent confidence interval - lower 0.071
## 90 Percent confidence interval - upper 0.114
## P-value H_0: RMSEA <= 0.050 0.001
## P-value H_0: RMSEA >= 0.080 0.840
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.065
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 1.000
## x2 0.554 0.100 5.554 0.000
## x3 0.729 0.109 6.685 0.000
## textual =~
## x4 1.000
## x5 1.113 0.065 17.014 0.000
## x6 0.926 0.055 16.703 0.000
## speed =~
## x7 1.000
## x8 1.180 0.165 7.152 0.000
## x9 1.082 0.151 7.155 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## visual ~~
## textual 0.408 0.074 5.552 0.000
## speed 0.262 0.056 4.660 0.000
## textual ~~
## speed 0.173 0.049 3.518 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 0.549 0.114 4.833 0.000
## .x2 1.134 0.102 11.146 0.000
## .x3 0.844 0.091 9.317 0.000
## .x4 0.371 0.048 7.779 0.000
## .x5 0.446 0.058 7.642 0.000
## .x6 0.356 0.043 8.277 0.000
## .x7 0.799 0.081 9.823 0.000
## .x8 0.488 0.074 6.573 0.000
## .x9 0.566 0.071 8.003 0.000
## visual 0.809 0.145 5.564 0.000
## textual 0.979 0.112 8.737 0.000
## speed 0.384 0.086 4.451 0.000
lavaan.diagram(fit) #show the results
This solution fixes the variances of the first variable for each factor to be 1. Alternatively, we can fix the variance of the factors to be 1.
# The Holzinger and Swineford (1939) example
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- lavaan(HS.model, data=HolzingerSwineford1939,
auto.var=TRUE, auto.fix.first=TRUE,
auto.cov.lv.x=TRUE,std.lv=TRUE) #note this specification
summary(fit, fit.measures=TRUE)
## lavaan 0.6.15 ended normally after 20 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 85.306
## Degrees of freedom 24
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 918.852
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.931
## Tucker-Lewis Index (TLI) 0.896
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3737.745
## Loglikelihood unrestricted model (H1) -3695.092
##
## Akaike (AIC) 7517.490
## Bayesian (BIC) 7595.339
## Sample-size adjusted Bayesian (SABIC) 7528.739
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.092
## 90 Percent confidence interval - lower 0.071
## 90 Percent confidence interval - upper 0.114
## P-value H_0: RMSEA <= 0.050 0.001
## P-value H_0: RMSEA >= 0.080 0.840
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.065
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 0.900 0.081 11.128 0.000
## x2 0.498 0.077 6.429 0.000
## x3 0.656 0.074 8.817 0.000
## textual =~
## x4 0.990 0.057 17.474 0.000
## x5 1.102 0.063 17.576 0.000
## x6 0.917 0.054 17.082 0.000
## speed =~
## x7 0.619 0.070 8.903 0.000
## x8 0.731 0.066 11.090 0.000
## x9 0.670 0.065 10.305 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## visual ~~
## textual 0.459 0.064 7.189 0.000
## speed 0.471 0.073 6.461 0.000
## textual ~~
## speed 0.283 0.069 4.117 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 0.549 0.114 4.833 0.000
## .x2 1.134 0.102 11.146 0.000
## .x3 0.844 0.091 9.317 0.000
## .x4 0.371 0.048 7.779 0.000
## .x5 0.446 0.058 7.642 0.000
## .x6 0.356 0.043 8.277 0.000
## .x7 0.799 0.081 9.823 0.000
## .x8 0.488 0.074 6.573 0.000
## .x9 0.566 0.071 8.003 0.000
## visual 1.000
## textual 1.000
## speed 1.000
lavaan.diagram(fit) #show the results
psych
f3 <- fa(HolzingerSwineford1939[7:15], 3) #specify we want three factors
## Loading required namespace: GPArotation
f3 #show the results
## Factor Analysis using method = minres
## Call: fa(r = HolzingerSwineford1939[7:15], nfactors = 3)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR3 MR2 h2 u2 com
## x1 0.20 0.59 0.03 0.48 0.52 1.2
## x2 0.04 0.51 -0.12 0.26 0.74 1.1
## x3 -0.06 0.69 0.02 0.45 0.55 1.0
## x4 0.85 0.02 0.01 0.73 0.27 1.0
## x5 0.89 -0.06 0.01 0.75 0.25 1.0
## x6 0.81 0.08 -0.01 0.69 0.31 1.0
## x7 0.04 -0.15 0.74 0.52 0.48 1.1
## x8 -0.03 0.13 0.69 0.52 0.48 1.1
## x9 0.03 0.38 0.46 0.46 0.54 2.0
##
## MR1 MR3 MR2
## SS loadings 2.24 1.34 1.27
## Proportion Var 0.25 0.15 0.14
## Cumulative Var 0.25 0.40 0.54
## Proportion Explained 0.46 0.28 0.26
## Cumulative Proportion 0.46 0.74 1.00
##
## With factor correlations of
## MR1 MR3 MR2
## MR1 1.00 0.32 0.21
## MR3 0.32 1.00 0.26
## MR2 0.21 0.26 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 3.05 with Chi Square = 904.1
## df of the model are 12 and the objective function was 0.08
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic n.obs is 301 with the empirical chi square 7.87 with prob < 0.8
## The total n.obs was 301 with Likelihood Chi Square = 22.56 with prob < 0.032
##
## Tucker Lewis Index of factoring reliability = 0.963
## RMSEA index = 0.054 and the 90 % confidence intervals are 0.016 0.088
## BIC = -45.93
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR3 MR2
## Correlation of (regression) scores with factors 0.94 0.84 0.85
## Multiple R square of scores with factors 0.89 0.71 0.72
## Minimum correlation of possible factor scores 0.78 0.42 0.45
fa.diagram(f3) #show the results, suppress small loadings
fa.diagram(f3,simple=FALSE) #don't suppress cross loading
fa.diagram(f3,cut =.1) #supress cross loadings
Unlike EFA, CFA allows us to fix certain parameters to particular values.
Try the HS problem with orthogonal factors
fit.HS.ortho <- cfa(HS.model, data=HolzingerSwineford1939, orthogonal=TRUE)
summary(fit.HS.ortho)
## lavaan 0.6.15 ended normally after 32 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 153.527
## Degrees of freedom 27
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 1.000
## x2 0.778 0.141 5.532 0.000
## x3 1.107 0.214 5.173 0.000
## textual =~
## x4 1.000
## x5 1.133 0.067 16.906 0.000
## x6 0.924 0.056 16.391 0.000
## speed =~
## x7 1.000
## x8 1.225 0.190 6.460 0.000
## x9 0.854 0.121 7.046 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## visual ~~
## textual 0.000
## speed 0.000
## textual ~~
## speed 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 0.835 0.118 7.064 0.000
## .x2 1.065 0.105 10.177 0.000
## .x3 0.633 0.129 4.899 0.000
## .x4 0.382 0.049 7.805 0.000
## .x5 0.416 0.059 7.038 0.000
## .x6 0.369 0.044 8.367 0.000
## .x7 0.746 0.086 8.650 0.000
## .x8 0.366 0.097 3.794 0.000
## .x9 0.696 0.072 9.640 0.000
## visual 0.524 0.130 4.021 0.000
## textual 0.969 0.112 8.640 0.000
## speed 0.437 0.097 4.520 0.000
Because these are nested models, can compare whether they differ (they do)
anova(fit,fit.HS.ortho)
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit 24 7517.5 7595.3 85.305
## fit.HS.ortho 27 7579.7 7646.4 153.527 68.222 0.26875 3 1.026e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The benefit of having fewer parameters to estimate was overweighed by the lack of fit.