Factor Analysis Homework

Preliminaries

The first step is to make sure that you have the most recent version of the psych package and that you have made it active.

#install.packages("psych",repos="http://personality-project.org/r", type="source")
library(psych) #make it active
sessionInfo()   #psych should be 1.8.4
## R Under development (unstable) (2018-03-26 r74469)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] psych_1.8.4
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16    lattice_0.20-35 digest_0.6.15   rprojroot_1.3-2 grid_3.6.0      nlme_3.1-136   
##  [7] backports_1.1.2 magrittr_1.5    evaluate_0.10.1 stringi_1.1.7   rmarkdown_1.9   tools_3.6.0    
## [13] foreign_0.8-70  stringr_1.3.0   yaml_2.1.18     parallel_3.6.0  compiler_3.6.0  mnormt_1.5-5   
## [19] htmltools_0.3.6 knitr_1.20

Problem 1: A mood data set

Emotions may be described either as discrete emotions or in dimensional terms. The Motivational State Questionnaire (MSQ) was developed to study emotions in laboratory and field settings. The data can be well described in terms of a two dimensional solution of energy vs tiredness and tension versus calmness. Additional items include what time of day the data were collected and a few personality questionnaire scores. 3082 unique participants took the MSQ at least once, 2753 at least twice, 446 three times, and 181 four times. The 3032 also took the sai state anxiety inventory at the same time.

Here we examine the factor structure of 12 mood items.

The first step is to select them from the larger set and then to describe them.

We use the acs function in psych to convert a string into a character vector. We alphabetize them as a demonstration of ordering.

The msqR data set includes multiple occasions. We choose just the first time point.

Problem 1

  1. select a subset of mood data (active,energetic,vigorous,sleepy,tired,drowsy,intense, jittery, fearful,at.rest,calm,still)

  2. Find the descriptive statistics

  3. Examine the correlation matrix

  4. how many factors are in the data?

  5. What are they

  6. Plot them

  7. Reorganize the correlation matrix for a cleaner structure

Problem 2 Ability

16 multiple choice ability items 1525 subjects taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project are saved as iqitems. Those data are shown as examples of how to score multiple choice tests and analyses of response alternatives. When scored correct or incorrect, the data are useful for demonstrations of tetrachoric based factor analysis irt.fa and finding tetrachoric correlations.

  1. select 16 ability items from the ability data set

  2. Find the descriptive statistics

  3. Examine the correlation matrix

  4. how many factors are in the data?

  5. What are they

  6. Plot them

  7. Reorganize the correlation matrix for a cleaner structure

Problem 3 The Big 5

25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.

  1. select 25 ‘personality’ items from the bfi data set

  2. Find the descriptive statistics

  3. Examine the correlation matrix

  4. how many factors are in the data?

  5. What are they

  6. Plot them

Answers to problem 1

select <- acs(active,energetic,vigorous,sleepy,tired,drowsy,intense, jittery, fearful,at.rest,calm,still)
#alphabetize these 
 select <- select[order(select)]
select   #show the alpha list
##  [1] "active"    "at.rest"   "calm"      "drowsy"    "energetic" "fearful"   "intense"   "jittery"  
##  [9] "sleepy"    "still"     "tired"     "vigorous"
msqR1 <- subset(msqR,msqR$time==1) #choose just the first time 
#Always describe the data
describe(msqR1[select])
##           vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
## active       1 3026 1.03 0.93      1    0.95 1.48   0   3     3  0.47    -0.75 0.02
## at.rest      2 3014 1.18 0.93      1    1.11 1.48   0   3     3  0.34    -0.77 0.02
## calm         3 3020 1.56 0.92      2    1.57 1.48   0   3     3 -0.05    -0.84 0.02
## drowsy       4 3021 1.18 1.03      1    1.10 1.48   0   3     3  0.43    -0.98 0.02
## energetic    5 3024 0.83 0.91      1    0.72 1.48   0   3     3  0.79    -0.36 0.02
## fearful      6 3015 0.16 0.48      0    0.02 0.00   0   3     3  3.57    14.07 0.01
## intense      7 3024 0.73 0.88      0    0.61 0.00   0   3     3  0.93    -0.14 0.02
## jittery      8 3026 0.66 0.83      0    0.53 0.00   0   3     3  1.10     0.41 0.02
## sleepy       9 3015 1.28 1.05      1    1.23 1.48   0   3     3  0.34    -1.09 0.02
## still       10 3021 1.17 0.89      1    1.12 1.48   0   3     3  0.33    -0.65 0.02
## tired       11 3019 1.42 1.02      1    1.40 1.48   0   3     3  0.18    -1.09 0.02
## vigorous    12 3020 0.63 0.82      0    0.51 0.00   0   3     3  1.06     0.17 0.01
#show the correlation matrix
R <- lowerCor(msqR1[select])
##           activ at.rs calm  drwsy enrgt ferfl intns jttry slepy still tired vigrs
## active     1.00                                                                  
## at.rest    0.12  1.00                                                            
## calm       0.03  0.48  1.00                                                      
## drowsy    -0.39 -0.09  0.07  1.00                                                
## energetic  0.72  0.14  0.04 -0.45  1.00                                          
## fearful    0.03 -0.14 -0.17  0.05  0.03  1.00                                    
## intense    0.39 -0.08 -0.20 -0.15  0.38  0.27  1.00                              
## jittery    0.27 -0.19 -0.31 -0.13  0.29  0.29  0.41  1.00                        
## sleepy    -0.39 -0.09  0.03  0.83 -0.46  0.05 -0.16 -0.14  1.00                  
## still     -0.20  0.29  0.44  0.28 -0.21 -0.05 -0.16 -0.20  0.25  1.00            
## tired     -0.42 -0.11  0.02  0.76 -0.48  0.06 -0.16 -0.13  0.79  0.25  1.00      
## vigorous   0.66  0.12  0.02 -0.36  0.72  0.08  0.39  0.30 -0.38 -0.19 -0.39  1.00

How many factors are in the data?

This is a hard problem. We try three different approaches: parallel analysis, Very Simple Structure, and then a whole range of tests using nfactors.

Parallel analysis compares the solution to a random set of data. Although useful for 100-400 subjects, this is probably not as helpful for 3000 subjects.

fa.parallel(msqR1[select])

## Parallel analysis suggests that the number of factors =  4  and the number of components =  3

Very Simple Structure asks how many factors are ‘simple’. It tests the goodness of fit for 1 … 8 factors where every item is thought to load on 1 .. 4 factors. This is essentially what people do when they try to interpret a factor solution.

VSS(msqR1[select])

## 
## Very Simple Structure
## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
##     n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
## VSS complexity 1 achieves a maximimum of 0.69  with  2  factors
## VSS complexity 2 achieves a maximimum of 0.87  with  4  factors
## 
## The Velicer MAP achieves a minimum of 0.05  with  3  factors 
## BIC achieves a minimum of  NA  with  4  factors
## Sample Size adjusted BIC achieves a minimum of  NA  with  4  factors
## 
## Statistics by number of factors 
##   vss1 vss2   map dof   chisq    prob sqresid  fit RMSEA  BIC  SABIC complex  eChisq    SRMR  eCRMS
## 1 0.64 0.00 0.074  54 7.6e+03 0.0e+00     9.7 0.64 0.215 7184 7355.1     1.0 9.5e+03 1.5e-01 0.1704
## 2 0.69 0.79 0.078  43 5.3e+03 0.0e+00     5.5 0.79 0.201 4934 5070.4     1.2 4.1e+03 1.0e-01 0.1250
## 3 0.66 0.87 0.046  33 5.3e+02 2.9e-90     3.1 0.88 0.070  262  367.0     1.4 3.9e+02 3.1e-02 0.0443
## 4 0.64 0.87 0.050  24 5.0e+01 1.3e-03     2.4 0.91 0.019 -142  -65.7     1.4 1.6e+01 6.3e-03 0.0105
## 5 0.64 0.84 0.078  16 2.8e+01 3.0e-02     2.1 0.92 0.016 -100  -49.2     1.4 7.0e+00 4.2e-03 0.0085
## 6 0.65 0.87 0.115   9 2.0e+01 1.8e-02     1.8 0.93 0.020  -52  -23.5     1.5 3.3e+00 2.9e-03 0.0078
## 7 0.63 0.85 0.233   3 6.4e+00 9.6e-02     1.6 0.94 0.019  -18   -8.2     1.6 7.4e-01 1.4e-03 0.0064
## 8 0.65 0.84 0.255  -2 2.2e-04      NA     1.7 0.94    NA   NA     NA     1.7 3.2e-05 8.9e-06     NA
##   eBIC
## 1 9072
## 2 3730
## 3  128
## 4 -176
## 5 -121
## 6  -69
## 7  -23
## 8   NA

An alternative is to consider 10 different tests using the nfactors function.

nfactors(msqR1[select])

## 
## Number of factors
## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
##     n.obs = n.obs, plot = FALSE, title = title, use = use, cor = cor)
## VSS complexity 1 achieves a maximimum of 0.69  with  2  factors
## VSS complexity 2 achieves a maximimum of 0.88  with  9  factors
## The Velicer MAP achieves a minimum of 0.05  with  3  factors 
## Empirical BIC achieves a minimum of  -176.34  with  4  factors
## Sample Size adjusted BIC achieves a minimum of  -65.73  with  4  factors
## 
## Statistics by number of factors 
##    vss1 vss2   map dof   chisq    prob sqresid  fit RMSEA  BIC  SABIC complex  eChisq    SRMR
## 1  0.64 0.00 0.074  54 7.6e+03 0.0e+00     9.7 0.64 0.215 7184 7355.1     1.0 9.5e+03 1.5e-01
## 2  0.69 0.79 0.078  43 5.3e+03 0.0e+00     5.5 0.79 0.201 4934 5070.4     1.2 4.1e+03 1.0e-01
## 3  0.66 0.87 0.046  33 5.3e+02 2.9e-90     3.1 0.88 0.070  262  367.0     1.4 3.9e+02 3.1e-02
## 4  0.64 0.87 0.050  24 5.0e+01 1.3e-03     2.4 0.91 0.019 -142  -65.7     1.4 1.6e+01 6.3e-03
## 5  0.64 0.84 0.078  16 2.8e+01 3.0e-02     2.1 0.92 0.016 -100  -49.2     1.4 7.0e+00 4.2e-03
## 6  0.65 0.87 0.115   9 2.0e+01 1.8e-02     1.8 0.93 0.020  -52  -23.5     1.5 3.3e+00 2.9e-03
## 7  0.63 0.85 0.233   3 6.4e+00 9.6e-02     1.6 0.94 0.019  -18   -8.2     1.6 7.4e-01 1.4e-03
## 8  0.65 0.84 0.255  -2 2.2e-04      NA     1.7 0.94    NA   NA     NA     1.7 3.2e-05 8.9e-06
## 9  0.64 0.88 0.306  -6 3.6e-06      NA     2.1 0.92    NA   NA     NA     1.5 9.4e-07 1.5e-06
## 10 0.64 0.87 0.485  -9 9.1e-08      NA     2.1 0.92    NA   NA     NA     1.5 1.3e-08 1.8e-07
## 11 0.64 0.87 1.000 -11 0.0e+00      NA     2.1 0.92    NA   NA     NA     1.5 7.6e-14 4.3e-10
## 12 0.64 0.87    NA -12 0.0e+00      NA     2.1 0.92    NA   NA     NA     1.5 7.6e-14 4.3e-10
##     eCRMS eBIC
## 1  0.1704 9072
## 2  0.1250 3730
## 3  0.0443  128
## 4  0.0105 -176
## 5  0.0085 -121
## 6  0.0078  -69
## 7  0.0064  -23
## 8      NA   NA
## 9      NA   NA
## 10     NA   NA
## 11     NA   NA
## 12     NA   NA

What are these factors?

f2 <- fa(msqR1[select],2)  #a two factor solution
## Loading required namespace: GPArotation
f2 # show the solution
## Factor Analysis using method =  minres
## Call: fa(r = msqR1[select], nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##             MR1   MR2   h2   u2 com
## active     0.69  0.10 0.50 0.50 1.0
## at.rest    0.22 -0.55 0.33 0.67 1.3
## calm       0.08 -0.68 0.45 0.55 1.0
## drowsy    -0.76  0.06 0.57 0.43 1.0
## energetic  0.76  0.08 0.60 0.40 1.0
## fearful   -0.03  0.35 0.12 0.88 1.0
## intense    0.32  0.41 0.29 0.71 1.9
## jittery    0.22  0.52 0.34 0.66 1.4
## sleepy    -0.78  0.08 0.60 0.40 1.0
## still     -0.24 -0.43 0.26 0.74 1.6
## tired     -0.79  0.10 0.61 0.39 1.0
## vigorous   0.66  0.13 0.47 0.53 1.1
## 
##                        MR1  MR2
## SS loadings           3.58 1.58
## Proportion Var        0.30 0.13
## Cumulative Var        0.30 0.43
## Proportion Explained  0.69 0.31
## Cumulative Proportion 0.69 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.11
## MR2 0.11 1.00
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  66  and the objective function was  5.49 with Chi Square of  16602.24
## The degrees of freedom for the model are 43  and the objective function was  1.75 
## 
## The root mean square of the residuals (RMSR) is  0.1 
## The df corrected root mean square of the residuals is  0.13 
## 
## The harmonic number of observations is  3015 with the empirical chi square  4051.3  with prob <  0 
## The total number of observations was  3032  with Likelihood Chi Square =  5278.5  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.514
## RMSEA index =  0.201  and the 90 % confidence intervals are  0.196 0.205
## BIC =  4933.77
## Fit based upon off diagonal values = 0.91
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.94 0.84
## Multiple R square of scores with factors          0.89 0.71
## Minimum correlation of possible factor scores     0.78 0.42
fa.sort(f2)   #show it in a more meaningful way
## Factor Analysis using method =  minres
## Call: fa(r = msqR1[select], nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##             MR1   MR2   h2   u2 com
## tired     -0.79  0.10 0.61 0.39 1.0
## sleepy    -0.78  0.08 0.60 0.40 1.0
## energetic  0.76  0.08 0.60 0.40 1.0
## drowsy    -0.76  0.06 0.57 0.43 1.0
## active     0.69  0.10 0.50 0.50 1.0
## vigorous   0.66  0.13 0.47 0.53 1.1
## calm       0.08 -0.68 0.45 0.55 1.0
## at.rest    0.22 -0.55 0.33 0.67 1.3
## jittery    0.22  0.52 0.34 0.66 1.4
## still     -0.24 -0.43 0.26 0.74 1.6
## intense    0.32  0.41 0.29 0.71 1.9
## fearful   -0.03  0.35 0.12 0.88 1.0
## 
##                        MR1  MR2
## SS loadings           3.58 1.58
## Proportion Var        0.30 0.13
## Cumulative Var        0.30 0.43
## Proportion Explained  0.69 0.31
## Cumulative Proportion 0.69 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.11
## MR2 0.11 1.00
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  66  and the objective function was  5.49 with Chi Square of  16602.24
## The degrees of freedom for the model are 43  and the objective function was  1.75 
## 
## The root mean square of the residuals (RMSR) is  0.1 
## The df corrected root mean square of the residuals is  0.13 
## 
## The harmonic number of observations is  3015 with the empirical chi square  4051.3  with prob <  0 
## The total number of observations was  3032  with Likelihood Chi Square =  5278.5  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.514
## RMSEA index =  0.201  and the 90 % confidence intervals are  0.196 0.205
## BIC =  4933.77
## Fit based upon off diagonal values = 0.91
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.94 0.84
## Multiple R square of scores with factors          0.89 0.71
## Minimum correlation of possible factor scores     0.78 0.42
 fa.plot(f2,labels=colnames(msqR[select]))  #draw it in two space

fa.diagram(f2)  #show it as a path diagram

 Rs <- mat.sort(R,f2)
 corPlot(Rs,numbers=TRUE)