Always do this first!

library(psych)
library(psychTools)

Factor analysis as a data description tool

Notes to accompany https://personality-project.org/courses/350/350.factoranalysis.pdf

Iterative fit

• Specifying a model and loss function

• Ideally supplying a first derivative to the loss function

• Can be done empirically

• Some way to evaluate the quality of the fit

• Minimization of function, but then how good is this minimum

Multivariate models in general use iterative fitting

• Many procedures use this concept of iterative fitting

• Some, such as principal components are“closed form”and can just be solved directly

• Others, such as“factor analysis”need to find solutions by successive approximations

• The issue of factor or component rotation is an iterative procedure.

• Principal Components and Factor analysis are all the result of some basic matrix equations to approximate a matrix

• Conceptually, we are just taking the square root of a correlation matrix:

• R≈CC′ or R≈FF′+U2

• For any given correlation matrix, R, can we find a C or an F matrix which approximates it?

Simulate a correlation matrix

Classical Test Theory (CTT) considers four or more tests to be congenerically equivalent if all tests may be expressed in terms of one factor and a residual error. Parallel tests are the special case where (usually two) tests have equal factor loadings. Tau equivalent tests have equal factor loadings but may have unequal errors. Congeneric tests may differ in both factor loading and error variances. Minor factors may be added as systematic but trivial disturbances.

The sim.congeneric function will create model or sample data to meet a congeneric model. We first just use the default parameters to generate a population matrix.

 R <- sim.congeneric()  #use the default values 
 R
##      V1   V2   V3   V4
## V1 1.00 0.56 0.48 0.40
## V2 0.56 1.00 0.42 0.35
## V3 0.48 0.42 1.00 0.30
## V4 0.40 0.35 0.30 1.00

This looks like a simple multiplication table. See if we can factor it

f1 <- fa(R)  #defaults to 1 factor
f1   #show the output
## Factor Analysis using method =  minres
## Call: fa(r = R)
## Standardized loadings (pattern matrix) based upon correlation matrix
##    MR1   h2   u2 com
## V1 0.8 0.64 0.36   1
## V2 0.7 0.49 0.51   1
## V3 0.6 0.36 0.64   1
## V4 0.5 0.25 0.75   1
## 
##                 MR1
## SS loadings    1.74
## Proportion Var 0.43
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## df null model =  6  with the objective function =  0.9
## df of  the model are 2  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  0 
## 
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.89
## Multiple R square of scores with factors          0.78
## Minimum correlation of possible factor scores     0.57

We can find the residuals by calculation

F <- f1$loadings
F
## 
## Loadings:
##    MR1
## V1 0.8
## V2 0.7
## V3 0.6
## V4 0.5
## 
##                  MR1
## SS loadings    1.740
## Proportion Var 0.435
model <- F %*% t(F)   #The factor model is that R = FF'
model   #too many decimals
##           V1        V2        V3        V4
## V1 0.6399993 0.5599976 0.4800018 0.4000001
## V2 0.5599976 0.4899963 0.4200002 0.3499989
## V3 0.4800018 0.4200002 0.3600031 0.3000015
## V4 0.4000001 0.3499989 0.3000015 0.2500003
round(model,2) #this looks better
##      V1   V2   V3   V4
## V1 0.64 0.56 0.48 0.40
## V2 0.56 0.49 0.42 0.35
## V3 0.48 0.42 0.36 0.30
## V4 0.40 0.35 0.30 0.25
round(R - model,2)  #show the resduals
##      V1   V2   V3   V4
## V1 0.36 0.00 0.00 0.00
## V2 0.00 0.51 0.00 0.00
## V3 0.00 0.00 0.64 0.00
## V4 0.00 0.00 0.00 0.75

Or, we can ask for the residuals

resid(f1)
##    V1   V2   V3   V4  
## V1 0.36               
## V2 0.00 0.51          
## V3 0.00 0.00 0.64     
## V4 0.00 0.00 0.00 0.75

Notice how the fit if great on the off diagonals but is not good on the diagonal. The complete factor model is that R = FF’ + U2 where the U2 is the difference is the diagonal of R - FF’. This U2 (uniqueness) is that amount of each variable that is not common with the other variables.

A digression (Eigen value decomposition)

Factor analysis and its cousin, principal components analysis are variations on an eigen value decomposition of a matrix.

It is possible to represent a correlation matrix R in terms of an orthogonal matrix (X) and a diagonal matrix \(\lambda\) such that

X X’ = I (the matrices are orthogonal)

and

R = X \(\lambda\) X’

We find the eigen values and vectors of a matrix by using the eigen function.

These eigen vectors form what is known as the basis space of the correlation matrix and are just linear composites of the correlations formed in such a way as to be orthogonal. That is, the eigen vectors when multiplied by the their transpose equal the identity matrix.

When the eigen vectors are stretched by their eigen values, the resulting matrix is the orginal matrix. We can find the eigen values and vectors using the eigen function.

ev <- eigen(R)  #find the eigen values and eigen vectors

ev   #show them  (not very understandable)
## eigen() decomposition
## $values
## [1] 2.2679589 0.7178316 0.5863824 0.4278271
## 
## $vectors
##            [,1]       [,2]       [,3]       [,4]
## [1,] -0.5511821 -0.1254972  0.2324911  0.7914522
## [2,] -0.5234358 -0.1890642  0.6029258 -0.5716206
## [3,] -0.4844868 -0.4327312 -0.7362121 -0.1897578
## [4,] -0.4329962  0.8724958 -0.2010477 -0.1041404
ev$vectors %*% t(ev$vectors)   #this is an identity matrix, which means that the vectors   are orthogonal 
##               [,1]          [,2]          [,3]          [,4]
## [1,]  1.000000e+00  2.302473e-16  5.173414e-17 -4.536019e-19
## [2,]  2.302473e-16  1.000000e+00 -1.712048e-16 -4.471980e-16
## [3,]  5.173414e-17 -1.712048e-16  1.000000e+00  2.641260e-16
## [4,] -4.536019e-19 -4.471980e-16  2.641260e-16  1.000000e+00
ev$vectors %*% diag(ev$values) %*%  t(ev$vectors) # this is what we started with
##      [,1] [,2] [,3] [,4]
## [1,] 1.00 0.56 0.48 0.40
## [2,] 0.56 1.00 0.42 0.35
## [3,] 0.48 0.42 1.00 0.30
## [4,] 0.40 0.35 0.30 1.00

Principal components

Principal components are just convenient ways of showing eigen vectors and eigen values.

C = X \(\sqrt{\lambda}\) and therefore CC’ = R

C = ev$vectors %*% diag(sqrt(ev$values))
C  # show them
##            [,1]       [,2]       [,3]        [,4]
## [1,] -0.8300661 -0.1063275  0.1780315  0.51767699
## [2,] -0.7882810 -0.1601846  0.4616941 -0.37388846
## [3,] -0.7296247 -0.3666313 -0.5637589 -0.12411773
## [4,] -0.6520813  0.7392215 -0.1539535 -0.06811668
# or
pca(R,4,rotate="none")
## Principal Components Analysis
## Call: principal(r = r, nfactors = nfactors, residuals = residuals, 
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores, 
##     missing = missing, impute = impute, oblique.scores = oblique.scores, 
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
##     PC1   PC2   PC3   PC4 h2       u2 com
## V1 0.83 -0.11 -0.18 -0.52  1 -4.4e-16 1.8
## V2 0.79 -0.16 -0.46  0.37  1  4.4e-16 2.2
## V3 0.73 -0.37  0.56  0.12  1  2.2e-16 2.5
## V4 0.65  0.74  0.15  0.07  1  8.9e-16 2.1
## 
##                        PC1  PC2  PC3  PC4
## SS loadings           2.27 0.72 0.59 0.43
## Proportion Var        0.57 0.18 0.15 0.11
## Cumulative Var        0.57 0.75 0.89 1.00
## Proportion Explained  0.57 0.18 0.15 0.11
## Cumulative Proportion 0.57 0.75 0.89 1.00
## 
## Mean item complexity =  2.2
## Test of the hypothesis that 4 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0 
## 
## Fit based upon off diagonal values = 1

Typically, we just think of the first few components as an approximation to R

Thus, the default of the pca function is to find the first principal component

p1 <- pca(R)
p1  #show it
## Principal Components Analysis
## Call: principal(r = r, nfactors = nfactors, residuals = residuals, 
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores, 
##     missing = missing, impute = impute, oblique.scores = oblique.scores, 
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
##     PC1   h2   u2 com
## V1 0.83 0.69 0.31   1
## V2 0.79 0.62 0.38   1
## V3 0.73 0.53 0.47   1
## V4 0.65 0.43 0.57   1
## 
##                 PC1
## SS loadings    2.27
## Proportion Var 0.57
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.15 
## 
## Fit based upon off diagonal values = 0.88
resid(p1)
##    V1    V2    V3    V4   
## V1  0.31                  
## V2 -0.09  0.38            
## V3 -0.13 -0.16  0.47      
## V4 -0.14 -0.16 -0.18  0.57

The first principal component accounts for the most variance in the correlation matrix. The successive components account for progressively less.

The factor model versus the components model

The components model: R = CC’

or the factor model: R = FF’ + U^2

Both are simple eigen value problems, but the factor model is an eigen value solution of the reduced matrix, (the diagonals have been replaced with what FF’ would give.)

That is Factors are components of R - U^2.

The problem is finding the Uniqueness (U^2). This is where we do some iterative fitting.

R1 <- R - diag(c(.36, .51, .64, .75))
R1  #The reduced matrix 
##      V1   V2   V3   V4
## V1 0.64 0.56 0.48 0.40
## V2 0.56 0.49 0.42 0.35
## V3 0.48 0.42 0.36 0.30
## V4 0.40 0.35 0.30 0.25
ev1 <-eigen(R1)  #find the eigen solution of R1
ev1   # show it
## eigen() decomposition
## $values
## [1]  1.740000e+00  4.440892e-16 -5.435927e-17 -7.831427e-17
## 
## $vectors
##           [,1]       [,2]       [,3]       [,4]
## [1,] 0.6064784  0.7950999  0.0000000  0.0000000
## [2,] 0.5306686 -0.4047781 -0.5720776 -0.4767313
## [3,] 0.4548588 -0.3469527  0.8037255 -0.1635621
## [4,] 0.3790490 -0.2891272 -0.1635621  0.8636983
ev1$vectors %*% sqrt(diag(ev1$values)) #the loadings
## Warning in sqrt(diag(ev1$values)): NaNs produced
##      [,1]          [,2] [,3] [,4]
## [1,]  0.8  1.675548e-08  NaN  NaN
## [2,]  0.7 -8.530062e-09  NaN  NaN
## [3,]  0.6 -7.311481e-09  NaN  NaN
## [4,]  0.5 -6.092901e-09  NaN  NaN

Factoring as an iterative procedure.

We create a function which evaluates how well a model fits the data. (fit) and then run this iteratively to find a solution (the communalities), and then convert this to a factor matrix.

Because we want to generalize to more than one factor, we write the code with a parameter (n.factors) which we set to 1 for our first example.

.

fit <- function(R,h2, n.factors=1){
   model <- R
   diag(model) <- h2
   ev <- eigen(model)
   F1 <- ev$vectors[,1:n.factors] %*% diag(sqrt(ev$values[1:n.factors]),ncol=length(h2))
   FF  <- F1 %*% t(F1)
   res  <- R -FF
   fit <- sum(res^2)/sum(R^2)
   h2 <- diag(FF)
   return(list(fit=fit,h2=h2))
}

iterative.fit <- function(R,h2,n.factors=1, ntrials=10) {
   fits <- common <- list()
    trial_i <- fit(R,h2, n.factors=n.factors)
   for (i in 1:ntrials){
     
      trial_i <- fit(R,trial_i$h2,n.factor=n.factors)
      fits[[i]] <- trial_i$fit
      common[[i]] <- trial_i$h2
   }
   return(list(h2 = trial_i$h2,fits=fits,communality=common))  #tracks our progress  
}

#this next function just works for one factor
report.f <- function(R,h2,n.trials=40) {
      h2 <-iterative.fit(R,h2,ntrials=n.trials)$h2
      F <- sqrt(h2)
      return(F)
}
#but this the generalization the works for as many factors as we want
report.multi <- function(R,n.factors=1,h2 = NULL,n.trials=40) {
     if (is.null(h2)) h2 <- smc(R) #create the default values
     #this next line does all the work
      h2 <-iterative.fit(R,h2,ntrials=n.trials, n.factors=n.factors)$h2
      #now, we have the best communality solutions, lets show the factors 
     diag(R) <- h2
     ev <- eigen(R)
      F <- ev$vectors[,1:n.factors]  %*% diag(sqrt(ev$values[1:n.factors]),ncol=n.factors) 
     return(F)                                           
}      

try these little functions.

First, we just show how the fit gets better over iterations when we use iterative fit

R   #show the correlation matrix
##      V1   V2   V3   V4
## V1 1.00 0.56 0.48 0.40
## V2 0.56 1.00 0.42 0.35
## V3 0.48 0.42 1.00 0.30
## V4 0.40 0.35 0.30 1.00
temp <- iterative.fit(R,rep(1,NCOL(R)))
temp
## $h2
## [1] 0.6392198 0.4903916 0.3601200 0.2500535
## 
## $fits
## $fits[[1]]
## [1] 0.1987778
## 
## $fits[[2]]
## [1] 0.2145627
## 
## $fits[[3]]
## [1] 0.2186953
## 
## $fits[[4]]
## [1] 0.2197183
## 
## $fits[[5]]
## [1] 0.2199894
## 
## $fits[[6]]
## [1] 0.2200736
## 
## $fits[[7]]
## [1] 0.2201065
## 
## $fits[[8]]
## [1] 0.2201227
## 
## $fits[[9]]
## [1] 0.2201321
## 
## $fits[[10]]
## [1] 0.220138
## 
## 
## $communality
## $communality[[1]]
## [1] 0.6318596 0.5310441 0.4093544 0.2852405
## 
## $communality[[2]]
## [1] 0.6253149 0.5064775 0.3757212 0.2566750
## 
## $communality[[3]]
## [1] 0.6280253 0.4983401 0.3658070 0.2513934
## 
## $communality[[4]]
## [1] 0.6314826 0.4948932 0.3625332 0.2504821
## 
## $communality[[5]]
## [1] 0.6341853 0.4930877 0.3612844 0.2503003
## 
## $communality[[6]]
## [1] 0.6360865 0.4920115 0.3607280 0.2502220
## 
## $communality[[7]]
## [1] 0.6373799 0.4913282 0.3604430 0.2501621
## 
## $communality[[8]]
## [1] 0.6382493 0.4908821 0.3602811 0.2501144
## 
## $communality[[9]]
## [1] 0.6388312 0.4905874 0.3601825 0.2500788
## 
## $communality[[10]]
## [1] 0.6392198 0.4903916 0.3601200 0.2500535

Now, use iterative.fit but report the results for one factor

report.multi(R)
##      [,1]
## [1,]  0.8
## [2,]  0.7
## [3,]  0.6
## [4,]  0.5

Use some more complicated data

R2 <- lowerCor(bfi[1:10],show=FALSE)
report.multi(R2,2)
##             [,1]       [,2]
##  [1,]  0.2618963  0.2854302
##  [2,] -0.5519772 -0.3964413
##  [3,] -0.5893908 -0.4653052
##  [4,] -0.4724715 -0.1761617
##  [5,] -0.5082833 -0.3403847
##  [6,] -0.4182883  0.3636337
##  [7,] -0.5106367  0.3735386
##  [8,] -0.4688629  0.2923000
##  [9,]  0.5344426 -0.3706416
## [10,]  0.5066513 -0.2883404

Some example data sets to compare our function with fa

The advantage of simulation is we know the right answer. However, lets look at real data as well.

We have a number of nice example sets. Use the help function to find them and read about them.

Try some of the examples in each one.

test on the ability dataset which has one dimension

R.ab <- lowerCor(ability,show=FALSE)
round(report.multi(R.ab),2)
##        [,1]
##  [1,] -0.55
##  [2,] -0.45
##  [3,] -0.54
##  [4,] -0.47
##  [5,] -0.52
##  [6,] -0.48
##  [7,] -0.54
##  [8,] -0.53
##  [9,] -0.41
## [10,] -0.43
## [11,] -0.47
## [12,] -0.35
## [13,] -0.50
## [14,] -0.55
## [15,] -0.53
## [16,] -0.46

compare with our fa function just the loadings

f.ab <- fa(R.ab)
round(f.ab$loadings,2)
## 
## Loadings:
##           MR1 
## reason.4  0.55
## reason.16 0.45
## reason.17 0.54
## reason.19 0.47
## letter.7  0.52
## letter.33 0.48
## letter.34 0.54
## letter.58 0.53
## matrix.45 0.41
## matrix.46 0.43
## matrix.47 0.47
## matrix.55 0.35
## rotate.3  0.50
## rotate.4  0.55
## rotate.6  0.53
## rotate.8  0.46
## 
##                  MR1
## SS loadings    3.832
## Proportion Var 0.240

The bfi has 5 dimensions, lets try our little function on it

R.bfi <- lowerCor(bfi,show=FALSE)
temp.5 <- report.multi(R.bfi,5)
round(temp.5,2)
##        [,1]  [,2]  [,3]  [,4]  [,5]
##  [1,]  0.24  0.02 -0.10 -0.02 -0.46
##  [2,] -0.47 -0.30  0.14  0.13  0.35
##  [3,] -0.53 -0.31  0.22  0.09  0.25
##  [4,] -0.40 -0.13  0.10  0.26  0.15
##  [5,] -0.57 -0.20  0.24  0.03  0.15
##  [6,] -0.32 -0.11 -0.44  0.12 -0.08
##  [7,] -0.33 -0.19 -0.46  0.28 -0.06
##  [8,] -0.31 -0.07 -0.34  0.31 -0.02
##  [9,]  0.46 -0.12  0.42 -0.24  0.03
## [10,]  0.49 -0.14  0.25 -0.30  0.13
## [11,]  0.40  0.19 -0.27  0.13  0.19
## [12,]  0.60  0.04 -0.25  0.08  0.27
## [13,] -0.52 -0.32  0.12 -0.21 -0.12
## [14,] -0.58 -0.20  0.35  0.06 -0.22
## [15,] -0.51 -0.27 -0.10 -0.06 -0.21
## [16,]  0.43 -0.61 -0.04  0.05 -0.23
## [17,]  0.42 -0.61 -0.09  0.03 -0.17
## [18,]  0.41 -0.62 -0.05  0.04 -0.02
## [19,]  0.52 -0.40 -0.11 -0.05  0.20
## [20,]  0.34 -0.45  0.02  0.21  0.11
## [21,] -0.33 -0.18 -0.22 -0.37 -0.03
## [22,]  0.20 -0.10  0.29  0.32 -0.05
## [23,] -0.41 -0.26 -0.18 -0.45  0.01
## [24,]  0.06 -0.25 -0.22 -0.22  0.27
## [25,]  0.21  0.02  0.28  0.38 -0.15
## [26,] -0.09 -0.20  0.07  0.23  0.13
## [27,] -0.06  0.03 -0.08 -0.10  0.16
## [28,] -0.18  0.04 -0.07  0.01  0.20

factor.congruence compares factor solutions.

Lets compare our temp.5 with fa output

f5 <- fa(R.bfi,5, rotate="none")
factor.congruence(f5,temp.5)
##     [,1] [,2] [,3] [,4] [,5]
## MR1   -1    0    0    0    0
## MR2    0   -1    0    0    0
## MR3    0    0   -1    0    0
## MR4    0    0    0    1    0
## MR5    0    0    0    0    1

Factor rotation

Factors as extracted are meant to account for the most variance in each factor. But this does not necessarily lead to easily interpretable solutions. Thus we take the factor loadings matrix and rotate it to another solution.

To show this, we look at the first 10 items of the bfi and plot the loadings using fa.plot.

R2 <- lowerCor(bfi[1:10],show=FALSE)
b2 <- report.multi(R2,2)
fa.plot(b2)

Hand rotate this solution by 45 degrees.

fa.plot(factor.rotate(b2,45))

The fa function

We just showed how you can write your own factor analysis function without “too much” work. This was done to demistify factor analysis and to let you seen inside the process. But the fa function has many additional useful features that you should explore.

Input can be from raw data or correlation matrices. Factor extraction can be one of many different methods. Rotations in the GPArotation package are done by default.

Try factoring these different data sets

?bfi

?sai

?msqR

?spi

Remember, first describe them first so you know what is there. Don’t factor the coding data (e.g. ids, study, etc.)

The number of factor problems

No right answer

  1. Statistical

• Extracting factors until the χ2 of the residual matrix is not significant.

• Extracting factors until the change in χ2 from factor n to factor n+1 is not significant.

  1. Rules of Thumb

• Parallel Extracting factors until the eigenvalues of the real data are less than the corresponding eigenvalues of a random data set of the same size (parallel analysis)

• Plotting the magnitude of the successive eigenvalues and applying the scree test.

• Recent suggestion of dimensionality by del Giudice (2020) (to be implemented in psych)

  1. Interpretability

• Extracting factors as long as they are interpretable.

• Using the Very Simple Structure Criterion (VSS)

• Using the Minimum Average Partial criterion (MAP).

  1. Eigen Value of 1 rule

Simulations as way of knowing `Truth’

  1. With real data, we do not know ‘truth’

  2. With simulated data, we can know the right answer

  3. Lets examine the sim.item function

• defaults to certain parameter values

• processes the data (make it up)

• returns values

  1. Simulations can default to certain values but allow you to specify other values
sim.item  #to show a function, just type the name
## function (nvar = 72, nsub = 500, circum = FALSE, xloading = 0.6, 
##     yloading = 0.6, gloading = 0, xbias = 0, ybias = 0, categorical = FALSE, 
##     low = -3, high = 3, truncate = FALSE, threshold = NULL) 
## {
##     avloading <- (xloading + yloading)/2
##     errorweight <- sqrt(1 - (avloading^2 + gloading^2))
##     g <- rnorm(nsub)
##     truex <- rnorm(nsub) * xloading + xbias
##     truey <- rnorm(nsub) * yloading + ybias
##     if (!is.null(threshold)) {
##         if (length(threshold) < nvar) 
##             threshold <- sample(threshold, nvar, replace = TRUE)
##     }
##     if (circum) {
##         radia <- seq(0, 2 * pi, len = nvar + 1)
##         rad <- radia[which(radia < 2 * pi)]
##     }
##     else rad <- c(rep(0, nvar/4), rep(pi/2, nvar/4), rep(pi, 
##         nvar/4), rep(3 * pi/2, nvar/4))
##     error <- matrix(rnorm(nsub * (nvar)), nsub)
##     trueitem <- outer(truex, cos(rad)) + outer(truey, sin(rad))
##     item <- gloading * g + trueitem + errorweight * error
##     if (categorical) {
##         if (is.null(threshold)) {
##             item = round(item)
##             item[(item <= low)] <- low
##             item[(item > high)] <- high
##         }
##         else {
##             i <- 1:nvar
##             item <- t(t(item[, i]) > threshold[i]) + 0
##         }
##     }
##     colnames(item) <- paste("V", 1:nvar, sep = "")
##     return(item)
## }
## <bytecode: 0x1073d0420>
## <environment: namespace:psych>

This simulates with two basic options and a host of other options

Simple structure versus Circumplex structure

set.seed(42)
my.data <- sim.item(12)
describe(my.data)  #to see what they look like
##     vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
## V1     1 500  0.01 1.01   0.03    0.01 0.97 -3.26 3.44  6.69  0.05     0.13 0.04
## V2     2 500  0.00 1.00   0.00    0.00 1.05 -3.16 3.12  6.28  0.02    -0.14 0.04
## V3     3 500 -0.03 1.05  -0.04   -0.04 1.06 -2.95 2.80  5.76  0.10    -0.28 0.05
## V4     4 500 -0.05 0.99  -0.04   -0.04 0.96 -2.96 2.82  5.78 -0.06     0.10 0.04
## V5     5 500 -0.04 0.96  -0.05   -0.05 0.96 -2.76 2.71  5.47  0.07    -0.20 0.04
## V6     6 500 -0.05 1.04  -0.06   -0.06 1.03 -3.65 3.07  6.72  0.05     0.09 0.05
## V7     7 500  0.01 0.97   0.01    0.02 0.95 -3.45 3.53  6.98 -0.13     0.38 0.04
## V8     8 500  0.02 1.04   0.02    0.02 1.09 -2.94 3.95  6.89  0.06     0.05 0.05
## V9     9 500  0.04 1.04  -0.02    0.04 1.05 -3.07 2.78  5.85  0.04    -0.25 0.05
## V10   10 500  0.04 0.98   0.10    0.07 0.97 -2.75 2.47  5.22 -0.23    -0.15 0.04
## V11   11 500 -0.01 0.98  -0.03   -0.01 1.02 -3.31 2.88  6.19 -0.01    -0.02 0.04
## V12   12 500 -0.01 0.97   0.00   -0.01 0.91 -3.04 2.57  5.61 -0.08     0.15 0.04
R <- lowerCor(my.data)  #save the correlation matrix
##     V1    V2    V3    V4    V5    V6    V7    V8    V9    V10   V11   V12  
## V1   1.00                                                                  
## V2   0.36  1.00                                                            
## V3   0.38  0.37  1.00                                                      
## V4  -0.01 -0.04 -0.01  1.00                                                
## V5   0.05 -0.02  0.01  0.34  1.00                                          
## V6   0.03  0.01  0.01  0.37  0.35  1.00                                    
## V7  -0.35 -0.37 -0.38 -0.09 -0.01 -0.05  1.00                              
## V8  -0.40 -0.34 -0.39  0.00  0.08  0.11  0.34  1.00                        
## V9  -0.41 -0.36 -0.32  0.00  0.02 -0.03  0.32  0.39  1.00                  
## V10  0.06  0.07  0.01 -0.33 -0.32 -0.39 -0.04 -0.11 -0.06  1.00            
## V11  0.02  0.03  0.05 -0.37 -0.35 -0.32  0.02 -0.12 -0.01  0.41  1.00      
## V12  0.01  0.01 -0.11 -0.31 -0.30 -0.33  0.08 -0.02  0.00  0.36  0.39  1.00
corPlot(my.data)  #to see it graphically

We can organize this correlation matrix by sorting it by a factor structure (see below).

We pass parameters to the corPlot function to make prettier output.

f2 <- fa(R,2)    #do a default factor analysis
## Loading required namespace: GPArotation
R.sorted <- matSort(R,f2)   #sort the matrix of correlations
corPlot(R.sorted ,cex=.5,main="Sorted by loadings")  #make the text size smaller using the cex option.

f2  #show the output of the factor analysis
## Factor Analysis using method =  minres
## Call: fa(r = R, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       MR1   MR2   h2   u2 com
## V1   0.64 -0.02 0.41 0.59 1.0
## V2   0.59  0.02 0.35 0.65 1.0
## V3   0.61 -0.04 0.37 0.63 1.0
## V4   0.03 -0.58 0.34 0.66 1.0
## V5   0.01 -0.55 0.30 0.70 1.0
## V6   0.03 -0.60 0.36 0.64 1.0
## V7  -0.58  0.08 0.34 0.66 1.0
## V8  -0.62 -0.10 0.39 0.61 1.1
## V9  -0.59  0.00 0.35 0.65 1.0
## V10  0.07  0.61 0.38 0.62 1.0
## V11  0.03  0.63 0.39 0.61 1.0
## V12 -0.06  0.57 0.33 0.67 1.0
## 
##                        MR1  MR2
## SS loadings           2.21 2.12
## Proportion Var        0.18 0.18
## Cumulative Var        0.18 0.36
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.04
## MR2 0.04 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  66  with the objective function =  2.52
## df of  the model are 43  and the objective function was  0.11 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.88 0.88
## Multiple R square of scores with factors          0.78 0.77
## Minimum correlation of possible factor scores     0.56 0.53

What do all of these numbers mean?

Loadings are just the correlation of the factors with the variables.

Communalities (h^2) are how much variance for each variable is accounted for by all the factors. This is the sum of squared loadings row wise. (if the factors are uncorrelated)

Uniqueness is 1 - h^2. How much variance in each variable is not accounted for by the factors.

SS loadings is the sum of squared loadings (columnwise)

Complexity of an item is how well it is explained by the factors.

The various fit statistics compare the residual correlations to the original correlations.

The correlation of the factor score (estimates) with the factors is a measure of factor determinancy.

Simple structure

The variables were generated following a model known as simple structure. That is, every item was thought to represent just one factor. Although this helps interpretability, it is not necessarily correct. With real data, simple structure is an approximation. We simulated data with simple structure, and so it fits well.

We can show this simple structure in a factor diagram (fa.diagram) or a plot of the loadings (fa.plot)

fa.diagram(f2)

fa.plot(f2)

Note how the items have large loadings correlations with just one or the other of the two factors.

Circumplex

An alternative model from sim.item is the circumplex model. Try this parameter setting to see what it looks like.

Circumplex structurese are said to be present in affect (mood) data as well as in interpersonal data.

set.seed(42)
my.circ <- sim.item(12,circum=TRUE)
describe(my.circ)  #to see what they look like
##     vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
## V1     1 500  0.01 1.01   0.03    0.01 0.97 -3.26 3.44  6.69  0.05     0.13 0.04
## V2     2 500 -0.01 0.98   0.01   -0.01 0.94 -3.80 2.89  6.69 -0.07     0.11 0.04
## V3     3 500 -0.04 1.05  -0.07   -0.06 1.01 -2.74 2.88  5.62  0.13    -0.14 0.05
## V4     4 500 -0.05 0.99  -0.04   -0.04 0.96 -2.96 2.82  5.78 -0.06     0.10 0.04
## V5     5 500 -0.03 0.97  -0.07   -0.04 1.01 -2.69 3.21  5.89  0.10    -0.11 0.04
## V6     6 500 -0.02 1.01  -0.04   -0.03 1.06 -3.08 3.33  6.42  0.11     0.04 0.05
## V7     7 500  0.01 0.97   0.01    0.02 0.95 -3.45 3.53  6.98 -0.13     0.38 0.04
## V8     8 500  0.03 1.01   0.01    0.02 1.04 -3.07 3.85  6.93  0.07     0.01 0.05
## V9     9 500  0.06 1.02   0.03    0.05 1.05 -2.67 3.29  5.95  0.12    -0.26 0.05
## V10   10 500  0.04 0.98   0.10    0.07 0.97 -2.75 2.47  5.22 -0.23    -0.15 0.04
## V11   11 500 -0.02 1.00  -0.04   -0.03 1.00 -3.37 3.11  6.48  0.07     0.01 0.04
## V12   12 500 -0.04 0.97  -0.08   -0.03 0.94 -4.17 2.61  6.78 -0.12     0.44 0.04
lowerCor(my.circ)
##     V1    V2    V3    V4    V5    V6    V7    V8    V9    V10   V11   V12  
## V1   1.00                                                                  
## V2   0.31  1.00                                                            
## V3   0.19  0.29  1.00                                                      
## V4  -0.01  0.13  0.26  1.00                                                
## V5  -0.14 -0.04  0.15  0.27  1.00                                          
## V6  -0.28 -0.17 -0.02  0.20  0.32  1.00                                    
## V7  -0.35 -0.34 -0.23 -0.09  0.18  0.27  1.00                              
## V8  -0.35 -0.30 -0.29 -0.16  0.11  0.29  0.31  1.00                        
## V9  -0.22 -0.27 -0.30 -0.28 -0.13  0.00  0.17  0.29  1.00                  
## V10  0.06 -0.11 -0.29 -0.33 -0.28 -0.24 -0.04  0.07  0.25  1.00            
## V11  0.20  0.03 -0.11 -0.31 -0.36 -0.29 -0.17 -0.13  0.16  0.37  1.00      
## V12  0.34  0.21 -0.07 -0.12 -0.27 -0.31 -0.26 -0.21 -0.04  0.20  0.36  1.00
corPlot(my.circ, main="Circumplex structure", cex=.5) 

In both of these simulations, the correlations were not very large. This is to simulate the structure of real personality items, which do not have large correlations.

Look at those two matrices (the simple and the circumplex). Can you see any patterning there?

Do we really need 12 dimensions (items), or can we account for then with fewer dimensions. This the Number of factors problem.

Try parallel analysis

One way to see how many dimensions (factors) are in the data is to compare how well many alternative models fit.

Parallel analysis simulates the data with no factors and compares factor solutions to simulated (0 factor) data to the observed solution. When the random data solution is better than the real solution, you have gone too far.

fa.parallel(my.data)

## Parallel analysis suggests that the number of factors =  2  and the number of components =  2

This suggests 2 factors (which is correct, since that is how we made the data)

Try a 2 factor solution. We use the fa function, telling it where the data are, and how many factors we want. We specify two following the suggestion of fa.parallel.

We then organize the output to show a cleaner structure.

f2 <- fa(my.data,nfactors=2)
f2
## Factor Analysis using method =  minres
## Call: fa(r = my.data, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       MR1   MR2   h2   u2 com
## V1   0.64 -0.02 0.41 0.59 1.0
## V2   0.59  0.02 0.35 0.65 1.0
## V3   0.61 -0.04 0.37 0.63 1.0
## V4   0.03 -0.58 0.34 0.66 1.0
## V5   0.01 -0.55 0.30 0.70 1.0
## V6   0.03 -0.60 0.36 0.64 1.0
## V7  -0.58  0.08 0.34 0.66 1.0
## V8  -0.62 -0.10 0.39 0.61 1.1
## V9  -0.59  0.00 0.35 0.65 1.0
## V10  0.07  0.61 0.38 0.62 1.0
## V11  0.03  0.63 0.39 0.61 1.0
## V12 -0.06  0.57 0.33 0.67 1.0
## 
##                        MR1  MR2
## SS loadings           2.21 2.12
## Proportion Var        0.18 0.18
## Cumulative Var        0.18 0.36
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.04
## MR2 0.04 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  66  with the objective function =  2.52 with Chi Square =  1246.71
## df of  the model are 43  and the objective function was  0.11 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  500 with the empirical chi square  43.95  with prob <  0.43 
## The total n.obs was  500  with Likelihood Chi Square =  54.57  with prob <  0.11 
## 
## Tucker Lewis Index of factoring reliability =  0.985
## RMSEA index =  0.023  and the 90 % confidence intervals are  0 0.04
## BIC =  -212.66
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.88 0.88
## Multiple R square of scores with factors          0.78 0.77
## Minimum correlation of possible factor scores     0.56 0.53
fa.sort(f2)  #a cleaner output
## Factor Analysis using method =  minres
## Call: fa(r = my.data, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       MR1   MR2   h2   u2 com
## V1   0.64 -0.02 0.41 0.59 1.0
## V8  -0.62 -0.10 0.39 0.61 1.1
## V3   0.61 -0.04 0.37 0.63 1.0
## V2   0.59  0.02 0.35 0.65 1.0
## V9  -0.59  0.00 0.35 0.65 1.0
## V7  -0.58  0.08 0.34 0.66 1.0
## V11  0.03  0.63 0.39 0.61 1.0
## V10  0.07  0.61 0.38 0.62 1.0
## V6   0.03 -0.60 0.36 0.64 1.0
## V4   0.03 -0.58 0.34 0.66 1.0
## V12 -0.06  0.57 0.33 0.67 1.0
## V5   0.01 -0.55 0.30 0.70 1.0
## 
##                        MR1  MR2
## SS loadings           2.21 2.12
## Proportion Var        0.18 0.18
## Cumulative Var        0.18 0.36
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.04
## MR2 0.04 1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  66  with the objective function =  2.52 with Chi Square =  1246.71
## df of  the model are 43  and the objective function was  0.11 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  500 with the empirical chi square  43.95  with prob <  0.43 
## The total n.obs was  500  with Likelihood Chi Square =  54.57  with prob <  0.11 
## 
## Tucker Lewis Index of factoring reliability =  0.985
## RMSEA index =  0.023  and the 90 % confidence intervals are  0 0.04
## BIC =  -212.66
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.88 0.88
## Multiple R square of scores with factors          0.78 0.77
## Minimum correlation of possible factor scores     0.56 0.53
fa.diagram(f2) #show the structure

corPlot(fa.sort(f2))  #this shows the loadings

fa.plot(f2)  #This shows the loadings as items in the factor space

# we can also sort the correlation matrix based upon the factor loadings

R <- cor(my.data)
new.R <- matSort(R,f2)
corPlot(new.R, cex =.5)

What do all of these numbers mean? (reprise)

Loadings are just the correlation of the factors with the variables.

Communalities (h^2) are how much variance for each variable is accounted for by all the factors. This is the sum of squared loadings row wise. (if the factors are uncorrelated)

Uniqueness is 1 - h^2. How much variance in each variable is not accounted for by the factors.

SS loadings is the sum of squared loadings (columnwise)

Complexity of an item is how well it is explained by the factors.

The various fit statistics compare the residual correlations to the original correlations.

The correlation of the factor score (estimates) with the factors is a measure of factor determinancy.

Lets simulate another structure that does not have simple structure

set.seed(47)  #reset the seed for reproducibility
my.circ <- sim.item(12,circum=TRUE)
describe(my.circ)
##     vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
## V1     1 500 -0.03 1.00  -0.05   -0.03 1.06 -2.51 2.59  5.09  0.06    -0.42 0.04
## V2     2 500 -0.06 0.98   0.02   -0.03 0.95 -3.57 3.33  6.90 -0.28     0.24 0.04
## V3     3 500 -0.13 1.01  -0.12   -0.15 1.10 -3.01 3.22  6.23  0.17    -0.07 0.05
## V4     4 500 -0.12 0.99  -0.16   -0.16 1.07 -2.43 3.50  5.93  0.36     0.08 0.04
## V5     5 500 -0.04 0.99  -0.05   -0.04 0.99 -2.93 3.93  6.86  0.16     0.38 0.04
## V6     6 500 -0.06 1.06  -0.06   -0.05 0.99 -3.09 3.46  6.56 -0.05     0.15 0.05
## V7     7 500  0.02 1.06   0.02    0.03 1.04 -3.41 3.51  6.92 -0.06     0.24 0.05
## V8     8 500  0.07 1.01   0.06    0.06 0.95 -2.73 3.44  6.17  0.19     0.25 0.05
## V9     9 500  0.06 1.04   0.09    0.05 1.01 -3.65 3.19  6.83  0.07     0.20 0.05
## V10   10 500  0.05 1.02   0.01    0.06 1.07 -4.29 2.76  7.05 -0.19     0.27 0.05
## V11   11 500  0.05 0.98   0.01    0.05 0.94 -3.08 3.26  6.34  0.04     0.01 0.04
## V12   12 500  0.01 1.03   0.06    0.01 1.02 -3.17 2.81  5.99 -0.03    -0.02 0.05
corPlot(my.circ)

how many factors? (we put in two, do we get that back)

fa.parallel(my.circ)

## Parallel analysis suggests that the number of factors =  2  and the number of components =  2

extract 2 factors

f2.circ <- fa(my.circ,2) #note that we know that the second parameter is nfactors
fa.sort(f2.circ)
## Factor Analysis using method =  minres
## Call: fa(r = my.circ, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       MR1   MR2   h2   u2 com
## V1   0.60 -0.22 0.42 0.58 1.3
## V2   0.59  0.15 0.37 0.63 1.1
## V8  -0.58 -0.09 0.34 0.66 1.0
## V7  -0.58  0.10 0.35 0.65 1.1
## V3   0.49  0.40 0.38 0.62 1.9
## V12  0.48 -0.40 0.40 0.60 1.9
## V9  -0.46 -0.37 0.33 0.67 1.9
## V5  -0.09  0.66 0.45 0.55 1.0
## V4   0.11  0.63 0.40 0.60 1.1
## V11  0.11 -0.58 0.35 0.65 1.1
## V10 -0.21 -0.54 0.33 0.67 1.3
## V6  -0.42  0.45 0.39 0.61 2.0
## 
##                        MR1  MR2
## SS loadings           2.32 2.20
## Proportion Var        0.19 0.18
## Cumulative Var        0.19 0.38
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
##  With factor correlations of 
##       MR1   MR2
## MR1  1.00 -0.04
## MR2 -0.04  1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  66  with the objective function =  2.73 with Chi Square =  1348.83
## df of  the model are 43  and the objective function was  0.11 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  500 with the empirical chi square  42.45  with prob <  0.5 
## The total n.obs was  500  with Likelihood Chi Square =  54.8  with prob <  0.11 
## 
## Tucker Lewis Index of factoring reliability =  0.986
## RMSEA index =  0.023  and the 90 % confidence intervals are  0 0.04
## BIC =  -212.43
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.89 0.88
## Multiple R square of scores with factors          0.79 0.78
## Minimum correlation of possible factor scores     0.58 0.57
fa.diagram(f2.circ,main="Forcing simple structure") # by default, shows just one loading per item

fa.diagram(f2.circ, simple=FALSE,main="Allowing more complex structures" ) # allows more than one per item

corPlot(fa.sort(f2.circ))

fa.plot(f2.circ)    #this shows the difference 

Compare the two plots

par(mfrow=c(1,2)) #make a two column plot
fa.plot(f2,title="simple structure")
fa.plot(f2.circ,title = "circumplex structure")

par(mfrow=c(1,1)) #return to 1 column plots

Do real data show circumplex structure?

Yes, mood.

f2 <- fa(msqR[1:72],2)
fa.plot(f2,labels=colnames(msqR)[1:72],title="2 dimensions of affect at the item level")

Do that graph again, but omit the points, just show the labels

fa.plot(f2,labels=colnames(msqR)[1:72],title="2 dimensions of affect at the item level",show.points=FALSE)

Real data (continued)

The ability data set has 1525 participants taking 16 ability items

describe(ability)
##           vars    n mean   sd median trimmed mad min max range  skew kurtosis   se
## reason.4     1 1442 0.68 0.47      1    0.72   0   0   1     1 -0.75    -1.44 0.01
## reason.16    2 1463 0.73 0.45      1    0.78   0   0   1     1 -1.02    -0.96 0.01
## reason.17    3 1440 0.74 0.44      1    0.80   0   0   1     1 -1.08    -0.84 0.01
## reason.19    4 1456 0.64 0.48      1    0.68   0   0   1     1 -0.60    -1.64 0.01
## letter.7     5 1441 0.63 0.48      1    0.67   0   0   1     1 -0.56    -1.69 0.01
## letter.33    6 1438 0.61 0.49      1    0.63   0   0   1     1 -0.43    -1.82 0.01
## letter.34    7 1455 0.64 0.48      1    0.68   0   0   1     1 -0.59    -1.65 0.01
## letter.58    8 1438 0.47 0.50      0    0.46   0   0   1     1  0.12    -1.99 0.01
## matrix.45    9 1458 0.55 0.50      1    0.56   0   0   1     1 -0.20    -1.96 0.01
## matrix.46   10 1470 0.57 0.50      1    0.59   0   0   1     1 -0.28    -1.92 0.01
## matrix.47   11 1465 0.64 0.48      1    0.67   0   0   1     1 -0.57    -1.67 0.01
## matrix.55   12 1459 0.39 0.49      0    0.36   0   0   1     1  0.45    -1.80 0.01
## rotate.3    13 1456 0.20 0.40      0    0.13   0   0   1     1  1.48     0.19 0.01
## rotate.4    14 1460 0.22 0.42      0    0.15   0   0   1     1  1.34    -0.21 0.01
## rotate.6    15 1456 0.31 0.46      0    0.27   0   0   1     1  0.80    -1.35 0.01
## rotate.8    16 1460 0.19 0.39      0    0.12   0   0   1     1  1.55     0.41 0.01
lowerCor(ability)
##           rsn.4 rs.16 rs.17 rs.19 ltt.7 lt.33 lt.34 lt.58 mt.45 mt.46 mt.47 mt.55 rtt.3 rtt.4
## reason.4  1.00                                                                               
## reason.16 0.28  1.00                                                                         
## reason.17 0.40  0.32  1.00                                                                   
## reason.19 0.30  0.25  0.34  1.00                                                             
## letter.7  0.28  0.27  0.29  0.25  1.00                                                       
## letter.33 0.23  0.20  0.26  0.25  0.34  1.00                                                 
## letter.34 0.29  0.26  0.29  0.27  0.40  0.37  1.00                                           
## letter.58 0.29  0.21  0.29  0.25  0.33  0.28  0.32  1.00                                     
## matrix.45 0.25  0.18  0.20  0.22  0.20  0.20  0.21  0.19  1.00                               
## matrix.46 0.25  0.18  0.24  0.18  0.24  0.23  0.27  0.21  0.33  1.00                         
## matrix.47 0.24  0.24  0.27  0.23  0.27  0.23  0.30  0.23  0.24  0.23  1.00                   
## matrix.55 0.16  0.15  0.16  0.15  0.14  0.17  0.14  0.23  0.21  0.14  0.21  1.00             
## rotate.3  0.23  0.16  0.17  0.18  0.18  0.17  0.19  0.24  0.16  0.15  0.20  0.18  1.00       
## rotate.4  0.25  0.20  0.20  0.21  0.23  0.21  0.21  0.27  0.17  0.17  0.20  0.18  0.53  1.00 
## rotate.6  0.25  0.20  0.27  0.19  0.20  0.21  0.19  0.26  0.15  0.20  0.18  0.17  0.43  0.45 
## rotate.8  0.21  0.16  0.18  0.16  0.13  0.14  0.15  0.22  0.16  0.15  0.17  0.19  0.43  0.44 
##          rtt.6 rtt.8
## rotate.6 1.00       
## rotate.8 0.42  1.00
corPlot(ability)

fa.parallel(ability)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  2
f1.ab <- fa(ability) 
f1.ab
## Factor Analysis using method =  minres
## Call: fa(r = ability)
## Standardized loadings (pattern matrix) based upon correlation matrix
##            MR1   h2   u2 com
## reason.4  0.55 0.30 0.70   1
## reason.16 0.45 0.20 0.80   1
## reason.17 0.54 0.29 0.71   1
## reason.19 0.47 0.22 0.78   1
## letter.7  0.52 0.27 0.73   1
## letter.33 0.48 0.23 0.77   1
## letter.34 0.54 0.29 0.71   1
## letter.58 0.53 0.28 0.72   1
## matrix.45 0.41 0.17 0.83   1
## matrix.46 0.43 0.18 0.82   1
## matrix.47 0.47 0.22 0.78   1
## matrix.55 0.35 0.12 0.88   1
## rotate.3  0.50 0.25 0.75   1
## rotate.4  0.55 0.30 0.70   1
## rotate.6  0.53 0.28 0.72   1
## rotate.8  0.46 0.21 0.79   1
## 
##                 MR1
## SS loadings    3.81
## Proportion Var 0.24
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## df null model =  120  with the objective function =  3.28 with Chi Square =  4973.83
## df of  the model are 104  and the objective function was  0.7 
## 
## The root mean square of the residuals (RMSR) is  0.07 
## The df corrected root mean square of the residuals is  0.07 
## 
## The harmonic n.obs is  1426 with the empirical chi square  1476.62  with prob <  3e-241 
## The total n.obs was  1525  with Likelihood Chi Square =  1063.25  with prob <  9.5e-159 
## 
## Tucker Lewis Index of factoring reliability =  0.772
## RMSEA index =  0.078  and the 90 % confidence intervals are  0.074 0.082
## BIC =  300.96
## Fit based upon off diagonal values = 0.93
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.91
## Multiple R square of scores with factors          0.84
## Minimum correlation of possible factor scores     0.67
f2.ab <- fa(ability,2)
f2.ab
## Factor Analysis using method =  minres
## Call: fa(r = ability, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##             MR1   MR2   h2   u2 com
## reason.4   0.51  0.08 0.31 0.69 1.0
## reason.16  0.45  0.03 0.21 0.79 1.0
## reason.17  0.57  0.00 0.32 0.68 1.0
## reason.19  0.48  0.02 0.24 0.76 1.0
## letter.7   0.60 -0.05 0.33 0.67 1.0
## letter.33  0.52 -0.01 0.26 0.74 1.0
## letter.34  0.63 -0.07 0.35 0.65 1.0
## letter.58  0.45  0.12 0.28 0.72 1.2
## matrix.45  0.41  0.02 0.18 0.82 1.0
## matrix.46  0.45  0.01 0.20 0.80 1.0
## matrix.47  0.46  0.04 0.23 0.77 1.0
## matrix.55  0.24  0.15 0.12 0.88 1.7
## rotate.3  -0.03  0.72 0.49 0.51 1.0
## rotate.4   0.02  0.71 0.52 0.48 1.0
## rotate.6   0.09  0.59 0.40 0.60 1.0
## rotate.8  -0.03  0.65 0.40 0.60 1.0
## 
##                        MR1  MR2
## SS loadings           2.96 1.91
## Proportion Var        0.19 0.12
## Cumulative Var        0.19 0.30
## Proportion Explained  0.61 0.39
## Cumulative Proportion 0.61 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.52
## MR2 0.52 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  120  with the objective function =  3.28 with Chi Square =  4973.83
## df of  the model are 89  and the objective function was  0.18 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic n.obs is  1426 with the empirical chi square  288.87  with prob <  5.9e-23 
## The total n.obs was  1525  with Likelihood Chi Square =  277.64  with prob <  2.9e-21 
## 
## Tucker Lewis Index of factoring reliability =  0.948
## RMSEA index =  0.037  and the 90 % confidence intervals are  0.032 0.042
## BIC =  -374.71
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.90 0.89
## Multiple R square of scores with factors          0.81 0.79
## Minimum correlation of possible factor scores     0.63 0.58
fa.diagram(f2.ab)

Parallel analysis suggested 4 factors. Lets try that

f4 <- fa(ability,nfactors=4)  #specify that we want four factors 
f4  #show the solution
## Factor Analysis using method =  minres
## Call: fa(r = ability, nfactors = 4)
## Standardized loadings (pattern matrix) based upon correlation matrix
##             MR2   MR1   MR4   MR3   h2   u2 com
## reason.4   0.09  0.07  0.44  0.10 0.35 0.65 1.3
## reason.16  0.05  0.14  0.34  0.03 0.23 0.77 1.4
## reason.17 -0.02  0.00  0.73 -0.02 0.51 0.49 1.0
## reason.19  0.04  0.16  0.33  0.06 0.25 0.75 1.5
## letter.7   0.00  0.61  0.04 -0.01 0.39 0.61 1.0
## letter.33  0.03  0.53 -0.01  0.03 0.31 0.69 1.0
## letter.34 -0.02  0.66  0.00  0.00 0.43 0.57 1.0
## letter.58  0.16  0.34  0.12  0.03 0.28 0.72 1.7
## matrix.45 -0.01 -0.01 -0.01  0.77 0.57 0.43 1.0
## matrix.46  0.03  0.20  0.07  0.31 0.24 0.76 1.8
## matrix.47  0.07  0.25  0.13  0.15 0.23 0.77 2.5
## matrix.55  0.16  0.08  0.06  0.17 0.13 0.87 2.6
## rotate.3   0.72  0.02 -0.06  0.01 0.50 0.50 1.0
## rotate.4   0.72  0.08 -0.04 -0.02 0.53 0.47 1.0
## rotate.6   0.58 -0.02  0.14 -0.02 0.41 0.59 1.1
## rotate.8   0.64 -0.10  0.05  0.04 0.41 0.59 1.1
## 
##                        MR2  MR1  MR4  MR3
## SS loadings           1.98 1.63 1.27 0.89
## Proportion Var        0.12 0.10 0.08 0.06
## Cumulative Var        0.12 0.23 0.30 0.36
## Proportion Explained  0.34 0.28 0.22 0.15
## Cumulative Proportion 0.34 0.63 0.85 1.00
## 
##  With factor correlations of 
##      MR2  MR1  MR4  MR3
## MR2 1.00 0.43 0.42 0.31
## MR1 0.43 1.00 0.64 0.44
## MR4 0.42 0.64 1.00 0.41
## MR3 0.31 0.44 0.41 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 4 factors are sufficient.
## 
## df null model =  120  with the objective function =  3.28 with Chi Square =  4973.83
## df of  the model are 62  and the objective function was  0.05 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic n.obs is  1426 with the empirical chi square  63.75  with prob <  0.41 
## The total n.obs was  1525  with Likelihood Chi Square =  70.96  with prob <  0.2 
## 
## Tucker Lewis Index of factoring reliability =  0.996
## RMSEA index =  0.01  and the 90 % confidence intervals are  0 0.019
## BIC =  -383.48
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR2  MR1  MR4  MR3
## Correlation of (regression) scores with factors   0.89 0.86 0.85 0.81
## Multiple R square of scores with factors          0.79 0.75 0.71 0.65
## Minimum correlation of possible factor scores     0.58 0.49 0.43 0.31
fa.diagram(f4) #notice the high correlations

fa.plot(f4) #yet another way of drawing it

Hierarchical factors allow for factoring of factors

One function (already discussed) is the omega function

It factors the variables, and then factors the correlations of the factors.

om <- omega(ability,4)  #it defaults to 3, we specify we want 4

omega.diagram(om, sl=FALSE) #two ways of drawing the results

Try more data – the bfi

describe(bfi)
##           vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## A1           1 2784  2.41  1.41      2    2.23  1.48   1   6     5  0.83    -0.31 0.03
## A2           2 2773  4.80  1.17      5    4.98  1.48   1   6     5 -1.12     1.05 0.02
## A3           3 2774  4.60  1.30      5    4.79  1.48   1   6     5 -1.00     0.44 0.02
## A4           4 2781  4.70  1.48      5    4.93  1.48   1   6     5 -1.03     0.04 0.03
## A5           5 2784  4.56  1.26      5    4.71  1.48   1   6     5 -0.85     0.16 0.02
## C1           6 2779  4.50  1.24      5    4.64  1.48   1   6     5 -0.85     0.30 0.02
## C2           7 2776  4.37  1.32      5    4.50  1.48   1   6     5 -0.74    -0.14 0.03
## C3           8 2780  4.30  1.29      5    4.42  1.48   1   6     5 -0.69    -0.13 0.02
## C4           9 2774  2.55  1.38      2    2.41  1.48   1   6     5  0.60    -0.62 0.03
## C5          10 2784  3.30  1.63      3    3.25  1.48   1   6     5  0.07    -1.22 0.03
## E1          11 2777  2.97  1.63      3    2.86  1.48   1   6     5  0.37    -1.09 0.03
## E2          12 2784  3.14  1.61      3    3.06  1.48   1   6     5  0.22    -1.15 0.03
## E3          13 2775  4.00  1.35      4    4.07  1.48   1   6     5 -0.47    -0.47 0.03
## E4          14 2791  4.42  1.46      5    4.59  1.48   1   6     5 -0.82    -0.30 0.03
## E5          15 2779  4.42  1.33      5    4.56  1.48   1   6     5 -0.78    -0.09 0.03
## N1          16 2778  2.93  1.57      3    2.82  1.48   1   6     5  0.37    -1.01 0.03
## N2          17 2779  3.51  1.53      4    3.51  1.48   1   6     5 -0.08    -1.05 0.03
## N3          18 2789  3.22  1.60      3    3.16  1.48   1   6     5  0.15    -1.18 0.03
## N4          19 2764  3.19  1.57      3    3.12  1.48   1   6     5  0.20    -1.09 0.03
## N5          20 2771  2.97  1.62      3    2.85  1.48   1   6     5  0.37    -1.06 0.03
## O1          21 2778  4.82  1.13      5    4.96  1.48   1   6     5 -0.90     0.43 0.02
## O2          22 2800  2.71  1.57      2    2.56  1.48   1   6     5  0.59    -0.81 0.03
## O3          23 2772  4.44  1.22      5    4.56  1.48   1   6     5 -0.77     0.30 0.02
## O4          24 2786  4.89  1.22      5    5.10  1.48   1   6     5 -1.22     1.08 0.02
## O5          25 2780  2.49  1.33      2    2.34  1.48   1   6     5  0.74    -0.24 0.03
## gender      26 2800  1.67  0.47      2    1.71  0.00   1   2     1 -0.73    -1.47 0.01
## education   27 2577  3.19  1.11      3    3.22  1.48   1   5     4 -0.05    -0.32 0.02
## age         28 2800 28.78 11.13     26   27.43 10.38   3  86    83  1.02     0.56 0.21
corPlot(bfi)

fa.parallel(bfi) #how many factors do we think are there?

## Parallel analysis suggests that the number of factors =  7  and the number of components =  7
f5 <- fa(bfi,5)
fa.lookup(f5,dictionary=bfi.dictionary[1:2])
##             MR2   MR1   MR3   MR5   MR4   h2  com ItemLabel
## N1         0.78  0.09  0.00 -0.12 -0.03 0.61 1.08     q_952
## N2         0.76  0.03  0.01 -0.09  0.02 0.58 1.03     q_974
## N3         0.73 -0.06 -0.03  0.06  0.02 0.56 1.03    q_1099
## N5         0.53 -0.15  0.01  0.21 -0.17 0.37 1.70    q_1505
## N4         0.50 -0.36 -0.13  0.10  0.09 0.48 2.15    q_1479
## E4         0.00  0.67  0.02  0.18 -0.07 0.55 1.17    q_1410
## E2         0.15 -0.64 -0.02 -0.03 -0.06 0.51 1.13     q_901
## E1        -0.03 -0.54  0.10 -0.07 -0.09 0.32 1.17     q_712
## E3         0.08  0.49  0.00  0.16  0.29 0.45 1.93    q_1205
## E5         0.12  0.41  0.27  0.03  0.22 0.39 2.58    q_1768
## C2         0.16 -0.05  0.66  0.04  0.04 0.44 1.15     q_530
## C4         0.18  0.06 -0.62 -0.01 -0.04 0.46 1.20     q_626
## C3         0.04 -0.05  0.56  0.08 -0.06 0.31 1.09     q_619
## C5         0.20 -0.12 -0.56  0.01  0.10 0.43 1.43    q_1949
## C1         0.07 -0.02  0.55 -0.04  0.15 0.33 1.21     q_124
## A2         0.01  0.07  0.08  0.63  0.03 0.47 1.07    q_1162
## A3         0.00  0.23  0.03  0.58  0.03 0.49 1.31    q_1206
## A1         0.20  0.19  0.07 -0.51 -0.05 0.28 1.67     q_146
## A5        -0.10  0.32  0.01  0.46  0.05 0.45 1.93    q_1419
## A4        -0.04  0.13  0.20  0.40 -0.14 0.28 2.03    q_1364
## gender     0.14  0.00  0.11  0.29 -0.16 0.12 2.42    gender
## age       -0.15 -0.12  0.08  0.19  0.08 0.08 3.49       age
## O3         0.03  0.17  0.02  0.05  0.62 0.47 1.18     q_492
## O1         0.01  0.13  0.07 -0.03  0.52 0.33 1.17     q_128
## O5         0.13  0.14 -0.03 -0.02 -0.52 0.29 1.28    q_1964
## O2         0.20  0.11 -0.08  0.10 -0.44 0.24 1.77     q_316
## O4         0.16 -0.28 -0.02  0.18  0.37 0.24 2.83    q_1738
## education -0.10 -0.14 -0.01  0.10  0.15 0.05 3.47 education
##                                                 Item
## N1                                 Get angry easily.
## N2                             Get irritated easily.
## N3                        Have frequent mood swings.
## N5                                     Panic easily.
## N4                                  Often feel blue.
## E4                              Make friends easily.
## E2             Find it difficult to approach others.
## E1                                 Don't talk a lot.
## E3                     Know how to captivate people.
## E5                                      Take charge.
## C2             Continue until everything is perfect.
## C4                   Do things in a half-way manner.
## C3                    Do things according to a plan.
## C5                                    Waste my time.
## C1                           Am exacting in my work.
## A2                 Inquire about others' well-being.
## A3                       Know how to comfort others.
## A1         Am indifferent to the feelings of others.
## A5                         Make people feel at ease.
## A4                                    Love children.
## gender                            males=1, females=2
## age                                     age in years
## O3         Carry the conversation to a higher level.
## O1                                 Am full of ideas.
## O5             Will not probe deeply into a subject.
## O2                 Avoid difficult reading material.
## O4                  Spend time reflecting on things.
## education in HS, fin HS, coll,  coll grad , grad deg
fa.diagram(f5)  

But age, gender and education are not in the same space.

Do the analysis without including age, gender and education as variables

describe(bfi)
##           vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## A1           1 2784  2.41  1.41      2    2.23  1.48   1   6     5  0.83    -0.31 0.03
## A2           2 2773  4.80  1.17      5    4.98  1.48   1   6     5 -1.12     1.05 0.02
## A3           3 2774  4.60  1.30      5    4.79  1.48   1   6     5 -1.00     0.44 0.02
## A4           4 2781  4.70  1.48      5    4.93  1.48   1   6     5 -1.03     0.04 0.03
## A5           5 2784  4.56  1.26      5    4.71  1.48   1   6     5 -0.85     0.16 0.02
## C1           6 2779  4.50  1.24      5    4.64  1.48   1   6     5 -0.85     0.30 0.02
## C2           7 2776  4.37  1.32      5    4.50  1.48   1   6     5 -0.74    -0.14 0.03
## C3           8 2780  4.30  1.29      5    4.42  1.48   1   6     5 -0.69    -0.13 0.02
## C4           9 2774  2.55  1.38      2    2.41  1.48   1   6     5  0.60    -0.62 0.03
## C5          10 2784  3.30  1.63      3    3.25  1.48   1   6     5  0.07    -1.22 0.03
## E1          11 2777  2.97  1.63      3    2.86  1.48   1   6     5  0.37    -1.09 0.03
## E2          12 2784  3.14  1.61      3    3.06  1.48   1   6     5  0.22    -1.15 0.03
## E3          13 2775  4.00  1.35      4    4.07  1.48   1   6     5 -0.47    -0.47 0.03
## E4          14 2791  4.42  1.46      5    4.59  1.48   1   6     5 -0.82    -0.30 0.03
## E5          15 2779  4.42  1.33      5    4.56  1.48   1   6     5 -0.78    -0.09 0.03
## N1          16 2778  2.93  1.57      3    2.82  1.48   1   6     5  0.37    -1.01 0.03
## N2          17 2779  3.51  1.53      4    3.51  1.48   1   6     5 -0.08    -1.05 0.03
## N3          18 2789  3.22  1.60      3    3.16  1.48   1   6     5  0.15    -1.18 0.03
## N4          19 2764  3.19  1.57      3    3.12  1.48   1   6     5  0.20    -1.09 0.03
## N5          20 2771  2.97  1.62      3    2.85  1.48   1   6     5  0.37    -1.06 0.03
## O1          21 2778  4.82  1.13      5    4.96  1.48   1   6     5 -0.90     0.43 0.02
## O2          22 2800  2.71  1.57      2    2.56  1.48   1   6     5  0.59    -0.81 0.03
## O3          23 2772  4.44  1.22      5    4.56  1.48   1   6     5 -0.77     0.30 0.02
## O4          24 2786  4.89  1.22      5    5.10  1.48   1   6     5 -1.22     1.08 0.02
## O5          25 2780  2.49  1.33      2    2.34  1.48   1   6     5  0.74    -0.24 0.03
## gender      26 2800  1.67  0.47      2    1.71  0.00   1   2     1 -0.73    -1.47 0.01
## education   27 2577  3.19  1.11      3    3.22  1.48   1   5     4 -0.05    -0.32 0.02
## age         28 2800 28.78 11.13     26   27.43 10.38   3  86    83  1.02     0.56 0.21
corPlot(bfi[1:25])

fa.parallel(bfi[1:25]) #how many factors do we think are there?

## Parallel analysis suggests that the number of factors =  6  and the number of components =  6
f5 <- fa(bfi[1:25],5)
fa.lookup(f5,dictionary=bfi.dictionary[1:2])
##      MR2   MR1   MR3   MR5   MR4   h2  com ItemLabel                                      Item
## N1  0.81  0.10  0.00 -0.11 -0.05 0.65 1.08     q_952                         Get angry easily.
## N2  0.78  0.04  0.01 -0.09  0.01 0.60 1.04     q_974                     Get irritated easily.
## N3  0.71 -0.10 -0.04  0.08  0.02 0.55 1.07    q_1099                Have frequent mood swings.
## N5  0.49 -0.20  0.00  0.21 -0.15 0.35 1.96    q_1505                             Panic easily.
## N4  0.47 -0.39 -0.14  0.09  0.08 0.49 2.27    q_1479                          Often feel blue.
## E2  0.10 -0.68 -0.02 -0.05 -0.06 0.54 1.07     q_901     Find it difficult to approach others.
## E4  0.01  0.59  0.02  0.29 -0.08 0.53 1.49    q_1410                      Make friends easily.
## E1 -0.06 -0.56  0.11 -0.08 -0.10 0.35 1.21     q_712                         Don't talk a lot.
## E5  0.15  0.42  0.27  0.05  0.21 0.40 2.60    q_1768                              Take charge.
## E3  0.08  0.42  0.00  0.25  0.28 0.44 2.55    q_1205             Know how to captivate people.
## C2  0.15 -0.09  0.67  0.08  0.04 0.45 1.17     q_530     Continue until everything is perfect.
## C4  0.17  0.00 -0.61  0.04 -0.05 0.45 1.18     q_626           Do things in a half-way manner.
## C3  0.03 -0.06  0.57  0.09 -0.07 0.32 1.11     q_619            Do things according to a plan.
## C5  0.19 -0.14 -0.55  0.02  0.09 0.43 1.44    q_1949                            Waste my time.
## C1  0.07 -0.03  0.55 -0.02  0.15 0.33 1.19     q_124                   Am exacting in my work.
## A3 -0.03  0.12  0.02  0.66  0.03 0.52 1.07    q_1206               Know how to comfort others.
## A2 -0.02  0.00  0.08  0.64  0.03 0.45 1.04    q_1162         Inquire about others' well-being.
## A5 -0.11  0.23  0.01  0.53  0.04 0.46 1.49    q_1419                 Make people feel at ease.
## A4 -0.06  0.06  0.19  0.43 -0.15 0.28 1.74    q_1364                            Love children.
## A1  0.21  0.17  0.07 -0.41 -0.06 0.19 1.97     q_146 Am indifferent to the feelings of others.
## O3  0.03  0.15  0.02  0.08  0.61 0.46 1.17     q_492 Carry the conversation to a higher level.
## O5  0.13  0.10 -0.03  0.04 -0.54 0.30 1.21    q_1964     Will not probe deeply into a subject.
## O1  0.02  0.10  0.07  0.02  0.51 0.31 1.13     q_128                         Am full of ideas.
## O2  0.19  0.06 -0.08  0.16 -0.46 0.26 1.75     q_316         Avoid difficult reading material.
## O4  0.13 -0.32 -0.02  0.17  0.37 0.25 2.69    q_1738          Spend time reflecting on things.
fa.diagram(f5)

#but parallel analysis suggests six factors
f6 <- fa(bfi[1:25],6)
fa.lookup(f6,dictionary=bfi.dictionary[1:2])
##      MR2   MR1   MR3   MR5   MR4   MR6   h2  com ItemLabel
## N1  0.84 -0.10  0.01 -0.07 -0.05  0.00 0.68 1.05     q_952
## N2  0.83 -0.06  0.02 -0.04 -0.01 -0.07 0.66 1.03     q_974
## N3  0.67  0.13 -0.03  0.07  0.05  0.08 0.54 1.15    q_1099
## N5  0.44  0.23  0.00  0.18 -0.10  0.16 0.35 2.37    q_1505
## N4  0.43  0.42 -0.13  0.08  0.10  0.05 0.49 2.40    q_1479
## E2  0.05  0.70 -0.02 -0.07 -0.06  0.03 0.56 1.05     q_901
## E1 -0.13  0.59  0.11 -0.12 -0.08  0.09 0.39 1.35     q_712
## E4 -0.05 -0.53  0.03  0.20  0.04  0.29 0.55 1.90    q_1410
## E5  0.15 -0.40  0.27  0.05  0.23  0.01 0.40 2.84    q_1768
## C2  0.07  0.14  0.67  0.02  0.11  0.17 0.49 1.31     q_530
## C4  0.06  0.09 -0.64 -0.06  0.07  0.30 0.57 1.53     q_626
## C3  0.01  0.06  0.55  0.08 -0.04  0.06 0.31 1.11     q_619
## C5  0.15  0.18 -0.54 -0.01  0.11  0.05 0.43 1.50    q_1949
## C1  0.01  0.06  0.54 -0.06  0.19  0.07 0.34 1.34     q_124
## A2  0.04 -0.04  0.08  0.68  0.00 -0.05 0.50 1.05    q_1162
## A3 -0.02 -0.12  0.03  0.61  0.07  0.11 0.51 1.17    q_1206
## A1  0.09 -0.09  0.08 -0.56  0.06  0.30 0.34 1.73     q_146
## A5 -0.16 -0.19  0.01  0.45  0.13  0.21 0.47 2.35    q_1419
## A4 -0.07 -0.06  0.19  0.39 -0.10  0.15 0.28 2.10    q_1364
## O3 -0.02 -0.09  0.02  0.03  0.65 -0.02 0.48 1.05     q_492
## O1 -0.05 -0.04  0.07 -0.05  0.57  0.03 0.35 1.08     q_128
## O5  0.03 -0.04 -0.02 -0.04 -0.45  0.41 0.37 2.04    q_1964
## E3  0.00 -0.34  0.00  0.15  0.40  0.21 0.48 2.85    q_1205
## O4  0.09  0.35 -0.02  0.15  0.37 -0.04 0.25 2.47    q_1738
## O2  0.11  0.00 -0.07  0.09 -0.36  0.36 0.29 2.40     q_316
##                                         Item
## N1                         Get angry easily.
## N2                     Get irritated easily.
## N3                Have frequent mood swings.
## N5                             Panic easily.
## N4                          Often feel blue.
## E2     Find it difficult to approach others.
## E1                         Don't talk a lot.
## E4                      Make friends easily.
## E5                              Take charge.
## C2     Continue until everything is perfect.
## C4           Do things in a half-way manner.
## C3            Do things according to a plan.
## C5                            Waste my time.
## C1                   Am exacting in my work.
## A2         Inquire about others' well-being.
## A3               Know how to comfort others.
## A1 Am indifferent to the feelings of others.
## A5                 Make people feel at ease.
## A4                            Love children.
## O3 Carry the conversation to a higher level.
## O1                         Am full of ideas.
## O5     Will not probe deeply into a subject.
## E3             Know how to captivate people.
## O4          Spend time reflecting on things.
## O2         Avoid difficult reading material.
fa.diagram(f6)

Yet another way to understand multivariate structure is Cluster Analysis

An example of cluster analysis devised for forming scales is iclust (for item cluster analysis). iclust was developed specifically to form homogenous scales. See the help page for iclust.

Basic concept is easy:

1 Find the correlations.

2 Find the most similar pair of items.

3 Combine those two variables to make a new, composite variable.

4 Repeat steps 2 - 3 until some criterion is reached

5 What criterion? a) alpha does not increase b) beta (the worst split half) does not increase

Try this on the ability items

iclust(ability)

## ICLUST (Item Cluster Analysis)
## Call: iclust(r.mat = ability)
## 
## Purified Alpha:
## [1] 0.83
## 
## G6* reliability:
## [1] 1
## 
## Original Beta:
## [1] 0.68
## 
## Cluster size:
## [1] 16
## 
## Item by Cluster Structure matrix:
##           [,1]
## reason.4  0.54
## reason.16 0.44
## reason.17 0.54
## reason.19 0.47
## letter.7  0.52
## letter.33 0.48
## letter.34 0.54
## letter.58 0.52
## matrix.45 0.42
## matrix.46 0.43
## matrix.47 0.47
## matrix.55 0.35
## rotate.3  0.51
## rotate.4  0.56
## rotate.6  0.53
## rotate.8  0.47
## 
## With Sums of squares of:
## [1] 3.8
## 
## Purified scale intercorrelations
##  reliabilities on diagonal
##  correlations corrected for attenuation above diagonal: 
##      [,1]
## [1,] 0.83
## 
## Cluster fit =  0.66   Pattern fit =  0.97  RMSR =  0.07

Now try the ‘bfi’ items

ic <- iclust(bfi[1:25]) #we want to keep these results

fa.lookup(ic,dictionary=bfi.dictionary[1:2])
##      C20   C16   C15   C21 ItemLabel                                      Item
## E4  0.66 -0.22 -0.22 -0.06    q_1410                      Make friends easily.
## A3  0.65 -0.11 -0.22 -0.19    q_1206               Know how to comfort others.
## A5  0.65 -0.23 -0.23 -0.19    q_1419                 Make people feel at ease.
## E2 -0.61  0.34  0.24  0.15     q_901     Find it difficult to approach others.
## E3  0.59 -0.10 -0.20 -0.39    q_1205             Know how to captivate people.
## A2  0.59 -0.07 -0.23 -0.19    q_1162         Inquire about others' well-being.
## E5  0.50 -0.10 -0.40 -0.32    q_1768                              Take charge.
## E1 -0.50  0.11  0.06  0.16     q_712                         Don't talk a lot.
## A4  0.43 -0.14 -0.29 -0.01    q_1364                            Love children.
## A1 -0.27  0.14  0.05  0.13     q_146 Am indifferent to the feelings of others.
## N1 -0.18  0.76  0.20  0.11     q_952                         Get angry easily.
## N2 -0.18  0.75  0.18  0.05     q_974                     Get irritated easily.
## N3 -0.15  0.74  0.19  0.02    q_1099                Have frequent mood swings.
## N4 -0.34  0.62  0.29  0.01    q_1479                          Often feel blue.
## N5 -0.13  0.55  0.13  0.17    q_1505                             Panic easily.
## C4 -0.26  0.31  0.66  0.21     q_626           Do things in a half-way manner.
## C2  0.22 -0.01 -0.62 -0.21     q_530     Continue until everything is perfect.
## C5 -0.30  0.36  0.59  0.09    q_1949                            Waste my time.
## C1  0.18 -0.08 -0.54 -0.29     q_124                   Am exacting in my work.
## C3  0.20 -0.09 -0.54 -0.08     q_619            Do things according to a plan.
## O3  0.39 -0.07 -0.20 -0.62     q_492 Carry the conversation to a higher level.
## O5 -0.12  0.11  0.15  0.53    q_1964     Will not probe deeply into a subject.
## O1  0.28 -0.09 -0.20 -0.53     q_128                         Am full of ideas.
## O2 -0.04  0.19  0.18  0.44     q_316         Avoid difficult reading material.
## O4 -0.02  0.21  0.00 -0.33    q_1738          Spend time reflecting on things.

Compare the factor solution to the cluster solution

f5 <- fa(bfi[1:25],5)
factor.congruence(f5,ic)
##       C20   C16   C15   C21
## MR2 -0.25  0.92  0.30  0.14
## MR1  0.72 -0.40 -0.35 -0.28
## MR3  0.37 -0.28 -0.89 -0.37
## MR5  0.71 -0.14 -0.33 -0.30
## MR4  0.32 -0.12 -0.28 -0.92

Factor congruence was a statistic developed by Burt (1937, 1948) but also by Tucker. It is the cosine of the factor vectors. It is functionally cosine of the angle between the vectors. It is not the correlation of the vectors (which become zero centered.)

    cross <- t(y) %*% x
    sumsx <- sqrt(1/diag(t(x) %*% x))
   sumsy <- sqrt(1/diag(t(y) %*% y))
   congruence <- cross * sumsqx * sumsqy

Compare the correlations with the congruences:

factor.congruence(f5,ic)
##       C20   C16   C15   C21
## MR2 -0.25  0.92  0.30  0.14
## MR1  0.72 -0.40 -0.35 -0.28
## MR3  0.37 -0.28 -0.89 -0.37
## MR5  0.71 -0.14 -0.33 -0.30
## MR4  0.32 -0.12 -0.28 -0.92
round(cor(f5$loadings,ic$loadings),2)
##       C20   C16   C15   C21
## MR2 -0.48  0.92  0.46  0.32
## MR1  0.74 -0.45 -0.36 -0.29
## MR3  0.35 -0.40 -0.89 -0.35
## MR5  0.71 -0.44 -0.31 -0.25
## MR4  0.31 -0.19 -0.27 -0.92