Always do this first!
library(psych)
library(psychTools)
Notes to accompany https://personality-project.org/courses/350/350.factoranalysis.pdf
• Specifying a model and loss function
• Ideally supplying a first derivative to the loss function
• Can be done empirically
• Some way to evaluate the quality of the fit
• Minimization of the function, but then how good is this minimum?
• Many procedures use this concept of iterative fitting
• Some, such as principal components, are "closed form" and can just be solved directly
• Others, such as "factor analysis", need to find solutions by successive approximations
• The issue of factor or component rotation is an iterative procedure.
• Principal Components and Factor Analysis are both the result of some basic matrix equations to approximate a matrix
• Conceptually, we are just taking the square root of a correlation matrix:
• \(R \approx CC'\) or \(R \approx FF' + U^2\)
• For any given correlation matrix, R, can we find a C or an F matrix which approximates it?
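Before turning to the built-in functions, here is a minimal sketch of the first set of bullets: specify a one-factor model and a loss function, and let a general purpose minimizer (R's optim) iterate to a solution. This is a toy illustration of the idea, not what fa actually does.
loss <- function(f, R) { #how badly do loadings f reproduce the off diagonals of R?
res <- R - f %*% t(f) #residuals of the one factor model
sum(res[lower.tri(res)]^2) #sum of the squared off diagonal residuals
}
R.toy <- sim.congeneric() #the default population matrix (loadings .8 .7 .6 .5)
out <- optim(rep(.5, 4), loss, R = R.toy) #iterate from a rough start
round(out$par, 2) #should be close to .8 .7 .6 .5 (possibly reflected in sign)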
Classical Test Theory (CTT) considers four or more tests to be congenerically equivalent if all tests may be expressed in terms of one factor and a residual error. Parallel tests are the special case where (usually two) tests have equal factor loadings and equal error variances. Tau equivalent tests have equal factor loadings but may have unequal errors. Congeneric tests may differ in both factor loadings and error variances. Minor factors may be added as systematic but trivial disturbances.
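The sim.congeneric function (used next) takes a loads argument, so these special cases can be generated directly; for example, equal loadings give the tau equivalent case (a sketch):
R.tau <- sim.congeneric(loads = rep(.7, 4)) #tau equivalent: four equal loadings
round(R.tau, 2) #every off diagonal element is .7 * .7 = .49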
The sim.congeneric function will create model or sample data to meet a congeneric model. We first just use the default parameters to generate a population matrix.
R <- sim.congeneric() #use the default values
R
## V1 V2 V3 V4
## V1 1.00 0.56 0.48 0.40
## V2 0.56 1.00 0.42 0.35
## V3 0.48 0.42 1.00 0.30
## V4 0.40 0.35 0.30 1.00
This looks like a simple multiplication table. Let's see if we can factor it.
f1 <- fa(R) #defaults to 1 factor
f1 #show the output
## Factor Analysis using method = minres
## Call: fa(r = R)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## V1 0.8 0.64 0.36 1
## V2 0.7 0.49 0.51 1
## V3 0.6 0.36 0.64 1
## V4 0.5 0.25 0.75 1
##
## MR1
## SS loadings 1.74
## Proportion Var 0.43
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## df null model = 6 with the objective function = 0.9
## df of the model are 2 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is 0
##
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.89
## Multiple R square of scores with factors 0.78
## Minimum correlation of possible factor scores 0.57
We can find the residuals by calculation:
F <- f1$loadings
F
##
## Loadings:
## MR1
## V1 0.8
## V2 0.7
## V3 0.6
## V4 0.5
##
## MR1
## SS loadings 1.740
## Proportion Var 0.435
model <- F %*% t(F) #The factor model is that R = FF'
model #too many decimals
## V1 V2 V3 V4
## V1 0.6399993 0.5599976 0.4800018 0.4000001
## V2 0.5599976 0.4899963 0.4200002 0.3499989
## V3 0.4800018 0.4200002 0.3600031 0.3000015
## V4 0.4000001 0.3499989 0.3000015 0.2500003
round(model,2) #this looks better
## V1 V2 V3 V4
## V1 0.64 0.56 0.48 0.40
## V2 0.56 0.49 0.42 0.35
## V3 0.48 0.42 0.36 0.30
## V4 0.40 0.35 0.30 0.25
round(R - model,2) #show the residuals
## V1 V2 V3 V4
## V1 0.36 0.00 0.00 0.00
## V2 0.00 0.51 0.00 0.00
## V3 0.00 0.00 0.64 0.00
## V4 0.00 0.00 0.00 0.75
Or, we can ask for the residuals
resid(f1)
## V1 V2 V3 V4
## V1 0.36
## V2 0.00 0.51
## V3 0.00 0.00 0.64
## V4 0.00 0.00 0.00 0.75
Notice how the fit is great on the off diagonals but is not good on the diagonal. The complete factor model is R = FF' + U^2, where U^2 is the diagonal of R - FF'. This U^2 (uniqueness) is the amount of each variable that is not common with the other variables.
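As a check, the uniquenesses stored in the fa object are exactly this residual diagonal:
round(f1$uniquenesses, 2) #the u2 column from the fa output
round(diag(R - F %*% t(F)), 2) #the same values (.36 .51 .64 .75) computed directly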
Factor analysis and its cousin, principal components analysis, are variations on an eigen value decomposition of a matrix. It is possible to represent a correlation matrix R in terms of an orthogonal matrix (X) and a diagonal matrix \(\lambda\) such that
X X' = I (the matrices are orthogonal)
and
R = X \(\lambda\) X'
We find the eigen values and vectors of a matrix by using the eigen function. These eigen vectors form what is known as the basis space of the correlation matrix and are just linear composites of the correlations formed in such a way as to be orthogonal. That is, the eigen vectors when multiplied by their transpose equal the identity matrix. When the eigen vectors are stretched by their eigen values, the resulting matrix is the original matrix.
ev <- eigen(R) #find the eigen values and eigen vectors
ev #show them (not very understandable)
## eigen() decomposition
## $values
## [1] 2.2679589 0.7178316 0.5863824 0.4278271
##
## $vectors
## [,1] [,2] [,3] [,4]
## [1,] -0.5511821 -0.1254972 0.2324911 0.7914522
## [2,] -0.5234358 -0.1890642 0.6029258 -0.5716206
## [3,] -0.4844868 -0.4327312 -0.7362121 -0.1897578
## [4,] -0.4329962 0.8724958 -0.2010477 -0.1041404
ev$vectors %*% t(ev$vectors) #this is an identity matrix, which means that the vectors are orthogonal
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 2.302473e-16 5.173414e-17 -4.536019e-19
## [2,] 2.302473e-16 1.000000e+00 -1.712048e-16 -4.471980e-16
## [3,] 5.173414e-17 -1.712048e-16 1.000000e+00 2.641260e-16
## [4,] -4.536019e-19 -4.471980e-16 2.641260e-16 1.000000e+00
ev$vectors %*% diag(ev$values) %*% t(ev$vectors) # this is what we started with
## [,1] [,2] [,3] [,4]
## [1,] 1.00 0.56 0.48 0.40
## [2,] 0.56 1.00 0.42 0.35
## [3,] 0.48 0.42 1.00 0.30
## [4,] 0.40 0.35 0.30 1.00
Principal components are just convenient ways of showing eigen vectors and eigen values.
C = X \(\sqrt{\lambda}\) and therefore CC’ = R
C <- ev$vectors %*% diag(sqrt(ev$values))
C # show them
## [,1] [,2] [,3] [,4]
## [1,] -0.8300661 -0.1063275 0.1780315 0.51767699
## [2,] -0.7882810 -0.1601846 0.4616941 -0.37388846
## [3,] -0.7296247 -0.3666313 -0.5637589 -0.12411773
## [4,] -0.6520813 0.7392215 -0.1539535 -0.06811668
# or
pca(R,4,rotate="none")
## Principal Components Analysis
## Call: principal(r = r, nfactors = nfactors, residuals = residuals,
## rotate = rotate, n.obs = n.obs, covar = covar, scores = scores,
## missing = missing, impute = impute, oblique.scores = oblique.scores,
## method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 PC4 h2 u2 com
## V1 0.83 -0.11 -0.18 -0.52 1 -4.4e-16 1.8
## V2 0.79 -0.16 -0.46 0.37 1 4.4e-16 2.2
## V3 0.73 -0.37 0.56 0.12 1 2.2e-16 2.5
## V4 0.65 0.74 0.15 0.07 1 8.9e-16 2.1
##
## PC1 PC2 PC3 PC4
## SS loadings 2.27 0.72 0.59 0.43
## Proportion Var 0.57 0.18 0.15 0.11
## Cumulative Var 0.57 0.75 0.89 1.00
## Proportion Explained 0.57 0.18 0.15 0.11
## Cumulative Proportion 0.57 0.75 0.89 1.00
##
## Mean item complexity = 2.2
## Test of the hypothesis that 4 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0
##
## Fit based upon off diagonal values = 1
Thus, the default of the pca function is to find the first principal component.
p1 <- pca(R)
p1 #show it
## Principal Components Analysis
## Call: principal(r = r, nfactors = nfactors, residuals = residuals,
## rotate = rotate, n.obs = n.obs, covar = covar, scores = scores,
## missing = missing, impute = impute, oblique.scores = oblique.scores,
## method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## V1 0.83 0.69 0.31 1
## V2 0.79 0.62 0.38 1
## V3 0.73 0.53 0.47 1
## V4 0.65 0.43 0.57 1
##
## PC1
## SS loadings 2.27
## Proportion Var 0.57
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.15
##
## Fit based upon off diagonal values = 0.88
resid(p1)
## V1 V2 V3 V4
## V1 0.31
## V2 -0.09 0.38
## V3 -0.13 -0.16 0.47
## V4 -0.14 -0.16 -0.18 0.57
The first principal component accounts for the most variance in the correlation matrix. The successive components account for progressively less.
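The 'Proportion Var' line in the pca output above is just each eigenvalue divided by the number of variables:
round(ev$values / NCOL(R), 2) #0.57 0.18 0.15 0.11, matching the pca output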
The components model: R = CC’
or the factor model: R = FF’ + U^2
Both are simple eigen value problems, but the factor model is an eigen value solution of the reduced matrix, (the diagonals have been replaced with what FF’ would give.)
That is Factors are components of R - U^2.
The problem is finding the Uniqueness (U^2). This is where we do some iterative fitting.
R1 <- R - diag(c(.36, .51, .64, .75))
R1 #The reduced matrix
## V1 V2 V3 V4
## V1 0.64 0.56 0.48 0.40
## V2 0.56 0.49 0.42 0.35
## V3 0.48 0.42 0.36 0.30
## V4 0.40 0.35 0.30 0.25
ev1 <-eigen(R1) #find the eigen solution of R1
ev1 # show it
## eigen() decomposition
## $values
## [1] 1.740000e+00 4.440892e-16 -5.435927e-17 -7.831427e-17
##
## $vectors
## [,1] [,2] [,3] [,4]
## [1,] 0.6064784 0.7950999 0.0000000 0.0000000
## [2,] 0.5306686 -0.4047781 -0.5720776 -0.4767313
## [3,] 0.4548588 -0.3469527 0.8037255 -0.1635621
## [4,] 0.3790490 -0.2891272 -0.1635621 0.8636983
ev1$vectors %*% sqrt(diag(ev1$values)) #the loadings
## Warning in sqrt(diag(ev1$values)): NaNs produced
## [,1] [,2] [,3] [,4]
## [1,] 0.8 1.675548e-08 NaN NaN
## [2,] 0.7 -8.530062e-09 NaN NaN
## [3,] 0.6 -7.311481e-09 NaN NaN
## [4,] 0.5 -6.092901e-09 NaN NaN
We create a function (fit) which evaluates how well a model fits the data, run this iteratively to find a solution (the communalities), and then convert this to a factor matrix. Because we want to generalize to more than one factor, we write the code with a parameter (n.factors) which we set to 1 for our first example.
fit <- function(R, h2, n.factors=1) {
  model <- R
  diag(model) <- h2 #replace the diagonal with the communality estimates
  ev <- eigen(model) #eigen decomposition of the reduced matrix
  #loadings on the first n.factors factors (padded with zero columns)
  F1 <- ev$vectors[,1:n.factors] %*% diag(sqrt(ev$values[1:n.factors]),ncol=length(h2))
  FF <- F1 %*% t(F1) #the model implied by these loadings
  res <- R - FF #the residuals
  fit <- sum(res^2)/sum(R^2) #the relative size of the squared residuals
  h2 <- diag(FF) #the updated communality estimates
  return(list(fit=fit,h2=h2))
}
iterative.fit <- function(R, h2, n.factors=1, ntrials=10) {
  fits <- common <- list()
  trial_i <- fit(R, h2, n.factors=n.factors)
  for (i in 1:ntrials) { #feed each trial's communalities back into the next trial
    trial_i <- fit(R, trial_i$h2, n.factors=n.factors)
    fits[[i]] <- trial_i$fit
    common[[i]] <- trial_i$h2
  }
  return(list(h2=trial_i$h2, fits=fits, communality=common)) #tracks our progress
}
#this next function just works for one factor
report.f <- function(R, h2, n.trials=40) {
  h2 <- iterative.fit(R, h2, ntrials=n.trials)$h2
  F <- sqrt(h2) #for one factor, the loadings are the square roots of the communalities
  return(F)
}
#but this is the generalization that works for as many factors as we want
report.multi <- function(R, n.factors=1, h2=NULL, n.trials=40) {
  if (is.null(h2)) h2 <- smc(R) #squared multiple correlations as starting estimates
  #this next line does all the work
  h2 <- iterative.fit(R, h2, ntrials=n.trials, n.factors=n.factors)$h2
  #now that we have the best communality estimates, show the factors
  diag(R) <- h2
  ev <- eigen(R)
  F <- ev$vectors[,1:n.factors] %*% diag(sqrt(ev$values[1:n.factors]),ncol=n.factors)
  return(F)
}
First, we just show how the fit gets better over iterations when we use iterative fit.
R #show the correlation matrix
## V1 V2 V3 V4
## V1 1.00 0.56 0.48 0.40
## V2 0.56 1.00 0.42 0.35
## V3 0.48 0.42 1.00 0.30
## V4 0.40 0.35 0.30 1.00
temp <- iterative.fit(R,rep(1,NCOL(R)))
temp
## $h2
## [1] 0.6392198 0.4903916 0.3601200 0.2500535
##
## $fits
## $fits[[1]]
## [1] 0.1987778
##
## $fits[[2]]
## [1] 0.2145627
##
## $fits[[3]]
## [1] 0.2186953
##
## $fits[[4]]
## [1] 0.2197183
##
## $fits[[5]]
## [1] 0.2199894
##
## $fits[[6]]
## [1] 0.2200736
##
## $fits[[7]]
## [1] 0.2201065
##
## $fits[[8]]
## [1] 0.2201227
##
## $fits[[9]]
## [1] 0.2201321
##
## $fits[[10]]
## [1] 0.220138
##
##
## $communality
## $communality[[1]]
## [1] 0.6318596 0.5310441 0.4093544 0.2852405
##
## $communality[[2]]
## [1] 0.6253149 0.5064775 0.3757212 0.2566750
##
## $communality[[3]]
## [1] 0.6280253 0.4983401 0.3658070 0.2513934
##
## $communality[[4]]
## [1] 0.6314826 0.4948932 0.3625332 0.2504821
##
## $communality[[5]]
## [1] 0.6341853 0.4930877 0.3612844 0.2503003
##
## $communality[[6]]
## [1] 0.6360865 0.4920115 0.3607280 0.2502220
##
## $communality[[7]]
## [1] 0.6373799 0.4913282 0.3604430 0.2501621
##
## $communality[[8]]
## [1] 0.6382493 0.4908821 0.3602811 0.2501144
##
## $communality[[9]]
## [1] 0.6388312 0.4905874 0.3601825 0.2500788
##
## $communality[[10]]
## [1] 0.6392198 0.4903916 0.3601200 0.2500535
report.multi(R)
## [,1]
## [1,] 0.8
## [2,] 0.7
## [3,] 0.6
## [4,] 0.5
R2 <- lowerCor(bfi[1:10],show=FALSE)
report.multi(R2,2)
## [,1] [,2]
## [1,] 0.2618963 0.2854302
## [2,] -0.5519772 -0.3964413
## [3,] -0.5893908 -0.4653052
## [4,] -0.4724715 -0.1761617
## [5,] -0.5082833 -0.3403847
## [6,] -0.4182883 0.3636337
## [7,] -0.5106367 0.3735386
## [8,] -0.4688629 0.2923000
## [9,] 0.5344426 -0.3706416
## [10,] 0.5066513 -0.2883404
The advantage of simulation is that we know the right answer. However, let's look at real data as well and compare our function with fa. We have a number of nice example data sets. Use the help function to find them and read about them. Try some of the examples in each one.
First, the ability dataset, which has one dimension.
R.ab <- lowerCor(ability,show=FALSE)
round(report.multi(R.ab),2)
## [,1]
## [1,] -0.55
## [2,] -0.45
## [3,] -0.54
## [4,] -0.47
## [5,] -0.52
## [6,] -0.48
## [7,] -0.54
## [8,] -0.53
## [9,] -0.41
## [10,] -0.43
## [11,] -0.47
## [12,] -0.35
## [13,] -0.50
## [14,] -0.55
## [15,] -0.53
## [16,] -0.46
Compare this with the fa function, showing just the loadings.
f.ab <- fa(R.ab)
round(f.ab$loadings,2)
##
## Loadings:
## MR1
## reason.4 0.55
## reason.16 0.45
## reason.17 0.54
## reason.19 0.47
## letter.7 0.52
## letter.33 0.48
## letter.34 0.54
## letter.58 0.53
## matrix.45 0.41
## matrix.46 0.43
## matrix.47 0.47
## matrix.55 0.35
## rotate.3 0.50
## rotate.4 0.55
## rotate.6 0.53
## rotate.8 0.46
##
## MR1
## SS loadings 3.832
## Proportion Var 0.240
The bfi has 5 dimensions; let's try our little function on it.
R.bfi <- lowerCor(bfi,show=FALSE)
temp.5 <- report.multi(R.bfi,5)
round(temp.5,2)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.24 0.02 -0.10 -0.02 -0.46
## [2,] -0.47 -0.30 0.14 0.13 0.35
## [3,] -0.53 -0.31 0.22 0.09 0.25
## [4,] -0.40 -0.13 0.10 0.26 0.15
## [5,] -0.57 -0.20 0.24 0.03 0.15
## [6,] -0.32 -0.11 -0.44 0.12 -0.08
## [7,] -0.33 -0.19 -0.46 0.28 -0.06
## [8,] -0.31 -0.07 -0.34 0.31 -0.02
## [9,] 0.46 -0.12 0.42 -0.24 0.03
## [10,] 0.49 -0.14 0.25 -0.30 0.13
## [11,] 0.40 0.19 -0.27 0.13 0.19
## [12,] 0.60 0.04 -0.25 0.08 0.27
## [13,] -0.52 -0.32 0.12 -0.21 -0.12
## [14,] -0.58 -0.20 0.35 0.06 -0.22
## [15,] -0.51 -0.27 -0.10 -0.06 -0.21
## [16,] 0.43 -0.61 -0.04 0.05 -0.23
## [17,] 0.42 -0.61 -0.09 0.03 -0.17
## [18,] 0.41 -0.62 -0.05 0.04 -0.02
## [19,] 0.52 -0.40 -0.11 -0.05 0.20
## [20,] 0.34 -0.45 0.02 0.21 0.11
## [21,] -0.33 -0.18 -0.22 -0.37 -0.03
## [22,] 0.20 -0.10 0.29 0.32 -0.05
## [23,] -0.41 -0.26 -0.18 -0.45 0.01
## [24,] 0.06 -0.25 -0.22 -0.22 0.27
## [25,] 0.21 0.02 0.28 0.38 -0.15
## [26,] -0.09 -0.20 0.07 0.23 0.13
## [27,] -0.06 0.03 -0.08 -0.10 0.16
## [28,] -0.18 0.04 -0.07 0.01 0.20
The factor.congruence function compares factor solutions. Let's compare our temp.5 with the fa output.
f5 <- fa(R.bfi,5, rotate="none")
factor.congruence(f5,temp.5)
## [,1] [,2] [,3] [,4] [,5]
## MR1 -1 0 0 0 0
## MR2 0 -1 0 0 0
## MR3 0 0 -1 0 0
## MR4 0 0 0 1 0
## MR5 0 0 0 0 1
Factors as extracted are meant to account for the most variance possible with each successive factor. But this does not necessarily lead to easily interpretable solutions. Thus we take the factor loadings matrix and rotate it to another solution.
To show this, we look at the first 10 items of the bfi and plot the loadings using fa.plot.
R2 <- lowerCor(bfi[1:10],show=FALSE)
b2 <- report.multi(R2,2)
fa.plot(b2)
Hand rotate this solution by 45 degrees.
fa.plot(factor.rotate(b2,45))
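Under the hood, rotation by an angle is just multiplying the loadings by a 2 x 2 rotation matrix. A minimal sketch of the idea (the sign convention of factor.rotate may differ):
theta <- 45 * pi/180 #degrees to radians
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2) #a rotation matrix
round(b2 %*% rot, 2) #the same items, expressed on rotated axes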
The fa function: we just showed how you can write your own factor analysis function without "too much" work. This was done to demystify factor analysis and to let you see inside the process. But the fa function has many additional useful features that you should explore.
Input can be from raw data or correlation matrices. Factor extraction
can be one of many different methods. Rotations in the
GPArotation
package are done by default.
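For example (a sketch; the rotations assume GPArotation is installed, and the bfi has 2800 observations):
f2.ml <- fa(bfi[1:10], nfactors=2, fm="ml", rotate="oblimin") #maximum likelihood from raw data
f2.vm <- fa(R2, nfactors=2, n.obs=2800, rotate="varimax") #from a correlation matrix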
?bfi
?sai
?msqR
?spi
Remember to describe the data first so you know what is there. Don't factor the coding variables (e.g., ids, study, etc.).
How many factors? There is no right answer, but many criteria have been proposed:
• Extracting factors until the χ2 of the residual matrix is not significant.
• Extracting factors until the change in χ2 from factor n to factor n+1 is not significant.
• Parallel analysis: extracting factors until the eigenvalues of the real data are less than the corresponding eigenvalues of a random data set of the same size
• Plotting the magnitude of the successive eigenvalues and applying the scree test.
• Recent suggestion of dimensionality by del Giudice (2020) (to be implemented in psych)
• Extracting factors as long as they are interpretable.
• Using the Very Simple Structure Criterion (VSS)
• Using the Minimum Average Partial criterion (MAP).
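Several of these criteria can be examined at once with the nfactors function, which reports the VSS, MAP, and chi-square based rules together (a sketch; output not shown here):
nf <- nfactors(bfi[1:25], n=8) #examine solutions with up to 8 factors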
With real data, we do not know 'truth'. With simulated data, we can know the right answer. Let's examine the sim.item function:
• defaults to certain parameter values
• processes the data (make it up)
• returns values
sim.item #to show a function, just type the name
## function (nvar = 72, nsub = 500, circum = FALSE, xloading = 0.6,
## yloading = 0.6, gloading = 0, xbias = 0, ybias = 0, categorical = FALSE,
## low = -3, high = 3, truncate = FALSE, threshold = NULL)
## {
## avloading <- (xloading + yloading)/2
## errorweight <- sqrt(1 - (avloading^2 + gloading^2))
## g <- rnorm(nsub)
## truex <- rnorm(nsub) * xloading + xbias
## truey <- rnorm(nsub) * yloading + ybias
## if (!is.null(threshold)) {
## if (length(threshold) < nvar)
## threshold <- sample(threshold, nvar, replace = TRUE)
## }
## if (circum) {
## radia <- seq(0, 2 * pi, len = nvar + 1)
## rad <- radia[which(radia < 2 * pi)]
## }
## else rad <- c(rep(0, nvar/4), rep(pi/2, nvar/4), rep(pi,
## nvar/4), rep(3 * pi/2, nvar/4))
## error <- matrix(rnorm(nsub * (nvar)), nsub)
## trueitem <- outer(truex, cos(rad)) + outer(truey, sin(rad))
## item <- gloading * g + trueitem + errorweight * error
## if (categorical) {
## if (is.null(threshold)) {
## item = round(item)
## item[(item <= low)] <- low
## item[(item > high)] <- high
## }
## else {
## i <- 1:nvar
## item <- t(t(item[, i]) > threshold[i]) + 0
## }
## }
## colnames(item) <- paste("V", 1:nvar, sep = "")
## return(item)
## }
## <bytecode: 0x1073d0420>
## <environment: namespace:psych>
Simple structure versus Circumplex structure
set.seed(42)
my.data <- sim.item(12)
describe(my.data) #to see what they look like
## vars n mean sd median trimmed mad min max range skew kurtosis se
## V1 1 500 0.01 1.01 0.03 0.01 0.97 -3.26 3.44 6.69 0.05 0.13 0.04
## V2 2 500 0.00 1.00 0.00 0.00 1.05 -3.16 3.12 6.28 0.02 -0.14 0.04
## V3 3 500 -0.03 1.05 -0.04 -0.04 1.06 -2.95 2.80 5.76 0.10 -0.28 0.05
## V4 4 500 -0.05 0.99 -0.04 -0.04 0.96 -2.96 2.82 5.78 -0.06 0.10 0.04
## V5 5 500 -0.04 0.96 -0.05 -0.05 0.96 -2.76 2.71 5.47 0.07 -0.20 0.04
## V6 6 500 -0.05 1.04 -0.06 -0.06 1.03 -3.65 3.07 6.72 0.05 0.09 0.05
## V7 7 500 0.01 0.97 0.01 0.02 0.95 -3.45 3.53 6.98 -0.13 0.38 0.04
## V8 8 500 0.02 1.04 0.02 0.02 1.09 -2.94 3.95 6.89 0.06 0.05 0.05
## V9 9 500 0.04 1.04 -0.02 0.04 1.05 -3.07 2.78 5.85 0.04 -0.25 0.05
## V10 10 500 0.04 0.98 0.10 0.07 0.97 -2.75 2.47 5.22 -0.23 -0.15 0.04
## V11 11 500 -0.01 0.98 -0.03 -0.01 1.02 -3.31 2.88 6.19 -0.01 -0.02 0.04
## V12 12 500 -0.01 0.97 0.00 -0.01 0.91 -3.04 2.57 5.61 -0.08 0.15 0.04
R <- lowerCor(my.data) #save the correlation matrix
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## V1 1.00
## V2 0.36 1.00
## V3 0.38 0.37 1.00
## V4 -0.01 -0.04 -0.01 1.00
## V5 0.05 -0.02 0.01 0.34 1.00
## V6 0.03 0.01 0.01 0.37 0.35 1.00
## V7 -0.35 -0.37 -0.38 -0.09 -0.01 -0.05 1.00
## V8 -0.40 -0.34 -0.39 0.00 0.08 0.11 0.34 1.00
## V9 -0.41 -0.36 -0.32 0.00 0.02 -0.03 0.32 0.39 1.00
## V10 0.06 0.07 0.01 -0.33 -0.32 -0.39 -0.04 -0.11 -0.06 1.00
## V11 0.02 0.03 0.05 -0.37 -0.35 -0.32 0.02 -0.12 -0.01 0.41 1.00
## V12 0.01 0.01 -0.11 -0.31 -0.30 -0.33 0.08 -0.02 0.00 0.36 0.39 1.00
corPlot(my.data) #to see it graphically
We can organize this correlation matrix by sorting it by a factor structure (see below). We pass parameters to the corPlot function to make prettier output.
f2 <- fa(R,2) #do a default factor analysis
## Loading required namespace: GPArotation
R.sorted <- matSort(R,f2) #sort the matrix of correlations
corPlot(R.sorted ,cex=.5,main="Sorted by loadings") #make the text size smaller using the cex option.
f2 #show the output of the factor analysis
## Factor Analysis using method = minres
## Call: fa(r = R, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## V1 0.64 -0.02 0.41 0.59 1.0
## V2 0.59 0.02 0.35 0.65 1.0
## V3 0.61 -0.04 0.37 0.63 1.0
## V4 0.03 -0.58 0.34 0.66 1.0
## V5 0.01 -0.55 0.30 0.70 1.0
## V6 0.03 -0.60 0.36 0.64 1.0
## V7 -0.58 0.08 0.34 0.66 1.0
## V8 -0.62 -0.10 0.39 0.61 1.1
## V9 -0.59 0.00 0.35 0.65 1.0
## V10 0.07 0.61 0.38 0.62 1.0
## V11 0.03 0.63 0.39 0.61 1.0
## V12 -0.06 0.57 0.33 0.67 1.0
##
## MR1 MR2
## SS loadings 2.21 2.12
## Proportion Var 0.18 0.18
## Cumulative Var 0.18 0.36
## Proportion Explained 0.51 0.49
## Cumulative Proportion 0.51 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 0.04
## MR2 0.04 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 66 with the objective function = 2.52
## df of the model are 43 and the objective function was 0.11
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.88 0.88
## Multiple R square of scores with factors 0.78 0.77
## Minimum correlation of possible factor scores 0.56 0.53
Loadings are just the correlations of the factors with the variables.
Communalities (h^2) are how much variance in each variable is accounted for by all the factors. This is the row-wise sum of squared loadings (if the factors are uncorrelated).
Uniqueness is 1 - h^2: how much variance in each variable is not accounted for by the factors.
SS loadings are the column-wise sums of squared loadings.
Complexity of an item reflects how many factors are needed to explain it (1 means it loads on just one factor).
The various fit statistics compare the residual correlations to the original correlations.
The correlation of the factor score (estimates) with the factors is a measure of factor determinacy.
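We can verify this arithmetic from the output above. Because these two factors correlate only .04, the row and column sums of squared loadings come close to the reported values:
round(rowSums(f2$loadings^2), 2) #approximately the h2 column
round(colSums(f2$loadings^2), 2) #approximately the SS loadings row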
The variables were generated following a model known as simple structure. That is, every item was thought to represent just one factor. Although this helps interpretability, it is not necessarily correct. With real data, simple structure is an approximation. We simulated data with simple structure, and so it fits well.
We can show this simple structure in a factor diagram (fa.diagram) or a plot of the loadings (fa.plot)
fa.diagram(f2)
fa.plot(f2)
Note how the items have large loadings (correlations) with just one or the other of the two factors.
An alternative model from sim.item is the circumplex model. Try this parameter setting to see what it looks like. Circumplex structures are said to be present in affect (mood) data as well as in interpersonal data.
set.seed(42)
my.circ <- sim.item(12,circum=TRUE)
describe(my.circ) #to see what they look like
## vars n mean sd median trimmed mad min max range skew kurtosis se
## V1 1 500 0.01 1.01 0.03 0.01 0.97 -3.26 3.44 6.69 0.05 0.13 0.04
## V2 2 500 -0.01 0.98 0.01 -0.01 0.94 -3.80 2.89 6.69 -0.07 0.11 0.04
## V3 3 500 -0.04 1.05 -0.07 -0.06 1.01 -2.74 2.88 5.62 0.13 -0.14 0.05
## V4 4 500 -0.05 0.99 -0.04 -0.04 0.96 -2.96 2.82 5.78 -0.06 0.10 0.04
## V5 5 500 -0.03 0.97 -0.07 -0.04 1.01 -2.69 3.21 5.89 0.10 -0.11 0.04
## V6 6 500 -0.02 1.01 -0.04 -0.03 1.06 -3.08 3.33 6.42 0.11 0.04 0.05
## V7 7 500 0.01 0.97 0.01 0.02 0.95 -3.45 3.53 6.98 -0.13 0.38 0.04
## V8 8 500 0.03 1.01 0.01 0.02 1.04 -3.07 3.85 6.93 0.07 0.01 0.05
## V9 9 500 0.06 1.02 0.03 0.05 1.05 -2.67 3.29 5.95 0.12 -0.26 0.05
## V10 10 500 0.04 0.98 0.10 0.07 0.97 -2.75 2.47 5.22 -0.23 -0.15 0.04
## V11 11 500 -0.02 1.00 -0.04 -0.03 1.00 -3.37 3.11 6.48 0.07 0.01 0.04
## V12 12 500 -0.04 0.97 -0.08 -0.03 0.94 -4.17 2.61 6.78 -0.12 0.44 0.04
lowerCor(my.circ)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## V1 1.00
## V2 0.31 1.00
## V3 0.19 0.29 1.00
## V4 -0.01 0.13 0.26 1.00
## V5 -0.14 -0.04 0.15 0.27 1.00
## V6 -0.28 -0.17 -0.02 0.20 0.32 1.00
## V7 -0.35 -0.34 -0.23 -0.09 0.18 0.27 1.00
## V8 -0.35 -0.30 -0.29 -0.16 0.11 0.29 0.31 1.00
## V9 -0.22 -0.27 -0.30 -0.28 -0.13 0.00 0.17 0.29 1.00
## V10 0.06 -0.11 -0.29 -0.33 -0.28 -0.24 -0.04 0.07 0.25 1.00
## V11 0.20 0.03 -0.11 -0.31 -0.36 -0.29 -0.17 -0.13 0.16 0.37 1.00
## V12 0.34 0.21 -0.07 -0.12 -0.27 -0.31 -0.26 -0.21 -0.04 0.20 0.36 1.00
corPlot(my.circ, main="Circumplex structure", cex=.5)
In both of these simulations, the correlations were not very large. This is to simulate the structure of real personality items, which do not have large correlations.
Look at those two matrices (the simple and the circumplex). Can you see any patterning there?
Do we really need 12 dimensions (items), or can we account for them with fewer dimensions? This is the number of factors problem.
One way to see how many dimensions (factors) are in the data is to compare how well many alternative models fit.
Parallel analysis simulates data with no factors and compares the factor solution of the simulated (0 factor) data to the observed solution. When the random data solution is better than the real solution, you have gone too far.
fa.parallel(my.data)
## Parallel analysis suggests that the number of factors = 2 and the number of components = 2
Try a 2 factor solution. We use the fa function, telling it where the data are and how many factors we want. We specify two, following the suggestion of fa.parallel. We then organize the output to show a cleaner structure.
f2 <- fa(my.data,nfactors=2)
f2
## Factor Analysis using method = minres
## Call: fa(r = my.data, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## V1 0.64 -0.02 0.41 0.59 1.0
## V2 0.59 0.02 0.35 0.65 1.0
## V3 0.61 -0.04 0.37 0.63 1.0
## V4 0.03 -0.58 0.34 0.66 1.0
## V5 0.01 -0.55 0.30 0.70 1.0
## V6 0.03 -0.60 0.36 0.64 1.0
## V7 -0.58 0.08 0.34 0.66 1.0
## V8 -0.62 -0.10 0.39 0.61 1.1
## V9 -0.59 0.00 0.35 0.65 1.0
## V10 0.07 0.61 0.38 0.62 1.0
## V11 0.03 0.63 0.39 0.61 1.0
## V12 -0.06 0.57 0.33 0.67 1.0
##
## MR1 MR2
## SS loadings 2.21 2.12
## Proportion Var 0.18 0.18
## Cumulative Var 0.18 0.36
## Proportion Explained 0.51 0.49
## Cumulative Proportion 0.51 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 0.04
## MR2 0.04 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 66 with the objective function = 2.52 with Chi Square = 1246.71
## df of the model are 43 and the objective function was 0.11
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic n.obs is 500 with the empirical chi square 43.95 with prob < 0.43
## The total n.obs was 500 with Likelihood Chi Square = 54.57 with prob < 0.11
##
## Tucker Lewis Index of factoring reliability = 0.985
## RMSEA index = 0.023 and the 90 % confidence intervals are 0 0.04
## BIC = -212.66
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.88 0.88
## Multiple R square of scores with factors 0.78 0.77
## Minimum correlation of possible factor scores 0.56 0.53
fa.sort(f2) #a cleaner output
## Factor Analysis using method = minres
## Call: fa(r = my.data, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## V1 0.64 -0.02 0.41 0.59 1.0
## V8 -0.62 -0.10 0.39 0.61 1.1
## V3 0.61 -0.04 0.37 0.63 1.0
## V2 0.59 0.02 0.35 0.65 1.0
## V9 -0.59 0.00 0.35 0.65 1.0
## V7 -0.58 0.08 0.34 0.66 1.0
## V11 0.03 0.63 0.39 0.61 1.0
## V10 0.07 0.61 0.38 0.62 1.0
## V6 0.03 -0.60 0.36 0.64 1.0
## V4 0.03 -0.58 0.34 0.66 1.0
## V12 -0.06 0.57 0.33 0.67 1.0
## V5 0.01 -0.55 0.30 0.70 1.0
##
## MR1 MR2
## SS loadings 2.21 2.12
## Proportion Var 0.18 0.18
## Cumulative Var 0.18 0.36
## Proportion Explained 0.51 0.49
## Cumulative Proportion 0.51 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 0.04
## MR2 0.04 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 66 with the objective function = 2.52 with Chi Square = 1246.71
## df of the model are 43 and the objective function was 0.11
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic n.obs is 500 with the empirical chi square 43.95 with prob < 0.43
## The total n.obs was 500 with Likelihood Chi Square = 54.57 with prob < 0.11
##
## Tucker Lewis Index of factoring reliability = 0.985
## RMSEA index = 0.023 and the 90 % confidence intervals are 0 0.04
## BIC = -212.66
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.88 0.88
## Multiple R square of scores with factors 0.78 0.77
## Minimum correlation of possible factor scores 0.56 0.53
fa.diagram(f2) #show the structure
corPlot(fa.sort(f2)) #this shows the loadings
fa.plot(f2) #This shows the loadings as items in the factor space
# we can also sort the correlation matrix based upon the factor loadings
R <- cor(my.data)
new.R <- matSort(R,f2)
corPlot(new.R, cex =.5)
set.seed(47) #reset the seed for reproducibility
my.circ <- sim.item(12,circum=TRUE)
describe(my.circ)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## V1 1 500 -0.03 1.00 -0.05 -0.03 1.06 -2.51 2.59 5.09 0.06 -0.42 0.04
## V2 2 500 -0.06 0.98 0.02 -0.03 0.95 -3.57 3.33 6.90 -0.28 0.24 0.04
## V3 3 500 -0.13 1.01 -0.12 -0.15 1.10 -3.01 3.22 6.23 0.17 -0.07 0.05
## V4 4 500 -0.12 0.99 -0.16 -0.16 1.07 -2.43 3.50 5.93 0.36 0.08 0.04
## V5 5 500 -0.04 0.99 -0.05 -0.04 0.99 -2.93 3.93 6.86 0.16 0.38 0.04
## V6 6 500 -0.06 1.06 -0.06 -0.05 0.99 -3.09 3.46 6.56 -0.05 0.15 0.05
## V7 7 500 0.02 1.06 0.02 0.03 1.04 -3.41 3.51 6.92 -0.06 0.24 0.05
## V8 8 500 0.07 1.01 0.06 0.06 0.95 -2.73 3.44 6.17 0.19 0.25 0.05
## V9 9 500 0.06 1.04 0.09 0.05 1.01 -3.65 3.19 6.83 0.07 0.20 0.05
## V10 10 500 0.05 1.02 0.01 0.06 1.07 -4.29 2.76 7.05 -0.19 0.27 0.05
## V11 11 500 0.05 0.98 0.01 0.05 0.94 -3.08 3.26 6.34 0.04 0.01 0.04
## V12 12 500 0.01 1.03 0.06 0.01 1.02 -3.17 2.81 5.99 -0.03 -0.02 0.05
corPlot(my.circ)
fa.parallel(my.circ)
## Parallel analysis suggests that the number of factors = 2 and the number of components = 2
f2.circ <- fa(my.circ,2) #note that we know that the second parameter is nfactors
fa.sort(f2.circ)
## Factor Analysis using method = minres
## Call: fa(r = my.circ, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## V1 0.60 -0.22 0.42 0.58 1.3
## V2 0.59 0.15 0.37 0.63 1.1
## V8 -0.58 -0.09 0.34 0.66 1.0
## V7 -0.58 0.10 0.35 0.65 1.1
## V3 0.49 0.40 0.38 0.62 1.9
## V12 0.48 -0.40 0.40 0.60 1.9
## V9 -0.46 -0.37 0.33 0.67 1.9
## V5 -0.09 0.66 0.45 0.55 1.0
## V4 0.11 0.63 0.40 0.60 1.1
## V11 0.11 -0.58 0.35 0.65 1.1
## V10 -0.21 -0.54 0.33 0.67 1.3
## V6 -0.42 0.45 0.39 0.61 2.0
##
## MR1 MR2
## SS loadings 2.32 2.20
## Proportion Var 0.19 0.18
## Cumulative Var 0.19 0.38
## Proportion Explained 0.51 0.49
## Cumulative Proportion 0.51 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 -0.04
## MR2 -0.04 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 66 with the objective function = 2.73 with Chi Square = 1348.83
## df of the model are 43 and the objective function was 0.11
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic n.obs is 500 with the empirical chi square 42.45 with prob < 0.5
## The total n.obs was 500 with Likelihood Chi Square = 54.8 with prob < 0.11
##
## Tucker Lewis Index of factoring reliability = 0.986
## RMSEA index = 0.023 and the 90 % confidence intervals are 0 0.04
## BIC = -212.43
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.89 0.88
## Multiple R square of scores with factors 0.79 0.78
## Minimum correlation of possible factor scores 0.58 0.57
fa.diagram(f2.circ,main="Forcing simple structure") # by default, shows just one loading per item
fa.diagram(f2.circ, simple=FALSE,main="Allowing more complex structures" ) # allows more than one per item
corPlot(fa.sort(f2.circ))
fa.plot(f2.circ) #this shows the difference
par(mfrow=c(1,2)) #make a two column plot
fa.plot(f2,title="simple structure")
fa.plot(f2.circ,title = "circumplex structure")
par(mfrow=c(1,1)) #return to 1 column plots
Do real data show circumplex structure? Yes: mood.
f2 <- fa(msqR[1:72],2)
fa.plot(f2,labels=colnames(msqR)[1:72],title="2 dimensions of affect at the item level")
Do that graph again, but omit the points and just show the labels.
fa.plot(f2,labels=colnames(msqR)[1:72],title="2 dimensions of affect at the item level",show.points=FALSE)
The ability data set has 1525 participants taking 16 ability items.
describe(ability)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## reason.4 1 1442 0.68 0.47 1 0.72 0 0 1 1 -0.75 -1.44 0.01
## reason.16 2 1463 0.73 0.45 1 0.78 0 0 1 1 -1.02 -0.96 0.01
## reason.17 3 1440 0.74 0.44 1 0.80 0 0 1 1 -1.08 -0.84 0.01
## reason.19 4 1456 0.64 0.48 1 0.68 0 0 1 1 -0.60 -1.64 0.01
## letter.7 5 1441 0.63 0.48 1 0.67 0 0 1 1 -0.56 -1.69 0.01
## letter.33 6 1438 0.61 0.49 1 0.63 0 0 1 1 -0.43 -1.82 0.01
## letter.34 7 1455 0.64 0.48 1 0.68 0 0 1 1 -0.59 -1.65 0.01
## letter.58 8 1438 0.47 0.50 0 0.46 0 0 1 1 0.12 -1.99 0.01
## matrix.45 9 1458 0.55 0.50 1 0.56 0 0 1 1 -0.20 -1.96 0.01
## matrix.46 10 1470 0.57 0.50 1 0.59 0 0 1 1 -0.28 -1.92 0.01
## matrix.47 11 1465 0.64 0.48 1 0.67 0 0 1 1 -0.57 -1.67 0.01
## matrix.55 12 1459 0.39 0.49 0 0.36 0 0 1 1 0.45 -1.80 0.01
## rotate.3 13 1456 0.20 0.40 0 0.13 0 0 1 1 1.48 0.19 0.01
## rotate.4 14 1460 0.22 0.42 0 0.15 0 0 1 1 1.34 -0.21 0.01
## rotate.6 15 1456 0.31 0.46 0 0.27 0 0 1 1 0.80 -1.35 0.01
## rotate.8 16 1460 0.19 0.39 0 0.12 0 0 1 1 1.55 0.41 0.01
lowerCor(ability)
## rsn.4 rs.16 rs.17 rs.19 ltt.7 lt.33 lt.34 lt.58 mt.45 mt.46 mt.47 mt.55 rtt.3 rtt.4
## reason.4 1.00
## reason.16 0.28 1.00
## reason.17 0.40 0.32 1.00
## reason.19 0.30 0.25 0.34 1.00
## letter.7 0.28 0.27 0.29 0.25 1.00
## letter.33 0.23 0.20 0.26 0.25 0.34 1.00
## letter.34 0.29 0.26 0.29 0.27 0.40 0.37 1.00
## letter.58 0.29 0.21 0.29 0.25 0.33 0.28 0.32 1.00
## matrix.45 0.25 0.18 0.20 0.22 0.20 0.20 0.21 0.19 1.00
## matrix.46 0.25 0.18 0.24 0.18 0.24 0.23 0.27 0.21 0.33 1.00
## matrix.47 0.24 0.24 0.27 0.23 0.27 0.23 0.30 0.23 0.24 0.23 1.00
## matrix.55 0.16 0.15 0.16 0.15 0.14 0.17 0.14 0.23 0.21 0.14 0.21 1.00
## rotate.3 0.23 0.16 0.17 0.18 0.18 0.17 0.19 0.24 0.16 0.15 0.20 0.18 1.00
## rotate.4 0.25 0.20 0.20 0.21 0.23 0.21 0.21 0.27 0.17 0.17 0.20 0.18 0.53 1.00
## rotate.6 0.25 0.20 0.27 0.19 0.20 0.21 0.19 0.26 0.15 0.20 0.18 0.17 0.43 0.45
## rotate.8 0.21 0.16 0.18 0.16 0.13 0.14 0.15 0.22 0.16 0.15 0.17 0.19 0.43 0.44
## rtt.6 rtt.8
## rotate.6 1.00
## rotate.8 0.42 1.00
corPlot(ability)
fa.parallel(ability)
## Parallel analysis suggests that the number of factors = 4 and the number of components = 2
f1.ab <- fa(ability)
f1.ab
## Factor Analysis using method = minres
## Call: fa(r = ability)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## reason.4 0.55 0.30 0.70 1
## reason.16 0.45 0.20 0.80 1
## reason.17 0.54 0.29 0.71 1
## reason.19 0.47 0.22 0.78 1
## letter.7 0.52 0.27 0.73 1
## letter.33 0.48 0.23 0.77 1
## letter.34 0.54 0.29 0.71 1
## letter.58 0.53 0.28 0.72 1
## matrix.45 0.41 0.17 0.83 1
## matrix.46 0.43 0.18 0.82 1
## matrix.47 0.47 0.22 0.78 1
## matrix.55 0.35 0.12 0.88 1
## rotate.3 0.50 0.25 0.75 1
## rotate.4 0.55 0.30 0.70 1
## rotate.6 0.53 0.28 0.72 1
## rotate.8 0.46 0.21 0.79 1
##
## MR1
## SS loadings 3.81
## Proportion Var 0.24
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## df null model = 120 with the objective function = 3.28 with Chi Square = 4973.83
## df of the model are 104 and the objective function was 0.7
##
## The root mean square of the residuals (RMSR) is 0.07
## The df corrected root mean square of the residuals is 0.07
##
## The harmonic n.obs is 1426 with the empirical chi square 1476.62 with prob < 3e-241
## The total n.obs was 1525 with Likelihood Chi Square = 1063.25 with prob < 9.5e-159
##
## Tucker Lewis Index of factoring reliability = 0.772
## RMSEA index = 0.078 and the 90 % confidence intervals are 0.074 0.082
## BIC = 300.96
## Fit based upon off diagonal values = 0.93
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.91
## Multiple R square of scores with factors 0.84
## Minimum correlation of possible factor scores 0.67
f2.ab <- fa(ability,2)
f2.ab
## Factor Analysis using method = minres
## Call: fa(r = ability, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## reason.4 0.51 0.08 0.31 0.69 1.0
## reason.16 0.45 0.03 0.21 0.79 1.0
## reason.17 0.57 0.00 0.32 0.68 1.0
## reason.19 0.48 0.02 0.24 0.76 1.0
## letter.7 0.60 -0.05 0.33 0.67 1.0
## letter.33 0.52 -0.01 0.26 0.74 1.0
## letter.34 0.63 -0.07 0.35 0.65 1.0
## letter.58 0.45 0.12 0.28 0.72 1.2
## matrix.45 0.41 0.02 0.18 0.82 1.0
## matrix.46 0.45 0.01 0.20 0.80 1.0
## matrix.47 0.46 0.04 0.23 0.77 1.0
## matrix.55 0.24 0.15 0.12 0.88 1.7
## rotate.3 -0.03 0.72 0.49 0.51 1.0
## rotate.4 0.02 0.71 0.52 0.48 1.0
## rotate.6 0.09 0.59 0.40 0.60 1.0
## rotate.8 -0.03 0.65 0.40 0.60 1.0
##
## MR1 MR2
## SS loadings 2.96 1.91
## Proportion Var 0.19 0.12
## Cumulative Var 0.19 0.30
## Proportion Explained 0.61 0.39
## Cumulative Proportion 0.61 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 0.52
## MR2 0.52 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 120 with the objective function = 3.28 with Chi Square = 4973.83
## df of the model are 89 and the objective function was 0.18
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic n.obs is 1426 with the empirical chi square 288.87 with prob < 5.9e-23
## The total n.obs was 1525 with Likelihood Chi Square = 277.64 with prob < 2.9e-21
##
## Tucker Lewis Index of factoring reliability = 0.948
## RMSEA index = 0.037 and the 90 % confidence intervals are 0.032 0.042
## BIC = -374.71
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.90 0.89
## Multiple R square of scores with factors 0.81 0.79
## Minimum correlation of possible factor scores 0.63 0.58
fa.diagram(f2.ab)
f4 <- fa(ability,nfactors=4) #specify that we want four factors
f4 #show the solution
## Factor Analysis using method = minres
## Call: fa(r = ability, nfactors = 4)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR1 MR4 MR3 h2 u2 com
## reason.4 0.09 0.07 0.44 0.10 0.35 0.65 1.3
## reason.16 0.05 0.14 0.34 0.03 0.23 0.77 1.4
## reason.17 -0.02 0.00 0.73 -0.02 0.51 0.49 1.0
## reason.19 0.04 0.16 0.33 0.06 0.25 0.75 1.5
## letter.7 0.00 0.61 0.04 -0.01 0.39 0.61 1.0
## letter.33 0.03 0.53 -0.01 0.03 0.31 0.69 1.0
## letter.34 -0.02 0.66 0.00 0.00 0.43 0.57 1.0
## letter.58 0.16 0.34 0.12 0.03 0.28 0.72 1.7
## matrix.45 -0.01 -0.01 -0.01 0.77 0.57 0.43 1.0
## matrix.46 0.03 0.20 0.07 0.31 0.24 0.76 1.8
## matrix.47 0.07 0.25 0.13 0.15 0.23 0.77 2.5
## matrix.55 0.16 0.08 0.06 0.17 0.13 0.87 2.6
## rotate.3 0.72 0.02 -0.06 0.01 0.50 0.50 1.0
## rotate.4 0.72 0.08 -0.04 -0.02 0.53 0.47 1.0
## rotate.6 0.58 -0.02 0.14 -0.02 0.41 0.59 1.1
## rotate.8 0.64 -0.10 0.05 0.04 0.41 0.59 1.1
##
## MR2 MR1 MR4 MR3
## SS loadings 1.98 1.63 1.27 0.89
## Proportion Var 0.12 0.10 0.08 0.06
## Cumulative Var 0.12 0.23 0.30 0.36
## Proportion Explained 0.34 0.28 0.22 0.15
## Cumulative Proportion 0.34 0.63 0.85 1.00
##
## With factor correlations of
## MR2 MR1 MR4 MR3
## MR2 1.00 0.43 0.42 0.31
## MR1 0.43 1.00 0.64 0.44
## MR4 0.42 0.64 1.00 0.41
## MR3 0.31 0.44 0.41 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 4 factors are sufficient.
##
## df null model = 120 with the objective function = 3.28 with Chi Square = 4973.83
## df of the model are 62 and the objective function was 0.05
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.02
##
## The harmonic n.obs is 1426 with the empirical chi square 63.75 with prob < 0.41
## The total n.obs was 1525 with Likelihood Chi Square = 70.96 with prob < 0.2
##
## Tucker Lewis Index of factoring reliability = 0.996
## RMSEA index = 0.01 and the 90 % confidence intervals are 0 0.019
## BIC = -383.48
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR2 MR1 MR4 MR3
## Correlation of (regression) scores with factors 0.89 0.86 0.85 0.81
## Multiple R square of scores with factors 0.79 0.75 0.71 0.65
## Minimum correlation of possible factor scores 0.58 0.49 0.43 0.31
fa.diagram(f4) #notice the high correlations
fa.plot(f4) #yet another way of drawing it
One function (already discussed) is the omega function. It factors the variables, and then factors the correlations of the factors.
om <- omega(ability,4) #it defaults to 3, we specify we want 4
omega.diagram(om, sl=FALSE) #two ways of drawing the results
describe(bfi)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## A1 1 2784 2.41 1.41 2 2.23 1.48 1 6 5 0.83 -0.31 0.03
## A2 2 2773 4.80 1.17 5 4.98 1.48 1 6 5 -1.12 1.05 0.02
## A3 3 2774 4.60 1.30 5 4.79 1.48 1 6 5 -1.00 0.44 0.02
## A4 4 2781 4.70 1.48 5 4.93 1.48 1 6 5 -1.03 0.04 0.03
## A5 5 2784 4.56 1.26 5 4.71 1.48 1 6 5 -0.85 0.16 0.02
## C1 6 2779 4.50 1.24 5 4.64 1.48 1 6 5 -0.85 0.30 0.02
## C2 7 2776 4.37 1.32 5 4.50 1.48 1 6 5 -0.74 -0.14 0.03
## C3 8 2780 4.30 1.29 5 4.42 1.48 1 6 5 -0.69 -0.13 0.02
## C4 9 2774 2.55 1.38 2 2.41 1.48 1 6 5 0.60 -0.62 0.03
## C5 10 2784 3.30 1.63 3 3.25 1.48 1 6 5 0.07 -1.22 0.03
## E1 11 2777 2.97 1.63 3 2.86 1.48 1 6 5 0.37 -1.09 0.03
## E2 12 2784 3.14 1.61 3 3.06 1.48 1 6 5 0.22 -1.15 0.03
## E3 13 2775 4.00 1.35 4 4.07 1.48 1 6 5 -0.47 -0.47 0.03
## E4 14 2791 4.42 1.46 5 4.59 1.48 1 6 5 -0.82 -0.30 0.03
## E5 15 2779 4.42 1.33 5 4.56 1.48 1 6 5 -0.78 -0.09 0.03
## N1 16 2778 2.93 1.57 3 2.82 1.48 1 6 5 0.37 -1.01 0.03
## N2 17 2779 3.51 1.53 4 3.51 1.48 1 6 5 -0.08 -1.05 0.03
## N3 18 2789 3.22 1.60 3 3.16 1.48 1 6 5 0.15 -1.18 0.03
## N4 19 2764 3.19 1.57 3 3.12 1.48 1 6 5 0.20 -1.09 0.03
## N5 20 2771 2.97 1.62 3 2.85 1.48 1 6 5 0.37 -1.06 0.03
## O1 21 2778 4.82 1.13 5 4.96 1.48 1 6 5 -0.90 0.43 0.02
## O2 22 2800 2.71 1.57 2 2.56 1.48 1 6 5 0.59 -0.81 0.03
## O3 23 2772 4.44 1.22 5 4.56 1.48 1 6 5 -0.77 0.30 0.02
## O4 24 2786 4.89 1.22 5 5.10 1.48 1 6 5 -1.22 1.08 0.02
## O5 25 2780 2.49 1.33 2 2.34 1.48 1 6 5 0.74 -0.24 0.03
## gender 26 2800 1.67 0.47 2 1.71 0.00 1 2 1 -0.73 -1.47 0.01
## education 27 2577 3.19 1.11 3 3.22 1.48 1 5 4 -0.05 -0.32 0.02
## age 28 2800 28.78 11.13 26 27.43 10.38 3 86 83 1.02 0.56 0.21
corPlot(bfi)
fa.parallel(bfi) #how many factors do we think are there?
## Parallel analysis suggests that the number of factors = 7 and the number of components = 7
f5 <- fa(bfi,5)
fa.lookup(f5,dictionary=bfi.dictionary[1:2])
## MR2 MR1 MR3 MR5 MR4 h2 com ItemLabel
## N1 0.78 0.09 0.00 -0.12 -0.03 0.61 1.08 q_952
## N2 0.76 0.03 0.01 -0.09 0.02 0.58 1.03 q_974
## N3 0.73 -0.06 -0.03 0.06 0.02 0.56 1.03 q_1099
## N5 0.53 -0.15 0.01 0.21 -0.17 0.37 1.70 q_1505
## N4 0.50 -0.36 -0.13 0.10 0.09 0.48 2.15 q_1479
## E4 0.00 0.67 0.02 0.18 -0.07 0.55 1.17 q_1410
## E2 0.15 -0.64 -0.02 -0.03 -0.06 0.51 1.13 q_901
## E1 -0.03 -0.54 0.10 -0.07 -0.09 0.32 1.17 q_712
## E3 0.08 0.49 0.00 0.16 0.29 0.45 1.93 q_1205
## E5 0.12 0.41 0.27 0.03 0.22 0.39 2.58 q_1768
## C2 0.16 -0.05 0.66 0.04 0.04 0.44 1.15 q_530
## C4 0.18 0.06 -0.62 -0.01 -0.04 0.46 1.20 q_626
## C3 0.04 -0.05 0.56 0.08 -0.06 0.31 1.09 q_619
## C5 0.20 -0.12 -0.56 0.01 0.10 0.43 1.43 q_1949
## C1 0.07 -0.02 0.55 -0.04 0.15 0.33 1.21 q_124
## A2 0.01 0.07 0.08 0.63 0.03 0.47 1.07 q_1162
## A3 0.00 0.23 0.03 0.58 0.03 0.49 1.31 q_1206
## A1 0.20 0.19 0.07 -0.51 -0.05 0.28 1.67 q_146
## A5 -0.10 0.32 0.01 0.46 0.05 0.45 1.93 q_1419
## A4 -0.04 0.13 0.20 0.40 -0.14 0.28 2.03 q_1364
## gender 0.14 0.00 0.11 0.29 -0.16 0.12 2.42 gender
## age -0.15 -0.12 0.08 0.19 0.08 0.08 3.49 age
## O3 0.03 0.17 0.02 0.05 0.62 0.47 1.18 q_492
## O1 0.01 0.13 0.07 -0.03 0.52 0.33 1.17 q_128
## O5 0.13 0.14 -0.03 -0.02 -0.52 0.29 1.28 q_1964
## O2 0.20 0.11 -0.08 0.10 -0.44 0.24 1.77 q_316
## O4 0.16 -0.28 -0.02 0.18 0.37 0.24 2.83 q_1738
## education -0.10 -0.14 -0.01 0.10 0.15 0.05 3.47 education
## Item
## N1 Get angry easily.
## N2 Get irritated easily.
## N3 Have frequent mood swings.
## N5 Panic easily.
## N4 Often feel blue.
## E4 Make friends easily.
## E2 Find it difficult to approach others.
## E1 Don't talk a lot.
## E3 Know how to captivate people.
## E5 Take charge.
## C2 Continue until everything is perfect.
## C4 Do things in a half-way manner.
## C3 Do things according to a plan.
## C5 Waste my time.
## C1 Am exacting in my work.
## A2 Inquire about others' well-being.
## A3 Know how to comfort others.
## A1 Am indifferent to the feelings of others.
## A5 Make people feel at ease.
## A4 Love children.
## gender males=1, females=2
## age age in years
## O3 Carry the conversation to a higher level.
## O1 Am full of ideas.
## O5 Will not probe deeply into a subject.
## O2 Avoid difficult reading material.
## O4 Spend time reflecting on things.
## education in HS, fin HS, coll, coll grad , grad deg
fa.diagram(f5)
Now do the analysis without including age, gender, and education as variables.
corPlot(bfi[1:25])
fa.parallel(bfi[1:25]) #how many factors do we think are there?
## Parallel analysis suggests that the number of factors = 6 and the number of components = 6
f5 <- fa(bfi[1:25],5)
fa.lookup(f5,dictionary=bfi.dictionary[1:2])
## MR2 MR1 MR3 MR5 MR4 h2 com ItemLabel Item
## N1 0.81 0.10 0.00 -0.11 -0.05 0.65 1.08 q_952 Get angry easily.
## N2 0.78 0.04 0.01 -0.09 0.01 0.60 1.04 q_974 Get irritated easily.
## N3 0.71 -0.10 -0.04 0.08 0.02 0.55 1.07 q_1099 Have frequent mood swings.
## N5 0.49 -0.20 0.00 0.21 -0.15 0.35 1.96 q_1505 Panic easily.
## N4 0.47 -0.39 -0.14 0.09 0.08 0.49 2.27 q_1479 Often feel blue.
## E2 0.10 -0.68 -0.02 -0.05 -0.06 0.54 1.07 q_901 Find it difficult to approach others.
## E4 0.01 0.59 0.02 0.29 -0.08 0.53 1.49 q_1410 Make friends easily.
## E1 -0.06 -0.56 0.11 -0.08 -0.10 0.35 1.21 q_712 Don't talk a lot.
## E5 0.15 0.42 0.27 0.05 0.21 0.40 2.60 q_1768 Take charge.
## E3 0.08 0.42 0.00 0.25 0.28 0.44 2.55 q_1205 Know how to captivate people.
## C2 0.15 -0.09 0.67 0.08 0.04 0.45 1.17 q_530 Continue until everything is perfect.
## C4 0.17 0.00 -0.61 0.04 -0.05 0.45 1.18 q_626 Do things in a half-way manner.
## C3 0.03 -0.06 0.57 0.09 -0.07 0.32 1.11 q_619 Do things according to a plan.
## C5 0.19 -0.14 -0.55 0.02 0.09 0.43 1.44 q_1949 Waste my time.
## C1 0.07 -0.03 0.55 -0.02 0.15 0.33 1.19 q_124 Am exacting in my work.
## A3 -0.03 0.12 0.02 0.66 0.03 0.52 1.07 q_1206 Know how to comfort others.
## A2 -0.02 0.00 0.08 0.64 0.03 0.45 1.04 q_1162 Inquire about others' well-being.
## A5 -0.11 0.23 0.01 0.53 0.04 0.46 1.49 q_1419 Make people feel at ease.
## A4 -0.06 0.06 0.19 0.43 -0.15 0.28 1.74 q_1364 Love children.
## A1 0.21 0.17 0.07 -0.41 -0.06 0.19 1.97 q_146 Am indifferent to the feelings of others.
## O3 0.03 0.15 0.02 0.08 0.61 0.46 1.17 q_492 Carry the conversation to a higher level.
## O5 0.13 0.10 -0.03 0.04 -0.54 0.30 1.21 q_1964 Will not probe deeply into a subject.
## O1 0.02 0.10 0.07 0.02 0.51 0.31 1.13 q_128 Am full of ideas.
## O2 0.19 0.06 -0.08 0.16 -0.46 0.26 1.75 q_316 Avoid difficult reading material.
## O4 0.13 -0.32 -0.02 0.17 0.37 0.25 2.69 q_1738 Spend time reflecting on things.
fa.diagram(f5)
#but parallel analysis suggests six factors
f6 <- fa(bfi[1:25],6)
fa.lookup(f6,dictionary=bfi.dictionary[1:2])
## MR2 MR1 MR3 MR5 MR4 MR6 h2 com ItemLabel
## N1 0.84 -0.10 0.01 -0.07 -0.05 0.00 0.68 1.05 q_952
## N2 0.83 -0.06 0.02 -0.04 -0.01 -0.07 0.66 1.03 q_974
## N3 0.67 0.13 -0.03 0.07 0.05 0.08 0.54 1.15 q_1099
## N5 0.44 0.23 0.00 0.18 -0.10 0.16 0.35 2.37 q_1505
## N4 0.43 0.42 -0.13 0.08 0.10 0.05 0.49 2.40 q_1479
## E2 0.05 0.70 -0.02 -0.07 -0.06 0.03 0.56 1.05 q_901
## E1 -0.13 0.59 0.11 -0.12 -0.08 0.09 0.39 1.35 q_712
## E4 -0.05 -0.53 0.03 0.20 0.04 0.29 0.55 1.90 q_1410
## E5 0.15 -0.40 0.27 0.05 0.23 0.01 0.40 2.84 q_1768
## C2 0.07 0.14 0.67 0.02 0.11 0.17 0.49 1.31 q_530
## C4 0.06 0.09 -0.64 -0.06 0.07 0.30 0.57 1.53 q_626
## C3 0.01 0.06 0.55 0.08 -0.04 0.06 0.31 1.11 q_619
## C5 0.15 0.18 -0.54 -0.01 0.11 0.05 0.43 1.50 q_1949
## C1 0.01 0.06 0.54 -0.06 0.19 0.07 0.34 1.34 q_124
## A2 0.04 -0.04 0.08 0.68 0.00 -0.05 0.50 1.05 q_1162
## A3 -0.02 -0.12 0.03 0.61 0.07 0.11 0.51 1.17 q_1206
## A1 0.09 -0.09 0.08 -0.56 0.06 0.30 0.34 1.73 q_146
## A5 -0.16 -0.19 0.01 0.45 0.13 0.21 0.47 2.35 q_1419
## A4 -0.07 -0.06 0.19 0.39 -0.10 0.15 0.28 2.10 q_1364
## O3 -0.02 -0.09 0.02 0.03 0.65 -0.02 0.48 1.05 q_492
## O1 -0.05 -0.04 0.07 -0.05 0.57 0.03 0.35 1.08 q_128
## O5 0.03 -0.04 -0.02 -0.04 -0.45 0.41 0.37 2.04 q_1964
## E3 0.00 -0.34 0.00 0.15 0.40 0.21 0.48 2.85 q_1205
## O4 0.09 0.35 -0.02 0.15 0.37 -0.04 0.25 2.47 q_1738
## O2 0.11 0.00 -0.07 0.09 -0.36 0.36 0.29 2.40 q_316
## Item
## N1 Get angry easily.
## N2 Get irritated easily.
## N3 Have frequent mood swings.
## N5 Panic easily.
## N4 Often feel blue.
## E2 Find it difficult to approach others.
## E1 Don't talk a lot.
## E4 Make friends easily.
## E5 Take charge.
## C2 Continue until everything is perfect.
## C4 Do things in a half-way manner.
## C3 Do things according to a plan.
## C5 Waste my time.
## C1 Am exacting in my work.
## A2 Inquire about others' well-being.
## A3 Know how to comfort others.
## A1 Am indifferent to the feelings of others.
## A5 Make people feel at ease.
## A4 Love children.
## O3 Carry the conversation to a higher level.
## O1 Am full of ideas.
## O5 Will not probe deeply into a subject.
## E3 Know how to captivate people.
## O4 Spend time reflecting on things.
## O2 Avoid difficult reading material.
fa.diagram(f6)
An example of cluster analysis devised for forming scales is iclust (for item cluster analysis). iclust was developed specifically to form homogeneous scales. See the help page for iclust.
1. Find the correlations.
2. Find the most similar pair of items.
3. Combine those two variables to make a new, composite variable.
4. Repeat steps 2 - 3 until some criterion is reached.
5. What criterion? a) alpha does not increase; b) beta (the worst split half) does not increase.
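As a concrete illustration of step 2 for the ability items (a sketch): the most similar pair is the pair with the largest correlation, which from the matrix shown earlier is rotate.3 with rotate.4 (r = .53).
R.ab <- lowerCor(ability, show=FALSE)
diag(R.ab) <- NA #ignore the self correlations
which(R.ab == max(R.ab, na.rm=TRUE), arr.ind=TRUE) #rotate.3 and rotate.4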
iclust(ability)
## ICLUST (Item Cluster Analysis)
## Call: iclust(r.mat = ability)
##
## Purified Alpha:
## [1] 0.83
##
## G6* reliability:
## [1] 1
##
## Original Beta:
## [1] 0.68
##
## Cluster size:
## [1] 16
##
## Item by Cluster Structure matrix:
## [,1]
## reason.4 0.54
## reason.16 0.44
## reason.17 0.54
## reason.19 0.47
## letter.7 0.52
## letter.33 0.48
## letter.34 0.54
## letter.58 0.52
## matrix.45 0.42
## matrix.46 0.43
## matrix.47 0.47
## matrix.55 0.35
## rotate.3 0.51
## rotate.4 0.56
## rotate.6 0.53
## rotate.8 0.47
##
## With Sums of squares of:
## [1] 3.8
##
## Purified scale intercorrelations
## reliabilities on diagonal
## correlations corrected for attenuation above diagonal:
## [,1]
## [1,] 0.83
##
## Cluster fit = 0.66 Pattern fit = 0.97 RMSR = 0.07
ic <- iclust(bfi[1:25]) #we want to keep these results
fa.lookup(ic,dictionary=bfi.dictionary[1:2])
## C20 C16 C15 C21 ItemLabel Item
## E4 0.66 -0.22 -0.22 -0.06 q_1410 Make friends easily.
## A3 0.65 -0.11 -0.22 -0.19 q_1206 Know how to comfort others.
## A5 0.65 -0.23 -0.23 -0.19 q_1419 Make people feel at ease.
## E2 -0.61 0.34 0.24 0.15 q_901 Find it difficult to approach others.
## E3 0.59 -0.10 -0.20 -0.39 q_1205 Know how to captivate people.
## A2 0.59 -0.07 -0.23 -0.19 q_1162 Inquire about others' well-being.
## E5 0.50 -0.10 -0.40 -0.32 q_1768 Take charge.
## E1 -0.50 0.11 0.06 0.16 q_712 Don't talk a lot.
## A4 0.43 -0.14 -0.29 -0.01 q_1364 Love children.
## A1 -0.27 0.14 0.05 0.13 q_146 Am indifferent to the feelings of others.
## N1 -0.18 0.76 0.20 0.11 q_952 Get angry easily.
## N2 -0.18 0.75 0.18 0.05 q_974 Get irritated easily.
## N3 -0.15 0.74 0.19 0.02 q_1099 Have frequent mood swings.
## N4 -0.34 0.62 0.29 0.01 q_1479 Often feel blue.
## N5 -0.13 0.55 0.13 0.17 q_1505 Panic easily.
## C4 -0.26 0.31 0.66 0.21 q_626 Do things in a half-way manner.
## C2 0.22 -0.01 -0.62 -0.21 q_530 Continue until everything is perfect.
## C5 -0.30 0.36 0.59 0.09 q_1949 Waste my time.
## C1 0.18 -0.08 -0.54 -0.29 q_124 Am exacting in my work.
## C3 0.20 -0.09 -0.54 -0.08 q_619 Do things according to a plan.
## O3 0.39 -0.07 -0.20 -0.62 q_492 Carry the conversation to a higher level.
## O5 -0.12 0.11 0.15 0.53 q_1964 Will not probe deeply into a subject.
## O1 0.28 -0.09 -0.20 -0.53 q_128 Am full of ideas.
## O2 -0.04 0.19 0.18 0.44 q_316 Avoid difficult reading material.
## O4 -0.02 0.21 0.00 -0.33 q_1738 Spend time reflecting on things.
f5 <- fa(bfi[1:25],5)
factor.congruence(f5,ic)
## C20 C16 C15 C21
## MR2 -0.25 0.92 0.30 0.14
## MR1 0.72 -0.40 -0.35 -0.28
## MR3 0.37 -0.28 -0.89 -0.37
## MR5 0.71 -0.14 -0.33 -0.30
## MR4 0.32 -0.12 -0.28 -0.92
Factor congruence is a statistic developed by Burt (1937, 1948) and also by Tucker. It is the cosine of the angle between the factor vectors. It is not the correlation of the vectors (for a correlation, the vectors are first zero centered).
cross <- t(y) %*% x #cross products of the two sets of loadings
sumsx <- sqrt(1/diag(t(x) %*% x)) #reciprocals of the column lengths of x
sumsy <- sqrt(1/diag(t(y) %*% y)) #reciprocals of the column lengths of y
congruence <- cross * outer(sumsy, sumsx) #scale each entry by the two vector lengths
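To see why the centering matters, consider a hypothetical pair of loading vectors that differ only by an additive constant: their correlation is 1, but their congruence (cosine) is slightly less.
x1 <- c(.7, .5, .3); x2 <- x1 + .2 #made-up loading columns for illustration
cor(x1, x2) #1: the correlation centers the vectors first
sum(x1*x2)/sqrt(sum(x1^2)*sum(x2^2)) #about .996: the cosine does not center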
Compare the correlations with the congruences:
factor.congruence(f5,ic)
## C20 C16 C15 C21
## MR2 -0.25 0.92 0.30 0.14
## MR1 0.72 -0.40 -0.35 -0.28
## MR3 0.37 -0.28 -0.89 -0.37
## MR5 0.71 -0.14 -0.33 -0.30
## MR4 0.32 -0.12 -0.28 -0.92
round(cor(f5$loadings,ic$loadings),2)
## C20 C16 C15 C21
## MR2 -0.48 0.92 0.46 0.32
## MR1 0.74 -0.45 -0.36 -0.29
## MR3 0.35 -0.40 -0.89 -0.35
## MR5 0.71 -0.44 -0.31 -0.25
## MR4 0.31 -0.19 -0.27 -0.92