\documentclass[11pt,notitlepage]{report}
\usepackage{geometry}                % See geometry.pdf to learn the layout options. There are lots.
\geometry{letterpaper}                   % ... or a4paper or a5paper or ... 
%\geometry{landscape}                % Activate for rotated page geometry
\usepackage[parfill]{parskip}    % Activate to begin paragraphs with an empty line rather than an indent
\usepackage[ae,hyper]{Rd}
\usepackage{graphicx}
%\usepackage{amssymb}
\usepackage{epstopdf}
%\usepackage[ae,hyper]{Rd}

\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}

\title{Chapter 3: Testing alternative models of data}
\author{\href{http://personality-project.org/revelle.html}{William Revelle}\\
Northwestern University\\
Prepared as part of course on latent variable analysis (\href{http://personality-project.org/revelle/syllabi/454/454.syllabus.pdf}{Psychology 454})\\
 and as a supplement to the \href{http://personality-project.org/}{Short Guide to R for psychologists} \\ }

\usepackage{a4wide}
\usepackage{/Library/Frameworks/R.framework/Versions/2.4/Resources/share/texmf/Sweave}
\begin{document}
\setcounter{chapter}{3}
\maketitle
\date{}              % Activate to display a given date or no date
%\tableofcontents
\Rdcontents{}


In this chapter we consider how to test nested alternative models of some basic data types.    Using the simulation tools introduced in the previous chapter, we generate a data set from a congeneric reliability model with unequal true score loadings and fit three alternative models to the data.  Then we simulate a two factor data structure and consider a set of alternative models.  Finally, we consider ways of representing (and modeling) hierarchical data structures.

For these examples, as well as the other ones, we need to load the psych and sem packages.  

<<print=FALSE, echo=TRUE>>=
library(sem)
library(psych)
@

\section{One factor --- congeneric data model}
The classic test theory structure of 4 observed variables V1 $\dots$ V4 all loading on a single factor, $\theta$, may be analyzed in multiple ways.  The most restrictive model considers all the loadings to be fixed values (perhaps .7).  A more reasonable model is to consider the four variables to be parallel, that is to say, that they have equal loadings on the latent variable and equal error variances.  A less restrictive model is tau equivalence (equal loadings but unequal error variances), and the least restrictive model is known as the ``congeneric'' model, in which both the loadings and the error variances are free to differ.

We can generate data under a congeneric model and then test it with progressively more restricted models (i.e., start with the least restricted model, the congeneric model, add equality constraints on the loadings for the tau equivalent model, add equality constraints on the error variances for the parallel test model, and then fit arbitrarily fixed parameters). To do this, we first create a function, sim.sem, which we apply to make our data.
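
To make the nesting explicit, the four models can be written as restrictions on a single measurement equation.  The symbols $\lambda_i$ (true score loading), $\epsilon_i$ (error score), and $\psi_i$ (error variance) are notation assumed here for illustration; they correspond to the labels a--d and u--x used in the sem specifications that follow, with the variance of $\theta$ fixed at 1.

\[
V_i = \lambda_i \theta + \epsilon_i, \qquad \mathrm{var}(\theta) = 1, \qquad \mathrm{var}(\epsilon_i) = \psi_i, \qquad i = 1, \dots, 4
\]

\begin{center}
\begin{tabular}{ll}
congeneric & $\lambda_i$ free, $\psi_i$ free \\
tau equivalent & $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4$, $\psi_i$ free \\
parallel & $\lambda_i$ all equal, $\psi_i$ all equal \\
fixed & $\lambda_i$ set to a constant, $\psi_i$ all equal \\
\end{tabular}
\end{center}

Each model is obtained from the one above it by adding constraints, which is what makes the sequence nested.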

\begin{figure}
\includegraphics{congeneric.pdf}
\caption{The basic congeneric model is one latent (true score) factor accounting for the correlations of multiple observed scores.  If there are at least 4 observed variables, the model is identified.  For fewer variables, assumptions need to be made (e.g., for two parallel tests, the path coefficients are all equal).}
\label{congeneric.fig}
\end{figure}

\subsection{Generating the data}
We create a function, \textbf{sim.sem}, to simulate data with a variety of possible structures.  Although the function defaults to four variables with specific loadings on one factor, we can vary both the number of variables as well as the loadings and the number of factors.  The function returns either the pattern matrix used to generate the data, the implied structure matrix, or simulated raw data.  

<<print=FALSE, echo=TRUE>>=
sim.sem <- function(N=1000,loads =c(.8,.7,.6,.5),phi=NULL,obs=TRUE)  {
 # loads is a vector (one factor) or a variables x factors matrix of loadings
 if (!is.matrix(loads)) {loading <- matrix(loads,ncol=1)} else {loading <- loads}
 nv <- dim(loading)[1]     # number of observed variables
 nf <- dim(loading)[2]     # number of factors
 # error loadings give each variable unit variance when the factors are uncorrelated
 error <- diag(1,nrow=nv)
 diag(error) <- sqrt(1- diag(loading %*% t(loading)))
 if (is.null(phi)) phi <- diag(1,nrow=nf)
 pattern <- cbind(loading,error)
 colnames(pattern) <- c(paste("theta",1:nf,sep=""),paste("e",1:nv,sep=""))
 rownames(pattern) <- paste("V",1:nv,sep="")
 # embed the factor intercorrelations in a (nf+nv) x (nf+nv) matrix
 temp <- diag(1,nv+nf)
 temp[1:nf,1:nf] <- phi
 phi <- temp
 colnames(phi) <- c(paste("theta",1:nf,sep=""),paste("e",1:nv,sep=""))
 structure <- pattern %*% phi
 latent <- matrix(rnorm(N*(nf+nv)),ncol = (nf+nv))
 # zero the upper triangle of the factor block before generating the scores
 if (nf>1) {for (i in 1:(nf-1)) {for (j in (i+1):nf) {phi[i,j] <- 0.0} }}
 observed <- latent %*% t(pattern %*% phi)
 if(obs) {return(observed)} else {
   ps <- list(pattern=pattern,structure=structure,phi=phi)
   return(ps)
 } }
@

Use the \textbf{sim.sem} function to show the pattern matrix, the implied correlation matrix, and then take a sample of 1000 from that population.  Note that even with 1000 simulated subjects the sample correlation matrix is not the same as the population matrix.  As you develop your theory testing skills, it is useful to remember that you are trying to make inferences about the population based upon your parameter estimates from the sample.

<<print=FALSE, echo=TRUE>>=
N <- 1000
sim <- sim.sem(obs=FALSE)
round(sim$pattern,2)

population <- (sim$pattern %*% t(sim$pattern))
population
set.seed(42)
data.f1 <- sim.sem()
round(cor(data.f1),2)
@
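
The population matrix above is simply the product of the loadings for each pair of variables, with the error terms bringing the diagonal up to 1.  As a small self-contained check (the names \texttt{ex.loads} and \texttt{ex.implied} are introduced here only for illustration), the same matrix can be built directly from the default loadings:

<<print=FALSE, echo=TRUE>>=
ex.loads <- c(.8,.7,.6,.5)             # the default loadings of sim.sem
ex.implied <- ex.loads %o% ex.loads    # r_ij = loading_i * loading_j
diag(ex.implied) <- 1                  # communality + error variance = 1
round(ex.implied,2)
@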

\subsection{Estimate a congeneric model}
Using the simulated data generated above, we find the covariance matrix from the sample data and  apply sem to the data. (The sem package needs to be loaded first.) Examine the statistics of fit as well as the residual matrix.

<<print=FALSE, echo=TRUE>>=

S.congeneric <- cov(data.f1)
model.congeneric <-  matrix(c(
	'theta ->  V1',      'a', NA,
 	'theta -> V2' ,      'b', NA,
 	'theta -> V3' ,      'c', NA,
	'theta -> V4',     'd', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'theta <-> theta',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.congeneric) <- c("path","label","initial estimate")
model.congeneric
sem.congeneric= sem(model.congeneric,S.congeneric,N)
summary(sem.congeneric,digits=3)
round(residuals(sem.congeneric),2)
@
%
\subsection{Estimate a tau equivalent model with equal true score and unequal error loadings}
A more constrained model, ``tau equivalence'', assumes that the theta paths are equal but allows the error variances to be unequal. 

<<print=FALSE, echo=TRUE>>=

S.congeneric <- cov(data.f1)
model.tau <-  matrix(c(
	'theta ->  V1',      'a', NA,
 	'theta -> V2' ,      'a', NA,
 	'theta -> V3' ,      'a', NA,
	'theta -> V4',     'a', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'theta <-> theta',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.tau) <- c("path","label","initial estimate")
model.tau 
sem.tau= sem(model.tau,S.congeneric,N)
summary(sem.tau,digits=3)
round(residuals(sem.tau),2)
@

Note that this model has a much worse fit (as it should), with a very large change in the $\chi^2$ that far exceeds the benefit of greater parsimony (the change in degrees of freedom from 2 to 5).
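
Because the tau equivalent model is nested within the congeneric model, the difference between the two $\chi^2$ values is, under the usual assumptions, itself distributed as $\chi^2$ with degrees of freedom equal to the difference in degrees of freedom.  A minimal sketch of this comparison follows; the helper \texttt{ex.chisq.diff} is hypothetical (it is not part of the sem package), and the $\chi^2$ values passed to it are made up for illustration rather than taken from the output above.

<<print=FALSE, echo=TRUE>>=
ex.chisq.diff <- function(chisq.restricted, df.restricted, chisq.full, df.full) {
  d.chisq <- chisq.restricted - chisq.full   # loss of fit from the added constraints
  d.df <- df.restricted - df.full            # number of constraints added
  p <- pchisq(d.chisq, d.df, lower.tail=FALSE)
  c(chisq=d.chisq, df=d.df, p=p)
}
# illustrative values only: a restricted model (df = 5) versus a full model (df = 2)
ex.chisq.diff(chisq.restricted=60, df.restricted=5, chisq.full=2, df.full=2)
@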


\subsection{Estimate a parallel test model with equal true score and equal error loadings}

An even more unrealistic model would be a model of parallel tests, where the true score variances are the same for all tests, as are the error variances.  
<<print=FALSE, echo=TRUE>>=


model.parallel <-  matrix(c(
	'theta ->  V1',      'a', NA,
 	'theta -> V2' ,      'a', NA,
 	'theta -> V3' ,      'a', NA,
	'theta -> V4',     'a', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'u', NA,
	'V3 <-> V3' ,      'u', NA,
 	'V4 <-> V4' ,      'u', NA,
 	'theta <-> theta',   NA,1),
 	ncol=3, byrow=TRUE)
 colnames(model.parallel) <- c("path","label","initial estimate")
model.parallel 
sem.parallel= sem(model.parallel,S.congeneric,N)
summary(sem.parallel,digits=3)
round(residuals(sem.parallel),2)
@
\subsection{Estimate a parallel test model with fixed loadings}
The most restrictive model estimates the fewest parameters and considers the case where all loadings are fixed at a particular value.  (This is truly a stupid model.)  Notice how large the residuals are.

<<print=FALSE, echo=TRUE>>=


model.fixed <-  matrix(c(
	'theta ->  V1',      NA, .6,
 	'theta -> V2' ,     NA, .6,
 	'theta -> V3' ,      NA, .6,
	'theta -> V4',     NA, .6,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'u', NA,
	'V3 <-> V3' ,      'u', NA,
 	'V4 <-> V4' ,      'u', NA,
 	'theta <-> theta',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.fixed) <- c("path","label","initial estimate")
model.fixed 
sem.fixed= sem(model.fixed,S.congeneric,N)
summary(sem.fixed,digits=3)
round(residuals(sem.fixed),2)
@

\subsection{Comparison of models}

We can examine the degrees of freedom in each of the previous analyses and compare them to the goodness of fit.  Form a list of the different analyses, and then show the summary statistics.

<<print=FALSE, echo=TRUE>>=
summary.list <- list()
summary.list[[1]] <- summary(sem.congeneric)[1:2]
summary.list[[2]] <- summary(sem.tau)[1:2]
summary.list[[3]] <- summary(sem.parallel)[1:2]
summary.list[[4]] <- summary(sem.fixed)[1:2]
summary.data <- matrix(unlist(summary.list),nrow=4,byrow=TRUE)
rownames(summary.data) <- c("congeneric","tau","parallel","fixed")
colnames(summary.data) <- c("chisq", "df")
summary.data
@
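
The df column can be checked by hand: with $p$ observed variables there are $p(p+1)/2$ distinct variances and covariances, and the degrees of freedom are that count minus the number of free parameters.  The helper below, \texttt{ex.model.df}, is an illustrative name and not a function from sem or psych; the parameter counts are read off the four model specifications above.

<<print=FALSE, echo=TRUE>>=
ex.model.df <- function(nvar, nfree) {nvar*(nvar+1)/2 - nfree}
# free parameters (loadings + error variances):
# congeneric 4 + 4, tau 1 + 4, parallel 1 + 1, fixed 0 + 1
ex.free <- c(congeneric=8, tau=5, parallel=2, fixed=1)
ex.model.df(4, ex.free)
@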


\section{Two (perhaps correlated) factors}

We now consider more interesting problems. The case of two correlated factors sometimes appears as a classic prediction problem (multiple measures of X, multiple measures of Y, what is the correlation between the two latent constructs) and sometimes as a measurement problem (multiple subfactors of X).  The generation structure is similar.  

\subsection{Generating the data}
\label{twofactor}
We use the sim.sem function from before, and specify a two factor, uncorrelated structure.

<<print=FALSE, echo=TRUE>>=

 set.seed(42)
 N <- 1000
 pattern <- matrix(c(
        .9,0,
        .8,0,
        .7,0,
         0,.8,
         0,.7,
         0,.6),ncol=2,byrow=TRUE)
 phi <- matrix(c(1,0,0,1),ncol=2)
 
 population <- sim.sem(loads = pattern, phi=phi,obs=FALSE)
 round(population$pattern,2)
 pop.cor <- round(population$structure %*% t(population$pattern),2)
 pop.cor
 data.f2 <- sim.sem(loads = pattern, phi=phi)
 pairs.panels(data.f2)
@
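
With uncorrelated factors the implied correlation between two variables is the product of their loadings when they share a factor and zero otherwise.  The short check below rebuilds the population matrix from the loadings alone; \texttt{ex.F2} and \texttt{ex.R2} are names used only for this illustration.

<<print=FALSE, echo=TRUE>>=
ex.F2 <- matrix(c(.9,.8,.7,0,0,0,
                  0,0,0,.8,.7,.6),ncol=2)   # columns are the two factors
ex.R2 <- ex.F2 %*% t(ex.F2)                 # common part of the correlations
diag(ex.R2) <- 1                            # add the unique variances
round(ex.R2,2)
@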
\begin{figure}
\includegraphics{2factor.pdf}
\caption{Six variables with two factors. This notation shows the error of measurement in the observed and latent variables.  If $g > 0$, then the two factors are correlated. }
\end{figure}


\subsection{Exploratory Factor analysis of the data}
\label{efa}
This structure may be analyzed in a variety of different ways, including exploratory factor analysis.  A ``scree'' plot of the eigenvalues of the correlation matrix suggests a two factor solution.  Based upon this ``prior'' hypothesis, we extract two factors using the \textbf{factanal} function.

\begin{figure}
<<print=FALSE,echo=TRUE,fig=TRUE,eps=FALSE>>=
VSS.scree(cor(data.f2))
@
\caption{A scree plot of the eigenvalues of the simulated data suggests that two factors are the best representation of the data.  Compare this to the two correlated factor problem, Figure \ref{VSS2.r}, and the three correlated factor problem, Figure \ref{VSS3.r}.}
\label{VSS2.0}
\end{figure}
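
The logic of the scree test can be seen without sampling error by taking the eigenvalues of the population correlation matrix implied by the loadings of section \ref{twofactor}.  In this sketch (object names beginning with \texttt{ex.} are illustrative), two eigenvalues exceed 1 and the remaining four fall below it:

<<print=FALSE, echo=TRUE>>=
ex.F <- matrix(c(.9,.8,.7,0,0,0,
                 0,0,0,.8,.7,.6),ncol=2)
ex.R <- ex.F %*% t(ex.F)        # implied correlations among the six variables
diag(ex.R) <- 1
ex.ev <- eigen(ex.R)$values     # eigenvalues in decreasing order
round(ex.ev,2)
sum(ex.ev > 1)                  # number of eigenvalues greater than 1
@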

<<print=FALSE, echo=TRUE>>=
f2 <- factanal(data.f2,2)
f2
@

The factor loadings nicely capture the population values specified in section \ref{twofactor}.

\subsection{Confirmatory analysis with a predicted structure}
We can also analyze these data taking a confirmatory approach, proposing that the first 3 variables load on one factor, and the second 3 variables load on a second factor. 

<<print=FALSE, echo=TRUE>>=
S.f2 <- cov(data.f2)
model.two <-  matrix(c(
	'theta1 ->  V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'e', NA,
	'theta2 -> V6',     'f', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.two) <- c("path","label","initial estimate")
model.two
sem.two= sem(model.two,S.f2,N)
summary(sem.two,digits=3)
round(residuals(sem.two),2)
std.coef(sem.two)
@

It is useful to compare these ``confirmatory'' factor loadings with the exploratory factor loadings obtained by factanal in section \ref{efa}. 
%
\subsection{Confirmatory factor analysis with two independent factors with equal loadings within factors}
The previous model allowed the factor loadings (and hence the quality of measurement of the variables) to differ.  A more restrictive model (e.g., tau equivalence) forces the true score loadings to be equal within each factor.

<<print=FALSE, echo=TRUE>>=

model.twotau <-  matrix(c(
	'theta1 ->  V1',      'a', NA,
 	'theta1 -> V2' ,      'a', NA,
 	'theta1 -> V3' ,      'a', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'd', NA,
	'theta2 -> V6',     'd', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.twotau) <- c("path","label","initial estimate")
model.twotau
sem.twotau= sem(model.twotau,S.f2,N)
summary(sem.twotau,digits=3)
round(residuals(sem.twotau),2)
std.coef(sem.twotau)
@

\subsection{Structure invariance, part I: unequal loadings within factors, matched across factors}
Are the two factors measured the same way?  That is, are the loadings for the first factor the same as those for the second factor?  We can test the model that the ordered loadings are the same across the two factors.  We allow the errors to differ.  
<<print=FALSE, echo=TRUE>>=

model.two.invar <-  matrix(c(
	'theta1 ->  V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'a', NA,
	'theta2 -> V5',     'b', NA,
	'theta2 -> V6',     'c', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.two.invar) <- c("path","label","initial estimate")
model.two.invar
sem.two.invar= sem(model.two.invar,S.f2,N)
summary(sem.two.invar,digits=3)
round(residuals(sem.two.invar),2)
std.coef(sem.two.invar)
@

What is both interesting and disappointing from this example is that, although the true loadings (see section \ref{twofactor}) are not matched across the two factors, a model that constrains them to be equivalent across factors cannot be rejected, even with 1000 subjects. 
%
%
\subsection{Estimate two correlated factors}
This next example is a bit more subtle, in that we generate data with a particular causal structure.  The matrix of intercorrelations of the two factors leads to correlations between the variables, but reflects the idea of a path coefficient from the first latent variable to the second one.

<<print=FALSE,echo=TRUE>>=
 set.seed(42)
 N <- 1000
 pattern <- matrix(c(
        .9,0,
        .8,0,
        .7,0,
         0,.8,
         0,.7,
         0,.6),ncol=2,byrow=TRUE)
 phi <- matrix(c(1,.4,.4,1),ncol=2)
 
 population <- sim.sem(loads = pattern, phi=phi,obs=FALSE)
 round(population$pattern,2)
 round(population$structure,2)
 pop.cor <- population$structure %*% t(population$pattern)
 round(pop.cor,2)
 data.f2 <- sim.sem(loads = pattern, phi=phi)
@
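
In matrix form, the population correlations computed above follow from the pattern matrix and the factor intercorrelations.  Writing $F$ for the matrix of factor loadings, $\Phi$ for the factor intercorrelation matrix, and $U^2$ for the diagonal matrix of unique variances (symbols assumed here for illustration),

\[
R = F \Phi F' + U^2 .
\]

For example, the correlation between V1 (loading .9 on the first factor) and V4 (loading .8 on the second factor) passes through the factor correlation of .4,

\[
r_{V1,V4} = .9 \times .4 \times .8 = .288 ,
\]

while two variables on the same factor correlate as before, e.g.,

\[
r_{V1,V2} = .9 \times .8 = .72 .
\]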

\begin{figure}
<<print=FALSE,echo=TRUE,fig=TRUE,eps=FALSE>>=
pairs.panels(data.f2)
@
\caption{Six variables loading on 2 correlated factors} 
\end{figure}

The scree test for this problem also suggests two factors, although not as clearly as in section \ref{twofactor}. 

\begin{figure}
<<print=FALSE,echo=TRUE,fig=TRUE,eps=FALSE>>=
VSS.scree(cor(data.f2))
@
\caption{Scree plot of two correlated factors.  Compare to Figure \ref{VSS2.0}.}
\label{VSS2.r}
\end{figure}


<<print=FALSE,echo=TRUE>>=
f2 <- factanal(data.f2,2)
f2
@

The sem for uncorrelated factors does not fit very well,

<<print=FALSE, echo=TRUE>>=
S.f2 <- cov(data.f2)
model.two <-  matrix(c(
	'theta1 ->  V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'e', NA,
	'theta2 -> V6',     'f', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.two) <- c("path","label","initial estimate")
model.two
sem.two= sem(model.two,S.f2,N)
summary(sem.two,digits=3)
 std.coef(sem.two)

round(residuals(sem.two),2)
@

and so we allow the two factors to be correlated.

<<print=FALSE, echo=TRUE>>=
S.f2 <- cov(data.f2)
model.two <-  matrix(c(
	'theta1 ->  V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'e', NA,
	'theta2 -> V6',     'f', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1,
 	'theta1 <-> theta2',   'g',NA),
 	ncol=3, byrow=TRUE)
colnames(model.two) <- c("path","label","initial estimate")
model.two
sem.two= sem(model.two,S.f2,N)
summary(sem.two,digits=3)
std.coef(sem.two)
round(residuals(sem.two),2)
@
\section{Hierarchical models}

The two correlated factors of section \ref{twofactor} may be thought of as representing two lower level factors, each of which loads on a higher level factor.  With just two lower level factors, the loadings on the higher level factor are not unique (one correlation, $r$, between the two factors may be represented in an infinite number of ways as the product of loadings $g_a$ and $g_b$).
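
To see the indeterminacy concretely, any pair of loadings whose product equals the factor correlation reproduces that correlation equally well.  The candidate values below are chosen only for illustration, using the correlation of .4 between the two factors; the object names are likewise illustrative.

<<print=FALSE, echo=TRUE>>=
ex.r <- .4                         # correlation between the two lower level factors
ex.ga <- c(1, .8, sqrt(.4), .5)    # candidate loadings of the first factor on g
ex.gb <- ex.r/ex.ga                # the matching loadings of the second factor on g
cbind(ga=ex.ga, gb=ex.gb, product=ex.ga*ex.gb)
@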

There are several ways of representing hierarchical models, including correlated level one factors with a g factor and  uncorrelated lower level factors with a g factor (a bifactor solution).  The latter may be estimated directly from the data, or may be found by using the Schmid-Leiman transformation of the correlated factors.

\begin{figure}
\includegraphics{2factorg.pdf}
\caption{The correlation between two factors may be modeled by a g, general, factor.  This representation shows all the errors that need to be estimated. }
\label{2gfactors.fig}
\end{figure}

\subsection{Two Correlated factors with a g factor}
The hierarchical model of a g factor is underidentified unless we specify one of the g paths.  Here we set it to 1 and then estimate the rest of the model.

<<print=FALSE, echo=TRUE>>=
S.g2 <- cov(data.f2)
model.g2 <-  matrix(c(
	'theta1 ->  V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'e', NA,
	'theta2 -> V6',     'f', NA,
	'g -> theta1',      NA,1,
	'g -> theta2',      'g2',NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1,
 	'g <-> g',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.g2) <- c("path","label","initial estimate")
model.g2
sem.g2= sem(model.g2,S.g2,N)
summary(sem.g2,digits=3)
std.coef(sem.g2)
round(residuals(sem.g2),2)
@

\begin{figure}
\includegraphics{twogfactor.pdf}
\caption{The correlation between two factors may be modeled by a g, general, factor.  This representation is somewhat more compact than the previous figure (Figure \ref{2gfactors.fig}). }
\label{2g.fig}
\end{figure}

\subsection{Generating the data for 3 correlated factors}
\label{3rf}
We have two demonstrations: the first is the two correlated factor data from section \ref{twofactor}, the second is a set of three correlated factors. To create the latter we use the sim.sem function with three latent variables.

<<print=FALSE, echo=TRUE>>=
pattern <- matrix(c(.9,.8,.7,0,0,0,0,0,0,
                    0,0,0,.8,.7,.6,0,0,0,
                    0,0,0,0,0,0,.6,.5,.4),ncol=3)
colnames(pattern) <- c("F1","F2","F3")
rownames(pattern) <- paste("V",1:dim(pattern)[1],sep="")
pattern
phi <- matrix(c(1,.0,.0,
                .5,1,0,
                .4,.4,1),ncol=3,byrow=TRUE)
phi
data.f3 <- sim.sem(loads=pattern,phi=phi)
@

\begin{figure}
<<print=FALSE,echo=TRUE,fig=TRUE,eps=FALSE>>=
VSS.scree(cor(data.f3))
@
\caption{Scree plot of three correlated factors.  Compare to the two uncorrelated factors, Figure \ref{VSS2.0}, and the two correlated factors, Figure \ref{VSS2.r}.}
\label{VSS3.r}
\end{figure}

\subsection{Exploratory factor analysis with 3 factors}
As a first approximation to these data, we can do a three factor exploratory analysis to try to understand the structure of the data.  

<<print=FALSE, echo=TRUE>>=
f3 <- factanal(data.f3,3,rotation="none")
f3
@

\begin{figure}
\includegraphics{3factorg.pdf}
\caption{The correlation between three factors may be modeled by a g, general, factor.  }
\label{3g.fig}
\end{figure}

\subsubsection{Orthogonal Rotation}
The loadings from this factor analysis are not particularly easy to understand and can be rotated to a somewhat more understandable structure using the VARIMAX rotation (which is actually the default for factanal).  We use the \textbf{GPArotation} package.

<<print=FALSE, echo=TRUE>>=
library(GPArotation)
f3v <- Varimax(loadings(f3))
round(loadings(f3v),2)
@

The structure is easier to understand than the original one, but is still somewhat hard to interpret.  

\subsubsection{Oblique Rotation}

By allowing the factors to be correlated, we are able to find a simpler representation of the factor pattern.  However, we need to report both the factor loadings and the factor intercorrelations.

<<print=FALSE, echo=TRUE>>=
f3o <- oblimin(loadings(f3))
round(loadings(f3o),2)
@
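
When factors are correlated, the pattern matrix (the weights of the variables on the factors) and the structure matrix (the correlations of the variables with the factors) differ, which is why the factor intercorrelations must be reported as well.  A small sketch, using the three factor loadings of section \ref{3rf} and a symmetric factor intercorrelation matrix whose values are assumed purely for illustration:

<<print=FALSE, echo=TRUE>>=
ex.P <- matrix(c(.9,.8,.7,0,0,0,0,0,0,
                 0,0,0,.8,.7,.6,0,0,0,
                 0,0,0,0,0,0,.6,.5,.4),ncol=3)   # pattern matrix
ex.Phi <- matrix(c(1,.5,.4,
                   .5,1,.4,
                   .4,.4,1),ncol=3)              # assumed factor intercorrelations
ex.S <- ex.P %*% ex.Phi                          # structure = pattern %*% Phi
round(ex.S,2)
@

Each variable now correlates with every factor, even though it has a nonzero pattern loading on only one of them.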

An alternative to exploratory factor analysis is to apply a confirmatory model specifying the ``expected'' structure.  We do this with both a hierarchical g factor model and a bifactor model.

\subsection{Three correlated factors with a g factor}
\label{3rg}
<<print=FALSE, echo=TRUE>>=
S.g3 <- cov(data.f3)
model.g3 <-  matrix(c(
	'theta1 -> V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'e', NA,
	'theta2 -> V6',     'f', NA,
	'theta3 -> V7',     'g', NA,
	'theta3 -> V8',     'h', NA,
	'theta3 -> V9',     'i', NA,
	'g -> theta1',      'g1',NA,
	'g -> theta2',      'g2',NA,
	'g -> theta3',      'g3', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	 'V7 <-> V7' ,      's', NA,
 	'V8 <-> V8' ,      't', NA,
 	 'V9 <-> V9' ,      'r', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1,
 	'theta3 <-> theta3',   NA,1,
 	'g <-> g',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.g3) <- c("path","label","initial estimate")
model.g3
sem.g3= sem(model.g3,S.g3,N)
summary(sem.g3,digits=3)
std.coef(sem.g3)
@
%
\begin{figure}
\includegraphics{semg3.pdf}
\caption{A hierarchical solution to the three correlated factors problem.}
\label{hierfig}
\end{figure}
%
\subsection{Bifactor solutions}
\label{3rbi}
An alternative to the correlated lower level factors and a g factor is a ``bifactor'' model where each item is represented by two factors, a lower level, group, factor and a higher level, g, factor.  This may be found directly through a confirmatory factor analysis with sem, or may be done indirectly by using a Schmid-Leiman transformation of the correlated factors.  We use the same three factor data set as in the two previous sections (\ref{3rf}, \ref{3rg}).

<<print=FALSE, echo=false>>=
S.g3 <- cov(data.f3)
model.bi <-  matrix(c(
	'theta1 -> V1',      'a', NA,
 	'theta1 -> V2' ,      'b', NA,
 	'theta1 -> V3' ,      'c', NA,
	'theta2 -> V4',     'd', NA,
	'theta2 -> V5',     'e', NA,
	'theta2 -> V6',     'f', NA,
	'theta3 -> V7',     'g', NA,
	'theta3 -> V8',     'h', NA,
	'theta3 -> V9',     'i', NA,
	'g -> V1',      'g1',NA,
	'g -> V2',      'g2',NA,
	'g -> V3',      'g3', NA,
	'g -> V4',      'g4',NA,
	'g -> V5',      'g5',NA,
	'g -> V6',      'g6', NA,
	'g -> V7',      'g7',NA,
	'g -> V8',      'g8',NA,
	'g -> V9',      'g9', NA,
	'V1 <-> V1' ,      'u', NA,
 	'V2 <-> V2' ,      'v', NA,
	'V3 <-> V3' ,      'w', NA,
 	'V4 <-> V4' ,      'x', NA,
 	'V5 <-> V5' ,      'y', NA,
 	 'V6 <-> V6' ,      'z', NA,
 	 'V7 <-> V7' ,      's', NA,
 	'V8 <-> V8' ,      't', NA,
 	 'V9 <-> V9' ,      'r', NA,
 	'theta1 <-> theta1',   NA,1,
 	'theta2 <-> theta2',   NA,1,
 	'theta3 <-> theta3',   NA,1,
 	'g <-> g',   NA,1),
 	ncol=3, byrow=TRUE)
colnames(model.bi) <- c("path","label","initial estimate")
model.bi
sem.bi= sem(model.bi,S.g3,N)
summary(sem.bi,digits=3)
std.coef(sem.bi)

@


\begin{figure}
\includegraphics{sembi.pdf}
\caption{A bifactor solution to the three correlated factors problem.}
\label{bifig}
\end{figure}


\subsection{Schmid Leiman transformations to orthogonalize the factors}

Coming soon!
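
As a preview of the idea, the arithmetic of the transformation can be sketched under stated assumptions.  Given a matrix of lower level loadings and the loadings of the lower level factors on g, the loading of each variable on g is the product of the two, and the residualized group factor loading is the lower level loading multiplied by $\sqrt{1-g^2}$.  The higher order loadings used below (.8, .7, .6) are hypothetical values chosen for illustration, not estimates from the analyses above, and the object names are likewise illustrative.

<<print=FALSE, echo=TRUE>>=
ex.F3 <- matrix(c(.9,.8,.7,0,0,0,0,0,0,
                  0,0,0,.8,.7,.6,0,0,0,
                  0,0,0,0,0,0,.6,.5,.4),ncol=3)  # lower level loadings
ex.g <- c(.8,.7,.6)                          # hypothetical loadings of the factors on g
ex.general <- ex.F3 %*% ex.g                 # loadings of the variables on g
ex.group <- ex.F3 %*% diag(sqrt(1-ex.g^2))   # residualized group factor loadings
ex.sl <- cbind(ex.general, ex.group)
colnames(ex.sl) <- c("g","F1","F2","F3")
round(ex.sl,2)
rowSums(ex.sl^2)     # communalities are unchanged by the transformation
@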


\end{document}