---
title: "405.homework.fa"
author: "William Revelle"
date: "4/22/2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

# Factor Analysis Homework

## Preliminaries

The first step is to make sure that you have the most recent version of the psych package and that you have made it active.

```{r}
#install.packages("psych",repos="http://personality-project.org/r", type="source")
library(psych) #make it active
sessionInfo()   #psych should be 1.8.4
```
## Problem 1:  A mood data set
Emotions may be described either as discrete emotions or in dimensional terms. The Motivational State Questionnaire (MSQ) was developed to study emotions in laboratory and field settings. The data can be well described by a two-dimensional solution of energy versus tiredness and tension versus calmness. Additional items include the time of day at which the data were collected and a few personality questionnaire scores. 3032 unique participants took the MSQ at least once, 2753 at least twice, 446 three times, and 181 four times. The 3032 also took the State Anxiety Inventory (sai) at the same time.

Here we examine the factor structure of 12 mood items.

The first step is to select them from the larger set and then to describe them.

We use the acs function in psych to convert a string into a character vector. We alphabetize them as a demonstration of ordering.

The msqR data set includes multiple occasions.  We choose just the first time point.


### Problem 1
1) Select a subset of mood data (active, energetic, vigorous, sleepy, tired, drowsy, intense, jittery, fearful, at.rest, calm, still)

2) Find the descriptive statistics

3) Examine the correlation matrix

4) How many factors are in the data?

5) What are they?

6) Plot them

7) Reorganize the correlation matrix for a cleaner structure


### Problem 2: Ability

The responses of 1525 subjects to 16 multiple choice ability items, taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, are saved as iqitems. Those data serve as examples of how to score multiple choice tests and how to analyze response alternatives. When scored correct or incorrect (the ability data set used in this problem), the data are useful for demonstrating tetrachoric based factor analysis (irt.fa) and for finding tetrachoric correlations.


1) Select the 16 ability items from the ability data set

2) Find the descriptive statistics

3) Examine the correlation matrix

4) How many factors are in the data?

5) What are they?

6) Plot them

7) Reorganize the correlation matrix for a cleaner structure
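
One possible way to begin the first steps above is sketched here. It is only a starting point, not a worked answer. It assumes that `ability` is available: it ships with psych 1.8.4, but later releases moved it to the psychTools package, which is why the guarded `data()` call is there. Because the items are dichotomous, the sketch uses tetrachoric rather than Pearson correlations.

```{r}
library(psych)
# Assumption: 'ability' comes with psych 1.8.4; newer releases keep it in psychTools
if (!exists("ability") && requireNamespace("psychTools", quietly = TRUE)) {
  data("ability", package = "psychTools")
}
describe(ability)               # always describe the data
tet <- tetrachoric(ability)     # tetrachoric correlations for correct/incorrect items
lowerMat(tet$rho)               # show the lower triangle of the tetrachoric matrix
# parallel analysis on the tetrachoric matrix; n.obs is needed for a correlation matrix
fa.parallel(tet$rho, n.obs = nrow(ability))
```

The remaining steps (choosing the number of factors, interpreting, plotting, and sorting) can follow the same pattern as Problem 1, passing `tet$rho` and `n.obs` in place of the raw data.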

### Problem 3: The Big 5

25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.

1) Select the 25 'personality' items from the bfi data set

2) Find the descriptive statistics

3) Examine the correlation matrix

4) How many factors are in the data?

5) What are they?

6) Plot them
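
A minimal sketch of the first steps, offered as a starting point rather than an answer. It assumes that the 25 personality items are the first 25 columns of `bfi`, with the demographic variables following them, and that `bfi` is available (it ships with psych 1.8.4; later releases moved it to psychTools).

```{r}
library(psych)
# Assumption: 'bfi' comes with psych 1.8.4; newer releases keep it in psychTools
if (!exists("bfi") && requireNamespace("psychTools", quietly = TRUE)) {
  data("bfi", package = "psychTools")
}
items <- bfi[, 1:25]          # assumed layout: 25 items first, then the demographics
describe(items)               # always describe the data
R.bfi <- lowerCor(items)      # show the lower triangle, keep the full matrix
fa.parallel(items)            # one of several ways to ask how many factors
```

From here, `fa`, `fa.sort`, `fa.plot`, and `fa.diagram` can be applied as in the answers to Problem 1, using whatever number of factors the tests suggest.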


# Answers to problem 1

```{r}
select <- acs(active,energetic,vigorous,sleepy,tired,drowsy,intense, jittery, fearful,at.rest,calm,still)
#alphabetize these
select <- select[order(select)]
select   #show the alphabetized list
msqR1 <- subset(msqR,msqR$time==1) #choose just the first time point
#Always describe the data
describe(msqR1[select])
#show the lower triangle of the correlation matrix and save the full matrix as R
R <- lowerCor(msqR1[select])
```
### How many factors are in the data?
This is a hard problem.  We try three different approaches: parallel analysis, Very Simple Structure, and then a whole range of tests using nfactors.

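Parallel analysis compares the eigenvalues of the observed data to those found for random data with the same number of subjects and variables.  Although useful for 100-400 subjects, this is probably not as helpful for 3000 subjects, because with so many subjects even very small factors exceed the random eigenvalues.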
```{r}
fa.parallel(msqR1[select])
```
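
To make the logic of parallel analysis concrete, here is a minimal base R sketch on simulated data. Everything in it is hypothetical (the sample size, the loadings, the number of simulations), and it uses only principal component eigenvalues, so it illustrates the idea rather than reproducing what `fa.parallel` does.

```{r}
# Simulated example: 6 items generated from 2 independent factors
set.seed(42)
n <- 300                       # hypothetical number of subjects
p <- 6                         # hypothetical number of items
f <- matrix(rnorm(n * 2), n, 2)                             # factor scores
loadings <- rbind(cbind(rep(0.7, 3), 0), cbind(0, rep(0.7, 3)))
x <- f %*% t(loadings) + matrix(rnorm(n * p, sd = sqrt(1 - 0.49)), n, p)

# eigenvalues of the observed correlation matrix
obs.ev <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# mean eigenvalues from random normal data of the same size
n.iter <- 50
rand.ev <- replicate(n.iter,
  eigen(cor(matrix(rnorm(n * p), n, p)), symmetric = TRUE, only.values = TRUE)$values)
mean.rand.ev <- rowMeans(rand.ev)

# count the leading eigenvalues that exceed the random baseline
exceeds <- obs.ev > mean.rand.ev
n.suggested <- if (all(exceeds)) p else which(!exceeds)[1] - 1
round(rbind(observed = obs.ev, random = mean.rand.ev), 2)
n.suggested
```

The number of leading observed eigenvalues that lie above the random ones is the suggested number of components.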

Very Simple Structure asks how many factors are 'simple'.  It tests the goodness of fit for 1 ... 8 factors when every item is allowed to load on only 1 ... 4 factors and all smaller loadings are treated as zero.  This is essentially what people do when they try to interpret a factor solution.

```{r}
VSS(msqR1[select])
```

An alternative is to consider 10 different tests using the nfactors function.

```{r}
nfactors(msqR1[select])
```


### What are these factors?

```{r}
f2 <- fa(msqR1[select],2)  #a two factor solution
f2 # show the solution

fa.sort(f2)   #show it in a more meaningful way
fa.plot(f2,labels=colnames(msqR1[select]))  #draw it in two space
fa.diagram(f2)  #show it as a path diagram
Rs <- mat.sort(R,f2)  #reorder the correlation matrix by the factor loadings
corPlot(Rs,numbers=TRUE)  #show the sorted correlations as a heat map
```