---
title: "350 Week 9 data manipulation"
author: "William Revelle"
date: "06/01/2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```
# Psych has been updated; install the latest version
```{r}
#install.packages("psych",repos="http://personality-project.org/r",type="source")
library(psych)
library(psychTools)
sessionInfo() #psych and psychTools should be 2.0.4
```

# Various R commands for data manipulation and display

These examples are taken from the help files for the `sai` and `msqR` data sets: `sai` is the State Anxiety Inventory and `msqR` is the Motivational State Questionnaire.


## These data were collected at the PMC lab:

The standard experimental study at the Personality, Motivation and Cognition (PMC) laboratory (Revelle and Anderson, 1997) was to administer a number of personality trait and state measures (e.g. the epi, msq, msqR and sai) to participants before some experimental manipulation of arousal/effort/anxiety. Following the manipulation (with a 30 minute delay if giving caffeine/placebo), some performance task was given, followed once again by measures of state arousal/effort/anxiety.

Here are the item level data on the `sai` (state anxiety) and the `tai` (trait anxiety). Scores on these scales may be found using the scoring keys. The affect data set includes pre and post scores for two studies (flat and maps) which manipulated state by using four types of movies.

In addition to being useful for studies of motivational state, these studies provide examples of test-retest and alternate form reliabilities. Given that 10 items overlap with the msqR data, they also allow for a comparison of immediate duplication of items with 30 minute delays.

Studies CART, FAST, SHED, RAFT, and SHOP were either control groups, or did not experimentally vary arousal/effort/anxiety.

AGES, CITY, EMIT, RIM, SALT, and XRAY were caffeine manipulations between time 1 and 2 (RIM and VALE were repeated on day 1 and day 2).

FIAT, FLAT, MAPS, MIXX, and THRU were 1 day studies with film manipulation between time 1 and time 2.

SAM1 and SAM2 were the first and second day of a two day study. The STAI was given once per day. The MSQ, not the MSQR, was given.

VALE and PAT were two day studies with the STAI given pre and post on both days.

RIM was a two day study with the STAI and MSQ given once per day.

Usually, time of day 1 = 8:50-9 am and 2 = 7:30 pm; however, in rob, with paid subjects, the times were 0530 and 22:30.

## The `table` command tells you about the data

```{r}

data(sai)  #actually not necessary, because the data are 'lazy loaded' 
dim(sai)  #how many subjects (row) and how many columns (variables)
colnames(sai) #and what are the variables?
table(sai$study) #what are the names of the studies in this file
table(sai$study,sai$time) #cross tabulate the studies by time administered.
```
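
As a small illustration of what a two-way `table` returns, here is a sketch on made-up data. The data frame `toy` and its study labels are hypothetical and are not part of the `sai` data; `addmargins` is a base R function that appends row and column totals.

```{r}
# toy data: hypothetical study labels and time points, not the real sai data
toy <- data.frame(study = c("A", "A", "A", "B", "B", "C"),
                  time  = c(1, 2, 1, 1, 2, 1))
toy.tab <- table(toy$study, toy$time)  # rows = study, columns = time
toy.tab                                # counts of cases in each study by time cell
addmargins(toy.tab)                    # the same table with row and column totals
```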

## We can choose various studies using the `subset` and `%in%` commands

```{r}
control <- subset(sai,sai$study %in% c("Cart", "Fast", "SHED",  "RAFT", "SHOP")) 
table(control$study,control$time)  #it still knows the names of the studies, even though no one is there
dim(control)
#pre and post drug studies
drug <- subset(sai,sai$study %in% c("AGES", "CITY","EMIT", "SALT", "VALE", "XRAY")) 
#pre and post film studies
film <- subset(sai,sai$study %in% c("FIAT","FLAT", "MAPS", "MIXX")) 
```   
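
The empty rows in the table of `control` appear because a factor keeps all of its levels after subsetting. The sketch below shows this on made-up data and removes the unused levels with base R's `droplevels`. It assumes the grouping variable is stored as a factor; the names `toy.df` and `toy.sub` are hypothetical.

```{r}
# toy factor with three hypothetical study names (not the real sai studies)
toy.df <- data.frame(study = factor(c("A", "A", "B", "C")),
                     score = c(1, 2, 3, 4))
toy.sub <- subset(toy.df, study %in% c("A", "B"))
table(toy.sub$study)            # level "C" is still listed, with a count of 0
toy.sub <- droplevels(toy.sub)  # drop the levels that no longer occur
table(toy.sub$study)            # now only "A" and "B" are listed
```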
 
## Let's explore the `%in%` command
 
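`X %in% Y` returns a logical value (TRUE or FALSE) for each element of X, showing whether that element is found in Y. `Z <- X %in% Y` saves that logical vector, which has the same length as X. To get the elements of X that are in Y, use the logical vector as an index: `X[X %in% Y]`.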
 
```{r}
 table(sai$study) #shows the counts in each study
 studies <- names(table(sai$study)) #gets the names of the studies
 studies %in% c("Cart", "Fast", "SHED",  "RAFT", "SHOP")
 #versus
 c("Cart", "Fast", "SHED",  "RAFT", "SHOP") %in% studies
 select <- studies %in% c("Cart", "Fast", "SHED",  "RAFT", "SHOP")
 select
```
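
The difference between the logical result and the matching elements is easier to see on a short example. This is a sketch with two hypothetical character vectors, `toy.x` and `toy.y`.

```{r}
toy.x <- c("a", "b", "c", "d")  # hypothetical vectors for illustration
toy.y <- c("b", "d", "e")
toy.x %in% toy.y         # one logical value for each element of toy.x
toy.x[toy.x %in% toy.y]  # the elements of toy.x that are also in toy.y
toy.y %in% toy.x         # the result has the length of the left hand side
```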
 
# Various data manipulation and displays

We will show a `corPlot` of the `sai` data several different ways.  The first is the most naive way, by variable order.
```{r}
R <- corPlot(sai[4:23]) #draw the corPlot and return the correlation matrix
```

## Use a graphical parameter to make that graph better

The previous graph does not show all of the variables on the x axis.  The `xlas` argument sets the orientation of the labels on the x axis: horizontal (the default) or vertical.

```{r}
R <- corPlot(sai[4:23],xlas=3) #draw the corPlot and return the correlation matrix
```

Do a factor analysis of the data, take out 2 factors, and then sort the variables by the loadings on these two factors.  We can do this using the `fa` and `matSort` functions.

```{r}
f2 <- fa(R,2) #extract two factors
f2 #show the factor analysis output
fa.diagram(f2)  #show the fa output as a path diagram
plot(f2, labels = colnames(R)) #show it as a spatial plot
Rs <- matSort(R,f2) #sort the matrix by the factor loadings
corPlot(Rs,main="SAI sorted by factor loadings")
corPlot(Rs, xlas=2,main = "SAI sorted by factor loadings") #set x names to be vertical
```
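
To see what sorting by loadings does, here is a rough sketch in base R on simulated data. It is not how `matSort` is implemented: it uses `factanal` instead of `fa`, assigns each item to the factor with its largest absolute loading, and orders the items within each factor by the size of that loading. All names (`toy.items`, `toy.R`, `toy.Rs`, and so on) are hypothetical.

```{r}
set.seed(42)
# simulate 6 hypothetical items: the a items share one latent variable, the b items another
g1 <- rnorm(200)
g2 <- rnorm(200)
toy.items <- data.frame(a1 = g1 + rnorm(200), b1 = g2 + rnorm(200),
                        a2 = g1 + rnorm(200), b2 = g2 + rnorm(200),
                        a3 = g1 + rnorm(200), b3 = g2 + rnorm(200))
toy.R <- cor(toy.items)                        # items are interleaved: a1, b1, a2, ...
toy.fa <- factanal(covmat = toy.R, factors = 2, n.obs = 200)
toy.L <- unclass(loadings(toy.fa))             # plain matrix of loadings
toy.best <- apply(abs(toy.L), 1, which.max)    # factor with the largest loading per item
toy.size <- abs(toy.L[cbind(seq_len(nrow(toy.L)), toy.best)])
toy.ord <- order(toy.best, -toy.size)          # by factor, then by decreasing loading
toy.Rs <- toy.R[toy.ord, toy.ord]              # reordered correlation matrix
round(toy.Rs, 2)                               # items from the same factor are now adjacent
```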
# What are the scores for these anxiety items? Can we form scales?

We use `scoreItems` to form scales as well as `scoreOverlap` to examine the structure.

We first get (create) a `keys.list` to show which items go on which scales. 

```{r}
sai.keys <- list(sai = c("tense","regretful" , "upset", "worrying", "anxious", "nervous" ,  
"jittery" , "high.strung", "worried" , "rattled","-calm", 
"-secure","-at.ease","-rested","-comfortable", "-confident" ,"-relaxed" , "-content" , 
"-joyful", "-pleasant"  ) ,
sai.p = c("calm","at.ease","rested","comfortable", "confident", "secure" ,"relaxed" ,     
       "content" , "joyful", "pleasant" ),  
sai.n = c( "tense" , "anxious", "nervous" , "jittery" , "rattled",     "high.strung",  
         "upset", "worrying","worried","regretful" )
) 
#show the keys
sai.keys
```
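
A leading "-" in a key marks an item that is reverse scored before it is combined with the others. The sketch below works through the idea by hand on made-up responses. It is an illustration only, not the `scoreItems` code; it assumes a 1 to 4 response scale, and all names (`toy.resp`, `toy.key`, and so on) are hypothetical.

```{r}
# toy responses on an assumed 1-4 scale (not the real sai items)
toy.resp <- data.frame(tense = c(1, 4, 2), calm = c(4, 1, 3))
toy.key <- c("tense", "-calm")        # "-" marks a reverse keyed item
toy.rev <- startsWith(toy.key, "-")   # TRUE for the reverse keyed items
toy.names <- sub("^-", "", toy.key)   # item names without the sign
toy.scored <- toy.resp[toy.names]     # select the items on this scale
toy.scored[toy.rev] <- (4 + 1) - toy.scored[toy.rev]  # reverse: max + min - response
toy.scale <- rowMeans(toy.scored)     # scale score = mean of the keyed items
toy.scale
```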

The keys specify which items go on which scales.  Note that `sai.p` and `sai.n` are subsets of the total scale and represent the positive and negative items.  They will correlate with the total partly because of item overlap.  We can adjust for this by using the `scoreOverlap` function on the correlation matrix. Although we can adjust the correlations, we cannot adjust the scores, and thus no scores object is returned from `scoreOverlap`.

```{r}
sai.scores <- scoreItems(sai.keys,sai) 
sai.scores  #this produces a fair amount of output
summary(sai.scores) #just show the correlation matrix of the scales
sai.overlap <- scoreOverlap(sai.keys,sai)
sai.overlap  #show the full output
summary(sai.scores)
summary(sai.overlap) #compare these two, why do they differ?
```
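
A small simulation shows why item overlap matters. With 20 simulated items that are unrelated to each other, a 10 item subscale still correlates highly with the 20 item total, simply because the total contains those 10 items. This sketch only illustrates the problem; it does not reproduce the adjustment that `scoreOverlap` makes. All names (`sim.items`, `sim.total`, and so on) are hypothetical.

```{r}
set.seed(1)
# 20 mutually independent simulated items for 1000 simulated people
sim.items <- matrix(rnorm(1000 * 20), ncol = 20)
sim.total <- rowSums(sim.items)           # the "total" scale uses all 20 items
sim.part  <- rowSums(sim.items[, 1:10])   # the subscale uses the first 10 of them
sim.other <- rowSums(sim.items[, 11:20])  # the remaining, non-overlapping items
cor(sim.part, sim.total)  # large, although the items are unrelated
cor(sim.part, sim.other)  # near zero once the shared items are excluded
```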

# Use the scores we just found and compare them over time

The `sai.scores$scores` object has the scores, but we need to combine them with the experimental data.
```{r}
dim(sai.scores$scores)
sai.df <- data.frame(sai[1:3],sai.scores$scores)
describe(sai.df)
```
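
Once the scores sit in a data frame next to the study and time variables, one way to compare them over time is to find the mean score in each study by time cell. Here is a minimal sketch with base R's `aggregate` on made-up data. The data frame `toy.scores` and its values are hypothetical; the same call on `sai.df` would assume that its columns are named `study`, `time`, and `sai`.

```{r}
# toy scores for two hypothetical studies at two time points (not the real sai scores)
toy.scores <- data.frame(study = rep(c("A", "B"), each = 4),
                         time  = rep(c(1, 2), times = 4),
                         sai   = c(2.0, 1.5, 2.2, 1.7, 1.8, 2.4, 2.0, 2.6))
# mean score for every combination of study and time
toy.means <- aggregate(sai ~ study + time, data = toy.scores, FUN = mean)
toy.means
```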