---
title: "SMEP:2020 Using the psych package"
author: "William Revelle"
date: "10/8/2020"
output: html_document
---

Slides are available at <https://personality-project.org/r/tutorials/swissarmy/SMEP.20.html>. The [Rmd file](https://personality-project.org/r/tutorials/swissarmy/SMEP.20.Rmd) is there as well. If you download the Rmd file and copy it to RStudio, you can follow along.

```{r setup, include=FALSE}
knitr::opts_chunk$set(comment="", echo=TRUE)
options(width=110)
```

# psych: a Swiss Army knife package

Psych is analogous to a Swiss Army knife: definitely not the best tool for any specific job, but a pretty good tool for many jobs. Today we discuss using psych for some "basic" psychometrics.

## Preliminaries

It is always necessary to make sure that `psych` and `psychTools` are available. The assumption is that you have already installed R and the packages. Then just start up R and make sure that the packages you want are available.

```{r library}
library(psych)
library(psychTools)
```

## psych and psychTools

Psych has been under development for the past 15 years. As it has grown, it has exceeded the Comprehensive R Archive Network (CRAN) guideline of no more than a 5 MB package. Thus several of the vignettes, the larger data sets, and some convenient tools have been moved to `psychTools`.

# Basic Input

`psychTools` offers several functions for convenient input. These are `read.clipboard` and `read.file`. Both of these functions are described in detail in the vignette [introduction to the psych package](http://personality-project.org/r/psych/vignettes/intro.pdf). Look at that vignette and the ones cross linked from there for more information.

Here is a brief example that you can run. Copy the following text to your clipboard:

    Sentences 1.00
    Vocabulary 0.83 1.00
    Sent.Completion 0.78 0.78 1.00
    First.Letters 0.44 0.49 0.46 1.00
    Four.Letter.Words 0.43 0.46 0.42 0.67 1.00
    Suffixes 0.45 0.49 0.44 0.59 0.54 1.00
    Letter.Series 0.45 0.43 0.40 0.38 0.40 0.29 1.00
    Pedigrees 0.54 0.54 0.53 0.35 0.37 0.32 0.56 1.00
    Letter.Group 0.38 0.36 0.36 0.42 0.45 0.32 0.60 0.45 1.00

Then enter

    R <- read.clipboard.lower(names=TRUE)

I cannot show this in the html, because that is a dynamic command, but the static version will produce the same output:

```{r}
R <- Thurstone  #a built in data set
lowerMat(R)     #neatly print out the data matrix
```

## Built in data sets

We will make use of three of the data sets built into `psychTools`: `ability`, `bfi` and `spi`. To see descriptions of these data sets, just ask for help:

```{r}
?ability
```

To see all of the data sets available in `psych` or `psychTools`:

```{r data}
data(package="psychTools")
data(package="psych")
```

# Basic psychometrics, correlations, factors, and components

The `ability` data set contains the 16 items of the International Cognitive Ability Resource (ICAR) 16 item example ([Condon and Revelle, 2014](https://doi.org/10.1016/j.intell.2014.01.004)). See also [Revelle, Dworak and Condon (2020)](https://journals.sagepub.com/doi/full/10.1177/0963721420922178) and [Dworak, Revelle, Doebler and Condon (2020)](https://www.sciencedirect.com/science/article/pii/S0191886920300957). Data were collected as part of the SAPA project.

## Always examine the data before continuing -- the `describe` and `corPlot` functions

```{r describe}
describe(ability)
corPlot(ability, xlas=3)
```
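Because these items were given as part of the SAPA sampling procedure, not every participant necessarily answered every item. A quick base-R check of how many responses each item actually has (nothing psych-specific here, just a suggested sanity check):

```{r complete}
colSums(!is.na(ability))  #number of non-missing responses per item
```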
## Factor analysis of the ability data set

How many factors? Who knows? To quote Henry Kaiser, "the number of factors problem is easy, I solve it every day before breakfast. But what is the right answer?"

```{r parallel}
fa.parallel(ability)
#or another way
nfactors(ability)
```

## Let's go for 4 factors (with oblimin rotation by default)

```{r fa}
f4 <- fa(ability, 4)
fa.diagram(f4)
```

## But the ability items are dichotomous, shouldn't we find tetrachorics?

Yes. Either find them directly using the `tetrachoric` function, or find them and then factor them and display the results as a 2PL solution.

```{r irt.fa}
ability.irt <- irt.fa(ability)           #take out a one factor solution
ability.irt.4 <- irt.fa(ability.irt, 4)  #use the correlations found here, and do it again with 4 factors
```

## Let's look at the hierarchical structure

```{r omega}
om <- omega(ability, 4)      #show a Schmid-Leiman solution
om                           #show the results
omega.diagram(om, sl=FALSE)  #show the hierarchical model
omegaDirect(ability, 4)      #use the Niels Waller approach
```

# Reliability and its discontents

See [Revelle and Condon (2019)](https://personality-project.org/revelle/publications/rc.pa.19.pdf) for more than you ever wanted to know about reliability. And if that wasn't enough, go to the online [appendix](https://personality-project.org/revelle/publications/rc.pa.supplement.pdf) for R code for doing the analyses.

We just showed three ways of looking at reliability: $\omega_h$, $\alpha$ and $\omega_t$. What about `splitHalf`? We can find all split halves for up to about 24 items, or sample 10,000 (or so) splits for bigger problems.

```{r splitHalf}
sp <- splitHalf(ability, raw=TRUE)
sp            #report basic statistics
hist(sp$raw)  #show the histogram of all possible splits
```

# More factoring (the bfi data set)

The `bfi` data set IS NOT THE BFI (John and Soto). It is a little set of 25 items taken from SAPA. Let's find 5 factors (hence the name) and then show the item content. We use the `fa` function as well as the `fa.lookup` function. The latter takes the factor output, sorts it, and then merges it with the item descriptions held in `bfi.dictionary`.

```{r bfi}
f5 <- fa(bfi[1:25], 5)  #do the factoring, specify we want 5 factors
fa.diagram(f5)          #show the structure
fa.lookup(f5, dictionary=bfi.dictionary[1:2])
```

## BassAckward factoring (Goldberg and Waller)

This can be done by extracting successive principal components and correlating the component scores (Goldberg), by matrix algebra on the components (Waller), or by matrix algebra on the factor correlations (Revelle, who doesn't know better).

```{r bassack}
ba.bfi <- bassAckward(bfi[1:25], c(2,3,5))
```

# Let's try regression, mediation, and moderation

In core-R, regression is done using the `lm` function from the raw data. `setCor` in *psych* will do it from the raw data or from the correlation matrix. The syntax is meant to be the same. We will use a data set from Nurit Tal-Or, Jonathan Cohen, Yariv Tsfati, and Albert Gunther (2010) (with their permission).
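Before turning to that example, here is a minimal sketch of the parallel syntax, using the built-in `sat.act` data set (the outcome and predictors are chosen purely for illustration):

```{r setCor.sketch}
summary(lm(ACT ~ education + age, data=sat.act))  #core-R regression from the raw data
setCor(ACT ~ education + age, data=sat.act)       #the same formula, using psych
```

Now for the Tal-Or example: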
```{r TalOr}
describe(Tal_Or)
pairs.panels(Tal_Or)
summary(lm(reaction ~ cond + pmi, data=Tal_Or))        #the classic lm approach
setCor(reaction ~ cond + pmi, data=Tal_Or, std=FALSE)  #the psych approach
mod <- mediate(reaction ~ cond + (pmi), data=Tal_Or)   #a mediation model
summary(mod)
```

## Try two mediators and then an interaction

When supplying an interaction term, the data are centered before multiplying.

```{r two}
mod2 <- mediate(reaction ~ cond + (pmi) + (import), data=Tal_Or)  #a mediation model
mod3 <- mediate(reaction ~ cond + (pmi) * import, data=Tal_Or)
summary(mod2)
summary(mod3)
anova(mod2, mod3)  #compare the models
```

## `setCor` from the correlations: an example from Marion Spengler and Rodica Damian

A longitudinal study of the Project Talent data set.

```{r spengler}
Spengler.stat  #show the basic descriptives of the original data set
psych::lowerMat(Spengler[psych::cs(IQ,Parental,Ed.11,OccPres.50),
                         psych::cs(IQ,Parental,Ed.11,OccPres.50)])
psych::setCor(OccPres.50 ~ IQ + Parental + Ed.11, data=Spengler)
#we reduce the number of subjects for faster replication in this example
mod <- psych::mediate(OccPres.50 ~ IQ + Parental + (Ed.11), data=Spengler,
                      n.iter=50, n.obs=1000)  #for speed
summary(mod)
```

# How about scoring items into scales?

Unless just finding scores for one scale, it is necessary to specify a scoring key for each scale. Consider the scoring key for the `bfi`. This key is supplied as an example of scoring keys.

```{r score}
bfi.keys  #show the key information (note how a negative sign implies item reversal)
score.bfi <- scoreItems(bfi.keys, bfi[1:25])
score.bfi  #lots of output
#the scores are an object within score.bfi
pairs.panels(score.bfi$scores)
```

## Compare those scores to IRT based 2PL scores

```{r scoreirt}
bfi.irt <- scoreIrt.2pl(bfi.keys, bfi)
cor2(bfi.irt, score.bfi$scores)
```

# Now let's do some *dustbowl empiricism* (aka machine learning)

A very simple statistical learning model is incorporated into the `bestScales` function, also known as the BISCUIT (Best Items Scales that are Cross validated, Unit weighted, Informative, and Transparent) algorithm. As discussed in [Elleman et al. (2020)](https://psyarxiv.com/tuqap/), `bestScales` chooses the most highly correlated items, unit weights them, cross validates (using either k-fold or bootstrapping), and forms a final scale. We use the `bfi` data set to show this.

```{r biscuit}
bestboot <- bestScales(psychTools::bfi, criteria=cs(gender,age,education),
                       n.iter=100, dictionary=psychTools::bfi.dictionary[1:3])
bestboot
#compare with 10 fold cross validation
tenfold <- bestScales(psychTools::bfi, criteria=cs(gender,age,education), fold=10,
                      dictionary=psychTools::bfi.dictionary[1:3])
tenfold
#Then, to display the results graphically
#Note that we scale the two graphs with the same x.lim values
error.dots(tenfold, eyes=TRUE, pch=16, xlim=c(0,.4),
           main="Comparing boot strap (bars) and k-fold (eyes)")
error.dots(bestboot, add=TRUE, xlim=c(0,.4), main="")
```

## More graphic displays

We can show Cohen's d effect sizes with their confidence intervals using the `cohen.d` and `error.dots` functions.

```{r cohen.d}
cd <- cohen.d(bfi ~ gender, dictionary=bfi.dictionary[,2,drop=FALSE])
error.dots(cd, head=13, tail=13, main="Cohen d for gender: bfi items")
abline(v=0)
```
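The plot summarizes the effects; the numeric effect sizes and their confidence limits can also be pulled directly from the returned object. A minimal sketch (this assumes the element is named `cohen.d`, which is where recent versions of `psych` store the lower limit, effect, and upper limit):

```{r cohen.d.values}
cd$cohen.d[1:5, ]  #confidence limits and effect sizes for the first few items
```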
We can also show scatter plots with densities by a grouping variable using the `scatterHist` function. We do this for the bfi scores that we found before.

```{r scatterhist}
scores.irt.df <- data.frame(bfi.irt, gender=bfi$gender)
scatterHist(neuroticism ~ agree + gender, data=scores.irt.df, legend=1)
```

# Summary

Although `psych` is not the best package for any particular problem, it is a useful tool for many problems. For more information on using `psych`, visit the various tutorials available at the [personality-project](https://personality-project.org/r). To see all the functions in psych, use the `objects` function.

```{r objects}
objects(package:psych)
```

A minor comment on formatting in Rmarkdown.^[Note that if, in the setup chunk, we specify `knitr::opts_chunk$set(comment="", echo=TRUE)`, then we suppress the `##` marks at the beginning of the output. This is useful if trying to get text to copy.]
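Finally, because the output of several of these functions can change slightly across package versions, it can be helpful to close an Rmd report by recording the session information:

```{r sessioninfo}
sessionInfo()
```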