---
title: "Homework Week8"
author: "William Revelle"
date: "05/18/2022"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

# Finding reliability using the *alpha*, *omega* and *reliability* functions

We have discussed a number of ways of estimating reliability. For this assignment you are to:

a) Use standard *psych* functions to find the `alpha` reliability for the *ability* data set. Several functions will do this (including `alpha`, `omega`, `scoreItems` and `reliability`); the last two require more specification to use.

```{r}
library(psych)
library(psychTools)
alpha(ability)
```

An alternative to `alpha` is the pair of $\omega$ estimates: $\omega_h$ and $\omega_t$. Where `alpha` is based upon assumptions about the item variances, `omega` gives model-based estimates derived from a factor analysis.

b) Use the *omega* function to find two other estimates ($\omega_h$ and $\omega_t$) for the ability data set.

```{r}
omega(ability, 4)  # we use 4 factors rather than the default of 3
```

c) An alternative is to find the intraclass correlations by doing a variance decomposition of the test. For this, we use the *ICC* function.

```{r}
ICC(ability)  # compare the 6 answers. Why are they different?
```

# Part 2: Creating a function to find *alpha*

The $\alpha$ estimate of reliability developed by Cronbach (1951) is identical to the $\lambda_3$ coefficient of Guttman (1945) and the KR20 reliability of Kuder and Richardson (1937). All of these formulas are shortcuts for estimating the average correlation within a test without actually finding the correlations. They are based upon a variance decomposition of the test. The variance of the total test ($V_t$) is made up of the sum of the inter-item covariances ($\sigma_{ij}$) off the diagonal and the item variances ($\sigma_i^2$) on the diagonal. The diagonal elements are a mixture of reliable signal and unreliable noise.
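This decomposition can be checked numerically on any complete data set: the variance of the total score is exactly the sum of all the elements of the item covariance matrix. A quick sketch using the built-in *attitude* data:

```{r}
C <- cov(attitude)             # the item variance-covariance matrix
total <- rowSums(attitude)     # total score for each subject
all.equal(var(total), sum(C))  # TRUE: Vt = sum of item variances and covariances
```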
Reliability is the ratio of reliable signal to total signal and may be found by
$$\alpha = \frac{V_t - \Sigma (\sigma_i^2)}{V_t} \frac{k}{k-1}$$
where $k$ is the number of items. Thus, to find $\alpha$ you need to write a function that finds the total test variance ($V_t$) and each of the item variances. The *var* command will find variances, but first you need to find a total score by adding up the responses for each individual. Given a data.frame or matrix X with elements $x_{ij}$, this means you need to find a total for each subject (total = rowSums(X)) and then find the variance of total and of each column of X (hint: use apply).

## Assignment

d) Create a function (my.alpha) and then compare your results to the *alpha* function.

```{r}
my.alpha <- function(x) {
  total.score <- apply(x, 1, sum, na.rm=TRUE)  # total score for each subject
  total.var <- var(total.score)
  item.var <- apply(x, 2, var, na.rm=TRUE)     # variance of each item
  k <- ncol(x)
  alpha <- (total.var - sum(item.var))/total.var * (k/(k-1))
  return(alpha)
}
my.alpha(attitude)
```

## Some comments

This procedure works for the case of no missing data. But if you have missing data, this way of finding total.score effectively treats a missing response as a response of zero. We have two alternatives: substitute some value for the missing response (e.g., the item mean or item median of the other subjects who have scores on that item), or find mean scores instead of total scores. The latter imputes the person's mean score to the missing data, rather than the item mean. If we do the latter (find means), then we need to adjust the formula for alpha, because the variance of the mean scores will be $k^2$ smaller than that of the total scores.
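With complete data this $k^2$ correction is exact, since dividing each total by $k$ divides its variance by $k^2$. A one-line check on *attitude*:

```{r}
k <- ncol(attitude)
all.equal(var(rowMeans(attitude)) * k^2, var(rowSums(attitude)))  # TRUE
```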
```{r}
my.alpha.means <- function(x) {
  k <- ncol(x)
  total.score <- apply(x, 1, mean, na.rm=TRUE)
  total.var <- var(total.score, na.rm=TRUE) * k^2  # this corrects for the fact that we are using means
  item.var <- apply(x, 2, var, na.rm=TRUE)
  alpha <- (total.var - sum(item.var))/total.var * (k/(k-1))
  return(alpha)
}
my.alpha.means(attitude)
```

## An alternative imputation

An alternative way of imputing the missing items is to estimate each person's total score by scaling the mean of the items they actually answered up to the full test length.

```{r}
my.imputed.alpha <- function(x) {
  k <- ncol(x)
  mean.score <- apply(x, 1, mean, na.rm=TRUE)  # mean of the items each person answered
  adjusted.score <- mean.score * k             # scale up to the full test length
  total.var <- var(adjusted.score, na.rm=TRUE)
  item.var <- apply(x, 2, var, na.rm=TRUE)
  alpha <- (total.var - sum(item.var))/total.var * (k/(k-1))
  return(alpha)
}
my.imputed.alpha(attitude)
```

# These give different results when finding alpha for the *ability* data set

```{r}
my.alpha(ability)        # compare with
my.alpha.means(ability)  # and with
my.imputed.alpha(ability)
```

# Yet more ways of exploring reliability

## The split-half reliability

This is the correlation between two random split halves of the test, adjusted for test length using a formula derived by Spearman and Brown in 1910:
$$r_{xx} = \frac{2r}{1+r}$$
We can specify the splits, or let the `splitHalf` function find all possible splits.

```{r}
sp <- splitHalf(ability, raw=TRUE)  # we specify that we want to keep the raw results
sp           # show them
hist(sp$raw) # draw a histogram of the distribution of the split-half reliabilities
```

# Three other functions for reliability

`alpha` will find the $\alpha$ value for one scale. `scoreItems` will do this for one or for many scales; this requires specifying a 'keys' list. `scoreOverlap` will also find reliability estimates, but corrects the correlations between scales for item overlap. The `reliability` function is a wrapper for `alpha`, `omega` and `splitHalf`.
If not given a keys list, it will assume that all items are to be used. If given a keys list, it will score each scale individually. For the final part of the homework, try all three of these procedures:

```{r}
bfi.keys  # show the keying information
scales <- scoreItems(bfi.keys, bfi)
summary(scales)
```

## `reliability` has multiple options to display the results

Try the various options (I just show one set of output; try others.)

```{r}
rel <- reliability(bfi.keys, bfi)
rel
plot(rel)
```

# Final problem: Use the reliability function on a different data set
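As one possible sketch of an answer, following the note above that `reliability` without a keys list uses all of the items, one might treat the *ability* items as a single scale (any other data set with multiple items would work the same way):

```{r}
library(psych)
rel.ability <- reliability(ability)  # no keys list: all items treated as one scale
rel.ability
```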