---
title: "350 week 8 other topics"
author: "William Revelle"
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

## Some miscellaneous topics

####  Getting the attibute information from an SPSS.sav file.
  
  
#### A function to do Hare voting


#### Bootstrapping a regression to find confidence intervals

## SPSS .sav files, the attributes feature
Although it is easy to read an SPSS.sav file to get the basic data, this does not recover the complex coding system embedded in SPSS.

Consider the `glbwarm.sav' data file in the class folder.

We will read the data in two different ways, one just getting the values, the other getting the various encodings saved by SPSS.

It turns out the the normal (in psych) read.file function does actually get the attribute information after all.  (Who knew?)
```{r}
library(psych)
library(psychTools)
fn <- 'https://personality-project.org/courses/350/datasets/glbwarm.sav'
glb <- read.file(fn)  #this gets the data and puts into a data.frame
dim(glb)  
names(glb)
colnames(glb)
attributes(glb)
library(foreign) #allows for more complicated reading of files
glb.sav <- read.spss(file=fn) #this uses the default values and saves everything as a list
names(glb.sav)
dim(glb.sav)  #it is a list and has a length but not a dim
attributes(glb.sav)
```

###  Why is this relevant?

One frequently wants to report the item names that go with a data set.

SPSS allows us to store the information with the data.

Normally, we include this information is a separate file (e.g. a dictionary associated with a data set).  e.g. 'fa.lookup' needs a dictionary.

# Hare scaling of voting behavior

Some voting systems and some organizations using a preference based voting system to choose officers. his is sometimes known as 'Hare' voting.   I was asked to take a set of rankings and find  convert them to a choice model.

```{r}
#consider the following ranks
data <- matrix(NA,20,6)
set.seed(42)

for (i in 1:20) {
data[i,]<- sample(6,6)
}

colnames(data) <- letters[1:NCOL(data)]
rownames(data) <- paste0("S",1:NROW(data))
dfOrder(data) #show it

describe(data)
apply(data,2,table)
colMeans(data)  #what is the mean ranking
```
# Thurstonian scaling of these data

```{r}
thurstone(data,ranks=TRUE)
```

## Lets write a function to do Hare voting

```{r}
#do hare ranking    
hare <- function(x, n=1) {
#loop over candidates
#find the person(s) with the fewest 1 place votes

while(NCOL(x) > n) {
#for(iter in (1:(NCOL(x)-n))) {  #continue until n +1 are left
count.first <- colSums(apply(x,2,function(x) x==1), na.rm=TRUE)
min.count <- min(count.first)
which.min <- which(count.first == min.count)
which.min <- which.min[1] #just do one candidate at a time
#reallocate their votes
who.min <- which(x[,which.min]==1, arr.ind=TRUE)
#the row is the person we want 
if(NROW(who.min )> 0) {
if(!is.null(dim(who.min))) who.min <- who.min[,1]

for(i in (1:length(who.min)))  {
 x[who.min[i],] <- x[who.min[i],] -1} 
 }  #reallocate their votes
x <- x[,-which.min,drop=FALSE]   #drop the lowest vote getter
   #do it again until the NCOL = the number we want

 }
 if(NCOL(x) > 1){
 count.first <- colSums(apply(x,2,function(x) x==1), na.rm=TRUE) 
  
 result <- sort(count.first,decreasing=TRUE)[1:n]} else {result <- colSums(x==1,na.rm=TRUE)
}
 
return(result)}   #show the winners
```
# Try the function with our data

```{r}
hare(data,6)
hare(data,5)
hare(data,4)
hare(data,3) #the top 3
hare(data,2) #the top 2
hare(data)  #just one winner

```


# Another problem Trying to show confidence intervals of regressions

Estimating confidence intervals from regression

1. This example is in response to a question from Mike Bailey

2. Given two predictors, what is their shared contribution

3. See the Venn diagram from Mike

4. We do this with a sample data set attitude

5. And then do some analyses

6. The case we use is actually an example of a suppressor, so
these are expressed in standardized regression coefficients.

7. We then bootstrap them to show their confidence intervals

## Use the attitude data set from core R

```{r}
dim(attitude)
describe(attitude)
lowerCor(attitude)
```

#  Apply the standard 'lm' function

```{r}
summary(lm(rating ~ complaints + privileges,data=attitude))
```

## Compare this to the `setCor' function

```{r}
 mod <-setCor(rating  ~ complaints + privileges,data=attitude,std=FALSE)
mod
```
## Standardized regression

```{r}
mod <-setCor(rating ~ complaints + privileges,data=attitude)
mod
names(mod)
```

Mike supplied a Venn Diagram of overlapping prdictors
a is the unique part of X1 and Y

b is the unique part of X2 and Y

c  is overlapping part of X1 , X2 and Y.

The question is what is c?

```{r}
r <- lowerCor(attitude[1:3])
a_c = r[1,2]^2    #the variance of a + c is rˆ2  of X1 with Y
b_c  <- r[1,3]^2  #the variance in b + c is rˆ2 of X2 with Y
R2 <- mod$R2  #from before
 a_b_c <- R2      #the total Rˆ2
 c <- (R2 - (a_c + b_c))/2   #From the Venn diagram
a_c
b_c
a_b_c
c

```
### What are the confidence intervals of c?

We need to do a bootstrap to find this.

## First make a small function to find the coefficients we want

```{r}
 find_c <- function(formula, data) {
 model <- setCor(formula,data=data,plot=FALSE)
 r <- cor(model$data[,-1])
 a_c = r[1,2]^2
 b_c  <- r[1,3]^2
 a_b_c <- model$R2
  c <- (R2 - (a_c + b_c))/2
  betas<- model$coefficients
  result <- list(R2=model$R2,coeffs = betas,c=c)
 return(result)}
 
```
 
 # try it
 
```{r}
 find_c (rating ~ complaints + privileges,data=attitude) #test it

```

 # Create a bootstrap function to call 'find_c' multiple times
 
 
```{r} 
boot.function <- function(formula, data,  n.iterations) {
result <- list()
nobs <- NROW(data)
for(i in 1:n.iterations  ){cases <-  sample(nobs,nobs,replace=TRUE)
boot.data <- data[cases,]
result[[i]] <- find_c(formula, data=boot.data)
}
rn <- rownames(result[[1]][["coeffs"]])
nc <- NROW(result[[1]][["coeffs"]]) + 2
result.df <- data.frame(matrix(unlist(result),ncol=nc,byrow=TRUE))
colnames(result.df) <- c("R2", rn, "c term")
return(result.df)
}

```
#now run it  (iterate 1000 times)


```{r}
temp <- boot.function(rating ~ complaints + privileges,attitude,1000)
describe(temp)

```