---
title: "350:week 2  correlation"
author: "William Revelle"
date: "4/1/2024"
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

## Correlation using the Galton data set

Make the 'psych' and 'psychTools' packages active.

```{r startup}
library(psych)
library(psychTools)
sessionInfo()
  
```

## Get the data 

We can get the 'galton' data set just by calling it by name. It is a built-in data set.

It is important to find the dimensions of the data and perhaps to describe the data.


```{r galton}
dim(galton)  #what are the dimensions
names(galton) #what are the variable names
describe(galton)  #basic descriptives
```
 
## Tabulate the data

First form the table using 'table', then sort it using the 'order' function

```{r tabulate}
galton.tab <- table(galton)
galton.tab   #this table is ordered from short parents to tall parents
rownames(galton.tab)
rank(rownames(galton.tab)) #string commands together
ord <- order(rank(rownames(galton.tab)),decreasing=TRUE)
ord
galton.tab[ord,]  #this table is now ordered from tall parents to short parents
```
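
Row names of a table are character strings, so `rank` and `order` sort them alphabetically rather than numerically. The sketch below illustrates the difference with made-up labels (`labs` and `mixed` are hypothetical, not taken from `galton`) and uses `as.numeric` as one way to force a numeric ordering. In the Galton table every height has two digits before the decimal point, so the two orderings agree there.

```r
# Hypothetical height labels, stored as strings the way table() row names are
labs <- c("64", "72.5", "65.5", "73", "66.5")

# order(rank(x)) and order(x) give the same permutation when there are no ties
ord.rank   <- order(rank(labs), decreasing = TRUE)
ord.direct <- order(labs, decreasing = TRUE)
labs[ord.rank]            # tallest label first

# Alphabetical and numeric order can disagree when the number of digits differs
mixed <- c("9.5", "10")
order(mixed)              # alphabetical: "10" sorts before "9.5"
order(as.numeric(mixed))  # numeric: 9.5 sorts before 10
```
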

```{r }
plot(galton)
```

That plot is not very helpful, because it does not show how many people are at each point.

Let us use the 'jitter' function.

First set the random seed to a fixed value so the figures will agree from one run to the next.

```{r}
set.seed(42)   #cite Adams, 1979
plot(galton,pch=20,col="blue") #show the original data points
points(jitter(galton[,1]),jitter(galton[,2]))  #add a little jitter

#note we used 'points' to add to the plot

set.seed(42)   #cite Adams, 1979
plot(galton,pch=20,col="blue") #show the original data points
points(jitter(galton[,1],2),jitter(galton[,2],2))  #add a little more jitter


set.seed(42)   #cite Adams, 1979
plot(galton,pch=20,col="blue") #show the original data points
points(jitter(galton[,1],5),jitter(galton[,2],5))  #add even more jitter
```
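
The second argument to `jitter` is `factor`. When `amount` is not supplied, base R adds uniform noise within plus or minus `factor/5` times the smallest gap between distinct values. The sketch below uses made-up heights on a one-inch grid (the vector `x` and the helper `max.shift` are hypothetical) to show how the largest displacement grows with `factor`.

```r
set.seed(42)
# Hypothetical heights on a one-inch grid, five people at each value
x <- rep(60:70, each = 5)

# Hypothetical helper: largest displacement produced by jitter() for a factor
max.shift <- function(x, factor) max(abs(jitter(x, factor) - x))

shift1 <- max.shift(x, 1)   # bounded by 1/5 = 0.2
shift2 <- max.shift(x, 2)   # bounded by 2/5 = 0.4
shift5 <- max.shift(x, 5)   # bounded by 5/5 = 1.0
round(c(shift1, shift2, shift5), 2)
```

On this grid a factor of 5 can move a point by up to a full inch, so neighboring columns of points start to overlap.
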


### We can also display the means and error bars

```{r}
error.bars.by(child ~ parent,data=galton,eyes=FALSE,v.labels=63:73,main="Galton's Height data")

scatterHist(child ~ parent,data=galton ) #normal formula input
#but if the variables are jittered, you need this alternative style (x first, then y)
scatterHist(jitter(galton$parent,5) ,jitter(galton$child,5),ylab="Child",xlab="Parent",main="Galton height data")


```

### Yet one more display: the 'pairs.panels' function

```{r}
pairs.panels(galton)
#but "jiggle" (aka "jitter") the points
pairs.panels(galton,jiggle=TRUE)

```
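
The number that `pairs.panels` prints above the diagonal is the Pearson correlation. As a reminder of what that value is, here is a minimal sketch that computes it from the covariance and the standard deviations and checks the result against `cor`. The data frame `heights` is simulated stand-in data (an assumption for illustration, not the Galton values).

```r
set.seed(42)
# Simulated stand-in for parent and child heights (not the Galton data)
parent  <- rnorm(200, mean = 68, sd = 1.8)
child   <- 68 + 0.65 * (parent - 68) + rnorm(200, sd = 2.2)
heights <- data.frame(parent, child)

# Pearson r is the covariance divided by the product of the standard deviations
r.hand <- cov(heights$parent, heights$child) /
  (sd(heights$parent) * sd(heights$child))
r.cor  <- cor(heights)[1, 2]   # off-diagonal element of the correlation matrix

all.equal(r.hand, r.cor)
```
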

## Create a function

```{r}
# default values may be specified
small <- function(data=NULL, sample.size=20, n.iter=1000) {
  nsub <- nrow(data)    #this figures out the number of cases dynamically
  result <- rep(NA, n.iter) #create a vector to hold the results
  #use a for loop to repeat the code inside the { }
  for(i in 1:n.iter) {
    samp <- sample(nsub, sample.size, replace=TRUE) #bootstrap resampling
    #find the correlation for this sample and save it
    result[i] <- cor(data[samp,])[1,2]
  }    #end of the loop
  return(result)   #return the values we find
}   #end of function
```

Use this function to generate some data

```{r}
test <- small(galton) #this uses the default values
describe(test)
hist(test,breaks=21) #draw a histogram of the results
```
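
One common way to summarize a distribution of resampled correlations like `test` is a percentile interval: the 2.5% and 97.5% quantiles of the resampled values. The sketch below is a compact variant of the loop above that uses `replicate` instead of `for`. The function name `boot.r` and the simulated data frame `heights` are hypothetical stand-ins, not part of 'psych' or the Galton data.

```r
set.seed(42)
# Simulated stand-in data (not the Galton values)
parent  <- rnorm(500, mean = 68, sd = 1.8)
child   <- 68 + 0.65 * (parent - 68) + rnorm(500, sd = 2.2)
heights <- data.frame(parent, child)

# Hypothetical helper: resample rows and correlate the first two columns
boot.r <- function(data, sample.size = 20, n.iter = 1000) {
  replicate(n.iter, {
    samp <- sample(nrow(data), sample.size, replace = TRUE)
    cor(data[samp, 1], data[samp, 2])
  })
}

boot20 <- boot.r(heights)
ci <- quantile(boot20, c(0.025, 0.975))   # middle 95% of the resampled r values
ci
```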
## Now do it for a bunch of cases
```{r}

samp20 <- small(galton,20)
samp40 <- small(galton,40)
samp80 <- small(galton,80)
samp160 <- small(galton,160)
samp320 <- small(galton,320)
samp640 <- small(galton,640)
sample.df <- data.frame(samp20 ,samp40, samp80, samp160,
           samp320,samp640)
describe(sample.df)
```
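
The standard deviations reported by `describe` should shrink as the sample size grows. For roughly bivariate normal data, a large-sample approximation to the standard error of r is (1 - r^2)/sqrt(n). The sketch below compares that approximation with resampled standard deviations. It uses simulated stand-in data and a hypothetical helper `boot.sd`, so the numbers will not match the Galton results.

```r
set.seed(42)
# Simulated stand-in data (not the Galton values)
parent <- rnorm(1000, mean = 68, sd = 1.8)
child  <- 68 + 0.65 * (parent - 68) + rnorm(1000, sd = 2.2)
r.full <- cor(parent, child)

# Hypothetical helper: sd of the resampled correlations for one sample size
boot.sd <- function(n, n.iter = 500) {
  sd(replicate(n.iter, {
    samp <- sample(length(parent), n, replace = TRUE)
    cor(parent[samp], child[samp])
  }))
}

sizes    <- c(20, 80, 320)
observed <- sapply(sizes, boot.sd)
expected <- (1 - r.full^2) / sqrt(sizes)   # large-sample approximation
round(data.frame(n = sizes, observed, expected), 3)
```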
## Now, we try showing these results 

We show them several different ways and slowly make the figure better.

```{r violin}
violin(sample.df)
```
Add error bars to the violin plot.
```{r}

violin(sample.df)
error.bars(sample.df,add=TRUE)
```
Just show the error bars. Note that the current version of 'psych' shows only 3 colors by default. This has been fixed in the most recent version (coming soon).
```{r}
error.bars(sample.df) #this only has 3 colors
error.bars(sample.df, col=rainbow(6)) #six colors

#combine the violin and error bars
violin(sample.df, main="Density of boot strap resamples from Galton")
error.bars(sample.df, col=rainbow(ncol(sample.df)),add=TRUE) #six colors

```