Exploring distributions

Before Gosset (1908), the primary means of testing for differences of means was the z-test, which compares an observed difference to what would be expected given normal theory. Unfortunately, this required at least 30 subjects to be reasonably precise. The reason is that the standard error was based upon the observed sample standard deviation, which tends to be smaller than the population value. We can show this problem by comparing the population standard deviation with the standard deviations of a series of samples of varying size. (We are showing what Gosset derived.)
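
It may help to see the large-sample (z) logic as code before running the simulations. This is only a sketch with a made-up sample: the observed mean is compared to a hypothesized value in units of its standard error, and the result is referred to the normal distribution, with the sample standard deviation plugged in where normal theory assumes the population value.

#a minimal sketch of the large sample (z) test for a single mean (hypothetical data)
set.seed(17)                         #an arbitrary seed for this illustration
x <- rnorm(20, mean = .3, sd = 1)    #a small, made-up sample
z <- (mean(x) - 0)/(sd(x)/sqrt(length(x)))  #mean difference in standard error units
z                                    #the test statistic
2 * pnorm(-abs(z))                   #two-tailed p value based on normal theory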

We will first generate a random distribution, then take samples of varying size from it and examine the standard deviations of those samples.

set.seed(42)   #set a reproducible seed for our random numbers.
Population <- runif(10000) #a uniform distribution
hist(Population)  #show what it looks like.
library(psych)

library(psychTools)
describe(Population)
##    vars     n mean   sd median trimmed  mad min max range  skew kurtosis se
## X1    1 10000  0.5 0.29    0.5     0.5 0.37   0   1     1 -0.01    -1.21  0

First, let's show the central limit theorem

One of the most amazing statistical results is known as the central limit theorem: for any distribution with finite variance, if we take progressively larger samples from the population, the distribution of the means of those samples will tend towards the normal distribution. (The CLT does not apply to distributions with infinite variance.)

What follows is a very simple way of showing the CLT. Note that we define the limits of the x-axis to be the same in each figure. We take 1000 samples each of size 2, 4, 8, 16 and 32 and then plot them. This is an example of code that is not very clean, but it is simple to understand.

x2 <- (runif(1000)+ runif(1000))/2  # mean of two samples 
x4 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/4  # mean of four samples
x8 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/8  # mean of eight samples
x16 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
         runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/16  # mean of 16 samples
x32 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
         runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
  runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
         runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/32  # mean of 32 samples
hist(x2,xlim=c(0,1))

hist(x4, xlim=c(0,1))

hist(x8,xlim=c(0,1))

hist(x16,xlim=c(0,1))

hist(x32,xlim=c(0,1))

#combine them all into one data frame
central.limit <- data.frame(one=Population, two=x2,four=x4,eight=x8,sixteen=x16,thirtytwo=x32)
describe(central.limit)
##           vars     n mean   sd median trimmed  mad  min  max range  skew kurtosis se
## one          1 10000 0.50 0.29   0.50    0.50 0.37 0.00 1.00  1.00 -0.01    -1.21  0
## two          2 10000 0.50 0.21   0.51    0.50 0.23 0.01 0.99  0.98 -0.07    -0.65  0
## four         3 10000 0.49 0.14   0.49    0.49 0.15 0.14 0.92  0.79  0.05    -0.41  0
## eight        4 10000 0.50 0.10   0.50    0.50 0.11 0.19 0.83  0.64  0.01    -0.17  0
## sixteen      5 10000 0.50 0.07   0.50    0.50 0.07 0.26 0.71  0.44  0.00     0.15  0
## thirtytwo    6 10000 0.50 0.05   0.50    0.50 0.05 0.37 0.66  0.29  0.12    -0.04  0

Note that the mean does not change as the sample size increases, but the standard deviation decreases: four times the sample size leads to half the standard deviation.
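
We can check this against theory. The standard deviation of a uniform(0,1) variable is 1/sqrt(12), about .289, and the standard deviation of the mean of n independent values is that value divided by sqrt(n). A quick sketch:

#theoretical standard deviations of means of n uniform(0,1) values
n <- c(1, 2, 4, 8, 16, 32)
round((1/sqrt(12))/sqrt(n), 2)  #compare with the sd column of describe(central.limit)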

An improved demonstration using R

The above code is rather clunky. It will do the job, but it is not easy to modify. Let's consider some more useful and more flexible code. It has three parts:

  1. The first part specifies the parameters we will use

  2. The next part runs a 'loop' to do something multiple times

  3. The final part presents the results

prob.size <- c(2,4,8,16,32,64) #create a vector describing the problem size
n.obs <- 10  #specify the number of subjects to simulate
#create two matrices to store the results.  We do this dynamically to be flexible
results.mean <- matrix(NA,ncol=length(prob.size), nrow= n.obs)
results.sd <- results.mean   #this just sets up the place to store the results

#now run a loop
for (k in 1:length(prob.size)) {    #do this as many times as we have problems 
  n <- prob.size[k]               #For each problem size, specify the number of samples  
  temp <- matrix(runif(n * n.obs),ncol=n)    #create a  n.obs x n  matrix of random numbers
  results.mean[,k] <- apply(temp,1,mean)   #find the mean of these numbers for each row
  results.sd[,k] <-apply(temp,1,sd)
}
colnames(results.mean) <- colnames(results.sd) <- paste("N=",prob.size)
describe(results.mean)
##       vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## N= 2     1 10 0.43 0.12   0.44    0.42 0.12 0.26 0.65  0.39  0.25    -1.10 0.04
## N= 4     2 10 0.58 0.14   0.58    0.59 0.12 0.32 0.80  0.49 -0.25    -0.66 0.04
## N= 8     3 10 0.55 0.10   0.51    0.55 0.07 0.40 0.69  0.29  0.29    -1.52 0.03
## N= 16    4 10 0.49 0.06   0.49    0.49 0.06 0.38 0.57  0.19 -0.30    -1.07 0.02
## N= 32    5 10 0.49 0.04   0.50    0.49 0.02 0.41 0.56  0.15 -0.51    -0.75 0.01
## N= 64    6 10 0.50 0.05   0.51    0.51 0.06 0.40 0.55  0.15 -0.72    -0.69 0.02
describe(results.sd)
##       vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## N= 2     1 10 0.30 0.17   0.27    0.29 0.18 0.05 0.60  0.55  0.22    -1.13 0.05
## N= 4     2 10 0.28 0.06   0.29    0.28 0.06 0.16 0.35  0.19 -0.56    -1.04 0.02
## N= 8     3 10 0.26 0.07   0.27    0.26 0.03 0.15 0.41  0.26  0.33    -0.29 0.02
## N= 16    4 10 0.29 0.03   0.29    0.29 0.04 0.24 0.34  0.11  0.01    -1.24 0.01
## N= 32    5 10 0.30 0.02   0.30    0.30 0.02 0.26 0.32  0.06 -0.30    -1.19 0.01
## N= 64    6 10 0.29 0.02   0.28    0.29 0.01 0.26 0.31  0.05  0.08    -1.31 0.01
multi.hist(results.mean)

multi.hist(results.sd)

What do these results show? The mean of the sample means does not change as sample size goes up, but the standard deviation of the sample means goes down with the square root of the sample size (compare N = 2 to N = 8, or N = 4 to N = 16). A more subtle change, however, is that the standard deviation of the sample values (shown as the means of results.sd) gradually increases as the sample size increases, tending towards the population value (.29). It is also clear that the distribution of sample standard deviations is very noisy for small samples.

That is, the sample standard deviation underestimates the population standard deviation. Thus, in any test comparing means, if we use the sample standard deviation we need to adjust for the size of the sample. This was the realization of William Gosset when he developed the t-test.
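
One way to see the size of the adjustment that is needed is to compare the normal (z) critical value with the t critical values for various degrees of freedom (a sketch):

#the 97.5% quantile of the normal compared with that of the t distribution
df <- c(2, 4, 9, 29, 99)                  #degrees of freedom (n - 1)
crit <- rbind(normal = qnorm(.975), t = qt(.975, df))
colnames(crit) <- paste("df=", df)
round(crit, 2)   #by roughly 30 subjects the two are very close, hence the old n > 30 rule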

Creating a little function for this demonstration.

Perhaps even more convenient is to take the script we had before, and turn it into a little function. This allows us to call the function without worrying about the various variable names inside the function.

  1. To create the function, we give it a name (in this case central.limit)

  2. and the list of parameters to pass to the function (the number of observations and the problem size).

  3. These parameters are the default values used if you do not specify your own.

  4. The function itself is then defined by everything between the { and } brackets.

central.limit <- function(n.obs= 1000, prob.size = c(1,2,4,8,16,32)) {  #this is the beginning of the function
#specify the number of subjects to simulate
#create two matrices to store the results.  We do this dynamically to be flexible
results.mean <- matrix(NA,ncol=length(prob.size), nrow= n.obs)
results.sd <- results.mean
#now run a loop
for (k in 1:length(prob.size)) {
  i <- prob.size[k]
  temp <- matrix(runif(i * n.obs),ncol=i)
  results.mean[,k] <- apply(temp,1,mean)
  results.sd[,k] <-apply(temp,1,sd)
}
colnames(results.mean) <- colnames(results.sd) <- paste("N=",prob.size)
results <- list(mean=results.mean,sd=results.sd)  #we are saving the results in a list with two elements
return(results)
}   #this is the last line of the function

Now run our function

But specify parameters different from the defaults

prob.size = c(1,2,3,4,4,5,6,7,8,9,10)
test <- central.limit(n.obs=10000, prob.size=prob.size) #we change the default values
names(test) #what are the objects returned by test?
## [1] "mean" "sd"
d.sd <- describe(test$sd) #show the standard deviations (the warnings arise because the sd of a single observation, N=1, is not defined)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
plot(d.sd$mean,typ="b",main="Sample SD is biased downwards")

But that graph plots the standard deviations against their index rather than against the actual sample sizes. We can make a clearer figure by plotting them against sample size and specifying the x and y labels.

sd.df <- data.frame(size=prob.size,sd=d.sd$mean)
with(sd.df,plot(sd ~ size,typ="b" ,ylab="sample sd",main="Sample SD is biased downwards"))

Run the function again, with the default values

another.test <- central.limit()  #we take the default values
dim(another.test)  #a list does not have a dim, so this returns NULL
## NULL
names(another.test)
## [1] "mean" "sd"
describe(another.test$mean)
##       vars    n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## N= 1     1 1000 0.49 0.29   0.48    0.49 0.38 0.00 1.00  1.00  0.07    -1.22 0.01
## N= 2     2 1000 0.49 0.20   0.50    0.50 0.22 0.01 0.99  0.98 -0.12    -0.64 0.01
## N= 4     3 1000 0.49 0.14   0.49    0.49 0.14 0.11 0.90  0.79  0.08    -0.16 0.00
## N= 8     4 1000 0.50 0.10   0.50    0.50 0.11 0.18 0.79  0.61 -0.01    -0.36 0.00
## N= 16    5 1000 0.50 0.07   0.50    0.50 0.08 0.27 0.73  0.46 -0.02    -0.03 0.00
## N= 32    6 1000 0.50 0.05   0.50    0.50 0.05 0.34 0.67  0.33  0.06     0.02 0.00
describe(another.test$sd)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
##       vars    n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## N= 1     1    0  NaN   NA     NA     NaN   NA  Inf -Inf  -Inf    NA       NA   NA
## N= 2     2 1000 0.24 0.17   0.21    0.22 0.20 0.00 0.70  0.70  0.58    -0.65 0.01
## N= 4     3 1000 0.27 0.09   0.27    0.27 0.10 0.02 0.52  0.49 -0.14    -0.50 0.00
## N= 8     4 1000 0.28 0.05   0.28    0.28 0.05 0.09 0.44  0.35 -0.16    -0.10 0.00
## N= 16    5 1000 0.29 0.04   0.29    0.29 0.03 0.16 0.39  0.23 -0.16     0.15 0.00
## N= 32    6 1000 0.29 0.02   0.29    0.29 0.02 0.21 0.37  0.16 -0.04     0.21 0.00

Another demonstration of the central limit theorem: the binomial as the number of coins increases

The binomial distribution describes coin flips. If we flip many single coins and plot the results, roughly half will be 0 and the other half 1.

But what if we plot the distribution of the sum of two coins flipped together? What about 4, 8, 16, or 32 coins?

Let's do this by simulation.

n.obs  <- 10000
size <- c(1,2,4,8,16,32)
x <- matrix(NA,nrow=n.obs,ncol=length(size))
for(i in 1:length(size)) { x[,i] <- rbinom(n.obs,size[i],prob=.5)}
multi.hist(x)

As the number of coins being flipped goes up, the total (or, equivalently, the average) becomes more normally distributed.
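
A numeric version of the same point (just a sketch, reusing the x matrix from above): the kurtosis of each column should shrink toward 0 as the number of coins increases.

colnames(x) <- paste("coins=", size)  #label the columns by the number of coins
describe(x)   #skew stays near 0 (p = .5); kurtosis moves toward 0 as the number of coins increases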

Gosset and the t-distribution

Gosset recognized that the standard deviations of small samples underestimate, and are much noisier estimates of, the population value (see above). He developed a distribution (the t-distribution) to account for this. Because his employer did not want to encourage others to realize how powerful this statistic was, he published anonymously under the name of Student. And thus we have Student's t distribution.

We can compare the t distribution with the normal distribution using the built-in t and normal density functions.

We use the curve function, adding each subsequent curve to the first plot.

curve(dnorm(x),-3,3,main="Comparing the Normal to the t distribution (varying df)",ylab="density")
curve(dt(x,df=2),-3,3,lty="dotted",add=TRUE)
curve(dt(x,df=3),-3,3,lty=4,add=TRUE)
curve(dt(x,df=4),-3,3,lty="dashed",add=TRUE)
curve(dt(x,df=8),-3,3,lty=3,add=TRUE)
curve(dt(x,df=16),-3,3,lty=5,add=TRUE)
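
With this many line types a legend helps the reader match curves to degrees of freedom. A minimal sketch that could be added after the curve calls (note that, as written above, df=2 and df=8 happen to use the same dotted line type):

legend("topright", c("normal", "t, df=2", "t, df=3", "t, df=4", "t, df=8", "t, df=16"),
       lty = c(1, 3, 4, 2, 3, 5), bty = "n")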

The t-test

The t-test, perhaps the most fundamental statistical test, was developed by William Gosset, who published under the name of Student (1908). The t-test is a small-sample approximation to the large-sample (z) test. His first example was a data set on the differing effects of the optical isomers of hyoscyamine hydrobromide reported by Cushny & Peebles (1905). The sleep of 10 patients was measured without any drug and then following administration of the D and L isomers. The data from Cushny are available as the cushny data set.

Before we discuss the t-test, we consider the bootstrap as an alternative.

library(psychTools)
cushny
##    Control drug1 drug2L drug2R delta1 delta2L delta2R
## 1      0.6   1.3    2.5    2.1    0.7     1.9     1.5
## 2      3.0   1.4    3.8    4.4   -1.6     0.8     1.4
## 3      4.7   4.5    5.8    4.7   -0.2     1.1     0.0
## 4      5.5   4.3    5.6    4.8   -1.2     0.1    -0.7
## 5      6.2   6.1    6.1    6.7   -0.1    -0.1     0.5
## 6      3.2   6.6    7.6    8.3    3.4     4.4     5.1
## 7      2.5   6.2    8.0    8.2    3.7     5.5     5.7
## 8      2.8   3.6    4.4    4.3    0.8     1.6     1.5
## 9      1.1   1.1    5.7    5.8    0.0     4.6     4.7
## 10     2.9   4.9    6.3    6.4    2.0     3.4     3.5

Note that the last 3 columns are just the differences from the control condition.
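
We can verify that claim directly (a quick check):

all.equal(cushny$delta1,  cushny$drug1  - cushny$Control)  #should be TRUE
all.equal(cushny$delta2L, cushny$drug2L - cushny$Control)
all.equal(cushny$delta2R, cushny$drug2R - cushny$Control)
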
Descriptive statistics of the Cushny data

describe(cushny)
##         vars  n mean   sd median trimmed  mad  min max range  skew kurtosis   se
## Control    1 10 3.25 1.78   2.95    3.21 1.63  0.6 6.2   5.6  0.20    -1.23 0.56
## drug1      2 10 4.00 2.10   4.40    4.04 2.59  1.1 6.6   5.5 -0.25    -1.67 0.66
## drug2L     3 10 5.58 1.66   5.75    5.66 1.41  2.5 8.0   5.5 -0.30    -0.99 0.53
## drug2R     4 10 5.57 1.91   5.30    5.66 1.56  2.1 8.3   6.2 -0.09    -1.07 0.60
## delta1     5 10 0.75 1.79   0.35    0.67 1.56 -1.6 3.7   5.3  0.42    -1.30 0.57
## delta2L    6 10 2.33 2.00   1.75    2.24 2.45 -0.1 5.5   5.6  0.28    -1.66 0.63
## delta2R    7 10 2.32 2.27   1.50    2.28 2.59 -0.7 5.7   6.4  0.23    -1.68 0.72

It is hard to see patterns in simple descriptives, but a graphical display will help.

A graphical display of the data

We can show these data graphically using the error.bars function. We pass labels to the x and y axes using the xlab and ylab parameters, and then supply an appropriate figure title (main). We will use these data to show how to do t-tests as well as the generalization to the Analysis of Variance.

First we show this the traditional way in terms of error bars, and then we add in cats eyes to show the range of confidence.

error.bars(cushny,xlab="Group",ylab="hours of sleep",
         main="The effect of drug upon sleep (95% confidence)",eyes=FALSE)

error.bars(cushny,xlab="Group",ylab="hours of sleep",
         main="The effect of drug upon sleep (95% confidence)",col=rainbow(8))#the default is to show cats eyes

Bootstrap resampling

Gosset/Student developed the t-test and derived the t-distribution analytically. We can also find out what happens if we take many random samples from the data: we treat the observed data as our population and then sample from that population. We sample with replacement, because otherwise we would always end up with exactly the same data.

This involves using the sample function to take random samples from the original data (with replacement). As an example, consider sampling from the numbers 1-10. As usual, because we want to see and discuss the results, we set the random seed to a specific value. (I tend to use 42 because then I can cite Adams (1979).) In a production run, do not set the seed; it will then be initialized from the current time.

set.seed(42) #to get a random ordering that can be used for teaching purposes.
original <- 1:10
sample(original,10) #don't specify replacement and you get a permutation.
##  [1]  1  5 10  8  2  4  6  9  7  3
sample(original,10, replace=TRUE) #notice the repetitions (4 and 9 each appear twice)
##  [1]  8  7  4  9  5  4 10  2  3  9
sample(original,10, replace=TRUE) # repetitions again (4 and 5)
##  [1]  9  4  5  5  4  2  8  3 10  1
sample(original,10, replace=TRUE)  # and again (10, 8, and 4)
##  [1] 10  8  6 10  8  4  4  6  2  5

Bootstrapping the cushny data

Now, we use this trick of sampling to choose subjects, find the mean of each column for each sample, and then show those means. We create a small function. This is understandable, but not elegant.

small <- function(data,n.trials=100) { #name the function and give the default number of trials (100)
    subject <- 1:NROW(data) # dynamically set this
    results <- matrix(NA,ncol=NCOL(data),nrow=n.trials) #we store the results here
  
  #get it ready to store the data
    for (i in 1:n.trials) {#we are going to execute a loop
      new.order <- sample(subject,NROW(data),replace=TRUE) #find a new order
     results[i,] <- colMeans(data[new.order,]) #find the column means for each variable
  }   # end of loop
  
    colnames(results) <- colnames(data)  #name the results
    return(results)  
}  #end of small function

#now run the function
temp <- small(cushny,1000)  #specify that we want 1000 samples
error.bars(temp,sd=TRUE,eyes=FALSE)  #draw the results

describe(temp,quant=c(.025,.5,.975))  #show the top and bottom 2.5%
##         vars    n mean   sd median trimmed  mad   min  max range  skew kurtosis   se Q0.025 Q0.5 Q0.975
## Control    1 1000 3.24 0.53   3.24    3.24 0.57  1.61 4.71  3.10  0.02    -0.23 0.02   2.22 3.24   4.24
## drug1      2 1000 4.00 0.60   4.01    4.00 0.59  1.85 5.91  4.06 -0.10     0.22 0.02   2.75 4.01   5.23
## drug2L     3 1000 5.58 0.50   5.59    5.59 0.47  3.60 7.07  3.47 -0.23     0.50 0.02   4.53 5.59   6.53
## drug2R     4 1000 5.57 0.57   5.58    5.57 0.54  3.29 7.39  4.10 -0.16     0.60 0.02   4.41 5.58   6.67
## delta1     5 1000 0.75 0.53   0.74    0.75 0.52 -0.76 2.52  3.28  0.13    -0.04 0.02  -0.26 0.74   1.83
## delta2L    6 1000 2.33 0.60   2.31    2.32 0.62  0.79 4.28  3.49  0.19    -0.16 0.02   1.21 2.31   3.57
## delta2R    7 1000 2.32 0.67   2.31    2.31 0.70  0.27 4.38  4.11  0.15    -0.13 0.02   1.05 2.31   3.65

Note how the top and bottom 2.5% quantiles of the differences (delta1, delta2L and delta2R) do (delta1) or do not (delta2L and delta2R) include 0. We conclude that it is unlikely that the latter two differences are equal to 0. Also note that the 95% confidence intervals of the two forms of drug 2 do not overlap with that of the control (the interval for drug1 does overlap).
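
Another way to summarize the same bootstrap results is to count the proportion of bootstrap means that fall at or below 0 for each difference score (a sketch using the temp matrix from above):

round(colMeans(temp[, c("delta1", "delta2L", "delta2R")] <= 0), 3)
#a noticeable fraction of the delta1 bootstrap means are at or below 0,
#while almost none of the delta2L or delta2R means are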

Student’s t.test as done by Student

Unfortunately, Student did not have R. He therefore developed a test (comparing means to their standard errors) and the distribution of that test statistic (the t-distribution).

The t.test function is part of core R. It may be used in several different ways.
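
Before calling t.test, it is instructive to compute the one sample t for delta1 by hand: the mean divided by its standard error.

#the one sample t is just the observed mean divided by its standard error
n <- length(cushny$delta1)
se <- sd(cushny$delta1)/sqrt(n)
t.by.hand <- mean(cushny$delta1)/se
t.by.hand                              #compare with the t = 1.3257 reported below
2 * pt(-abs(t.by.hand), df = n - 1)    #and the two tailed p value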

#this first approach uses the with command that allows us to specify just the variable name
with(cushny,t.test(delta1)) #control versus drug 1 difference scores 
## 
##  One Sample t-test
## 
## data:  delta1
## t = 1.3257, df = 9, p-value = 0.2176
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.5297804  2.0297804
## sample estimates:
## mean of x 
##      0.75
with(cushny,t.test(delta2L)) #control versus drug2L difference scores 
## 
##  One Sample t-test
## 
## data:  delta2L
## t = 3.6799, df = 9, p-value = 0.005076
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.8976775 3.7623225
## sample estimates:
## mean of x 
##      2.33
with(cushny,t.test(delta1,delta2L,paired=TRUE)) #difference of differences
## 
##  Paired t-test
## 
## data:  delta1 and delta2L
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean difference 
##           -1.58
#or, we can just do it directly by specifying the object and the variable name
t.test(cushny$delta1)
## 
##  One Sample t-test
## 
## data:  cushny$delta1
## t = 1.3257, df = 9, p-value = 0.2176
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.5297804  2.0297804
## sample estimates:
## mean of x 
##      0.75
t.test(cushny$delta2L)
## 
##  One Sample t-test
## 
## data:  cushny$delta2L
## t = 3.6799, df = 9, p-value = 0.005076
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.8976775 3.7623225
## sample estimates:
## mean of x 
##      2.33

The previous error bars do not reflect the correlated nature of the data

lowerCor(cushny[1:4])  #show the correlations
##         Cntrl drug1 drg2L drg2R
## Control 1.00                   
## drug1   0.59  1.00             
## drug2L  0.32  0.81  1.00       
## drug2R  0.25  0.76  0.95  1.00
#plot the error bars correcting for the correlation
error.bars(cushny[1:4],xlab="Group",ylab="hours of sleep",
         main="The effect of drug upon sleep (95% confidence)",within=TRUE, col=rainbow(4))

Show this as a two panel graph, don’t use cats eyes

We can compare these two approaches in a single two-panel graph. We specify the number of rows and columns of panels using the par(mfrow=c(r,c)) command.

op <- par(mfrow=c(1,2)) #one row, two columns

error.bars(cushny[1:4],xlab="Group",ylab="hours of sleep",
         main="Not repeated measures",eyes=FALSE,ylim=c(1,8))
error.bars(cushny[1:4],xlab="Group",ylab="hours of sleep",  eyes=FALSE,
         main="repeated measures ",within=TRUE,ylim=c(1,8))
op <- par(mfrow=c(1,1))   #set this back to the default
title("Effect of drug upon sleep",line=3)  #add a general title

error.bars(cushny[1:4],xlab="Group",ylab="hours of sleep",
         main="Not repeated measures",eyes=FALSE,ylim=c(1,8))
error.bars(cushny[1:4],xlab="Group",ylab="hours of sleep",  eyes=FALSE,
         main="repeated measures ",within=TRUE,ylim=c(1,8),add=TRUE,arrow.col="blue")

title("Effect of drug upon sleep",line=3)  #add a general title

Reorganization of the data frame: from wide to long

We can take the wide format of the cushny data set and make it long. The long format is used by a variety of statistical tests. The stack function will do this for us.

long.sleep <-
  stack(cushny[c("delta1","delta2L")])
long.sleep  #display the long format
##    values     ind
## 1     0.7  delta1
## 2    -1.6  delta1
## 3    -0.2  delta1
## 4    -1.2  delta1
## 5    -0.1  delta1
## 6     3.4  delta1
## 7     3.7  delta1
## 8     0.8  delta1
## 9     0.0  delta1
## 10    2.0  delta1
## 11    1.9 delta2L
## 12    0.8 delta2L
## 13    1.1 delta2L
## 14    0.1 delta2L
## 15   -0.1 delta2L
## 16    4.4 delta2L
## 17    5.5 delta2L
## 18    1.6 delta2L
## 19    4.6 delta2L
## 20    3.4 delta2L

Now do the t-test on these data

t.test(values ~ ind,data=long.sleep)   #ignoring the paired nature of the data
## 
##  Welch Two Sample t-test
## 
## data:  values by ind
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group delta1 and group delta2L is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
##  mean in group delta1 mean in group delta2L 
##                  0.75                  2.33
## Paired t-test
## The sleep data is actually paired, so could have been in wide format:
sleep2 <- reshape(sleep, direction = "wide",
                  idvar = "ID", timevar = "group")

## Traditional interface
t.test(sleep2$extra.1, sleep2$extra.2, paired = TRUE)
## 
##  Paired t-test
## 
## data:  sleep2$extra.1 and sleep2$extra.2
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean difference 
##           -1.58
## Formula interface
t.test(Pair(extra.1, extra.2) ~ 1, data = sleep2)
## 
##  Paired t-test
## 
## data:  Pair(extra.1, extra.2)
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean difference 
##           -1.58

The Cushny data are also available in the sleep data set.

This is perhaps an obvious way of organizing the data (in long format)

sleep
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10

Do the t.test again, but on the sleep data

with(sleep,t.test(extra~group))   #does not assume equal variances
## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33
 with(sleep,t.test(extra~group,var.equal=TRUE))  #assume equal variances
## 
##  Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -3.363874  0.203874
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33
# with(sleep,t.test(extra~group,paired=TRUE))   #noticing that the data were paired

Analysis of Variance (aov) is a special case of the linear model and is a generalization of the t-test

Analysis of variance was developed as a way of analyzing agricultural data (hence the terminology of such things as a split plot design). Thus one could analyze multiple seeds crossed with multiple fertilizers. The application to psychology is particularly useful in the case of experimental designs where two or more variables are experimentally "crossed". It is important to understand analysis of variance, even though most modern approaches generalize it to the linear model.

Rather than directly comparing means (the way the t-test does), it compares the variance of the group means with the variance within groups. The terminology is a decomposition of the Sums of Squares (SS) into Sums of Squares between groups and Sums of Squares within groups. When divided by their degrees of freedom, Sums of Squares become Mean Squares, which are estimates of variance. The ratio of these two variances came to be called the F-test (in honor of Ronald Fisher).
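
A minimal sketch of this decomposition, done by hand for the sleep data used below (the numbers should match the aov table reported later: SS between = 12.48, SS within = 64.89, F = 3.46):

grand.mean <- mean(sleep$extra)
group.means <- tapply(sleep$extra, sleep$group, mean)
n.group <- table(sleep$group)
SS.between <- sum(n.group * (group.means - grand.mean)^2)   #variation of group means around the grand mean
SS.within <- sum((sleep$extra - group.means[as.character(sleep$group)])^2)  #variation within each group
MS.between <- SS.between/(2 - 1)    #df between = number of groups - 1
MS.within <- SS.within/(20 - 2)     #df within = N - number of groups
c(SS.between = SS.between, SS.within = SS.within, F = MS.between/MS.within)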

Fisher generalized the t-test to the case of more than two groups.

  1. aov provides a wrapper to lm for fitting linear models to balanced or unbalanced experimental designs.

  2. The main difference from lm is in the way print, summary and so on handle the fit: this is expressed in the traditional language of the analysis of variance rather than that of linear models.

  3. If the formula contains a single Error term, this is used to specify error strata, and appropriate models are fitted within each error stratum.

  4. The formula can specify multiple responses.

  5. aov is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance.

  6. If there are two or more error strata, the methods used are statistically inefficient without balance, and it may be better to use lme in package nlme.

For the sleep data set, compare the aov results with the t.test results

#independent subjects
summary(aov(extra ~ group, data=sleep))
##             Df Sum Sq Mean Sq F value Pr(>F)  
## group        1  12.48  12.482   3.463 0.0792 .
## Residuals   18  64.89   3.605                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#t.test (note that the argument name is misspelled: it should be var.equal, so it is silently ignored and the Welch test is reported)
t.test(extra ~ group, equal.var=TRUE, data =sleep)
## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33
#correlated subjects
summary(aov(extra~group + Error(ID),data=sleep))
## 
## Error: ID
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9  58.08   6.453               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value  Pr(>F)   
## group      1 12.482  12.482    16.5 0.00283 **
## Residuals  9  6.808   0.756                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#and the associated t.test
# t.test(extra ~ group ,data=sleep ,paired=TRUE)

The lm function fits the linear model. Compare these results to the aov results above.

model <- lm(extra ~ group,data=sleep)
model  #reports some of the statistics
## 
## Call:
## lm(formula = extra ~ group, data = sleep)
## 
## Coefficients:
## (Intercept)       group2  
##        0.75         1.58
summary(model) # a more common way of describing the results
## 
## Call:
## lm(formula = extra ~ group, data = sleep)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.430 -1.305 -0.580  1.455  3.170 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.7500     0.6004   1.249   0.2276  
## group2        1.5800     0.8491   1.861   0.0792 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.899 on 18 degrees of freedom
## Multiple R-squared:  0.1613, Adjusted R-squared:  0.1147 
## F-statistic: 3.463 on 1 and 18 DF,  p-value: 0.07919

Note that both the lm and the simple aov (without an Error term) are not treating the repeated measures quality of the data

The simple aov and lm analyses did not treat the repeated measures structure of the data appropriately. This can be done with a repeated measures aov (using an Error term, as above) or with lme for the linear model.
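
As a sketch of that multilevel approach (nlme is a recommended package distributed with R, so this assumes only that it can be loaded), the paired structure of the sleep data can be modeled by treating ID as a random effect:

library(nlme)     #nlme ships with R as a recommended package
mod.lme <- lme(extra ~ group, random = ~ 1 | ID, data = sleep)
summary(mod.lme)  #the test of group should agree with the paired t-test reported earlier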

A multifactorial design

(Taken from the help menu for aov)

Consider yield (remember, this was developed in agriculture) as a function of Nitrogen, Potassium and Phosphate. What is the effect of each variable separately and in combination?

headTail(npk) #show a little bit of the data
##     block    N    P    K yield
## 1       1    0    1    1  49.5
## 2       1    1    1    0  62.8
## 3       1    0    0    0  46.8
## 4       1    1    0    1    57
## ...  <NA> <NA> <NA> <NA>   ...
## 21      6    1    0    1  57.2
## 22      6    1    1    0    59
## 23      6    0    1    1  53.2
## 24      6    0    0    0    56
pairs.panels(npk) # A good experiment makes the experimental conditions orthogonal

npk   #The data come from Venables
##    block N P K yield
## 1      1 0 1 1  49.5
## 2      1 1 1 0  62.8
## 3      1 0 0 0  46.8
## 4      1 1 0 1  57.0
## 5      2 1 0 0  59.8
## 6      2 1 1 1  58.5
## 7      2 0 0 1  55.5
## 8      2 0 1 0  56.0
## 9      3 0 1 0  62.8
## 10     3 1 1 1  55.8
## 11     3 1 0 0  69.5
## 12     3 0 0 1  55.0
## 13     4 1 0 0  62.0
## 14     4 1 1 1  48.8
## 15     4 0 0 1  45.5
## 16     4 0 1 0  44.2
## 17     5 1 1 0  52.0
## 18     5 0 0 0  51.5
## 19     5 1 0 1  49.8
## 20     5 0 1 1  48.8
## 21     6 1 0 1  57.2
## 22     6 1 1 0  59.0
## 23     6 0 1 1  53.2
## 24     6 0 0 0  56.0
mod1 <- aov(yield ~ N,data=npk)
mod2 <- aov(yield ~ N+ P + N*P,data=npk)
mod2a <- aov(yield ~N*P,data=npk)
mod3 <- aov(yield ~ N*P*K,data=npk)
mod4 <- aov(yield ~ block + N*P*K,data=npk)
summary(mod1)  #Just the effect of N
##             Df Sum Sq Mean Sq F value Pr(>F)  
## N            1  189.3  189.28   6.061 0.0221 *
## Residuals   22  687.1   31.23                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod3) #don't include the block variable
##             Df Sum Sq Mean Sq F value Pr(>F)  
## N            1  189.3  189.28   6.161 0.0245 *
## P            1    8.4    8.40   0.273 0.6082  
## K            1   95.2   95.20   3.099 0.0975 .
## N:P          1   21.3   21.28   0.693 0.4175  
## N:K          1   33.1   33.14   1.078 0.3145  
## P:K          1    0.5    0.48   0.016 0.9019  
## N:P:K        1   37.0   37.00   1.204 0.2887  
## Residuals   16  491.6   30.72                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod4)  #The fully crossed design
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## block        5  343.3   68.66   4.447 0.01594 * 
## N            1  189.3  189.28  12.259 0.00437 **
## P            1    8.4    8.40   0.544 0.47490   
## K            1   95.2   95.20   6.166 0.02880 * 
## N:P          1   21.3   21.28   1.378 0.26317   
## N:K          1   33.1   33.14   2.146 0.16865   
## P:K          1    0.5    0.48   0.031 0.86275   
## Residuals   12  185.3   15.44                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lm treats the factor variables somewhat differently

mod5 <- lm(yield ~ N*P*K, data=npk)
summary(mod5)
## 
## Call:
## lm(formula = yield ~ N * P * K, data = npk)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.133  -4.133   1.250   3.125   8.467 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  51.4333     3.2002  16.072  2.7e-11 ***
## N1           12.3333     4.5258   2.725    0.015 *  
## P1            2.9000     4.5258   0.641    0.531    
## K1            0.5667     4.5258   0.125    0.902    
## N1:P1        -8.7333     6.4004  -1.365    0.191    
## N1:K1        -9.6667     6.4004  -1.510    0.150    
## P1:K1        -4.4000     6.4004  -0.687    0.502    
## N1:P1:K1      9.9333     9.0515   1.097    0.289    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.543 on 16 degrees of freedom
## Multiple R-squared:  0.4391, Adjusted R-squared:  0.1937 
## F-statistic: 1.789 on 7 and 16 DF,  p-value: 0.1586

More importantly, with the default treatment (dummy) coding, the interaction terms affect the interpretation of the main effects. We need to zero center (mean center) the data. To do this we have to convert the npk data into numeric values and then center them.

npk.nolevels <- levels2numeric(npk)
npk.cen  <- data.frame(scale(npk.nolevels,scale=FALSE))
#now, do the lm
mod6 <- lm(yield ~  N*P*K,data=npk.cen)
summary(mod6)  #compare this to model 3
## 
## Call:
## lm(formula = yield ~ N * P * K, data = npk.cen)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.133  -4.133   1.250   3.125   8.467 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -1.137e-15  1.131e+00   0.000   1.0000  
## N            5.617e+00  2.263e+00   2.482   0.0245 *
## P           -1.183e+00  2.263e+00  -0.523   0.6082  
## K           -3.983e+00  2.263e+00  -1.760   0.0975 .
## N:P         -3.767e+00  4.526e+00  -0.832   0.4175  
## N:K         -4.700e+00  4.526e+00  -1.038   0.3145  
## P:K          5.667e-01  4.526e+00   0.125   0.9019  
## N:P:K        9.933e+00  9.052e+00   1.097   0.2887  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.543 on 16 degrees of freedom
## Multiple R-squared:  0.4391, Adjusted R-squared:  0.1937 
## F-statistic: 1.789 on 7 and 16 DF,  p-value: 0.1586

Another aov example

aov is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance.

We take an example where the data are on our web server

datafilename="http://personality-project.org/r/datasets/R.appendix2.data"
data.ex2=read.file(datafilename) #read the data into a data.frame 
## Data from the .txt file http://personality-project.org/r/datasets/R.appendix2.data has been loaded.
data.ex2 #show the data
##    Observation Gender Dosage Alertness
## 1            1      m      a         8
## 2            2      m      a        12
## 3            3      m      a        13
## 4            4      m      a        12
## 5            5      m      b         6
## 6            6      m      b         7
## 7            7      m      b        23
## 8            8      m      b        14
## 9            9      f      a        15
## 10          10      f      a        12
## 11          11      f      a        22
## 12          12      f      a        14
## 13          13      f      b        15
## 14          14      f      b        12
## 15          15      f      b        18
## 16          16      f      b        22
 #do the analysis of variance
aov.ex2 = aov(Alertness~Gender*Dosage,data=data.ex2)
summary(aov.ex2)   #and summarize it
##               Df Sum Sq Mean Sq F value Pr(>F)
## Gender         1  76.56   76.56   2.952  0.111
## Dosage         1   5.06    5.06   0.195  0.666
## Gender:Dosage  1   0.06    0.06   0.002  0.962
## Residuals     12 311.25   25.94
aov.ex2  #this shows just the coefficients
## Call:
##    aov(formula = Alertness ~ Gender * Dosage, data = data.ex2)
## 
## Terms:
##                   Gender   Dosage Gender:Dosage Residuals
## Sum of Squares   76.5625   5.0625        0.0625  311.2500
## Deg. of Freedom        1        1             1        12
## 
## Residual standard error: 5.092887
## Estimated effects may be unbalanced

But typically, we want to show the means

print(model.tables(aov.ex2,"means"),digits=3)  #show the summary table 
## Tables of means
## Grand mean
##         
## 14.0625 
## 
##  Gender 
## Gender
##     f     m 
## 16.25 11.88 
## 
##  Dosage 
## Dosage
##     a     b 
## 13.50 14.62 
## 
##  Gender:Dosage 
##       Dosage
## Gender a     b    
##      f 15.75 16.75
##      m 11.25 12.50

Within subjects

  1. Somewhat more complicated because we need to convert “wide” data.frames to “long” or “narrow” data.frames.

  2. This can be done by using the stack function. Some data sets are already in the long format.

  3. A detailed discussion of how to work with repeated measures designs is at http://personality-project.org/r/r..html and at http://personality-project.org/r

  4. See also the tutorial by Jason French at http://jason-french.com/tutorials/repeatedmeasures.html

  5. Many within subject designs can be treated as multi-level designs. For a discussion of analyzing multilevel data (particularly for personality dynamics), see http://personality-project.org/revelle/publications/rw.paid.17.final.pdf

Analysis of variance within subjects: Getting and showing the data

filename="http://personality-project.org/r/datasets/R.appendix5.data"
    data.ex5=read.file(filename)   #read the data into a data.frame
## Data from the .txt file http://personality-project.org/r/datasets/R.appendix5.data has been loaded.
    headTail(data.ex5,6,12)   #Show the data (first 6, last 12)
##     Obs Subject Gender Dosage Task Valence Recall
## 1     1       A      M      A    F     Neg      8
## 2     2       A      M      A    F     Neu      9
## 3     3       A      M      A    F     Pos      5
## 4     4       A      M      A    C     Neg      7
## 5     5       A      M      A    C     Neu      9
## 6     6       A      M      A    C     Pos     10
## ... ...    <NA>   <NA>   <NA> <NA>    <NA>    ...
## 97   97       Q      F      C    F     Neg     18
## 98   98       Q      F      C    F     Neu     17
## 99   99       Q      F      C    F     Pos     18
## 100 100       Q      F      C    C     Neg     17
## 101 101       Q      F      C    C     Neu     19
## 102 102       Q      F      C    C     Pos     19
## 103 103       R      F      C    F     Neg     19
## 104 104       R      F      C    F     Neu     17
## 105 105       R      F      C    F     Pos     19
## 106 106       R      F      C    C     Neg     22
## 107 107       R      F      C    C     Neu     21
## 108 108       R      F      C    C     Pos     20
     describe(data.ex5)
##          vars   n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## Obs         1 108 54.50 31.32   54.5   54.50 40.03   1 108   107  0.00    -1.23 3.01
## Subject*    2 108  9.50  5.21    9.5    9.50  6.67   1  18    17  0.00    -1.24 0.50
## Gender*     3 108  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.02 0.05
## Dosage*     4 108  2.00  0.82    2.0    2.00  1.48   1   3     2  0.00    -1.53 0.08
## Task*       5 108  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.02 0.05
## Valence*    6 108  2.00  0.82    2.0    2.00  1.48   1   3     2  0.00    -1.53 0.08
## Recall      7 108 15.63  5.07   15.0   15.74  4.45   4  25    21 -0.13    -0.64 0.49
   pairs.panels(data.ex5) #this shows that the design is balanced

and do the within subjects analysis of variance

  filename="http://personality-project.org/r/datasets/R.appendix5.data"
  data.ex5=read.table(filename,header=TRUE) #read the data into a table
#do the analysis of variance
aov.ex5 = aov(Recall~(Task*Valence*Gender*Dosage)+Error(Subject/(Task*Valence))+
(Gender*Dosage),data.ex5)
#look at the output 
summary(aov.ex5)
## 
## Error: Subject
##               Df Sum Sq Mean Sq F value Pr(>F)  
## Gender         1  542.3   542.3   5.685 0.0345 *
## Dosage         2  694.9   347.5   3.643 0.0580 .
## Gender:Dosage  2   70.8    35.4   0.371 0.6976  
## Residuals     12 1144.6    95.4                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Subject:Task
##                    Df Sum Sq Mean Sq F value   Pr(>F)    
## Task                1  96.33   96.33  39.862 3.87e-05 ***
## Task:Gender         1   1.33    1.33   0.552    0.472    
## Task:Dosage         2   8.17    4.08   1.690    0.226    
## Task:Gender:Dosage  2   3.17    1.58   0.655    0.537    
## Residuals          12  29.00    2.42                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Subject:Valence
##                       Df Sum Sq Mean Sq F value Pr(>F)  
## Valence                2  14.69   7.343   2.998 0.0688 .
## Valence:Gender         2   3.91   1.954   0.798 0.4619  
## Valence:Dosage         4  20.26   5.065   2.068 0.1166  
## Valence:Gender:Dosage  4   1.04   0.259   0.106 0.9793  
## Residuals             24  58.78   2.449                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Subject:Task:Valence
##                            Df Sum Sq Mean Sq F value Pr(>F)
## Task:Valence                2   5.39  2.6944   1.320  0.286
## Task:Valence:Gender         2   2.17  1.0833   0.531  0.595
## Task:Valence:Dosage         4   2.78  0.6944   0.340  0.848
## Task:Valence:Gender:Dosage  4   2.67  0.6667   0.327  0.857
## Residuals                  24  49.00  2.0417

Actually, we could have done all of this from the linear model (coming Wednesday)