Before Gosset (1908) the primary means of testing for differences of means was to use the z-test. This would compare differences to what you would expect given normal theory. Unfortunately, this required at least 30 subjects to be reasonably precise. The reason for this is that the standard error was based upon the observed sample standard deviation which will be smaller than the population value. We can show this problem by comparing the population standard deviation with that of a series of samples of varying size. (We are showing what Gosset derived. )
We will first generate a random distribution and then take samples from this of varying size. We examine the standard deviation of these samples.
set.seed(42) #set a reproducible seed for our random numbers.
Population <- runif(10000) #a uniform distribution
hist(Population) #show what it looks like.
library(psych)
library(psychTools)
describe(Population)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10000 0.5 0.29 0.5 0.5 0.37 0 1 1 -0.01 -1.21 0
One of the most amazing statistical results is known as the central limit theorem. For any distribution with finite variance, if we take progressively bigger samples from the population, the distribution of the means of those samples will tend towards the normal distribution. (The CLT does not apply to distributions with infinite variance. )
What follows is a very simple way of showing the CLT. Note that we define the limits of the x-axis to be the same in each figure. We take 1000 samples each of size 2, 4, 8, 16 and 32 and then plot them. This is an example of not very clean code, but simple to understand.
x2 <- (runif(1000)+ runif(1000))/2 # mean of two samples
x4 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/4 # mean of four samples
x8 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/8
x16 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/16
x32 <- (runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+
runif(1000)+ runif(1000) +runif(1000)+ runif(1000)+runif(1000)+ runif(1000) +runif(1000)+ runif(1000))/32
hist(x2,xlim=c(0,1))
hist(x4, xlim=c(0,1))
hist(x8,xlim=c(0,1))
hist(x16,xlim=c(0,1))
hist(x32,xlim=c(0,1))
#combine them all into one data frame
central.limit <- data.frame(one=Population, two=x2,four=x4,eight=x8,sixteen=x16,thirtytwo=x32)
describe(central.limit)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## one 1 10000 0.50 0.29 0.50 0.50 0.37 0.00 1.00 1.00 -0.01 -1.21 0
## two 2 10000 0.50 0.21 0.51 0.50 0.23 0.01 0.99 0.98 -0.07 -0.65 0
## four 3 10000 0.49 0.14 0.49 0.49 0.15 0.14 0.92 0.79 0.05 -0.41 0
## eight 4 10000 0.50 0.10 0.50 0.50 0.11 0.19 0.83 0.64 0.01 -0.17 0
## sixteen 5 10000 0.50 0.07 0.50 0.50 0.07 0.26 0.71 0.44 0.00 0.15 0
## thirtytwo 6 10000 0.50 0.05 0.50 0.50 0.05 0.37 0.66 0.29 0.12 -0.04 0
Note that the mean does not change as the sample size increases, but the standard deviation decreases. Four times the sample size leads to half the standard deviation.
The above code is rather clunky. It will do the job but is not easy to use. Lets consider some more useful code. This code has more flexibility. It has three characteristics:
The first part specifies the parameters we will use
The next part runs a `loop’ to do something multiple times
The final part presents the results
prob.size <- c(2,4,8,16,32,64) #create a vector describing the problem size
n.obs <- 10 #specify the number of subjects to simulate
#create two matrices to store the results. We do this dynamically to be flexible
results.mean <- matrix(NA,ncol=length(prob.size), nrow= n.obs)
results.sd <- results.mean #this just sets up the place to store the results
#now run a loop
for (k in 1:length(prob.size)) { #do this as many times as we have problems
n <- prob.size[k] #For each problem size, specify the number of samples
temp <- matrix(runif(n * n.obs),ncol=n) #create a n.obs x n matrix of random numbers
results.mean[,k] <- apply(temp,1,mean) #find the mean of these numbers for each row
results.sd[,k] <-apply(temp,1,sd)
}
colnames(results.mean) <- colnames(results.sd) <- paste("N=",prob.size)
describe(results.mean)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## N= 2 1 10 0.43 0.12 0.44 0.42 0.12 0.26 0.65 0.39 0.25 -1.10 0.04
## N= 4 2 10 0.58 0.14 0.58 0.59 0.12 0.32 0.80 0.49 -0.25 -0.66 0.04
## N= 8 3 10 0.55 0.10 0.51 0.55 0.07 0.40 0.69 0.29 0.29 -1.52 0.03
## N= 16 4 10 0.49 0.06 0.49 0.49 0.06 0.38 0.57 0.19 -0.30 -1.07 0.02
## N= 32 5 10 0.49 0.04 0.50 0.49 0.02 0.41 0.56 0.15 -0.51 -0.75 0.01
## N= 64 6 10 0.50 0.05 0.51 0.51 0.06 0.40 0.55 0.15 -0.72 -0.69 0.02
describe(results.sd)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## N= 2 1 10 0.30 0.17 0.27 0.29 0.18 0.05 0.60 0.55 0.22 -1.13 0.05
## N= 4 2 10 0.28 0.06 0.29 0.28 0.06 0.16 0.35 0.19 -0.56 -1.04 0.02
## N= 8 3 10 0.26 0.07 0.27 0.26 0.03 0.15 0.41 0.26 0.33 -0.29 0.02
## N= 16 4 10 0.29 0.03 0.29 0.29 0.04 0.24 0.34 0.11 0.01 -1.24 0.01
## N= 32 5 10 0.30 0.02 0.30 0.30 0.02 0.26 0.32 0.06 -0.30 -1.19 0.01
## N= 64 6 10 0.29 0.02 0.28 0.29 0.01 0.26 0.31 0.05 0.08 -1.31 0.01
multi.hist(results.mean)
multi.hist(results.sd)
What do these results show? The mean of the sample does not change as sample size goes up, but the variance of the sample means goes down by the square root of the sample size (compare 2 to 8, 4 to 16, etc.). A more subtle change, however, is that the standard deviation of the sample values (shown as the means of results.sd) gradually increase as the sample size increases, tending towards the population value (.29). It is also clear that the distribution of sample standard deviations is very noisy for small samples.
That is, the sample standard deviation underestimates the population standard deviation. Thus, in any test comparing means, if we use the sample standard deviation, we need to adjust for the size of the sample. This was the realization of William Gosset when he developed the t.test.
Perhaps even more convenient is to take the script we had before, and turn it into a little function. This allows us to call the function without worrying about the various variable names inside the function.
1 To create the function, we give it a name (in this case
central.limit
)
2 and the list of parameters to pass to the function (the number of observations and the problem size)
3 These parameters are the default values if you do not specify your own.
4 the function then is defined by everything between the { and } brackets
central.limit <- function(n.obs= 1000, prob.size = c(1,2,4,8,16,32)) { #this is the beginning of the function
#specify the number of subjects to simulate
#create two matrices to store the results. We do this dynamically to be flexible
results.mean <- matrix(NA,ncol=length(prob.size), nrow= n.obs)
results.sd <- results.mean
#now run a loop
for (k in 1:length(prob.size)) {
i <- prob.size[k]
temp <- matrix(runif(i * n.obs),ncol=i)
results.mean[,k] <- apply(temp,1,mean)
results.sd[,k] <-apply(temp,1,sd)
}
colnames(results.mean) <- colnames(results.sd) <- paste("N=",prob.size)
results <- list(mean=results.mean,sd=results.sd) #we are saving the results in a list with two element
return(results)
} #this is the last line of the function
But specify different parameters than the default
prob.size = c(1,2,3,4,4,5,6,7,8,9,10)
test <- central.limit(n.obs=10000, prob.size=prob.size) #we change the default values
names(test) #what are the objects returned by test?
## [1] "mean" "sd"
d.sd <- describe(test$sd) #show the standard
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
plot(d.sd$mean,typ="b",main="Sample SD is biased downwards")
But that graph does not show sample size varying as clearly as it really is. Try to make a clearer figure by specifying the x and y labels
sd.df <- data.frame(size=prob.size,sd=d.sd$mean)
with(sd.df,plot(sd ~ size,typ="b" ,ylab="sample sd",main="Sample SD is biased downwards"))
another.test <- central.limit() #we take the default values
dim(another.test)
## NULL
names(another.test)
## [1] "mean" "sd"
describe(another.test$mean)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## N= 1 1 1000 0.49 0.29 0.48 0.49 0.38 0.00 1.00 1.00 0.07 -1.22 0.01
## N= 2 2 1000 0.49 0.20 0.50 0.50 0.22 0.01 0.99 0.98 -0.12 -0.64 0.01
## N= 4 3 1000 0.49 0.14 0.49 0.49 0.14 0.11 0.90 0.79 0.08 -0.16 0.00
## N= 8 4 1000 0.50 0.10 0.50 0.50 0.11 0.18 0.79 0.61 -0.01 -0.36 0.00
## N= 16 5 1000 0.50 0.07 0.50 0.50 0.08 0.27 0.73 0.46 -0.02 -0.03 0.00
## N= 32 6 1000 0.50 0.05 0.50 0.50 0.05 0.34 0.67 0.33 0.06 0.02 0.00
describe(another.test$sd)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
## vars n mean sd median trimmed mad min max range skew kurtosis se
## N= 1 1 0 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
## N= 2 2 1000 0.24 0.17 0.21 0.22 0.20 0.00 0.70 0.70 0.58 -0.65 0.01
## N= 4 3 1000 0.27 0.09 0.27 0.27 0.10 0.02 0.52 0.49 -0.14 -0.50 0.00
## N= 8 4 1000 0.28 0.05 0.28 0.28 0.05 0.09 0.44 0.35 -0.16 -0.10 0.00
## N= 16 5 1000 0.29 0.04 0.29 0.29 0.03 0.16 0.39 0.23 -0.16 0.15 0.00
## N= 32 6 1000 0.29 0.02 0.29 0.29 0.02 0.21 0.37 0.16 -0.04 0.21 0.00
The binomial distribution is just a coin flip. For a 1,000 coins, if we plot the distributions for single coins, roughly half will be 0 the other half 1.
But what if we plot the distribution of flipping two coins 1,000 times? What about 4, what about 8, 16, or 32.
Lets do this by simulation.
n.obs <- 10000
size <- c(1,2,4,8,16,32)
x <- matrix(NA,nrow=n.obs,ncol=length(size))
for(i in 1:length(size)) { x[,i] <- rbinom(n.obs,size[i],prob=.5)}
multi.hist(x)
As the number of coins being flipped goes up, the average (or total in
this case) becomes more normally distributed.
t-distribution
Gosset recognized that the distribution of means for small samples
has less variance than big samples (see above). He developed a
distribution (the t-distribution
) to account for this.
Given that his employer did not to encourage others to realize how
powerful this statistic was, he published anonomously under the name of
Student
. And thus we have the
Student's t distribution
.
We can compare the t distribution with the normal distribution using the built in t and normal distributions.
We use the curve function with the second plot added to it.
curve(dnorm(x),-3,3,main="Comparing the Normal to the t distribution (df=4) ",ylab="density")
curve(dt(x,df=2),-3,3,lty="dotted",add=TRUE)
curve(dt(x,df=3),-3,3,lty=4,add=TRUE)
curve(dt(x,df=4),-3,3,lty="dashed",add=TRUE)
curve(dt(x,df=8),-3,3,lty=3,add=TRUE)
curve(dt(x,df=16),-3,3,lty=5,add=TRUE)
The t-test, perhaps the fundamental statistical test, was developed by William Gosset who published under the name of Student (1908). The t-test is a small sample approximation to the large sample (z) test. His first example was a data set on the different effect of optical isomers of hyoscyamine hydrobromide reported by Cushny & Peebles (1905). The sleep of 10 patients was measured without any drug and then following administration of D. and L isomers. The data from Cushny are available as the cushny data set.
Before we discuss the t-test, we consider the boot strap as an alternative.
library(psychTools)
cushny
## Control drug1 drug2L drug2R delta1 delta2L delta2R
## 1 0.6 1.3 2.5 2.1 0.7 1.9 1.5
## 2 3.0 1.4 3.8 4.4 -1.6 0.8 1.4
## 3 4.7 4.5 5.8 4.7 -0.2 1.1 0.0
## 4 5.5 4.3 5.6 4.8 -1.2 0.1 -0.7
## 5 6.2 6.1 6.1 6.7 -0.1 -0.1 0.5
## 6 3.2 6.6 7.6 8.3 3.4 4.4 5.1
## 7 2.5 6.2 8.0 8.2 3.7 5.5 5.7
## 8 2.8 3.6 4.4 4.3 0.8 1.6 1.5
## 9 1.1 1.1 5.7 5.8 0.0 4.6 4.7
## 10 2.9 4.9 6.3 6.4 2.0 3.4 3.5
Note that the last 3 columns are just the differences from the
control condition.
## Desciptive statistics of the Cushny data
describe(cushny)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## Control 1 10 3.25 1.78 2.95 3.21 1.63 0.6 6.2 5.6 0.20 -1.23 0.56
## drug1 2 10 4.00 2.10 4.40 4.04 2.59 1.1 6.6 5.5 -0.25 -1.67 0.66
## drug2L 3 10 5.58 1.66 5.75 5.66 1.41 2.5 8.0 5.5 -0.30 -0.99 0.53
## drug2R 4 10 5.57 1.91 5.30 5.66 1.56 2.1 8.3 6.2 -0.09 -1.07 0.60
## delta1 5 10 0.75 1.79 0.35 0.67 1.56 -1.6 3.7 5.3 0.42 -1.30 0.57
## delta2L 6 10 2.33 2.00 1.75 2.24 2.45 -0.1 5.5 5.6 0.28 -1.66 0.63
## delta2R 7 10 2.32 2.27 1.50 2.28 2.59 -0.7 5.7 6.4 0.23 -1.68 0.72
It is hard to see patterns in simple descriptives, but a graphical display will help.
We can show these data graphically using the error.bars
function. We pass labels to the x and y axis using the xlab and
ylab parameters, and then supply an appropriate figure title
(main). We will use these data to show how to do t-tests as
well as the generalization to Analysis of Variance.
First we show this the tradition way in terms of error bars, and then
we add in cats eyes
to show the range of confidence.
error.bars(cushny,xlab="Group",ylab="hours of sleep",
main="The effect of drug upon sleep (95% confidence)",eyes=FALSE)
error.bars(cushny,xlab="Group",ylab="hours of sleep",
main="The effect of drug upon sleep (95% confidence)",col=rainbow(8))#the default is to show cats eyes
Gosset/Student developed the t-test and derived the t-distribution. We can also find out what happens if we do many random samples from the data. We think of the observed data as our population and then sample from that population. We do the sampling with replacement because otherwise we would always just end up with the same data.
This involves using the sample
function to take random
samples from the original data (with replacement). As an example,
consider sampling from the numbers 1-10. As usual, because we want to
see and discuss the results, we set the random seed to a specific value
(I tend to use 42 because then I can cite Adams (1962).) In a production
run, do not set the seed and it is based on the computer clock (ms since
startup).
set.seed(42) #to get a random ordering that can be used for teaching purposes.
original <- 1:10
sample(original,10) #don't specify replacement and you get a permutation.
## [1] 1 5 10 8 2 4 6 9 7 3
sample(original,10, replace=TRUE) #notice the repetition of 10
## [1] 8 7 4 9 5 4 10 2 3 9
sample(original,10, replace=TRUE) # which happened again!
## [1] 9 4 5 5 4 2 8 3 10 1
sample(original,10, replace=TRUE) # now we get a repetition of 1 and 7
## [1] 10 8 6 10 8 4 4 6 2 5
Now, use this trick of sampling to choose subjects, find the mean of the columns for each sample, and then show those. We create a baby function. This is understandable but not elegant.
small <- function(data,n.trials=100) { #name the function and give the default value of 100,
subject <- 1:NROW(data) # dynamically set this
results <- matrix(NA,ncol=NCOL(data),nrow=n.trials) #we store the results here
#get it ready to store the data
for (i in 1:n.trials) {#we are going to execute a loop
new.order <- sample(subject,NROW(data),replace=TRUE) #find a new order
results[i,] <- colMeans(data[new.order,]) #find the column means for each variable
} # end of loop
colnames(results) <- colnames(data) #name the results
return(results)
} #end of small function
#now run the function
temp <- small(cushny,1000) #specify that we want 1000 samples
error.bars(temp,sd=TRUE,eyes=FALSE) #draw the results
describe(temp,quant=c(.025,.5,.975)) #show the top and bottom 2.5%
## vars n mean sd median trimmed mad min max range skew kurtosis se Q0.025 Q0.5 Q0.975
## Control 1 1000 3.24 0.53 3.24 3.24 0.57 1.61 4.71 3.10 0.02 -0.23 0.02 2.22 3.24 4.24
## drug1 2 1000 4.00 0.60 4.01 4.00 0.59 1.85 5.91 4.06 -0.10 0.22 0.02 2.75 4.01 5.23
## drug2L 3 1000 5.58 0.50 5.59 5.59 0.47 3.60 7.07 3.47 -0.23 0.50 0.02 4.53 5.59 6.53
## drug2R 4 1000 5.57 0.57 5.58 5.57 0.54 3.29 7.39 4.10 -0.16 0.60 0.02 4.41 5.58 6.67
## delta1 5 1000 0.75 0.53 0.74 0.75 0.52 -0.76 2.52 3.28 0.13 -0.04 0.02 -0.26 0.74 1.83
## delta2L 6 1000 2.33 0.60 2.31 2.32 0.62 0.79 4.28 3.49 0.19 -0.16 0.02 1.21 2.31 3.57
## delta2R 7 1000 2.32 0.67 2.31 2.31 0.70 0.27 4.38 4.11 0.15 -0.13 0.02 1.05 2.31 3.65
Note how the top and bottom 2.5% quantiles of the differences (delta1, delta2L and delta2R) do (delta1) or do not (delta2L and delta2R) include 0. We conclude that it unlikely that the latter two differences are equal to 0. Also note that the 95% confidence intervals of the drugs do not overlap with that of the control.
Unfortunately, Student did not have R. Therefore he developed a test (comparing the means to their standard errors) and the distribution of that test (the t-distribution)
The t.test is a function core-R. It may be used in several different ways.
#this first approach uses the with command that allows us to specify just the variable name
with(cushny,t.test(delta1)) #control versus drug 1 difference scores
##
## One Sample t-test
##
## data: delta1
## t = 1.3257, df = 9, p-value = 0.2176
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.5297804 2.0297804
## sample estimates:
## mean of x
## 0.75
with(cushny,t.test(delta2L)) #control versus drug2L difference scores
##
## One Sample t-test
##
## data: delta2L
## t = 3.6799, df = 9, p-value = 0.005076
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.8976775 3.7623225
## sample estimates:
## mean of x
## 2.33
with(cushny,t.test(delta1,delta2L,paired=TRUE)) #difference of differences
##
## Paired t-test
##
## data: delta1 and delta2L
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean difference
## -1.58
#or, we can just do it directly by specifying the object and the variable name
t.test(cushny$delta1)
##
## One Sample t-test
##
## data: cushny$delta1
## t = 1.3257, df = 9, p-value = 0.2176
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.5297804 2.0297804
## sample estimates:
## mean of x
## 0.75
t.test(cushny$delta2L)
##
## One Sample t-test
##
## data: cushny$delta2L
## t = 3.6799, df = 9, p-value = 0.005076
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.8976775 3.7623225
## sample estimates:
## mean of x
## 2.33
We can take the wide format of the cushny data set and make it long. The long format is used in a variety of statistical tests. The stack command will do this for us.
long.sleep <-
stack(cushny[c("delta1","delta2L")])
long.sleep #display the long format
## values ind
## 1 0.7 delta1
## 2 -1.6 delta1
## 3 -0.2 delta1
## 4 -1.2 delta1
## 5 -0.1 delta1
## 6 3.4 delta1
## 7 3.7 delta1
## 8 0.8 delta1
## 9 0.0 delta1
## 10 2.0 delta1
## 11 1.9 delta2L
## 12 0.8 delta2L
## 13 1.1 delta2L
## 14 0.1 delta2L
## 15 -0.1 delta2L
## 16 4.4 delta2L
## 17 5.5 delta2L
## 18 1.6 delta2L
## 19 4.6 delta2L
## 20 3.4 delta2L
t.test(values ~ ind,data=long.sleep) #ignoring the paired nature of the data
##
## Welch Two Sample t-test
##
## data: values by ind
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group delta1 and group delta2L is not equal to 0
## 95 percent confidence interval:
## -3.3654832 0.2054832
## sample estimates:
## mean in group delta1 mean in group delta2L
## 0.75 2.33
## Paired t-test
## The sleep data is actually paired, so could have been in wide format:
sleep2 <- reshape(sleep, direction = "wide",
idvar = "ID", timevar = "group")
## Traditional interface
t.test(sleep2$extra.1, sleep2$extra.2, paired = TRUE)
##
## Paired t-test
##
## data: sleep2$extra.1 and sleep2$extra.2
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean difference
## -1.58
## Formula interface
t.test(Pair(extra.1, extra.2) ~ 1, data = sleep2)
##
## Paired t-test
##
## data: Pair(extra.1, extra.2)
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean difference
## -1.58
sleep
data
set.This is perhaps a obvious way of organizing the data (in long format)
sleep
## extra group ID
## 1 0.7 1 1
## 2 -1.6 1 2
## 3 -0.2 1 3
## 4 -1.2 1 4
## 5 -0.1 1 5
## 6 3.4 1 6
## 7 3.7 1 7
## 8 0.8 1 8
## 9 0.0 1 9
## 10 2.0 1 10
## 11 1.9 2 1
## 12 0.8 2 2
## 13 1.1 2 3
## 14 0.1 2 4
## 15 -0.1 2 5
## 16 4.4 2 6
## 17 5.5 2 7
## 18 1.6 2 8
## 19 4.6 2 9
## 20 3.4 2 10
with(sleep,t.test(extra~group)) #does not assume equal variances
##
## Welch Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -3.3654832 0.2054832
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
with(sleep,t.test(extra~group,var.equal=TRUE)) #assume equal variances
##
## Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -3.363874 0.203874
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
# with(sleep,t.test(extra~group,paired=TRUE)) #noticing that the data were paired
was developed as a way of analyzing agricultural data (hence the terminology of such things as a split plot design). Thus one could analyze multiple seeds crossed with multiple fertilizers. The application to psychology is particular used in the case of experimental designs where two or more variables are experimentally “crossed”. It is important to understand although most modern approaches generalize it to the case of the linear model.
Rather than directly comparing means (the way t-test does), it considers the variance of the means and the variance within groups. The terminology is a decomposition of the Sums of Squares (SS) in terms of Sums of Squares between groups and Sums of Squares within groups. When divided by their degrees of freedom, Sums of Squares become Mean Squares which is an estimate of variance. The ratio of the these two variances came to be called the F-test (in honor of Ronald Fisher).
Fisher generalized the t-test to the case of more than two groups.
aov
provides a wrapper to lm
for
fitting linear models to balanced or unbalanced experimental
designs.
The main difference from lm
is in the way print,
summary and so on handle the fit: this is expressed in the traditional
language of the analysis of variance rather than that of linear
models.
If the formula contains a single Error term, this is used to specify error strata, and appropriate models are fitted within each error stratum.
The formula can specify multiple responses.
aov
is designed for balanced designs, and the
results can be hard to interpret without balance: beware that missing
values in the response(s) will likely lose the balance.
If there are two or more error strata, the methods used are
statistically inefficient without balance, and it may be better to use
lme in package nlme
.
#independent subjects
summary(aov(extra ~ group, data=sleep))
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 12.48 12.482 3.463 0.0792 .
## Residuals 18 64.89 3.605
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#t.test
t.test(extra ~ group, equal.var=TRUE, data =sleep)
##
## Welch Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -3.3654832 0.2054832
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
#correlated subjects
summary(aov(extra~group + Error(ID),data=sleep))
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 9 58.08 6.453
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 12.482 12.482 16.5 0.00283 **
## Residuals 9 6.808 0.756
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#and the associated t.test
# t.test(extra ~ group ,data=sleep ,paired=TRUE)
model <- lm(extra ~ group,data=sleep)
model #reports some of the statistics
##
## Call:
## lm(formula = extra ~ group, data = sleep)
##
## Coefficients:
## (Intercept) group2
## 0.75 1.58
summary(model) # a more common way of describing the results
##
## Call:
## lm(formula = extra ~ group, data = sleep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.430 -1.305 -0.580 1.455 3.170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7500 0.6004 1.249 0.2276
## group2 1.5800 0.8491 1.861 0.0792 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.899 on 18 degrees of freedom
## Multiple R-squared: 0.1613, Adjusted R-squared: 0.1147
## F-statistic: 3.463 on 1 and 18 DF, p-value: 0.07919
The aov and lm analyses did not treat the repeated measures of the data appropriately. This can be done with repeated measures or lme for the linear model.
(Taken from the help menu for aov)
Consider yield (remember, this was developed in agriculture) as a function of Nitrogen, Potassium and Phosphate. What is the effect of each variable seperately and in combination.
headTail(npk) #show a little bit of the data
## block N P K yield
## 1 1 0 1 1 49.5
## 2 1 1 1 0 62.8
## 3 1 0 0 0 46.8
## 4 1 1 0 1 57
## ... <NA> <NA> <NA> <NA> ...
## 21 6 1 0 1 57.2
## 22 6 1 1 0 59
## 23 6 0 1 1 53.2
## 24 6 0 0 0 56
pairs.panels(npk) # A good experiment makes the experimental conditions orthogonal
npk #The data come from Venables
## block N P K yield
## 1 1 0 1 1 49.5
## 2 1 1 1 0 62.8
## 3 1 0 0 0 46.8
## 4 1 1 0 1 57.0
## 5 2 1 0 0 59.8
## 6 2 1 1 1 58.5
## 7 2 0 0 1 55.5
## 8 2 0 1 0 56.0
## 9 3 0 1 0 62.8
## 10 3 1 1 1 55.8
## 11 3 1 0 0 69.5
## 12 3 0 0 1 55.0
## 13 4 1 0 0 62.0
## 14 4 1 1 1 48.8
## 15 4 0 0 1 45.5
## 16 4 0 1 0 44.2
## 17 5 1 1 0 52.0
## 18 5 0 0 0 51.5
## 19 5 1 0 1 49.8
## 20 5 0 1 1 48.8
## 21 6 1 0 1 57.2
## 22 6 1 1 0 59.0
## 23 6 0 1 1 53.2
## 24 6 0 0 0 56.0
mod1 <- aov(yield ~ N,data=npk)
mod2 <- aov(yield ~ N+ P + N*P,data=npk)
mod2a <- aov(yield ~N*P,data=npk)
mod3 <- aov(yield ~ N*P*K,data=npk)
mod4 <- aov(yield ~ block + N*P*K,data=npk)
summary(mod1) #Just the effect of N
## Df Sum Sq Mean Sq F value Pr(>F)
## N 1 189.3 189.28 6.061 0.0221 *
## Residuals 22 687.1 31.23
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod3) #don't include the block variable
## Df Sum Sq Mean Sq F value Pr(>F)
## N 1 189.3 189.28 6.161 0.0245 *
## P 1 8.4 8.40 0.273 0.6082
## K 1 95.2 95.20 3.099 0.0975 .
## N:P 1 21.3 21.28 0.693 0.4175
## N:K 1 33.1 33.14 1.078 0.3145
## P:K 1 0.5 0.48 0.016 0.9019
## N:P:K 1 37.0 37.00 1.204 0.2887
## Residuals 16 491.6 30.72
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod4) #The fully crossed design
## Df Sum Sq Mean Sq F value Pr(>F)
## block 5 343.3 68.66 4.447 0.01594 *
## N 1 189.3 189.28 12.259 0.00437 **
## P 1 8.4 8.40 0.544 0.47490
## K 1 95.2 95.20 6.166 0.02880 *
## N:P 1 21.3 21.28 1.378 0.26317
## N:K 1 33.1 33.14 2.146 0.16865
## P:K 1 0.5 0.48 0.031 0.86275
## Residuals 12 185.3 15.44
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod5 <- lm(yield ~ N*P*K, data=npk)
summary(mod5)
##
## Call:
## lm(formula = yield ~ N * P * K, data = npk)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.133 -4.133 1.250 3.125 8.467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.4333 3.2002 16.072 2.7e-11 ***
## N1 12.3333 4.5258 2.725 0.015 *
## P1 2.9000 4.5258 0.641 0.531
## K1 0.5667 4.5258 0.125 0.902
## N1:P1 -8.7333 6.4004 -1.365 0.191
## N1:K1 -9.6667 6.4004 -1.510 0.150
## P1:K1 -4.4000 6.4004 -0.687 0.502
## N1:P1:K1 9.9333 9.0515 1.097 0.289
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.543 on 16 degrees of freedom
## Multiple R-squared: 0.4391, Adjusted R-squared: 0.1937
## F-statistic: 1.789 on 7 and 16 DF, p-value: 0.1586
More importantly, the interaction terms affect the main effects. We need to zero center (mean center) the data. To do this we have to convert the npk data into numeric values and then center them.
npk.nolevels <- levels2numeric(npk)
npk.cen <- data.frame(scale(npk.nolevels,scale=FALSE))
#now, do the lm
mod6 <- lm(yield ~ N*P*K,data=npk.cen)
summary(mod6) #compare this to model 3
##
## Call:
## lm(formula = yield ~ N * P * K, data = npk.cen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.133 -4.133 1.250 3.125 8.467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.137e-15 1.131e+00 0.000 1.0000
## N 5.617e+00 2.263e+00 2.482 0.0245 *
## P -1.183e+00 2.263e+00 -0.523 0.6082
## K -3.983e+00 2.263e+00 -1.760 0.0975 .
## N:P -3.767e+00 4.526e+00 -0.832 0.4175
## N:K -4.700e+00 4.526e+00 -1.038 0.3145
## P:K 5.667e-01 4.526e+00 0.125 0.9019
## N:P:K 9.933e+00 9.052e+00 1.097 0.2887
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.543 on 16 degrees of freedom
## Multiple R-squared: 0.4391, Adjusted R-squared: 0.1937
## F-statistic: 1.789 on 7 and 16 DF, p-value: 0.1586
aov is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance.
We take an example where the data are on our web server
datafilename="http://personality-project.org/r/datasets/R.appendix2.data"
data.ex2=read.file(datafilename) #read the data into a data.frame
## Data from the .txt file http://personality-project.org/r/datasets/R.appendix2.data has been loaded.
data.ex2 #show the data
## Observation Gender Dosage Alertness
## 1 1 m a 8
## 2 2 m a 12
## 3 3 m a 13
## 4 4 m a 12
## 5 5 m b 6
## 6 6 m b 7
## 7 7 m b 23
## 8 8 m b 14
## 9 9 f a 15
## 10 10 f a 12
## 11 11 f a 22
## 12 12 f a 14
## 13 13 f b 15
## 14 14 f b 12
## 15 15 f b 18
## 16 16 f b 22
#do the analysis of variance
aov.ex2 = aov(Alertness~Gender*Dosage,data=data.ex2)
summary(aov.ex2) #and summarize it
## Df Sum Sq Mean Sq F value Pr(>F)
## Gender 1 76.56 76.56 2.952 0.111
## Dosage 1 5.06 5.06 0.195 0.666
## Gender:Dosage 1 0.06 0.06 0.002 0.962
## Residuals 12 311.25 25.94
aov.ex2 #this shows just the coefficients
## Call:
## aov(formula = Alertness ~ Gender * Dosage, data = data.ex2)
##
## Terms:
## Gender Dosage Gender:Dosage Residuals
## Sum of Squares 76.5625 5.0625 0.0625 311.2500
## Deg. of Freedom 1 1 1 12
##
## Residual standard error: 5.092887
## Estimated effects may be unbalanced
print(model.tables(aov.ex2,"means"),digits=3) #show the summary table
## Tables of means
## Grand mean
##
## 14.0625
##
## Gender
## Gender
## f m
## 16.25 11.88
##
## Dosage
## Dosage
## a b
## 13.50 14.62
##
## Gender:Dosage
## Dosage
## Gender a b
## f 15.75 16.75
## m 11.25 12.50
Somewhat more complicated because we need to convert “wide”data.frames to“long”or“narrow”data.frames.
This can be done by using the stack function. Some data sets are already in the long format.
A detailed discussion of how to work with repeated measures designs is at http://personality-project.org/r/r..html and at http://personality-project.org/r
See also the tutorial by Jason French at http:// jason-french.com/tutorials/repeatedmeasures.html
Many within subject designs can be treated as multi-level designs. For a discussion of analyzing multilevel data (particularly for personality dynamics), see http://personality-project.org/revelle/ publications/rw.paid.17.final.pdf
filename="http://personality-project.org/r/datasets/R.appendix5.data"
data.ex5=read.file(filename) #read the data into a data.frame
## Data from the .txt file http://personality-project.org/r/datasets/R.appendix5.data has been loaded.
headTail(data.ex5,6,12) #Show the data (first 6, last 12)
## Obs Subject Gender Dosage Task Valence Recall
## 1 1 A M A F Neg 8
## 2 2 A M A F Neu 9
## 3 3 A M A F Pos 5
## 4 4 A M A C Neg 7
## 5 5 A M A C Neu 9
## 6 6 A M A C Pos 10
## ... ... <NA> <NA> <NA> <NA> <NA> ...
## 97 97 Q F C F Neg 18
## 98 98 Q F C F Neu 17
## 99 99 Q F C F Pos 18
## 100 100 Q F C C Neg 17
## 101 101 Q F C C Neu 19
## 102 102 Q F C C Pos 19
## 103 103 R F C F Neg 19
## 104 104 R F C F Neu 17
## 105 105 R F C F Pos 19
## 106 106 R F C C Neg 22
## 107 107 R F C C Neu 21
## 108 108 R F C C Pos 20
describe(data.ex5)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## Obs 1 108 54.50 31.32 54.5 54.50 40.03 1 108 107 0.00 -1.23 3.01
## Subject* 2 108 9.50 5.21 9.5 9.50 6.67 1 18 17 0.00 -1.24 0.50
## Gender* 3 108 1.50 0.50 1.5 1.50 0.74 1 2 1 0.00 -2.02 0.05
## Dosage* 4 108 2.00 0.82 2.0 2.00 1.48 1 3 2 0.00 -1.53 0.08
## Task* 5 108 1.50 0.50 1.5 1.50 0.74 1 2 1 0.00 -2.02 0.05
## Valence* 6 108 2.00 0.82 2.0 2.00 1.48 1 3 2 0.00 -1.53 0.08
## Recall 7 108 15.63 5.07 15.0 15.74 4.45 4 25 21 -0.13 -0.64 0.49
pairs.panels(data.ex5) #this shows that the design is balanced
filename="http://personality-project.org/r/datasets/R.appendix5.data"
data.ex5=read.table(filename,header=TRUE) #read the data into a table
#do the
aov.ex5 = aov(Recall~(Task*Valence*Gender*Dosage)+Error(Subject/(Task*Valence))+
(Gender*Dosage),data.ex5)
#look at the output
summary(aov.ex5)
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Gender 1 542.3 542.3 5.685 0.0345 *
## Dosage 2 694.9 347.5 3.643 0.0580 .
## Gender:Dosage 2 70.8 35.4 0.371 0.6976
## Residuals 12 1144.6 95.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Subject:Task
## Df Sum Sq Mean Sq F value Pr(>F)
## Task 1 96.33 96.33 39.862 3.87e-05 ***
## Task:Gender 1 1.33 1.33 0.552 0.472
## Task:Dosage 2 8.17 4.08 1.690 0.226
## Task:Gender:Dosage 2 3.17 1.58 0.655 0.537
## Residuals 12 29.00 2.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Subject:Valence
## Df Sum Sq Mean Sq F value Pr(>F)
## Valence 2 14.69 7.343 2.998 0.0688 .
## Valence:Gender 2 3.91 1.954 0.798 0.4619
## Valence:Dosage 4 20.26 5.065 2.068 0.1166
## Valence:Gender:Dosage 4 1.04 0.259 0.106 0.9793
## Residuals 24 58.78 2.449
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Subject:Task:Valence
## Df Sum Sq Mean Sq F value Pr(>F)
## Task:Valence 2 5.39 2.6944 1.320 0.286
## Task:Valence:Gender 2 2.17 1.0833 0.531 0.595
## Task:Valence:Dosage 4 2.78 0.6944 0.340 0.848
## Task:Valence:Gender:Dosage 4 2.67 0.6667 0.327 0.857
## Residuals 24 49.00 2.0417