Exploring variability

When describing data, we want to know both the central tendency (mean, median) and the dispersion (range, variance, interquartile range). There are several ways to do this.

We can describe the data numerically, or we can show the variation around the central tendency graphically.
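
As a minimal sketch of these measures, the base R functions below compute each one for a small vector. The vector `x` is hypothetical and invented for illustration; it is not part of the course data set.

```r
# Hypothetical scores, invented for illustration (not from simulation.txt)
x <- c(2, 4, 4, 5, 7, 9, 11)

# Central tendency
mean(x)     # arithmetic mean
median(x)   # middle value

# Dispersion
range(x)        # minimum and maximum
diff(range(x))  # the range as a single number (max - min)
var(x)          # sample variance (divides by n - 1)
sd(x)           # standard deviation, the square root of the variance
IQR(x)          # interquartile range (75th - 25th percentile)
mad(x)          # median absolute deviation, scaled by 1.4826 by default
```

The `mad` column reported by describe is a scaled median absolute deviation of this kind, which is why it can be 0 for a variable that takes only two values.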

We will use the example data set from before:

library(psych)
library(psychTools)
fn <-  "http://personality-project.org/courses/350/datasets/simulation.txt"  
my.data <- read.file(fn) 
## Data from the .txt file http://personality-project.org/courses/350/datasets/simulation.txt has been loaded.
describe(my.data)
##             vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## Time           1 72 14.28  5.03   19.0   14.34  0.00   9  19    10 -0.11    -2.02 0.59
## Anxiety        2 72  5.24  2.18    5.0    5.24  2.97   0  10    10 -0.04    -0.65 0.26
## Impulsivity    3 72  4.90  3.98    4.5    4.88  5.19   0  10    10  0.02    -1.83 0.47
## sex            4 72  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.03 0.06
## Arousal        5 72 60.90  8.10   66.0   61.29  5.93  48  70    22 -0.27    -1.67 0.96
## Tension        6 72 56.83  6.29   57.0   57.14  5.93  38  69    31 -0.53     0.42 0.74
## Performance    7 72 72.21 17.41   78.0   73.19 18.53  38  98    60 -0.43    -1.10 2.05

Better yet, show the data graphically.

Again, there are several ways to do this.

In core R, we can use the boxplot function:

boxplot(my.data)  # or

boxplot(my.data,notch=TRUE) #to show median confidence intervals
## Warning in (function (z, notch = FALSE, width = NULL, varwidth = FALSE, : some notches went outside
## hinges ('box'): maybe set notch=FALSE
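
The warning can be understood by computing a notch by hand. As documented for base R's boxplot.stats, the notch extends 1.58 * IQR / sqrt(n) on either side of the median, where the IQR here is the distance between the hinges. The sketch below uses a hypothetical two-valued variable, loosely modelled on Time; the group counts are an assumption made for illustration.

```r
# Hypothetical two-valued variable, loosely modelled on Time (values 9 and 19)
x <- c(rep(9, 34), rep(19, 38))

bs <- boxplot.stats(x)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$conf    # lower and upper ends of the notch

# The same notch computed by hand
hinges <- bs$stats[c(2, 4)]
notch  <- bs$stats[3] + c(-1.58, 1.58) * diff(hinges) / sqrt(length(x))

# TRUE when a notch end falls outside the box, which triggers the warning
notch[1] < hinges[1] || notch[2] > hinges[2]
```

When the median coincides with a hinge, as it does for this variable, the notch necessarily extends beyond the box, so the warning is expected rather than a sign of a problem.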

#show it by a categorical variable
boxplot(Arousal ~ Time,data=my.data,notch=TRUE)

#or show the bivariate relationships using 'pairs'

pairs(my.data)

The psych package has a number of descriptive functions.

Let's first show the multivariate relationships:

pairs.panels(my.data)

Now show the mean and confidence interval of each variable

error.bars(my.data)
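
error.bars draws each variable's mean with a confidence interval around it. A minimal sketch of that calculation for Anxiety follows, using the mean, sd, and n from the describe output above. It assumes a 95% interval based on the t distribution, which I believe is the default for error.bars; check ?error.bars to confirm.

```r
# Summary values for Anxiety, copied from the describe() output above
m <- 5.24
s <- 2.18
n <- 72

se   <- s / sqrt(n)               # standard error of the mean
crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value of t (assumed level)
ci   <- m + c(-1, 1) * crit * se  # lower and upper confidence limits

round(se, 2)   # agrees with the se column of describe()
ci
```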

Do these data differ by some grouping variable?

error.bars(my.data ~ sex)  #formula input

We can also do a dot chart

results <- error.dots(my.data)

results #show the output
## $des
##             vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## Time           1 72 14.28  5.03   19.0   14.34  0.00   9  19    10 -0.11    -2.02 0.59
## Anxiety        2 72  5.24  2.18    5.0    5.24  2.97   0  10    10 -0.04    -0.65 0.26
## Impulsivity    3 72  4.90  3.98    4.5    4.88  5.19   0  10    10  0.02    -1.83 0.47
## sex            4 72  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.03 0.06
## Arousal        5 72 60.90  8.10   66.0   61.29  5.93  48  70    22 -0.27    -1.67 0.96
## Tension        6 72 56.83  6.29   57.0   57.14  5.93  38  69    31 -0.53     0.42 0.74
## Performance    7 72 72.21 17.41   78.0   73.19 18.53  38  98    60 -0.43    -1.10 2.05
## 
## $order
## [1] 4 3 2 1 6 5 7

Or smoothed histograms (density plots):

densityBy(my.data,"Arousal" , grp="sex")
#do it again, but with a legend
densityBy(my.data,"Arousal" , grp="sex",legend=1)

#We can also do this using 'formula' mode and show a legend
densityBy(Arousal ~ Time, data=my.data, legend=1)

#or we can do two x variables at once
densityBy(Arousal ~ Time + sex, data=my.data, legend=1) #although the legend is bad

histBy(my.data,"Arousal" , group ="Time")  #histograms of Arousal for each level of Time

Other statistics to describe group differences include Cohen’s d.

cd <- cohen.d(my.data,"Time")
cd  #show them numerically
## Call: cohen.d(x = my.data, group = "Time")
## Cohen d statistic of difference between two means
##             lower effect upper
## Anxiety     -0.44   0.03  0.49
## Impulsivity -0.34   0.12  0.59
## sex         -0.57  -0.11  0.35
## Arousal      5.39   6.58  7.75
## Tension     -0.36   0.10  0.56
## Performance  2.84   3.60  4.35
## 
## Multivariate (Mahalanobis) distance between groups
## [1] 7.4
## r equivalent of difference between two means
##     Anxiety Impulsivity         sex     Arousal     Tension Performance 
##        0.01        0.06       -0.06        0.96        0.05        0.87
error.dots(cd,main="Cohen d statistic for our data")
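
To make the statistic concrete, here is a sketch of the textbook definition of Cohen's d: the difference between two group means divided by the pooled standard deviation. The two groups are hypothetical and invented for illustration. cohen.d may pool the variances slightly differently, so this is not meant to reproduce its output exactly, and the conversion to r shown here is one common form that assumes equal group sizes.

```r
# Two hypothetical groups, invented for illustration
g1 <- c(10, 12, 14, 16, 18)
g2 <- c(13, 15, 17, 19, 21)

n1 <- length(g1)
n2 <- length(g2)

# Pooled standard deviation, weighting each variance by its degrees of freedom
sd_pooled <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))

# Standardized mean difference
d <- (mean(g2) - mean(g1)) / sd_pooled

# r equivalent of d; this simple form assumes equal group sizes
r <- d / sqrt(d^2 + 4)

d
r
```

Because d is expressed in standard deviation units, it can be compared across variables measured on different scales, which is what the dot chart of cd relies on.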

Functions return more than they show. Examine the output of cd (from above):

names(cd) #what are the various objects 
##  [1] "cohen.d"     "hedges.g"    "M.dist"      "r"           "t"           "n"          
##  [7] "p"           "wt.d"        "descriptive" "se"          "dict"        "order"      
## [13] "Call"
cd$t  #show the t test values
##     Anxiety Impulsivity         sex     Arousal     Tension Performance 
##   0.1105898   0.5130930  -0.4662524  27.4682568   0.4231092  15.0529108
cd$descriptive  #show the descriptive statistics
## Statistics within and between groups  
## Call: statsBy(data = x, group = group)
## Intraclass Correlation 1 (Percentage of variance due to groups) 
##        Time     Anxiety Impulsivity         sex     Arousal     Tension Performance 
##        1.00       -0.03       -0.02       -0.02        0.95       -0.02        0.86 
## Intraclass Correlation 2 (Reliability of group differences) 
##        Time     Anxiety Impulsivity         sex     Arousal     Tension Performance 
##        1.00      -80.77       -2.80       -3.60        1.00       -4.59        1.00 
## eta^2 between groups  
##     Anxiety.bg Impulsivity.bg         sex.bg     Arousal.bg     Tension.bg Performance.bg 
##           0.00           0.00           0.00           0.92           0.00           0.76 
## 
## To see the correlations between and within groups, use the short=FALSE option in your print statement.
## Many results are not shown directly. To see specific objects select from the following list:
##  mean sd n F ICC1 ICC2 ci1 ci2 raw rbg pbg rwg nw ci.wg pwg etabg etawg nwg nG Call