The first part of the RMarkdown file sets up various options. By setting the width to 100, we get output files that do not wrap the describe output. Normally we would not show the setup chunk, but here we echo it so that we can see how to widen the output.

knitr::opts_chunk$set(echo = TRUE)
options(width=100) #This sets the width of the output, 80 seems to be the default and is too narrow

R packages are always under development. For instance, a year ago there was a suggestion about how to make the describe function more useful. This was then implemented in the next version of psych (2.4.3). The current CRAN release is 2.4.3.

Rather than push the updated version to CRAN each time, and in order to reduce the demands on the CRAN system, it is better to post small changes to local ‘repositories’ which have not gone through all the overhead of CRAN. The local repository for the psych package is at http://personality-project.org/r . To install from this repository the first time, we use the install.packages command. Or, we can check the repository to see if we need to update our package using the update.packages command. (It is not clear if this actually does anything.)

Sometimes, I update psych without changing the version number. You can check the development date using the packageDate("psych") command.

#This is commented out if I have already done this
#install.packages("psych", repos = "http://personality-project.org/r", type = "source") # or
#update.packages("psych", repos = "http://personality-project.org/r", type = "source") # if we already have it although does not seem to work

Let's make sure we succeeded. Check the version number of the package. It should now be version 2.4.4.
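One way to check, as a quick sketch (packageVersion comes with base R's utils package):

```r
# Report the installed version of psych; after the update it
# should show 2.4.4 rather than the CRAN release 2.4.3
packageVersion("psych")
```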

As the psych package grew, it became too big, and some of the data and a few helper functions were moved to a new package, `psychTools`. For many of the demonstrations in class, we will need both packages.

Every time you start a new R session, you need to make those packages active.

library(psych)   #remember to make this active
library(psychTools)
sessionInfo()   #see what version we have  (should be 2.4.4)
## R version 4.3.3 (2024-02-29)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] psychTools_2.4.3 psych_2.4.4     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.31     R6_2.5.1          fastmap_1.1.1     xfun_0.41         lattice_0.22-5   
##  [6] cachem_1.0.7      parallel_4.3.3    knitr_1.45        foreign_0.8-86    htmltools_0.5.5  
## [11] rmarkdown_2.21    cli_3.6.1         R.methodsS3_1.8.2 grid_4.3.3        sass_0.4.5       
## [16] jquerylib_0.1.4   mnormt_2.1.1      compiler_4.3.3    R.oo_1.25.0       rstudioapi_0.14  
## [21] tools_4.3.3       rtf_0.4-14.1      nlme_3.1-164      evaluate_0.20     bslib_0.4.2      
## [26] yaml_2.3.7        rlang_1.1.0       jsonlite_1.8.4
packageDate("psych")  #should be "2024-04-07"
## [1] "2024-04-07"

Seeing what has changed.

To see what has been added to a package you can read the news file for that package. You can read all of the news for a package, or select just the most recent. Here we read the news for the most recent release of psych. Note that we are specifying Version > "2.4.3" so that we get just the most recent (last month's and this month's) changes.

 news(Version > "2.4.3", package="psych")
##                         Changes in version 2.4.4                        
## 
## Still to do
## 
##     o   Change examples to dontrun rather than commented out ## mostly
##  done
## 
##     o   Add the ability to get factor scores for higher order factors
##  (requested by Brandon Vaughn). Partly done in omega, but there
##  are problems associated with the entire problem of factor
##  scores.
## 
##     o   Fix a bug in alpha when finding average R (need to use cor
##  rather than cov2cor when treating missing data)
## 
##     o   Add weight option to statsBy (but the question is how to handle
##  missing data)
## 
##     o   Adjust the se for correlations (in bestScales) to take into
##  account the pairwise count, rather than the n.obs
## 
##     o   Improve documentation for thurstone.  Would help if we had an
##  example of scaling.
## 
##     o   scoreIrt.2pl needs to be able to adjust for group differences
##  correctly
## 
##     o   Possible problem in pooled correlations for statsBy
## 
##     o   Think about improvements in itemLookup to show means if desired
## 
##     o   Fix bug in polychoric where it doesn't handle empty cells very
##  well. (reported by Björn Büdenbender).  Or at least improve the
##  documentation about why this is a problem.
## 
##     o   Fix bug in mediate for multiple dependent variables (reported
##  by Martin Zeschke)
## 
##     o   scoreItem should find the mean item correlation with na.rm=TRUE
##  (Liz D)
## 
##     o   create a scoreFastBy
## 
##     o   improve the help menu for sim.multi and return the latent
##  values
## 
##     o   think about linking person scores to item difficulty scores in
##  IRT
## 
##     o   Left justify content in lookupItems (not sure how to do this)
## 
##     o   Consider using the Jennrich approach to standard errors of
##  rotated loadings
## 
##     o   Still need to fix mediate to handle more than 1 DVs
## 
##     o   Possible problem with the use of try in several functions.
## 
##     o   Need to fix omega so that alpha is correctly calculated in case
##  of no general factor
## 
##     o   It would be nice to add an option to display confidence
##  intervals of correlations in error.dots
## 
##     o   allow omega style hierarchical structure for x (and y) in esem
## 
## Additions
## 
##     o   Added the score option to scoreOverlap so that if given raw
##  data, it will find the scale scores as well as the corrected
##  for overlap statistics.  Note that the scale scores are not
##  corrected for overlap.
## 
##     o   Minor tweaks to scoreIrt.2pl to check for bad data (missing
##  cells ).  This was leading to the function hanging forever in
##  the case of pairwise counts of 0.
## 
##     o   Added fix.dlpyr (as a local function) to statsBy and describeBy
##  to not choke on Tibbles. Had added this to alpha several years
##  ago.
## 
##     o   Added itemSort to combine item means and item content and then
##  sort them.
## 
##     o   Added the median option to describe with the fast=TRUE option
##  (requested by David Condon)
## 
##     o   Parallelized describe.  Now will describe 644K subjects with
##  6633 variables in 64 seconds (with fast=TRUE option).
##  Improvement in speed varies by numbers of cores.
## 
##     o   Parallelized bigCor for very significant savings.
## 
##     o   Added the hist.border option to pairs.panels following a
##  request by Jordan Adamson.
## 
##     o   Added the w.exp (weighting exponent to cohen.kappa following a
##  request from Julius Pfadt)
## 
##     o   Added n.obs to unidim to properly find statistics from
##  correlation matrices
## 
##     o   Added a number of fit statistics to unidim
## 
##     o   Added the ability to find Spearman correlations to bigCor.
##  Added a check that the data are a data.frame rather than a
##  data.table for bigCor.
## 
##     o   Changed the class of describe output to be c("psych",
##  "Pdescribe" ,"data.frame") to avoid choking if Hmisc is in use
## 
##     o   Added the ability to use mean.weights in statsBy (still under
##  testing)
## 
##     o   Added paSelect to allow parallel analyses of subsets of items
##  using a keys list.
## 
##     o   Added vssSelect to allow VSS analyses of subsets of items using
##  a kes list.
## 
## Bugs fixed
## 
##     o   changed the number of elements in the unit.Circle in
##  spider/radar to get more precision
## 
##     o   Added a conversion to numeric in mediate to get around a
##  problem with variables that are factors. (Problem reported by
##  Lynn Bueeler )
## 
##     o   Finally fixed bug in alpha where it finds R from cov2cor
##  instead of directly (reported by Karl Ove Hufthammer )
## 
##     o   Fixed item.lookup so that it will report item means as well as
##  factor loadings.
## 
##     o   Strange bug introduced into score.multiple.choice by qgraph
##  reported by Michael Truong.  Fixed by changing class of
##  describe object used in score.multiple.choice to just be a
##  data.frame.

The magic of the apply and by commands and several functions that use them.

There are several functions in base R that are powerful replacements for the “for loop” that is more common in other languages (e.g., Fortran, C++). If you are working with data.frames or matrices, the apply command will repeat an operation over as many rows or columns as are in your data.frame.

The help page is not very informative about the power of by and merely says that “Function by is an object-oriented wrapper for tapply applied to data frames.” Nor is it very helpful for apply, where we are told: “apply returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”

Consider the attitude example data set in R. First show the entire set, then use apply to find the means and then standard deviations of each column.

Finally, we also use the function colMeans which does the same thing, but just for means.

dim(attitude)  #see how big it is
## [1] 30  7
attitude    #show the data   - You rarely want to do this
##    rating complaints privileges learning raises critical advance
## 1      43         51         30       39     61       92      45
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
## 11     64         53         53       58     58       67      34
## 12     67         60         47       39     59       74      41
## 13     69         62         57       42     55       63      25
## 14     68         83         83       45     59       77      35
## 15     77         77         54       72     79       77      46
## 16     81         90         50       72     60       54      36
## 17     74         85         64       69     79       79      63
## 18     65         60         65       75     55       80      60
## 19     65         70         46       57     75       85      46
## 20     50         58         68       54     64       78      52
## 21     50         40         33       34     43       64      33
## 22     64         61         52       62     66       80      41
## 23     53         66         52       50     63       80      37
## 24     40         37         42       58     50       57      49
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39
apply(attitude,2,mean)    #apply the mean function to every column of attitude
##     rating complaints privileges   learning     raises   critical    advance 
##   64.63333   66.60000   53.13333   56.36667   64.63333   74.76667   42.93333
apply(attitude,2,sd)    #now find the standard deviations for each column
##     rating complaints privileges   learning     raises   critical    advance 
##  12.172562  13.314757  12.235430  11.737013  10.397226   9.894908  10.288706
colMeans(attitude)    #see also rowMeans
##     rating complaints privileges   learning     raises   critical    advance 
##   64.63333   66.60000   53.13333   56.36667   64.63333   74.76667   42.93333
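For a one-dimensional summary like this, base R's sapply is a common alternative to apply; because a data.frame is a list of columns, sapply visits each column in turn and simplifies the result to a named vector. A minimal sketch:

```r
# sapply(f over each column) gives the same named vector as apply(x, 2, f)
sapply(attitude, mean)
sapply(attitude, sd)
```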

Apply replaces a loop

Here we look at a crude way of doing what the apply function does.

dumbApplyMean <- function(data) {
  result <- rep(NA, NCOL(data))
  for (i in 1:NCOL(data)) {
    result[i] <- mean(data[, i])
  }
  return(result)
}

#now run it
dumbApplyMean(attitude)
## [1] 64.63333 66.60000 53.13333 56.36667 64.63333 74.76667 42.93333
#compare with just apply
apply(attitude,2,mean)
##     rating complaints privileges   learning     raises   critical    advance 
##   64.63333   66.60000   53.13333   56.36667   64.63333   74.76667   42.93333
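The loop version can be generalized to take any function as an argument, which is essentially what apply does for us. A sketch (dumbApply is a hypothetical helper for illustration, not part of any package):

```r
# A generalized version of the loop: pass in the function to use
dumbApply <- function(data, FUN) {
  result <- rep(NA, NCOL(data))
  for (i in 1:NCOL(data)) {
    result[i] <- FUN(data[, i])   # apply FUN to column i
  }
  result
}

dumbApply(attitude, sd)   # same values as apply(attitude, 2, sd)
```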

The by function

A related function is by, which takes a data.frame, chooses the cases that match a particular value of some variable, and then applies the requested function. Here we use the sat.act data set from psych, break the data down by gender, and then describe each group.

First we look at a few cases to see what the data look like. Note that we do not want to look at all of the data because there are so many cases.

library(psych) #make it active  (although it already is)
library(psychTools)  #thus, these lines are not really necessary

head(sat.act)  # A core R command 
##       gender education age ACT SATV SATQ
## 29442      2         3  19  24  500  500
## 29457      2         3  23  35  600  500
## 29498      2         3  20  21  480  470
## 29503      1         4  27  26  550  520
## 29504      1         2  33  31  600  550
## 29518      1         5  26  28  640  640
tail(sat.act) #another core R command
##       gender education age ACT SATV SATQ
## 39904      2         3  20  26  710  680
## 39915      1         3  25  30  500  500
## 39937      1         4  40  27  613  630
## 39951      2         3  24  31  700  630
## 39961      1         4  35  32  700  780
## 39985      1         5  25  25  600  600
#headTail combines these two functions  and adds some parameters
headTail(sat.act)   #see what it looks like (combines head and tail)
##       gender education age ACT SATV SATQ
## 29442      2         3  19  24  500  500
## 29457      2         3  23  35  600  500
## 29498      2         3  20  21  480  470
## 29503      1         4  27  26  550  520
## ...      ...       ... ... ...  ...  ...
## 39937      1         4  40  27  613  630
## 39951      2         3  24  31  700  630
## 39961      1         4  35  32  700  780
## 39985      1         5  25  25  600  600
headTail(sat.act,top=6,bottom=3,from=3,to=5) #using some of the parameters
##       age ACT SATV
## 29442  19  24  500
## 29457  23  35  600
## 29498  20  21  480
## 29503  27  26  550
## 29504  33  31  600
## 29518  26  28  640
## ...   ... ...  ...
## 39951  24  31  700
## 39961  35  32  700
## 39985  25  25  600

The by command

by allows you to apply a function separately for each level of a variable.

describe(sat.act) #does it for all subjects
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
## education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
## age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
## ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
## SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
## SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41
by(sat.act,sat.act[1],describe) #does it for each value of sat.act[1]
## gender: 1
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN      NaN 0.00
## education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54    -0.60 0.10
## age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43     1.43 0.62
## ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06     1.89 0.32
## SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13 7.26
## SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12 7.41
## --------------------------------------------------------------------------- 
## gender: 2
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN      NaN 0.00
## education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74     0.27 0.06
## age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77     3.03 0.44
## ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39    -0.42 0.22
## SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42 5.28
## SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13 5.38

Being a little more transparent in the call to by

The call to by used a variable location, so which variable was it? It is clearer to refer to the variable by name.

by(sat.act,sat.act$gender,describe) #now we know which variable we were using
## sat.act$gender: 1
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN      NaN 0.00
## education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54    -0.60 0.10
## age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43     1.43 0.62
## ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06     1.89 0.32
## SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13 7.26
## SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12 7.41
## --------------------------------------------------------------------------- 
## sat.act$gender: 2
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN      NaN 0.00
## education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74     0.27 0.06
## age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77     3.03 0.44
## ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39    -0.42 0.22
## SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42 5.28
## SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13 5.38
by(sat.act,sat.act["gender"] ,describe) #does the same thing
## gender: 1
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN      NaN 0.00
## education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54    -0.60 0.10
## age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43     1.43 0.62
## ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06     1.89 0.32
## SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13 7.26
## SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12 7.41
## --------------------------------------------------------------------------- 
## gender: 2
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN      NaN 0.00
## education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74     0.27 0.06
## age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77     3.03 0.44
## ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39    -0.42 0.22
## SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42 5.28
## SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13 5.38

Using by within functions

Some functions allow you to do the `by` operation within the function itself.

describeBy(sat.act,"gender")
## 
##  Descriptive statistics by group 
## gender: 1
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN      NaN 0.00
## education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54    -0.60 0.10
## age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43     1.43 0.62
## ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06     1.89 0.32
## SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13 7.26
## SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12 7.41
## --------------------------------------------------------------------------- 
## gender: 2
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN      NaN 0.00
## education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74     0.27 0.06
## age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77     3.03 0.44
## ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39    -0.42 0.22
## SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42 5.28
## SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13 5.38

formula input

Better yet, some functions (a slowly growing number in psych) allow formula input, sometimes for just some of the elements of a data frame (in which case you need to specify what the data are).

This takes the form:

describe(ACT + SATV + SATQ ~ gender, data=sat.act)
## 
##  Descriptive statistics by group 
## gender: 1
##      vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## ACT     1 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06     1.89 0.32
## SATV    2 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13 7.26
## SATQ    3 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12 7.41
## --------------------------------------------------------------------------- 
## gender: 2
##      vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## ACT     1 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39    -0.42 0.22
## SATV    2 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42 5.28
## SATQ    3 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13 5.38
describe(sat.act ~ gender)
## 
##  Descriptive statistics by group 
## group: 1
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN      NaN 0.00
## education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54    -0.60 0.10
## age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43     1.43 0.62
## ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06     1.89 0.32
## SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13 7.26
## SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12 7.41
## --------------------------------------------------------------------------- 
## group: 2
##           vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
## gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN      NaN 0.00
## education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74     0.27 0.06
## age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77     3.03 0.44
## ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39    -0.42 0.22
## SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42 5.28
## SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13 5.38

Note that formula input does not work for violin.

Scoring scales and estimating one measure of internal consistency (\(\alpha\))

A typical study involves aggregating individual items into “scales”. To do this, we will use a built-in data set (the ability data) and find the average number of items each person got correct. We will use the scoreItems function to do this.

It is useful to read the help pages for ability to get a feel of what the data are, and then the help page for scoreItems to see what we are going to do.

Part of the ability help page:

16 multiple choice ability items 1525 subjects taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project are saved as iqitems. Those data are shown as examples of how to score multiple choice tests and analyses of response alternatives. When scored correct or incorrect, the data are useful for demonstrations of tetrachoric based factor analysis irt.fa and finding tetrachoric correlations.

Part of the scoreItems help page:

Given a data.frame or matrix of n items and N observations and a list of the direction to score them (a keys.list with k keys), find the sum scores or average scores for each person and each scale. In addition, report Cronbach’s alpha, Guttman’s Lambda 6, the average r, the scale intercorrelations, and the item by scale correlations (raw and corrected for item overlap). Replace missing values with the item median or mean if desired. Items may be keyed 1 (score it), -1 (reverse score it), or 0 (do not score it). Negatively keyed items will be reverse scored. Although prior versions used a keys matrix, it is now recommended to just use a list of scoring keys. See make.keys for a convenient way to make the keys file. If the input is a square matrix, then it is assumed that the input is a covariance or correlation matrix and scores are not found, but the item statistics are reported. (Similar functionality to cluster.cor.) response.frequencies reports the frequency of item endorsements for each response category for polytomous or multiple choice items.

Get the data and describe it

Since we are using a built in data set, we just call it by name (ability), but if you want to do this for your own data, you need to first read the data in.

library(psych)  #you already did this, didn't you
my.data <- ability
describe(my.data)   #or, if you prefer
##           vars    n mean   sd median trimmed mad min max range  skew kurtosis   se
## reason.4     1 1442 0.68 0.47      1    0.72   0   0   1     1 -0.75    -1.44 0.01
## reason.16    2 1463 0.73 0.45      1    0.78   0   0   1     1 -1.02    -0.96 0.01
## reason.17    3 1440 0.74 0.44      1    0.80   0   0   1     1 -1.08    -0.84 0.01
## reason.19    4 1456 0.64 0.48      1    0.68   0   0   1     1 -0.60    -1.64 0.01
## letter.7     5 1441 0.63 0.48      1    0.67   0   0   1     1 -0.56    -1.69 0.01
## letter.33    6 1438 0.61 0.49      1    0.63   0   0   1     1 -0.43    -1.82 0.01
## letter.34    7 1455 0.64 0.48      1    0.68   0   0   1     1 -0.59    -1.65 0.01
## letter.58    8 1438 0.47 0.50      0    0.46   0   0   1     1  0.12    -1.99 0.01
## matrix.45    9 1458 0.55 0.50      1    0.56   0   0   1     1 -0.20    -1.96 0.01
## matrix.46   10 1470 0.57 0.50      1    0.59   0   0   1     1 -0.28    -1.92 0.01
## matrix.47   11 1465 0.64 0.48      1    0.67   0   0   1     1 -0.57    -1.67 0.01
## matrix.55   12 1459 0.39 0.49      0    0.36   0   0   1     1  0.45    -1.80 0.01
## rotate.3    13 1456 0.20 0.40      0    0.13   0   0   1     1  1.48     0.19 0.01
## rotate.4    14 1460 0.22 0.42      0    0.15   0   0   1     1  1.34    -0.21 0.01
## rotate.6    15 1456 0.31 0.46      0    0.27   0   0   1     1  0.80    -1.35 0.01
## rotate.8    16 1460 0.19 0.39      0    0.12   0   0   1     1  1.55     0.41 0.01
#describe(ability)
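As mentioned above, scoreItems can score one or more scales given a keys list. A minimal sketch for the ability items (the scale name `ability` and this single-scale keys list are just for illustration; all 16 items are keyed positively because they are scored correct/incorrect):

```r
library(psych)
# A keys list with one scale containing all 16 ability items
keys.list <- list(ability = c("reason.4", "reason.16", "reason.17", "reason.19",
                              "letter.7", "letter.33", "letter.34", "letter.58",
                              "matrix.45", "matrix.46", "matrix.47", "matrix.55",
                              "rotate.3", "rotate.4", "rotate.6", "rotate.8"))
ab.scored <- scoreItems(keys.list, ability)  # average scores by default
ab.scored                   # reports alpha, G6, average r, etc.
describe(ab.scored$scores)  # one score per person
```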

The alpha function will score one scale and report some useful statistics

This is discussed in the next assignment

Although seriously flawed, \(\alpha\) is perhaps the most commonly reported estimate of reliability. Although several other functions in psych are recommended, the alpha function is included for those new to R.

If all the items are keyed in the same direction, just call alpha with your data. In addition to various item statistics, the scores object will have the average (preferred) or sum scores (optional) for your data.
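A sketch of that choice between average and sum scores, assuming the cumulative argument in the current psych release controls it:

```r
ab.avg <- alpha(ability)                     # $scores holds average scores (default)
ab.sum <- alpha(ability, cumulative = TRUE)  # $scores holds sum scores instead
```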

Here we use alpha on the ability data set.

ab.al <- alpha(ability)
ab.al   #see the results
## 
## Reliability analysis   
## Call: alpha(x = ability)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.83      0.83    0.84      0.23 4.9 0.0064 0.51 0.25     0.21
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.82  0.83  0.84
## Duhachek  0.82  0.83  0.84
## 
##  Reliability if an item is dropped:
##           raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## reason.4       0.82      0.82    0.82      0.23 4.5   0.0069 0.0060  0.21
## reason.16      0.82      0.82    0.83      0.24 4.7   0.0067 0.0061  0.22
## reason.17      0.82      0.82    0.82      0.23 4.5   0.0069 0.0058  0.21
## reason.19      0.82      0.82    0.83      0.24 4.6   0.0067 0.0061  0.21
## letter.7       0.82      0.82    0.82      0.23 4.5   0.0068 0.0057  0.21
## letter.33      0.82      0.82    0.83      0.24 4.6   0.0068 0.0060  0.21
## letter.34      0.82      0.82    0.82      0.23 4.5   0.0069 0.0056  0.21
## letter.58      0.82      0.82    0.82      0.23 4.5   0.0069 0.0062  0.21
## matrix.45      0.82      0.83    0.83      0.24 4.7   0.0066 0.0060  0.23
## matrix.46      0.82      0.82    0.83      0.24 4.7   0.0067 0.0060  0.22
## matrix.47      0.82      0.82    0.83      0.24 4.6   0.0067 0.0063  0.21
## matrix.55      0.83      0.83    0.83      0.24 4.8   0.0065 0.0058  0.23
## rotate.3       0.82      0.82    0.82      0.23 4.6   0.0067 0.0045  0.23
## rotate.4       0.82      0.82    0.82      0.23 4.5   0.0068 0.0046  0.22
## rotate.6       0.82      0.82    0.82      0.23 4.5   0.0068 0.0051  0.22
## rotate.8       0.82      0.82    0.83      0.24 4.6   0.0067 0.0048  0.23
## 
##  Item statistics 
##              n raw.r std.r r.cor r.drop mean   sd
## reason.4  1442  0.58  0.58  0.54   0.50 0.68 0.47
## reason.16 1463  0.50  0.50  0.44   0.41 0.73 0.45
## reason.17 1440  0.57  0.57  0.54   0.49 0.74 0.44
## reason.19 1456  0.53  0.52  0.47   0.43 0.64 0.48
## letter.7  1441  0.57  0.56  0.52   0.48 0.63 0.48
## letter.33 1438  0.54  0.53  0.48   0.44 0.61 0.49
## letter.34 1455  0.58  0.57  0.54   0.49 0.64 0.48
## letter.58 1438  0.58  0.57  0.52   0.48 0.47 0.50
## matrix.45 1458  0.49  0.48  0.42   0.38 0.55 0.50
## matrix.46 1470  0.50  0.49  0.43   0.40 0.57 0.50
## matrix.47 1465  0.53  0.52  0.47   0.43 0.64 0.48
## matrix.55 1459  0.43  0.42  0.35   0.32 0.39 0.49
## rotate.3  1456  0.52  0.54  0.51   0.44 0.20 0.40
## rotate.4  1460  0.56  0.58  0.56   0.48 0.22 0.42
## rotate.6  1456  0.55  0.56  0.53   0.46 0.31 0.46
## rotate.8  1460  0.49  0.51  0.47   0.41 0.19 0.39
## 
## Non missing response frequency for each item
##              0    1 miss
## reason.4  0.32 0.68 0.05
## reason.16 0.27 0.73 0.04
## reason.17 0.26 0.74 0.06
## reason.19 0.36 0.64 0.05
## letter.7  0.37 0.63 0.06
## letter.33 0.39 0.61 0.06
## letter.34 0.36 0.64 0.05
## letter.58 0.53 0.47 0.06
## matrix.45 0.45 0.55 0.04
## matrix.46 0.43 0.57 0.04
## matrix.47 0.36 0.64 0.04
## matrix.55 0.61 0.39 0.04
## rotate.3  0.80 0.20 0.05
## rotate.4  0.78 0.22 0.04
## rotate.6  0.69 0.31 0.05
## rotate.8  0.81 0.19 0.04
names(ab.al)  #what are the various objects included in the output
##  [1] "total"         "alpha.drop"    "item.stats"    "response.freq" "keys"          "scores"       
##  [7] "nvar"          "boot.ci"       "boot"          "feldt"         "Unidim"        "var.r"        
## [13] "Fit"           "call"          "title"
describe(ab.al$scores)
##    vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 1509 0.51 0.25    0.5    0.51 0.28   0   1     1 0.02    -0.83 0.01

Keying items for direction

The ability items were all keyed in the same direction (correct vs. incorrect). Some personality items need to be reverse scored (“do you like to go to lively parties” is the opposite of “do you tend to keep quiet at parties”). By default, alpha will not reverse key items, but you can ask it to do so automatically. Or, you can specify which items to reverse key.

Consider the first 5 items of the bfi, which measure “Agreeableness”. If we do not reverse score them, we find a very poor alpha (and a warning that we might want to do so). If we turn on the check.keys option, alpha will automatically reverse items that correlate negatively with the total score.

alpha(bfi[1:5])  #this will produce a terrible result
## Warning in alpha(bfi[1:5]): Some items were negatively correlated with the first principal component and probably 
## should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## Some items ( A1 ) were negatively correlated with the first principal component and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## 
## Reliability analysis   
## Call: alpha(x = bfi[1:5])
## 
##   raw_alpha std.alpha G6(smc) average_r  S/N   ase mean   sd median_r
##       0.43      0.46    0.53      0.14 0.85 0.016  4.2 0.74     0.32
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt      0.4  0.43  0.46
## Duhachek   0.4  0.43  0.46
## 
##  Reliability if an item is dropped:
##    raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## A1      0.72      0.72    0.67     0.397 2.63   0.0087 0.0065 0.375
## A2      0.28      0.30    0.39     0.096 0.43   0.0219 0.1095 0.081
## A3      0.18      0.21    0.31     0.061 0.26   0.0249 0.1014 0.081
## A4      0.25      0.30    0.44     0.099 0.44   0.0229 0.1604 0.104
## A5      0.21      0.24    0.36     0.071 0.31   0.0238 0.1309 0.094
## 
##  Item statistics 
##       n raw.r std.r r.cor r.drop mean  sd
## A1 2784 0.066 0.024 -0.39  -0.31  2.4 1.4
## A2 2773 0.630 0.665  0.58   0.37  4.8 1.2
## A3 2774 0.724 0.742  0.71   0.48  4.6 1.3
## A4 2781 0.686 0.661  0.50   0.37  4.7 1.5
## A5 2784 0.700 0.719  0.64   0.45  4.6 1.3
## 
## Non missing response frequency for each item
##       1    2    3    4    5    6 miss
## A1 0.33 0.29 0.14 0.12 0.08 0.03 0.01
## A2 0.02 0.05 0.05 0.20 0.37 0.31 0.01
## A3 0.03 0.06 0.07 0.20 0.36 0.27 0.01
## A4 0.05 0.08 0.07 0.16 0.24 0.41 0.01
## A5 0.02 0.07 0.09 0.22 0.35 0.25 0.01
alpha(bfi[1:5], check.keys = TRUE)  # a much better result
## Warning in alpha(bfi[1:5], check.keys = TRUE): Some items were negatively correlated with the first principal component and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: alpha(x = bfi[1:5], check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r  S/N   ase mean  sd median_r
##        0.7      0.46    0.53      0.14 0.85 0.009  4.7 0.9     0.32
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.69   0.7  0.72
## Duhachek  0.69   0.7  0.72
## 
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## A1-      0.72      0.72    0.67     0.397 2.63   0.0087 0.0065 0.375
## A2       0.62      0.30    0.39     0.096 0.43   0.0119 0.1095 0.081
## A3       0.60      0.21    0.31     0.061 0.26   0.0124 0.1014 0.081
## A4       0.69      0.30    0.44     0.099 0.44   0.0098 0.1604 0.104
## A5       0.64      0.24    0.36     0.071 0.31   0.0111 0.1309 0.094
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean  sd
## A1- 2784  0.58 0.024 -0.39   0.31  4.6 1.4
## A2  2773  0.73 0.665  0.58   0.56  4.8 1.2
## A3  2774  0.76 0.742  0.71   0.59  4.6 1.3
## A4  2781  0.65 0.661  0.50   0.39  4.7 1.5
## A5  2784  0.69 0.719  0.64   0.49  4.6 1.3
## 
## Non missing response frequency for each item
##       1    2    3    4    5    6 miss
## A1 0.33 0.29 0.14 0.12 0.08 0.03 0.01
## A2 0.02 0.05 0.05 0.20 0.37 0.31 0.01
## A3 0.03 0.06 0.07 0.20 0.36 0.27 0.01
## A4 0.05 0.08 0.07 0.16 0.24 0.41 0.01
## A5 0.02 0.07 0.09 0.22 0.35 0.25 0.01
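Instead of relying on check.keys, you can tell alpha explicitly which items to reverse. A minimal sketch (assuming the psych package is loaded; the keys argument takes the names of the items to be reverse keyed):

```r
library(psych)   #for alpha() and the bfi data

#Reverse key A1 explicitly rather than letting check.keys decide
ab.keyed <- alpha(bfi[1:5], keys = "A1")
ab.keyed$total   #the summary statistics for the keyed scale
```

This should give the same result as check.keys = TRUE for these items, but makes the scoring direction an explicit, documented choice rather than an automatic one.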

What is this thing called alpha and what does it mean?

See next week.

Some comments on useful functions

There are many functions in R that allow you to do useful things. Here I will show several of them.

The first is the paste function, which allows you to create strings with certain characteristics. A variation of the paste function is paste0, which does not put spaces between the objects.

my.names <- paste("Variable",1:5)  #separation defaults to a blank space
my.names1 <- paste("Variable",1:5, sep="") #Specify the sep to be nothing
my.short.names <- paste0("Var", 1:8) #just do it directly
my.names
## [1] "Variable 1" "Variable 2" "Variable 3" "Variable 4" "Variable 5"
my.names1
## [1] "Variable1" "Variable2" "Variable3" "Variable4" "Variable5"
my.short.names
## [1] "Var1" "Var2" "Var3" "Var4" "Var5" "Var6" "Var7" "Var8"

Creating a dictionary

We looked at the bfi.dictionary, which is a helpful way to see what the variables are when you do the analyses. To create such a dictionary is relatively straightforward, but requires using an external editor (such as Excel or BBEdit). You want to specify at least 2 columns in your text file: the item number and the item content. If you want, you can specify more columns to have more identification information.

Consider a text file that looks like this (copied from Excel with tabs between columns)

name ItemLabel Item Giant3 Big6

A1 q_146 Am indifferent to the feelings of others. Cohesion Agreeableness

A2 q_1162 Inquire about others well-being. Cohesion Agreeableness

A3 q_1206 Know how to comfort others. Cohesion Agreeableness

A4 q_1364 Love children. Cohesion Agreeableness

A5 q_1419 Make people feel at ease. Cohesion Agreeableness

C1 q_124 Am exacting in my work. Stability Conscientiousness

C2 q_530 Continue until everything is perfect. Stability Conscientiousness

C3 q_619 Do things according to a plan. Stability Conscientiousness

C4 q_626 Do things in a half-way manner. Stability Conscientiousness

C5 q_1949 Waste my time. Stability Conscientiousness

Note that we cannot have any fancy formatting such as smart quotes or apostrophes (” or ’).

Copy these items to the clipboard, read them in with the read.clipboard.tab command, and then show the result.

To make this example work, I created a file on the class server.

#temp <- read.clipboard.tab() # not done because we can not do it dynamically
file.name <- "http://personality-project.org/courses/314/temp.dictionary.txt"
temp <- read.delim(file.name)   #this reads a tab-delimited file

Now, we need to make the rownames of this object equal to the names in the first column.

my.dictionary <- temp
rownames(my.dictionary) <- temp[,1] #the first column of temp
my.dictionary <- my.dictionary[-1] #get rid of the redundant column
my.dictionary   #show it
##    ItemLabel                                      Item    Giant3              Big6
## A1     q_146 Am indifferent to the feelings of others.  Cohesion     Agreeableness
## A2    q_1162          Inquire about others well-being.  Cohesion     Agreeableness
## A3    q_1206               Know how to comfort others.  Cohesion     Agreeableness
## A4    q_1364                            Love children.  Cohesion     Agreeableness
## A5    q_1419                 Make people feel at ease.  Cohesion     Agreeableness
## C1     q_124                   Am exacting in my work. Stability Conscientiousness
## C2     q_530     Continue until everything is perfect. Stability Conscientiousness
## C3     q_619            Do things according to a plan. Stability Conscientiousness
## C4     q_626           Do things in a half-way manner. Stability Conscientiousness
## C5    q_1949                            Waste my time. Stability Conscientiousness

We can then use my.dictionary in other functions.
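For instance, once the rownames are set, ordinary row indexing serves as a lookup. A minimal sketch using a two-item stand-in for the dictionary built inline (so it runs without reading from the class server):

```r
#A tiny stand-in for my.dictionary, built directly in R
my.dictionary <- data.frame(
  ItemLabel = c("q_146", "q_1162"),
  Item = c("Am indifferent to the feelings of others.",
           "Inquire about others well-being."),
  row.names = c("A1", "A2"))

my.dictionary["A1", "Item"]   #look up the text of item A1
```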

Different data types

Most of the examples we have been using come from the psych package and are either correlation matrices or data.frames. But some of the data sets in core-R are time series or three (or more) dimensional arrays. The standard describe function will not work on these types of data.

To explore the structure of an object, the str, dim, and names commands are very helpful. str will tell you the class and type of an object, dim will tell you how many dimensions it has (if it has at least 2), and names will tell you the names of the objects nested within the object of interest.
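A minimal sketch of these three commands, using base-R objects (no packages needed):

```r
x <- matrix(1:6, nrow = 2)   #a small 2 x 3 matrix
dim(x)                       #returns c(2, 3)
str(x)                       #shows the type and dimensions of x

my.list <- list(total = 10, scores = 1:3)
names(my.list)               #returns c("total", "scores")
```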

To find the list of all of the example data sets in a package

data(package="psych")

Homework (due on Monday)

  1. Read in some data (or use one of the standard data sets)

  2. describe it

  3. Try using the apply function to find column or row means

  4. Compare this to the result using colMeans

  5. Compare this to the result you found in 2) (describe)

  6. Try finding the mean scores for each subject on a simple data set (e.g., the attitude data set or one of your own).
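As a hint for items 3 and 4: apply with MARGIN = 2 and colMeans should agree exactly, and rowMeans gives each subject's mean. A sketch using the built-in attitude data set:

```r
data(attitude)             #a built-in data set: 30 departments, 7 ratings

apply(attitude, 2, mean)   #column means, one variable at a time
colMeans(attitude)         #the same result, computed more directly

head(rowMeans(attitude))   #mean score for each of the first 6 subjects
```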