The first part of the RMarkdown file sets up various options. Setting the width to 100 keeps the describe output from wrapping; the default of 80 is too narrow. Normally we would not show the setup chunk, but here we echo it so that you can see how to widen the output.
knitr::opts_chunk$set(echo = TRUE)
options(width=100) #This sets the width of the output, 80 seems to be the default and is too narrow
R packages are always under development. For instance, there was a
suggestion a year ago about how to make the describe
function more useful. This was then implemented in the next version of
psych
(2.4.3). The current CRAN release is 2.4.3, but the development version is 2.4.4.
Rather than push the updated version to CRAN each time, and in order
to reduce the demands on the CRAN system, it is better to post small
changes to local ‘repositories’ which have not gone through all of the
overhead of CRAN. The local repository for the psych package is at
http://personality-project.org/r. To install from this
repository the first time, we use the install.packages
command. Or, we can
check the repository to see if we need to update our package using the
update.packages
command. (It is not clear whether this actually
does anything.)
Sometimes, I update psych
without changing the version
number. You can check the development date using the
packageDate("psych")
command.
#This line is commented out because I have already done this
#install.packages("psych", repos = "http://personality-project.org/r", type = "source") # or
#update.packages("psych", repos = "http://personality-project.org/r", type = "source") # if we already have it although does not seem to work
Let’s make sure we succeeded. Check the version number of the package. It should now be version 2.4.4.
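Assuming the update succeeded, a quick way to confirm it is to ask for the version and build date directly:

```r
# Both functions are in the base utils package
packageVersion("psych")  # should report 2.4.4 after the update
packageDate("psych")     # the development date of the installed version
```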
As the psych
package grew, it became too big, and some of
the data and a few helper functions were moved to a new package,
`psychTools`. For many of the demonstrations in class, we will need both
packages.
Every time you start a new R session, you need to make those packages active.
library(psych) #remember to make this active
library(psychTools)
sessionInfo() #see what version we have (should be 2.4.4)
## R version 4.3.3 (2024-02-29)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] psychTools_2.4.3 psych_2.4.4
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.31 R6_2.5.1 fastmap_1.1.1 xfun_0.41 lattice_0.22-5
## [6] cachem_1.0.7 parallel_4.3.3 knitr_1.45 foreign_0.8-86 htmltools_0.5.5
## [11] rmarkdown_2.21 cli_3.6.1 R.methodsS3_1.8.2 grid_4.3.3 sass_0.4.5
## [16] jquerylib_0.1.4 mnormt_2.1.1 compiler_4.3.3 R.oo_1.25.0 rstudioapi_0.14
## [21] tools_4.3.3 rtf_0.4-14.1 nlme_3.1-164 evaluate_0.20 bslib_0.4.2
## [26] yaml_2.3.7 rlang_1.1.0 jsonlite_1.8.4
packageDate("psych") #should be "2024-04-07"
## [1] "2024-04-07"
To see what has been added to a package, you can read the
news
file for that package. You can read all of the news
for a package, or select just the most recent. Here we read the news for
the most recent release of psych
. Note that now we are
specifying Version > “2.4.3” so that we are just getting the most
recent (last month’s and this month’s) changes.
news(Version > "2.4.3", package = "psych")
## Changes in version 2.4.4
##
## Still to do
##
## o Change examples to dontrun rather than commented out ## mostly
## done
##
## o Add the ability to get factor scores for higher order factors
## (requested by Brandon Vaughn). Partly done in omega, but there
## are problems associated with the entire problem of factor
## scores.
##
## o Fix a bug in alpha when finding average R (need to use cor
## rather than cov2cor when treating missing data)
##
## o Add weight option to statsBy (but the question is how to handle
## missing data)
##
## o Adjust the se for correlations (in bestScales) to take into
## account the pairwise count, rather than the n.obs
##
## o Improve documentation for thurstone. Would help if we had an
## example of scaling.
##
## o scoreIrt.2pl needs to be able to adjust for group differences
## correctly
##
## o Possible problem in pooled correlations for statsBy
##
## o Think about improvements in itemLookup to show means if desired
##
## o Fix bug in polychoric where it doesn't handle empty cells very
## well. (reported by Björn Büdenbender). Or at least improve the
## documentation about why this is a problem.
##
## o Fix bug in mediate for multiple dependent variables (reported
## by Martin Zeschke)
##
## o scoreItem should find the mean item correlation with na.rm=TRUE
## (Liz D)
##
## o create a scoreFastBy
##
## o improve the help menu for sim.multi and return the latent
## values
##
## o think about linking person scores to item difficulty scores in
## IRT
##
## o Left justify content in lookupItems (not sure how to do this)
##
## o Consider using the Jennrich approach to standard errors of
## rotated loadings
##
## o Still need to fix mediate to handle more than 1 DVs
##
## o Possible problem with the use of try in several functions.
##
## o Need to fix omega so that alpha is correctly calculated in case
## of no general factor
##
## o It would be nice to add an option to display confidence
## intervals of correlations in error.dots
##
## o allow omega style hierarchical structure for x (and y) in esem
##
## Additions
##
## o Added the score option to scoreOverlap so that if given raw
## data, it will find the scale scores as well as the corrected
## for overlap statistics. Note that the scale scores are not
## corrected for overlap.
##
## o Minor tweaks to scoreIrt.2pl to check for bad data (missing
## cells ). This was leading to the function hanging forever in
## the case of pairwise counts of 0.
##
## o Added fix.dlpyr (as a local function) to statsBy and describeBy
## to not choke on Tibbles. Had added this to alpha several years
## ago.
##
## o Added itemSort to combine item means and item content and then
## sort them.
##
## o Added the median option to describe with the fast=TRUE option
## (requested by David Condon)
##
## o Parallelized describe. Now will describe 644K subjects with
## 6633 variables in 64 seconds (with fast=TRUE option).
## Improvement in speed varies by numbers of cores.
##
## o Parallelized bigCor for very significant savings.
##
## o Added the hist.border option to pairs.panels following a
## request by Jordan Adamson.
##
## o Added the w.exp (weighting exponent to cohen.kappa following a
## request from Julius Pfadt)
##
## o Added n.obs to unidim to properly find statistics from
## correlation matrices
##
## o Added a number of fit statistics to unidim
##
## o Added the ability to find Spearman correlations to bigCor.
## Added a check that the data are a data.frame rather than a
## data.table for bigCor.
##
## o Changed the class of describe output to be c("psych",
## "Pdescribe" ,"data.frame") to avoid choking if Hmisc is in use
##
## o Added the ability to use mean.weights in statsBy (still under
## testing)
##
## o Added paSelect to allow parallel analyses of subsets of items
## using a keys list.
##
## o Added vssSelect to allow VSS analyses of subsets of items using
## a kes list.
##
## Bugs fixed
##
## o changed the number of elements in the unit.Circle in
## spider/radar to get more precision
##
## o Added a conversion to numeric in mediate to get around a
## problem with variables that are factors. (Problem reported by
## Lynn Bueeler )
##
## o Finally fixed bug in alpha where it finds R from cov2cor
## instead of directly (reported by Karl Ove Hufthammer )
##
## o Fixed item.lookup so that it will report item means as well as
## factor loadings.
##
## o Strange bug introduced into score.multiple.choice by qgraph
## reported by Michael Truong. Fixed by changing class of
## describe object used in score.multiple.choice to just be a
## data.frame.
The apply
and by
commands and
several functions that use them

There are several functions in base R that are powerful
replacements for the “for loop” that is common in other languages
(e.g., Fortran, C++, etc.). If you are working with data.frames or
matrices, the apply
command will repeat an operation for as
many rows or columns as are in your data.frame.
The help page is not very informative about the power of by
and merely says that “Function by is an object-oriented
wrapper for tapply applied to data frames.” Nor is it very helpful for
apply
, where we are told: “(apply) returns a vector or array
or list of values obtained by applying a function to margins of an array
or matrix.”
Consider the attitude
example data set in R. First we show
the entire data set, then use apply
to find the means and then
the standard deviations of each column.
Finally, we also use the function colMeans
, which does
the same thing, but just for the means.
dim(attitude) #see how big it is
## [1] 30 7
attitude #show the data - You rarely want to do this
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
## 11 64 53 53 58 58 67 34
## 12 67 60 47 39 59 74 41
## 13 69 62 57 42 55 63 25
## 14 68 83 83 45 59 77 35
## 15 77 77 54 72 79 77 46
## 16 81 90 50 72 60 54 36
## 17 74 85 64 69 79 79 63
## 18 65 60 65 75 55 80 60
## 19 65 70 46 57 75 85 46
## 20 50 58 68 54 64 78 52
## 21 50 40 33 34 43 64 33
## 22 64 61 52 62 66 80 41
## 23 53 66 52 50 63 80 37
## 24 40 37 42 58 50 57 49
## 25 63 54 42 48 66 75 33
## 26 66 77 66 63 88 76 72
## 27 78 75 58 74 80 78 49
## 28 48 57 44 45 51 83 38
## 29 85 85 71 71 77 74 55
## 30 82 82 39 59 64 78 39
apply(attitude,2,mean) #apply the mean function to every column of attitude
## rating complaints privileges learning raises critical advance
## 64.63333 66.60000 53.13333 56.36667 64.63333 74.76667 42.93333
apply(attitude,2,sd) #now find the standard deviations for each column
## rating complaints privileges learning raises critical advance
## 12.172562 13.314757 12.235430 11.737013 10.397226 9.894908 10.288706
colMeans(attitude) #see also rowMeans
## rating complaints privileges learning raises critical advance
## 64.63333 66.60000 53.13333 56.36667 64.63333 74.76667 42.93333
Here we look at a crude way of doing what the apply function does.
dumbApplyMean <- function(data) {
  result <- rep(NA, NCOL(data))
  for (i in 1:NCOL(data)) {
    result[i] <- mean(data[, i])
  }
  return(result)
}
#now run it
dumbApplyMean(attitude)
## [1] 64.63333 66.60000 53.13333 56.36667 64.63333 74.76667 42.93333
#compare with just apply
apply(attitude,2,mean)
## rating complaints privileges learning raises critical advance
## 64.63333 66.60000 53.13333 56.36667 64.63333 74.76667 42.93333
A related function is by
, which takes a data.frame,
chooses the cases that match each value of some variable, and then
applies the requested function. Here we use the sat.act
data set
from psych
, break the data down by gender, and then
describe each group.
First we look at a few cases to see what the data look like. Note that we don’t want to look at all of the data because there are so many cases.
library(psych) #make it active (although it already is)
library(psychTools) #thus, these lines are not really necessary
head(sat.act) # A core R command
## gender education age ACT SATV SATQ
## 29442 2 3 19 24 500 500
## 29457 2 3 23 35 600 500
## 29498 2 3 20 21 480 470
## 29503 1 4 27 26 550 520
## 29504 1 2 33 31 600 550
## 29518 1 5 26 28 640 640
tail(sat.act) #another core R command
## gender education age ACT SATV SATQ
## 39904 2 3 20 26 710 680
## 39915 1 3 25 30 500 500
## 39937 1 4 40 27 613 630
## 39951 2 3 24 31 700 630
## 39961 1 4 35 32 700 780
## 39985 1 5 25 25 600 600
#headTail combines these two functions and adds some parameters
headTail(sat.act) #see what it looks like (combines head and tail)
## gender education age ACT SATV SATQ
## 29442 2 3 19 24 500 500
## 29457 2 3 23 35 600 500
## 29498 2 3 20 21 480 470
## 29503 1 4 27 26 550 520
## ... ... ... ... ... ... ...
## 39937 1 4 40 27 613 630
## 39951 2 3 24 31 700 630
## 39961 1 4 35 32 700 780
## 39985 1 5 25 25 600 600
headTail(sat.act,top=6,bottom=3,from=3,to=5) #using some of the parameters
## age ACT SATV
## 29442 19 24 500
## 29457 23 35 600
## 29498 20 21 480
## 29503 27 26 550
## 29504 33 31 600
## 29518 26 28 640
## ... ... ... ...
## 39951 24 31 700
## 39961 35 32 700
## 39985 25 25 600
The by
command

by
allows you to apply a command for each level of a grouping
variable
describe(sat.act) #does it for all subjects
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 700 1.65 0.48 2 1.68 0.00 1 2 1 -0.61 -1.62 0.02
## education 2 700 3.16 1.43 3 3.31 1.48 0 5 5 -0.68 -0.07 0.05
## age 3 700 25.59 9.50 22 23.86 5.93 13 65 52 1.64 2.42 0.36
## ACT 4 700 28.55 4.82 29 28.84 4.45 3 36 33 -0.66 0.53 0.18
## SATV 5 700 612.23 112.90 620 619.45 118.61 200 800 600 -0.64 0.33 4.27
## SATQ 6 687 610.22 115.64 620 617.25 118.61 200 800 600 -0.59 -0.02 4.41
by(sat.act,sat.act[1],describe) #does it for each value of sat.act[1]
## gender: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
## education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
## age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
## ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
## SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
## SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
## ---------------------------------------------------------------------------
## gender: 2
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
## education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
## age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
## ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
## SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
## SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
The call to by
used a variable location, so which
variable was it? It is better to refer to the variable by name.
by(sat.act,sat.act$gender,describe) #now we know which variable we were using
## sat.act$gender: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
## education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
## age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
## ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
## SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
## SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
## ---------------------------------------------------------------------------
## sat.act$gender: 2
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
## education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
## age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
## ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
## SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
## SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
by(sat.act,sat.act["gender"] ,describe) #does the same thing
## gender: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
## education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
## age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
## ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
## SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
## SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
## ---------------------------------------------------------------------------
## gender: 2
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
## education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
## age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
## ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
## SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
## SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
Some functions allow you to specify the grouping (the ‘by’ variable) within the function itself.
describeBy(sat.act,"gender")
##
## Descriptive statistics by group
## gender: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
## education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
## age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
## ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
## SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
## SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
## ---------------------------------------------------------------------------
## gender: 2
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
## education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
## age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
## ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
## SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
## SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
Better yet, some functions (a slowly growing number in
psych
) allow for formula
input, sometimes for
just some of the elements of a data frame (in which case you need to
specify what the data are).
This is of the form:
describe(ACT + SATV + SATQ ~ gender, data=sat.act)
##
## Descriptive statistics by group
## gender: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## ACT 1 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
## SATV 2 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
## SATQ 3 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
## ---------------------------------------------------------------------------
## gender: 2
## vars n mean sd median trimmed mad min max range skew kurtosis se
## ACT 1 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
## SATV 2 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
## SATQ 3 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
describe(sat.act ~ gender)
##
## Descriptive statistics by group
## group: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
## education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
## age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
## ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
## SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
## SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
## ---------------------------------------------------------------------------
## group: 2
## vars n mean sd median trimmed mad min max range skew kurtosis se
## gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
## education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
## age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
## ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
## SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
## SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
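Core R’s aggregate function accepts a similar formula style for simple group summaries. A minimal sketch (returning just the group means rather than the full describe output):

```r
library(psych)  # for the sat.act data set
# mean ACT for each gender
aggregate(ACT ~ gender, data = sat.act, FUN = mean)
# several variables at once; note that aggregate drops any row with an NA
# in the cbind() variables, so its n differs slightly from describe's
aggregate(cbind(ACT, SATV, SATQ) ~ gender, data = sat.act, FUN = mean)
```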
A typical study involves aggregating individual items into “scales”.
To do this, we will use a built-in data set (the ability
data)
and find the average number of items each person got correct. We will
use the scoreItems
function to do this.
It is useful to read the help page for ability
to get a
feel for what the data are, and then the help page for
scoreItems
to see what we are going to do.
ability
help page: 16 multiple choice ability items for 1525 subjects taken from the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project are saved as iqitems. Those data are shown as examples of how to score multiple choice tests and analyses of response alternatives. When scored correct or incorrect, the data are useful for demonstrations of tetrachoric based factor analysis irt.fa and finding tetrachoric correlations.
scoreItems
help page: Given a data.frame or matrix of n items and N observations and a list of the direction to score them (a keys.list with k keys) find the sum scores or average scores for each person and each scale. In addition, report Cronbach’s alpha, Guttman’s Lambda 6, the average r, the scale intercorrelations, and the item by scale correlations (raw and corrected for item overlap). Replace missing values with the item median or mean if desired. Items may be keyed 1 (score it), -1 (reverse score it), or 0 (do not score it). Negatively keyed items will be reverse scored. Although prior versions used a keys matrix, it is now recommended to just use a list of scoring keys. See make.keys for a convenient way to make the keys file. If the input is a square matrix, then it is assumed that the input is a covariance or correlation matrix and scores are not found, but the item statistics are reported. (Similar functionality to cluster.cor). response.frequencies reports the frequency of item endorsements for each response category for polytomous or multiple choice items.
Since we are using a built-in data set, we just call it by name
(ability
), but if you want to do this for your own data,
you need to read the data in first.
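The read-in step for your own data might look like the sketch below; the file name here is purely hypothetical:

```r
library(psychTools)  # read.file lives in psychTools
# read.file() infers the file type from the suffix (.csv, .txt, .sav, ...)
# my.data <- read.file("my.study.data.csv")  # hypothetical file name
# or, with core R, for a comma separated file with a header row:
# my.data <- read.csv("my.study.data.csv")
```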
library(psych) #you already did this, didn't you
my.data <- ability
describe(my.data) #or, if you prefer
## vars n mean sd median trimmed mad min max range skew kurtosis se
## reason.4 1 1442 0.68 0.47 1 0.72 0 0 1 1 -0.75 -1.44 0.01
## reason.16 2 1463 0.73 0.45 1 0.78 0 0 1 1 -1.02 -0.96 0.01
## reason.17 3 1440 0.74 0.44 1 0.80 0 0 1 1 -1.08 -0.84 0.01
## reason.19 4 1456 0.64 0.48 1 0.68 0 0 1 1 -0.60 -1.64 0.01
## letter.7 5 1441 0.63 0.48 1 0.67 0 0 1 1 -0.56 -1.69 0.01
## letter.33 6 1438 0.61 0.49 1 0.63 0 0 1 1 -0.43 -1.82 0.01
## letter.34 7 1455 0.64 0.48 1 0.68 0 0 1 1 -0.59 -1.65 0.01
## letter.58 8 1438 0.47 0.50 0 0.46 0 0 1 1 0.12 -1.99 0.01
## matrix.45 9 1458 0.55 0.50 1 0.56 0 0 1 1 -0.20 -1.96 0.01
## matrix.46 10 1470 0.57 0.50 1 0.59 0 0 1 1 -0.28 -1.92 0.01
## matrix.47 11 1465 0.64 0.48 1 0.67 0 0 1 1 -0.57 -1.67 0.01
## matrix.55 12 1459 0.39 0.49 0 0.36 0 0 1 1 0.45 -1.80 0.01
## rotate.3 13 1456 0.20 0.40 0 0.13 0 0 1 1 1.48 0.19 0.01
## rotate.4 14 1460 0.22 0.42 0 0.15 0 0 1 1 1.34 -0.21 0.01
## rotate.6 15 1456 0.31 0.46 0 0.27 0 0 1 1 0.80 -1.35 0.01
## rotate.8 16 1460 0.19 0.39 0 0.12 0 0 1 1 1.55 0.41 0.01
#describe(ability)
This is discussed in the next assignment.
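As a quick preview (the details are taken up there), a minimal sketch of scoreItems on these data might look like this, with all 16 items forming a single, positively keyed scale:

```r
library(psych)
library(psychTools)
# one scale containing every ability item, all keyed positively
keys.list <- list(ability = colnames(ability))
ab.scores <- scoreItems(keys.list, ability)
head(ab.scores$scores)  # average proportion correct for the first few people
```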
Although seriously flawed, \(\alpha\) is perhaps the most commonly
reported estimate of reliability. While several other functions in
psych
are recommended instead, the alpha
function is
included for those new to R
.
If all the items are keyed in the same direction, just call
alpha
with your data. In addition to various item
statistics, the scores
object will have the average
(preferred) or sum scores (optional) for your data.
Here we use alpha
on the ability data set.
ab.al <- alpha(ability)
ab.al #see the results
##
## Reliability analysis
## Call: alpha(x = ability)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.83 0.84 0.23 4.9 0.0064 0.51 0.25 0.21
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.82 0.83 0.84
## Duhachek 0.82 0.83 0.84
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## reason.4 0.82 0.82 0.82 0.23 4.5 0.0069 0.0060 0.21
## reason.16 0.82 0.82 0.83 0.24 4.7 0.0067 0.0061 0.22
## reason.17 0.82 0.82 0.82 0.23 4.5 0.0069 0.0058 0.21
## reason.19 0.82 0.82 0.83 0.24 4.6 0.0067 0.0061 0.21
## letter.7 0.82 0.82 0.82 0.23 4.5 0.0068 0.0057 0.21
## letter.33 0.82 0.82 0.83 0.24 4.6 0.0068 0.0060 0.21
## letter.34 0.82 0.82 0.82 0.23 4.5 0.0069 0.0056 0.21
## letter.58 0.82 0.82 0.82 0.23 4.5 0.0069 0.0062 0.21
## matrix.45 0.82 0.83 0.83 0.24 4.7 0.0066 0.0060 0.23
## matrix.46 0.82 0.82 0.83 0.24 4.7 0.0067 0.0060 0.22
## matrix.47 0.82 0.82 0.83 0.24 4.6 0.0067 0.0063 0.21
## matrix.55 0.83 0.83 0.83 0.24 4.8 0.0065 0.0058 0.23
## rotate.3 0.82 0.82 0.82 0.23 4.6 0.0067 0.0045 0.23
## rotate.4 0.82 0.82 0.82 0.23 4.5 0.0068 0.0046 0.22
## rotate.6 0.82 0.82 0.82 0.23 4.5 0.0068 0.0051 0.22
## rotate.8 0.82 0.82 0.83 0.24 4.6 0.0067 0.0048 0.23
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## reason.4 1442 0.58 0.58 0.54 0.50 0.68 0.47
## reason.16 1463 0.50 0.50 0.44 0.41 0.73 0.45
## reason.17 1440 0.57 0.57 0.54 0.49 0.74 0.44
## reason.19 1456 0.53 0.52 0.47 0.43 0.64 0.48
## letter.7 1441 0.57 0.56 0.52 0.48 0.63 0.48
## letter.33 1438 0.54 0.53 0.48 0.44 0.61 0.49
## letter.34 1455 0.58 0.57 0.54 0.49 0.64 0.48
## letter.58 1438 0.58 0.57 0.52 0.48 0.47 0.50
## matrix.45 1458 0.49 0.48 0.42 0.38 0.55 0.50
## matrix.46 1470 0.50 0.49 0.43 0.40 0.57 0.50
## matrix.47 1465 0.53 0.52 0.47 0.43 0.64 0.48
## matrix.55 1459 0.43 0.42 0.35 0.32 0.39 0.49
## rotate.3 1456 0.52 0.54 0.51 0.44 0.20 0.40
## rotate.4 1460 0.56 0.58 0.56 0.48 0.22 0.42
## rotate.6 1456 0.55 0.56 0.53 0.46 0.31 0.46
## rotate.8 1460 0.49 0.51 0.47 0.41 0.19 0.39
##
## Non missing response frequency for each item
## 0 1 miss
## reason.4 0.32 0.68 0.05
## reason.16 0.27 0.73 0.04
## reason.17 0.26 0.74 0.06
## reason.19 0.36 0.64 0.05
## letter.7 0.37 0.63 0.06
## letter.33 0.39 0.61 0.06
## letter.34 0.36 0.64 0.05
## letter.58 0.53 0.47 0.06
## matrix.45 0.45 0.55 0.04
## matrix.46 0.43 0.57 0.04
## matrix.47 0.36 0.64 0.04
## matrix.55 0.61 0.39 0.04
## rotate.3 0.80 0.20 0.05
## rotate.4 0.78 0.22 0.04
## rotate.6 0.69 0.31 0.05
## rotate.8 0.81 0.19 0.04
names(ab.al) #what are the various objects included in the output
## [1] "total" "alpha.drop" "item.stats" "response.freq" "keys" "scores"
## [7] "nvar" "boot.ci" "boot" "feldt" "Unidim" "var.r"
## [13] "Fit" "call" "title"
describe(ab.al$scores)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1509 0.51 0.25 0.5 0.51 0.28 0 1 1 0.02 -0.83 0.01
The ability items were all keyed in the same direction (correct
vs. incorrect). Some personality items need to be reverse scored (“do
you like to go to lively parties” is the opposite of “do you tend to
keep quiet at parties”). By default, alpha
will not reverse
key items, but you can ask it to do so automatically. Or, you can specify
which items to reverse key.
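To specify the reversals yourself, alpha takes a keys argument naming the items to reverse. A sketch for the Agreeableness items, where A1 is the negatively worded item:

```r
library(psych)
library(psychTools)  # the bfi data are in psychTools
A.alpha <- alpha(bfi[1:5], keys = c("A1"))  # reverse score A1 explicitly
A.alpha$total  # summary reliability statistics for the reversed-key scoring
```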
Consider the first 5 items of the bfi
, which are five
items measuring “Agreeableness”. If we do not reverse score them, we
find a very poor alpha (and a warning that we might want to do so). If
we turn on the check.keys
option, alpha
will
automatically reverse items that correlate negatively with the total
score.
alpha(bfi[1:5]) #this will produce a terrible result
## Warning in alpha(bfi[1:5]): Some items were negatively correlated with the first principal component and probably
## should be reversed.
## To do this, run the function again with the 'check.keys=TRUE' option
## Some items ( A1 ) were negatively correlated with the first principal component and
## probably should be reversed.
## To do this, run the function again with the 'check.keys=TRUE' option
##
## Reliability analysis
## Call: alpha(x = bfi[1:5])
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.43 0.46 0.53 0.14 0.85 0.016 4.2 0.74 0.32
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.4 0.43 0.46
## Duhachek 0.4 0.43 0.46
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## A1 0.72 0.72 0.67 0.397 2.63 0.0087 0.0065 0.375
## A2 0.28 0.30 0.39 0.096 0.43 0.0219 0.1095 0.081
## A3 0.18 0.21 0.31 0.061 0.26 0.0249 0.1014 0.081
## A4 0.25 0.30 0.44 0.099 0.44 0.0229 0.1604 0.104
## A5 0.21 0.24 0.36 0.071 0.31 0.0238 0.1309 0.094
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## A1 2784 0.066 0.024 -0.39 -0.31 2.4 1.4
## A2 2773 0.630 0.665 0.58 0.37 4.8 1.2
## A3 2774 0.724 0.742 0.71 0.48 4.6 1.3
## A4 2781 0.686 0.661 0.50 0.37 4.7 1.5
## A5 2784 0.700 0.719 0.64 0.45 4.6 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## A1 0.33 0.29 0.14 0.12 0.08 0.03 0.01
## A2 0.02 0.05 0.05 0.20 0.37 0.31 0.01
## A3 0.03 0.06 0.07 0.20 0.36 0.27 0.01
## A4 0.05 0.08 0.07 0.16 0.24 0.41 0.01
## A5 0.02 0.07 0.09 0.22 0.35 0.25 0.01
alpha(bfi[1:5], check.keys = TRUE) # a much better result
## Warning in alpha(bfi[1:5], check.keys = TRUE): Some items were negatively correlated with the first principal component and were automatically reversed.
## This is indicated by a negative sign for the variable name.
##
## Reliability analysis
## Call: alpha(x = bfi[1:5], check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.7 0.46 0.53 0.14 0.85 0.009 4.7 0.9 0.32
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.69 0.7 0.72
## Duhachek 0.69 0.7 0.72
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## A1- 0.72 0.72 0.67 0.397 2.63 0.0087 0.0065 0.375
## A2 0.62 0.30 0.39 0.096 0.43 0.0119 0.1095 0.081
## A3 0.60 0.21 0.31 0.061 0.26 0.0124 0.1014 0.081
## A4 0.69 0.30 0.44 0.099 0.44 0.0098 0.1604 0.104
## A5 0.64 0.24 0.36 0.071 0.31 0.0111 0.1309 0.094
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## A1- 2784 0.58 0.024 -0.39 0.31 4.6 1.4
## A2 2773 0.73 0.665 0.58 0.56 4.8 1.2
## A3 2774 0.76 0.742 0.71 0.59 4.6 1.3
## A4 2781 0.65 0.661 0.50 0.39 4.7 1.5
## A5 2784 0.69 0.719 0.64 0.49 4.6 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## A1 0.33 0.29 0.14 0.12 0.08 0.03 0.01
## A2 0.02 0.05 0.05 0.20 0.37 0.31 0.01
## A3 0.03 0.06 0.07 0.20 0.36 0.27 0.01
## A4 0.05 0.08 0.07 0.16 0.24 0.41 0.01
## A5 0.02 0.07 0.09 0.22 0.35 0.25 0.01
See next week.
There are many functions in R that allow you to do useful things. Here I will show several of them.
The first is the paste function, which allows you to create strings with certain characteristics. A variation of the paste function is paste0, which does not put spaces between the objects it combines.
my.names <- paste("Variable",1:5) #separation defaults to a blank space
my.names1 <- paste("Variable",1:5, sep="") #Specify the sep to be nothing
my.short.names <- paste0("Var", 1:8) #just do it directly
my.names
## [1] "Variable 1" "Variable 2" "Variable 3" "Variable 4" "Variable 5"
my.names1
## [1] "Variable1" "Variable2" "Variable3" "Variable4" "Variable5"
my.short.names
## [1] "Var1" "Var2" "Var3" "Var4" "Var5" "Var6" "Var7" "Var8"
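A related argument is collapse, which glues the elements of a vector into one single string rather than returning a vector:

```r
my.short.names <- paste0("Var", 1:8)
paste(my.short.names, collapse = ", ")
## [1] "Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8"
```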
We looked at the bfi.dictionary, which is a helpful way to see what the variables are when you do the analyses. Creating such a dictionary is relatively straightforward, but requires using an external editor (such as Excel or BBEdit). You want to specify at least 2 columns in your text file: the item number and the item content. If you want, you can specify more columns to include more identification information.
Consider a text file that looks like this (copied from Excel with tabs between columns)
name ItemLabel Item Giant3 Big6
A1 q_146 Am indifferent to the feelings of others. Cohesion Agreeableness
A2 q_1162 Inquire about others well-being. Cohesion Agreeableness
A3 q_1206 Know how to comfort others. Cohesion Agreeableness
A4 q_1364 Love children. Cohesion Agreeableness
A5 q_1419 Make people feel at ease. Cohesion Agreeableness
C1 q_124 Am exacting in my work. Stability Conscientiousness
C2 q_530 Continue until everything is perfect. Stability Conscientiousness
C3 q_619 Do things according to a plan. Stability Conscientiousness
C4 q_626 Do things in a half-way manner. Stability Conscientiousness
C5 q_1949 Waste my time. Stability Conscientiousness
Note that we cannot have any fancy formatting such as curly quotes or apostrophes (” or ’)
Copy these items to the clipboard, read them with the read.clipboard.tab command, and then show the result.
To make this example work in a static document, I created a file on the class server.
#temp <- read.clipboard.tab() # not run, because we cannot do it dynamically
file.name <- "http://personality-project.org/courses/314/temp.dictionary.txt"
temp <- read.delim(file.name) #this reads a tab delimited file
Now we need to make the rownames of this object equal to the name column.
my.dictionary <- temp
rownames(my.dictionary) <- temp[,1] #the first column of temp
my.dictionary <- my.dictionary[-1] #get rid of the redundant column
my.dictionary #show it
## ItemLabel Item Giant3 Big6
## A1 q_146 Am indifferent to the feelings of others. Cohesion Agreeableness
## A2 q_1162 Inquire about others well-being. Cohesion Agreeableness
## A3 q_1206 Know how to comfort others. Cohesion Agreeableness
## A4 q_1364 Love children. Cohesion Agreeableness
## A5 q_1419 Make people feel at ease. Cohesion Agreeableness
## C1 q_124 Am exacting in my work. Stability Conscientiousness
## C2 q_530 Continue until everything is perfect. Stability Conscientiousness
## C3 q_619 Do things according to a plan. Stability Conscientiousness
## C4 q_626 Do things in a half-way manner. Stability Conscientiousness
## C5 q_1949 Waste my time. Stability Conscientiousness
We can then use my.dictionary in other functions.
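Because the dictionary is just a data.frame with item names as rownames, plain indexing is often all you need. A couple of illustrative look-ups, assuming my.dictionary was built as above:

```r
# pull the content of two specific items by name
my.dictionary[c("A1", "C4"), "Item"]
# all of the Agreeableness items
my.dictionary[my.dictionary$Big6 == "Agreeableness", ]
```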
Most of the examples we have been using come from the psych package and are either correlation matrices or data.frames. But some of the data sets in core R are time series or three- (or more) dimensional arrays. The standard describe function will not work on these types of data.
To explore the structure of an object, the str, dim, and names commands are very helpful. str will tell you the class and type of an object, dim will tell you how many dimensions it has (if it has at least 2), and names will tell you the objects nested within the object of interest.
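A few illustrations using data sets that ship with core R:

```r
# a data.frame: str, dim, and names all apply
str(attitude)
dim(attitude)    # 30 rows, 7 columns
names(attitude)  # the 7 variable names
# a multi-way table (a 4-dimensional array): not something describe expects
dim(Titanic)     # 4 2 2 2
# a time series: also not a data.frame
class(presidents)  # "ts"
```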
To find the list of all of the example data sets in a package:
data(package="psych")
Read in some data (or use one of the standard data sets)
Describe it
Try using the apply
function to find column or row
means
Compare this to the result using colMeans
Compare this to the result you found in 2) (describe)
Try finding the mean scores for each subject on a simple data set (e.g., the attitude data set or one of your own.)