# Estimating internal consistency reliability

(Adapted from Revelle, W., & Condon, D. M. (2019). Reliability from α to ω: A tutorial. Psychological Assessment, 31(12), 1395–1411. https://doi.org/10.1037/pas0000754.)
For an open access preprint, see https://www.personality-project.org/revelle/publications/rc.pa.19.pdf
Reliability is a fundamental problem for measurement in all of science, for "(a)ll measurement is befuddled by error". Perhaps because psychological measures are more befuddled than those of the other natural sciences, psychologists have long studied the problem of reliability, and this fascination continues to this day. Unfortunately, although recent advances in reliability theory have gone far beyond the earlier contributions, much of this literature is more technical than readable and is aimed at the specialist rather than the practitioner. This is unfortunate, for all scientists need to appreciate the problems and importance of reliability.
Issues of reliability are fundamental to understanding how correlations between observed variables are (attenuated) underestimates of the relationships between the underlying constructs, how observed estimates of a person's score are overestimates of their latent score, and how to estimate the confidence intervals around any particular measurement. Reliability theory is not just for the psychometrician estimating latent variables; it is for the baseball manager trying to predict how well a high performing player will perform next year, for the clinician asking how well one rater agrees with another in the diagnosis of a patient, or for understanding why stock market advisors underperform the market. Understanding the many ways to estimate reliability, as well as the ways to use these estimates, allows one to better assess individuals and to evaluate selection and prediction techniques.
The fundamental question in reliability is: to what extent do scores measured at one time and one place with one instrument predict scores at another time and another place, perhaps with a different instrument? That is, given a person's score on test 1 at time 1, what do we expect that person to do on another test at time 2? The naive belief that they will do just as well on the second measure as they did on the first leads to such mistakes as the belief that punishment improves, and reward diminishes, subsequent performance, or the belief in the "Sports Illustrated jinx". More formally, the expected score on a second measure is just the regression of observations at time 2 on the observations at time 1. If both the time 1 and time 2 measures are equally "befuddled by error", extreme scores at time 1 will regress toward the mean at time 2, with the amount of regression determined by the reliability of the measures.
The basic concept of reliability seems to be very simple: observed scores reflect an unknown mixture of signal and noise. To detect the signal, we need to reduce the noise. Reliability thus defined is just the ratio of signal to noise. The signal might be something as esoteric as a gravity wave produced by the collision of two black holes, or as prosaic as the batting average of a baseball player. The noise in gravity wave detectors includes the seismic effects of cows wandering in fields near the detector as well as passing ice cream trucks. The noise in batting averages includes the effect of opposing pitchers, variations in wind direction, and the effects of jet lag and sleep deprivation. We can enhance the signal/noise ratio by either increasing the signal or reducing the noise. Unfortunately, this classic statement of reliability ignores the need for unidimensionality of our measures and equates expected scores with construct scores, a relationship that needs to be tested rather than assumed.
All tests are assumed to measure something stable or trait-like (T), something that varies over time and reflects state (S), some specific variance (s) that is stable but does not measure our trait of interest, and some residual, random error (E). Reliability is then said to be the ratio of trait variance (\(V_T\)) to total test variance (\(V_x\)). That is,
\[r_{xx} = \frac{V_T}{V_x} = \frac{V_T}{V_T + V_S + V_s + V_E} \]
The problem becomes how to find \(V_T\).
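A small simulation can make this variance ratio concrete. The following is a hypothetical sketch: the variance components and their sizes are made up for illustration and are not taken from the tutorial.

```r
set.seed(42)                     # hypothetical illustration; all variances are made up
n     <- 100000
trait <- rnorm(n, sd = 1)        # V_T = 1
state <- rnorm(n, sd = 0.5)      # V_S = 0.25
spec  <- rnorm(n, sd = 0.5)      # V_s = 0.25
error <- rnorm(n, sd = 1)        # V_E = 1
x     <- trait + state + spec + error  # observed score
var(trait) / var(x)              # reliability: close to 1/2.5 = .4
```

Only the trait variance counts as reliable variance here; the designs that follow (retests, alternate forms, split halves) are different ways of trying to isolate it.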
What is the correlation of a test with the same test given at some later time? If the delay is long enough that the state effect is minimal, and if we have no specific variance, then the correlation of the test with itself later is what we want:
\[r_{xx} = \frac{V_T}{V_x} = \frac{V_T}{V_T + V_S + V_s + V_E} \]
If we do not want to wait for a long time, we can estimate reliability by creating another test (an alternate form) that is very similar to our first test but uses different items. By doing this, we can eliminate the effect of the specific item variance, and
\[r_{xx} = \frac{V_T}{V_x} = \frac{V_T + V_S}{V_T + V_S + V_s + V_E} \]
This will not remove any state effect from our test, but it does eliminate specific and residual variance.
A perhaps crude demonstration of this is to consider the correlation between the odd and even items of the `ability` data set. This is a set of scores on 16 ability items given as part of the SAPA project to 1,525 participants. The items are of four types: verbal reasoning, numeric reasoning, matrix reasoning, and spatial rotations. These items were developed by Condon and Revelle (2014) and discussed further in Revelle, Dworak and Condon (2021) and Dworak, Revelle, Doebler and Condon (2020).
As with all data sets, we should first find the dimensions (using `dim`) and then `describe` it. We use the `rowSums` function to get our scores. Because we have missing data, we need to set `na.rm=TRUE`. Note that we should use `rowMeans` instead of `rowSums` so that scores are not confounded with the missingness of the items. This leads to subtle differences. Consider the following response patterns on a 16 item test, where 0 means wrong, 1 means right, and NA means did not answer:

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Both have a sum score of 2 correct, but finding the means yields .125 versus 1.0.
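The two response patterns can be checked directly:

```r
# the two response patterns above (NA = did not answer)
p1 <- c(1, 1, rep(0, 14))
p2 <- c(1, 1, rep(NA, 14))
rowSums(rbind(p1, p2), na.rm = TRUE)   # both sum to 2
rowMeans(rbind(p1, p2), na.rm = TRUE)  # .125 versus 1.0
```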
library(psych)
library(psychTools)
dim(ability)
## [1] 1525 16
describe(ability)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## reason.4 1 1442 0.68 0.47 1 0.72 0 0 1 1 -0.75 -1.44 0.01
## reason.16 2 1463 0.73 0.45 1 0.78 0 0 1 1 -1.02 -0.96 0.01
## reason.17 3 1440 0.74 0.44 1 0.80 0 0 1 1 -1.08 -0.84 0.01
## reason.19 4 1456 0.64 0.48 1 0.68 0 0 1 1 -0.60 -1.64 0.01
## letter.7 5 1441 0.63 0.48 1 0.67 0 0 1 1 -0.56 -1.69 0.01
## letter.33 6 1438 0.61 0.49 1 0.63 0 0 1 1 -0.43 -1.82 0.01
## letter.34 7 1455 0.64 0.48 1 0.68 0 0 1 1 -0.59 -1.65 0.01
## letter.58 8 1438 0.47 0.50 0 0.46 0 0 1 1 0.12 -1.99 0.01
## matrix.45 9 1458 0.55 0.50 1 0.56 0 0 1 1 -0.20 -1.96 0.01
## matrix.46 10 1470 0.57 0.50 1 0.59 0 0 1 1 -0.28 -1.92 0.01
## matrix.47 11 1465 0.64 0.48 1 0.67 0 0 1 1 -0.57 -1.67 0.01
## matrix.55 12 1459 0.39 0.49 0 0.36 0 0 1 1 0.45 -1.80 0.01
## rotate.3 13 1456 0.20 0.40 0 0.13 0 0 1 1 1.48 0.19 0.01
## rotate.4 14 1460 0.22 0.42 0 0.15 0 0 1 1 1.34 -0.21 0.01
## rotate.6 15 1456 0.31 0.46 0 0.27 0 0 1 1 0.80 -1.35 0.01
## rotate.8 16 1460 0.19 0.39 0 0.12 0 0 1 1 1.55 0.41 0.01
score1 <- rowSums(ability[,c(1,3,5,7,9,11,13,15)],na.rm=TRUE) #the sum scores for the odds
score2 <- rowSums(ability[,seq(from=2, to = 16, by=2)],na.rm=TRUE) # the sum scores for the evens
scorem1 <- rowMeans(ability[,c(1,3,5,7,9,11,13,15)],na.rm=TRUE) # the means scores for the odds
scorem2 <- rowMeans(ability[,seq(from=2, to = 16, by=2)],na.rm=TRUE) # the mean scores for the evens
scores.df <- data.frame(score1,score2,scorem1, scorem2) #form the data.frame
pairs.panels(scores.df) #show it graphically
lowerCor(scores.df) #or show them in a neat array, rounded to 2 decimals
## scor1 scor2 scrm1 scrm2
## score1 1.00
## score2 0.76 1.00
## scorem1 0.95 0.70 1.00
## scorem2 0.71 0.96 0.73 1.00
#or, just use the traditional cor function
cor(scores.df) #does not work because of missing values
## score1 score2 scorem1 scorem2
## score1 1.0000000 0.7576345 NA NA
## score2 0.7576345 1.0000000 NA NA
## scorem1 NA NA 1 NA
## scorem2 NA NA NA 1
rxx <- cor(scores.df,use="pairwise") #this does
rodd.even <- cor(score1,score2)
cor2(scorem1,scorem2) #this finds correlations using pairwise deletion and rounds
## [1] 0.73
This would be the correlation we would expect between two 8-item tests. That is to say, if we had used just 8 items, this is what we would expect them to correlate with another 8-item test.
But since we created 16 items, what would the correlation be between two 16-item tests? With a bit of algebra, we can predict it using a formula developed by Spearman and Brown:
\[r_{xx} = \frac{2r}{1+r}\]
or, in our case
r_split = 2*rodd.even/(1 + rodd.even)
round(r_split,2)
## [1] 0.86
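The doubling formula is a special case of the general Spearman-Brown prophecy formula, \(r_{nn} = \frac{nr}{1 + (n-1)r}\), for a test lengthened by a factor of n. A sketch (the `sb` helper name is our own, not a psych function):

```r
# general Spearman-Brown prophecy formula (sb is a hypothetical helper name)
sb <- function(r, n) n * r / (1 + (n - 1) * r)
sb(rodd.even, 2)   # doubling from 8 to 16 items: identical to 2r/(1+r)
sb(rodd.even, 4)   # predicted reliability of a hypothetical 32-item version
```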
But our split was the odd and even items. What happens if we consider all possible splits and then plot the histogram of those splits? For 16 items or fewer, we can do this quite rapidly. For larger tests, the number of possible splits increases drastically, and we instead sample 10,000 random splits. We use the help menu of `splitHalf` to help us draw the output.
#to show the histogram of all possible splits for the ability test
sp <- splitHalf(ability,raw=TRUE) #this saves the results
hist(sp$raw,breaks=101,xlab="Split Half Reliability",ylab="Frequency of split half reliability",main="Split half reliabilities of a test with 16 ability items")
sp #and show the results
## Split half reliabilities
## Call: splitHalf(r = ability, raw = TRUE)
##
## Maximum split half reliability (lambda 4) = 0.87
## Guttman lambda 6 = 0.84
## Average split half reliability = 0.83
## Guttman lambda 3 (alpha) = 0.83
## Guttman lambda 2 = 0.83
## Minimum split half reliability (beta) = 0.73
## Average interitem r = 0.23 with median = 0.21
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.77 0.83 0.86
names(sp) #what else is here?
## [1] "maxrb" "minrb" "maxAB" "minAB" "meanr" "av.r" "med.r" "alpha" "lambda2"
## [10] "lambda6" "raw" "ci" "covar" "Call"
sp$minAB #show the items for the worst split
## $A
## [1] "matrix.45" "matrix.46" "matrix.47" "matrix.55" "rotate.3" "rotate.4" "rotate.6" "rotate.8"
##
## $B
## [1] "reason.4" "reason.16" "reason.17" "reason.19" "letter.7" "letter.33" "letter.34" "letter.58"
sp$maxAB #show the items for the best split
## $A
## [1] "reason.4" "reason.16" "letter.33" "letter.58" "matrix.45" "matrix.47" "rotate.4" "rotate.6"
##
## $B
## [1] "reason.17" "reason.19" "letter.7" "letter.34" "matrix.46" "matrix.55" "rotate.3" "rotate.8"
# Why do the splits differ in their correlation?
The items came in four groups of item types. If we group the first two item types and compare them to the last two item types, we have the most dissimilar pairs of tests. Alternatively, if we group two from each item type and compare that to the other two from each item type, we have the most similar pair of tests.
firstm <- rowMeans(ability[,1:8],na.rm=TRUE)
secondm <- rowMeans(ability[,9:16],na.rm=TRUE)
new.scores.df <- data.frame(scores.df,firstm,secondm)
pairs.panels(new.scores.df)
#or, if we don't want to graph it
lowerCor(new.scores.df)
## scor1 scor2 scrm1 scrm2 frstm scndm
## score1 1.00
## score2 0.76 1.00
## scorem1 0.95 0.70 1.00
## scorem2 0.71 0.96 0.73 1.00
## firstm 0.80 0.78 0.84 0.81 1.00
## secondm 0.77 0.78 0.80 0.82 0.56 1.00
A more conventional procedure, and one that does not show us the range of possible splits, is to use the `alpha` function to find a coefficient that has come to be known as coefficient \(\alpha\) (Cronbach, 1951). The `alpha` function will also score our items for us to form a linear composite of the items.
al <- alpha(ability)
summary(al) #see the values of the alpha output (note this summarizes just some of objects)
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.83 0.84 0.23 4.9 0.0064 0.51 0.25 0.21
Although `summary(al)` shows the alpha statistic, we can get more information if we print the object:
al #this shows more of the output
##
## Reliability analysis
## Call: alpha(x = ability)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.83 0.84 0.23 4.9 0.0064 0.51 0.25 0.21
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.82 0.83 0.84
## Duhachek 0.82 0.83 0.84
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## reason.4 0.82 0.82 0.82 0.23 4.5 0.0069 0.0060 0.21
## reason.16 0.82 0.82 0.83 0.24 4.7 0.0067 0.0061 0.22
## reason.17 0.82 0.82 0.82 0.23 4.5 0.0069 0.0058 0.21
## reason.19 0.82 0.82 0.83 0.24 4.6 0.0067 0.0061 0.21
## letter.7 0.82 0.82 0.82 0.23 4.5 0.0068 0.0057 0.21
## letter.33 0.82 0.82 0.83 0.24 4.6 0.0068 0.0060 0.21
## letter.34 0.82 0.82 0.82 0.23 4.5 0.0069 0.0056 0.21
## letter.58 0.82 0.82 0.82 0.23 4.5 0.0069 0.0062 0.21
## matrix.45 0.82 0.83 0.83 0.24 4.7 0.0066 0.0060 0.23
## matrix.46 0.82 0.82 0.83 0.24 4.7 0.0067 0.0060 0.22
## matrix.47 0.82 0.82 0.83 0.24 4.6 0.0067 0.0063 0.21
## matrix.55 0.83 0.83 0.83 0.24 4.8 0.0065 0.0058 0.23
## rotate.3 0.82 0.82 0.82 0.23 4.6 0.0067 0.0045 0.23
## rotate.4 0.82 0.82 0.82 0.23 4.5 0.0068 0.0046 0.22
## rotate.6 0.82 0.82 0.82 0.23 4.5 0.0068 0.0051 0.22
## rotate.8 0.82 0.82 0.83 0.24 4.6 0.0067 0.0048 0.23
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## reason.4 1442 0.58 0.58 0.54 0.50 0.68 0.47
## reason.16 1463 0.50 0.50 0.44 0.41 0.73 0.45
## reason.17 1440 0.57 0.57 0.54 0.49 0.74 0.44
## reason.19 1456 0.53 0.52 0.47 0.43 0.64 0.48
## letter.7 1441 0.57 0.56 0.52 0.48 0.63 0.48
## letter.33 1438 0.54 0.53 0.48 0.44 0.61 0.49
## letter.34 1455 0.58 0.57 0.54 0.49 0.64 0.48
## letter.58 1438 0.58 0.57 0.52 0.48 0.47 0.50
## matrix.45 1458 0.49 0.48 0.42 0.38 0.55 0.50
## matrix.46 1470 0.50 0.49 0.43 0.40 0.57 0.50
## matrix.47 1465 0.53 0.52 0.47 0.43 0.64 0.48
## matrix.55 1459 0.43 0.42 0.35 0.32 0.39 0.49
## rotate.3 1456 0.52 0.54 0.51 0.44 0.20 0.40
## rotate.4 1460 0.56 0.58 0.56 0.48 0.22 0.42
## rotate.6 1456 0.55 0.56 0.53 0.46 0.31 0.46
## rotate.8 1460 0.49 0.51 0.47 0.41 0.19 0.39
##
## Non missing response frequency for each item
## 0 1 miss
## reason.4 0.32 0.68 0.05
## reason.16 0.27 0.73 0.04
## reason.17 0.26 0.74 0.06
## reason.19 0.36 0.64 0.05
## letter.7 0.37 0.63 0.06
## letter.33 0.39 0.61 0.06
## letter.34 0.36 0.64 0.05
## letter.58 0.53 0.47 0.06
## matrix.45 0.45 0.55 0.04
## matrix.46 0.43 0.57 0.04
## matrix.47 0.36 0.64 0.04
## matrix.55 0.61 0.39 0.04
## rotate.3 0.80 0.20 0.05
## rotate.4 0.78 0.22 0.04
## rotate.6 0.69 0.31 0.05
## rotate.8 0.81 0.19 0.04
And we can get even more information if we examine the names of what is in the `al` object, and then `describe` the `scores` object:
names(al) #what are the possible objects?
## [1] "total" "alpha.drop" "item.stats" "response.freq" "keys" "scores"
## [7] "nvar" "boot.ci" "boot" "feldt" "Unidim" "var.r"
## [13] "Fit" "call" "title"
describe(al$scores) #descriptive statistics of the results
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1509 0.51 0.25 0.5 0.51 0.28 0 1 1 0.02 -0.83 0.01
The ability items were all keyed in the same direction (correct vs. incorrect). Some personality items need to be reverse scored ("do you like to go to lively parties" is the opposite of "do you tend to keep quiet at parties"). By default, `alpha` will not reverse key items, but you can ask it to do so automatically. Or, you can specify which items to reverse key.
Consider the first 5 items of the `bfi`, which are five items measuring "Agreeableness". If we do not reverse score them, we find a very poor alpha (and a warning that we might want to do so). If we turn on the `check.keys` option, `alpha` will automatically reverse items that correlate negatively with the total score.
The `bfi` data set has 25 "Big 5" personality items given as part of the SAPA project. To see the characteristics of this file, we can `dim` it, `describe` it, and then look up the names from an associated dictionary (we use the `lookupFromKeys` function from `psychTools`; it requires a `dictionary` of the items).
#first show the correlations and the item contents
dim(bfi)
## [1] 2800 28
describe(bfi[1:5])
## vars n mean sd median trimmed mad min max range skew kurtosis se
## A1 1 2784 2.41 1.41 2 2.23 1.48 1 6 5 0.83 -0.31 0.03
## A2 2 2773 4.80 1.17 5 4.98 1.48 1 6 5 -1.12 1.05 0.02
## A3 3 2774 4.60 1.30 5 4.79 1.48 1 6 5 -1.00 0.44 0.02
## A4 4 2781 4.70 1.48 5 4.93 1.48 1 6 5 -1.03 0.04 0.03
## A5 5 2784 4.56 1.26 5 4.71 1.48 1 6 5 -0.85 0.16 0.02
lowerCor(bfi[1:5])
## A1 A2 A3 A4 A5
## A1 1.00
## A2 -0.34 1.00
## A3 -0.27 0.49 1.00
## A4 -0.15 0.34 0.36 1.00
## A5 -0.18 0.39 0.50 0.31 1.00
lookupFromKeys(bfi.keys[1],dictionary=bfi.dictionary[1:4],n=5)
## $agree
## ItemLabel Item Giant3 Big6
## A1- q_146 Am indifferent to the feelings of others. Cohesion Agreeableness
## A2 q_1162 Inquire about others' well-being. Cohesion Agreeableness
## A3 q_1206 Know how to comfort others. Cohesion Agreeableness
## A4 q_1364 Love children. Cohesion Agreeableness
## A5 q_1419 Make people feel at ease. Cohesion Agreeableness
To reverse an item, subtract it from the sum of the maximum and minimum possible responses (in this case, 6 + 1, so subtract from 7).
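For example, on the 1-6 scale used by the `bfi` items:

```r
# reverse scoring on a 1-6 scale: subtract from max + min = 7
responses <- c(1, 2, 6)   # hypothetical raw responses to a reversed item
7 - responses             # reversed scores: 6 5 1
```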
alpha(bfi[1:5]) #this will produce a terrible result
## Warning in alpha(bfi[1:5]): Some items were negatively correlated with the first principal component and probably
## should be reversed.
## To do this, run the function again with the 'check.keys=TRUE' option
## Some items ( A1 ) were negatively correlated with the first principal component and
## probably should be reversed.
## To do this, run the function again with the 'check.keys=TRUE' option
##
## Reliability analysis
## Call: alpha(x = bfi[1:5])
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.43 0.46 0.53 0.14 0.85 0.016 4.2 0.74 0.32
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.4 0.43 0.46
## Duhachek 0.4 0.43 0.46
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## A1 0.72 0.72 0.67 0.397 2.63 0.0087 0.0065 0.375
## A2 0.28 0.30 0.39 0.096 0.43 0.0219 0.1095 0.081
## A3 0.18 0.21 0.31 0.061 0.26 0.0249 0.1014 0.081
## A4 0.25 0.30 0.44 0.099 0.44 0.0229 0.1604 0.104
## A5 0.21 0.24 0.36 0.071 0.31 0.0238 0.1309 0.094
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## A1 2784 0.066 0.024 -0.39 -0.31 2.4 1.4
## A2 2773 0.630 0.665 0.58 0.37 4.8 1.2
## A3 2774 0.724 0.742 0.71 0.48 4.6 1.3
## A4 2781 0.686 0.661 0.50 0.37 4.7 1.5
## A5 2784 0.700 0.719 0.64 0.45 4.6 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## A1 0.33 0.29 0.14 0.12 0.08 0.03 0.01
## A2 0.02 0.05 0.05 0.20 0.37 0.31 0.01
## A3 0.03 0.06 0.07 0.20 0.36 0.27 0.01
## A4 0.05 0.08 0.07 0.16 0.24 0.41 0.01
## A5 0.02 0.07 0.09 0.22 0.35 0.25 0.01
alpha(bfi[1:5], check.keys = TRUE) # a much better result
## Warning in alpha(bfi[1:5], check.keys = TRUE): Some items were negatively correlated with the first principal component and were automatically reversed.
## This is indicated by a negative sign for the variable name.
##
## Reliability analysis
## Call: alpha(x = bfi[1:5], check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.7 0.46 0.53 0.14 0.85 0.009 4.7 0.9 0.32
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.69 0.7 0.72
## Duhachek 0.69 0.7 0.72
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## A1- 0.72 0.72 0.67 0.397 2.63 0.0087 0.0065 0.375
## A2 0.62 0.30 0.39 0.096 0.43 0.0119 0.1095 0.081
## A3 0.60 0.21 0.31 0.061 0.26 0.0124 0.1014 0.081
## A4 0.69 0.30 0.44 0.099 0.44 0.0098 0.1604 0.104
## A5 0.64 0.24 0.36 0.071 0.31 0.0111 0.1309 0.094
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## A1- 2784 0.58 0.024 -0.39 0.31 4.6 1.4
## A2 2773 0.73 0.665 0.58 0.56 4.8 1.2
## A3 2774 0.76 0.742 0.71 0.59 4.6 1.3
## A4 2781 0.65 0.661 0.50 0.39 4.7 1.5
## A5 2784 0.69 0.719 0.64 0.49 4.6 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## A1 0.33 0.29 0.14 0.12 0.08 0.03 0.01
## A2 0.02 0.05 0.05 0.20 0.37 0.31 0.01
## A3 0.03 0.06 0.07 0.20 0.36 0.27 0.01
## A4 0.05 0.08 0.07 0.16 0.24 0.41 0.01
## A5 0.02 0.07 0.09 0.22 0.35 0.25 0.01
Much more typical is the investigator with multiple scales to score from multiple items. With the use of a scoring key you can tell R which items go in which direction on which scale. This is detailed in the help pages for the `scoreItems` function.
Given a data.frame or matrix of n items and N observations and a list of the direction to score them (a keys.list with k keys), find the sum scores or average scores for each person and each scale. In addition, report Cronbach's alpha, Guttman's Lambda 6, the average r, the scale intercorrelations, and the item by scale correlations (raw and corrected for item overlap). Replace missing values with the item median or mean if desired. Items may be keyed 1 (score it), -1 (reverse score it), or 0 (do not score it). Negatively keyed items will be reverse scored. Although prior versions used a keys matrix, it is now recommended to just use a list of scoring keys. See `make.keys` for a convenient way to make the keys file. If the input is a square matrix, then it is assumed to be a covariance or correlation matrix; scores are not found, but the item statistics are reported (similar functionality to `cluster.cor`). `response.frequencies` reports the frequency of item endorsements for each response category for polytomous or multiple choice items.
The parameters for the `scoreItems` function are laid out below, using the `bfi` data set as an example:

`scoreItems(keys, items, totals = FALSE, ilabels = NULL, missing=TRUE, impute="median", delete=TRUE, min = NULL, max = NULL, digits = 2, n.obs=NULL, select=TRUE)`
First we need to see what the items in this data set are, then we need to create a set of keys that tell us which way to score the items.
data(bfi)
names(bfi)
## [1] "A1" "A2" "A3" "A4" "A5" "C1" "C2" "C3"
## [9] "C4" "C5" "E1" "E2" "E3" "E4" "E5" "N1"
## [17] "N2" "N3" "N4" "N5" "O1" "O2" "O3" "O4"
## [25] "O5" "gender" "education" "age"
keys.list <- list(agree=c("-A1","A2","A3","A4","A5"),
conscientious=c("C1","C2","C3","-C4","-C5"),extraversion=c("-E1","-E2","E3","E4","E5"),
neuroticism=c("N1","N2","N3","N4","N5"), openness = c("O1","-O2","O3","O4","-O5"))
keys.list #show this as a list
## $agree
## [1] "-A1" "A2" "A3" "A4" "A5"
##
## $conscientious
## [1] "C1" "C2" "C3" "-C4" "-C5"
##
## $extraversion
## [1] "-E1" "-E2" "E3" "E4" "E5"
##
## $neuroticism
## [1] "N1" "N2" "N3" "N4" "N5"
##
## $openness
## [1] "O1" "-O2" "O3" "O4" "-O5"
lookupFromKeys(keys.list,dictionary=bfi.dictionary[1:5],5) #show the items and their content
## $agree
## ItemLabel Item Giant3 Big6 Little12
## A1- q_146 Am indifferent to the feelings of others. Cohesion Agreeableness Compassion
## A2 q_1162 Inquire about others' well-being. Cohesion Agreeableness Compassion
## A3 q_1206 Know how to comfort others. Cohesion Agreeableness Compassion
## A4 q_1364 Love children. Cohesion Agreeableness Compassion
## A5 q_1419 Make people feel at ease. Cohesion Agreeableness Compassion
##
## $conscientious
## ItemLabel Item Giant3 Big6 Little12
## C1 q_124 Am exacting in my work. Stability Conscientiousness Orderliness
## C2 q_530 Continue until everything is perfect. Stability Conscientiousness Orderliness
## C3 q_619 Do things according to a plan. Stability Conscientiousness Orderliness
## C4- q_626 Do things in a half-way manner. Stability Conscientiousness Industriousness
## C5- q_1949 Waste my time. Stability Conscientiousness Industriousness
##
## $extraversion
## ItemLabel Item Giant3 Big6 Little12
## E1- q_712 Don't talk a lot. Plasticity Extraversion Sociability
## E2- q_901 Find it difficult to approach others. Plasticity Extraversion Sociability
## E3 q_1205 Know how to captivate people. Plasticity Extraversion Assertiveness
## E4 q_1410 Make friends easily. Plasticity Extraversion Sociability
## E5 q_1768 Take charge. Plasticity Extraversion Assertiveness
##
## $neuroticism
## ItemLabel Item Giant3 Big6 Little12
## N1 q_952 Get angry easily. Stability Emotional Stability Balance
## N2 q_974 Get irritated easily. Stability Emotional Stability Balance
## N3 q_1099 Have frequent mood swings. Stability Emotional Stability Balance
## N4 q_1479 Often feel blue. Stability Emotional Stability Balance
## N5 q_1505 Panic easily. Stability Emotional Stability Balance
##
## $openness
## ItemLabel Item Giant3 Big6 Little12
## O1 q_128 Am full of ideas. Plasticity Openness Intellect
## O2- q_316 Avoid difficult reading material. Plasticity Openness Intellect
## O3 q_492 Carry the conversation to a higher level. Plasticity Openness Intellect
## O4 q_1738 Spend time reflecting on things. Plasticity Openness Openness
## O5- q_1964 Will not probe deeply into a subject. Plasticity Openness Openness
Using this `keys.list` we can then call `scoreItems`:
bfi.scores <- scoreItems(keys=keys.list,items=bfi,min=1,max=6)
bfi.scores #show the summary statistics
## Call: scoreItems(keys = keys.list, items = bfi, min = 1, max = 6)
##
## (Unstandardized) Alpha:
## agree conscientious extraversion neuroticism openness
## alpha 0.7 0.72 0.76 0.81 0.6
##
## Standard errors of unstandardized Alpha:
## agree conscientious extraversion neuroticism openness
## ASE 0.014 0.014 0.013 0.011 0.017
##
## Average item correlation:
## agree conscientious extraversion neuroticism openness
## average.r 0.32 0.34 0.39 0.46 0.23
##
## Median item correlation:
## agree conscientious extraversion neuroticism openness
## 0.34 0.34 0.38 0.41 0.22
##
## Guttman 6* reliability:
## agree conscientious extraversion neuroticism openness
## Lambda.6 0.7 0.72 0.76 0.81 0.6
##
## Signal/Noise based upon av.r :
## agree conscientious extraversion neuroticism openness
## Signal/Noise 2.3 2.6 3.2 4.3 1.5
##
## Scale intercorrelations corrected for attenuation
## raw correlations below the diagonal, alpha on the diagonal
## corrected correlations above the diagonal:
## agree conscientious extraversion neuroticism openness
## agree 0.70 0.36 0.63 -0.245 0.23
## conscientious 0.26 0.72 0.35 -0.305 0.30
## extraversion 0.46 0.26 0.76 -0.284 0.32
## neuroticism -0.18 -0.23 -0.22 0.812 -0.12
## openness 0.15 0.19 0.22 -0.086 0.60
##
## Average adjusted correlations within and between scales (MIMS)
## agree cnscn extrv nrtcs opnns
## agree 0.32
## conscientious 0.22 0.34
## extraversion 0.44 0.26 0.39
## neuroticism -0.20 -0.26 -0.28 0.46
## openness 0.11 0.15 0.18 -0.08 0.23
##
## Average adjusted item x scale correlations within and between scales (MIMT)
## agree cnscn extrv nrtcs opnns
## agree 0.68
## conscientious 0.18 0.69
## extraversion 0.33 0.19 0.71
## neuroticism -0.14 -0.18 -0.17 0.76
## openness 0.10 0.12 0.14 -0.05 0.62
##
## In order to see the item by scale loadings and frequency counts of the data
## print with the short option = FALSE
names(bfi.scores) #what other objects are in bfi.scores?
## [1] "scores" "missing" "alpha" "av.r" "sn"
## [6] "n.items" "item.cor" "cor" "corrected" "G6"
## [11] "item.corrected" "response.freq" "raw" "ase" "med.r"
## [16] "keys" "MIMS" "MIMT" "Call"
What does all this output mean? Most of it is obvious, except perhaps for the scale intercorrelations. Note how the matrix is not symmetric. We have ‘corrected’ the correlations above the diagonal for scale reliability. This is the correction for attenuation originally developed by Spearman in 1904 and is just the observed correlation divided by the square root of the product of the reliabilities.
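As a sketch, Spearman's correction can be applied by hand to the agreeableness/conscientiousness values from the output above:

```r
# Spearman's correction for attenuation: r_xy / sqrt(rxx * ryy)
r.observed <- 0.26          # raw agree-conscientious correlation from the output
alpha.a    <- 0.70          # alpha for agree
alpha.c    <- 0.72          # alpha for conscientious
r.observed / sqrt(alpha.a * alpha.c)   # about .36, the value above the diagonal
```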
To use these scores, we can define a new object `bfi.scales` and then `describe` it and do a `pairs.panels` of the results. Note how in the `pairs.panels` function we specify the plotting character (`pch`) to be `'.'`.
bfi.scales <- bfi.scores$scores
describe(bfi.scales)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## agree 1 2800 4.65 0.89 4.8 4.73 0.89 1.0 6 5.0 -0.77 0.44 0.02
## conscientious 2 2800 4.27 0.95 4.4 4.31 0.89 1.0 6 5.0 -0.41 -0.16 0.02
## extraversion 3 2800 4.15 1.05 4.2 4.20 1.19 1.0 6 5.0 -0.48 -0.19 0.02
## neuroticism 4 2800 3.16 1.19 3.0 3.13 1.19 1.0 6 5.0 0.22 -0.65 0.02
## openness 5 2800 4.59 0.80 4.6 4.62 0.89 1.2 6 4.8 -0.34 -0.28 0.02
pairs.panels(bfi.scales,pch='.')
With this many data points, it is hard to see the density of the plot, so we do it again with the `smoother` option.
pairs.panels(bfi.scales,smoother=TRUE)
Some of the more important statistics about reliability are actually summaries of the relationships between the variables. That is, we can find them from the correlations rather than from the raw data. These estimates will be "standardized", because they treat the data as if every item were standardized. These procedures are particularly relevant for those of us who have a great deal of missing data.
First find the correlations of the items, and then do the `alpha` or `scoreItems` analysis on the resulting correlation matrix.
R.ab <- lowerCor(ability) #this shows the correlations
## rsn.4 rs.16 rs.17 rs.19 ltt.7 lt.33 lt.34 lt.58 mt.45 mt.46 mt.47 mt.55 rtt.3 rtt.4
## reason.4 1.00
## reason.16 0.28 1.00
## reason.17 0.40 0.32 1.00
## reason.19 0.30 0.25 0.34 1.00
## letter.7 0.28 0.27 0.29 0.25 1.00
## letter.33 0.23 0.20 0.26 0.25 0.34 1.00
## letter.34 0.29 0.26 0.29 0.27 0.40 0.37 1.00
## letter.58 0.29 0.21 0.29 0.25 0.33 0.28 0.32 1.00
## matrix.45 0.25 0.18 0.20 0.22 0.20 0.20 0.21 0.19 1.00
## matrix.46 0.25 0.18 0.24 0.18 0.24 0.23 0.27 0.21 0.33 1.00
## matrix.47 0.24 0.24 0.27 0.23 0.27 0.23 0.30 0.23 0.24 0.23 1.00
## matrix.55 0.16 0.15 0.16 0.15 0.14 0.17 0.14 0.23 0.21 0.14 0.21 1.00
## rotate.3 0.23 0.16 0.17 0.18 0.18 0.17 0.19 0.24 0.16 0.15 0.20 0.18 1.00
## rotate.4 0.25 0.20 0.20 0.21 0.23 0.21 0.21 0.27 0.17 0.17 0.20 0.18 0.53 1.00
## rotate.6 0.25 0.20 0.27 0.19 0.20 0.21 0.19 0.26 0.15 0.20 0.18 0.17 0.43 0.45
## rotate.8 0.21 0.16 0.18 0.16 0.13 0.14 0.15 0.22 0.16 0.15 0.17 0.19 0.43 0.44
## rtt.6 rtt.8
## rotate.6 1.00
## rotate.8 0.42 1.00
R.bfi <- cor(bfi,use="pairwise") #find, but do not display, the bfi correlations
Then use `alpha` on the ability correlation matrix. Note that the standardized alpha and raw alpha are now identical, because we are doing everything on standardized scores.
ab.alpha <- alpha(R.ab)
ab.alpha #show the results
##
## Reliability analysis
## Call: alpha(x = R.ab)
##
## raw_alpha std.alpha G6(smc) average_r S/N median_r
## 0.83 0.83 0.84 0.23 4.9 0.21
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.68 0.83 0.93
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N var.r med.r
## reason.4 0.82 0.82 0.82 0.23 4.5 0.0060 0.21
## reason.16 0.82 0.82 0.83 0.24 4.7 0.0061 0.22
## reason.17 0.82 0.82 0.82 0.23 4.5 0.0058 0.21
## reason.19 0.82 0.82 0.83 0.24 4.6 0.0061 0.21
## letter.7 0.82 0.82 0.82 0.23 4.5 0.0057 0.21
## letter.33 0.82 0.82 0.83 0.24 4.6 0.0060 0.21
## letter.34 0.82 0.82 0.82 0.23 4.5 0.0056 0.21
## letter.58 0.82 0.82 0.82 0.23 4.5 0.0062 0.21
## matrix.45 0.83 0.83 0.83 0.24 4.7 0.0060 0.23
## matrix.46 0.82 0.82 0.83 0.24 4.7 0.0060 0.22
## matrix.47 0.82 0.82 0.83 0.24 4.6 0.0063 0.21
## matrix.55 0.83 0.83 0.83 0.24 4.8 0.0058 0.23
## rotate.3 0.82 0.82 0.82 0.23 4.6 0.0045 0.23
## rotate.4 0.82 0.82 0.82 0.23 4.5 0.0046 0.22
## rotate.6 0.82 0.82 0.82 0.23 4.5 0.0051 0.22
## rotate.8 0.82 0.82 0.83 0.24 4.6 0.0048 0.23
##
## Item statistics
## r r.cor r.drop
## reason.4 0.58 0.54 0.49
## reason.16 0.50 0.44 0.40
## reason.17 0.57 0.54 0.49
## reason.19 0.52 0.47 0.43
## letter.7 0.56 0.52 0.47
## letter.33 0.53 0.48 0.44
## letter.34 0.57 0.54 0.48
## letter.58 0.57 0.52 0.48
## matrix.45 0.48 0.42 0.38
## matrix.46 0.49 0.43 0.39
## matrix.47 0.52 0.47 0.43
## matrix.55 0.42 0.35 0.32
## rotate.3 0.54 0.51 0.45
## rotate.4 0.58 0.56 0.49
## rotate.6 0.56 0.53 0.47
## rotate.8 0.51 0.47 0.41
Now, do the same thing for the `bfi` correlations, but just show the summary.
bfi.cor.scores <- scoreItems(keys.list,items=R.bfi)
summary(bfi.cor.scores) #compare to the prior results
## Call: NULL
##
## Scale intercorrelations corrected for attenuation
## raw correlations below the diagonal, (standardized) alpha on the diagonal
## corrected correlations above the diagonal:
## agree conscientious extraversion neuroticism openness
## agree 0.71 0.35 0.64 -0.242 0.25
## conscientious 0.25 0.73 0.36 -0.287 0.30
## extraversion 0.47 0.27 0.76 -0.278 0.35
## neuroticism -0.18 -0.22 -0.22 0.815 -0.11
## openness 0.16 0.20 0.24 -0.074 0.61
round(bfi.cor.scores$corrected - bfi.scores$corrected,2)
## agree conscientious extraversion neuroticism openness
## agree 0.01 -0.01 0.01 0.00 0.02
## conscientious 0.00 0.01 0.00 0.02 0.00
## extraversion 0.01 0.00 0.00 0.01 0.03
## neuroticism 0.00 0.01 0.00 0.00 0.02
## openness 0.02 0.00 0.02 0.01 0.01
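The "corrected" correlations above the diagonal are the raw correlations disattenuated by the classical correction for attenuation, r.xy / sqrt(r.xx * r.yy), with the alphas on the diagonal serving as the reliability estimates. A hand check for the agree-extraversion pair, using the rounded values from the output (so the result is approximate):

```r
r.xy <- 0.47  # raw agree-extraversion correlation (below the diagonal)
r.xx <- 0.71  # alpha for agree
r.yy <- 0.76  # alpha for extraversion
round(r.xy / sqrt(r.xx * r.yy), 2)  # approximately 0.64, the value above the diagonal
```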
Perhaps obvious, but working from the correlation matrix does not allow us to find scores, nor descriptive statistics of the scores.
We can see from the correlation matrix that extraversion, neuroticism and openness are barely correlated. What if we made up a scale of those 15 items and allowed the items to be flipped to be most consistent with the total score? (This is straightforward but dangerous.)
The resulting \(\alpha\) of this false composite appears to be fairly good, but when we examine all possible splits, we discover that we probably should not have done this.
alpha(bfi[11:25],check.keys=TRUE) #this reverses some items
## Warning in alpha(bfi[11:25], check.keys = TRUE): Some items were negatively correlated with the first principal component and were automatically reversed.
## This is indicated by a negative sign for the variable name.
##
## Reliability analysis
## Call: alpha(x = bfi[11:25], check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.77 0.49 0.64 0.059 0.94 0.0064 3 0.7 0.051
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.75 0.77 0.78
## Duhachek 0.75 0.77 0.78
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## E1 0.75 0.52 0.66 0.072 1.08 0.0067 0.055 0.054
## E2 0.73 0.49 0.64 0.065 0.97 0.0073 0.051 0.044
## E3- 0.75 0.49 0.64 0.065 0.97 0.0068 0.054 0.054
## E4- 0.75 0.53 0.66 0.075 1.13 0.0069 0.052 0.051
## E5- 0.75 0.50 0.65 0.068 1.01 0.0067 0.055 0.054
## N1 0.75 0.39 0.56 0.043 0.63 0.0070 0.053 0.045
## N2 0.75 0.39 0.56 0.044 0.64 0.0069 0.053 0.051
## N3 0.74 0.38 0.56 0.042 0.61 0.0070 0.053 0.044
## N4 0.74 0.41 0.59 0.048 0.70 0.0073 0.053 0.045
## N5 0.75 0.42 0.60 0.048 0.71 0.0070 0.057 0.044
## O1- 0.76 0.49 0.64 0.065 0.97 0.0066 0.057 0.052
## O2 0.77 0.49 0.64 0.063 0.95 0.0064 0.060 0.044
## O3- 0.76 0.50 0.64 0.066 0.99 0.0067 0.055 0.052
## O4 0.77 0.46 0.63 0.057 0.85 0.0063 0.062 0.040
## O5 0.77 0.51 0.65 0.068 1.03 0.0064 0.059 0.051
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## E1 2777 0.48 0.132 -0.00098 0.350 3.0 1.6
## E2 2784 0.65 0.248 0.17609 0.548 3.1 1.6
## E3- 2775 0.49 0.254 0.16228 0.388 3.0 1.4
## E4- 2791 0.52 0.077 -0.04781 0.410 2.6 1.5
## E5- 2779 0.47 0.206 0.08779 0.358 2.6 1.3
## N1 2778 0.55 0.628 0.67944 0.438 2.9 1.6
## N2 2779 0.55 0.621 0.66725 0.435 3.5 1.5
## N3 2789 0.57 0.649 0.68340 0.453 3.2 1.6
## N4 2764 0.63 0.546 0.53908 0.529 3.2 1.6
## N5 2771 0.55 0.537 0.50165 0.430 3.0 1.6
## O1- 2778 0.38 0.253 0.13710 0.279 2.2 1.1
## O2 2800 0.36 0.279 0.15245 0.223 2.7 1.6
## O3- 2772 0.44 0.231 0.12721 0.335 2.6 1.2
## O4 2786 0.19 0.389 0.28391 0.079 4.9 1.2
## O5 2780 0.32 0.189 0.04838 0.203 2.5 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## E1 0.24 0.23 0.15 0.16 0.13 0.09 0.01
## E2 0.19 0.24 0.12 0.22 0.14 0.09 0.01
## E3 0.05 0.11 0.15 0.30 0.27 0.13 0.01
## E4 0.05 0.09 0.10 0.16 0.34 0.26 0.00
## E5 0.03 0.08 0.10 0.22 0.34 0.22 0.01
## N1 0.24 0.24 0.15 0.19 0.12 0.07 0.01
## N2 0.12 0.19 0.15 0.26 0.18 0.10 0.01
## N3 0.18 0.23 0.13 0.21 0.16 0.09 0.00
## N4 0.17 0.24 0.15 0.22 0.14 0.09 0.01
## N5 0.24 0.24 0.14 0.18 0.12 0.09 0.01
## O1 0.01 0.04 0.08 0.22 0.33 0.33 0.01
## O2 0.29 0.26 0.14 0.16 0.10 0.06 0.00
## O3 0.03 0.05 0.11 0.28 0.34 0.20 0.01
## O4 0.02 0.04 0.06 0.17 0.32 0.39 0.01
## O5 0.27 0.32 0.19 0.13 0.07 0.03 0.01
sp.bfi <- splitHalf(bfi[11:25],raw=TRUE)
## Warning in splitHalf(bfi[11:25], raw = TRUE): Some items were negatively correlated with total
## scale and were automatically reversed.
sp.bfi #show the results
## Split half reliabilities
## Call: splitHalf(r = bfi[11:25], raw = TRUE)
##
## Maximum split half reliability (lambda 4) = 0.85
## Guttman lambda 6 = 0.81
## Average split half reliability = 0.76
## Guttman lambda 3 (alpha) = 0.76
## Guttman lambda 2 = 0.78
## Minimum split half reliability (beta) = 0.36
## Average interitem r = 0.17 with median = 0.13
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.59 0.77 0.83
hist(sp.bfi$raw,breaks=101) #by specifying the breaks we can increase the resolution
By examining the split halves, we can see that some are as low as .36, which suggests that the scale is not measuring one thing. (It is not.) We will explore this problem of multidimensionality next.
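To see which items produce that worst split, the splitHalf object can be asked for the item composition of the minimum split. This sketch assumes a recent version of psych in which splitHalf returns a minAB element; if it is absent in your version, inspect str(sp.bfi) for the corresponding component:

```r
library(psych)                              # psych supplies splitHalf and the bfi data
sp.bfi <- splitHalf(bfi[11:25], raw = TRUE) # as above; some items are auto-reversed
sp.bfi$minAB  # the two half-scales whose correlation yields the minimum (beta)
```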