library(psych)
library(psychTools)
The problem of how to construct scales and evaluate their quality is known as test theory. Although sometimes expressed in terms of many equations, the basic concepts are fairly straightforward. Developed by Charles Spearman (1904) and extended by many psychometricians over the next 120 years, one of its fundamental questions has to do with the reliability of measurement.
Some useful readings include the discussion of reliability by Revelle and Condon (2019) and the discussion of scale construction by Revelle and Garner.
Reliability is the correlation of a test with a test just like it. It is also a way of decomposing the variance of a test.
• Observed = True + Error
• Reliability = \(1- \frac{\sigma^{2}_{error}}{\sigma^{2}_{observed}}\)
Reliability is the squared correlation with the domain: \(r_{xx} = r_{x_{domain}}^{2 }\)
But how do we find a test ‘just like it’?
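The parallel-tests idea can be sketched with a small simulation in base R (an illustration, not from the text): two "tests" share the same true score but have independent errors, so their correlation estimates the reliability.

```r
# Simulate two parallel forms: observed = true + error
set.seed(42)
n <- 1000
true  <- rnorm(n)          # latent true scores
test1 <- true + rnorm(n)   # one observed test
test2 <- true + rnorm(n)   # a parallel test "just like it"
r12 <- cor(test1, test2)   # the correlation estimates reliability
# Here var(true) = var(error) = 1, so the theoretical
# reliability is var(true)/var(observed) = 1/2
r12
```

With these variances the observed correlation comes out close to the theoretical value of .5.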
2. Reliability requires variance in the observed score
As \(\sigma_{observed}^{2}\) decreases so will
\(1- \frac{\sigma^{2}_{error}}{\sigma^{2}_{observed}}\)
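This shrinkage can be demonstrated with a restriction-of-range simulation (again an illustration, not from the text): selecting a subsample with reduced observed variance lowers the reliability estimate, because the error variance stays the same while the observed variance shrinks.

```r
# Restriction of range lowers the parallel-forms correlation
set.seed(17)
n <- 10000
true  <- rnorm(n)
form1 <- true + rnorm(n)
form2 <- true + rnorm(n)
r_full <- cor(form1, form2)       # full-range reliability estimate
keep <- form1 > 0                 # select only the upper half on form 1
r_restricted <- cor(form1[keep], form2[keep])
c(full = r_full, restricted = r_restricted)
```

The restricted correlation is noticeably lower than the full-range one, even though the items themselves have not changed.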
• People differ in some latent score: ability or trait level (\(\theta\))
• Items differ in difficulty (\(\delta\))
• \(p(correct|ability,difficulty,...)=f(ability−difficulty)\)
\(p(endorsement|trait level,difficulty,...)=f(trait level −difficulty)\)
• What is the appropriate function?
The basic assumption for any psychological attribute is that there is always someone higher than you have observed, and there is always someone lower. Most tests are bounded 0-1 (0 to 100%), and thus the observed score has to be a non-linear function of the underlying attribute.
Classic Test Theory considers all items to be random replicates of each other and total (or average) score to be the appropriate measure of the underlying attribute. Items are thought to be endorsed (passed) with an increasing probability as a function of the underlying trait. But if the trait is unbounded (just as there is always the possibility of someone being higher than the highest observed score, so is there a chance of someone being lower than the lowest observed score), and the score is bounded (from p=0 to p=1), then the relationship between the latent score and the observed score must be non-linear. This leads to the most simple of all models, one that has no parameters to estimate but is just a non-linear mapping of latent to observed:
\[\begin{equation} \tag{1} p(correct | \theta) = \frac{1}{1+ e^{ - \theta }}. \end{equation}\]
The logistic function, when scaled by 1.702, is very close to the cumulative normal function. We show this using the curve function.
curve(logistic(x*1.702),-3,3,main="logistic (solid) versus cumulative normal (dotted)")
curve(pnorm(x),lty="dotted",add=TRUE)
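The closeness of the two curves can also be checked numerically in base R (plogis is the logistic distribution function, so it plays the role of the logistic function here): with the 1.702 scaling, the logistic and the cumulative normal never differ by more than about .01 anywhere on the real line.

```r
# Maximum absolute difference between the scaled logistic
# and the cumulative normal over a dense grid
x <- seq(-4, 4, by = 0.001)
max(abs(plogis(1.702 * x) - pnorm(x)))   # less than .01
```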
Why do we compare this to the cumulative normal? If some attribute is distributed normally (remember the central limit theorem), then the probability of having an attribute value less than x is given by the cumulative normal. We show this using the curve function.
curve(dnorm(x),-3,3,ylim=c(0,1),main="Normal curve and its sum")
curve(pnorm(x), add=TRUE,lty="dotted")
The Rasch model is a one-parameter model (modeling item difficulty). Given a person with attribute value \(\theta\) and a set of items with difficulty \(\delta\), we model the probability of endorsing an item as \[\begin{equation} \tag{2} p(correct | \delta, \theta) = \frac{1}{1+ e^{\delta - \theta }}. \end{equation}\]
The problem becomes one of estimating the person's attribute value given a pattern of responses.
The approach suggested by Rasch was to solve equation 2 by examining the patterns of successes (1) and failures (0) for a participant.
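Equation 2 is easy to express in base R (a sketch; the function name p_rasch is ours, and plogis(x) computes \(1/(1+e^{-x})\), so plogis(theta - delta) matches the equation): the probability of passing depends only on the difference between ability and difficulty.

```r
# Rasch model probability of a correct response (equation 2)
p_rasch <- function(theta, delta) plogis(theta - delta)
p_rasch(0, 0)              # ability matches difficulty: p = .5
p_rasch(1, 0)              # ability one logit above difficulty: p > .5
p_rasch(0:3, delta = 1.5)  # one item, a range of abilities
```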
This treats all items as differing in difficulty, but having equal discrimination. We do this with the rasch function in the ltm package.
The problem with the Rasch model is that it assumes all items are equally discriminating. Adding a discrimination parameter, \(\alpha\), gives the two-parameter model:
\[\begin{equation} \tag{3} p(correct | \delta, \theta, \alpha) = \frac{1}{1+ e^{\alpha(\delta - \theta) }}. \end{equation}\]
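Equation 3 can be sketched the same way (the function name p_2pl is ours): the discrimination parameter \(\alpha\) controls how sharply the probability rises as \(\theta\) passes \(\delta\).

```r
# Two-parameter model probability of a correct response (equation 3)
p_2pl <- function(theta, delta, alpha) plogis(alpha * (theta - delta))
p_2pl(1, 0, alpha = 0.5)   # shallow (poorly discriminating) item
p_2pl(1, 0, alpha = 2)     # steep item: probability closer to 1
# at theta = delta, every item gives p = .5 regardless of alpha
p_2pl(0, 0, alpha = c(0.5, 1, 2))
```

Note that with \(\alpha = 1\) this reduces to the Rasch model of equation 2.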
Using the mirt package, we can find these parameters.
By finding tetrachoric or polychoric correlations of the items, there is a one-to-one relationship between the parameters of factor analysis and the results of an IRT analysis. This is shown using the irt.fa function in psych.
sim.data <- sim.irt(nvar=9, n=1000)
names(sim.data)
## [1] "items" "discrimination" "difficulty" "gamma" "zeta"
## [6] "theta"
sim.data$discrimination #by default, these are equal
## [1] 1 1 1 1 1 1 1 1 1
sim.data$difficulty #by default, equally spaced
## [1] -3.00 -2.25 -1.50 -0.75 0.00 0.75 1.50 2.25 3.00
describe(sim.data$items)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## V1 1 1000 0.92 0.27 1 1.00 0 0 1 1 -3.09 7.57 0.01
## V2 2 1000 0.86 0.35 1 0.95 0 0 1 1 -2.10 2.40 0.01
## V3 3 1000 0.77 0.42 1 0.84 0 0 1 1 -1.31 -0.29 0.01
## V4 4 1000 0.64 0.48 1 0.67 0 0 1 1 -0.58 -1.67 0.02
## V5 5 1000 0.50 0.50 1 0.50 0 0 1 1 -0.01 -2.00 0.02
## V6 6 1000 0.34 0.47 0 0.30 0 0 1 1 0.67 -1.56 0.02
## V7 7 1000 0.20 0.40 0 0.12 0 0 1 1 1.51 0.27 0.01
## V8 8 1000 0.12 0.33 0 0.03 0 0 1 1 2.28 3.19 0.01
## V9 9 1000 0.07 0.25 0 0.00 0 0 1 1 3.49 10.20 0.01
irt.sim <- irt.fa(sim.data$items) #plot the information curves
plot.irt(irt.sim,type="ICC") #plot the Item Characteristic Curves
plot.irt(irt.sim, type ="test") #show test information
Use the MSQ item set from last week.
keys <- list(onefactor = cs(active,alert, aroused, -sleepy,-tired, -drowsy,anxious, jittery,nervous, -calm,-relaxed, -at.ease),
energy=cs(active,alert, aroused, -sleepy,-tired, -drowsy),
tension =cs(anxious, jittery,nervous, -calm,-relaxed, -at.ease))
msq.items <- selectFromKeys(keys[1])
msq.items
## onefactor1 onefactor2 onefactor3 onefactor4 onefactor5 onefactor6 onefactor7 onefactor8
## "active" "alert" "aroused" "sleepy" "tired" "drowsy" "anxious" "jittery"
## onefactor9 onefactor10 onefactor11 onefactor12
## "nervous" "calm" "relaxed" "at.ease"
Use those items from the full set of msq cases. First do it for all 12 items with a one-factor solution, and then do a two-factor solution.
msq.irt <- irt.fa(msq[msq.items])
#now show the two factor solution
msq.irt2 <- irt.fa(msq[msq.items] ,2)
## Loading required namespace: GPArotation
Show the test information curves. First for the one factor solution, and then for the two factor solution.
plot.irt(msq.irt, type="test")
plot.irt(msq.irt2, type="test")