library(psych)
library(psychTools)

Some basic test theory

The problem of how to construct scales and evaluate their quality is known as test theory. Although sometimes expressed in terms of many equations, the basic concepts are fairly straightforward. Developed by Charles Spearman (1904) and extended by many psychometricians in the next 120 years one of the fundamental questions has to do with reliability of the the measurement.

Some useful readings include the discussion of reliability by Revelle and Condon (2019) and how to construct scales by Revelle and Garner scale construction.

Classic test theory: Observed scores = True scores + error

Classical model of reliability

Reliability is the correlation of a test with a test just like it. It is also a way of decomposing the variance of a test.

• Observed = True + Error

• Reliability = \(1- \frac{\sigma^{2}_{error}}{\sigma^{2}_{observed}}\)

Reliability is the squared correlation with the domain: \(r_{xx} = r_{x_{domain}}^{2 }\)

But how do we find a test ‘just like it’?

2.Reliability requires variance in observed score

As \(\sigma_{observed}^{2}\) decreases so will

\(1- \frac{\sigma^{2}_{error}}{\sigma^{2}_{observed}}\)

The “New” psychometrics

Model the person as well as the item

• People differ in some latent score

• Items differ in difficulty (\(\theta\)) and difficulty (\(\delta\))

Although the original model is a model of ability tests and item difficulty, this may be applied to the study of attitudes and temperament. We still call the latent trait \(\theta\) which reflects the level of the unseen (latent) attribute.

• \(p(correct|ability,difficulty,...)=f(ability−difficulty)\)

\(p(endorsement|trait level,difficulty,...)=f(trait level −difficulty)\)

• What is the appropriate function?

Extensions to polytomous items, particularly rating scale model

Classic Test Theory as 0 parameter IRT

Psychological attributes are non-linear functions of probability

The basic assumption for any psychological attribute is that there is always someone higher than you have observed, and there always someone lower. Most tests are bounded 0-1 (0 to 100%) and thus the observed score has to be a non-linear function of the underlying attribute.

Classic Test Theory considers all items to be random replicates of each other and total (or average) score to be the appropriate measure of the underlying attribute. Items are thought to be endorsed (passed) with an increasing probability as a function of the underlying trait. But if the trait is unbounded (just as there is always the possibility of someone being higher than the highest observed score, so is there a chance of someone being lower than the lowest observed score), and the score is bounded (from p=0 to p=1), then the relationship between the latent score and the observed score must be non-linear. This leads to the most simple of all models, one that has no parameters to estimate but is just a non-linear mapping of latent to observed:

\[\begin{equation} \tag{1} p(correct | \theta) = \frac{1}{1+ e^{ - \theta }} . \end{equation}\].

This function is a `logistic` function

The logistic, when scaled by 1.702, is very close to the cumulative normal function. We show this using the curve function.

curve(logistic(x*1.702),-3,3,main="logistic (solid) versus cumulative normal (dotted")
curve(pnorm(x),lty="dotted",add=TRUE)

Why do we compare this to the cumulative normal? If the some attribute is distributed normally (remember the central limit theorem), then the probability of having an attribute value less than x is the cumulative normal. We show this using the curve function.

curve(dnorm(x),-3,3,ylim=c(0,1),main="Normal curve and its sum")
curve(pnorm(x), add=TRUE,lty="dotted")

Item Response Theory

The Rasch model is a one paramter model (modeling item difficulty). Given a person with attribute value, \(\theta\), and a set of items with difficulty, \(\delta\), we model the probability of endorsing an item as \[\begin{equation} \tag{2} p(correct | \delta, \theta) = \frac{1}{1+ e^{\delta - \theta }} . \end{equation}\].

The problem becomes one of estimating the persons attribute value given a pattern of responses.

The approach suggested by Rasch was to solve equation 1 by examining the patterns of successes (1) and failures (0) for a participant.

This treats all items as differing in difficulty, but having equal discrimination. We do this with the rasch function in the ltm package.

The two parameter model

The problem with the Rasch model is that it assumes all items are equally discriminating. By adding an additional discrimination parameter, we model two parameters:

\[\begin{equation} \tag{3} p(correct | \delta, \theta, \alpha) = \frac{1}{1+ e^{\alpha*(\delta - \theta) }} . \end{equation}\].

Using the MIRT package, we can find these parameters.

Factor analysis and 2 PL IRT

By finding tetrachoric or polychoric correlations of the items, there is a one-to-one relationship between the parameters of factor analysis and the results of an IRT analysis. This is shown using the irt.fa function in psych.

Simulate some IRT type data

sim.data <- sim.irt(nvar=9, n=1000)
names(sim.data)

## [1] "items"          "discrimination" "difficulty"     "gamma"          "zeta"          
## [6] "theta"

sim.data$discrimination #by default, these are equal

## [1] 1 1 1 1 1 1 1 1 1

sim.data$difficulty  #by default, equally spaced

## [1] -3.00 -2.25 -1.50 -0.75  0.00  0.75  1.50  2.25  3.00

describe(sim.data$items)

##    vars    n mean   sd median trimmed mad min max range  skew kurtosis   se
## V1    1 1000 0.92 0.27      1    1.00   0   0   1     1 -3.09     7.57 0.01
## V2    2 1000 0.86 0.35      1    0.95   0   0   1     1 -2.10     2.40 0.01
## V3    3 1000 0.77 0.42      1    0.84   0   0   1     1 -1.31    -0.29 0.01
## V4    4 1000 0.64 0.48      1    0.67   0   0   1     1 -0.58    -1.67 0.02
## V5    5 1000 0.50 0.50      1    0.50   0   0   1     1 -0.01    -2.00 0.02
## V6    6 1000 0.34 0.47      0    0.30   0   0   1     1  0.67    -1.56 0.02
## V7    7 1000 0.20 0.40      0    0.12   0   0   1     1  1.51     0.27 0.01
## V8    8 1000 0.12 0.33      0    0.03   0   0   1     1  2.28     3.19 0.01
## V9    9 1000 0.07 0.25      0    0.00   0   0   1     1  3.49    10.20 0.01

irt.sim <- irt.fa(sim.data$items) #plot the information curves

plot.irt(irt.sim,type="ICC") #plot the Item Characteristic Curves

plot.irt(irt.sim, type ="test") #show test information

Generalizing to polytomous items

Use the MSQ item set from last week

keys <- list(onefactor = cs(active,alert, aroused, -sleepy,-tired, -drowsy,anxious, jittery,nervous, -calm,-relaxed, -at.ease),
        energy=cs(active,alert, aroused, -sleepy,-tired, -drowsy),
        tension =cs(anxious, jittery,nervous, -calm,-relaxed, -at.ease))
msq.items <- selectFromKeys(keys[1])
msq.items

##  onefactor1  onefactor2  onefactor3  onefactor4  onefactor5  onefactor6  onefactor7  onefactor8 
##    "active"     "alert"   "aroused"    "sleepy"     "tired"    "drowsy"   "anxious"   "jittery" 
##  onefactor9 onefactor10 onefactor11 onefactor12 
##   "nervous"      "calm"   "relaxed"   "at.ease"

Use those items from the full set of msq cases. First do it for all 12 items and one factor, and then do a 2 factor solution.

msq.irt <- irt.fa(msq[msq.items])

#now show the two factor solution

msq.irt2 <- irt.fa(msq[msq.items] ,2)

## Loading required namespace: GPArotation

Show the test information curves. First for the one factor solution, and then for the two factor solution.

plot.irt(msq.irt, type="test")

plot.irt(msq.irt2, type="test")

350.week9 Test theory

William Revelle

5/15/2023

Some basic test theory

Classic test theory: Observed scores = True scores + error

Classical model of reliability

The “New” psychometrics

Classic Test Theory as 0 parameter IRT

Psychological attributes are non-linear functions of probability

This function is a `logistic` function

Item Response Theory

The two parameter model

Factor analysis and 2 PL IRT

Simulate some IRT type data

Generalizing to polytomous items

350.week9 Test theory

William Revelle

5/15/2023

Some basic test theory

Classic test theory: Observed scores = True scores + error

Classical model of reliability

Alternate estimates of reliability all share this need for variance

The “New” psychometrics

Classic Test Theory as 0 parameter IRT

Psychological attributes are non-linear functions of probability

This function is a logistic function

Item Response Theory

The two parameter model

Factor analysis and 2 PL IRT

Simulate some IRT type data

Generalizing to polytomous items

This function is a `logistic` function