Two new functions have been added to psychTools. These are nice examples for today's class. Check the news to see what has changed.
packageDate("psychTools") #should be >= 05/10/24
## [1] "2024-05-10"
packageDate("psych" # should be >= 05/17/24
)
## [1] "2024-05-17"
news(Version =="2.4.4",package="psych")
## Changes in version 2.4.4 (2024-04-03)
##
## Introduction
##
## o The psych package includes functions and data sets to do
## classic and modern psychometrics and to analyze personality and
## experimental psychological data sets. The psych package has
## been developed as a supplement to courses in research methods
## in psychology, personality research, and graduate level
## psychometric theory courses. The functions are a supplement to
## the text (in progress): An introduction to psychometric theory
## with applications in R. The functions are also written to
## further research in the Personality-Motivation-Cognition
## Laboratory at Northwestern University.
##
## o Additional functions are added sporadically.
##
## o This news file reports changes that have been made as the
## package has been developed.
##
## o To report bugs, send email to <mailto:revelle@northwestern.edu>
## using bug.report.
##
## o Version 2.4.4 is the development release of the psych package.
## It is available as a source file for Macs or PCs in the
## repository at <https://personality-project.org/r/>. The
## released version on CRAN is 2.4.3 (Added March 15, 2024). The
## second digit reflects the year (i.e., 2024), the third the
## month (i.e., 2.3.12 was released in December of 2023); the last
## two digits of development versions reflect either a minor
## change or the day of any modifications, e.g., 1.8.3.3 was the
## third attempt to get 1.8.3 released. 1.7.8 was released in
## August, 2017.
##
## o To install this development version, use the command:
## install.packages("psych",repos="https://personality-project.org/r",type="source").
##
## Remember to restart R and library(psych) to make the new
## version active. It is also useful to check the date of the
## development package, as I will use the same version number but
## change the dates (e.g., packageDate("psych")). The date of the
## CRAN version reflects when it was submitted.
##
## Still to do
##
## o Change examples to dontrun rather than commented out ## mostly
## done
##
## o Add the ability to get factor scores for higher order factors
## (requested by Brandon Vaughn). Partly done in omega, but there
## are problems associated with the entire problem of factor
## scores.
##
## o Fix a bug in alpha when finding average R (need to use cor
## rather than cov2cor when treating missing data)
##
## o Add weight option to statsBy (but the question is how to handle
## missing data)
##
## o Adjust the se for correlations (in bestScales) to take into
## account the pairwise count, rather than the n.obs
##
## o Improve documentation for thurstone. Would help if we had an
## example of scaling.
##
## o scoreIrt.2pl needs to be able to adjust for group differences
## correctly
##
## o Possible problem in pooled correlations for statsBy
##
## o Think about improvements in itemLookup to show means if desired
##
## o Fix bug in polychoric where it doesn't handle empty cells very
## well. (reported by Björn Büdenbender). Or at least improve the
## documentation about why this is a problem.
##
## o Fix bug in mediate for multiple dependent variables (reported
## by Martin Zeschke)
##
## o scoreItem should find the mean item correlation with na.rm=TRUE
## (Liz D)
##
## o create a scoreFastBy
##
## o improve the help menu for sim.multi and return the latent
## values
##
## o think about linking person scores to item difficulty scores in
## IRT
##
## o Consider using the Jennrich approach to standard errors of
## rotated loadings
##
## o Still need to fix mediate to handle more than 1 DVs
##
## o Possible problem with the use of try in several functions.
##
## o Need to fix omega so that alpha is correctly calculated in case
## of no general factor
##
## o allow omega style hierarchical structure for x (and y) in esem
##
## Additions
##
## o Finally added Left justification to dictionary content in
## lookupItems and in bestScales
##
## o Added the ability to show probability stars in corPlot.
##
## o Added a short function (r2p) to convert r to p values.
##
## o Added the pval option to cor2
##
## o Modified fa.sort to include cor2 output
##
## o Added the score option to scoreOverlap so that if given raw
## data, it will find the scale scores as well as the corrected
## for overlap statistics. Note that the scale scores are not
## corrected for overlap.
##
## o Minor tweaks to scoreIrt.2pl to check for bad data (missing
## cells ). This was leading to the function hanging forever in
## the case of pairwise counts of 0.
##
## o Added fix.dlpyr (as a local function) to statsBy and describeBy
## to not choke on Tibbles. Had added this to alpha several years
## ago.
##
## o Added itemSort to combine item means and item content and then
## sort them.
##
## o Added the median option to describe with the fast=TRUE option
## (requested by David Condon)
##
## o Parallelized describe. Now will describe 644K subjects with
## 6633 variables in 64 seconds (with fast=TRUE option).
## Improvement in speed varies by numbers of cores.
##
## o Parallelized bigCor for very significant savings.
##
## o Added the hist.border option to pairs.panels following a
## request by Jordan Adamson.
##
## o Added the w.exp (weighting exponent to cohen.kappa following a
## request from Julius Pfadt)
##
## o Added n.obs to unidim to properly find statistics from
## correlation matrices
##
## o Added a number of fit statistics to unidim
##
## o Added the ability to find Spearman correlations to bigCor.
## Added a check that the data are a data.frame rather than a
## data.table for bigCor.
##
## o Changed the class of describe output to be c("psych",
## "Pdescribe" ,"data.frame") to avoid choking if Hmisc is in use
##
## o Added the ability to use mean.weights in statsBy (still under
## testing)
##
## o Added paSelect to allow parallel analyses of subsets of items
## using a keys list.
##
## o Added vssSelect to allow VSS analyses of subsets of items using
## a kes list.
##
## Bugs fixed
##
## o changed the number of elements in the unit.Circle in
## spider/radar to get more precision
##
## o Added a conversion to numeric in mediate to get around a
## problem with variables that are factors. (Problem reported by
## Lynn Bueeler )
##
## o Finally fixed bug in alpha where it finds R from cov2cor
## instead of directly (reported by Karl Ove Hufthammer )
##
## o Fixed item.lookup so that it will report item means as well as
## factor loadings.
##
## o Strange bug introduced into score.multiple.choice by qgraph
## reported by Michael Truong. Fixed by changing class of
## describe object used in score.multiple.choice to just be a
## data.frame.
##
## o Fixed df2latex for short names not labeling h2 (reported by
## Alex Weiss )
##
## o Fixed a problem in polychoric which was leading to NaN values
## in rare cases due to a precision problem in cumsum of
## frequencies (Reported by Esmerala Ruiz Pujada with a nice data
## set that led to the problem.)
##
## o Fixed a minor problem in print_psych.bestScales for
## dictionaries with mislabeled columns.
##
## o Fixed alpha to properly report std.alpha in case of a keys
## vector (reported by Benjamin Bui)
# I have already installed this so I am commenting it out
#install.packages("psych",repos="https://personality-project.org/r",type="source")
Make psych and psychTools active.
library(psych)
library(psychTools)
It is frequently necessary to combine information from different files into one unified object. The lookupFromKeys function is an example of this. We have a list of item scoring keys (a keys list) which tells us which items are to be scored from a data file. We have another file, an item dictionary, which gives the content of the items. We have yet another file, which may be created dynamically using either scoreItems or scoreOverlap, and which holds the correlation of each item with each scale. We want to examine the item content of each scale and include the item by scale correlation. Thus, we need to combine the keys, the dictionary, and the correlations. Let's first look at the result, and then examine how the function works.
bfi.keys #show the keys for the bfi
## $agree
## [1] "-A1" "A2" "A3" "A4" "A5"
##
## $conscientious
## [1] "C1" "C2" "C3" "-C4" "-C5"
##
## $extraversion
## [1] "-E1" "-E2" "E3" "E4" "E5"
##
## $neuroticism
## [1] "N1" "N2" "N3" "N4" "N5"
##
## $openness
## [1] "O1" "-O2" "O3" "O4" "-O5"
bfi.scores <- scoreItems(bfi.keys,bfi)#find the scores
names(bfi.scores) #what are the objects included?
## [1] "scores" "missing" "alpha" "av.r" "sn"
## [6] "n.items" "item.cor" "cor" "corrected" "G6"
## [11] "item.corrected" "response.freq" "raw" "ase" "med.r"
## [16] "keys" "MIMS" "MIMT" "Call"
bfi.overlap <- scoreOverlap(bfi.keys,bfi)#find the correlations corrected for overlap
#compare the two sets of correlations
combined.df <- data.frame(raw=bfi.scores$item.cor,corrected= bfi.overlap$item.cor)
difference <- combined.df[1:5] - combined.df[6:10]
round(difference,2) #show the differences
## raw.agree raw.conscientious raw.extraversion raw.neuroticism raw.openness
## A1 -0.17 0.01 0.02 -0.02 0.04
## A2 0.04 -0.06 -0.09 0.01 -0.07
## A3 0.04 -0.05 -0.11 0.01 -0.07
## A4 0.16 -0.07 -0.07 0.02 -0.01
## A5 0.05 -0.06 -0.13 0.02 -0.07
## C1 -0.03 0.04 -0.05 0.01 -0.09
## C2 -0.04 0.02 -0.05 0.00 -0.07
## C3 -0.04 0.07 -0.04 0.01 -0.03
## C4 0.04 -0.01 0.05 -0.03 0.05
## C5 0.04 -0.07 0.07 -0.04 0.02
## E1 0.06 0.01 -0.08 -0.01 0.05
## E2 0.08 0.05 -0.02 -0.03 0.05
## E3 -0.09 -0.05 0.02 0.01 -0.13
## E4 -0.10 -0.05 0.00 0.02 -0.03
## E5 -0.07 -0.10 0.04 0.01 -0.11
## N1 0.04 0.04 0.02 0.03 0.02
## N2 0.04 0.03 0.02 0.04 0.01
## N3 0.02 0.04 0.03 0.07 0.00
## N4 0.04 0.06 0.08 0.09 0.00
## N5 0.01 0.03 0.04 0.13 0.04
## O1 -0.04 -0.05 -0.08 0.01 0.04
## O2 0.00 0.04 0.02 -0.02 -0.18
## O3 -0.06 -0.06 -0.10 0.01 0.00
## O4 -0.02 -0.01 0.01 -0.02 0.14
## O5 0.02 0.04 0.02 -0.01 -0.10
#now, lookup the items from the dictionary and combine them with the correlations
lookupFromKeys(bfi.keys,dictionary=bfi.dictionary[1:3],cors=bfi.overlap$item.cor,n=5)
## $agree
## ItemLabel Item Giant3 cors
## A3 q_1206 Know how to comfort others. Cohesion 0.72
## A2 q_1162 Inquire about others' well-being. Cohesion 0.69
## A5 q_1419 Make people feel at ease. Cohesion 0.64
## A4 q_1364 Love children. Cohesion 0.50
## A1- q_146 Am indifferent to the feelings of others. Cohesion -0.40
##
## $conscientious
## ItemLabel Item Giant3 cors
## C4- q_626 Do things in a half-way manner. Stability -0.72
## C2 q_530 Continue until everything is perfect. Stability 0.68
## C5- q_1949 Waste my time. Stability -0.65
## C1 q_124 Am exacting in my work. Stability 0.60
## C3 q_619 Do things according to a plan. Stability 0.59
##
## $extraversion
## ItemLabel Item Giant3 cors
## E2- q_901 Find it difficult to approach others. Plasticity -0.76
## E4 q_1410 Make friends easily. Plasticity 0.74
## E3 q_1205 Know how to captivate people. Plasticity 0.66
## E1- q_712 Don't talk a lot. Plasticity -0.64
## E5 q_1768 Take charge. Plasticity 0.60
##
## $neuroticism
## ItemLabel Item Giant3 cors
## N1 q_952 Get angry easily. Stability 0.76
## N2 q_974 Get irritated easily. Stability 0.75
## N3 q_1099 Have frequent mood swings. Stability 0.74
## N4 q_1479 Often feel blue. Stability 0.62
## N5 q_1505 Panic easily. Stability 0.55
##
## $openness
## ItemLabel Item Giant3 cors
## O3 q_492 Carry the conversation to a higher level. Plasticity 0.67
## O1 q_128 Am full of ideas. Plasticity 0.57
## O5- q_1964 Will not probe deeply into a subject. Plasticity -0.57
## O2- q_316 Avoid difficult reading material. Plasticity -0.48
## O4 q_1738 Spend time reflecting on things. Plasticity 0.36
How does lookupFromKeys work? Just ask for it by name. Unfortunately, that does not include the comments, so I have copied the function from the source file, which does include them.
#adjusted 11/15/20 to add correlations if provided
"lookupFromKeys" <-
function(keys.list,dictionary,n=20,cors = NULL,sort=TRUE,suppress.names=FALSE,digits=2){
n.scales <- length(keys.list)
results <- item.cors <- result.df <- list()
for(i in 1:n.scales) {
list.name <- names(keys.list[i])
list.i <- keys.list[[i]]
keys <- rep(1,length(list.i))[1:(min(n,length(list.i)))]
neg <- grep("-", list.i[1:(min(n,length(list.i)))]) #find the negative keyed items
keys[neg] <- -1
select <- sub("-", "", list.i) #get rid of negative signs
results[[i]] <- lookup(select[1:(min(n,length(list.i)))],dictionary)
if(!is.null(rownames(results[[i]])[keys < 0])) rownames(results[[i]])[keys < 0] <- paste0(rownames(results[[i]])[keys<0],"-") #put them in
if(!is.null(cors)) { item.cors[[i]] <- round(cors[select[1:(min(n,length(select)))],i],digits=digits)
result.df[[i]] <- data.frame(results[[i]],cors=item.cors[[i]])
if(sort) {
ord <- order(abs(item.cors[[i]]),decreasing=TRUE)
result.df[[i]] <- result.df[[i]][ord,]
}
} else {result.df[[i]] <- data.frame(results[[i]])} #results[[i]] <- c(results[[i]],cors= round(cors[select[1:n],i],digits=digits))
if(suppress.names) names(results[[i]]) <- ""
# names(results[i]) <- list.name
}
names(result.df) <- names(keys.list)
return(result.df)}
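Note the sort option in the parameter list: with sort=FALSE the items stay in their keys-list order rather than being ranked by their correlations. A quick way to try it (output omitted here):
lookupFromKeys(bfi.keys, dictionary=bfi.dictionary[1:3], cors=bfi.overlap$item.cor, n=5, sort=FALSE)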
#this function also calls lookup
#lookup which x's are found in y[c],return matches for y[]
"lookup" <-
function(x,y,criteria=NULL,keep.na=FALSE) {
if (is.null(criteria)) {temp <- match(x,rownames(y))} else {
temp <- match(x,y[,criteria])}
if(any(!is.na(temp))) {
y <- (y[temp[!is.na(temp)],,drop=FALSE]) } else {y <- NA}
return(y)}
lookupFromKeys calls some other functions; how do they work? grep is one of the base functions for Pattern Matching and Replacement (grep: global regular expression search). According to its help page, "grep, grepl, regexpr, gregexpr and regexec search for matches to argument pattern within each element of a character vector: they differ in the format of and amount of detail in the results. sub and gsub perform replacement of the first and all matches respectively."
The call to grep is:
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
grep(value = FALSE) returns a vector of the indices of the elements of x that yielded a match (or not, for invert = TRUE). This will be an integer vector unless the input is a long vector, when it will be a double vector.
That is, grep searches the object x for the pattern and returns the indices of the elements of x in which the pattern was found; it is grepl that returns TRUE or FALSE for each element. If x is a vector, grep returns a vector of the locations in x where it found the pattern.
### demonstrate grep
set.seed(42) #so you get the same results
x <- sample(10,20,replace=TRUE)
x #show the values
## [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9
grep(10,x) #3 locations have value 10
## [1] 5 8 16
grep(9,x) #and 4 location have value 9
## [1] 4 13 19 20
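grep returns positions; its sibling grepl answers the same question with one logical value per element of x:
grepl(10,x) #TRUE exactly where grep(10,x) found a match
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [17] FALSE FALSE FALSE FALSE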
And the help menu for sub says:
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
`sub` and `gsub` return a character vector of the same length and with the same attributes as x (after possible coercion to character). Elements of character vectors x which are not substituted will be returned unchanged (including any declared encoding). If useBytes = FALSE a non-ASCII substituted result will often be in UTF-8 with a marked encoding (e.g., if there is a UTF-8 input, and in a multibyte locale unless fixed = TRUE). Such strings can be re-encoded by enc2native.
### demonstrate sub and gsub
x
## [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9
y <- sub(9,"hi",x) #converts to character string
y #but why did it replace all of them?
## [1] "1" "5" "1" "hi" "10" "4" "2" "10" "1" "8" "7" "4" "hi" "5" "4" "10" "2" "3" "hi"
## [20] "hi"
#x was a vector
x.string <- "This is a very long and convoluted sentence. We want to search it for all occasions of `a` and then change them"
sub('a','A', x.string) #just the first one is changed
## [1] "This is A very long and convoluted sentence. We want to search it for all occasions of `a` and then change them"
gsub('a','A', x.string) #they are all changed (*global sub*)
## [1] "This is A very long And convoluted sentence. We wAnt to seArch it for All occAsions of `A` And then chAnge them"
The lookup function from psych:
lookup
## function(x,y,criteria=NULL,keep.na=FALSE) {
## if (is.null(criteria)) {temp <- match(x,rownames(y))} else {
## temp <- match(x,y[,criteria])}
## if(any(!is.na(temp))) {
## y <- (y[temp[!is.na(temp)],,drop=FALSE]) } else {y <- NA}
## return(y)}
This calls match, another base function. According to its help page,
match returns a vector of the positions of (first) matches of its first argument in its second.
%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.
Try it
x
## [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9
match(10,x) #yes, 10 is in x (The 5th position)
## [1] 5
10 %in% x #it is still in x
## [1] TRUE
x %in% 10 #and now we get a vector of where in x it is
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [17] FALSE FALSE FALSE FALSE
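Inside lookup, match compares the requested item names to the rownames of the dictionary, and any that did not match (the NA positions) are dropped. A miniature sketch, with tiny.dict standing in for the real dictionary:
tiny.dict <- data.frame(Item=c("Get angry easily.","Don't talk a lot."), row.names=c("N1","E1"))
temp <- match(c("E1","XX"),rownames(tiny.dict)) #XX is not in the dictionary
temp
## [1]  2 NA
tiny.dict[temp[!is.na(temp)],,drop=FALSE] #returns just the E1 row
With these pieces in hand, the commented version of lookupFromKeys is easier to follow.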
#adjusted 11/15/20 to add correlations if provided
"lookupFromKeys" <- #the function name
function(keys.list,dictionary,n=1,cors = NULL,suppress.names=FALSE,digits=2){ #the parameters
n.scales <- length(keys.list) #dynamically find out how many scales to work on
results <- item.cors <- result.df <- list() #make up some lists to store the output
for(i in 1:n.scales) { #a for loop will repeat this next section
list.name <- names(keys.list[i]) #get the name of the next list
list.i <- keys.list[[i]] #grab the next set of keys
keys <- rep(1,length(list.i))[1:(min(n,length(list.i)))] #create a dummy key of 1s
neg <- grep("-", list.i[1:n]) #find which items are negatively keyed
keys[neg] <- -1 #change those that are negative to -1
select <- sub("-", "", list.i) #drop all - signs from the list.i
results[[i]] <- lookup(select[1:n],dictionary) #find the n items in the dictionary
if(!is.null(rownames(results[[i]])[keys < 0])) rownames(results[[i]])[keys < 0] <- paste0(rownames(results[[i]])[keys<0],"-") #put the negative sign back in
#these next few lines see if we want to add the correlations
if(!is.null(cors)) { item.cors[[i]] <- round(cors[select[1:n],i],digits=digits)
result.df[[i]] <- data.frame(results[[i]],cors=item.cors[[i]])} else {result.df[[i]] <- data.frame(results[[i]])} #results[[i]] <- c(results[[i]],cors= round(cors[select[1:n],i],digits=digits))
if(suppress.names) names(results[[i]]) <- "" #do we want the names of the keys
# names(results[i]) <- list.name
} #end of our loop
names(result.df) <- names(keys.list) #name the resulting data frame
return(result.df)} #return the values found
Try it with the bfi items and dictionary:
lookupFromKeys(bfi.keys, bfi.dictionary[1:3],n=2) #only show the first 3 columns
## $agree
## ItemLabel Item Giant3
## A1- q_146 Am indifferent to the feelings of others. Cohesion
## A2 q_1162 Inquire about others' well-being. Cohesion
##
## $conscientious
## ItemLabel Item Giant3
## C1 q_124 Am exacting in my work. Stability
## C2 q_530 Continue until everything is perfect. Stability
##
## $extraversion
## ItemLabel Item Giant3
## E1- q_712 Don't talk a lot. Plasticity
## E2- q_901 Find it difficult to approach others. Plasticity
##
## $neuroticism
## ItemLabel Item Giant3
## N1 q_952 Get angry easily. Stability
## N2 q_974 Get irritated easily. Stability
##
## $openness
## ItemLabel Item Giant3
## O1 q_128 Am full of ideas. Plasticity
## O2- q_316 Avoid difficult reading material. Plasticity
It would be nice to show the correlations of these items with their associated scale. This is done by combining the output from the scoreItems or scoreOverlap function. This was the example shown earlier.
lookupFromKeys(bfi.keys,dictionary=bfi.dictionary[1:3],cors=bfi.overlap$item.cor,n=5)
## $agree
## ItemLabel Item Giant3 cors
## A3 q_1206 Know how to comfort others. Cohesion 0.72
## A2 q_1162 Inquire about others' well-being. Cohesion 0.69
## A5 q_1419 Make people feel at ease. Cohesion 0.64
## A4 q_1364 Love children. Cohesion 0.50
## A1- q_146 Am indifferent to the feelings of others. Cohesion -0.40
##
## $conscientious
## ItemLabel Item Giant3 cors
## C4- q_626 Do things in a half-way manner. Stability -0.72
## C2 q_530 Continue until everything is perfect. Stability 0.68
## C5- q_1949 Waste my time. Stability -0.65
## C1 q_124 Am exacting in my work. Stability 0.60
## C3 q_619 Do things according to a plan. Stability 0.59
##
## $extraversion
## ItemLabel Item Giant3 cors
## E2- q_901 Find it difficult to approach others. Plasticity -0.76
## E4 q_1410 Make friends easily. Plasticity 0.74
## E3 q_1205 Know how to captivate people. Plasticity 0.66
## E1- q_712 Don't talk a lot. Plasticity -0.64
## E5 q_1768 Take charge. Plasticity 0.60
##
## $neuroticism
## ItemLabel Item Giant3 cors
## N1 q_952 Get angry easily. Stability 0.76
## N2 q_974 Get irritated easily. Stability 0.75
## N3 q_1099 Have frequent mood swings. Stability 0.74
## N4 q_1479 Often feel blue. Stability 0.62
## N5 q_1505 Panic easily. Stability 0.55
##
## $openness
## ItemLabel Item Giant3 cors
## O3 q_492 Carry the conversation to a higher level. Plasticity 0.67
## O1 q_128 Am full of ideas. Plasticity 0.57
## O5- q_1964 Will not probe deeply into a subject. Plasticity -0.57
## O2- q_316 Avoid difficult reading material. Plasticity -0.48
## O4 q_1738 Spend time reflecting on things. Plasticity 0.36
Now that we have seen the output of lookupFromKeys with the correlation information, it is appealing to change the function so that it will sort the items in each scale by the absolute value of the correlations. To do this, we take our function and change it a little bit. We will use the order function.
`order` returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments. sort.list does the same, using only one argument.
y <- order(x) #find the order
y #show them
## [1] 1 3 9 7 17 18 6 12 15 2 14 11 10 4 13 19 20 5 8 16
x[y] #reorganize x in terms of this order
## [1] 1 1 1 2 2 3 4 4 4 5 5 7 8 9 9 9 9 10 10 10
z <- order(x,decreasing=TRUE)
x[z] #show them in decreasing order
## [1] 10 10 10 9 9 9 9 8 7 5 5 4 4 4 3 2 2 1 1 1
See the examples for how to use these functions to sort data frames, etc. Also see dfOrder in psychTools.
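For our problem the trick is to order by the absolute value of the correlations, so that large negative correlations sort to the top along with large positive ones. Using the agree correlations shown earlier:
r <- c(A1=-.58,A2=.73,A3=.76,A4=.65,A5=.69) #item by scale correlations from above
r[order(abs(r),decreasing=TRUE)] #ranked by magnitude, signs kept
##    A3    A2    A5    A4    A1
##  0.76  0.73  0.69  0.65 -0.58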
#adjusted 11/15/20 to add correlations if provided
"lookupFromKeys2" <- #the function name (we use this name while testing it)
function(keys.list,dictionary,n=1,cors = NULL,sort=TRUE, suppress.names=FALSE,digits=2){ # add sort to option in the parameter list
n.scales <- length(keys.list) #dynamically find out how many scales to work on
results <- item.cors <- result.df <- list() #make up some lists to store the output
for(i in 1:n.scales) { #a for loop will repeat this next section
list.name <- names(keys.list[i]) #get the name of the next list
list.i <- keys.list[[i]] #grab the next set of keys
keys <- rep(1,length(list.i))[1:(min(n,length(list.i)))] #create a dummy key of 1s
neg <- grep("-", list.i[1:n]) #find which items are negatively keyed
keys[neg] <- -1 #change those that are negative to -1
select <- sub("-", "", list.i) #drop all - signs from the list.i
results[[i]] <- lookup(select[1:n],dictionary) #find the n items in the dictionary
if(!is.null(rownames(results[[i]])[keys < 0])) rownames(results[[i]])[keys < 0] <- paste0(rownames(results[[i]])[keys<0],"-") #put the negative sign back in
#these next few lines see if we want to add the correlations
if(!is.null(cors)) { item.cors[[i]] <- round(cors[select[1:n],i],digits=digits)
result.df[[i]] <- data.frame(results[[i]],cors=item.cors[[i]])
if(sort) {
ord <- order(abs(item.cors[[i]]),decreasing=TRUE)
result.df[[i]] <- result.df[[i]][ord,]
}
} else {result.df[[i]] <- data.frame(results[[i]])} #results[[i]] <- c(results[[i]],cors= round(cors[select[1:n],i],digits=digits))
if(suppress.names) names(results[[i]]) <- "" #do we want the names of the keys
# names(results[i]) <- list.name
} #end of our loop
names(result.df) <- names(keys.list) #name the resulting data frame
return(result.df)} #return the values found
lookupFromKeys2(bfi.keys, dictionary=bfi.dictionary[1:3], cors=bfi.scores$item.cor,n=5)
## $agree
## ItemLabel Item Giant3 cors
## A3 q_1206 Know how to comfort others. Cohesion 0.76
## A2 q_1162 Inquire about others' well-being. Cohesion 0.73
## A5 q_1419 Make people feel at ease. Cohesion 0.69
## A4 q_1364 Love children. Cohesion 0.65
## A1- q_146 Am indifferent to the feelings of others. Cohesion -0.58
##
## $conscientious
## ItemLabel Item Giant3 cors
## C4- q_626 Do things in a half-way manner. Stability -0.74
## C5- q_1949 Waste my time. Stability -0.72
## C2 q_530 Continue until everything is perfect. Stability 0.70
## C3 q_619 Do things according to a plan. Stability 0.66
## C1 q_124 Am exacting in my work. Stability 0.64
##
## $extraversion
## ItemLabel Item Giant3 cors
## E2- q_901 Find it difficult to approach others. Plasticity -0.78
## E4 q_1410 Make friends easily. Plasticity 0.74
## E1- q_712 Don't talk a lot. Plasticity -0.72
## E3 q_1205 Know how to captivate people. Plasticity 0.68
## E5 q_1768 Take charge. Plasticity 0.64
##
## $neuroticism
## ItemLabel Item Giant3 cors
## N3 q_1099 Have frequent mood swings. Stability 0.81
## N1 q_952 Get angry easily. Stability 0.80
## N2 q_974 Get irritated easily. Stability 0.78
## N4 q_1479 Often feel blue. Stability 0.71
## N5 q_1505 Panic easily. Stability 0.68
##
## $openness
## ItemLabel Item Giant3 cors
## O3 q_492 Carry the conversation to a higher level. Plasticity 0.67
## O5- q_1964 Will not probe deeply into a subject. Plasticity -0.67
## O2- q_316 Avoid difficult reading material. Plasticity -0.65
## O1 q_128 Am full of ideas. Plasticity 0.61
## O4 q_1738 Spend time reflecting on things. Plasticity 0.50
Once you get a basic function to work, it is tempting (addictively?) to keep making it better. This is partly in response to user requests. In fact, the current version of psychTools has included the improved version since 2020.
Sorting turns out to be useful for correlation matrices as well: alphabetical order of the variables is rarely informative, and matSort reorders a correlation matrix by its factor structure. Consider the sai (state anxiety) items:
R <- lowerCor(sai[4:23])
## calm secur tense rgrtf at.es upset wrryn restd anxis cmfrt cnfdn nervs jttry hgh.s
## calm 1.00
## secure 0.59 1.00
## tense -0.49 -0.32 1.00
## regretful -0.19 -0.27 0.33 1.00
## at.ease 0.70 0.61 -0.47 -0.24 1.00
## upset -0.32 -0.35 0.47 0.55 -0.36 1.00
## worrying -0.20 -0.29 0.34 0.46 -0.26 0.45 1.00
## rested 0.34 0.39 -0.22 -0.14 0.42 -0.23 -0.17 1.00
## anxious -0.37 -0.20 0.59 0.26 -0.33 0.33 0.33 -0.07 1.00
## comfortable 0.55 0.56 -0.36 -0.21 0.64 -0.32 -0.21 0.47 -0.21 1.00
## confident 0.36 0.63 -0.17 -0.22 0.47 -0.26 -0.23 0.38 -0.06 0.50 1.00
## nervous -0.38 -0.27 0.63 0.33 -0.37 0.39 0.36 -0.14 0.58 -0.24 -0.14 1.00
## jittery -0.40 -0.18 0.56 0.16 -0.34 0.22 0.18 -0.06 0.58 -0.21 -0.05 0.60 1.00
## high.strung -0.38 -0.18 0.55 0.18 -0.31 0.27 0.24 -0.04 0.54 -0.18 -0.03 0.51 0.63 1.00
## relaxed 0.70 0.54 -0.48 -0.22 0.69 -0.33 -0.24 0.41 -0.34 0.62 0.41 -0.37 -0.39 -0.38
## content 0.52 0.61 -0.32 -0.26 0.60 -0.37 -0.27 0.45 -0.18 0.63 0.56 -0.22 -0.15 -0.17
## worried -0.26 -0.33 0.45 0.49 -0.30 0.52 0.68 -0.20 0.39 -0.25 -0.24 0.49 0.24 0.31
## rattled -0.34 -0.14 0.49 0.16 -0.27 0.22 0.19 0.00 0.52 -0.16 0.00 0.50 0.64 0.62
## joyful 0.29 0.44 -0.17 -0.14 0.39 -0.24 -0.16 0.44 0.00 0.49 0.48 -0.07 0.02 0.02
## pleasant 0.49 0.58 -0.32 -0.23 0.57 -0.38 -0.24 0.49 -0.16 0.63 0.54 -0.20 -0.13 -0.14
## relxd cntnt worrd rttld joyfl plsnt
## relaxed 1.00
## content 0.59 1.00
## worried -0.29 -0.29 1.00
## rattled -0.31 -0.09 0.24 1.00
## joyful 0.35 0.57 -0.17 0.09 1.00
## pleasant 0.53 0.70 -0.27 -0.08 0.67 1.00
corPlot(R,main="Alabama need not come first",cex=.4) #doesn't have any clear order
f2 <- fa(R,2)
## Loading required namespace: GPArotation
Rs <- matSort(R,f2)
corPlot(Rs,xlas=3,main='Sorted by factor loadings', cex=.4)
Let's look at how matSort works.
#Two alternative names; matSort is the preferred one.
"mat.sort" <- "matSort" <-
function(m,f=NULL) {
if (is.null(f) ) {f <- fa(m) } #we gave it a factor matrix
if(is.list(f) && (!is.null(loadings(f)))) {load <- loadings(f)} else {load <- f}
load <- as.matrix(load)
nitems <- NROW(load)
nfactors <-NCOL(load)
loads <- data.frame(item=seq(1:nitems),cluster=rep(0,nitems),unclass(load))
#first sort them into clusters
#first find the maximum for each row and assign it to that cluster
loads$cluster <- apply(abs(load),1,which.max)
ord <- sort(loads$cluster,index.return=TRUE)
loads[1:nitems,] <- loads[ord$ix,]
rownames(loads)[1:nitems] <- rownames(loads)[ord$ix]
#now sort column wise
#now sort the loadings that have their highest loading on each cluster
items <- table(loads$cluster) #how many items are in each cluster?
first <- 1
item <- loads$item
for (i in 1:length(items)) {
if(items[i] > 0 ) {
last <- first + items[i]- 1
ord <- sort(abs(loads[first:last,i+2]),decreasing=TRUE,index.return=TRUE)
loads[first:last,3:(nfactors+2)] <- load[item[ord$ix+first-1],]
loads[first:last,1] <- item[ord$ix+first-1]
rownames(loads)[first:last] <- rownames(loads)[ord$ix+first-1]
first <- first + items[i] }
}
item.order <- loads[,1]
m <- m[item.order,item.order]
return(m)
}
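The heart of the row sort is the single line loads$cluster <- apply(abs(load),1,which.max): each item is assigned to the factor on which it has its largest absolute loading. A toy loading matrix (made up for illustration) shows the idea:
toy <- matrix(c(.7,.1,-.6,.2,.1,.8),ncol=2,byrow=TRUE, dimnames=list(c("V1","V2","V3"),NULL))
apply(abs(toy),1,which.max) #V1 and V2 go with factor 1, V3 with factor 2
## V1 V2 V3
##  1  1  2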
selectBy and splitBy
The selectBy function selects the subset of cases that meet specified conditions, using a quasi formula input. Try it on the bfi:
dim(bfi)
## [1] 2800 28
small <- selectBy(bfi, 'gender=2 & age < 35 & education > 3')
dim(small)
## [1] 306 28
headTail(small, from=21) #just show the last 8 columns
## O1 O2 O3 O4 O5 gender education age
## 61693 3 3 2 2 5 2 4 27
## 61700 5 2 5 5 1 2 5 24
## 61723 5 6 3 6 3 2 4 26
## 61754 5 3 3 6 3 2 4 32
## ... ... ... ... ... ... ... ... ...
## 67525 5 6 4 5 1 2 4 30
## 67539 5 4 4 3 4 2 5 33
## 67547 6 1 4 5 2 2 4 24
## 67556 5 1 6 4 3 2 4 29
How does this function work? Look at it. It uses grepl and other functions.
#written May 20, 2023
#copied the code from source
"selectBy" <- function(x,by) {#use a quasi formula input
by <- gsub(" ","", by) #this removes the spaces
if(grepl("\\|",by)) { AND <- FALSE
bb <- unlist(strsplit(by,"\\|"))} else { #note to search for a | we have to escape it!
AND <- TRUE
if(grepl(",",by)) {
bb <- unlist(strsplit(by,","))} else {
bb <- unlist(strsplit(by,"&"))} }
n.by <- length(bb)
by <- isvalue <- notvalue <- lessthan <- morethan <- matrix(NA,ncol= n.by)
eq <- grep("=",bb) #find which operators were used
lt <- grep("<",bb) #returns a vector
gr <- grep(">",bb)
ne <- grep("!=",bb)
#prepare the relevant search parameters
if(length(eq ) >0) {temp <- unlist(strsplit(bb[eq],"="))
by[eq] <- temp[1]
isvalue[eq] <-as.numeric( temp[2])
}
if( length(lt) >0) {temp <- unlist(strsplit(bb[lt],"<"))
by[lt] <- temp[1]
lessthan[lt] <- as.numeric( temp[2])
}
if(length(gr) >0) {temp <- unlist(strsplit(bb[gr],">"))
by[gr] <- temp[1]
morethan[gr] <- as.numeric( temp[2])
}
if(length(ne) >0) {temp <- unlist(strsplit(bb[ne],"!="))
by[ne] <- temp[1]
notvalue[ne] <-as.numeric( temp[2])
}
#make sure that the variable names are correct
if(!all(by %in% colnames(x))) {
cat("\n Offending variables are ",by[!by %in% colnames(x) ],"\n")
stop("Variables specified do not match the variables in the data. \nFix the names and try again")}
#do this on y which serves as pointers to x, rather than x #then combine the pointers for & (and) | (or)
y <- matrix(TRUE,nrow=NROW(x), ncol=n.by)
for(i in 1:length(by)) {
if(!is.na(isvalue[,i])) y[,i] <- x[,by[i]]==isvalue[i]
if(!is.na(notvalue[,i])) y[,i] <- (x[,by[i]]!= notvalue[i])
if(!is.na(lessthan[,i])) y[,i] <- (x[,by[i]]< lessthan[i])
if(!is.na(morethan[,i])) y[,i] <- (x[,by[i]] > morethan[i])
}
if(AND ) {y <-apply(y,1,all)} else {y <- apply(y,1,any)}
y[is.na(y) ] <- FALSE
return(x[y,])
}
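Why must the | be escaped in grepl("\\|",by)? In a regular expression a bare | is the alternation operator, and an empty alternation matches every string:
grepl("|","gender=2&age<35") #unescaped: matches any string at all
## [1] TRUE
grepl("\\|","gender=2&age<35") #escaped: no literal | in this string
## [1] FALSE
grepl("\\|","age<18|age>60") #escaped: here there is one
## [1] TRUE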
Another function, using much of the same logic as selectBy, converts specified variables into dichotomous (0,1) codes. This is splitBy. Note that the programming took advantage of the code in selectBy.
#written May 20, 2023
"splitBy" <- function(x,by,new=FALSE) {#use a quasi formula input
by <- gsub(" ","", by) #this removes the spaces
bb <- unlist(strsplit(by,","))
n.by <- length(bb)
by <- isvalue <- notvalue <- lessthan <- morethan <- matrix(NA,ncol= n.by)
eq <- grep("=",bb) #find which operators were used
lt <- grep("<",bb) #returns a vector
gr <- grep(">",bb)
ne <- grep("!=",bb)
#prepare the relevant search parameters
if(length(eq ) >0) {temp <- unlist(strsplit(bb[eq],"="))
by[eq] <- temp[1]
isvalue[eq] <-as.numeric( temp[2])
}
if( length(lt) >0) {temp <- unlist(strsplit(bb[lt],"<"))
by[lt] <- temp[1]
lessthan[lt] <- as.numeric( temp[2])
}
if(length(gr) >0) {temp <- unlist(strsplit(bb[gr],">"))
by[gr] <- temp[1]
morethan[gr] <- as.numeric( temp[2])
}
if(length(ne) >0) {temp <- unlist(strsplit(bb[ne],"!="))
by[ne] <- temp[1]
notvalue[ne] <-as.numeric( temp[2])
}
#do this on y which serves as pointers to x, rather than x #then combine the pointers for & (and) | (or)
if(!all(by %in% colnames(x))) {
cat("\n Offending variables are ",by[!by %in% colnames(x) ],"\n")
stop("Variables specified do not match the variables in the data. \nFix the names and try again")}
y <- matrix(TRUE,nrow=NROW(x), ncol=n.by)
colnames(y) <- paste0(c(by),"2")
for(i in 1:length(by)) { #check each value as true or false
if(!is.na(isvalue[,i])) y[,i] <- x[,by[i]]==isvalue[i]
if(!is.na(notvalue[,i])) y[,i] <- (x[,by[i]]!= notvalue[i])
if(!is.na(lessthan[,i])) y[,i] <- (x[,by[i]]< lessthan[i])
if(!is.na(morethan[,i])) y[,i] <- (x[,by[i]] > morethan[i])
}
#convert to numeric
y <- y +0 #convert TRUE FALSE to numeric
if(new){return(y)} else {return(cbind(x,y))}
}
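The y + 0 at the end is a compact coercion from logical to numeric, turning TRUE/FALSE into the 0/1 codes that splitBy returns:
c(TRUE,FALSE,TRUE) + 0 #adding 0 converts logical to numeric
## [1] 1 0 1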
Try this function on the bfi data set again. Dichotomize the age variable into low and high, and education into low and high.
bfi2 <- splitBy(bfi,'age < 25, education > 3')
lowerCor(bfi2[26:30]) #note that age2 = 1 means age < 25
## gendr edctn age age2 edct2
## gender 1.00
## education 0.01 1.00
## age 0.05 0.24 1.00
## age2 -0.05 -0.29 -0.75 1.00
## education2 -0.04 0.81 0.28 -0.31 1.00
describe(bfi2[26:30],skew=FALSE,range=FALSE)
## vars n mean sd se
## gender 1 2800 1.67 0.47 0.01
## education 2 2577 3.19 1.11 0.02
## age 3 2800 28.78 11.13 0.21
## age2 4 2800 0.46 0.50 0.01
## education2 5 2577 0.32 0.46 0.01