Two new functions have been added to psychTools. These are nice examples for today's class. Check the news to see what has changed.
packageDate("psychTools") #should be >= 05/10/24
## [1] "2024-05-10"
packageDate("psych" # should be >= 05/17/24
)
## [1] "2024-05-17"
news(Version =="2.4.4",package="psych")
## Changes in version 2.4.4 (2024-04-03)
##
## Introduction
##
## o The psych package includes functions and data sets to do
## classic and modern psychometrics and to analyze personality and
## experimental psychological data sets. The psych package has
## been developed as a supplement to courses in research methods
## in psychology, personality research, and graduate level
## psychometric theory courses. The functions are a supplement to
## the text (in progress): An introduction to psychometric theory
## with applications in R. The functions are also written to
## further research in the Personality-Motivation-Cognition
## Laboratory at Northwestern University.
##
## o Additional functions are added sporadically.
##
## o This news file reports changes that have been made as the
## package has been developed.
##
## o To report bugs, send email to <mailto:revelle@northwestern.edu>
## using bug.report.
##
## o Version 2.4.4 is the development release of the psych package.
## It is available as a source file for Macs or PCs in the
## repository at <https://personality-project.org/r/>. The
## released version on CRAN is 2.4.3 (Added March 15, 2024). The
## second digit reflects the year (i.e., 2024), the third the
## month (i.e., 2.3.12 was released in December of 2023); the last
## two digits of development versions reflect either a minor
## change or the day of any modifications, e.g., 1.8.3.3 was the
## third attempt to get 1.8.3 released. 1.7.8 was released in
## August, 2017.
##
## o To install this development version, use the command:
## install.packages("psych",repos="https://personality-project.org/r",type="source").
##
## Remember to restart R and library(psych) to make the new
## version active. It is also useful to check the date of the
## development package, as I will use the same version number but
## change the dates (e.g., packageDate("psych")). The date of the
## CRAN version reflects when it was submitted.
##
## Still to do
##
## o Change examples to dontrun rather than commented out ## mostly
## done
##
## o Add the ability to get factor scores for higher order factors
## (requested by Brandon Vaughn). Partly done in omega, but there
## are problems associated with the entire problem of factor
## scores.
##
## o Fix a bug in alpha when finding average R (need to use cor
## rather than cov2cor when treating missing data)
##
## o Add weight option to statsBy (but the question is how to handle
## missing data)
##
## o Adjust the se for correlations (in bestScales) to take into
## account the pairwise count, rather than the n.obs
##
## o Improve documentation for thurstone. Would help if we had an
## example of scaling.
##
## o scoreIrt.2pl needs to be able to adjust for group differences
## correctly
##
## o Possible problem in pooled correlations for statsBy
##
## o Think about improvements in itemLookup to show means if desired
##
## o Fix bug in polychoric where it doesn't handle empty cells very
## well. (reported by Björn Büdenbender). Or at least improve the
## documentation about why this is a problem.
##
## o Fix bug in mediate for multiple dependent variables (reported
## by Martin Zeschke)
##
## o scoreItem should find the mean item correlation with na.rm=TRUE
## (Liz D)
##
## o create a scoreFastBy
##
## o improve the help menu for sim.multi and return the latent
## values
##
## o think about linking person scores to item difficulty scores in
## IRT
##
## o Consider using the Jennrich approach to standard errors of
## rotated loadings
##
## o Still need to fix mediate to handle more than 1 DVs
##
## o Possible problem with the use of try in several functions.
##
## o Need to fix omega so that alpha is correctly calculated in case
## of no general factor
##
## o allow omega style hierarchical structure for x (and y) in esem
##
## Additions
##
## o Finally added Left justification to dictionary content in
## lookupItems and in bestScales
##
## o Added the ability to show probability stars in corPlot.
##
## o Added a short function (r2p) to convert r to p values.
##
## o Added the pval option to cor2
##
## o Modified fa.sort to include cor2 output
##
## o Added the score option to scoreOverlap so that if given raw
## data, it will find the scale scores as well as the corrected
## for overlap statistics. Note that the scale scores are not
## corrected for overlap.
##
## o Minor tweaks to scoreIrt.2pl to check for bad data (missing
## cells ). This was leading to the function hanging forever in
## the case of pairwise counts of 0.
##
## o Added fix.dlpyr (as a local function) to statsBy and describeBy
## to not choke on Tibbles. Had added this to alpha several years
## ago.
##
## o Added itemSort to combine item means and item content and then
## sort them.
##
## o Added the median option to describe with the fast=TRUE option
## (requested by David Condon)
##
## o Parallelized describe. Now will describe 644K subjects with
## 6633 variables in 64 seconds (with fast=TRUE option).
## Improvement in speed varies by numbers of cores.
##
## o Parallelized bigCor for very significant savings.
##
## o Added the hist.border option to pairs.panels following a
## request by Jordan Adamson.
##
## o Added the w.exp (weighting exponent to cohen.kappa following a
## request from Julius Pfadt)
##
## o Added n.obs to unidim to properly find statistics from
## correlation matrices
##
## o Added a number of fit statistics to unidim
##
## o Added the ability to find Spearman correlations to bigCor.
## Added a check that the data are a data.frame rather than a
## data.table for bigCor.
##
## o Changed the class of describe output to be c("psych",
## "Pdescribe" ,"data.frame") to avoid choking if Hmisc is in use
##
## o Added the ability to use mean.weights in statsBy (still under
## testing)
##
## o Added paSelect to allow parallel analyses of subsets of items
## using a keys list.
##
## o Added vssSelect to allow VSS analyses of subsets of items using
## a kes list.
##
## Bugs fixed
##
## o changed the number of elements in the unit.Circle in
## spider/radar to get more precision
##
## o Added a conversion to numeric in mediate to get around a
## problem with variables that are factors. (Problem reported by
## Lynn Bueeler )
##
## o Finally fixed bug in alpha where it finds R from cov2cor
## instead of directly (reported by Karl Ove Hufthammer )
##
## o Fixed item.lookup so that it will report item means as well as
## factor loadings.
##
## o Strange bug introduced into score.multiple.choice by qgraph
## reported by Michael Truong. Fixed by changing class of
## describe object used in score.multiple.choice to just be a
## data.frame.
##
## o Fixed df2latex for short names not labeling h2 (reported by
## Alex Weiss )
##
## o Fixed a problem in polychoric which was leading to NaN values
## in rare cases due to a precision problem in cumsum of
## frequencies (Reported by Esmerala Ruiz Pujada with a nice data
## set that led to the problem.)
##
## o Fixed a minor problem in print_psych.bestScales for
## dictionaries with mislabeled columns.
##
## o Fixed alpha to properly report std.alpha in case of a keys
## vector (reported by Benjamin Bui)
# I have already installed this so I am commenting it out
#install.packages("psych",repos="https://personality-project.org/r",type="source")
Make psych and psychTools active.
library(psych)
library(psychTools)
It is frequently necessary to combine information from different files into one unified object. The lookupFromKeys function is an example of this. We have a list of item scoring keys (a keys list) which tells us which items are to be scored from a data file. We have another file, an item dictionary, which gives the content of the items. We have yet another file, which may be created dynamically using either scoreItems or scoreOverlap, and which holds the correlation of each item with each scale. We want to examine the item content of each scale and include the item by scale correlation. Thus, we need to combine the keys, the dictionary, and the correlations. Let's first look at the result, and then examine how the function works.
bfi.keys #show the keys for the bfi
## $agree
## [1] "-A1" "A2" "A3" "A4" "A5"
##
## $conscientious
## [1] "C1" "C2" "C3" "-C4" "-C5"
##
## $extraversion
## [1] "-E1" "-E2" "E3" "E4" "E5"
##
## $neuroticism
## [1] "N1" "N2" "N3" "N4" "N5"
##
## $openness
## [1] "O1" "-O2" "O3" "O4" "-O5"
bfi.scores <- scoreItems(bfi.keys,bfi)#find the scores
names(bfi.scores) #what are the objects included?
## [1] "scores" "missing" "alpha" "av.r" "sn"
## [6] "n.items" "item.cor" "cor" "corrected" "G6"
## [11] "item.corrected" "response.freq" "raw" "ase" "med.r"
## [16] "keys" "MIMS" "MIMT" "Call"
bfi.overlap <- scoreOverlap(bfi.keys,bfi)#find the correlations corrected for overlap
#compare the two sets of correlations
combined.df <- data.frame(raw=bfi.scores$item.cor,corrected= bfi.overlap$item.cor)
difference <- combined.df[1:5] - combined.df[6:10]
round(difference,2) #show the differences
## raw.agree raw.conscientious raw.extraversion raw.neuroticism raw.openness
## A1 -0.17 0.01 0.02 -0.02 0.04
## A2 0.04 -0.06 -0.09 0.01 -0.07
## A3 0.04 -0.05 -0.11 0.01 -0.07
## A4 0.16 -0.07 -0.07 0.02 -0.01
## A5 0.05 -0.06 -0.13 0.02 -0.07
## C1 -0.03 0.04 -0.05 0.01 -0.09
## C2 -0.04 0.02 -0.05 0.00 -0.07
## C3 -0.04 0.07 -0.04 0.01 -0.03
## C4 0.04 -0.01 0.05 -0.03 0.05
## C5 0.04 -0.07 0.07 -0.04 0.02
## E1 0.06 0.01 -0.08 -0.01 0.05
## E2 0.08 0.05 -0.02 -0.03 0.05
## E3 -0.09 -0.05 0.02 0.01 -0.13
## E4 -0.10 -0.05 0.00 0.02 -0.03
## E5 -0.07 -0.10 0.04 0.01 -0.11
## N1 0.04 0.04 0.02 0.03 0.02
## N2 0.04 0.03 0.02 0.04 0.01
## N3 0.02 0.04 0.03 0.07 0.00
## N4 0.04 0.06 0.08 0.09 0.00
## N5 0.01 0.03 0.04 0.13 0.04
## O1 -0.04 -0.05 -0.08 0.01 0.04
## O2 0.00 0.04 0.02 -0.02 -0.18
## O3 -0.06 -0.06 -0.10 0.01 0.00
## O4 -0.02 -0.01 0.01 -0.02 0.14
## O5 0.02 0.04 0.02 -0.01 -0.10
#now, lookup the items from the dictionary and combine them with the correlations
lookupFromKeys(bfi.keys,dictionary=bfi.dictionary[1:3],cors=bfi.overlap$item.cor,n=5)
## $agree
## ItemLabel Item Giant3 cors
## A3 q_1206 Know how to comfort others. Cohesion 0.72
## A2 q_1162 Inquire about others' well-being. Cohesion 0.69
## A5 q_1419 Make people feel at ease. Cohesion 0.64
## A4 q_1364 Love children. Cohesion 0.50
## A1- q_146 Am indifferent to the feelings of others. Cohesion -0.40
##
## $conscientious
## ItemLabel Item Giant3 cors
## C4- q_626 Do things in a half-way manner. Stability -0.72
## C2 q_530 Continue until everything is perfect. Stability 0.68
## C5- q_1949 Waste my time. Stability -0.65
## C1 q_124 Am exacting in my work. Stability 0.60
## C3 q_619 Do things according to a plan. Stability 0.59
##
## $extraversion
## ItemLabel Item Giant3 cors
## E2- q_901 Find it difficult to approach others. Plasticity -0.76
## E4 q_1410 Make friends easily. Plasticity 0.74
## E3 q_1205 Know how to captivate people. Plasticity 0.66
## E1- q_712 Don't talk a lot. Plasticity -0.64
## E5 q_1768 Take charge. Plasticity 0.60
##
## $neuroticism
## ItemLabel Item Giant3 cors
## N1 q_952 Get angry easily. Stability 0.76
## N2 q_974 Get irritated easily. Stability 0.75
## N3 q_1099 Have frequent mood swings. Stability 0.74
## N4 q_1479 Often feel blue. Stability 0.62
## N5 q_1505 Panic easily. Stability 0.55
##
## $openness
## ItemLabel Item Giant3 cors
## O3 q_492 Carry the conversation to a higher level. Plasticity 0.67
## O1 q_128 Am full of ideas. Plasticity 0.57
## O5- q_1964 Will not probe deeply into a subject. Plasticity -0.57
## O2- q_316 Avoid difficult reading material. Plasticity -0.48
## O4 q_1738 Spend time reflecting on things. Plasticity 0.36
How does lookupFromKeys work? Just ask for it by name. Unfortunately, that does not include the comments, so I have copied the function from the source file, which does include them.
#adjusted 11/15/20 to add correlations if provided
"lookupFromKeys" <-
function(keys.list,dictionary,n=20,cors = NULL,sort=TRUE,suppress.names=FALSE,digits=2){
n.scales <- length(keys.list)
results <- item.cors <- result.df <- list()
for(i in 1:n.scales) {
list.name <- names(keys.list[i])
list.i <- keys.list[[i]]
keys <- rep(1,length(list.i))[1:(min(n,length(list.i)))]
neg <- grep("-", list.i[1:(min(n,length(list.i)))]) #find the negative keyed items
keys[neg] <- -1
select <- sub("-", "", list.i) #get rid of negative signs
results[[i]] <- lookup(select[1:(min(n,length(list.i)))],dictionary)
if(!is.null(rownames(results[[i]])[keys < 0])) rownames(results[[i]])[keys < 0] <- paste0(rownames(results[[i]])[keys<0],"-") #put them in
if(!is.null(cors)) { item.cors[[i]] <- round(cors[select[1:(min(n,length(select)))],i],digits=digits)
result.df[[i]] <- data.frame(results[[i]],cors=item.cors[[i]])
if(sort) {
ord <- order(abs(item.cors[[i]]),decreasing=TRUE)
result.df[[i]] <- result.df[[i]][ord,]
}
} else {result.df[[i]] <- data.frame(results[[i]])} #results[[i]] <- c(results[[i]],cors= round(cors[select[1:n],i],digits=digits))
if(suppress.names) names(results[[i]]) <- ""
# names(results[i]) <- list.name
}
names(result.df) <- names(keys.list)
return(result.df)}
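Note the sort option in the parameter list: with sort=FALSE the items stay in their keys-list order rather than being ranked by their correlations. A quick way to try it (output omitted here):
lookupFromKeys(bfi.keys, dictionary=bfi.dictionary[1:3], cors=bfi.overlap$item.cor, n=5, sort=FALSE)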
#this function also calls lookup
#lookup which x's are found in y[c],return matches for y[]
"lookup" <-
function(x,y,criteria=NULL,keep.na=FALSE) {
if (is.null(criteria)) {temp <- match(x,rownames(y))} else {
temp <- match(x,y[,criteria])}
if(any(!is.na(temp))) {
y <- (y[temp[!is.na(temp)],,drop=FALSE]) } else {y <- NA}
return(y)}
lookupFromKeys calls some other functions; how do they work? grep is one of the base functions for Pattern Matching and Replacement (grep: global regular expression search). According to its help page, "grep, grepl, regexpr, gregexpr and regexec search for matches to argument pattern within each element of a character vector: they differ in the format of and amount of detail in the results. sub and gsub perform replacement of the first and all matches respectively."
The call to grep is:
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
grep(value = FALSE) returns a vector of the indices of the elements of x that yielded a match (or not, for invert = TRUE). This will be an integer vector unless the input is a long vector, when it will be a double vector.
That is, grep searches the object x for the pattern and returns the indices of the elements of x in which the pattern was found; it is grepl that returns TRUE or FALSE for each element. If x is a vector, grep returns a vector of the locations in x where it found the pattern.
### demonstrate grep
set.seed(42) #so you get the same results
x <- sample(10,20,replace=TRUE)
x #show the values
## [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9
grep(10,x) #3 locations have value 10
## [1] 5 8 16
grep(9,x) #and 4 location have value 9
## [1] 4 13 19 20
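grep returns positions; its sibling grepl answers the same question with one logical value per element of x:
grepl(10,x) #TRUE exactly where grep(10,x) found a match
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [17] FALSE FALSE FALSE FALSE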
And the help menu for sub says:
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
`sub` and `gsub` return a character vector of the same length and with the same attributes as x (after possible coercion to character). Elements of character vectors x which are not substituted will be returned unchanged (including any declared encoding). If useBytes = FALSE a non-ASCII substituted result will often be in UTF-8 with a marked encoding (e.g., if there is a UTF-8 input, and in a multibyte locale unless fixed = TRUE). Such strings can be re-encoded by enc2native.
### demonstrate sub and gsub
x
## [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9
y <- sub(9,"hi",x) #converts to character string
y #but why did it replace all of them?
## [1] "1" "5" "1" "hi" "10" "4" "2" "10" "1" "8" "7" "4" "hi" "5" "4" "10" "2" "3" "hi"
## [20] "hi"
#x was a vector
x.string <- "This is a very long and convoluted sentence. We want to search it for all occasions of `a` and then change them"
sub('a','A', x.string) #just the first one is changed
## [1] "This is A very long and convoluted sentence. We want to search it for all occasions of `a` and then change them"
gsub('a','A', x.string) #they are all changed (*global sub*)
## [1] "This is A very long And convoluted sentence. We wAnt to seArch it for All occAsions of `A` And then chAnge them"
The lookup function from psych:
lookup
## function(x,y,criteria=NULL,keep.na=FALSE) {
## if (is.null(criteria)) {temp <- match(x,rownames(y))} else {
## temp <- match(x,y[,criteria])}
## if(any(!is.na(temp))) {
## y <- (y[temp[!is.na(temp)],,drop=FALSE]) } else {y <- NA}
## return(y)}
This calls match, another base function. According to its help page,
match returns a vector of the positions of (first) matches of its first argument in its second.
%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.
Try it
x
## [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9
match(10,x) #yes, 10 is in x (The 5th position)
## [1] 5
10 %in% x #it is still in x
## [1] TRUE
x %in% 10 #and now we get a vector of where in x it is
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [17] FALSE FALSE FALSE FALSE
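Inside lookup, match compares the requested item names to the rownames of the dictionary, and any that did not match (the NA positions) are dropped. A miniature sketch, with tiny.dict standing in for the real dictionary:
tiny.dict <- data.frame(Item=c("Get angry easily.","Don't talk a lot."), row.names=c("N1","E1"))
temp <- match(c("E1","XX"),rownames(tiny.dict)) #XX is not in the dictionary
temp
## [1]  2 NA
tiny.dict[temp[!is.na(temp)],,drop=FALSE] #returns just the E1 row
With these pieces in hand, the commented version of lookupFromKeys is easier to follow.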
#adjusted 11/15/20 to add correlations if provided
"lookupFromKeys" <- #the function name
function(keys.list,dictionary,n=1,cors = NULL,suppress.names=FALSE,digits=2){ #the parameters
n.scales <- length(keys.list) #dynamically find out how many scales to work on
results <- item.cors <- result.df <- list() #make up some lists to store the output
for(i in 1:n.scales) { #a for loop will repeat this next section
list.name <- names(keys.list[i]) #get the name of the next list
list.i <- keys.list[[i]] #grab the next set of keys
keys <- rep(1,length(list.i))[1:(min(n,length(list.i)))] #create a dummy key of 1s
neg <- grep("-", list.i[1:n]) #find which items are negatively keyed
keys[neg] <- -1 #change those that are negative to -1
select <- sub("-", "", list.i) #drop all - signs from the list.i
results[[i]] <- lookup(select[1:n],dictionary) #find the n items in the dictionary
if(!is.null(rownames(results[[i]])[keys < 0])) rownames(results[[i]])[keys < 0] <- paste0(rownames(results[[i]])[keys<0],"-") #put the negative sign back in
#these next few lines see if we want to add the correlations
if(!is.null(cors)) { item.cors[[i]] <- round(cors[select[1:n],i],digits=digits)
result.df[[i]] <- data.frame(results[[i]],cors=item.cors[[i]])} else {result.df[[i]] <- data.frame(results[[i]])} #results[[i]] <- c(results[[i]],cors= round(cors[select[1:n],i],digits=digits))
if(suppress.names) names(results[[i]]) <- "" #do we want the names of the keys
# names(results[i]) <- list.name
} #end of our loop
names(result.df) <- names(keys.list) #name the resulting data frame
return(result.df)} #return the values found
Try it with the bfi items and dictionary:
lookupFromKeys(bfi.keys, bfi.dictionary[1:3],n=2) #only show the first 3 columns
## $agree
## ItemLabel Item Giant3
## A1- q_146 Am indifferent to the feelings of others. Cohesion
## A2 q_1162 Inquire about others' well-being. Cohesion
##
## $conscientious
## ItemLabel Item Giant3
## C1 q_124 Am exacting in my work. Stability
## C2 q_530 Continue until everything is perfect. Stability
##
## $extraversion
## ItemLabel Item Giant3
## E1- q_712 Don't talk a lot. Plasticity
## E2- q_901 Find it difficult to approach others. Plasticity
##
## $neuroticism
## ItemLabel Item Giant3
## N1 q_952 Get angry easily. Stability
## N2 q_974 Get irritated easily. Stability
##
## $openness
## ItemLabel Item Giant3
## O1 q_128 Am full of ideas. Plasticity
## O2- q_316 Avoid difficult reading material. Plasticity
It would be nice to show the correlations of these items with their associated scale. This is done by combining the output from the scoreItems or scoreOverlap function. This was the example shown earlier.
lookupFromKeys(bfi.keys,dictionary=bfi.dictionary[1:3],cors=bfi.overlap$item.cor,n=5)
## $agree
## ItemLabel Item Giant3 cors
## A3 q_1206 Know how to comfort others. Cohesion 0.72
## A2 q_1162 Inquire about others' well-being. Cohesion 0.69
## A5 q_1419 Make people feel at ease. Cohesion 0.64
## A4 q_1364 Love children. Cohesion 0.50
## A1- q_146 Am indifferent to the feelings of others. Cohesion -0.40
##
## $conscientious
## ItemLabel Item Giant3 cors
## C4- q_626 Do things in a half-way manner. Stability -0.72
## C2 q_530 Continue until everything is perfect. Stability 0.68
## C5- q_1949 Waste my time. Stability -0.65
## C1 q_124 Am exacting in my work. Stability 0.60
## C3 q_619 Do things according to a plan. Stability 0.59
##
## $extraversion
## ItemLabel Item Giant3 cors
## E2- q_901 Find it difficult to approach others. Plasticity -0.76
## E4 q_1410 Make friends easily. Plasticity 0.74
## E3 q_1205 Know how to captivate people. Plasticity 0.66
## E1- q_712 Don't talk a lot. Plasticity -0.64
## E5 q_1768 Take charge. Plasticity 0.60
##
## $neuroticism
## ItemLabel Item Giant3 cors
## N1 q_952 Get angry easily. Stability 0.76
## N2 q_974 Get irritated easily. Stability 0.75
## N3 q_1099 Have frequent mood swings. Stability 0.74
## N4 q_1479 Often feel blue. Stability 0.62
## N5 q_1505 Panic easily. Stability 0.55
##
## $openness
## ItemLabel Item Giant3 cors
## O3 q_492 Carry the conversation to a higher level. Plasticity 0.67
## O1 q_128 Am full of ideas. Plasticity 0.57
## O5- q_1964 Will not probe deeply into a subject. Plasticity -0.57
## O2- q_316 Avoid difficult reading material. Plasticity -0.48
## O4 q_1738 Spend time reflecting on things. Plasticity 0.36
Now that we have seen the output of lookupFromKeys with the correlation information, it is appealing to change the function so that it will sort the items in each scale by the absolute value of the correlations. To do this, we take our function and change it a little bit. We will use the order function.
`order` returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments. sort.list does the same, using only one argument.
y <- order(x) #find the order
y #show them
## [1] 1 3 9 7 17 18 6 12 15 2 14 11 10 4 13 19 20 5 8 16
x[y] #reorganize x in terms of this order
## [1] 1 1 1 2 2 3 4 4 4 5 5 7 8 9 9 9 9 10 10 10
z <- order(x,decreasing=TRUE)
x[z] #show them in decreasing order
## [1] 10 10 10 9 9 9 9 8 7 5 5 4 4 4 3 2 2 1 1 1
See the examples for how to use these functions to sort data frames, etc. Also see dfOrder in psychTools.
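For our problem the trick is to order by the absolute value of the correlations, so that large negative correlations sort to the top along with large positive ones. Using the agree correlations shown earlier:
r <- c(A1=-.58,A2=.73,A3=.76,A4=.65,A5=.69) #item by scale correlations from above
r[order(abs(r),decreasing=TRUE)] #ranked by magnitude, signs kept
##    A3    A2    A5    A4    A1
##  0.76  0.73  0.69  0.65 -0.58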
#adjusted 11/15/20 to add correlations if provided
"lookupFromKeys2" <- #the function name (we use this name while testing it)
function(keys.list,dictionary,n=1,cors = NULL,sort=TRUE, suppress.names=FALSE,digits=2){ # add sort to option in the parameter list
n.scales <- length(keys.list) #dynamically find out how many scales to work on
results <- item.cors <- result.df <- list() #make up some lists to store the output
for(i in 1:n.scales) { #a for loop will repeat this next section
list.name <- names(keys.list[i]) #get the name of the next list
list.i <- keys.list[[i]] #grab the next set of keys
keys <- rep(1,length(list.i))[1:(min(n,length(list.i)))] #create a dummy key of 1s
neg <- grep("-", list.i[1:n]) #find which items are negatively keyed
keys[neg] <- -1 #change those that are negative to -1
select <- sub("-", "", list.i) #drop all - signs from the list.i
results[[i]] <- lookup(select[1:n],dictionary) #find the n items in the dictionary
if(!is.null(rownames(results[[i]])[keys < 0])) rownames(results[[i]])[keys < 0] <- paste0(rownames(results[[i]])[keys<0],"-") #put the negative sign back in
#these next few lines see if we want to add the correlations
if(!is.null(cors)) { item.cors[[i]] <- round(cors[select[1:n],i],digits=digits)
result.df[[i]] <- data.frame(results[[i]],cors=item.cors[[i]])
if(sort) {
ord <- order(abs(item.cors[[i]]),decreasing=TRUE)
result.df[[i]] <- result.df[[i]][ord,]
}
} else {result.df[[i]] <- data.frame(results[[i]])} #results[[i]] <- c(results[[i]],cors= round(cors[select[1:n],i],digits=digits))
if(suppress.names) names(results[[i]]) <- "" #do we want the names of the keys
# names(results[i]) <- list.name
} #end of our loop
names(result.df) <- names(keys.list) #name the resulting data frame
return(result.df)} #return the values found
lookupFromKeys2(bfi.keys, dictionary=bfi.dictionary[1:3], cors=bfi.scores$item.cor,n=5)
## $agree
## ItemLabel Item Giant3 cors
## A3 q_1206 Know how to comfort others. Cohesion 0.76
## A2 q_1162 Inquire about others' well-being. Cohesion 0.73
## A5 q_1419 Make people feel at ease. Cohesion 0.69
## A4 q_1364 Love children. Cohesion 0.65
## A1- q_146 Am indifferent to the feelings of others. Cohesion -0.58
##
## $conscientious
## ItemLabel Item Giant3 cors
## C4- q_626 Do things in a half-way manner. Stability -0.74
## C5- q_1949 Waste my time. Stability -0.72
## C2 q_530 Continue until everything is perfect. Stability 0.70
## C3 q_619 Do things according to a plan. Stability 0.66
## C1 q_124 Am exacting in my work. Stability 0.64
##
## $extraversion
## ItemLabel Item Giant3 cors
## E2- q_901 Find it difficult to approach others. Plasticity -0.78
## E4 q_1410 Make friends easily. Plasticity 0.74
## E1- q_712 Don't talk a lot. Plasticity -0.72
## E3 q_1205 Know how to captivate people. Plasticity 0.68
## E5 q_1768 Take charge. Plasticity 0.64
##
## $neuroticism
## ItemLabel Item Giant3 cors
## N3 q_1099 Have frequent mood swings. Stability 0.81
## N1 q_952 Get angry easily. Stability 0.80
## N2 q_974 Get irritated easily. Stability 0.78
## N4 q_1479 Often feel blue. Stability 0.71
## N5 q_1505 Panic easily. Stability 0.68
##
## $openness
## ItemLabel Item Giant3 cors
## O3 q_492 Carry the conversation to a higher level. Plasticity 0.67
## O5- q_1964 Will not probe deeply into a subject. Plasticity -0.67
## O2- q_316 Avoid difficult reading material. Plasticity -0.65
## O1 q_128 Am full of ideas. Plasticity 0.61
## O4 q_1738 Spend time reflecting on things. Plasticity 0.50
Once you get a basic function to work, it is tempting (addictively?) to keep making it better. This is partly in response to user requests. In fact, the current version of psychTools has included the improved version since 2020.
Sorting turns out to be useful for correlation matrices as well: alphabetical order of the variables is rarely informative, and matSort reorders a correlation matrix by its factor structure. Consider the sai (state anxiety) items:
R <- lowerCor(sai[4:23])
## calm secur tense rgrtf at.es upset wrryn restd anxis cmfrt cnfdn nervs jttry hgh.s
## calm 1.00
## secure 0.59 1.00
## tense -0.49 -0.32 1.00
## regretful -0.19 -0.27 0.33 1.00
## at.ease 0.70 0.61 -0.47 -0.24 1.00
## upset -0.32 -0.35 0.47 0.55 -0.36 1.00
## worrying -0.20 -0.29 0.34 0.46 -0.26 0.45 1.00
## rested 0.34 0.39 -0.22 -0.14 0.42 -0.23 -0.17 1.00
## anxious -0.37 -0.20 0.59 0.26 -0.33 0.33 0.33 -0.07 1.00
## comfortable 0.55 0.56 -0.36 -0.21 0.64 -0.32 -0.21 0.47 -0.21 1.00
## confident 0.36 0.63 -0.17 -0.22 0.47 -0.26 -0.23 0.38 -0.06 0.50 1.00
## nervous -0.38 -0.27 0.63 0.33 -0.37 0.39 0.36 -0.14 0.58 -0.24 -0.14 1.00
## jittery -0.40 -0.18 0.56 0.16 -0.34 0.22 0.18 -0.06 0.58 -0.21 -0.05 0.60 1.00
## high.strung -0.38 -0.18 0.55 0.18 -0.31 0.27 0.24 -0.04 0.54 -0.18 -0.03 0.51 0.63 1.00
## relaxed 0.70 0.54 -0.48 -0.22 0.69 -0.33 -0.24 0.41 -0.34 0.62 0.41 -0.37 -0.39 -0.38
## content 0.52 0.61 -0.32 -0.26 0.60 -0.37 -0.27 0.45 -0.18 0.63 0.56 -0.22 -0.15 -0.17
## worried -0.26 -0.33 0.45 0.49 -0.30 0.52 0.68 -0.20 0.39 -0.25 -0.24 0.49 0.24 0.31
## rattled -0.34 -0.14 0.49 0.16 -0.27 0.22 0.19 0.00 0.52 -0.16 0.00 0.50 0.64 0.62
## joyful 0.29 0.44 -0.17 -0.14 0.39 -0.24 -0.16 0.44 0.00 0.49 0.48 -0.07 0.02 0.02
## pleasant 0.49 0.58 -0.32 -0.23 0.57 -0.38 -0.24 0.49 -0.16 0.63 0.54 -0.20 -0.13 -0.14
## relxd cntnt worrd rttld joyfl plsnt
## relaxed 1.00
## content 0.59 1.00
## worried -0.29 -0.29 1.00
## rattled -0.31 -0.09 0.24 1.00
## joyful 0.35 0.57 -0.17 0.09 1.00
## pleasant 0.53 0.70 -0.27 -0.08 0.67 1.00
corPlot(R,main="Alabama need not come first",cex=.4) #doesn't have any clear order
f2 <- fa(R,2)
## Loading required namespace: GPArotation
Rs <- matSort(R,f2)
corPlot(Rs,xlas=3,main='Sorted by factor loadings', cex=.4)
Let's look at how matSort works.
#Two alternative names; matSort is the preferred one.
"mat.sort" <- "matSort" <-
function(m,f=NULL) {
if (is.null(f) ) {f <- fa(m) } #we gave it a factor matrix
if(is.list(f) && (!is.null(loadings(f)))) {load <- loadings(f)} else {load <- f}
load <- as.matrix(load)
nitems <- NROW(load)
nfactors <-NCOL(load)
loads <- data.frame(item=seq(1:nitems),cluster=rep(0,nitems),unclass(load))
#first sort them into clusters
#first find the maximum for each row and assign it to that cluster
loads$cluster <- apply(abs(load),1,which.max)
ord <- sort(loads$cluster,index.return=TRUE)
loads[1:nitems,] <- loads[ord$ix,]
rownames(loads)[1:nitems] <- rownames(loads)[ord$ix]
#now sort column wise
#now sort the loadings that have their highest loading on each cluster
items <- table(loads$cluster) #how many items are in each cluster?
first <- 1
item <- loads$item
for (i in 1:length(items)) {
if(items[i] > 0 ) {
last <- first + items[i]- 1
ord <- sort(abs(loads[first:last,i+2]),decreasing=TRUE,index.return=TRUE)
loads[first:last,3:(nfactors+2)] <- load[item[ord$ix+first-1],]
loads[first:last,1] <- item[ord$ix+first-1]
rownames(loads)[first:last] <- rownames(loads)[ord$ix+first-1]
first <- first + items[i] }
}
item.order <- loads[,1]
m <- m[item.order,item.order]
return(m)
}
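The heart of the row sort is the single line loads$cluster <- apply(abs(load),1,which.max): each item is assigned to the factor on which it has its largest absolute loading. A toy loading matrix (made up for illustration) shows the idea:
toy <- matrix(c(.7,.1,-.6,.2,.1,.8),ncol=2,byrow=TRUE, dimnames=list(c("V1","V2","V3"),NULL))
apply(abs(toy),1,which.max) #V1 and V2 go with factor 1, V3 with factor 2
## V1 V2 V3
##  1  1  2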
selectBy and splitBy
The selectBy function selects the subset of cases that meet specified conditions, using a quasi formula input. Try it on the bfi:
dim(bfi)
## [1] 2800 28
small <- selectBy(bfi, 'gender=2 & age < 35 & education > 3')
dim(small)
## [1] 306 28
headTail(small, from=21) #just show the last 8 columns
## O1 O2 O3 O4 O5 gender education age
## 61693 3 3 2 2 5 2 4 27
## 61700 5 2 5 5 1 2 5 24
## 61723 5 6 3 6 3 2 4 26
## 61754 5 3 3 6 3 2 4 32
## ... ... ... ... ... ... ... ... ...
## 67525 5 6 4 5 1 2 4 30
## 67539 5 4 4 3 4 2 5 33
## 67547 6 1 4 5 2 2 4 24
## 67556 5 1 6 4 3 2 4 29
How does this function work? Look at it. It uses grepl and other functions.
#written May 20, 2023
#copied the code from source
"selectBy" <- function(x,by) {#use a quasi formula input
by <- gsub(" ","", by) #this removes the spaces
if(grepl("\\|",by)) { AND <- FALSE
bb <- unlist(strsplit(by,"\\|"))} else { #note to search for a | we have to escape it!
AND <- TRUE
if(grepl(",",by)) {
bb <- unlist(strsplit(by,","))} else {
bb <- unlist(strsplit(by,"&"))} }
n.by <- length(bb)
by <- isvalue <- notvalue <- lessthan <- morethan <- matrix(NA,ncol= n.by)
eq <- grep("=",bb) #find which operators were used
lt <- grep("<",bb) #returns a vector
gr <- grep(">",bb)
ne <- grep("!=",bb)
#prepare the relevant search parameters
if(length(eq ) >0) {temp <- unlist(strsplit(bb[eq],"="))
by[eq] <- temp[1]
isvalue[eq] <-as.numeric( temp[2])
}
if( length(lt) >0) {temp <- unlist(strsplit(bb[lt],"<"))
by[lt] <- temp[1]
lessthan[lt] <- as.numeric( temp[2])
}
if(length(gr) >0) {temp <- unlist(strsplit(bb[gr],">"))
by[gr] <- temp[1]
morethan[gr] <- as.numeric( temp[2])
}
if(length(ne) >0) {temp <- unlist(strsplit(bb[ne],"!="))
by[ne] <- temp[1]
notvalue[ne] <-as.numeric( temp[2])
}
#make sure that the variable names are correct
if(!all(by %in% colnames(x))) {
cat("\n Offending variables are ",by[!by %in% colnames(x) ],"\n")
stop("Variables specified do not match the variables in the data. \nFix the names and try again")}
#do this on y which serves as pointers to x, rather than x #then combine the pointers for & (and) | (or)
y <- matrix(TRUE,nrow=NROW(x), ncol=n.by)
for(i in 1:length(by)) {
if(!is.na(isvalue[,i])) y[,i] <- x[,by[i]]==isvalue[i]
if(!is.na(notvalue[,i])) y[,i] <- (x[,by[i]]!= notvalue[i])
if(!is.na(lessthan[,i])) y[,i] <- (x[,by[i]]< lessthan[i])
if(!is.na(morethan[,i])) y[,i] <- (x[,by[i]] > morethan[i])
}
if(AND ) {y <-apply(y,1,all)} else {y <- apply(y,1,any)}
y[is.na(y) ] <- FALSE
return(x[y,])
}
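Why must the | be escaped in grepl("\\|",by)? In a regular expression a bare | is the alternation operator, and an empty alternation matches every string:
grepl("|","gender=2&age<35") #unescaped: matches any string at all
## [1] TRUE
grepl("\\|","gender=2&age<35") #escaped: no literal | in this string
## [1] FALSE
grepl("\\|","age<18|age>60") #escaped: here there is one
## [1] TRUE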
Another function, using much of the same logic as selectBy, converts specified variables into dichotomous (0,1) codes. This is splitBy. Note that the programming took advantage of the code in selectBy.
#written May 20, 2023
"splitBy" <- function(x,by,new=FALSE) {#use a quasi formula input
by <- gsub(" ","", by) #this removes the spaces
bb <- unlist(strsplit(by,","))
n.by <- length(bb)
by <- isvalue <- notvalue <- lessthan <- morethan <- matrix(NA,ncol= n.by)
eq <- grep("=",bb) #find which operators were used
lt <- grep("<",bb) #returns a vector
gr <- grep(">",bb)
ne <- grep("!=",bb)
#prepare the relevant search parameters
if(length(eq ) >0) {temp <- unlist(strsplit(bb[eq],"="))
by[eq] <- temp[1]
isvalue[eq] <-as.numeric( temp[2])
}
if( length(lt) >0) {temp <- unlist(strsplit(bb[lt],"<"))
by[lt] <- temp[1]
lessthan[lt] <- as.numeric( temp[2])
}
if(length(gr) >0) {temp <- unlist(strsplit(bb[gr],">"))
by[gr] <- temp[1]
morethan[gr] <- as.numeric( temp[2])
}
if(length(ne) >0) {temp <- unlist(strsplit(bb[ne],"!="))
by[ne] <- temp[1]
notvalue[ne] <-as.numeric( temp[2])
}
#do this on y which serves as pointers to x, rather than x #then combine the pointers for & (and) | (or)
if(!all(by %in% colnames(x))) {
cat("\n Offending variables are ",by[!by %in% colnames(x) ],"\n")
stop("Variables specified do not match the variables in the data. \nFix the names and try again")}
y <- matrix(TRUE,nrow=NROW(x), ncol=n.by)
colnames(y) <- paste0(c(by),"2")
for(i in 1:length(by)) { #check each value as true or false
if(!is.na(isvalue[,i])) y[,i] <- x[,by[i]]==isvalue[i]
if(!is.na(notvalue[,i])) y[,i] <- (x[,by[i]]!= notvalue[i])
if(!is.na(lessthan[,i])) y[,i] <- (x[,by[i]]< lessthan[i])
if(!is.na(morethan[,i])) y[,i] <- (x[,by[i]] > morethan[i])
}
#convert to numeric
y <- y +0 #convert TRUE FALSE to numeric
if(new){return(y)} else {return(cbind(x,y))}
}
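The y + 0 at the end is a compact coercion from logical to numeric, turning TRUE/FALSE into the 0/1 codes that splitBy returns:
c(TRUE,FALSE,TRUE) + 0 #adding 0 converts logical to numeric
## [1] 1 0 1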
Try this function on the bfi data set again. Dichotomize the age variable into low and high, and education into low and high.
bfi2 <- splitBy(bfi,'age < 25, education > 3')
lowerCor(bfi2[26:30]) #note that age2 = 1 means age < 25
## gendr edctn age age2 edct2
## gender 1.00
## education 0.01 1.00
## age 0.05 0.24 1.00
## age2 -0.05 -0.29 -0.75 1.00
## education2 -0.04 0.81 0.28 -0.31 1.00
describe(bfi2[26:30],skew=FALSE,range=FALSE)
## vars n mean sd se
## gender 1 2800 1.67 0.47 0.01
## education 2 2577 3.19 1.11 0.02
## age 3 2800 28.78 11.13 0.21
## age2 4 2800 0.46 0.50 0.01
## education2 5 2577 0.32 0.46 0.01