Comments about RMarkdown

RMarkdown is particular about spacing.

It is necessary to have a space after the # to make a heading.

The ``` that opens a chunk of R code must start in column 1.

First make sure we have psych and psychTools

Notice that although we asked to load just two packages, a whole set of other packages is loaded as well because we are using RStudio. These are all part of the overhead of using RMarkdown.

library(psych)
library(psychTools)
sessionInfo()

## R version 4.4.0 beta (2024-04-12 r86412)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] psychTools_2.4.4 psych_2.4.4     
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-164      cli_3.6.1         knitr_1.43        rlang_1.1.1       xfun_0.39        
##  [6] jsonlite_1.8.5    rtf_0.4-14.1      htmltools_0.5.8.1 sass_0.4.9        rmarkdown_2.26   
## [11] grid_4.4.0        evaluate_0.21     jquerylib_0.1.4   fastmap_1.1.1     yaml_2.3.7       
## [16] lifecycle_1.0.3   compiler_4.4.0    rstudioapi_0.16.0 R.oo_1.26.0       lattice_0.22-6   
## [21] digest_0.6.31     R6_2.5.1          foreign_0.8-86    mnormt_2.1.1      parallel_4.4.0   
## [26] bslib_0.7.0       R.methodsS3_1.8.2 tools_4.4.0       cachem_1.0.8
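One way to "make sure we have" the two packages before calling library() is sketched below. This is only an illustration in base R: the names needed, installed and missing are made up for the example, and the install step is commented out because it needs a network connection.

```r
# Illustrative check; 'needed', 'installed' and 'missing' are hypothetical names
needed <- c("psych", "psychTools")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing <- needed[!installed]

if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
  # install.packages(missing)   # uncomment to install from CRAN
} else {
  message("Both packages are available")
}
```

Only the packages named in library() end up under "other attached packages" in sessionInfo(); anything they depend on is listed as "loaded via a namespace".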

Manipulating data

When using a data file, it is likely that you will want to combine it with another file, sort it, examine just a few cases, etc. Today we work through a number of such operations.

We saw these last week when we worked on the reliability exercise, but today we will work through those in more detail.

In particular, we will work with the sai and msqR data files. First we get them, and find out the names of their variables.

dim(sai)  #what are the dimensions of this data set?

## [1] 5378   23

colnames(sai)  #what are the variables

##  [1] "study"       "time"        "id"          "calm"        "secure"      "tense"      
##  [7] "regretful"   "at.ease"     "upset"       "worrying"    "rested"      "anxious"    
## [13] "comfortable" "confident"   "nervous"     "jittery"     "high.strung" "relaxed"    
## [19] "content"     "worried"     "rattled"     "joyful"      "pleasant"

dim(msqR)

## [1] 6411   88

colnames(msqR)

##  [1] "active"       "afraid"       "alert"        "angry"        "aroused"      "ashamed"     
##  [7] "astonished"   "at.ease"      "at.rest"      "attentive"    "blue"         "bored"       
## [13] "calm"         "clutched.up"  "confident"    "content"      "delighted"    "depressed"   
## [19] "determined"   "distressed"   "drowsy"       "dull"         "elated"       "energetic"   
## [25] "enthusiastic" "excited"      "fearful"      "frustrated"   "full.of.pep"  "gloomy"      
## [31] "grouchy"      "guilty"       "happy"        "hostile"      "inspired"     "intense"     
## [37] "interested"   "irritable"    "jittery"      "lively"       "lonely"       "nervous"     
## [43] "placid"       "pleased"      "proud"        "quiescent"    "quiet"        "relaxed"     
## [49] "sad"          "satisfied"    "scared"       "serene"       "sleepy"       "sluggish"    
## [55] "sociable"     "sorry"        "still"        "strong"       "surprised"    "tense"       
## [61] "tired"        "unhappy"      "upset"        "vigorous"     "wakeful"      "warmhearted" 
## [67] "wide.awake"   "anxious"      "cheerful"     "idle"         "inactive"     "tranquil"    
## [73] "alone"        "kindly"       "scornful"     "Extraversion" "Neuroticism"  "Lie"         
## [79] "Sociability"  "Impulsivity"  "gender"       "TOD"          "drug"         "film"        
## [85] "time"         "id"           "form"         "study"

The data

For these examples we use small subsets of the larger msqR and sai data sets (in psychTools) and then specify which items to score for which analysis. The msqR data set is stored as a data.frame, which may be thought of as a spreadsheet with subjects as rows and variables as columns. (The $ operator specifies a particular column by name.) Both of these data sets represent data collected in multiple studies with different designs. Thus, to show the different studies and the number of subjects per occasion we use the table command. table(msqR$study,msqR$time) does a cross tabulation of two variables within the msqR data.frame, the study and the time variables.
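To see what $ and table do without the full data set, here is a small sketch on a made-up data frame. The name toy and all of its values are invented for illustration; they are not taken from msqR or sai.

```r
# Toy data, made up for illustration; not taken from msqR or sai
toy <- data.frame(
  study = c("A", "A", "A", "A", "B", "B"),
  time  = c(1, 1, 2, 2, 1, 1),
  calm  = c(2, 3, 1, 2, 0, 3)
)

toy$calm                            # one column, selected by name

tab <- table(toy$study, toy$time)   # rows are studies, columns are times
tab                                 # study B has no time 2 data, so that cell is 0
```

Each cell of the table counts the rows that share that combination of study and time, which is why a study given only once shows zeros in the later columns.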

Because the entire data set includes 6,411 rows for 3,032 unique subjects (some studies included multiple administrations), we will select just subjects from studies that meet particular criteria. That is, for short term test-dependability, we choose those studies where the SAI and MSQ were given twice in the same session (time = 1 and 2). For longer term stability (over 1-2 days), we choose those studies where the SAI and MSQ were given on different days (time = 1 and 3). We use the subset function to choose just those subjects who meet certain conditions (e.g., the first occasion data). We use “==” to represent equality.
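A minimal sketch of subset with "==" on a made-up data frame follows. The data frame toy and the result names first, within and across are hypothetical. Note that this toy example picks rows by time, whereas the analysis in these notes picks rows by study.

```r
# Toy data, made up for illustration: two subjects measured on three occasions
toy <- data.frame(
  id    = c(1, 2, 1, 2, 1, 2),
  time  = c(1, 1, 2, 2, 3, 3),
  tense = c(0, 2, 1, 1, 3, 0)
)

first  <- subset(toy, time == 1)              # first occasion only
within <- subset(toy, time == 1 | time == 2)  # same session (times 1 and 2)
across <- subset(toy, time == 1 | time == 3)  # different days (times 1 and 3)

nrow(first)
nrow(within)
nrow(across)
```

Inside subset the column can be named directly (time rather than toy$time), and "|" combines two conditions with a logical OR.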

table(sai$study,sai$time) #show the study names and sample sizes

##           
##              1   2   3   4
##   AGES      68  68   0   0
##   Cart      63  63   0   0
##   CITY     157   0   0   0
##   EMIT      71   0   0   0
##   Fast      94  94   0   0
##   FIAT      70  70   0   0
##   FILM      95  95  95   0
##   FLAT     170 170 170   0
##   GRAY     107   0   0   0
##   HOME      67  67   0   0
##   IMPS     102   0   0   0
##   ITEM      49   0   0   0
##   Maps     160   0   0   0
##   MITE      49   0   0   0
##   MIXX      71   0   0   0
##   PAT       65  65   0   0
##   PATS     132   0   0   0
##   RAFT      40   0   0   0
##   RIM      342   0 342   0
##   ROB       51   0  46   0
##   SALT     104 104   0   0
##   SAM      324   0 324   0
##   SHED      58  58   0   0
##   SHOP      98  98   0   0
##   SWAM.one  94   0   0   0
##   SWAM.two  54   0   0   0
##   VALE      77  77  70  70
##   XRAY     200 200   0   0

#Now, select some subsets for analysis using the subset function.
#the short term consistency sets

#use the subset command which chooses from a data frame the logical set defined in the second step

sai.control <- subset(sai,is.element(sai$study,c("Cart", "Fast", "SHED", "SHOP")) )

#let's take this apart
temp <- is.element(sai$study,c("Cart", "Fast", "SHED", "SHOP")) 
length(temp)

## [1] 5378

headTail(temp) #not very interesting, just a set of logical values (logical FALSE is 0, logical TRUE is 1)

##   [,1]            [,2]            [,3]            [,4]           
## h "FALSE"         "FALSE"         "FALSE"         "FALSE"        
##   "...       ..." "...       ..." "...       ..." "...       ..."
## t "FALSE"         "FALSE"         "FALSE"         "FALSE"

#therefore, we can find out how many rows were chosen
sum(temp)  #of the 5378 rows, 626 were in those four studies

## [1] 626

dim(sai.control) #these are the 626 rows for which the logical values were TRUE

## [1] 626  23

temp is a vector of logical values. We show this just to see the steps.
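Because TRUE counts as 1 and FALSE as 0, ordinary arithmetic works on a logical vector. The sketch below uses a short made-up vector (study and keep are illustrative names) to show the same steps on something small enough to read.

```r
# Made-up study labels, just for illustration
study <- c("Cart", "CITY", "Fast", "Cart", "XRAY")
keep  <- is.element(study, c("Cart", "Fast"))

keep          # one TRUE or FALSE per element of study
sum(keep)     # number of TRUE values
mean(keep)    # proportion of TRUE values
which(keep)   # positions of the TRUE values
```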

Get the MSQ data that match these sai data, using a similar (but different) approach.

table(msqR$study,msqR$time) #note how the same studies are shown.

##           
##              1   2   3   4
##   AGES      68  68   0   0
##   Cart      63  63   0   0
##   CITY     157 157   0   0
##   EMIT      71  71   0   0
##   Fast      94  94   0   0
##   FIAT      70  70   0   0
##   FILM      95  95  95   0
##   FLAT     170 170 170   0
##   GRAY     107 107   0   0
##   HOME      67  67   0   0
##   IMPS     102 102   0   0
##   ITEM      49  49   0   0
##   Maps     160 160   0   0
##   MITE      49  49   0   0
##   MIXX      71  71   0   0
##   PAT       65  65  65  65
##   PATS     132   0   0   0
##   RAFT      40  40   0   0
##   RIM      342   0 342   0
##   ROB       51  51  46  46
##   SALT     104 104   0   0
##   SAM      324   0 324   0
##   SHED      58  58   0   0
##   SHOP      98  98   0   0
##   SWAM.one  94   0   0   0
##   SWAM.two  54   0   0   0
##   VALE      77  77  70  70
##   XRAY     200 200   0   0

#will do this a slightly different way
select <- is.element(msqR$study,c("Cart", "Fast", "SHED", "SHOP"))
msq.control <- msqR[select, ] #just the selected cases
dim(msq.control)

## [1] 626  88
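The two approaches (subset versus indexing with a logical vector) select the same rows here, but they can differ when the condition contains missing values. The sketch below, on made-up data with hypothetical names toy, by.subset, by.index and sel, shows the difference and why is.element avoids it.

```r
# Toy data with one missing study label, made up for illustration
toy <- data.frame(
  study = c("Cart", "CITY", NA, "Fast"),
  score = c(1, 2, 3, 4)
)

# With "==" the condition is NA for the third row
by.subset <- subset(toy, study == "Cart" | study == "Fast")
by.index  <- toy[toy$study == "Cart" | toy$study == "Fast", ]

nrow(by.subset)   # subset() treats an NA condition as FALSE
nrow(by.index)    # logical indexing returns an extra row of NAs

# is.element() never returns NA, so the two approaches agree
sel <- is.element(toy$study, c("Cart", "Fast"))
sel
all.equal(subset(toy, sel), toy[sel, ])
```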

Select certain variables in msq

We will use the %in% operator

select.variables <- colnames(sai) %in% colnames(msqR)
select.variables  #this is a vector of TRUEs and FALSEs.

##  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
## [17] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

selected.variables <- colnames(sai)[select.variables] #just those that are TRUE
msq.selected <- msq.control[,selected.variables]
dim(msq.selected)

## [1] 626  13
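A small sketch of what %in% returns, using two short made-up vectors of names (a and b are illustrative). It also shows intersect, a base R function that gives the shared names in one step; that is offered as an alternative, not as what the code above does.

```r
# Two made-up sets of column names
a <- c("study", "time", "id", "calm", "secure", "tense")
b <- c("active", "calm", "tense", "time", "id", "study")

a %in% b          # one TRUE or FALSE per element of a
a[a %in% b]       # the shared names, in the order they appear in a
intersect(a, b)   # the same shared names
b[b %in% a]       # the shared names, in the order they appear in b
```

The order matters: selecting columns of one data frame with names taken from the other returns them in the order of the name vector, not in the original column order.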

Are these the same subjects? Correlate the ids

cor(sai.control[,3],msq.selected[,3])

## [1] 1

#cor is a bit finicky; try cor2
cor2(sai.control[,2:4],msq.selected[,2:4])

##       time    id  calm
## time  1.00  0.00 -0.12
## id    0.00  1.00 -0.02
## calm -0.16 -0.08  0.64
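A correlation of 1 between the ids shows that they line up, but it does not by itself show that they are equal, since any linear transformation of the ids also correlates 1. A stricter check compares the values directly. The sketch below uses two made-up data frames (x and y are hypothetical) whose ids differ by a constant.

```r
# Two made-up data frames whose ids differ by a constant of 10
x <- data.frame(study = c("A", "A", "B", "B"),
                time  = c(1, 2, 1, 2),
                id    = c(1, 1, 2, 2))
y <- data.frame(study = c("A", "A", "B", "B"),
                time  = c(1, 2, 1, 2),
                id    = c(11, 11, 12, 12))

cor(x$id, y$id)         # perfectly correlated even though the ids differ
identical(x$id, y$id)   # FALSE: the values are not the same
all(x$id == y$id)       # FALSE: element by element comparison

# Compare all three identifying columns row by row
same.rows <- x$study == y$study & x$time == y$time & x$id == y$id
sum(same.rows)          # number of rows that match on study, time and id
```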

Let's describe this data set

describe(msq.selected)

##           vars   n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## study*       1 626  2.61  1.13    2.0    2.64  1.48   1   4     3 -0.04    -1.40 0.05
## time         2 626  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.00 0.02
## id           3 626 41.93 25.99   40.0   40.71 29.65   1  98    97  0.32    -0.88 1.04
## calm         4 618  1.57  0.85    2.0    1.59  1.48   0   3     3 -0.07    -0.62 0.03
## tense        5 619  0.46  0.72    0.0    0.31  0.00   0   3     3  1.43     1.23 0.03
## at.ease      6 620  1.48  0.90    1.0    1.47  1.48   0   3     3  0.01    -0.79 0.04
## upset        7 620  0.38  0.69    0.0    0.22  0.00   0   3     3  1.94     3.43 0.03
## anxious      8 620  0.53  0.79    0.0    0.38  0.00   0   3     3  1.37     1.04 0.03
## confident    9 618  1.38  0.93    1.0    1.35  1.48   0   3     3  0.08    -0.87 0.04
## nervous     10 622  0.29  0.60    0.0    0.15  0.00   0   3     3  2.14     4.19 0.02
## jittery     11 621  0.41  0.67    0.0    0.28  0.00   0   3     3  1.64     2.34 0.03
## relaxed     12 622  1.60  0.89    2.0    1.62  1.48   0   3     3 -0.05    -0.77 0.04
## content     13 616  1.25  0.91    1.0    1.20  1.48   0   3     3  0.21    -0.81 0.04

350 Week 6 a: data manipulation

William Revelle

4/29/24