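Notice that although we load only two packages, sessionInfo() reports a whole set of other packages as "loaded via a namespace (and not attached)". Some of these (e.g., mnormt, lattice, and nlme) are packages that psych itself uses. Others (e.g., knitr, rmarkdown, and rstudioapi) are there because we are knitting the document with RMarkdown in RStudio; they are part of the overhead of using RMarkdown.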
library(psych)
library(psychTools)
sessionInfo()
## R version 4.4.0 beta (2024-04-12 r86412)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] psychTools_2.4.4 psych_2.4.4
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-164 cli_3.6.1 knitr_1.43 rlang_1.1.1 xfun_0.39
## [6] jsonlite_1.8.5 rtf_0.4-14.1 htmltools_0.5.8.1 sass_0.4.9 rmarkdown_2.26
## [11] grid_4.4.0 evaluate_0.21 jquerylib_0.1.4 fastmap_1.1.1 yaml_2.3.7
## [16] lifecycle_1.0.3 compiler_4.4.0 rstudioapi_0.16.0 R.oo_1.26.0 lattice_0.22-6
## [21] digest_0.6.31 R6_2.5.1 foreign_0.8-86 mnormt_2.1.1 parallel_4.4.0
## [26] bslib_0.7.0 R.methodsS3_1.8.2 tools_4.4.0 cachem_1.0.8
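The distinction sessionInfo() draws between attached packages and packages that are only loaded via a namespace can also be checked directly with two base R functions. This is a minimal sketch; the names it returns will differ from session to session.

```r
# Packages attached to the search path (what library() adds to)
attached <- .packages()
# Every namespace currently loaded, whether or not it is attached
loaded <- loadedNamespaces()
# Namespaces loaded only as dependencies: the "loaded via a namespace" list
setdiff(loaded, attached)
```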
When using a data file, it is likely that you will want to combine it with another file, sort it, examine just a few cases, etc. Today we work through a number of such operations.
We saw some of these last week in the reliability exercise; today we work through them in more detail.
In particular, we will work with the sai and msqR data files from psychTools. First we check their dimensions and find out their variable names.
dim(sai) #what are the dimensions of this data set?
## [1] 5378 23
colnames(sai) #what are the variables
## [1] "study" "time" "id" "calm" "secure" "tense"
## [7] "regretful" "at.ease" "upset" "worrying" "rested" "anxious"
## [13] "comfortable" "confident" "nervous" "jittery" "high.strung" "relaxed"
## [19] "content" "worried" "rattled" "joyful" "pleasant"
dim(msqR)
## [1] 6411 88
colnames(msqR)
## [1] "active" "afraid" "alert" "angry" "aroused" "ashamed"
## [7] "astonished" "at.ease" "at.rest" "attentive" "blue" "bored"
## [13] "calm" "clutched.up" "confident" "content" "delighted" "depressed"
## [19] "determined" "distressed" "drowsy" "dull" "elated" "energetic"
## [25] "enthusiastic" "excited" "fearful" "frustrated" "full.of.pep" "gloomy"
## [31] "grouchy" "guilty" "happy" "hostile" "inspired" "intense"
## [37] "interested" "irritable" "jittery" "lively" "lonely" "nervous"
## [43] "placid" "pleased" "proud" "quiescent" "quiet" "relaxed"
## [49] "sad" "satisfied" "scared" "serene" "sleepy" "sluggish"
## [55] "sociable" "sorry" "still" "strong" "surprised" "tense"
## [61] "tired" "unhappy" "upset" "vigorous" "wakeful" "warmhearted"
## [67] "wide.awake" "anxious" "cheerful" "idle" "inactive" "tranquil"
## [73] "alone" "kindly" "scornful" "Extraversion" "Neuroticism" "Lie"
## [79] "Sociability" "Impulsivity" "gender" "TOD" "drug" "film"
## [85] "time" "id" "form" "study"
For these examples we use small subsets of the larger msqR and sai data sets (both in psychTools) and then specify which items to score for which analysis. The msqR data set is stored as a data.frame, which may be thought of as a spreadsheet with subjects as rows and variables as columns. (The $ operator specifies a particular column by name.) Both of these data sets combine data collected in multiple studies with different designs. Thus, to show the different studies and the number of subjects per occasion, we use the table command.
The command table(msqR$study, msqR$time) does a cross tabulation of two variables within the msqR data.frame: study and time.
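To see what $ and table() do without loading psychTools, here is a minimal sketch on a small made-up data frame. The name toy and all of its values are hypothetical, chosen only to mimic the study/time layout of msqR.

```r
# A tiny made-up data frame standing in for msqR (hypothetical values)
toy <- data.frame(
  study = c("Cart", "Cart", "Fast", "Fast", "RIM", "RIM"),
  time  = c(1, 2, 1, 2, 1, 3),
  calm  = c(2, 1, 3, NA, 0, 2)
)
toy$study                           # $ pulls out one column by name
tab <- table(toy$study, toy$time)   # rows = study, columns = time
tab                                 # counts of rows per study x occasion
```

Cells with a 0 (for example, RIM at time 2) show occasions on which a study collected no data, exactly as in the larger tables.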
Because the entire msqR data set includes 6,411 rows for 3,032 unique subjects (some studies included multiple administrations), we select just the subjects from studies that meet particular criteria. For short-term test dependability, we want the studies where the SAI and MSQ were given twice in the same session (time = 1 and 2). For longer-term stability (over 1-2 days), we want the studies where the SAI and MSQ were given on different days (time = 1 and 3). We use the subset function to choose just those subjects who meet certain conditions (e.g., the first-occasion data). We use "==" to test for equality.
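A minimal sketch of both kinds of selection with subset(), again on a small made-up data frame (toy, long.studies, first, and long are hypothetical names). The "long-term" rule here simply keeps studies that have a time 3 administration and then keeps their time 1 and time 3 rows, mirroring the description above.

```r
toy <- data.frame(
  study = c("Cart", "Cart", "Fast", "Fast", "RIM", "RIM"),
  time  = c(1, 2, 1, 2, 1, 3),
  calm  = c(2, 1, 3, NA, 0, 2)
)
# First-occasion data only: == tests equality (a single = assigns or names an argument)
first <- subset(toy, time == 1)
# Studies that have a time 3 administration (hypothetical long-term rule)
long.studies <- unique(toy$study[toy$time == 3])
# Keep those studies, and only their time 1 and time 3 rows
long <- subset(toy, study %in% long.studies & time %in% c(1, 3))
first
long
```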
table(sai$study,sai$time) #show the study names and sample sizes
##
## 1 2 3 4
## AGES 68 68 0 0
## Cart 63 63 0 0
## CITY 157 0 0 0
## EMIT 71 0 0 0
## Fast 94 94 0 0
## FIAT 70 70 0 0
## FILM 95 95 95 0
## FLAT 170 170 170 0
## GRAY 107 0 0 0
## HOME 67 67 0 0
## IMPS 102 0 0 0
## ITEM 49 0 0 0
## Maps 160 0 0 0
## MITE 49 0 0 0
## MIXX 71 0 0 0
## PAT 65 65 0 0
## PATS 132 0 0 0
## RAFT 40 0 0 0
## RIM 342 0 342 0
## ROB 51 0 46 0
## SALT 104 104 0 0
## SAM 324 0 324 0
## SHED 58 58 0 0
## SHOP 98 98 0 0
## SWAM.one 94 0 0 0
## SWAM.two 54 0 0 0
## VALE 77 77 70 70
## XRAY 200 200 0 0
#Now, select some subsets for analysis using the subset function.
#the short term consistency sets
#subset() returns the rows of a data frame for which the logical condition in its second argument is TRUE
sai.control <- subset(sai,is.element(sai$study,c("Cart", "Fast", "SHED", "SHOP")) )
#let's take this apart
temp <- is.element(sai$study,c("Cart", "Fast", "SHED", "SHOP"))
length(temp)
## [1] 5378
headTail(temp) #not very interesting: just the first and last few logical values (FALSE counts as 0, TRUE as 1)
## [,1] [,2] [,3] [,4]
## h "FALSE" "FALSE" "FALSE" "FALSE"
## "... ..." "... ..." "... ..." "... ..."
## t "FALSE" "FALSE" "FALSE" "FALSE"
#so we can count how many rows were chosen
sum(temp) #of the 5378 rows, 626 come from those four studies (313 subjects, each measured twice)
## [1] 626
dim(sai.control) #these are the 626 rows for which the logical values were TRUE
## [1] 626 23
temp is a logical vector with one element per row of sai (TRUE if that row comes from one of the four studies). We build it separately only to show the intermediate step.
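Because TRUE is treated as 1 and FALSE as 0, a logical vector like temp can be summarized arithmetically. A minimal sketch on a short made-up vector of study names (the studies vector is hypothetical; in the real data it would be sai$study):

```r
studies <- c("Cart", "CITY", "Fast", "RIM", "SHED", "SHOP", "XRAY")
temp <- is.element(studies, c("Cart", "Fast", "SHED", "SHOP"))
temp          # one TRUE/FALSE per element of studies
sum(temp)     # counts the matches: TRUE adds 1, FALSE adds 0
mean(temp)    # proportion of elements that matched
which(temp)   # positions of the TRUE values
```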
table(msqR$study,msqR$time) #note how the same studies are shown, although msqR has a second occasion for some studies (e.g., CITY) that sai does not
##
## 1 2 3 4
## AGES 68 68 0 0
## Cart 63 63 0 0
## CITY 157 157 0 0
## EMIT 71 71 0 0
## Fast 94 94 0 0
## FIAT 70 70 0 0
## FILM 95 95 95 0
## FLAT 170 170 170 0
## GRAY 107 107 0 0
## HOME 67 67 0 0
## IMPS 102 102 0 0
## ITEM 49 49 0 0
## Maps 160 160 0 0
## MITE 49 49 0 0
## MIXX 71 71 0 0
## PAT 65 65 65 65
## PATS 132 0 0 0
## RAFT 40 40 0 0
## RIM 342 0 342 0
## ROB 51 51 46 46
## SALT 104 104 0 0
## SAM 324 0 324 0
## SHED 58 58 0 0
## SHOP 98 98 0 0
## SWAM.one 94 0 0 0
## SWAM.two 54 0 0 0
## VALE 77 77 70 70
## XRAY 200 200 0 0
#now do the same selection a slightly different way, by indexing rows with a logical vector
select <- is.element(msqR$study,c("Cart", "Fast", "SHED", "SHOP"))
msq.control <- msqR[select, ] #just the selected rows, all columns
dim(msq.control)
## [1] 626 88
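When the condition has no missing values, subset(data, cond) and data[cond, ] return the same rows. They differ when the condition contains NA: subset() treats NA as FALSE, whereas [ returns a row filled with NAs. A minimal sketch on a made-up data frame (toy, cond, and cond.na are hypothetical names):

```r
toy <- data.frame(
  study = c("Cart", "Fast", NA, "RIM"),
  calm  = c(2, 3, 1, 0)
)
cond <- toy$study %in% c("Cart", "Fast")   # %in% never returns NA
a <- subset(toy, cond)
b <- toy[cond, ]
all.equal(a, b)                            # same rows either way

cond.na <- toy$study == "Cart"             # == propagates NA: TRUE FALSE NA FALSE
subset(toy, cond.na)                       # NA treated as FALSE: 1 row
toy[cond.na, ]                             # NA gives an extra all-NA row: 2 rows
toy[which(cond.na), ]                      # which() drops the NA: 1 row
```

This is one reason the is.element (or %in%) test is a safe choice here: it always returns TRUE or FALSE, even if a study name were missing.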
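To find the variables that the two data sets share, we use the %in% operator, which (like is.element) returns TRUE for each element of its left argument that also appears in its right argument.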
select.variables <- colnames(sai) %in% colnames(msqR)
select.variables #this is a vector of TRUEs and FALSEs.
## [1] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
## [17] FALSE TRUE TRUE FALSE FALSE FALSE FALSE
selected.variables <- colnames(sai)[select.variables] #just those that are TRUE
msq.selected <- msq.control[,selected.variables]
dim(msq.selected)
## [1] 626 13
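A minimal sketch of the same variable matching on short, made-up lists of column names (sai.names and msq.names are hypothetical). It shows that %in% and is.element agree, and that base R's intersect() returns the shared names in one step, in the order of its first argument.

```r
sai.names <- c("study", "time", "id", "calm", "secure", "tense")
msq.names <- c("active", "calm", "tense", "id", "time", "study")
keep <- sai.names %in% msq.names
identical(keep, is.element(sai.names, msq.names))   # %in% and is.element agree
sai.names[keep]                  # shared names, in the order of sai.names
intersect(sai.names, msq.names)  # same result in one step
```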
cor(sai.control[,3],msq.selected[,3]) #column 3 is id in both; a correlation of 1 indicates the rows are in the same order
## [1] 1
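#cor is a bit finicky (with missing values it returns NA unless told otherwise); try cor2, which uses pairwise deletion and rounds the output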
cor2(sai.control[,2:4],msq.selected[,2:4])
## time id calm
## time 1.00 0.00 -0.12
## id 0.00 1.00 -0.02
## calm -0.16 -0.08 0.64
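One likely source of cor()'s finickiness is missing data: by default it returns NA if any value is missing, and several items here have missing responses (compare n with 626 in the describe output). A minimal sketch on two made-up vectors (x and y are hypothetical) showing the default behavior and the use argument:

```r
x <- c(2, 1, 3, NA, 0, 2)
y <- c(2, 2, 3, 1, 0, 1)
cor(x, y)                                  # NA: one missing value spoils the result
cor(x, y, use = "pairwise.complete.obs")   # drops the incomplete pair first
cor(x[!is.na(x)], y[!is.na(x)])            # the same computation done by hand
```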
describe(msq.selected)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## study* 1 626 2.61 1.13 2.0 2.64 1.48 1 4 3 -0.04 -1.40 0.05
## time 2 626 1.50 0.50 1.5 1.50 0.74 1 2 1 0.00 -2.00 0.02
## id 3 626 41.93 25.99 40.0 40.71 29.65 1 98 97 0.32 -0.88 1.04
## calm 4 618 1.57 0.85 2.0 1.59 1.48 0 3 3 -0.07 -0.62 0.03
## tense 5 619 0.46 0.72 0.0 0.31 0.00 0 3 3 1.43 1.23 0.03
## at.ease 6 620 1.48 0.90 1.0 1.47 1.48 0 3 3 0.01 -0.79 0.04
## upset 7 620 0.38 0.69 0.0 0.22 0.00 0 3 3 1.94 3.43 0.03
## anxious 8 620 0.53 0.79 0.0 0.38 0.00 0 3 3 1.37 1.04 0.03
## confident 9 618 1.38 0.93 1.0 1.35 1.48 0 3 3 0.08 -0.87 0.04
## nervous 10 622 0.29 0.60 0.0 0.15 0.00 0 3 3 2.14 4.19 0.02
## jittery 11 621 0.41 0.67 0.0 0.28 0.00 0 3 3 1.64 2.34 0.03
## relaxed 12 622 1.60 0.89 2.0 1.62 1.48 0 3 3 -0.05 -0.77 0.04
## content 13 616 1.25 0.91 1.0 1.20 1.48 0 3 3 0.21 -0.81 0.04
Comments about RMarkdown
RMarkdown is particular about spacing:
A heading needs a space after the # (write "# Heading", not "#Heading").
The three backticks that open an R code chunk must start in column 1.