---
title: "350 Week 9 data manipulation"
author: "William Revelle"
date: "06/01/2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

# Psych has been updated, install the latest version

```{r}
#install.packages("psych",repos="http://personality-project.org/r",type="source")
library(psych)
library(psychTools)
sessionInfo()  #psych and psychTools should be 2.0.4
```

# Various R commands for data manipulation and display

These are examples taken from the help file for the `sai` and `msqR` data sets.

`sai` for the State Anxiety Inventory, `msqR` for the Motivational State Questionnaire.

## These data were collected at the PMC lab:

The standard experimental study at the Personality, Motivation and Cognition (PMC) laboratory (Revelle and Anderson, 1997) was to administer a number of personality trait and state measures (e.g. the epi, msq, msqR and sai) to participants before some experimental manipulation of arousal/effort/anxiety. Following the manipulation (with a 30 minute delay if giving caffeine/placebo), some performance task was given, followed once again by measures of state arousal/effort/anxiety.

Here are the item level data on the `sai` (state anxiety) and the `tai` (trait anxiety).

Scores on these scales may be found using the scoring keys. The affect data set includes pre and post scores for two studies (flat and maps) which manipulated state by using four types of movies.

In addition to being useful for studies of motivational state, these studies provide examples of test-retest and alternate form reliabilities. Given that 10 items overlap with the msqR data, they also allow for a comparison of immediate duplication of items with 30 minute delays.

Studies CART, FAST, SHED, RAFT, and SHOP were either control groups, or did not experimentally vary arousal/effort/anxiety.

AGES, CITY, EMIT, RIM, SALT, and XRAY were caffeine manipulations between time 1 and time 2 (RIM and VALE were repeated on day 1 and day 2).

FIAT, FLAT, MAPS, MIXX, and THRU were 1 day studies with a film manipulation between time 1 and time 2.

SAM1 and SAM2 were the first and second day of a two day study. The STAI was given once per day. The MSQ, not the MSQR, was given.

VALE and PAT were two day studies with the STAI given pre and post on both days.

RIM was a two day study with the STAI and MSQ given once per day.

Usually, time of day 1 = 8:50-9 am and 2 = 7:30 pm; however, in rob, with paid subjects, the times were 05:30 and 22:30.

## The `table` command tells you about the data

```{r}
data(sai)  #actually not necessary, because the data are 'lazy loaded'
dim(sai)  #how many subjects (rows) and how many columns (variables)
colnames(sai)  #and what are the variables?
table(sai$study)  #what are the names of the studies in this file
table(sai$study,sai$time)  #cross tabulate the studies by time administered
```

## We can choose various studies using the `subset` and `%in%` commands

```{r}
control <- subset(sai,sai$study %in% c("Cart", "Fast", "SHED", "RAFT", "SHOP"))
table(control$study,control$time)  #it still knows the names of the studies, even though no one is there
dim(control)
#pre and post drug studies
drug <- subset(sai,sai$study %in% c("AGES", "CITY","EMIT", "SALT", "VALE", "XRAY"))
#pre and post film studies
film <- subset(sai,sai$study %in% c("FIAT","FLAT", "MAPS", "MIXX"))
```

## Let's explore the `%in%` command

`X %in% Y` returns a logical value (TRUE, FALSE) for each element in X, showing whether that element appears in Y.

`Z <- X[X %in% Y]` returns all the elements in X that are in Y.

```{r}
table(sai$study)  #shows the counts in each study
studies <- names(table(sai$study))  #gets the names of the studies
studies %in% c("Cart", "Fast", "SHED", "RAFT", "SHOP")
#versus
c("Cart", "Fast", "SHED", "RAFT", "SHOP") %in% studies
select <- studies %in% c("Cart", "Fast", "SHED", "RAFT", "SHOP")
select
```

# Various data manipulation and displays

We will show a `corPlot` of the `sai` data several different ways. The first is the most naive way, by variable order.

```{r}
R <- corPlot(sai[4:23])  #draw the corPlot and return the correlation matrix
```

## Use a graphical parameter to make that graph better

The previous graph does not show all of the variables on the x axis. The `xlas` parameter makes the labels on the x axis vertical or horizontal (default).

```{r}
R <- corPlot(sai[4:23],xlas=3)  #draw the corPlot and return the correlation matrix
```

Do a factor analysis of the data, take out 2 factors, and then sort the variables by the loadings on these two factors. We can do this using the `fa` and `matSort` functions.

```{r}
f2 <- fa(R,2)  #extract two factors
f2  #show the factor analysis output
fa.diagram(f2)  #show the fa output as a path diagram
plot(f2, labels = colnames(R))  #show it as a spatial plot
Rs <- matSort(R,f2)  #sort the matrix by the factor loadings
corPlot(Rs,main="SAI sorted by factor loadings")
corPlot(Rs, xlas=2,main = "SAI sorted by factor loadings")  #set x names to be vertical
```

# What are the scores for these anxiety items? Can we form scales?

We use `scoreItems` to form scales as well as `scoreOverlap` to examine the structure. We first get (create) a `keys.list` to show which items go on which scales.

```{r}
sai.keys <- list(
  sai = c("tense", "regretful", "upset", "worrying", "anxious", "nervous", "jittery",
          "high.strung", "worried", "rattled", "-calm", "-secure", "-at.ease", "-rested",
          "-comfortable", "-confident", "-relaxed", "-content", "-joyful", "-pleasant"),
  sai.p = c("calm", "at.ease", "rested", "comfortable", "confident", "secure",
            "relaxed", "content", "joyful", "pleasant"),
  sai.n = c("tense", "anxious", "nervous", "jittery", "rattled", "high.strung",
            "upset", "worrying", "worried", "regretful")
)
#show the keys
sai.keys
```

The keys are used for specifying which items go on which scales.
Note that `sai.p` and `sai.n` are subsets of the total and represent the positive and negative items. They will correlate with the total partly because of item overlap. We can adjust for this by using the `scoreOverlap` function on the correlation matrix. Although we can adjust the correlations, we cannot adjust the scores, and thus no scores object is returned from `scoreOverlap`.

```{r}
sai.scores <- scoreItems(sai.keys,sai)
sai.scores  #this produces a fair amount of output
summary(sai.scores)  #just show the correlation matrix of the scales
sai.overlap <- scoreOverlap(sai.keys,sai)
sai.overlap  #show the full output
summary(sai.scores)
summary(sai.overlap)  #compare these two, why do they differ?
```

# Use the scores we just found and compare them over time

The `sai.scores$scores` object has the scores, but we need to combine them with the experimental data.

```{r}
dim(sai.scores$scores)
sai.df <- data.frame(sai[1:3],sai.scores$scores)
describe(sai.df)
```
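
As one possible next step, the comparison over time can be sketched in base R. The data frame `toy.df` below is a made-up stand-in for `sai.df`: the scores are invented and only two studies are included, so that the example runs without the real data. The column names `study`, `time`, `id`, and `sai` are assumed to match those of `sai.df`.

```r
# Toy stand-in for sai.df: these values are invented for illustration only
toy.df <- data.frame(
  study = c("AGES", "AGES", "AGES", "AGES", "FLAT", "FLAT", "FLAT", "FLAT"),
  time  = c(1, 1, 2, 2, 1, 1, 2, 2),
  id    = c(1, 2, 1, 2, 1, 2, 1, 2),
  sai   = c(2.0, 2.4, 2.6, 3.0, 1.8, 2.2, 1.6, 2.0)
)

# Mean scale score for each study (rows) at each time point (columns)
time.means <- tapply(toy.df$sai,
                     list(study = toy.df$study, time = toy.df$time),
                     mean)
time.means

# Change in the mean score from time 1 to time 2 within each study
change <- time.means[, "2"] - time.means[, "1"]
change
```

With the real data, the same `tapply` call applied to `sai.df` should give the mean of a scale for each study at each time. `describeBy(sai.df, group = sai.df$time)` from `psych` is another way to get descriptive statistics separately for each time point.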