---
title: "350 Week 6 a: data manipulation"
author: "William Revelle"
date: "4/29/24"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

## Comments about RMarkdown

RMarkdown has a special syntax in terms of spacing.

It is necessary to have a space after the # to make a heading.

The ` symbol to make r code run must start in column 1.

## First make sure we have psych and psychTools

Notice that although we just specified loading two packages, a whole set of packages comes up as well because we are using RStudio. These are all part of the overhead of using RMarkdown.

```{r}
library(psych)
library(psychTools)
sessionInfo()
```

### Manipulating data

When using a data file, it is likely that you will want to combine it with another file, sort it, examine just a few cases, etc. Today we work through a number of such operations. We saw these last week when we worked on the reliability exercise, but today we will work through them in more detail. In particular, we will work with the `sai` and `msqR` data files. First we get them, and find out their names.

```{r}
dim(sai) #what are the dimensions of this data set?
colnames(sai) #what are the variables
dim(msqR)
colnames(msqR)
```

# The data

For these examples we use small subsets of the larger msqR and sai data sets (in psychTools) and then specify which items to score for which analysis. The msqR data set is stored as a data.frame, which may be thought of as a spreadsheet with subjects as rows and variables as columns. (Using the $ operator specifies a particular column by name.) Both of these data sets represent data collected in multiple different studies with different designs. Thus, to show the different studies and the number of subjects per occasion we use the `table` command. `table(msqR$study,msqR$time)` does a cross tabulation of two variables within the msqR data.frame, the study and the time variables.
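
To see what `table` does with two variables, here is a minimal sketch on a made-up data frame. The name `toy` and the study labels A and B are hypothetical and are not taken from msqR. Each cell of the result counts the rows that share a study and a time.

```{r}
#a made-up data frame: 3 rows from study A, 2 rows from study B
toy <- data.frame(study = c("A", "A", "A", "B", "B"),
                  time  = c(1, 2, 1, 1, 1))
toy.tab <- table(toy$study, toy$time)  #rows are studies, columns are times
toy.tab
toy$time  #the $ picks out one column by name
```

Study B was never given at time 2, so that cell of the table is 0.
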
Because the entire data set includes 6,411 rows for 3,032 unique subjects (some studies included multiple administrations), we will select just subjects from studies that meet particular criteria. That is, for short term test-dependability, those studies where the SAI and MSQ were given twice in the same session (time = 1 and 2). For longer term stability (over 1-2 days), those studies where the SAI and MSQ were given on different days (time = 1 and 3). We use the subset function to choose just those subjects who meet certain conditions (e.g., the first occasion data). We use "==" to represent equality.

```{r}
table(sai$study,sai$time) #show the study names and sample sizes
#Now, select some subsets for analysis using the subset function.
#the short term consistency sets
#use the subset command which chooses from a data frame the rows for which the logical condition (its second argument) is TRUE
sai.control <- subset(sai,is.element(sai$study,c("Cart", "Fast", "SHED", "SHOP")) )
#lets take this apart
temp <- is.element(sai$study,c("Cart", "Fast", "SHED", "SHOP"))
length(temp)
headTail(temp) #not very interesting, just a set of logical values
#logical FALSE is 0, logical TRUE is 1,
#so therefore, we can find out how many subjects were chosen
sum(temp) #of the 5378 subjects, 626 were in those four studies
dim(sai.control) #these are the 626 subjects for whom the logical values were TRUE
```

temp is a vector of logical values. We show this just to see the steps.

# Get the MSQ data that match these sai data, use a similar (but different) approach

```{r}
table(msqR$study,msqR$time) #note how the same studies are shown.
#will do this a slightly different way
select <- is.element(msqR$study,c("Cart", "Fast", "SHED", "SHOP"))
msq.control <- msqR[select, ] #just the selected cases
dim(msq.control)
```

## select certain variables in msq

We will use the %in% operator.

```{r}
select.variables <- colnames(sai) %in% colnames(msqR)
select.variables #this is a vector of TRUEs and FALSEs.
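#a made-up illustration of %in% (toy.x and toy.y are hypothetical, not from sai or msqR)
toy.x <- c("id", "calm", "extra")
toy.y <- c("calm", "id", "tense")
toy.x %in% toy.y   #one logical value per element of toy.x; order within toy.y does not matter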
selected.variables <- colnames(sai)[select.variables] #just those that are TRUE
msq.selected <- msq.control[,selected.variables]
dim(msq.selected)
```

## Are these the same subjects?

Correlate the ids.

```{r}
cor(sai.control[,3],msq.selected[,3]) #cor is a bit finicky, try cor2
cor2(sai.control[,2:4],msq.selected[,2:4])
```

### Let's describe this file

```{r}
describe(msq.selected)
```
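
Correlating the id columns, as we did above, is one way to check that two data frames line up. A more direct check compares the ids element by element, and `merge` can pair rows by a shared key when the row order might differ. This is a minimal sketch on made-up data; `toy.a`, `toy.b`, and their columns are hypothetical and are not part of psychTools.

```{r}
#two made-up data frames that share an id column
toy.a <- data.frame(id = c(1, 2, 3, 4), anxious = c(2, 1, 3, 2))
toy.b <- data.frame(id = c(1, 2, 3, 4), tense = c(1, 1, 2, 3))
all(toy.a$id == toy.b$id)  #TRUE only if every id matches, row by row
cor(toy.a$id, toy.b$id)    #a correlation of 1 is necessary but not sufficient for identical ids
toy.ab <- merge(toy.a, toy.b, by = "id")  #pair the rows by id
dim(toy.ab)
```

The element by element comparison is the stricter test, because ids that differ by a constant would still correlate perfectly.
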