--- title: "350.wk3 part a" output: html_document: default pdf_document: default --- The first part of the RMarkdown sets up various paremeters. By setting the width to 100, we get output files that do not wrap the describe output. Normally we do not want to show the setup but here we echo it so that we see how to widen the output. ```{r setup, include=TRUE} knitr::opts_chunk$set(echo = TRUE) options(width=100) #This sets the width of the output, 80 seems to be the default and is too narrow ``` R packages are always under development. For instance, there was a suggestion last week about how to make the `describe` function more useful. This has been implemented in the most recent version of `psych` (1.8.9). After class on Monday, I put is a few more modifications. Rather than push the updated version to CRAN each time and in order to reduce the demands on the CRAN system, it is better to save small changes on local 'repositories' which have not gone through all the overhead of CRAN. The local repository for the psych package is at http://personality-project.org/r . To install originally from this repository we use the `install.packages` command. Or, we can check the repository to see if we need to update our package using the `update.packages` command. (It is not clear if this actually does anything.) \newpage ```{r} #This is commented out if I have already done this #install.packages("psych", repos = "http://personality-project.org/r", type = "source") # or #update.packages("psych", repos = "http://personality-project.org/r", type = "source") # if we already have it although does not seem to work ``` Lets make sure we succeeded. Check the version number of the package. It should now be version 1.8.9 ```{r} library(psych) #remember to make this active library(psychTools) sessionInfo() #see what version we have (should be 2.0.2) ``` ###Seeing what has changed. To see what has been added to a package you can read the `news` file for that package. You can read all of the news for a package, or select just the most recent. Here we read the news for the most recent release of `psych`. Note that now we are specifying Version== "2.0.2" so that we are just getting the most recent (last month and this month's) changes. ```{r } news(Version== "2.0.2",package="psych") ``` # The magic of the `apply` and `by` commands and several functions that use them. There are several functions in base-R that are very powerful replacements for the "for loop" that is more common in other languages (e.g., Fortran, C++, etc.). If you are working with data.frames or matrices, the `by` command will repeat an operation for as many rows or columns as are in you data.frame. The help page is not very informative of the power of `by` and merely says that "(by) is a Function by is an object-oriented wrapper for tapply applied to data frames." Nor is it very helpful for `apply` where we are told: "(apply) returns a vector or array or list of values obtained by applying a function to margins of an array or matrix." Consider the `attitude` example data set in R. First show the entire set, then use `appyly` to find the means and then standard deviations of each column. Finally, we also use the function `colMeans` which does the same thing, but just for means. ```{r} dim(attitude) #see how big it is attitude #show the data - You rarely want to do this apply(attitude,2,mean) #apply the mean function to every column of attitude apply(attitude,2,sd) #now find the standard deviations for each column colMeans(attitude) #see also rowMeans ``` A related function is `by` which will takes a data.frame, chooses cases that match a particular value of some variable, and then does the fuction request. Here we use the `sat.act` data set from `psych` and break the data down by gender, and then describe it ```{r} library(psych) #make it active library(psychTools) head(sat.act) # A core R command tail(sat.act) #another core R command headTail(sat.act) #see what it looks like (combines head and tail) headTail(sat.act,top=6,bottom=3,from=3,to=5) #using some of the parameters by(sat.act,sat.act[1],describe) ``` ##The `by` command `by` allows you do to do a command for various levels of a variable ```{r} describe(sat.act) #does it for all subjects by(sat.act,sat.act[1],describe) #does it for each value of sat.act[1] ``` ### DescribeBy and statsBy The `describeBy` and `statsBy` functions do this for you as well. ```{r} describeBy(sat.act,group = "gender") ``` `statsBy` reports far more statistics broken down by group. Much of the output is hidden in a number of objects that can be queried individually. We save the resulting object if we want to access the separate results. ```{r} sb <- statsBy(sat.act,"education") sb ``` ##`error.bars.by` will draw error bars broken down by groups Here we use the `spi` data set but just use the 2-10 variables. We suppress the xlab (making it "" in order to not overwrite the vertical labels.) ```{r} error.bars.by(spi[2:10],group="sex",ylab="Score",las=2,xlab="") ``` ##Another way to show error bars We can also show confidence intervals for multiple variables by using the *error.dots* function. This will (by default) sort the results by their means and display the top 12 and bottom 12 variables. To see how the variables differ in their means by a grouping variable, specify which variable to use. ```{r} error.dots(bfi[1:25],head=13) #show the item means #or consider some grouping variable error.dots(bfi,var="age",group="education",sort=FALSE,eyes=TRUE) ``` ##Using various parameters of a function to make it more interesting Suppose we want to compare male and female responses to the *bfi* and we want to show their means in a dot.plot. We can use the *error.dot* function with some appropriate choice of subjects. ```{r} males <- subset(bfi,bfi$gender==1) dim(males) females <- subset(bfi,bfi$gender==2) dim(females) results <- error.dots(females[1:25],head=13,eyes=TRUE) #now put the males in the same graph error.dots(males[1:25],head=13,order=results$order,add=TRUE) ``` #Scoring scales and estimating one measures of internal consistency ($\alpha$) A typical study involves aggregating individual items into "scales". To this, we will use a built in data set (the `ability` data) and find the average number of items each person got correct. We will use the `scoreItems` function to do this. It is useful to read the help pages for `ability` to get a feel of what the data are, and then the help page for `scoreItems` to see what we are going to do. ##Part of the `ability` help page: 16 multiple choice ability items 1525 subjects taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project are saved as iqitems. Those data are shown as examples of how to score multiple choice tests and analyses of response alternatives. When scored correct or incorrect, the data are useful for demonstrations of tetrachoric based factor analysis irt.fa and finding tetrachoric correlations. ##Part of the `scoreItems` help page: Given a data.frame or matrix of n items and N observations and a list of the direction to score them (a keys.list with k keys) find the sum scores or average scores for each person and each scale. In addition, report Cronbach's alpha, Guttman's Lambda 6, the average r, the scale intercorrelations, and the item by scale correlations (raw and corrected for item overlap). Replace missing values with the item median or mean if desired. Items may be keyed 1 (score it), -1 ) (reverse score it), or 0 (do not score it). Negatively keyed items will be reverse scored. Although prior versions used a keys matrix, it is now recommended to just use a list of scoring keys. See make.keys for a convenient way to make the keys file. If the input is a square matrix, then it is assumed that the input is a covariance or correlation matix and scores are not found, but the item statistics are reported. (Similar functionality to cluster.cor). response.frequencies reports the frequency of item endorsements fore each response category for polytomous or multiple choice items. #Get the data and describe it Since we are using a built in data set, we just call it by name (`ability`), but if you want to do this for your own data, you need to first read the data in. ```{r} library(psych) #you already did this, didn't you my.data <- ability describe(my.data) #or, if you prefer #describe(ability) ``` ##The alpha function will score one scale and report some useful statistics Although seriously flawed, $\alpha$ is perhaps the most commonly reported estimate of reliability. Although several other functions in `psych` are recommended, the `alpha` function is included for those new to `R`. If all the items are keyed in the same direction, just call `alpha` with your data. In addition to various item statistics, the `scores` object will have the average (preferred) or sum scores (optional) for your data. Here we use `alpha` on the ability data set. ```{r} ab.al <- alpha(ability) ab.al #see the results names(ab.al) #what are the various objects included in the output describe(ab.al$scores) ``` ### Keying items for direction The ability items were all keyed in the same direction (correct vs. incorrect). Some personality items need to be reversed scored ("do you like to go to lively parties" is the opposite of "do you tend to keep quiet at parties"). `alpha` by default will not reverse key items, but you can ask it do so automatically. Or, you can specify which items to reverse key. Consider the first 5 items of the `bfi` which are five items measuring "Agreeableness". If we do not reverse score them, we find a very poor alpha (and a warning that we might want to do so.) If we turn on the `check.keys` option, `alpha` will automatically reverse items that correlate negativel with the total score. ```{r} alpha(bfi[1:5]) #this will produce a terrible result alpha(bfi[1:5], check.keys = TRUE) # a much better result ``` ##What is this thing called alpha and what does it mean? See next week. #Some comments on useful functions There are many functions in R that allow you to useful things. Here I will show several of them. The first is the *paste* function which allows you to create strings with certain characteristic. A variation of the *paste* function is *paste0* which automatically does not put spaces between the objects. ```{r} my.names <- paste("Variable",1:5) #separation defaults to a blank space my.names1 <- paste("Variable",1:5, sep="") #Speficy the sep to be nothing my.short.names <- paste0("Var", 1:8) #just do it directly my.names my.names1 my.short.names ``` ##Creating a dictionary We looked at the *bfi.dictionary* which is a helpful way to see what the variable s are when you do the analyses. To create such a dictionary is relatively straight forward, but requires using an external editor (such as EXCEL or BBEDIT). You want to specify at least 2 columns in your text file: the item number and the item content. If you want, you can specify more columns to have more identfication information. Consider a text file that looks like this (copied from Excel with tabs between columns) \block quote name ItemLabel Item Giant3 Big6 A1 q_146, Am indifferent to the feelings of others. Cohesion, Agreeableness A2 q_1162, Inquire about others well-being. Cohesion Agreeableness A3 q_1206 Know how to comfort others. Cohesion Agreeableness A4 q_1364 Love children. Cohesion Agreeableness A5 q_1419 Make people feel at ease. Cohesion Agreeableness C1 q_124 Am exacting in my work. Stability Conscientiousness C2 q_530 Continue until everything is perfect. Stability Conscientiousness C3 q_619 Do things according to a plan. Stability Conscientiousness C4 q_626 Do things in a half-way manner. Stability Conscientiousness C5 q_1949 Waste my time. Stability Conscientiousness Note, that we can not have any fancy formatting such as quotes or apostrophes (" or ') Copy these items to the clipboard using the *read.clipboard.tab* command and then show it To make this example work, I created file on the class server ```{r} #temp <- read.clipboard.tab() # not done because we can not do it dynamically file.name <- "http://personality-project.org/courses/314/temp.dictionary.txt" temp <- read.delim(file.name) #this reads a tab delimted file ``` Now, we need to make the rownames of this object be equal to the name ```{r} my.dictionary <- temp rownames(my.dictionary) <- temp[,1] #the first column of temp my.dictionary <- my.dictionary[-1] #get rid of the redundant column my.dictionary #show it ``` We can then use *my.dictionary* in other functions. ##Different data types Most of the examples we have been using come from *psych* package and are either correlation matrices or *data.frames*. But some of the data sets in core-R are *time series* or three (or more) dimensional *arrays*. The standard *describe* function will not work on these types of data. To explore the structure of an object, the *str* , *dim*, and *names* commands are very helpful. *str* will tell you the class and type of an object, *dim* will tell you how many dimensions it has (if it is at least 2), and *names* will tell you the objects nested within the object of interest. To find the list of all of the example data sets in a package ```{r} data(package="psych") ``` #Homework (due on Monday) 1) Read in some data (or use one of the standard data sets) 2) describe it 3) Try using the `apply` function to find column or row means 4) Compare this to the result using `colMeans` 5) Compare this to the result you found in 2) (describe) 6) Try finding the mean scores for each subject on a simple data set (e.g, the attitude data set or one of your own.)