Part of the lecture notes and assignments for Using R in psychological research at Northwestern University, Spring, 2023.
Before it is possible to use R for analysis, we must first get the data. Data files come in many different flavors. Here we will explore how to read in data from the clipboard, from text and csv files, as well as from SPSS.
We run this in the script window of RStudio so that we can keep our notes. This way we can embed text (what you are reading) with the actual R commands and the R output. This is a convenient way to remember what you are doing.
Before we do anything, we need to set up RMarkdown so it has nice parameters. I show the actual commands issued, which are hidden when we knit the document.
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100) #This sets the width of the output, 80 seems to be the default and is too narrow
To make these commands run in R, you precede the first line with three backticks ``` (the backtick is below the tilde key on the keyboard) and then close the chunk after the last line by adding three more ```
This entire Rmd file is saved in the class notes folder so that you can see how the Markdown commands are written.
First, start up RStudio.
Then, the 350.wk2.Rmd may be read using your browser. Copy the entire Rmd file and then paste it into the Rmarkdown window.
You are now ready to create the file yourself.
Open RStudio
Create a new file by choosing the File menu (with the R markdown option). You now have an Rmarkdown template that you can modify with the commands that you want. Remember, to make some R code run in your template, precede what you want with three ``` and then {r}, start a new line with some R commands, and eventually close with three more ```
e.g.
```{r}
R commands
```
Much of this material on the psych package is summarized in the vignette: An introduction to the psych package: Part I: data entry and data description, which you may get by finding the vignettes for psych.
For these examples, we first need to activate the psych and the psychTools packages.
We will read the data using several different approaches. For each of these approaches, we will save the data in the object ‘my.data’. You can, of course, call this object anything you want.
library(psych) #this assumes we have already installed psych
library(psychTools) #this is needed for some additional data sets and tools
If you have a data set that you have read from a web browser, or found in a file that you viewed, you can copy the file to your clipboard (using the appropriate commands for your system) and then read the clipboard into R.
First, we use our browser to read the remote file:
http://personality-project.org/r/datasets/simulation.txt
Select all elements of the file and copy to the clipboard. Then
#my.data <- read.clipboard() #this takes what is in the clipboard and makes into the my.data object
#clearly, since this is an interactive command, I can not show this in a script
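Because read.clipboard only works interactively, a minimal base R sketch can stand in for it inside a script. Here the copied text is held in a character string (clip.text and clip.data are made-up names; the rows are the header and first four lines of simulation.txt) and parsed with read.table. This only illustrates the idea of turning copied text into a data frame; it is not how read.clipboard is implemented.

```r
# Stand-in for the clipboard: the header and first four rows of simulation.txt
clip.text <- "Time Anxiety Impulsivity sex Arousal Tension Performance
9 4 9 1 50 55 40
19 8 8 1 70 64 90
9 5 10 2 50 69 48
9 4 1 2 57 55 68"

# read.table parses whitespace-delimited text; header = TRUE takes names from line 1
clip.data <- read.table(text = clip.text, header = TRUE)
dim(clip.data)       # number of rows and columns
colnames(clip.data)  # the variable names taken from the header
```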
Now, let's see what we got. We will ask for the dimensions of my.data, show the first and last few lines, and then get some basic descriptive statistics.
But we cannot read the clipboard in a script, so we will do the following:
my.file <- "/Users/WR/Library/CloudStorage/OneDrive-NorthwesternUniversity/pmc/courses.23/350/datasets/simulation.txt"
my.data <- read.file(my.file)
## Data from the .txt file /Users/WR/Library/CloudStorage/OneDrive-NorthwesternUniversity/pmc/courses.23/350/datasets/simulation.txt has been loaded.
dim(my.data) #what is the size of the object we read?
## [1] 72 7
headTail(my.data) #show the first and last 4 lines of the object
## Time Anxiety Impulsivity sex Arousal Tension Performance
## 1 9 4 9 1 50 55 40
## 2 19 8 8 1 70 64 90
## 3 9 5 10 2 50 69 48
## 4 9 4 1 2 57 55 68
## ... ... ... ... ... ... ... ...
## 69 19 6 1 1 66 53 88
## 70 9 5 10 2 48 63 40
## 71 19 6 8 2 69 60 95
## 72 19 10 1 2 66 48 93
describe(my.data) #get some descriptive statistics of this object
## vars n mean sd median trimmed mad min max range skew kurtosis se
## Time 1 72 14.28 5.03 19.0 14.34 0.00 9 19 10 -0.11 -2.02 0.59
## Anxiety 2 72 5.24 2.18 5.0 5.24 2.97 0 10 10 -0.04 -0.65 0.26
## Impulsivity 3 72 4.90 3.98 4.5 4.88 5.19 0 10 10 0.02 -1.83 0.47
## sex 4 72 1.50 0.50 1.5 1.50 0.74 1 2 1 0.00 -2.03 0.06
## Arousal 5 72 60.90 8.10 66.0 61.29 5.93 48 70 22 -0.27 -1.67 0.96
## Tension 6 72 56.83 6.29 57.0 57.14 5.93 38 69 31 -0.53 0.42 0.74
## Performance 7 72 72.21 17.41 78.0 73.19 18.53 38 98 60 -0.43 -1.10 2.05
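To see where some of the describe columns come from, here is a small sketch that computes a few of them by hand in base R. The vector x is made up for illustration (it is not the real Time variable), and the sketch covers only some of the columns. It assumes that se is sd/sqrt(n), which agrees with the output above (for Time, 5.03/sqrt(72) is about 0.59), and that mad is the scaled median absolute deviation that base R's mad returns.

```r
# Hypothetical values, loosely resembling the two-valued Time variable
x <- c(9, 19, 9, 9, 19, 9, 19, 19)

n   <- sum(!is.na(x))          # number of non-missing observations
m   <- mean(x, na.rm = TRUE)   # mean
s   <- sd(x, na.rm = TRUE)     # standard deviation
md  <- mad(x, na.rm = TRUE)    # median absolute deviation, scaled by 1.4826
rng <- max(x) - min(x)         # range
se  <- s / sqrt(n)             # standard error of the mean

round(c(n = n, mean = m, sd = s, median = median(x), mad = md, range = rng, se = se), 2)
```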
Instead of reading from the clipboard, we can specify the local or remote location of the file and read it directly.
file.name <- "http://personality-project.org/r/datasets/simulation.txt"
my.data <- read.file(file.name) #goes to the remote location and reads it
## Data from the .txt file http://personality-project.org/r/datasets/simulation.txt has been loaded.
Once again, we want to see what we got.
dim(my.data) #what is the size of the object we read?
## [1] 72 7
headTail(my.data) #show the first and last 4 lines of the object
## Time Anxiety Impulsivity sex Arousal Tension Performance
## 1 9 4 9 1 50 55 40
## 2 19 8 8 1 70 64 90
## 3 9 5 10 2 50 69 48
## 4 9 4 1 2 57 55 68
## ... ... ... ... ... ... ... ...
## 69 19 6 1 1 66 53 88
## 70 9 5 10 2 48 63 40
## 71 19 6 8 2 69 60 95
## 72 19 10 1 2 66 48 93
describe(my.data) #get some descriptive statistics of this object
## vars n mean sd median trimmed mad min max range skew kurtosis se
## Time 1 72 14.28 5.03 19.0 14.34 0.00 9 19 10 -0.11 -2.02 0.59
## Anxiety 2 72 5.24 2.18 5.0 5.24 2.97 0 10 10 -0.04 -0.65 0.26
## Impulsivity 3 72 4.90 3.98 4.5 4.88 5.19 0 10 10 0.02 -1.83 0.47
## sex 4 72 1.50 0.50 1.5 1.50 0.74 1 2 1 0.00 -2.03 0.06
## Arousal 5 72 60.90 8.10 66.0 61.29 5.93 48 70 22 -0.27 -1.67 0.96
## Tension 6 72 56.83 6.29 57.0 57.14 5.93 38 69 31 -0.53 0.42 0.74
## Performance 7 72 72.21 17.41 78.0 73.19 18.53 38 98 60 -0.43 -1.10 2.05
We can find the file on our local hard disk by looking for it with the file.choose command. Unfortunately, I need to comment out this statement because I cannot run it interactively as part of a script. So, I will make up a new object ‘fn’ (file name), which I will set to a known location of the file.
#next line is suppressed because we can not do it interactively
#so instead, we will define fn as file.name
#fn <-file.choose() # this opens your system to look for the file
fn <- "https://personality-project.org/courses/350/datasets/simulation.txt" #from my looking for it
fn # show the name of the file
## [1] "https://personality-project.org/courses/350/datasets/simulation.txt"
my.data <- read.file(fn)
## Data from the .txt file https://personality-project.org/courses/350/datasets/simulation.txt has been loaded.
dim(my.data) #still the 72 by 7 data file
## [1] 72 7
Unfortunately, a path to a local file on my machine will not work on your computer, which is why fn is set to a remote copy of the file here. You can change the script to choose a text file from your computer.
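One way to keep the file.choose step in a script without commenting it out is to guard it with interactive(). This is a sketch, not part of the original notes: default.file is a made-up name, set to the remote copy of simulation.txt used above, and the guard assumes the script is knit in a non-interactive R session (as is typically the case with the Knit button).

```r
# Fallback location used when nobody is at the keyboard (made-up object name)
default.file <- "https://personality-project.org/courses/350/datasets/simulation.txt"

# interactive() is TRUE at the console and FALSE in a non-interactive run
fn <- if (interactive()) file.choose() else default.file
fn  # show the name of the file that will be read
```

At the console this opens the system file browser; in a non-interactive run it quietly falls back to default.file.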
If I do not specify the name of the file (fn) in my read.file command, R will open a system window to let you find it on your machine. What it is doing is calling the file.choose function for you.
I cannot show this for your computer, but you can try it on your machine.
#my.data <- read.file()
dim(my.data)
## [1] 72 7
SPSS saves the data in a format with the .sav suffix. We can read these data in using read.file. Eli Finkel has shared a small SPSS .sav file.
If you have an SPSS file on your computer, you could try opening it this way.
fn <- "http://personality-project.org/r/datasets/finkel.sav"
eli <- read.file(fn) #go and get it and convert to a normal data.frame
## Data from the SPSS sav file http://personality-project.org/r/datasets/finkel.sav has been loaded.
dim(eli)
## [1] 69 5
headTail(eli)
## USER HAPPY SOULMATE ENJOYDEX UPSET
## 1 "001" 4 7 7 1
## 2 "003" 6 5 7 0
## 3 "004" 6 7 7 0
## 4 "005" 6 7 7 0
## ... <NA> ... ... ... ...
## 66 "074" 7 7 7 1
## 67 "075" 6 7 7 1
## 68 "076" 7 7 7 0
## 69 "078" 2 7 7 1
colnames(eli)
## [1] "USER" "HAPPY" "SOULMATE" "ENJOYDEX" "UPSET"
describe(eli)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## USER* 1 69 35.00 20.06 35 35.00 25.20 1 69 68 0.00 -1.25 2.42
## HAPPY 2 69 5.71 1.04 6 5.82 0.00 2 7 5 -1.17 1.62 0.13
## SOULMATE 3 69 5.09 1.80 5 5.32 1.48 1 7 6 -0.88 -0.03 0.22
## ENJOYDEX 4 68 6.47 1.01 7 6.70 0.00 2 7 5 -2.37 5.92 0.12
## UPSET 5 69 0.41 0.49 0 0.39 0.00 0 1 1 0.38 -1.89 0.06
By default, the read.file function translates complex coding systems into numerical values. Sometimes you want to see the actual encoding of the SPSS file. You can do this by specifying ‘use.value.labels=TRUE’. Compare the next two objects. (Taken from the help pages of an SPSS online training workshop at Central Michigan University).
fn <- "http://personality-project.org/r/datasets/Cars.sav"
data1 <- read.file(fn) #go and get it and convert to a normal data.frame
## Data from the SPSS sav file http://personality-project.org/r/datasets/Cars.sav has been loaded.
data2 <- read.file(fn,use.value.labels=TRUE) #don't convert the value labels
## Data from the SPSS sav file http://personality-project.org/r/datasets/Cars.sav has been loaded.
headTail(data1) #look at the first and last few lines
## MPG ENGINE HORSE WEIGHT ACCEL YEAR ORIGIN CYLINDER FILTER_.
## 1 18 307 130 3504 12 70 1 8 0
## 2 15 350 165 3693 11.5 70 1 8 0
## 3 18 318 150 3436 11 70 1 8 0
## 4 16 304 150 3433 12 70 1 8 0
## ... ... ... ... ... ... ... ... ... ...
## 403 44 97 52 2130 24.6 82 2 4 1
## 404 32 135 84 2295 11.6 82 1 4 1
## 405 28 120 79 2625 18.6 82 1 4 1
## 406 31 119 82 2720 19.4 82 1 4 1
headTail(data2) #notice we now have the values as entered
## MPG ENGINE HORSE WEIGHT ACCEL YEAR ORIGIN CYLINDER FILTER_.
## 1 18 307 130 3504 12 70 American 8 Cylinders Not Selected
## 2 15 350 165 3693 11.5 70 American 8 Cylinders Not Selected
## 3 18 318 150 3436 11 70 American 8 Cylinders Not Selected
## 4 16 304 150 3433 12 70 American 8 Cylinders Not Selected
## ... ... ... ... ... ... <NA> <NA> <NA> <NA>
## 403 44 97 52 2130 24.6 82 European 4 Cylinders Selected
## 404 32 135 84 2295 11.6 82 American 4 Cylinders Selected
## 405 28 120 79 2625 18.6 82 American 4 Cylinders Selected
## 406 31 119 82 2720 19.4 82 American 4 Cylinders Selected
The describe function (in psych) will describe both data sets. It converts the levels information in the second data set into numeric values and then does the description. Note the conversion of the YEAR variable: it runs from 70 to 82 in data1 (the default numeric conversion), but from 1 to 13 in data2, where describe has recoded the value labels as integers (the asterisk marks the variables treated this way).
describe(data1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## MPG 1 398 23.51 7.82 23.0 23.06 8.90 9 46.6 37.6 0.45 -0.53 0.39
## ENGINE 2 406 194.04 105.21 148.5 183.75 86.73 4 455.0 451.0 0.69 -0.81 5.22
## HORSE 3 400 104.83 38.52 95.0 100.36 29.65 46 230.0 184.0 1.04 0.55 1.93
## WEIGHT 4 406 2969.56 849.83 2811.0 2913.97 947.38 732 5140.0 4408.0 0.46 -0.77 42.18
## ACCEL 5 406 15.50 2.82 15.5 15.45 2.59 8 24.8 16.8 0.21 0.35 0.14
## YEAR 6 405 75.94 3.74 76.0 75.93 4.45 70 82.0 12.0 0.02 -1.21 0.19
## ORIGIN 7 405 1.57 0.80 1.0 1.46 0.00 1 3.0 2.0 0.92 -0.81 0.04
## CYLINDER 8 405 5.47 1.71 4.0 5.35 0.00 3 8.0 5.0 0.51 -1.41 0.08
## FILTER_. 9 398 0.73 0.44 1.0 0.79 0.00 0 1.0 1.0 -1.04 -0.92 0.02
describe(data2)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## MPG 1 398 23.51 7.82 23.0 23.06 8.90 9 46.6 37.6 0.45 -0.53 0.39
## ENGINE 2 406 194.04 105.21 148.5 183.75 86.73 4 455.0 451.0 0.69 -0.81 5.22
## HORSE 3 400 104.83 38.52 95.0 100.36 29.65 46 230.0 184.0 1.04 0.55 1.93
## WEIGHT 4 406 2969.56 849.83 2811.0 2913.97 947.38 732 5140.0 4408.0 0.46 -0.77 42.18
## ACCEL 5 406 15.50 2.82 15.5 15.45 2.59 8 24.8 16.8 0.21 0.35 0.14
## YEAR* 6 405 6.94 3.74 7.0 6.93 4.45 1 13.0 12.0 0.02 -1.21 0.19
## ORIGIN* 7 405 1.57 0.80 1.0 1.46 0.00 1 3.0 2.0 0.92 -0.81 0.04
## CYLINDER* 8 405 3.20 1.33 2.0 3.14 0.00 1 5.0 4.0 0.27 -1.69 0.07
## FILTER_.* 9 398 1.73 0.44 2.0 1.79 0.00 1 2.0 1.0 -1.04 -0.92 0.02
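The shift of YEAR from 70-82 to 1-13 is what happens in R when a factor is turned into numbers: the result is the position of each level, not the label. A minimal base R sketch, using a made-up vector year (and assuming that YEAR in data2 is stored as a factor):

```r
# Hypothetical model years stored as a factor, as labelled values would be
year <- factor(c(70, 71, 82, 76, 70))

levels(year)                    # the labels, sorted: "70" "71" "76" "82"
as.numeric(year)                # level positions: 1 2 4 3 1
as.numeric(as.character(year))  # the labels as numbers: 70 71 82 76 70
```

If you need the original values back from a factor, convert to character first, as in the last line.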
Just as there are several input formats, so are there several output formats.
Collections of objects that are to be read in again from R can be ‘saved’ as .Rda files (Rdata files).
A single object can be written as an .rds file.
Files can also be written as text files so that other programs outside of R can read them.
You choose the way you want to write and save the file by specifying the suffix:
.text becomes a normal text file (that is to say, readable by a word processor)
.rds becomes a file readable by R
.rda can save multiple objects
To create a new file on your disk, use the file.choose function with new=TRUE, or just call write.file(object, f=). That is to say, call write.file by specifying the object to save and, with f=, where to save it.
#```{r}
# fn.txt <- file.choose(new=TRUE) commented out but creates
fn.txt <- "/Users/WR/Box Sync/pmc_folder/courses.18/350/datasets/cars.txt"
fn.rda <- "/Users/WR/Box Sync/pmc_folder/courses.18/350/datasets/cars.rda"
fn.rds <- "/Users/WR/Box Sync/pmc_folder/courses.18/350/datasets/cars.rds"
write.file(data1,f=fn.txt) #save as text file
write.file(data1,f=fn.rds) #save as a file for R to read again
save(data1,data2,file=fn.rda) #use the save command to save several objects
#```
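The same three kinds of output can be tried with base R functions and temporary files, so nothing depends on paths from my machine. This is a sketch under assumptions: cars.small is a made-up three-row data frame, the file names come from tempfile, and the base functions write.table, saveRDS and save are used here in place of write.file.

```r
# A small made-up data frame and temporary file locations
cars.small <- data.frame(MPG = c(18, 15, 18), HORSE = c(130, 165, 150))
fn.txt <- tempfile(fileext = ".txt")
fn.rds <- tempfile(fileext = ".rds")
fn.rda <- tempfile(fileext = ".rda")

write.table(cars.small, file = fn.txt, row.names = FALSE)  # plain text, readable outside R
saveRDS(cars.small, file = fn.rds)                         # one object per .rds file
save(cars.small, file = fn.rda)                            # .rda can hold several objects

from.txt <- read.table(fn.txt, header = TRUE)  # read the text file back
from.rds <- readRDS(fn.rds)                    # returns the object; you choose its name
loaded   <- load(fn.rda)                       # restores objects under their saved names
loaded                                         # the names of the objects that were restored
```

Note the difference in the last two steps: readRDS hands back an object that you assign to any name, while load puts the saved objects straight into your workspace.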
In RStudio, the upper right hand window shows the various objects in your workspace. We can show all the objects in your workspace by using the ls() function
ls()
## [1] "data1" "data2" "eli" "file.name" "fn" "my.data" "my.file"
Let's get rid of unnecessary objects. We will remove the ones we do not want using the rm() function
rm(eli,data1,data2,my.data,file.name)
ls() #list them again
## [1] "fn" "my.file"
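If you want to practice ls and rm without touching your real workspace, you can do so in a scratch environment. This is a sketch with made-up contents; the object names echo the ones above, but the values are placeholders.

```r
# Work in a scratch environment so the real workspace is left alone
scratch <- new.env()
assign("eli", 1:3, envir = scratch)
assign("data1", 4:6, envir = scratch)
assign("fn", "some/file.txt", envir = scratch)  # hypothetical path

ls(scratch)                                    # "data1" "eli" "fn"
rm(list = c("eli", "data1"), envir = scratch)  # remove objects named in a character vector
ls(scratch)                                    # "fn"
```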
Now, read in from the data file named fn.rds
#{r}
fn.rds #show the location
my.data <- read.file(fn.rds)
dim(my.data) #
After reading through the examples above, and reading in each of the demonstration data sets, try to read in some of your own data. Then try to save it, and then read it again.
Use the file.choose() function as you explore files on your own machine.
Using the ‘read.file’ function will probably help you.