Part of the lecture notes and assignments for Using R in psychological research at Northwestern University, Spring, 2023.

350: Exercises for Week 2: Reading and writing data

Before we can use R for analysis, we must first get the data into R. Data files come in many different flavors. Here we will explore how to read in data from the clipboard, from text and csv files, and from SPSS files.

Preliminaries: using RMarkdown to annotate and show your work

We run this in the script window of RStudio so that we can keep our notes. This way we can combine text (what you are reading) with the actual R commands and the R output, which is a convenient way to remember what you were doing.

Before we do anything, we need to set up RMarkdown with some useful options. I show the actual commands here, although they are hidden in the output when we knit the document (with knitr).

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100) #This sets the width of the output, 80 seems to be the default and is too narrow
```

To make these commands run in R, you precede the first line with three backticks (```, typed with the key that also carries the tilde, ~, on US keyboards) and close the last line by adding three more ```.

This entire Rmd file is saved in the class notes folder so that you can see how the Markdown commands are written.

Using the template to do this example

First, start up RStudio.

Then open 350.wk2.Rmd in your browser, copy the entire Rmd file, and paste it into an RMarkdown window in RStudio.

You are now ready to create the file yourself.

Creating an RMarkdown script

Open RStudio

Create a new file from the File menu (File > New File > R Markdown...). You now have an RMarkdown template that you can modify with the commands you want. Remember: to make R code run in your template, start a line with three ``` followed by {r}, put your R commands on the lines that follow, and close the chunk with a line of three more ```.

e.g.

```{r}

R commands

```

Reading the data using the psych package

Much of this is summarized in the vignette "An introduction to the psych package: Part I: data entry and data description", which you can find among the vignettes for psych.

For these examples, we first need to load the psych and psychTools packages.

We will read the data using several different approaches. For each of these approaches, we will save the data in the object 'my.data'. You can, of course, call this object anything you want.

library(psych)       #this assumes we have already installed psych
library(psychTools)  #this is needed for some additional data sets and tools

Just read from the clipboard

If you have viewed a data set in a web browser or in a file, you can copy it to your clipboard (using the appropriate commands for your system) and then read the clipboard into R.

First, use your browser to open the remote file:

http://personality-project.org/r/datasets/simulation.txt

Select the entire contents of the file and copy them to the clipboard. Then:

#my.data <- read.clipboard()  #this takes what is in the clipboard and makes it into the my.data object
#since this is an interactive command, I cannot show it in a script
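read.clipboard() is the interactive psych function. As a base R analog, read.table() can parse the same kind of whitespace-delimited text with a header row when the text is handed to it through its text argument instead of the clipboard. In this sketch the string stands in for the clipboard, and its four rows are the first four lines of simulation.txt as shown in the output below.

```r
# A base R analog of pasting a data set: the string plays the role of the clipboard.
# (read.clipboard() is the interactive psych version used above.)
pasted <- "Time Anxiety Impulsivity sex Arousal Tension Performance
9 4 9 1 50 55 40
19 8 8 1 70 64 90
9 5 10 2 50 69 48
9 4 1 2 57 55 68"

clip.data <- read.table(text = pasted, header = TRUE)  # whitespace-delimited, first row = names
dim(clip.data)   # 4 rows, 7 columns
str(clip.data)   # all seven columns were read as integers
```

If you prefer to stay in base R, read.table("clipboard", header = TRUE) reads the clipboard directly on Windows, and read.table(pipe("pbpaste"), header = TRUE) is a common equivalent on macOS. Like read.clipboard(), neither can run in a knitted script.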

Now, let's see what we got. We will ask for the dimensions of my.data, show the first and last few lines, and then get some basic descriptive statistics.

But we cannot read the clipboard in a script, so instead we read a copy of the same file from my disk:

my.file <- "/Users/WR/Library/CloudStorage/OneDrive-NorthwesternUniversity/pmc/courses.23/350/datasets/simulation.txt"

my.data <- read.file(my.file)
## Data from the .txt file /Users/WR/Library/CloudStorage/OneDrive-NorthwesternUniversity/pmc/courses.23/350/datasets/simulation.txt has been loaded.
dim(my.data)  #what is the size of the object we read?
## [1] 72  7
headTail(my.data)  #show the first and last 4 lines of the object
##     Time Anxiety Impulsivity sex Arousal Tension Performance
## 1      9       4           9   1      50      55          40
## 2     19       8           8   1      70      64          90
## 3      9       5          10   2      50      69          48
## 4      9       4           1   2      57      55          68
## ...  ...     ...         ... ...     ...     ...         ...
## 69    19       6           1   1      66      53          88
## 70     9       5          10   2      48      63          40
## 71    19       6           8   2      69      60          95
## 72    19      10           1   2      66      48          93
describe(my.data)  #get some descriptive statistics of this object
##             vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## Time           1 72 14.28  5.03   19.0   14.34  0.00   9  19    10 -0.11    -2.02 0.59
## Anxiety        2 72  5.24  2.18    5.0    5.24  2.97   0  10    10 -0.04    -0.65 0.26
## Impulsivity    3 72  4.90  3.98    4.5    4.88  5.19   0  10    10  0.02    -1.83 0.47
## sex            4 72  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.03 0.06
## Arousal        5 72 60.90  8.10   66.0   61.29  5.93  48  70    22 -0.27    -1.67 0.96
## Tension        6 72 56.83  6.29   57.0   57.14  5.93  38  69    31 -0.53     0.42 0.74
## Performance    7 72 72.21 17.41   78.0   73.19 18.53  38  98    60 -0.43    -1.10 2.05
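dim() is base R, but headTail() and describe() come from psych. For comparison, this sketch shows rough base R stand-ins. It runs on a small data frame built from the eight rows displayed above (the first and last four lines of simulation.txt), so its statistics will of course differ from those of the full 72-row file.

```r
# A stand-in data frame: the first and last four rows of simulation.txt shown above.
sim8 <- read.table(header = TRUE, text = "Time Anxiety Impulsivity sex Arousal Tension Performance
9 4 9 1 50 55 40
19 8 8 1 70 64 90
9 5 10 2 50 69 48
9 4 1 2 57 55 68
19 6 1 1 66 53 88
9 5 10 2 48 63 40
19 6 8 2 69 60 95
19 10 1 2 66 48 93")

dim(sim8)                             # 8 rows, 7 columns
rbind(head(sim8, 2), tail(sim8, 2))   # first and last lines, in the spirit of headTail()
summary(sim8)                         # min, quartiles, median, mean, max for each column
sapply(sim8, sd)                      # standard deviations, which summary() does not report
```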

Or we can specify the file name and then use the read.file command

Instead of reading from the clipboard, we can specify the local or remote location of the file and read it directly.

file.name <- "http://personality-project.org/r/datasets/simulation.txt"  
my.data <- read.file(file.name)  #goes to the remote location and reads it
## Data from the .txt file http://personality-project.org/r/datasets/simulation.txt has been loaded.

Once again, we want to see what we got.

dim(my.data)  #what is the size of the object we read?
## [1] 72  7
headTail(my.data)  #show the first and last 4 lines of the object
##     Time Anxiety Impulsivity sex Arousal Tension Performance
## 1      9       4           9   1      50      55          40
## 2     19       8           8   1      70      64          90
## 3      9       5          10   2      50      69          48
## 4      9       4           1   2      57      55          68
## ...  ...     ...         ... ...     ...     ...         ...
## 69    19       6           1   1      66      53          88
## 70     9       5          10   2      48      63          40
## 71    19       6           8   2      69      60          95
## 72    19      10           1   2      66      48          93
describe(my.data)  #get some descriptive statistics of this object
##             vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## Time           1 72 14.28  5.03   19.0   14.34  0.00   9  19    10 -0.11    -2.02 0.59
## Anxiety        2 72  5.24  2.18    5.0    5.24  2.97   0  10    10 -0.04    -0.65 0.26
## Impulsivity    3 72  4.90  3.98    4.5    4.88  5.19   0  10    10  0.02    -1.83 0.47
## sex            4 72  1.50  0.50    1.5    1.50  0.74   1   2     1  0.00    -2.03 0.06
## Arousal        5 72 60.90  8.10   66.0   61.29  5.93  48  70    22 -0.27    -1.67 0.96
## Tension        6 72 56.83  6.29   57.0   57.14  5.93  38  69    31 -0.53     0.42 0.74
## Performance    7 72 72.21 17.41   78.0   73.19 18.53  38  98    60 -0.43    -1.10 2.05
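read.file works out how to read a file from its suffix (.txt here, .sav below). The hypothetical helper read_by_suffix() below sketches that idea with base R readers. It is not how read.file is implemented, and it handles only three suffixes. To keep the example runnable anywhere, it writes and then reads temporary files.

```r
# Hypothetical helper (not psych's read.file): pick a base R reader from the file suffix.
read_by_suffix <- function(f) {
  suffix <- tolower(tools::file_ext(f))
  switch(suffix,
    txt = read.table(f, header = TRUE),  # whitespace-delimited text with a header row
    csv = read.csv(f),                   # comma-separated values
    rds = readRDS(f),                    # a single saved R object
    stop("no reader defined for suffix '", suffix, "'")
  )
}

# Try it on temporary files written from a tiny data frame.
df <- data.frame(Time = c(9, 19), Anxiety = c(4, 8))
txt.file <- tempfile(fileext = ".txt")
csv.file <- tempfile(fileext = ".csv")
write.table(df, txt.file, row.names = FALSE)
write.csv(df, csv.file, row.names = FALSE)

read_by_suffix(txt.file)
read_by_suffix(csv.file)
```

read.table() and read.csv() also accept a URL in place of a local path, which is why the same kind of call can read the remote simulation.txt file.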

Read a local file using file.choose()

We can find a file on our local hard disk by browsing for it with the file.choose function. Unfortunately, I need to comment out this statement because it is interactive and cannot run as part of a knitted script. Instead, I will create a new object, 'fn' (for file name), and set it directly to the location of the simulation file.

#the next line is commented out because file.choose() is interactive and cannot run while knitting
#so instead, we define fn directly
#fn <- file.choose()  #this opens a system window to look for the file
fn <-  "https://personality-project.org/courses/350/datasets/simulation.txt"  #from my looking for it
fn # show the name of the file
## [1] "https://personality-project.org/courses/350/datasets/simulation.txt"
my.data <- read.file(fn) 
## Data from the .txt file https://personality-project.org/courses/350/datasets/simulation.txt has been loaded.
dim(my.data)  #still the 72 by 7 data file
## [1] 72  7

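Here fn points to a copy of the file on the web, so this example should run on any computer with internet access. A path returned by file.choose(), by contrast, points to a file on the disk of whoever ran it. To try this yourself, uncomment the file.choose() line and choose a text file on your own computer.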
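The file.choose() line above has to be commented out because it cannot run in a script. One way to avoid commenting it in and out is to test interactive(), which is TRUE in an interactive R session and FALSE when R runs non-interactively (for example under Rscript, and typically when RStudio's Knit button renders the document). This is a sketch. The fallback location is the URL used above.

```r
# Open the file chooser only in an interactive session; otherwise (for example
# when the script is run with Rscript) use a fixed location instead.
fallback <- "https://personality-project.org/courses/350/datasets/simulation.txt"
fn <- if (interactive()) file.choose() else fallback
fn
```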

Combining file.choose and read.file into one command

If you do not give read.file a file name (such as fn), R will open a system window to let you find the file on your machine. What read.file is doing is calling the file.choose function for you.

I cannot show this in the script, but you can try it on your own machine.

#my.data <- read.file()
dim(my.data)
## [1] 72  7
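This sketch shows the pattern just described, where omitting the file name makes the function ask for one. It uses a hypothetical wrapper, read_txt(), which is not psych's code. The argument defaults to NULL and is replaced by file.choose() only when nothing was supplied, so passing a file name skips the dialog.

```r
# Hypothetical wrapper (not psych's code): if no file name is given,
# ask for one with file.choose(), as read.file does.
read_txt <- function(f = NULL) {
  if (is.null(f)) f <- file.choose()   # opens a system window; interactive sessions only
  read.table(f, header = TRUE)
}

# Supplying f skips the dialog, so this part also runs non-interactively.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(sex = c(1, 2), Arousal = c(50, 70)), tmp, row.names = FALSE)
read_txt(tmp)
```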

Reading an SPSS file

SPSS saves data in a format with the .sav suffix. We can read these files with read.file. Eli Finkel has shared a small SPSS .sav file.

If you have an SPSS file on your computer, you could try opening it this way.

fn <- "http://personality-project.org/r/datasets/finkel.sav" 
eli <- read.file(fn)  #go and get it and convert to a normal data.frame
## Data from the SPSS sav file http://personality-project.org/r/datasets/finkel.sav has been loaded.
dim(eli)
## [1] 69  5
headTail(eli)
##      USER HAPPY SOULMATE ENJOYDEX UPSET
## 1   "001"     4        7        7     1
## 2   "003"     6        5        7     0
## 3   "004"     6        7        7     0
## 4   "005"     6        7        7     0
## ...  <NA>   ...      ...      ...   ...
## 66  "074"     7        7        7     1
## 67  "075"     6        7        7     1
## 68  "076"     7        7        7     0
## 69  "078"     2        7        7     1
colnames(eli)
## [1] "USER"     "HAPPY"    "SOULMATE" "ENJOYDEX" "UPSET"
describe(eli)
##          vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## USER*       1 69 35.00 20.06     35   35.00 25.20   1  69    68  0.00    -1.25 2.42
## HAPPY       2 69  5.71  1.04      6    5.82  0.00   2   7     5 -1.17     1.62 0.13
## SOULMATE    3 69  5.09  1.80      5    5.32  1.48   1   7     6 -0.88    -0.03 0.22
## ENJOYDEX    4 68  6.47  1.01      7    6.70  0.00   2   7     5 -2.37     5.92 0.12
## UPSET       5 69  0.41  0.49      0    0.39  0.00   0   1     1  0.38    -1.89 0.06

Keeping (viewing) the original codes

By default, the read.file function returns the numeric codes of SPSS variables rather than their value labels. Sometimes you want to see the labels as they were entered in the SPSS file. You can do this by specifying use.value.labels=TRUE. Compare the next two objects. (The Cars.sav example is taken from the help pages of an SPSS online training workshop at Central Michigan University.)

fn <- "http://personality-project.org/r/datasets/Cars.sav" 
data1 <- read.file(fn)  #go and get it and convert to a normal data.frame
## Data from the SPSS sav file http://personality-project.org/r/datasets/Cars.sav has been loaded.
data2 <- read.file(fn,use.value.labels=TRUE) #keep the value labels instead of converting them to numbers
## Data from the SPSS sav file http://personality-project.org/r/datasets/Cars.sav has been loaded.
headTail(data1)   #look at the first and last few lines
##     MPG ENGINE HORSE WEIGHT ACCEL YEAR ORIGIN CYLINDER FILTER_.
## 1    18    307   130   3504    12   70      1        8        0
## 2    15    350   165   3693  11.5   70      1        8        0
## 3    18    318   150   3436    11   70      1        8        0
## 4    16    304   150   3433    12   70      1        8        0
## ... ...    ...   ...    ...   ...  ...    ...      ...      ...
## 403  44     97    52   2130  24.6   82      2        4        1
## 404  32    135    84   2295  11.6   82      1        4        1
## 405  28    120    79   2625  18.6   82      1        4        1
## 406  31    119    82   2720  19.4   82      1        4        1
headTail(data2)   #notice we now have the values as entered
##     MPG ENGINE HORSE WEIGHT ACCEL YEAR   ORIGIN    CYLINDER     FILTER_.
## 1    18    307   130   3504    12   70 American 8 Cylinders Not Selected
## 2    15    350   165   3693  11.5   70 American 8 Cylinders Not Selected
## 3    18    318   150   3436    11   70 American 8 Cylinders Not Selected
## 4    16    304   150   3433    12   70 American 8 Cylinders Not Selected
## ... ...    ...   ...    ...   ... <NA>     <NA>        <NA>         <NA>
## 403  44     97    52   2130  24.6   82 European 4 Cylinders     Selected
## 404  32    135    84   2295  11.6   82 American 4 Cylinders     Selected
## 405  28    120    79   2625  18.6   82 American 4 Cylinders     Selected
## 406  31    119    82   2720  19.4   82 American 4 Cylinders     Selected
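In base R, a numeric code paired with a value label is usually represented as a factor. This minimal sketch uses only the ORIGIN code-label pairs visible in the two headTail listings (1 = American, 2 = European) to turn the codes of data1 into the labels of data2.

```r
# ORIGIN codes for rows 1, 2, 403, and 404 of data1.
origin.codes <- c(1, 1, 2, 1)

# Attach the value labels seen in data2: code 1 -> "American", code 2 -> "European".
origin <- factor(origin.codes, levels = c(1, 2),
                 labels = c("American", "European"))
origin              # the labels are shown instead of the codes
table(origin)       # American 3, European 1
as.integer(origin)  # 1 1 2 1: here the level positions happen to equal the codes
```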

The describe function (in psych) will describe both data sets. In the second data set, YEAR, ORIGIN, CYLINDER, and FILTER_. hold labels rather than numbers, so describe converts them to numeric codes before describing them and marks them with an asterisk (*). Note what this does to the YEAR variable: it runs from 70 to 82 in data1, but from 1 to 13 in data2, because its 13 levels are numbered 1 through 13.

describe(data1)
##          vars   n    mean     sd median trimmed    mad min    max  range  skew kurtosis    se
## MPG         1 398   23.51   7.82   23.0   23.06   8.90   9   46.6   37.6  0.45    -0.53  0.39
## ENGINE      2 406  194.04 105.21  148.5  183.75  86.73   4  455.0  451.0  0.69    -0.81  5.22
## HORSE       3 400  104.83  38.52   95.0  100.36  29.65  46  230.0  184.0  1.04     0.55  1.93
## WEIGHT      4 406 2969.56 849.83 2811.0 2913.97 947.38 732 5140.0 4408.0  0.46    -0.77 42.18
## ACCEL       5 406   15.50   2.82   15.5   15.45   2.59   8   24.8   16.8  0.21     0.35  0.14
## YEAR        6 405   75.94   3.74   76.0   75.93   4.45  70   82.0   12.0  0.02    -1.21  0.19
## ORIGIN      7 405    1.57   0.80    1.0    1.46   0.00   1    3.0    2.0  0.92    -0.81  0.04
## CYLINDER    8 405    5.47   1.71    4.0    5.35   0.00   3    8.0    5.0  0.51    -1.41  0.08
## FILTER_.    9 398    0.73   0.44    1.0    0.79   0.00   0    1.0    1.0 -1.04    -0.92  0.02
describe(data2)
##           vars   n    mean     sd median trimmed    mad min    max  range  skew kurtosis    se
## MPG          1 398   23.51   7.82   23.0   23.06   8.90   9   46.6   37.6  0.45    -0.53  0.39
## ENGINE       2 406  194.04 105.21  148.5  183.75  86.73   4  455.0  451.0  0.69    -0.81  5.22
## HORSE        3 400  104.83  38.52   95.0  100.36  29.65  46  230.0  184.0  1.04     0.55  1.93
## WEIGHT       4 406 2969.56 849.83 2811.0 2913.97 947.38 732 5140.0 4408.0  0.46    -0.77 42.18
## ACCEL        5 406   15.50   2.82   15.5   15.45   2.59   8   24.8   16.8  0.21     0.35  0.14
## YEAR*        6 405    6.94   3.74    7.0    6.93   4.45   1   13.0   12.0  0.02    -1.21  0.19
## ORIGIN*      7 405    1.57   0.80    1.0    1.46   0.00   1    3.0    2.0  0.92    -0.81  0.04
## CYLINDER*    8 405    3.20   1.33    2.0    3.14   0.00   1    5.0    4.0  0.27    -1.69  0.07
## FILTER_.*    9 398    1.73   0.44    2.0    1.79   0.00   1    2.0    1.0 -1.04    -0.92  0.02
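The 1 to 13 range of YEAR* matches what base R does when a factor is turned into numbers. The integer codes of a factor are the positions of its levels, not the original values. This is a small sketch using the years 70 to 82 from the output above.

```r
year <- factor(70:82)                    # 13 levels: "70", "71", ..., "82"
nlevels(year)                            # 13
range(as.integer(year))                  # 1 13: positions of the levels
range(as.numeric(as.character(year)))    # 70 82: the years themselves
```

as.numeric(as.character(x)) is the usual base R idiom for recovering numbers stored as factor levels. The same position coding explains why CYLINDER* runs from 1 to 5 in data2 while CYLINDER runs from 3 to 8 in data1.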

Writing data

Just as there are several input formats, there are several output formats.

Collections of objects that are to be read in again by R can be 'saved' as .rda files (Rdata files).

A single object can be written as an .rds file.

Data can also be written as text files so that programs outside of R can read them.

You choose the way you want to write and save the file by specifying the suffix:

.text becomes a normal text file (that is to say, readable by a word processor)

.rds becomes a file readable by R

.rda can save multiple objects (using the save function)
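write.file chooses the format from the suffix. For comparison, this sketch uses the base R functions for two of these formats: saveRDS()/readRDS() for .rds and write.table()/read.delim() for tab-delimited text. It writes to temporary files and uses a few values from the Cars data shown earlier.

```r
cars.small <- data.frame(MPG = c(18, 15, 44), CYLINDER = c(8, 8, 4))  # values from data1

# .rds: one object, read back exactly as it was saved
rds.file <- tempfile(fileext = ".rds")
saveRDS(cars.small, rds.file)
back.rds <- readRDS(rds.file)
identical(back.rds, cars.small)                  # TRUE

# text: readable by other programs, but R's column types are not stored
txt.file <- tempfile(fileext = ".txt")
write.table(cars.small, txt.file, sep = "\t", row.names = FALSE)
back.txt <- read.delim(txt.file)
str(back.txt)                                    # whole numbers come back as integers
```

This difference is why .rds (or .rda) suits files you will read back into R, and text suits files that other programs must read.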

Creating a file and writing to it

To create a new file on your disk, use file.choose(new=TRUE), or just call write.file(object, f=). That is to say, give write.file the object to save, and use f= to say where to save it.

#this chunk is not run when knitting; the paths are on my machine
#fn.txt <- file.choose(new=TRUE)  #commented out, but it creates a file name such as
#fn.txt <- "/Users/WR/Box Sync/pmc_folder/courses.18/350/datasets/cars.txt"
#fn.rda <- "/Users/WR/Box Sync/pmc_folder/courses.18/350/datasets/cars.rda"
#fn.rds <- "/Users/WR/Box Sync/pmc_folder/courses.18/350/datasets/cars.rds"
#write.file(data1,f=fn.txt)  #save as text file
#write.file(data1,f=fn.rds)  #save as a file for R to read again
#save(data1,data2,file=fn.rda)  #use the save command to save several objects
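save() puts several objects into one .rda file, and load() brings them all back. This minimal sketch of the round trip uses small hypothetical objects, small1 and small2, in place of data1 and data2. load() returns the names of the objects it restored. Loading into a new environment keeps the restored copies from overwriting objects of the same name in your workspace.

```r
small1 <- data.frame(MPG = c(18, 15, 44))
small2 <- data.frame(ORIGIN = factor(c("American", "American", "European")))

rda.file <- tempfile(fileext = ".rda")
save(small1, small2, file = rda.file)           # several objects, one file

restored <- new.env()                           # keep restored copies separate
loaded.names <- load(rda.file, envir = restored)
loaded.names                                    # "small1" "small2"
identical(restored$small1, small1)              # TRUE
```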

Showing and clearing your workspace

In RStudio, the upper right-hand window shows the various objects in your workspace. We can list all the objects in your workspace by using the ls() function.

ls()
## [1] "data1"     "data2"     "eli"       "file.name" "fn"        "my.data"   "my.file"

Cleaning up the workspace

Let's get rid of unnecessary objects. We will remove the ones we do not want using the rm() function.

rm(eli,data1,data2,my.data,file.name)
ls() #list them again
## [1] "fn"      "my.file"
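rm() also accepts a character vector of names through its list argument, which makes it easy to remove many objects at once. This sketch works in a scratch environment so that your real workspace is left alone.

```r
scratch <- new.env()
assign("eli", 1, envir = scratch)
assign("data1", 2, envir = scratch)
assign("fn", "simulation.txt", envir = scratch)

ls(scratch)                                             # "data1" "eli" "fn"
rm(list = setdiff(ls(scratch), "fn"), envir = scratch)  # keep only fn
ls(scratch)                                             # "fn"
rm(list = ls(scratch), envir = scratch)                 # clear everything
ls(scratch)                                             # character(0)
```

In your own session, rm(list = ls()) clears the entire global environment in the same way, so use it with care.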

Now, read in from the data file named fn.rds (not run here, because fn.rds is defined only in the disabled chunk above):

#fn.rds  #show the location
#my.data <- read.file(fn.rds)
#dim(my.data)

Assignment for Week 2, part 1

After reading through the examples above and reading in each of the demonstration data sets, try to read in some of your own data. Then save it, and read it in again.

Use the file.choose() function

Use the file.choose() function as you explore files on your own machine.

Using the 'read.file' function will probably help.