---
title: "350.wk2.data"
author: "William Revelle"
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=100)
```

# Some comments about characters and numbers

Most of the data we analyze are `numeric`. But sometimes we will have variables that are `character`. This leads to some interesting problems. Consider the following dataset created by a fellow student.

First, we need to remember to make `psych` active.

```{r}
library(psych)
library(psychTools)
```

```{r}
fn <- "https://personality-project.org/courses/350/datasets/hp.csv"
hp <- read.file(fn)
headTail(hp) #just show the first and last 4 lines
summary(hp) #this is the R way of summarizing
describe(hp) #this is the psych way of describing
```

That some of the data are `character` means that the `cor` function will not work. `describe` and `lowerCor` convert the character data to numeric using the `char2numeric` function and then do normal operations on the data. But this leads to some confusion, in that characters are converted to numeric values in alphabetical order. Thus, `female` becomes 1 and `male` becomes 2, but `man` becomes 1 and `woman` becomes 2. To let you know that it has automatically done this conversion, it adds an `*` to the variable name. Thus sex and Gen are renamed as `sex*` and `Gen*`.

Look at the correlations. `sex*` and `Gen*` are negatively correlated.

```{r}
lowerCor(hp)
```

This is easy to see if we show the data after we convert it using `char2numeric`.

```{r}
converted <- char2numeric(hp)
headTail(converted)
```
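
The alphabetical coding can also be seen without `psych`. The following is a minimal sketch that mimics the conversion with base R's `factor`, which sorts its levels alphabetically by default. The `toy` data frame and its values are made up for illustration (they are not taken from `hp.csv`), and using `factor` is an assumption about how to imitate the ordering, not a description of how `char2numeric` is implemented.

```{r}
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  sex = c("female", "male", "female", "male", "female"),
  Gen = c("woman", "man", "woman", "man", "woman"),
  stringsAsFactors = FALSE
)

# factor() orders its levels alphabetically, so the integer codes follow that order
sex_code <- as.integer(factor(toy$sex))  # female = 1, male = 2
Gen_code <- as.integer(factor(toy$Gen))  # man = 1, woman = 2

# Show the original labels next to their numeric codes
data.frame(toy, sex_code, Gen_code)

# The two codings run in opposite directions, so the correlation is negative
r <- cor(sex_code, Gen_code)
r
```

Because `female` is coded 1 while `woman` is coded 2, two variables that agree perfectly in meaning end up with a negative correlation. The sign reflects the alphabetical coding, not the data.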