Consider the following numbers:

```
A	B	C	D	E	F
1	0	16	7	1	1
2	7	17	62	12	2
3	0	9	0	5	4
4	7	18	35	18	8
5	7	13	5	28	16
6	8	11	10	78	32
7	9	13	14	0	64
8	2	10	48	46	128
9	7	16	0	23	256
10	3	10	13	23	512
11	4	14	8	11	1024
12	4	12	9	34	2048
13	3	22	5	10	4096
14	0	10	59	5	8192
15	5	13	96	24	16384
16	7	22	97	43	32768

```

For each column, find the (arithmetic) mean, median, and standard deviation. How well do these conventional statistics describe the basic characteristics of the data?

Examining the data suggests transformations that might better capture the underlying characteristics of each column. What transformations would you recommend to make the data easier to understand? Find the same descriptive statistics for the transformed data.

The following code in the R system will do this. (Note that I am shortcutting the input step by copying the data to the clipboard and using a function to read the clipboard. My "read.clipboard()" function supposedly combines the code for PCs and Macs into one function. You can get it by downloading my "useful.r" routines.)
```
source("http://personality-project.org/r/useful.r")   #get a small package of psychometrically useful functions
problem1 <- read.clipboard()   #after first copying the table with the header row from above
summary(problem1) #get the basic summary statistics
boxplot(problem1) #show this graphically
pairs.panels(problem1) #show a graphic with scatterplots and histograms

#produces this output
> problem1 <- read.clipboard()   #after first copying the table with the header row from above
> summary(problem1)
A               B               C               D               E
Min.   : 1.00   Min.   :0.000   Min.   : 9.00   Min.   : 0.00   Min.   : 0.00
1st Qu.: 4.75   1st Qu.:2.750   1st Qu.:10.75   1st Qu.: 6.50   1st Qu.: 8.75
Median : 8.50   Median :4.500   Median :13.00   Median :11.50   Median :20.50
Mean   : 8.50   Mean   :4.562   Mean   :14.12   Mean   :29.25   Mean   :22.56
3rd Qu.:12.25   3rd Qu.:7.000   3rd Qu.:16.25   3rd Qu.:50.75   3rd Qu.:29.50
Max.   :16.00   Max.   :9.000   Max.   :22.00   Max.   :97.00   Max.   :78.00
F
Min.   :    1
1st Qu.:   14
Median :  192
Mean   : 4096
3rd Qu.: 2560
Max.   :32768
> boxplot(problem1) #show this graphically
> pairs.panels(problem1) #show a graphic with scatterplots and histograms

```
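Note that summary() reports the mean and median but not the standard deviation that the problem asks for. A minimal sketch of one way to get all three with base R follows. It types the table in by hand instead of reading the clipboard, and the names `desc` and the anonymous helper function are my own choices for illustration, not part of "useful.r".

```r
# Rebuild the table from the top of the page without using the clipboard.
problem1 <- data.frame(
  A = 1:16,
  B = c(0, 7, 0, 7, 7, 8, 9, 2, 7, 3, 4, 4, 3, 0, 5, 7),
  C = c(16, 17, 9, 18, 13, 11, 13, 10, 16, 10, 14, 12, 22, 10, 13, 22),
  D = c(7, 62, 0, 35, 5, 10, 14, 48, 0, 13, 8, 9, 5, 59, 96, 97),
  E = c(1, 12, 5, 18, 28, 78, 0, 46, 23, 23, 11, 34, 10, 5, 24, 43),
  F = 2^(0:15)   # column F doubles on every row
)

# Apply the three statistics to every column.
# The result is a matrix: one row per statistic, one column per variable.
desc <- sapply(problem1, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})

print(round(desc, 2))
```

Comparing the mean and median rows is a quick way to spot the skewed columns: they agree for A but differ widely for D and F.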
Note that the boxplot isn't very helpful, because the range of variable F is so great. What happens if we do a log transform of the data?
```
logprob <- log(problem1)
summary(logprob)
boxplot(logprob)

> logprob <- log(problem1)
> summary(logprob)
A               B                C               D               E
Min.   :0.000   Min.   :  -Inf   Min.   :2.197   Min.   : -Inf   Min.   : -Inf
1st Qu.:1.554   1st Qu.:0.9972   1st Qu.:2.374   1st Qu.:1.862   1st Qu.:2.129
Median :2.138   Median :1.4979   Median :2.565   Median :2.434   Median :3.013
Mean   :1.917   Mean   :  -Inf   Mean   :2.611   Mean   : -Inf   Mean   : -Inf
3rd Qu.:2.505   3rd Qu.:1.9459   3rd Qu.:2.788   3rd Qu.:3.923   3rd Qu.:3.381
Max.   :2.773   Max.   :2.1972   Max.   :3.091   Max.   :4.575   Max.   :4.357
F
Min.   : 0.000
1st Qu.: 2.599
Median : 5.199
Mean   : 5.199
3rd Qu.: 7.798
Max.   :10.397
> boxplot(logprob)
Warning messages:
1: Outlier (-Inf) in 2nd boxplot are NOT drawn in: bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group ==
2: Outlier (-Inf) in 4th boxplot are NOT drawn in: bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group ==
3: Outlier (-Inf) in 5th boxplot are NOT drawn in: bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group ==

```
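The -Inf entries in this summary can be reproduced in isolation. The sketch below uses a short made-up vector `x` (hypothetical values in the style of column B, not the actual column) to show how a single zero affects the mean of the logs but not necessarily the median.

```r
# Hypothetical values in the style of column B; one of them is zero.
x <- c(0, 7, 2, 7, 3)

lx <- log(x)            # log(0) evaluates to -Inf in R
print(lx)

print(mean(lx))         # one -Inf is enough to make the mean -Inf
print(median(lx))       # the median is the middle sorted value, finite here
print(mean(log1p(x)))   # log1p(x) computes log(1 + x), so zeros map to 0
```

This is why the summary shows a mean of -Inf for some columns while their medians and upper quartiles remain ordinary numbers.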

The complaint about the box plot arises because columns B, D and E contain zeros, and the log of zero is -Inf. Try adding one to the numbers before taking the logs.

```
log1prob <- log(problem1+1)
summary(log1prob)
boxplot(log1prob)

> log1prob <- log(problem1+1)
> summary(log1prob)
A                B               C               D               E
Min.   :0.6931   Min.   :0.000   Min.   :2.303   Min.   :0.000   Min.   :0.000
1st Qu.:1.7462   1st Qu.:1.314   1st Qu.:2.463   1st Qu.:2.008   1st Qu.:2.246
Median :2.2499   Median :1.701   Median :2.639   Median :2.518   Median :3.061
Mean   :2.0941   Mean   :1.486   Mean   :2.684   Mean   :2.674   Mean   :2.698
3rd Qu.:2.5835   3rd Qu.:2.079   3rd Qu.:2.848   3rd Qu.:3.942   3rd Qu.:3.414
Max.   :2.8332   Max.   :2.303   Max.   :3.135   Max.   :4.585   Max.   :4.369
F
Min.   : 0.6931
1st Qu.: 2.6742
Median : 5.2044
Mean   : 5.2962
3rd Qu.: 7.7983
Max.   :10.3972
> boxplot(log1prob)

```
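To finish the exercise, the same three descriptive statistics can be computed on the transformed data. This is a sketch only: the table is typed in by hand so the block runs on its own, and `desc1` is a name chosen here for illustration. The last line is an extra check of my own, not part of the original solution: because column F doubles on every row, a base-2 log turns it into the evenly spaced sequence 0 to 15.

```r
# The table from the top of the page, entered directly.
problem1 <- data.frame(
  A = 1:16,
  B = c(0, 7, 0, 7, 7, 8, 9, 2, 7, 3, 4, 4, 3, 0, 5, 7),
  C = c(16, 17, 9, 18, 13, 11, 13, 10, 16, 10, 14, 12, 22, 10, 13, 22),
  D = c(7, 62, 0, 35, 5, 10, 14, 48, 0, 13, 8, 9, 5, 59, 96, 97),
  E = c(1, 12, 5, 18, 28, 78, 0, 46, 23, 23, 11, 34, 10, 5, 24, 43),
  F = 2^(0:15)
)

# Add one before logging so that the zeros in B, D and E become log(1) = 0.
log1prob <- log(problem1 + 1)

# Mean, median and standard deviation of every transformed column.
desc1 <- sapply(log1prob, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})
print(round(desc1, 3))

# Column F on a base-2 log scale (no +1 needed, F has no zeros).
print(log2(problem1$F))
```

On the log scale the mean and median of F are close to each other, unlike the raw values, where the mean is more than twenty times the median.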
The final boxplot is shown below. (I have not shown the ones that are not as useful.)