An example of function evaluation and testing

The lm function is the workhorse for linear regression. It works on raw data and provides very useful diagnostics.

lmCor, originally named setCor, was developed to handle input from correlation matrices. It was eventually elaborated to mimic much of the functionality of lm and to work with raw data as well as correlations, with various options added along the way. In the process, it was necessary to check its output against lm for the various options.

(In 2020, Robert Stupinsky reported an error with zero centering and standardizing. This check of the revised code was therefore written, and it seems to have worked. See the news file for version 2.0.12.)

We first do a simple multiple regression with three predictors and compare the results to those from lmCor. By default, lmCor standardizes and zero centers the data; we turn both of these off for this first example.

Here we go through a number of tests. We need to compare the output carefully to detect errors.

We first use a small data set built into core R: attitude. We then test it on a larger and more complicated data set: sat.act.

The data

library(psych)
library(psychTools)
describe(attitude)
##            vars  n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## rating        1 30 64.63 12.17   65.5   65.21 10.38  40  85    45 -0.36    -0.77 2.22
## complaints    2 30 66.60 13.31   65.0   67.08 14.83  37  90    53 -0.22    -0.68 2.43
## privileges    3 30 53.13 12.24   51.5   52.75 10.38  30  83    53  0.38    -0.41 2.23
## learning      4 30 56.37 11.74   56.5   56.58 14.83  34  75    41 -0.05    -1.22 2.14
## raises        5 30 64.63 10.40   63.5   64.50 11.12  43  88    45  0.20    -0.60 1.90
## critical      6 30 74.77  9.89   77.5   75.83  7.41  49  92    43 -0.87     0.17 1.81
## advance       7 30 42.93 10.29   41.0   41.83  8.90  25  72    47  0.85     0.47 1.88
R <- lowerCor(attitude) #find the correlations for further testing
##            ratng cmpln prvlg lrnng raiss crtcl advnc
## rating     1.00                                     
## complaints 0.83  1.00                               
## privileges 0.43  0.56  1.00                         
## learning   0.62  0.60  0.49  1.00                   
## raises     0.59  0.67  0.45  0.64  1.00             
## critical   0.16  0.19  0.15  0.12  0.38  1.00       
## advance    0.16  0.22  0.34  0.53  0.57  0.28  1.00

Simple regression

We need to specify some of the options for lmCor (std=FALSE, zero=FALSE) to make the results match lm. In addition, we need to wrap the lmCor command in a print statement (with digits=4) to get the same number of decimal places as lm.

summary(lm(rating ~ complaints + privileges + learning, data=attitude))
## 
## Call:
## lm(formula = rating ~ complaints + privileges + learning, data = attitude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2012  -5.7478   0.5599   5.8226  11.3241 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.2583     7.3183   1.538   0.1360    
## complaints    0.6824     0.1288   5.296 1.54e-05 ***
## privileges   -0.1033     0.1293  -0.799   0.4318    
## learning      0.2380     0.1394   1.707   0.0997 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.863 on 26 degrees of freedom
## Multiple R-squared:  0.715,  Adjusted R-squared:  0.6821 
## F-statistic: 21.74 on 3 and 26 DF,  p-value: 2.936e-07
print(lmCor(rating ~ complaints + privileges+ learning, data=attitude,std=FALSE,zero=FALSE),digits=4)

## Call: lmCor(y = rating ~ complaints + privileges + learning, data = attitude, 
##     std = FALSE, zero = FALSE)
## 
## Multiple Regression from raw data 
## 
##  DV =  rating 
##               slope     se       t         p lower.ci upper.ci    VIF    Vy.x
## (Intercept) 11.2583 7.3183  1.5384 1.360e-01  -3.7848  26.3014 1.0000  0.0000
## complaints   0.6824 0.1288  5.2964 1.539e-05   0.4176   0.9473 1.8120  0.6161
## privileges  -0.1033 0.1293 -0.7985 4.318e-01  -0.3692   0.1626 1.5421 -0.0442
## learning     0.2380 0.1394  1.7070 9.974e-02  -0.0486   0.5245 1.6485  0.1431
## 
## Residual Standard Error =  6.863  with  26  degrees of freedom
## 
##  Multiple Regression
##             R    R2    Ruw   R2uw Shrunken R2 SE of R2 overall F df1 df2         p
## rating 0.8456 0.715 0.6942 0.4819      0.6821   0.0728   21.7432   3  26 2.936e-07
lmCor(rating ~ complaints + privileges+ learning, data=attitude) #default values

## Call: lmCor(y = rating ~ complaints + privileges + learning, data = attitude)
## 
## Multiple Regression from raw data 
## 
##  DV =  rating 
##             slope   se     t       p lower.ci upper.ci  VIF  Vy.x
## (Intercept)  0.00 0.10  0.00 1.0e+00    -0.22     0.22 1.00  0.00
## complaints   0.75 0.14  5.30 1.5e-05     0.46     1.04 1.81  0.62
## privileges  -0.10 0.13 -0.80 4.3e-01    -0.37     0.16 1.54 -0.04
## learning     0.23 0.13  1.71 1.0e-01    -0.05     0.51 1.65  0.14
## 
## Residual Standard Error =  0.56  with  26  degrees of freedom
## 
##  Multiple Regression
##           R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
## rating 0.85 0.72 0.75 0.56        0.68     0.07     21.74   3  26 2.94e-07
lmCor(rating ~ complaints + privileges+ learning, data=R,n.obs=30)
## Call: lmCor(y = rating ~ complaints + privileges + learning, data = R, 
##     n.obs = 30)
## 
## Multiple Regression from matrix input 
## 
##  DV =  rating 
##            slope   se     t       p lower.ci upper.ci  VIF  Vy.x
## complaints  0.75 0.14  5.30 1.5e-05     0.46     1.04 1.81  0.62
## privileges -0.10 0.13 -0.80 4.3e-01    -0.37     0.16 1.54 -0.04
## learning    0.23 0.13  1.71 1.0e-01    -0.05     0.51 1.65  0.14
## 
## Residual Standard Error =  0.56  with  26  degrees of freedom
## 
##  Multiple Regression
##           R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2        p
## rating 0.85 0.72 0.75 0.56        0.68     0.07     21.74   3  26 2.94e-07
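
As a further check of the default behavior, the standardized slopes reported above should also match lm applied to z-scored data. The following sketch (added here as an extra check, not part of the original comparison) shows that call; the object name z.attitude is ours.

z.attitude <- data.frame(scale(attitude)) # standardize (z score) every variable
summary(lm(rating ~ complaints + privileges + learning, data=z.attitude))
# the slopes should agree with the 0.75, -0.10, and 0.23 reported by lmCor above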

Linear regression with an interaction

To model an interaction, we express it as the product of two predictors. Unless we zero center the predictors first, this changes the interpretability of the first order terms.

mod1 <- lm(rating ~ complaints + privileges + learning, data=attitude)
mod2 <- lm(rating ~ complaints * privileges + learning, data=attitude)
#now show them
summary(mod1)
## 
## Call:
## lm(formula = rating ~ complaints + privileges + learning, data = attitude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2012  -5.7478   0.5599   5.8226  11.3241 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.2583     7.3183   1.538   0.1360    
## complaints    0.6824     0.1288   5.296 1.54e-05 ***
## privileges   -0.1033     0.1293  -0.799   0.4318    
## learning      0.2380     0.1394   1.707   0.0997 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.863 on 26 degrees of freedom
## Multiple R-squared:  0.715,  Adjusted R-squared:  0.6821 
## F-statistic: 21.74 on 3 and 26 DF,  p-value: 2.936e-07
summary(mod2)
## 
## Call:
## lm(formula = rating ~ complaints * privileges + learning, data = attitude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5795  -4.9937   0.2601   5.9985  10.5996 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            0.118124  27.312048   0.004   0.9966  
## complaints             0.855545   0.428889   1.995   0.0571 .
## privileges             0.141025   0.591132   0.239   0.8134  
## learning               0.219815   0.147999   1.485   0.1500  
## complaints:privileges -0.003405   0.008032  -0.424   0.6753  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.974 on 25 degrees of freedom
## Multiple R-squared:  0.717,  Adjusted R-squared:  0.6718 
## F-statistic: 15.84 on 4 and 25 DF,  p-value: 1.396e-06
#zero center
mod3  <- lm(rating ~ complaints * privileges + learning, data=data.frame(scale(attitude,scale=FALSE)))
summary(mod3)
## 
## Call:
## lm(formula = rating ~ complaints * privileges + learning, data = data.frame(scale(attitude, 
##     scale = FALSE)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5795  -4.9937   0.2601   5.9985  10.5996 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.299341   1.455962   0.206    0.839    
## complaints             0.674642   0.132205   5.103 2.86e-05 ***
## privileges            -0.085728   0.137807  -0.622    0.540    
## learning               0.219815   0.147999   1.485    0.150    
## complaints:privileges -0.003405   0.008032  -0.424    0.675    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.974 on 25 degrees of freedom
## Multiple R-squared:  0.717,  Adjusted R-squared:  0.6718 
## F-statistic: 15.84 on 4 and 25 DF,  p-value: 1.396e-06
anova(mod2,mod3)
## Analysis of Variance Table
## 
## Model 1: rating ~ complaints * privileges + learning
## Model 2: rating ~ complaints * privileges + learning
##   Res.Df    RSS Df   Sum of Sq F Pr(>F)
## 1     25 1215.9                        
## 2     25 1215.9  0 -2.5011e-12
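
Because mod2 and mod3 fit identically but report different first order terms, it may help to see the product term built by hand from zero-centered predictors. The following sketch is an extra check added here (the names zc, comp.priv, and mod3b are ours); it should reproduce mod3 up to the ordering of the terms.

zc <- data.frame(scale(attitude,scale=FALSE)) # zero center, but keep the raw units
zc$comp.priv <- zc$complaints * zc$privileges # hand-built product of centered predictors
mod3b <- lm(rating ~ complaints + privileges + comp.priv + learning, data=zc)
coef(mod3b) # should match coef(mod3), with the interaction labeled comp.priv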

Compare to lmCor

sc1 <- lmCor(rating ~ complaints + privileges + learning, data=attitude)

sc2 <- lmCor(rating ~ complaints * privileges + learning, data=attitude)

summary(sc1)
## 
## Multiple Regression from raw data 
## lmCor(y = rating ~ complaints + privileges + learning, data = attitude)
## 
## Multiple Regression from matrix input 
## 
## Beta weights 
##             rating
## (Intercept)   0.00
## complaints    0.75
## privileges   -0.10
## learning      0.23
## 
## Multiple R 
## rating 
##   0.85 
## 
## Multiple R2 
## rating 
##   0.72 
## 
## Cohen's set correlation R2 
## [1] 0.72
## 
## Squared Canonical Correlations
## NULL
summary(sc2)
## 
## Multiple Regression from raw data 
## lmCor(y = rating ~ complaints * privileges + learning, data = attitude)
## 
## Multiple Regression from matrix input 
## 
## Beta weights 
##                       rating
## (Intercept)            0.000
## complaints             0.738
## privileges            -0.086
## learning               0.212
## complaints*privileges -0.049
## 
## Multiple R 
## rating 
##   0.85 
## 
## Multiple R2 
## rating 
##   0.72 
## 
## Cohen's set correlation R2 
## [1] 0.72
## 
## Squared Canonical Correlations
## NULL
anova(sc1,sc2)
## Model 1 = lmCor(y = rating ~ complaints + privileges + learning, data = attitude)
## Model 2 = lmCor(y = rating ~ complaints * privileges + learning, data = attitude)
## $rating
##   Res Df   Res SS Diff df    Diff SS         F Pr(F > )
## 1     26 8.264871      NA         NA        NA       NA
## 2     25 8.205890       1 0.05898142 0.1796924 0.675263
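
As a cross check (a sketch added here, not part of the original output), the corresponding lm comparison should report the same F and p, since the F for adding a single product term is the square of its t value in mod2.

anova(mod1,mod2) # F should be about 0.18 with p about 0.68, matching the lmCor anova above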