The lm function is the workhorse for linear regression in R. It works on raw data and provides very useful diagnostics.
lmCor, originally named setCor, was developed to handle input from correlation matrices. It was later extended to mimic much of the functionality of lm and to work with raw data as well as correlations, with various options added along the way. In the process, it was necessary to check its results against lm for those options.
(In 2020, Robert Stupinsky reported an error with zero centering and standardizing. This check of the revised code was therefore written, and it seems to have worked. See the news file for version 2.0.12.)
We first do a simple multiple regression with three predictors and compare that to the lmCor results. By default, lmCor standardizes and zero centers the data, so we turn both options off for this first example.
Here we go through a number of tests, comparing the output carefully to detect errors. We first use a small data set built into core R: attitude. We then test on a larger and more complicated data set: sat.act.
library(psych)
library(psychTools)
describe(attitude)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## rating 1 30 64.63 12.17 65.5 65.21 10.38 40 85 45 -0.36 -0.77 2.22
## complaints 2 30 66.60 13.31 65.0 67.08 14.83 37 90 53 -0.22 -0.68 2.43
## privileges 3 30 53.13 12.24 51.5 52.75 10.38 30 83 53 0.38 -0.41 2.23
## learning 4 30 56.37 11.74 56.5 56.58 14.83 34 75 41 -0.05 -1.22 2.14
## raises 5 30 64.63 10.40 63.5 64.50 11.12 43 88 45 0.20 -0.60 1.90
## critical 6 30 74.77 9.89 77.5 75.83 7.41 49 92 43 -0.87 0.17 1.81
## advance 7 30 42.93 10.29 41.0 41.83 8.90 25 72 47 0.85 0.47 1.88
R <- lowerCor(attitude) #find the correlations for further testing
## ratng cmpln prvlg lrnng raiss crtcl advnc
## rating 1.00
## complaints 0.83 1.00
## privileges 0.43 0.56 1.00
## learning 0.62 0.60 0.49 1.00
## raises 0.59 0.67 0.45 0.64 1.00
## critical 0.16 0.19 0.15 0.12 0.38 1.00
## advance 0.16 0.22 0.34 0.53 0.57 0.28 1.00
We need to specify some of the options for lmCor to make the results match lm. In addition, we need to wrap the lmCor commands in a print statement to get the same number of decimal places as lm.
summary(lm(rating ~ complaints + privileges + learning, data=attitude))
##
## Call:
## lm(formula = rating ~ complaints + privileges + learning, data = attitude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2012 -5.7478 0.5599 5.8226 11.3241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.2583 7.3183 1.538 0.1360
## complaints 0.6824 0.1288 5.296 1.54e-05 ***
## privileges -0.1033 0.1293 -0.799 0.4318
## learning 0.2380 0.1394 1.707 0.0997 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.863 on 26 degrees of freedom
## Multiple R-squared: 0.715, Adjusted R-squared: 0.6821
## F-statistic: 21.74 on 3 and 26 DF, p-value: 2.936e-07
print(lmCor(rating ~ complaints + privileges+ learning, data=attitude,std=FALSE,zero=FALSE),digits=4)
## Call: lmCor(y = rating ~ complaints + privileges + learning, data = attitude,
## std = FALSE, zero = FALSE)
##
## Multiple Regression from raw data
##
## DV = rating
## slope se t p lower.ci upper.ci VIF Vy.x
## (Intercept) 11.2583 7.3183 1.5384 1.360e-01 -3.7848 26.3014 1.0000 0.0000
## complaints 0.6824 0.1288 5.2964 1.539e-05 0.4176 0.9473 1.8120 0.6161
## privileges -0.1033 0.1293 -0.7985 4.318e-01 -0.3692 0.1626 1.5421 -0.0442
## learning 0.2380 0.1394 1.7070 9.974e-02 -0.0486 0.5245 1.6485 0.1431
##
## Residual Standard Error = 6.863 with 26 degrees of freedom
##
## Multiple Regression
## R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
## rating 0.8456 0.715 0.6942 0.4819 0.6821 0.0728 21.7432 3 26 2.936e-07
lmCor(rating ~ complaints + privileges+ learning, data=attitude) #default values
## Call: lmCor(y = rating ~ complaints + privileges + learning, data = attitude)
##
## Multiple Regression from raw data
##
## DV = rating
## slope se t p lower.ci upper.ci VIF Vy.x
## (Intercept) 0.00 0.10 0.00 1.0e+00 -0.22 0.22 1.00 0.00
## complaints 0.75 0.14 5.30 1.5e-05 0.46 1.04 1.81 0.62
## privileges -0.10 0.13 -0.80 4.3e-01 -0.37 0.16 1.54 -0.04
## learning 0.23 0.13 1.71 1.0e-01 -0.05 0.51 1.65 0.14
##
## Residual Standard Error = 0.56 with 26 degrees of freedom
##
## Multiple Regression
## R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
## rating 0.85 0.72 0.75 0.56 0.68 0.07 21.74 3 26 2.94e-07
lmCor(rating ~ complaints + privileges+ learning, data=R,n.obs=30)
## Call: lmCor(y = rating ~ complaints + privileges + learning, data = R,
## n.obs = 30)
##
## Multiple Regression from matrix input
##
## DV = rating
## slope se t p lower.ci upper.ci VIF Vy.x
## complaints 0.75 0.14 5.30 1.5e-05 0.46 1.04 1.81 0.62
## privileges -0.10 0.13 -0.80 4.3e-01 -0.37 0.16 1.54 -0.04
## learning 0.23 0.13 1.71 1.0e-01 -0.05 0.51 1.65 0.14
##
## Residual Standard Error = 0.56 with 26 degrees of freedom
##
## Multiple Regression
## R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
## rating 0.85 0.72 0.75 0.56 0.68 0.07 21.74 3 26 2.94e-07
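The default standardized slopes can also be checked against lm itself. A minimal base R sketch: scale() both zero centers and standardizes, so fitting lm to the scaled data should reproduce the beta weights that lmCor reports by default.

```r
# scale() zero centers and standardizes every column; lm on the scaled data
# should give the same beta weights as the default lmCor call above.
z.attitude <- data.frame(scale(attitude))   # attitude ships with core R
mod.z <- lm(rating ~ complaints + privileges + learning, data = z.attitude)
round(coef(mod.z), 2)
# (Intercept) ~ 0, complaints 0.75, privileges -0.10, learning 0.23
```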
To include interactions, we express them as a product of predictors. Unless we zero center the data, this changes the interpretability of the first-order terms.
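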
mod1 <- lm(rating ~ complaints + privileges + learning, data=attitude)
mod2 <- lm(rating ~ complaints * privileges + learning, data=attitude)
#now show them
summary(mod1)
##
## Call:
## lm(formula = rating ~ complaints + privileges + learning, data = attitude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2012 -5.7478 0.5599 5.8226 11.3241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.2583 7.3183 1.538 0.1360
## complaints 0.6824 0.1288 5.296 1.54e-05 ***
## privileges -0.1033 0.1293 -0.799 0.4318
## learning 0.2380 0.1394 1.707 0.0997 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.863 on 26 degrees of freedom
## Multiple R-squared: 0.715, Adjusted R-squared: 0.6821
## F-statistic: 21.74 on 3 and 26 DF, p-value: 2.936e-07
summary(mod2)
##
## Call:
## lm(formula = rating ~ complaints * privileges + learning, data = attitude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5795 -4.9937 0.2601 5.9985 10.5996
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.118124 27.312048 0.004 0.9966
## complaints 0.855545 0.428889 1.995 0.0571 .
## privileges 0.141025 0.591132 0.239 0.8134
## learning 0.219815 0.147999 1.485 0.1500
## complaints:privileges -0.003405 0.008032 -0.424 0.6753
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.974 on 25 degrees of freedom
## Multiple R-squared: 0.717, Adjusted R-squared: 0.6718
## F-statistic: 15.84 on 4 and 25 DF, p-value: 1.396e-06
#zero center
mod3 <- lm(rating ~ complaints * privileges + learning, data=data.frame(scale(attitude,scale=FALSE)))
summary(mod3)
##
## Call:
## lm(formula = rating ~ complaints * privileges + learning, data = data.frame(scale(attitude,
## scale = FALSE)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5795 -4.9937 0.2601 5.9985 10.5996
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.299341 1.455962 0.206 0.839
## complaints 0.674642 0.132205 5.103 2.86e-05 ***
## privileges -0.085728 0.137807 -0.622 0.540
## learning 0.219815 0.147999 1.485 0.150
## complaints:privileges -0.003405 0.008032 -0.424 0.675
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.974 on 25 degrees of freedom
## Multiple R-squared: 0.717, Adjusted R-squared: 0.6718
## F-statistic: 15.84 on 4 and 25 DF, p-value: 1.396e-06
anova(mod2,mod3)
## Analysis of Variance Table
##
## Model 1: rating ~ complaints * privileges + learning
## Model 2: rating ~ complaints * privileges + learning
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 25 1215.9
## 2 25 1215.9 0 -2.5011e-12
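The identical residual sums of squares are no accident: zero centering is just a reparameterization, so the first-order coefficients change but the fitted values do not. A quick base R sketch (refitting the two models locally) makes the point:

```r
# Centering the predictors and outcome reparameterizes the model without
# changing its column space, so the fits agree up to the shift in the outcome.
m.raw <- lm(rating ~ complaints * privileges + learning, data = attitude)
m.ctr <- lm(rating ~ complaints * privileges + learning,
            data = data.frame(scale(attitude, scale = FALSE)))
all.equal(unname(fitted(m.raw)),
          unname(fitted(m.ctr)) + mean(attitude$rating))   # TRUE
```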
sc1 <- lmCor(rating ~ complaints + privileges + learning, data=attitude)
sc2 <- lmCor(rating ~ complaints * privileges + learning, data=attitude)
summary(sc1)
##
## Multiple Regression from raw data
## lmCor(y = rating ~ complaints + privileges + learning, data = attitude)
##
## Multiple Regression from matrix input
##
## Beta weights
## rating
## (Intercept) 0.00
## complaints 0.75
## privileges -0.10
## learning 0.23
##
## Multiple R
## rating
## 0.85
##
## Multiple R2
## rating
## 0.72
##
## Cohen's set correlation R2
## [1] 0.72
##
## Squared Canonical Correlations
## NULL
summary(sc2)
##
## Multiple Regression from raw data
## lmCor(y = rating ~ complaints * privileges + learning, data = attitude)
##
## Multiple Regression from matrix input
##
## Beta weights
## rating
## (Intercept) 0.000
## complaints 0.738
## privileges -0.086
## learning 0.212
## complaints*privileges -0.049
##
## Multiple R
## rating
## 0.85
##
## Multiple R2
## rating
## 0.72
##
## Cohen's set correlation R2
## [1] 0.72
##
## Squared Canonical Correlations
## NULL
anova(sc1,sc2)
## Model 1 = lmCor(y = rating ~ complaints + privileges + learning, data = attitude)
## Model 2 = lmCor(y = rating ~ complaints * privileges + learning, data = attitude)
## $rating
## Res Df Res SS Diff df Diff SS F Pr(F > )
## 1 26 8.264871 NA NA NA NA
## 2 25 8.205890 1 0.05898142 0.1796924 0.675263
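Note that these residual sums of squares are in standardized units, unlike the lm values above. Since a z-scored outcome has total sum of squares n - 1, the residual SS should equal (n - 1)(1 - R²); a base R check of that identity:

```r
# With standardized variables the total SS of the outcome is (n - 1), so the
# residual SS reported by anova(sc1, sc2) should equal (n - 1) * (1 - R^2).
m <- lm(rating ~ complaints + privileges + learning, data = attitude)
(nrow(attitude) - 1) * (1 - summary(m)$r.squared)   # about 8.26, as in Model 1
```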