Adding time to a plot and adventures in smoothing

The following plots and instructions show how to put several figures on a page, give an overall label to the page, and to make time the axis. I also show how to subset the data to reject outliers. Finally, the effect of four levels of smoothing in 'lowess' are examined.

In order to show events over time, it is helpful to plot the data as a function of time. This is trivial if the data are equally spaced, but when the data are not equally spaced, it is important to add time to the plot. The following is an analysis of electrical production and consumption as a function of time of year for an energy efficient house. The data are from 38 months of fairly frequent observations of energy produced using PhotoVoltaic roofing and of electrical consumption.

The data are stored in a tab delimited text file produced by a spreadsheet. (e.g., Excel or Open Office)


datafilename="http://personality-project.org/r/datasets/electric.data.txt" #where are the data?
#local data are at datafilename <- "/Library/WebServer/Documents/personalitytheoryf/r/datasets/electric.data.txt"
#alternatively, we can dynamically find out the file name
datafilename <- file.choose()    #get the name of the file by using system commands

electric  <- read.table(datafilename,header=TRUE)  #read the data file
describe(electric)    #get basic descriptive statistics using the describe function

attach(electric)                               #for easier manipulation 

dates <-strptime(as.character(electric$date1), "%m/%d/%y")   #change the date field to a internal form for time  
                                                  #see ?formats and ?POSIXlt  for how to read different formats
                                                  #further note that we need to treat the date information as read from the spreadsheet as character data


electric=electric[,2:17]               #get rid of the original date in the data.frame       
electric=data.frame(date=dates,electric)      #this is now the date in useful fashion

detach(electric)
daterange=c(as.POSIXlt(min(electric$date)),as.POSIXlt(max(electric$date)))   #figure out the lowest and highest months
rm(dates)       #get rid of the old date because otherwise it will be floating around confusing us
attach(electric)
#plot 4 different panels -- energy produced, consumed, net as function of date, concumption by temperature
quartz (height=6,width=9)      #create a graphic device of appropriate size
plot.new()                               #start a new plot 
                                        #set parameters for this graph
par(omi=c(0,0,1,0) )                      #set the size of the outer margins 
par(mfrow=c(2,2))                        #number of rows and columns to graph

plot(date,produced,xaxt="n",ylab="KWH/day")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,produced, f=3/10))   #lowess smooth with 30% window
title(main="Seasonal Effects on PV Electrical Production")

plot(date,consumed,xaxt="n",ylab="KWH/day")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,consumed, f=3/10))   #lowess smooth with 30% window
title(main="Seasonal Effects on total Electrical Consumption")

net=produced-consumed     #net electrical production
plot(date,net,xaxt="n",ylab="KWH/day")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,net, f=3/10))   #lowess smooth with 30% window
title(main="Seasonal Effects on net Electrical Production")
abline(h=0)     #put in the net 0 line for readability

plot(temp1,consumed,main="Electrical consumption by mean daily temperature",xlab="Mean daily temperature",ylab="KWH/day")
#lines(lowess(mean,consumed,f=3/10))   #this does not work if there are missing data

#so, the following function fixes it
  lowess.na <- function(x, y = NULL, f = 2/3,...) {  #do lowess with missing data
  
  x1 <- subset(x,(!is.na(x)) &(!is.na(y)))
  y1 <- subset(y, (!is.na(x)) &(!is.na(y)))
  lowess.na <- lowess(x1,y1,f, ...)
  }
  
  lines(lowess.na(temp1,consumed,f=3/10))   #this does work if there are missing data
   


mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5)    #note that we seem to add the global title last
                     #cex = character expansion factor 


Upon inspection of these data, it seems as if during the first winter, there were some days with unusually high electrical consumption. Indeed, this was because the house was still under constrcution. We now redo the analysis with just days with consumption < 40 KWH/day. This makes use of the 'subset' command. However, we also notice that we had very high demand in July of 2005. This was due to about 6 days of air conditioning! We should include these data


normal=subset(electric,(consumed<40) |as.POSIXlt(date)$year>103 )       #select those days with more typical power consumption or after 2003
detach(electric)                         #release the names in this data frame to use the next one

attach(normal)       #use the names in the new data frame

#plot 4 different panels -- energy produced, consumed, net as function of date, consumption by temperature

smooth.value = 3/10                   #allows us to experiment with different smoothing parameters in lowess
                                    

plot.new()                               #start a new plot 
                                        #set parameters for this graph
par(omi=c(0,0,1,0) )                      #set the size of the outer margins 
par(mfrow=c(2,2))                        #number of rows and columns to graph

plot(date,produced,xaxt="n")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,produced, f=smooth.value))   #lowess smooth with 30% window
title(main="Seasonal Effects on PV Electrical Production")

plot(date,consumed,xaxt="n")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,consumed, f=smooth.value))   #lowess smooth with 30% window
title(main="Seasonal Effects on total Electrical Consumption")

net=produced-consumed     #net electrical production
plot(date,net,xaxt="n")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,net, f=smooth.value))   #lowess smooth with 30% window
title(main="Seasonal Effects on net Electrical Production")
abline(h=0)

plot(temp1,consumed,main="Electrical consumption by mean daily temperature",xlab="Mean daily temperature",ylab="KWH/day")
lines(lowess.na(temp1,consumed,f=smooth.value))

mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5)    #note that we seem to add the global title last
                     #cex = character expansion factor 
                     
plot.3=recordPlot()   #save this one before redoing with a different smooth value
smooth.value=.5

We can examine the effects of different levels of smoothing either by doing different graphs and saving them, or by plotting multiple smooths on each graph. Lets try multiple smoothing functions.



                                    

plot.new()                               #start a new plot 
                                        #set parameters for this graph
par(omi=c(0,0,1,0) )                      #set the size of the outer margins 
par(mfrow=c(2,2))                        #number of rows and columns to graph

plot(date,produced,xaxt="n")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,produced, f=.25))   #lowess smooth with 25% window
lines(lowess(date,produced, f=1/3))   #lowess smooth with 33% window
lines(lowess(date,produced, f=.5))   #lowess smooth with 50% window
lines(lowess(date,produced, f=2/3))   #lowess smooth with 66% window
title(main="Analysis of Seasonal Effects on PV Electrical Production")

plot(date,consumed,xaxt="n")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,consumed, f=.25))   #lowess smooth with 25% window
lines(lowess(date,consumed, f=1/3))   #lowess smooth with 33% window
lines(lowess(date,consumed, f=.5))   #lowess smooth with 50% window
lines(lowess(date,consumed, f=2/3))   #lowess smooth with 66% window
title(main="Analysis of Seasonal Effects on total Electrical Consumption")

net=produced-consumed     #net electrical production
plot(date,net,xaxt="n")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,net, f=.25))   #lowess smooth with 25% window
lines(lowess(date,net, f=1/3))   #lowess smooth with 33% window
lines(lowess(date,net, f=.5))   #lowess smooth with 50% window
lines(lowess(date,net, f=2/3))   #lowess smooth with 66% window
title(main="Analysis of Seasonal Effects on net Electrical Production")
abline(h=0)

plot(temp1,consumed,main="Electrical consumption by mean daily temperature",xlab="Mean daily temperature",ylab="KWH/day")
lines(lowess.na(temp1,consumed,f=.25))
lines(lowess.na(temp1,consumed,f=1/3))
lines(lowess.na(temp1,consumed,f=.5))
lines(lowess.na(temp1,consumed,f=2/3))

mtext("Analysis of seasonality effects on production and consumption\r with 4 levels of smoothing",3,outer=TRUE,line=1,cex=1.5)    #note that we seem to add the global title last
                     #cex = character expansion factor 
                     

This results in the following 4 (messy) plots:

We would expect a seasonality effect in PV production, but why is there one for consumption? After examining the relationship between external temperature and consumption several hypotheses are likely. Although after one year I had thought this relationship was due to the use of electric pumps to deliver the radiant heat, after another winter's analysis, I believe this was due to overreliance on internal air circulation using HVAC equipment (as advised by an incompetent Heating Contractor) rather than the more appropriate use of an Energy Recovery Ventilation (HRV) system. Based upon the data from the first winter, we stopped using the HVAC system and used the ERV system in the second winter. Notice the decreased in energy consumption. In fact, the relationship is reduced across the years.

Another source of electrical consumption is driving a pump to circulate hot water (from a hot water solar system) to heat a small exercise pool. This can be seen from the plot of consumption by hours of heating, as well as the relationship between external temperature and kwh consumption as a function of hours of heating (a Coplot)


plot(poolheat,consumed,main="Effect of external temperature on electrical Consumption")
abline(lm(consumed~poolheat,data=normal))

 coplot(consumed~mean|poolheat,panel = panel.smooth)
 
 model=lm(consumed~mean+poolheat)
 model=lm(consumed~mean)
 
 
oneday=subset(normal,days==1)
detach(normal)
attach(oneday)
par(mfrow=c(2,1)) 
plot(poolheat,deltaheat,main="change in temperature as a function of hours of heating",ylab="change in temperature",xlab="hours of heating")
mod1=lm(deltaheat~poolheat)
abline(mod1)    #put the regression line in
plot(poolheat,consumed,main="electrical consumption as a function of hours of heating pool",ylab="KWH/day",xlab="Hours of pool heat")
mod2=lm(consumed~poolheat)
abline(mod2)
mtext("Effect of solar pool heating on pool heat and electrical consumption",3,outer=TRUE,line=1,cex=1.5)    #note that we seem to add the global title last
                     #cex = character expansion factor 


The strength of the relationship may be determined by examing the R-square of the regression which in the first case (hours of heating and change in temperature) is quite strong. Although the relationship is not as strong, the effect of running the pump on power consumption suggests that one hour of operation leads to about an additional KWH of power consumption.

summary(mod1)

Call:
lm(formula = deltaheat ~ poolheat)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.76586 -0.37228  0.02131  0.71810  3.23414 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.71810    0.08057  -8.912   <2e-16 ***
poolheat     0.69679    0.03708  18.791   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 1.009 on 254 degrees of freedom
Multiple R-Squared: 0.5816,	Adjusted R-squared:  0.58 
F-statistic: 353.1 on 1 and 254 DF,  p-value: < 2.2e-16 


summary(mod2)
summary(mod2)

Call:
lm(formula = consumed ~ poolheat)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.5670 -3.0012 -0.6728  2.2919 16.5615 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  19.7089     0.3291  59.880  < 2e-16 ***
poolheat      1.0901     0.1515   7.196 6.96e-12 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 4.121 on 254 degrees of freedom
Multiple R-Squared: 0.1694,	Adjusted R-squared: 0.1661 
F-statistic: 51.79 on 1 and 254 DF,  p-value: 6.956e-12 

Finally, lets try to get some monthly statistics. My solution to doing this sort is a kludge, but works.

month.year <- function (x) {12*(as.POSIXlt(x)$year-103)+as.POSIXlt(x)$mon+1} #months are mod 12 and normally range from 0-11.  this makes them go to 1-12

#attach(normal)          #get the cleaned data
month <- month.year(date) #months start at January 2003 and add from then

mean.produced=tapply(produced,month,mean)  #apply the mean function to amount produced each month
sd.produced=tapply(produced,month,sd)
mean.consumed <- tapply(consumed,month,mean)
sd.consumed <- tapply(consumed,month,sd)
months <- month.name   #names from January to December
for (i in 1 : length(mean.consumed)) {month.observed[i]<- as.numeric(names(mean.consumed[i]))%%12} #first get the months as numbers 0-11
month.observed[month.observed==0]<-12   # make the decembers 12 instead of 0
month.names <- months[as.numeric(month.observed)]

stats <- data.frame(mean.produced,sd.produced,mean.consumed,sd.consumed,month=month.names)
print(stats,digits=3)


#see also month.name and month.abb  for convenient use of months




#gives this output
   mean.produced sd.produced mean.consumed sd.consumed     month
2           7.02          NA          31.5          NA  February
3          14.75        7.44          27.8        4.34     March
4          19.70        9.18          29.9        7.34     April
5          20.76        9.62          15.4        6.78       May
6          24.72        7.45          17.8        5.13      June
7          21.08        7.42          18.9        5.22      July
8          20.19        5.57          17.4        1.96    August
9          17.57        6.63          15.0        3.05 September
10         14.76        5.92          17.7        2.72   October
11          6.69        3.73          26.7        4.07  November
12          7.51        4.49          32.3        4.05  December
13          6.50        5.03          28.1        4.11   January
14          8.26        6.10          26.7        4.62  February
15         12.41        6.08          26.2        5.26     March
16         22.58        5.40          24.1        5.01     April
17         18.49        7.42          21.1        3.86       May
18         20.90        6.86          20.0        3.94      June
19         19.78        6.57          17.6        2.19      July
20         17.79        7.25          18.9        3.98    August
21         19.56        3.58          18.1        2.27 September
22         12.27        7.19          17.7        2.64   October
23          6.98        4.94          23.0        5.52  November
24          6.71        3.95          25.8        6.23  December
25          5.97        5.32          24.8        3.65   January
26          9.28        6.33          21.2        3.23  February
27         14.17        6.36          21.3        3.91     March
28         20.19        6.69          18.3        2.94     April
29         18.60        7.49          16.3        2.84       May
30         22.06        5.14          17.6        4.14      June
31         20.42        6.90          21.7       15.46      July
32         20.96        4.35          16.5        1.81    August
33         10.56        7.17          17.1        2.33 September
34         10.25        5.70          18.1        3.14   October
35          8.63        5.22          22.0        2.72  November
36          5.66        5.09          23.4        3.76  December
37          7.31        5.30          23.5        4.16   January
38         10.74        5.46          23.6        3.46  February


In addition to plotting LOWESS smooths, we can also organize the date by months and do graphical analyses of monthly data. This uses the POSIXlt function to give us the numerical month.

Here are the results in boxplot form.


#    as.POSIXlt(date)$mon     #gives the months in numeric order mod 12 with January = 0 and December = 11


par(mfrow=c(3,1))
bproduced=boxplot(produced~month.year(date),names=month.names,data=normal )   #show the graph and save the data in bc
title(main="Seasonal Effects on PV Electrical Production",cex=1.5)
lines(lowess(bproduced$stats[3,],f=1/5))    #add a smooth to the box plot

bconsumed<-boxplot(consumed~month.year(date),names=month.names,data=normal )  
title(main="Seasonal Effects on Electrical Consumption",cex=1.5)
lines(lowess(bconsumed$stats[3,],f=1/6))

bnet<-boxplot(net~month.year(date),names=month.names,data=normal )  
title(main="Seasonal Effects on net Production",cex=1.5)
abline(h=0)
lines(lowess(bnet$stats[3,],f=1/6))
mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5)    #cex = character expansion factor
                     
                     
 #    as.POSIXlt(date)$mon     #gives the months in numeric order mod 12 with January = 0 and December = 11

#now do it for months collapsed across years


par(mfrow=c(3,1))
bproduced=boxplot(produced~ as.POSIXlt(date)$mon,names=months ,data=normal )   #show the graph and save the data in bc
title(main="Seasonal Effects on PV Electrical Production",cex=1.5)
lines(lowess(bproduced$stats[3,],f=1/5))    #add a smooth to the box plot

bconsumed<-boxplot(consumed~ as.POSIXlt(date)$mon,names=months,data=normal )  
title(main="Seasonal Effects on Electrical Consumption",cex=1.5)
lines(lowess(bconsumed$stats[3,],f=1/6))

bnet<-boxplot(net~ as.POSIXlt(date)$mon,names=months,data=normal )  
title(main="Seasonal Effects on net Production",cex=1.5)
abline(h=0)
lines(lowess(bnet$stats[3,],f=1/6))
mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5)    #cex = character expansion factor                    
                     
 
 
Monthly box plot

 
 par(mfrow=c(3,1))
bproduced=boxplot(produced~month.year(date),data=normal,names=month.names )   #show the graph and save the data in bc
title(main="Seasonal Effects on PV Electrical Production",cex=1.5)
lines(lowess(bproduced$stats[3,],f=1/5))    #add a smooth to the box plot

bconsumed<-boxplot(consumed~month.year(date),names=month.names,data=normal )  
title(main="Seasonal Effects on Electrical Consumption",cex=1.5)
lines(lowess(bconsumed$stats[3,],f=1/6))

bnet<-boxplot(net~month.year(date),names=month.names,data=normal )  
title(main="Seasonal Effects on net Production",cex=1.5)
abline(h=0)
lines(lowess(bnet$stats[3,],f=1/6))
mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5)    #cex = character expansion factor
 
       
                     


The fitted line going through the medians is a lowess smooth. Note that we need to run the box plots first to get the appropriate statistics. Further note that the fraction of data included in the smooth function = 1/3. This recognizes the summer months are net production months better than a larger fraction.

                     
new.plot()
par(mfrow=c(1,1))

plot(bproduced$stats[3,],xaxt="n",xlab="Time of Year",ylab="KWH/day",ylim=c(-25,35),type="b",lwd=4)
axis(1,labels=c("February","April","June","August","October","December"))
#lines(lowess(bproduced$stats[3,]))


points(bconsumed$stats[3,],xaxt="n",type="b",lwd=2,col="red")
#lines(lowess(bconsumed$stats[3,]))


points(bnet$stats[3,],xaxt="n",type="b")
axis(1,labels=c("February","April","June","August","October","December"))
#lines(lowess(bnet$stats[3,]))
abline(h=0)
mtext("Production and consumption vary over the year ",3,outer=TRUE,line=1,cex=1.5)    #note that we seem to add the global title last
                     #cex = character expansion factor
percent=100*produced/consumed


bpercent=boxplot(percent~(as.POSIXlt(date)$mon+as.POSIXlt(date)$year),names=month.names )   #show the graph and save the data in bc
plot(bpercent$stats[3,],xaxt="n",xlab="Time of Year",ylab="% of demand",type="b",ylim=c(0,150),lwd=2)
abline(h=100)
axis(1,labels=c("February","April","June","August","October","December"))
title(main="Annual electrical production is 70% of total demand")

It is also possible to plot multiple points and fits on the same graph. Consider a plot of solar output/solar slate as a function of roof pitch and orientation.

plot.new()                               #start a new plot 
                                        #set parameters for this graph
par(omi=c(0,0,1,0) )                      #set the size of the outer margins 
par(mfrow=c(1,1))                        #number of rows and columns to graph

plot(date,belvederewatts,xaxt="n",pch=19,ylab="watt hours")          #don't plot the x axis
axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months
lines(lowess(date,belvederewatts, f=smooth.value),lty=1)   #lowess smooth with 30% window
points(date,garagewatts,pch=21)
lines(lowess(date,garagewatts,f=smooth.value),lty=2)
points(date,midwatts,pch=23)
lines(lowess(date,midwatts,f=smooth.value),lty=3)
title(main="Seasonal Effects on PV Electrical Production (per slate/day)")



plot(belvederewatts,garagewatts,pch=19)
points(belvederewatts,midwatts,pch=21)
abline(0,1)
Now, to calculate the effeciency of the converter, we can compare net consumption to production. We need to select days where we were not heating the pool.
part of a short guide to R
Version of Feb 10, 2006
William Revelle Department of Psychology
Northwestern University