Adding time to a plot and adventures in smoothing
The following plots and instructions show how to put several figures on a page, give an overall label to the page, and to make time the axis. I also show how to subset the data to reject outliers. Finally, the effect of four levels of smoothing in 'lowess' are examined.
In order to show events over time, it is helpful to plot the data as a function of time. This is trivial if the data are equally spaced, but when the data are not equally spaced, it is important to add time to the plot. The following is an analysis of electrical production and consumption as a function of time of year for an energy efficient house. The data are from 38 months of fairly frequent observations of energy produced using PhotoVoltaic roofing and of electrical consumption.
The data are stored in a tab delimited text file produced by a spreadsheet. (e.g., Excel or Open Office)
datafilename="http://personality-project.org/r/datasets/electric.data.txt" #where are the data? #local data are at datafilename <- "/Library/WebServer/Documents/personalitytheoryf/r/datasets/electric.data.txt" #alternatively, we can dynamically find out the file name datafilename <- file.choose() #get the name of the file by using system commands electric <- read.table(datafilename,header=TRUE) #read the data file describe(electric) #get basic descriptive statistics using the describe function attach(electric) #for easier manipulation dates <-strptime(as.character(electric$date1), "%m/%d/%y") #change the date field to a internal form for time #see ?formats and ?POSIXlt for how to read different formats #further note that we need to treat the date information as read from the spreadsheet as character data electric=electric[,2:17] #get rid of the original date in the data.frame electric=data.frame(date=dates,electric) #this is now the date in useful fashion detach(electric) daterange=c(as.POSIXlt(min(electric$date)),as.POSIXlt(max(electric$date))) #figure out the lowest and highest months rm(dates) #get rid of the old date because otherwise it will be floating around confusing us attach(electric) #plot 4 different panels -- energy produced, consumed, net as function of date, concumption by temperature quartz (height=6,width=9) #create a graphic device of appropriate size plot.new() #start a new plot #set parameters for this graph par(omi=c(0,0,1,0) ) #set the size of the outer margins par(mfrow=c(2,2)) #number of rows and columns to graph plot(date,produced,xaxt="n",ylab="KWH/day") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,produced, f=3/10)) #lowess smooth with 30% window title(main="Seasonal Effects on PV Electrical Production") plot(date,consumed,xaxt="n",ylab="KWH/day") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,consumed, f=3/10)) #lowess smooth with 30% window title(main="Seasonal Effects on total Electrical Consumption") net=produced-consumed #net electrical production plot(date,net,xaxt="n",ylab="KWH/day") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,net, f=3/10)) #lowess smooth with 30% window title(main="Seasonal Effects on net Electrical Production") abline(h=0) #put in the net 0 line for readability plot(temp1,consumed,main="Electrical consumption by mean daily temperature",xlab="Mean daily temperature",ylab="KWH/day") #lines(lowess(mean,consumed,f=3/10)) #this does not work if there are missing data #so, the following function fixes it lowess.na <- function(x, y = NULL, f = 2/3,...) { #do lowess with missing data x1 <- subset(x,(!is.na(x)) &(!is.na(y))) y1 <- subset(y, (!is.na(x)) &(!is.na(y))) lowess.na <- lowess(x1,y1,f, ...) } lines(lowess.na(temp1,consumed,f=3/10)) #this does work if there are missing data mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5) #note that we seem to add the global title last #cex = character expansion factor
Upon inspection of these data, it seems as if during the first winter, there were some days with unusually high electrical consumption. Indeed, this was because the house was still under constrcution. We now redo the analysis with just days with consumption < 40 KWH/day. This makes use of the 'subset' command. However, we also notice that we had very high demand in July of 2005. This was due to about 6 days of air conditioning! We should include these data
normal=subset(electric,(consumed<40) |as.POSIXlt(date)$year>103 ) #select those days with more typical power consumption or after 2003 detach(electric) #release the names in this data frame to use the next one attach(normal) #use the names in the new data frame #plot 4 different panels -- energy produced, consumed, net as function of date, consumption by temperature smooth.value = 3/10 #allows us to experiment with different smoothing parameters in lowess plot.new() #start a new plot #set parameters for this graph par(omi=c(0,0,1,0) ) #set the size of the outer margins par(mfrow=c(2,2)) #number of rows and columns to graph plot(date,produced,xaxt="n") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,produced, f=smooth.value)) #lowess smooth with 30% window title(main="Seasonal Effects on PV Electrical Production") plot(date,consumed,xaxt="n") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,consumed, f=smooth.value)) #lowess smooth with 30% window title(main="Seasonal Effects on total Electrical Consumption") net=produced-consumed #net electrical production plot(date,net,xaxt="n") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,net, f=smooth.value)) #lowess smooth with 30% window title(main="Seasonal Effects on net Electrical Production") abline(h=0) plot(temp1,consumed,main="Electrical consumption by mean daily temperature",xlab="Mean daily temperature",ylab="KWH/day") lines(lowess.na(temp1,consumed,f=smooth.value)) mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5) #note that we seem to add the global title last #cex = character expansion factor plot.3=recordPlot() #save this one before redoing with a different smooth value smooth.value=.5
We can examine the effects of different levels of smoothing either by doing different graphs and saving them, or by plotting multiple smooths on each graph. Lets try multiple smoothing functions.
plot.new() #start a new plot #set parameters for this graph par(omi=c(0,0,1,0) ) #set the size of the outer margins par(mfrow=c(2,2)) #number of rows and columns to graph plot(date,produced,xaxt="n") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,produced, f=.25)) #lowess smooth with 25% window lines(lowess(date,produced, f=1/3)) #lowess smooth with 33% window lines(lowess(date,produced, f=.5)) #lowess smooth with 50% window lines(lowess(date,produced, f=2/3)) #lowess smooth with 66% window title(main="Analysis of Seasonal Effects on PV Electrical Production") plot(date,consumed,xaxt="n") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,consumed, f=.25)) #lowess smooth with 25% window lines(lowess(date,consumed, f=1/3)) #lowess smooth with 33% window lines(lowess(date,consumed, f=.5)) #lowess smooth with 50% window lines(lowess(date,consumed, f=2/3)) #lowess smooth with 66% window title(main="Analysis of Seasonal Effects on total Electrical Consumption") net=produced-consumed #net electrical production plot(date,net,xaxt="n") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,net, f=.25)) #lowess smooth with 25% window lines(lowess(date,net, f=1/3)) #lowess smooth with 33% window lines(lowess(date,net, f=.5)) #lowess smooth with 50% window lines(lowess(date,net, f=2/3)) #lowess smooth with 66% window title(main="Analysis of Seasonal Effects on net Electrical Production") abline(h=0) plot(temp1,consumed,main="Electrical consumption by mean daily temperature",xlab="Mean daily temperature",ylab="KWH/day") lines(lowess.na(temp1,consumed,f=.25)) lines(lowess.na(temp1,consumed,f=1/3)) lines(lowess.na(temp1,consumed,f=.5)) lines(lowess.na(temp1,consumed,f=2/3)) mtext("Analysis of seasonality effects on production and consumption\r with 4 levels of smoothing",3,outer=TRUE,line=1,cex=1.5) #note that we seem to add the global title last #cex = character expansion factor
This results in the following 4 (messy) plots:
We would expect a seasonality effect in PV production, but why is there one for consumption? After examining the relationship between external temperature and consumption several hypotheses are likely. Although after one year I had thought this relationship was due to the use of electric pumps to deliver the radiant heat, after another winter's analysis, I believe this was due to overreliance on internal air circulation using HVAC equipment (as advised by an incompetent Heating Contractor) rather than the more appropriate use of an Energy Recovery Ventilation (HRV) system. Based upon the data from the first winter, we stopped using the HVAC system and used the ERV system in the second winter. Notice the decreased in energy consumption. In fact, the relationship is reduced across the years.
Another source of electrical consumption is driving a pump to circulate hot water (from a hot water solar system) to heat a small exercise pool. This can be seen from the plot of consumption by hours of heating, as well as the relationship between external temperature and kwh consumption as a function of hours of heating (a Coplot)
plot(poolheat,consumed,main="Effect of external temperature on electrical Consumption") abline(lm(consumed~poolheat,data=normal)) coplot(consumed~mean|poolheat,panel = panel.smooth) model=lm(consumed~mean+poolheat) model=lm(consumed~mean) oneday=subset(normal,days==1) detach(normal) attach(oneday) par(mfrow=c(2,1)) plot(poolheat,deltaheat,main="change in temperature as a function of hours of heating",ylab="change in temperature",xlab="hours of heating") mod1=lm(deltaheat~poolheat) abline(mod1) #put the regression line in plot(poolheat,consumed,main="electrical consumption as a function of hours of heating pool",ylab="KWH/day",xlab="Hours of pool heat") mod2=lm(consumed~poolheat) abline(mod2) mtext("Effect of solar pool heating on pool heat and electrical consumption",3,outer=TRUE,line=1,cex=1.5) #note that we seem to add the global title last #cex = character expansion factor
The strength of the relationship may be determined by examing the R-square of the regression which in the first case (hours of heating and change in temperature) is quite strong. Although the relationship is not as strong, the effect of running the pump on power consumption suggests that one hour of operation leads to about an additional KWH of power consumption.
summary(mod1) Call: lm(formula = deltaheat ~ poolheat) Residuals: Min 1Q Median 3Q Max -3.76586 -0.37228 0.02131 0.71810 3.23414 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.71810 0.08057 -8.912 <2e-16 *** poolheat 0.69679 0.03708 18.791 <2e-16 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 1.009 on 254 degrees of freedom Multiple R-Squared: 0.5816, Adjusted R-squared: 0.58 F-statistic: 353.1 on 1 and 254 DF, p-value: < 2.2e-16 summary(mod2) summary(mod2) Call: lm(formula = consumed ~ poolheat) Residuals: Min 1Q Median 3Q Max -7.5670 -3.0012 -0.6728 2.2919 16.5615 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 19.7089 0.3291 59.880 < 2e-16 *** poolheat 1.0901 0.1515 7.196 6.96e-12 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 4.121 on 254 degrees of freedom Multiple R-Squared: 0.1694, Adjusted R-squared: 0.1661 F-statistic: 51.79 on 1 and 254 DF, p-value: 6.956e-12Finally, lets try to get some monthly statistics. My solution to doing this sort is a kludge, but works.
month.year <- function (x) {12*(as.POSIXlt(x)$year-103)+as.POSIXlt(x)$mon+1} #months are mod 12 and normally range from 0-11. this makes them go to 1-12 #attach(normal) #get the cleaned data month <- month.year(date) #months start at January 2003 and add from then mean.produced=tapply(produced,month,mean) #apply the mean function to amount produced each month sd.produced=tapply(produced,month,sd) mean.consumed <- tapply(consumed,month,mean) sd.consumed <- tapply(consumed,month,sd) months <- month.name #names from January to December for (i in 1 : length(mean.consumed)) {month.observed[i]<- as.numeric(names(mean.consumed[i]))%%12} #first get the months as numbers 0-11 month.observed[month.observed==0]<-12 # make the decembers 12 instead of 0 month.names <- months[as.numeric(month.observed)] stats <- data.frame(mean.produced,sd.produced,mean.consumed,sd.consumed,month=month.names) print(stats,digits=3) #see also month.name and month.abb for convenient use of months #gives this output mean.produced sd.produced mean.consumed sd.consumed month 2 7.02 NA 31.5 NA February 3 14.75 7.44 27.8 4.34 March 4 19.70 9.18 29.9 7.34 April 5 20.76 9.62 15.4 6.78 May 6 24.72 7.45 17.8 5.13 June 7 21.08 7.42 18.9 5.22 July 8 20.19 5.57 17.4 1.96 August 9 17.57 6.63 15.0 3.05 September 10 14.76 5.92 17.7 2.72 October 11 6.69 3.73 26.7 4.07 November 12 7.51 4.49 32.3 4.05 December 13 6.50 5.03 28.1 4.11 January 14 8.26 6.10 26.7 4.62 February 15 12.41 6.08 26.2 5.26 March 16 22.58 5.40 24.1 5.01 April 17 18.49 7.42 21.1 3.86 May 18 20.90 6.86 20.0 3.94 June 19 19.78 6.57 17.6 2.19 July 20 17.79 7.25 18.9 3.98 August 21 19.56 3.58 18.1 2.27 September 22 12.27 7.19 17.7 2.64 October 23 6.98 4.94 23.0 5.52 November 24 6.71 3.95 25.8 6.23 December 25 5.97 5.32 24.8 3.65 January 26 9.28 6.33 21.2 3.23 February 27 14.17 6.36 21.3 3.91 March 28 20.19 6.69 18.3 2.94 April 29 18.60 7.49 16.3 2.84 May 30 22.06 5.14 17.6 4.14 June 31 20.42 6.90 21.7 15.46 July 32 20.96 4.35 16.5 1.81 August 33 10.56 7.17 17.1 2.33 September 34 10.25 5.70 18.1 3.14 October 35 8.63 5.22 22.0 2.72 November 36 5.66 5.09 23.4 3.76 December 37 7.31 5.30 23.5 4.16 January 38 10.74 5.46 23.6 3.46 February
In addition to plotting LOWESS smooths, we can also organize the date by months and do graphical analyses of monthly data. This uses the POSIXlt function to give us the numerical month.
Here are the results in boxplot form.
# as.POSIXlt(date)$mon #gives the months in numeric order mod 12 with January = 0 and December = 11 par(mfrow=c(3,1)) bproduced=boxplot(produced~month.year(date),names=month.names,data=normal ) #show the graph and save the data in bc title(main="Seasonal Effects on PV Electrical Production",cex=1.5) lines(lowess(bproduced$stats[3,],f=1/5)) #add a smooth to the box plot bconsumed<-boxplot(consumed~month.year(date),names=month.names,data=normal ) title(main="Seasonal Effects on Electrical Consumption",cex=1.5) lines(lowess(bconsumed$stats[3,],f=1/6)) bnet<-boxplot(net~month.year(date),names=month.names,data=normal ) title(main="Seasonal Effects on net Production",cex=1.5) abline(h=0) lines(lowess(bnet$stats[3,],f=1/6)) mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5) #cex = character expansion factor # as.POSIXlt(date)$mon #gives the months in numeric order mod 12 with January = 0 and December = 11 #now do it for months collapsed across years par(mfrow=c(3,1)) bproduced=boxplot(produced~ as.POSIXlt(date)$mon,names=months ,data=normal ) #show the graph and save the data in bc title(main="Seasonal Effects on PV Electrical Production",cex=1.5) lines(lowess(bproduced$stats[3,],f=1/5)) #add a smooth to the box plot bconsumed<-boxplot(consumed~ as.POSIXlt(date)$mon,names=months,data=normal ) title(main="Seasonal Effects on Electrical Consumption",cex=1.5) lines(lowess(bconsumed$stats[3,],f=1/6)) bnet<-boxplot(net~ as.POSIXlt(date)$mon,names=months,data=normal ) title(main="Seasonal Effects on net Production",cex=1.5) abline(h=0) lines(lowess(bnet$stats[3,],f=1/6)) mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5) #cex = character expansion factorMonthly box plot
par(mfrow=c(3,1)) bproduced=boxplot(produced~month.year(date),data=normal,names=month.names ) #show the graph and save the data in bc title(main="Seasonal Effects on PV Electrical Production",cex=1.5) lines(lowess(bproduced$stats[3,],f=1/5)) #add a smooth to the box plot bconsumed<-boxplot(consumed~month.year(date),names=month.names,data=normal ) title(main="Seasonal Effects on Electrical Consumption",cex=1.5) lines(lowess(bconsumed$stats[3,],f=1/6)) bnet<-boxplot(net~month.year(date),names=month.names,data=normal ) title(main="Seasonal Effects on net Production",cex=1.5) abline(h=0) lines(lowess(bnet$stats[3,],f=1/6)) mtext("Seasonality effects on production and consumption",3,outer=TRUE,line=1,cex=1.5) #cex = character expansion factor
The fitted line going through the medians is a lowess smooth. Note that we need to run the box plots first to get the appropriate statistics. Further note that the fraction of data included in the smooth function = 1/3. This recognizes the summer months are net production months better than a larger fraction.
new.plot() par(mfrow=c(1,1)) plot(bproduced$stats[3,],xaxt="n",xlab="Time of Year",ylab="KWH/day",ylim=c(-25,35),type="b",lwd=4) axis(1,labels=c("February","April","June","August","October","December")) #lines(lowess(bproduced$stats[3,])) points(bconsumed$stats[3,],xaxt="n",type="b",lwd=2,col="red") #lines(lowess(bconsumed$stats[3,])) points(bnet$stats[3,],xaxt="n",type="b") axis(1,labels=c("February","April","June","August","October","December")) #lines(lowess(bnet$stats[3,])) abline(h=0) mtext("Production and consumption vary over the year ",3,outer=TRUE,line=1,cex=1.5) #note that we seem to add the global title last #cex = character expansion factor percent=100*produced/consumed bpercent=boxplot(percent~(as.POSIXlt(date)$mon+as.POSIXlt(date)$year),names=month.names ) #show the graph and save the data in bc plot(bpercent$stats[3,],xaxt="n",xlab="Time of Year",ylab="% of demand",type="b",ylim=c(0,150),lwd=2) abline(h=100) axis(1,labels=c("February","April","June","August","October","December")) title(main="Annual electrical production is 70% of total demand")
It is also possible to plot multiple points and fits on the same graph. Consider a plot of solar output/solar slate as a function of roof pitch and orientation.
plot.new() #start a new plot #set parameters for this graph par(omi=c(0,0,1,0) ) #set the size of the outer margins par(mfrow=c(1,1)) #number of rows and columns to graph plot(date,belvederewatts,xaxt="n",pch=19,ylab="watt hours") #don't plot the x axis axis.POSIXct(1, at=seq(daterange[1], daterange[2], by="month"), format="%b") #label the x axis by months lines(lowess(date,belvederewatts, f=smooth.value),lty=1) #lowess smooth with 30% window points(date,garagewatts,pch=21) lines(lowess(date,garagewatts,f=smooth.value),lty=2) points(date,midwatts,pch=23) lines(lowess(date,midwatts,f=smooth.value),lty=3) title(main="Seasonal Effects on PV Electrical Production (per slate/day)") plot(belvederewatts,garagewatts,pch=19) points(belvederewatts,midwatts,pch=21) abline(0,1)Now, to calculate the effeciency of the converter, we can compare net consumption to production. We need to select days where we were not heating the pool.
part of a short guide to R
Version of Feb 10, 2006
William Revelle
Department of Psychology
Northwestern University