Tuesday, February 19, 2013

metvurst now a package (repository moved to GitHub)


Inspired by a post on PirateGrunt, I finally managed to pack metvurst up and turn it into a proper R-Package (the fact that I'm on holiday and have some time also helped). As a side-effect of this, the repository has been moved from google code to GitHub. As I use RStudio for developping R-code, this shift seemed inevitable, as the integration of git in the package development tools in RStudio is very handy.

In order to install metvurst you need to have devtools. Making use of devtools install_github() function you can easily install and load metvurst:

library(devtools)
install_github('metvurst', 'tim-salabim')
library(metvurst)

I have tried it on Linux and Mac so far, so in case there are any problems on Windows, please let me know (a quick note if indeed it does work on Windows would be appreciated too). 

For now, the two core functions strip() and windContours() along with some helper functions (mainly to convert wind speed and direction to u and v components and vice versa) are included.
The package is fully functional but there is no documentation for now. I will progressively add and update documentation manuals over the next few weeks (maybe months, depending on how busy I am once I return to work).

Have fun using metvurst and in case you have any questions, suggestions or critique don't hesitate to get in touch.

Cheers
TimSalabim

Thursday, January 24, 2013

Make plot panels fit the distribution of your data

I am a big fan of lattice/latticeExtra. In fact, nearly all visualisations I have produced so far make use of this great package. The possibilities for customisation are endless and the amount of flexibility it provides is especially valuable for producing visualisations in batch mode/programatically.

Today I needed to visualise some precipitation data for a poster presentation of climate observations at Mt. Kilimanjaro. I wanted to show monthly precipitation observations in relation to long term mean monthly precipitation in order to show which months have been particularly wet or dry. The important point here is that by combining two different visualisations of the same data, we need to make sure that we make these directly comparable. This means that the scales of the absolute rain amounts and the deviations need to be similar, so we can get an instant impression of the deviation in relation to the absolute amounts.

Here's what I've done with latticeExtra (using mock data):

First, we need some (semi-) random data.

## LOAD PACKAGE
library(latticeExtra, quietly = TRUE)

## CREATE MOCK DATA
# precipitation long term mean
pltmean <- 800
# precipitation long term standard deviation
pltsd <- 200
# precipitation observations
pobs <- rnorm(12, pltmean, pltsd)
# preceipitation deviation from long term mean
pdev <- rnorm(12, 0, 150)
# months
dates <- 1:12

Then we calculate the panel heights to be relative to the (precipitation) data distribution. This is crucial because we want the deviation data to be directly comparable to the observed values.

## CALCULATE RELATIVE PANEL HEIGHTS
y.abs <- max(abs(pobs))
y.dev <- range(pdev)[2] - range(pdev)[1]
yy.aspect <- y.dev/y.abs

Then, we create the bar charts as objects.

## COLOUR
clrs <- rev(brewer.pal(3, "RdBu"))

## CREATE THE PLOT OBJECTS
abs <- barchart(pobs ~ dates, horizontal = FALSE, strip = FALSE, origin = 0,
                between = list(y = 0.3),
                ylab = "Precipitation [mm]", xlab = "Months", col = clrs[1])

dev <- barchart(pdev ~ dates, horizontal = FALSE, origin = 0, 
                col = ifelse(pdev > 0, clrs[1], clrs[length(clrs)]))

Now, we combine the two plot objects into one and also create strips to be plotted at the top of each panel with labels providing some detail about the respective panel.

## COMBINE PLOT OBJECTS INTO ONE AND CREATE CUSTOM STRIPS FOR LABELLING
out <- c(abs, dev, x.same = TRUE, y.same = FALSE, layout = c(1,2))
out <- update(out, scales = list(y = list(rot = 0)), 
              strip = strip.custom(bg = "grey40", 
                                   par.strip.text = list(col = "white", 
                                                         font = 2),
                                   strip.names = FALSE, strip.levels = TRUE, 
                                   factor.levels = c("observed", 
                                                     "deviation from long term monthly mean")))

As a final step, we re-size the panels according to the panel heights calculated earlier.

## RESIZE PANELS RELATIVE TO DATA DISTRIBUTION
out <- resizePanels(out, h = c(1,yy.aspect), w = 1)

And this is what the final product looks like.

## PRINT PLOT
print(out)

Note: I suggest you rerun this example a few times to see how the relative panel sizes change with the data distribution (which is randomly created during each run). This highlights the usefulness of such an approach for batch visualisations.

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_0.9           latticeExtra_0.6-19 lattice_0.20-13    
## [4] RColorBrewer_1.0-5 
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.0   evaluate_0.4.3 formatR_0.7    grid_2.15.2   
## [5] markdown_0.5.3 plyr_1.7.1     stringr_0.6    tools_2.15.2

Wednesday, January 2, 2013

(Semi-)automating the R markdown to blogger workflow

In his recent post 100 most read R posts for 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages Tal Galili - the guy behind R-Bloggers - presents his wishlist for 2013. Among other things he states

“The next step will be to have a “publish to your WordPress/blogger” button right from the RStudio console – allowing for the smoothest R blogging experience one could dream of.

I hope we’ll see this as early as possible in 2013.”

Given that I had some troubles myself recently with finding a convenient way of creating post for this blog (a blogspot site) and that I didn't want to wait until RStudio may integrate such functionality, I decided to come up with my own work around for the time being.

But let's start from the beginning:

What I want:

  • I want to blog
  • I want to blog using RStudio's R markdown framework (as it is very convenient)
  • I want to have code highlighting in my posts

What I don't want:

  • I don't want to manually fiddle around in the html code for each of the posts I create with RStudio
  • I don't want write blog posts using the compose section on blogger
  • Basically, I don't want to spend a lot of time on formatting (I rather spend the time on coding)

Why I want this:

  • For starters, because I'm lazy
  • Secondly, I want to integrate blogging into my teaching in the near future and for that I need a convenient and straight forward solution on how to create nice posts (after all I want to teach the students how to use R, not html)

The solution (as of now) is as follows (at least for me):

Yihui Xie has produced a SyntaxHighlighter Brush for the R Language which can be used for highlighting R code in blog posts.

In order to get the SyntaxHighlighter working on your blog, simply follow this tutorial at thebiobucket*.

However, the SyntaxHighlighter by Alex Gorbatchev, and subsequently also Yihui's brush for R, works a little different from the knitr code highlighting implementation. Basically, a chunk of code created with knitr is prepended by this little bit of html code:

<pre><code class="r">

whereas SyntaxHighlighter requires the following format for it to work:

<pre class="brush: r">

In order to automate the process of adjusting the brush definition needed for code highlighting using the SyntaxHighlighter brush I created the follwing function:

rmd2blogger <- function(rmdfile) {

  stopifnot(require(knitr))

  ## knit rmdfile to <body> - only html
  knit2html(rmdfile, fragment.only = TRUE)

  ## read knitted html file
  htm.name <- gsub(".Rmd", ".html", rmdfile)
  tmp <- readLines(htm.name)

  ## substitue important parts
  tmp <- gsub("<pre><code class=\"r\">", "<pre class=\"brush: r\">", tmp)
  tmp <- gsub("</code></pre>", "</pre>", tmp)

  ## write with .blg file ending
  writeLines(tmp, paste(htm.name, ".blg", sep =""))

}

This function has two important components to it:

  1. it uses knitr's knit2html with fragment.only set to TRUE, which means your only creating the <body> part of the html.
  2. it produces a .html.blg file in the path where the .Rmd is located where all syntax highlighting brushes are formatted to work with Yihui's R brush

To use it, simply provide the path to the .Rmd file you want to publish on your blogspot.

rmd2blogger("someRmdFile.Rmd")

Now, you can simply open the .html.blg file in a text editor and copy the complete contents into the html editor for a new post on your blogger site and the code R should look like the one above.

In case you find any bugs, please let me know.

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] rgdal_0.7-25        raster_1.9-92       gridBase_0.4-6     
## [4] latticeExtra_0.6-19 lattice_0.20-10     RColorBrewer_1.0-5 
## [7] sp_0.9-99           knitr_0.9          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.0   evaluate_0.4.3 formatR_0.7    markdown_0.5.3
## [5] plyr_1.7.1     stringr_0.6    tools_2.15.2

Friday, December 28, 2012

Visualising diurnal wind climatologies

In this post I want to highlight the second core function of the metvurst repository:

The windContours function

It is intended to provide a compact overview of the wind field climatology at a location and plots wind direction and speed as a function of the hour of day.

  • direction is plotted as frequencies of occurrences
  • speed is represented by a box plot

The classical approach for visualising wind direction and speed at a location is to plot a wind rose where a stacked barplot of speeds is plotted in a circular coordinates system showing the direction. As an example of a wind rose we take the example from the Wikipedia page for “wind rose”.

alt text

This kind of representation is very useful for getting a general sense of the of the wind field climatology at a location. You will be able to quickly get information along the lines of. “the x most prevailing wind directions are from here, there and sometimes also over there”. Furthermore, you will also know which of these tends to produce windier conditions. So far, so good.

If you want to learn about possible diurnal flow climatologies, however, there is no way of obtaining this information straight from the wind rose. The common way to achieve visualising such dynamics using wind roses is to plot individual roses for some specific period (e.g. hourly or 3 hourly roses).

Getting information on diurnal climate dynamics is especially important in regions of complex terrain or for coastal locations, where diurnally reversing wind flow patterns are a major climatic feature.

The windContours function is able to deliver such information at a glance. Consider the case from the last post, where we used hourly meteorological information from a coastal station in Fiji.

First, we (again) read the data from the web at BOM (Australian Bureau Of Meteorology).

## LOAD RColorBrewer FOR COLOR CREATION
library(RColorBrewer)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:2012

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
    read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT NA STRINGS TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
dts <- as.POSIXct(fiji$Date...UTC.Time, format = "%d-%b-%Y %H:%M") + 12 * 60 * 
    60

Next, we need to create a vector of the hours of the recordings in order to plot our hourly climatologies.

## CREATE CONDITIONING VARIABLE (IN THIS CASE HOUR)
hr <- substr(as.character(dts), 12, 13)

Now, we can use the windContours function straight away to plot the hourly wind direction frequencies and corresponding wind speeds for the observation period 1993 - 2012. Note, for reproducibility I have posted the function on my personal uni web page, but it is recommended that you get the function from the metvurst repository at google code.

source("http://www.staff.uni-marburg.de/~appelhat/r_stuff/windContours.R")
windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed,
             keytitle = "hourly wind frequencies [%]")

plot of chunk unnamed-chunk-3

Just as we would with a wind rose, we easily see that there are 3 main prevailing wind directions:

  1. north north-east
  2. south-east
  3. west north-west

However, we straight away realise that there is a distinct diurnal pattern in these directions with 1. and 2. being nocturnal and 3. denoting the daytime winds. This additional information is captured at a glance.

It is possible to refine the graph in a lot of ways. In the next few figures I want to highlight a few of the flexibilities that windContours provides:

In order to adjust the x-axis of the speed plot, use the speedlim = parameter

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "hourly wind frequencies [%]")

plot of chunk unnamed-chunk-4

In case you want to change the density of the contour lines you can use spacing =

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "hourly wind frequencies [%]",
             spacing = .5)

plot of chunk unnamed-chunk-5

You can also provide colors through the colour = parameter

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")))

plot of chunk unnamed-chunk-6

You can adjust the number of cuts to be made using ncuts =

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")),
             ncuts = .5)

plot of chunk unnamed-chunk-7

Apart from the design, you can also adjust the position of the x-axis so that the plot is centered north (default is south) using the centre = parameter (allowed values are: “S”, “N”, “E”, “W”)

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")),
             centre = "N")

plot of chunk unnamed-chunk-8

There's also the possibility to provide an additional variable using add.var = in order to examine the relationship between winds and some other parameter of interest (e.g. concentrations of some pollutant or precipitation or anything really). In doing so, the contours will remain as before (representing wind direction frequencies) and the (colour-)filled part of the left panel will be used to represent the additional variable. Here, we're using air temperature (don't forget to change the keytitle) where we can easily see that cold air mainly comes from south (extra-tropical air masses) and highest temperatures are observed when air flow is from northerly directions (tropical air masses)

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "Temperature [°C]",
             colour = rev(brewer.pal(11, "Spectral")),
             add.var = fiji$Air.Temperature)

plot of chunk unnamed-chunk-9

In case you want to change the smoothing of the contours you can use the smooth.contours = parameter. The same applies to the (colour-)filled area using smooth.fill =

windContours(hour = hr, 
             wd = fiji$Wind.Direction, 
             ws = fiji$Wind.Speed, 
             speedlim = 12, 
             keytitle = "Temperature [°C]",
             colour = rev(brewer.pal(11, "Spectral")),
             add.var = fiji$Air.Temperature,
             smooth.contours = .8,
             smooth.fill = 1.5)

plot of chunk unnamed-chunk-10

In the future I might present some analytical post making use of the windContours function to highlight its usefulness in climate analysis. For now, I refer the interested reader to Appelhans et al. (2012) where I'm presenting the seasonal wind climatology of Christchurch, New Zealand and also analyse general diurnal patterns of air pollution using windContours.

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] gridBase_0.4-6      abind_1.4-0         fields_6.7         
## [4] spam_0.29-2         latticeExtra_0.6-19 lattice_0.20-10    
## [7] RColorBrewer_1.0-5  knitr_0.9          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.0   evaluate_0.4.3 formatR_0.7    plyr_1.7.1    
## [5] stringr_0.6    tools_2.15.2