The Butterfly Affectation:
A case study in embedding an external image in an R plot

by Paul Murrell

Wednesday 31 August 2016


This report documents a variety of approaches to including an external vector image within an R plot. The image presents particular challenges because it contains features that are not natively supported by the R graphics system, which makes it hard for R to faithfully reproduce the image.

The external image

The external image is a PostScript file that draws a butterfly. This image was provided by Jianshu Zhao, an agricultural scientist from Tsinghua University, Beijing; the application is to embellish a plot of ecological data, as shown in Figure 2 of Hahn & Orrock. The butterfly image used in this report is shown below (converted to PNG format for easy inclusion in HTML).

For the purpose of demonstration in this report, we will include this butterfly image within a 'lattice' plot of Fisher's butterfly data (as provided by the 'SPECIES' package). We are not concerned with making a visually pleasing or statistically accurate image; the main point is to visually emphasise the fact that R is creating the final result, which consists of an external image embedded within an R plot.

  library(SPECIES)
  library(lattice)
  library(grid)
  data(butterfly)
  svg("plot.svg")
  xyplot(n_j ~ j, butterfly, pch=16, col="white", cex=1.5,
         ylab="Number of Species", xlab="Number of Individuals",
         subset=-25,
         panel=function(...) {
             grid.rect(gp=gpar(fill="grey20"))
             panel.xyplot(...)
         })
  dev.off()
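The code above relies on the 'lattice' plot being auto-printed at the top level of an R session. A minimal sketch of the same pattern wrapped in a function follows, where an explicit print is needed. The function name darkPlot is hypothetical, and the built-in cars data set is only a stand-in for the butterfly data so that the sketch runs without the 'SPECIES' package.

```r
library(lattice)
library(grid)

# Hypothetical wrapper; 'cars' stands in for the butterfly data
darkPlot <- function(file) {
    pdf(file)
    on.exit(dev.off())   # close the device even if drawing fails
    p <- xyplot(dist ~ speed, cars, pch=16, col="white", cex=1.5,
                panel=function(...) {
                    grid.rect(gp=gpar(fill="grey20"))
                    panel.xyplot(...)
                })
    print(p)             # no auto-printing happens inside a function
    invisible(p)
}

outfile <- tempfile(fileext=".pdf")
p <- darkPlot(outfile)
```

Without the call to print, the device would be opened and closed without the plot being drawn.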

Embedding a raster image

The easiest way to include an external image is to include a raster version of the image. In this case, the original image is PostScript, so we need to convert it with something like ImageMagick.

  system("convert -density 300 butterfly.ps butterfly.png")
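The call above assumes that an ImageMagick convert program is on the search path. The following is a sketch of a slightly more defensive version, assuming that newer ImageMagick installations provide a magick program while older ones provide convert; it only attempts the conversion if a program and the input file are both found.

```r
# Assumption: ImageMagick 7 provides 'magick'; older installs provide 'convert'
tool <- Sys.which(c("magick", "convert"))
tool <- tool[nzchar(tool)]          # keep only the programs that were found

# Same options as the system() call in the text
args <- c("-density", "300", "butterfly.ps", "butterfly.png")

if (length(tool) > 0 && file.exists("butterfly.ps")) {
    status <- system2(tool[[1]], args)
    if (status != 0) stop("conversion failed")
} else {
    message("ImageMagick or butterfly.ps not found; skipping conversion")
}
```

One thing to watch for on Windows is that a system utility that is also called convert may be found instead of ImageMagick.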

Once we have a raster version of the image, it is easy to read it into R with the 'png' package (or the 'jpeg' package for JPEG files).

  library(png)
  butterflyPNG <- readPNG("butterfly.png")

Now we can draw the raster image wherever we like on the plot with grid.raster.

  svg("butterfly-plot.svg")
  xyplot(n_j ~ j, butterfly, pch=16, col="white", cex=1.5,
         ylab="Number of Species", xlab="Number of Individuals",
         subset=-25,
         panel=function(...) {
             grid.rect(gp=gpar(fill="grey20"))
             grid.raster(butterflyPNG, width=.5)
             panel.xyplot(...)
         })
  dev.off()

In addition to its simplicity, the major advantage of this approach is that we can produce the final plot in multiple formats. The plot above is in SVG format, but we could easily produce PDF, PNG, or WMF versions, simply by changing the R graphics device that we draw on.
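As a rough sketch of what "changing the graphics device" means in practice, the code below runs the same drawing code on more than one device. The functions drawScene and renderTo are hypothetical helpers, and the simple 'grid' drawing is only a stand-in for the plot; the SVG output is only attempted if this build of R reports Cairo support.

```r
library(grid)

# Hypothetical stand-in for the plotting code
drawScene <- function() {
    grid.newpage()
    grid.rect(gp=gpar(fill="grey20"))
    grid.circle(r=.25, gp=gpar(fill="white"))
}

# Hypothetical helper: open a device, draw, and close the device
renderTo <- function(device, file, ...) {
    device(file, ...)
    on.exit(dev.off())
    drawScene()
    invisible(file)
}

outPDF <- renderTo(pdf, tempfile(fileext=".pdf"))
# svg() needs Cairo support, so check before using it
if (capabilities("cairo")) {
    outSVG <- renderTo(svg, tempfile(fileext=".svg"))
}
```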

The main downside to this approach is immediately apparent in the plot above (at least when viewed on a screen): raster images only look good at an appropriate size; when scaled they become blurry or jagged.
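Choosing an "appropriate size" is a matter of simple arithmetic. The sketch below uses made-up numbers: a target drawing width of 3.5 inches, a target resolution of 300 dpi, and a PostScript image that is assumed to be 504 points (7 inches) wide. It calculates the number of pixels that the raster image needs and the corresponding value for the ImageMagick -density option.

```r
# Hypothetical target: image drawn 3.5 inches wide at 300 dpi
targetWidthIn <- 3.5
targetDPI <- 300
pixelsNeeded <- targetWidthIn * targetDPI

# Hypothetical width of the PostScript image (1 point = 1/72 inch)
psWidthPt <- 504
psWidthIn <- psWidthPt / 72

# -density is the rasterising resolution, so
# output pixels = image width in inches * density
density <- ceiling(pixelsNeeded / psWidthIn)

c(pixelsNeeded=pixelsNeeded, density=density)
```

With these numbers, a density of 150 would be enough; a larger value such as the 300 used earlier just produces a larger raster image that is scaled down when drawn.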

Embedding a vector image with 'grImport'

An alternative approach is to import the original vector image. The 'grImport' package provides functions to import PostScript files into R (and draw them). The first step is to convert the PostScript file into a special XML format that the 'grImport' package can read.

  library(grImport)
  PostScriptTrace("butterfly.ps", "butterfly-image.xml")

The next step is to read the XML file into R (to create a "Picture" object).

  butterflyPS <- readPicture("butterfly-image.xml")

Finally, we draw the "Picture" object with grid.picture.

  svg("butterfly-grImport-plot.svg")
  xyplot(n_j ~ j, butterfly, pch=16, col="white", cex=1.5,
         ylab="Number of Species", xlab="Number of Individuals",
         subset=-25,
         panel=function(...) {
             grid.rect(gp=gpar(fill="grey20"))
             grid.picture(butterflyPS, width=.5)
             panel.xyplot(...)
         })
  dev.off()

This result is superior to the raster image approach because the embedded image is a vector image, so it looks nice at any scale.

A limitation of the 'grImport' approach is that it sometimes cannot reproduce all features of the imported image, partly because R graphics does not support all graphics features. The butterfly image is a good example of this limitation because the original image contains a radial gradient fill (the green to blue to green fill colour) and R graphics does not support gradient fills.

Embedding a vector image with 'grImport' and 'gridSVG'

One way around the lack of support in R graphics for some features, like gradient fills, is to use the 'gridSVG' package, which provides access to most features of the SVG graphics format.

  library(gridSVG)

The following code uses functions from the 'gridSVG' package to create a definition of a radial gradient fill. The main part of this definition is the vector of (three) colours that the gradient will pass through; the other arguments will be discussed later.

  rg <- radialGradient(c(rgb(17,150,134, max=255),
                         rgb(7,97,165, max=255),
                         rgb(24,149,61, max=255)),
                       r=unit(0.25, "npc"),
                       gradientUnits="coords")

In order to apply this gradient fill to the butterfly, we need to do two things: "register" the gradient fill; and apply the gradient fill to the butterfly image components of the drawing (so that we only fill the butterfly, not, for example, the data symbols in the plot). The following code does the right thing, using the registerGradientFill function and the grid.gradientFill function; we will discuss the details of how this works later. The resulting plot is shown below the code.

  pdf(NULL)
  xyplot(n_j ~ j, butterfly, pch=16, col="white", cex=1.5,
         ylab="Number of Species", xlab="Number of Individuals",
         subset=-25,
         panel=function(...) {
             grid.rect(gp=gpar(fill="grey20"))
             grid.picture(butterflyPS, width=.5)
             registerGradientFill("bgb", rg)
             panel.xyplot(...)
         })
  grid.gradientFill("pathgrob", label="bgb",
                    group=FALSE, grep=TRUE, global=TRUE)
  grid.export("butterfly-grImport-gridSVG-plot.svg", strict=FALSE)
  dev.off()

The main advantage of this approach is that it produces a nice result. We now have a vector version of the image, in complete radial-gradient-fill colour glory, embedded in the plot. However, it comes at some cost.

First of all, this approach can only produce SVG output. This is fine if we are including the plot in an HTML document like this one, but if we want some other format, we will need to convert the final plot using tools outside of R. On the plus side, we were at least able to use R to combine the external image with a plot.

The second major cost of this approach is complexity. For example, we have to describe a radial gradient fill. That involves determining which colours to use for the fill, which we can do with something like a colour picker tool or, as was done in this case, inspecting the original image source to determine the actual colours used in the original image. The latter requires being able to navigate within raw PostScript and/or SVG files.
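Once colour values have been found, converting them to R colours is straightforward. The sketch below uses the three colours from the gradient definition; the helper psToR is hypothetical and assumes that the source expresses colour components as fractions between 0 and 1 (as the PostScript setrgbcolor operator does).

```r
# The three gradient colours, given as 0-255 components
cols <- c(rgb(17, 150, 134, maxColorValue=255),
          rgb(7, 97, 165, maxColorValue=255),
          rgb(24, 149, 61, maxColorValue=255))
cols

# Hypothetical helper for components given as fractions in [0, 1]
psToR <- function(r, g, b) rgb(r, g, b, maxColorValue=1)
first <- psToR(17/255, 150/255, 134/255)

# Going the other way: recover the 0-255 components from the colours
col2rgb(cols)
```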

Another detail is specifying the size of the radial gradient correctly. In this case, we are defining a radial gradient that will be used in the central quarter of the lattice plot panel (that is where the butterfly image has been drawn). Specifying this correctly involves two important steps. When we create the radial gradient we specify r=unit(0.25, "npc") (the radius) and gradientUnits="coords", which means that the radius will be one quarter of the size of the viewport that is in effect when the gradient is registered. We must then also make sure that the gradient is registered within the lattice plot panel (so that 0.25 means one quarter of the size of the lattice plot panel), which is why the call to registerGradientFill occurs within the panel function in the call to xyplot.
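The dependence on the current viewport can be seen with 'grid' alone. The following sketch (not using 'gridSVG' at all) converts 0.25 "npc" to inches at the top level of a 7 inch wide page and again within a viewport that is half the width of the page; the viewport is just a stand-in for the lattice plot panel.

```r
library(grid)

pdf(NULL, width=7, height=7)
grid.newpage()
# 0.25npc relative to the whole page
topLevel <- convertWidth(unit(0.25, "npc"), "inches", valueOnly=TRUE)
# 0.25npc relative to a viewport that is half the page width
pushViewport(viewport(width=0.5, height=0.5))
inPanel <- convertWidth(unit(0.25, "npc"), "inches", valueOnly=TRUE)
popViewport()
invisible(dev.off())

c(topLevel=topLevel, inPanel=inPanel)
```

The same 0.25 corresponds to 1.75 inches at the top level, but only 0.875 inches within the smaller viewport, which is why it matters where the gradient is registered.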

Another difficulty is that we must only apply the gradient fill to the butterfly. This requires us to identify the components of the drawing that are the butterfly. We can use grid.ls to help with this. The output below shows all of the 'grid' grobs that have been drawn, including all of the plot and all of the butterfly.

  xyplot(n_j ~ j, butterfly, pch=16, col="white", cex=1.5,
         ylab="Number of Species", xlab="Number of Individuals",
         subset=-25,
         panel=function(...) {
             grid.rect(gp=gpar(fill="grey20"))
             grid.picture(butterflyPS, width=.5)
             panel.xyplot(...)
         })
  grid.ls()
  plot_01.background
  plot_01.xlab
  plot_01.ylab
  plot_01.ticks.top.panel.1.1
  plot_01.ticks.left.panel.1.1
  plot_01.ticklabels.left.panel.1.1
  plot_01.ticks.bottom.panel.1.1
  plot_01.ticklabels.bottom.panel.1.1
  plot_01.ticks.right.panel.1.1
  GRID.rect.145
  GRID.picture.146
    GRID.pathgrob.147
    GRID.pathgrob.148
    GRID.pathgrob.149
    GRID.pathgrob.150
    GRID.pathgrob.151
    GRID.pathgrob.152
    GRID.pathgrob.153
    GRID.pathgrob.154
    GRID.pathgrob.155
    GRID.pathgrob.156
    GRID.pathgrob.157
    GRID.pathgrob.158
    GRID.pathgrob.159
    GRID.pathgrob.160
    GRID.pathgrob.161
    GRID.pathgrob.162
    GRID.pathgrob.163
    GRID.pathgrob.164
    GRID.pathgrob.165
    GRID.pathgrob.166
    GRID.pathgrob.167
    GRID.pathgrob.168
    GRID.pathgrob.169
    GRID.pathgrob.170
    GRID.pathgrob.171
    GRID.pathgrob.172
    GRID.pathgrob.173
    GRID.pathgrob.174
    GRID.pathgrob.175
    GRID.pathgrob.176
    GRID.pathgrob.177
    GRID.pathgrob.178
    GRID.pathgrob.179
  plot_01.xyplot.points.panel.1.1
  plot_01.border.panel.1.1

From this it is clear that the butterfly picture consists of a set of pathgrob grobs. This is the basis for the grid.gradientFill call (reproduced below). The grep=TRUE and global=TRUE specify that all grobs with "pathgrob" in their name will have this gradient fill applied to them. The group=FALSE is necessary so that the gradient fill is applied to the <path> elements themselves within the SVG code (rather than the <g> elements that encompass the <path> elements).

  grid.gradientFill("pathgrob", label="bgb",
                    group=FALSE, grep=TRUE, global=TRUE)
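The effect of grep=TRUE and global=TRUE can be illustrated with plain 'grid' functions. The sketch below assumes that grid.gradientFill matches grob names in the same way as grid.edit does; the grob names are made up for the example.

```r
library(grid)

pdf(NULL)
grid.newpage()
grid.rect(name="background")
grid.circle(r=.1, name="shape.1")
grid.circle(r=.2, name="shape.2")

# List the names of all grobs that have been drawn
grobNames <- grid.ls(print=FALSE)$name
matched <- grep("shape", grobNames, value=TRUE)

# grep=TRUE treats "shape" as a pattern rather than an exact name;
# global=TRUE modifies every match rather than just the first
grid.edit("shape", gp=gpar(fill="green"), grep=TRUE, global=TRUE)
fills <- sapply(matched, function(n) grid.get(n)$gp$fill)
invisible(dev.off())

fills
```

Both circles end up with the new fill, while the rectangle is left alone because its name does not match the pattern.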

In summary, in order to get a nice vector, full-colour butterfly embedded in the plot, we need a reasonable working knowledge of 'grid', SVG, and 'gridSVG'.

Embedding a vector image with 'grImport2' and 'gridSVG'

The final approach that we will consider involves a reincarnation of the 'grImport' package - the 'grImport2' package. The first step in this approach involves creating a Cairo SVG version of the original image. We can do this with the 'grConvert' package.

  library(grConvert)
  convertPicture("butterfly.ps", "butterfly.svg")

The Cairo SVG version of the image can then be read into R using the 'grImport2' package.

  library(grImport2)
  butterflySVG <- readPicture("butterfly.svg")

This imported image differs from the one imported by 'grImport' because it has retained the radial gradient fill information. That does not help us to render the image with normal R graphics, but we can render with 'gridSVG', which does understand radial gradients. In the code below, the call to grid.picture is to grImport2::grid.picture rather than grImport::grid.picture and the argument ext="gridSVG" indicates that we are going to produce the final image with 'gridSVG'.

  pdf(NULL)
  xyplot(n_j ~ j, butterfly, pch=16, col="white", cex=1.5,
         ylab="Number of Species", xlab="Number of Individuals",
         subset=-25,
         panel=function(...) {
             grid.rect(gp=gpar(fill="grey20"))
             grid.picture(butterflySVG, width=.5, ext="gridSVG")
             panel.xyplot(...)
         })
  grid.export("butterfly-grImport2-gridSVG-plot.svg", strict=FALSE)
  dev.off()

The advantage of this approach compared to the 'grImport'-plus-'gridSVG' approach is that it is much simpler - it is much closer to the straightforward 'grImport' approach - with the benefit of including the radial gradient.

Unfortunately, in this particular case, there is a downside: the butterfly is not drawn properly (at least, not in Firefox version 48.0 or in Chrome version 52.0.2743.116). Visual inspection of the raw SVG reveals the cause: the butterfly shape is drawn using clipping paths in the SVG file, and those clipping paths contain <polyline> elements as well as <path> elements. The following code fixes the problem, with the help of the 'XML' package, by removing all <polyline> elements within <clipPath> elements.

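  library(XML)
  ## Parse the exported SVG (a separate name avoids overwriting
  ## the imported "Picture" object, butterflySVG)
  svgDoc <- xmlParse("butterfly-grImport2-gridSVG-plot.svg")
  ## Find every <polyline> that sits inside a <clipPath>
  clipPolylines <- getNodeSet(svgDoc, "//svg:clipPath//svg:polyline",
                              namespaces=c(svg="http://www.w3.org/2000/svg"))
  ## Detach each matching node from its parent
  invisible(lapply(clipPolylines,
                   function(x) {
                       parent <- xmlParent(x)
                       removeChildren(parent, kids=list(x))
                   }))
  saveXML(svgDoc, "butterfly-grImport2-gridSVG-cleaned.svg")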
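To see that the XPath expression only selects <polyline> elements that are inside a <clipPath>, the same steps can be tried on a toy document. The SVG text below is made up for the purpose (it is not taken from the butterfly file) and the sketch only does any work if the 'XML' package is installed.

```r
# Toy SVG document (hypothetical; not the butterfly file)
svgText <- paste(
    '<svg xmlns="http://www.w3.org/2000/svg">',
    '  <clipPath id="clip1">',
    '    <path d="M0,0 L10,0 L10,10 Z"/>',
    '    <polyline points="0,0 10,0 10,10"/>',
    '  </clipPath>',
    '  <polyline points="0,0 5,5"/>',
    '</svg>', sep="\n")

result <- NULL
if (requireNamespace("XML", quietly=TRUE)) {
    doc <- XML::xmlParse(svgText, asText=TRUE)
    ns <- c(svg="http://www.w3.org/2000/svg")
    # Count the nodes that an XPath expression selects
    count <- function(xpath) {
        length(XML::getNodeSet(doc, xpath, namespaces=ns))
    }
    inClip <- XML::getNodeSet(doc, "//svg:clipPath//svg:polyline",
                              namespaces=ns)
    # Remove only the polylines found inside clipping paths
    invisible(lapply(inClip,
                     function(x) {
                         XML::removeChildren(XML::xmlParent(x), kids=list(x))
                     }))
    result <- c(clipPolylines=count("//svg:clipPath//svg:polyline"),
                allPolylines=count("//svg:polyline"),
                clipPaths=count("//svg:clipPath//svg:path"))
}
result
```

The <polyline> outside the clipping path and the <path> inside it should both survive; only the <polyline> inside the <clipPath> is removed.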

Just like with the 'grImport'-plus-'gridSVG' approach, we have a nice final image, but we have paid a price for it by, in this case, requiring a working knowledge of SVG, XML, and XPath.

Another downside to this approach is that it produces a much larger SVG file. The file listing below shows that the most efficient approach, in terms of final image size, is the 'grImport'-plus-'gridSVG' approach (because it only stores one copy of the radial gradient).

  system("ls -lh butterfly*.svg", intern=TRUE)
  [1] "-rw-rwx--- 1 pmur002 pmur002 629K Aug 31 08:53 butterfly-grImport2-gridSVG-cleaned.svg"
  [2] "-rw-rwx--- 1 pmur002 pmur002 820K Aug 31 08:53 butterfly-grImport2-gridSVG-plot.svg"   
  [3] "-rw-rwx--- 1 pmur002 pmur002 129K Aug 31 08:53 butterfly-grImport-gridSVG-plot.svg"    
  [4] "-rw-rwx--- 1 pmur002 pmur002 167K Aug 31 08:53 butterfly-grImport-plot.svg"            
  [5] "-rw-rwx--- 1 pmur002 pmur002 143K Aug 31 08:53 butterfly-plot.svg"                     
  [6] "-rw-rwx--- 1 pmur002 pmur002 756K Aug 26 12:59 butterfly.svg"
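The listing above relies on the ls program. A possible alternative that stays within R is sketched below, using Sys.glob and file.size. So that it is self-contained, the sketch creates two small stand-in files in a temporary directory (the file names and contents are made up) rather than using the real SVG files.

```r
# Hypothetical stand-in files in a temporary directory
outDir <- tempfile("sizes")
dir.create(outDir)
writeLines(strrep("x", 1000), file.path(outDir, "butterfly-large.svg"))
writeLines(strrep("x", 10), file.path(outDir, "butterfly-small.svg"))

# Collect the file sizes and sort from smallest to largest
files <- Sys.glob(file.path(outDir, "butterfly*.svg"))
sizes <- data.frame(file=basename(files), bytes=file.size(files))
sizes <- sizes[order(sizes$bytes), ]
sizes
```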

Summary

This report has documented a range of approaches for embedding a relatively complex vector image into an R plot.

If it is possible to target a known final image size (e.g., for a PDF document to print), then converting the vector image to a raster representation, at an appropriate resolution, is the most straightforward approach and should work well across all graphics formats that R can produce.

Reproducing the vector image with R might be easy, using the 'grImport' package, if the image does not contain any advanced graphics features (like gradient fills or complex clipping paths). If 'grImport' cannot reproduce all features of the image, it may still be useful for providing the basic shapes within the image, and the 'gridSVG' package can then be used to add fancier features. However, this will likely require greater knowledge of image formats and the R packages involved, and it can only produce the final image in SVG format. Another option is to use the 'grImport2' package (and 'grConvert'), though complex graphics features will again require 'gridSVG' and force an SVG format for the final result.

One major redeeming feature of working with SVG is that it is both a text format and an XML format, which means that it is easy to visually inspect the raw image data and possible to programmatically manipulate it in meaningful ways.

Creative Commons License
The Butterfly Affectation by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.