Extreme Makeover: R Graphics Edition

by Paul Murrell http://orcid.org/0000-0002-3224-8858

Version 1:

This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.

This report describes a complex R graphics customisation example using functions from the 'grid' and 'gridGraphics' packages and introduces two new functions in 'grid': deviceLoc and deviceDim.

1. The problem

This report describes and then solves a complex graphics customisation problem in R (R Core Team, 2018) that was proposed by Diana Kriese (personal communication). The solution demonstrates a useful application of the 'gridGraphics' package (Murrell, 2015) (to convert 'graphics' plots to 'grid' plots) and tools from the 'grid' package (Murrell, 2011), including two new utility functions: deviceLoc and deviceDim.

The image that we want to produce is a combination of a pie chart and a bar plot, with lines connecting the two plots, as shown below.

The most difficult part of this image is drawing the two lines that run from the corners of the bar plot and are tangent to the circumference of the pie chart.

On one hand, the solution just requires a little bit of trigonometry, as shown in the diagram below. Given the location of the left edge of the bar plot (b_{x, y}), (half) the height of the bar plot (h), the centre of the pie chart (c_{x, y}), and the radius of the pie chart (r), we have two right-angle triangles (one right-angle at b_{x, y} and the other at the red dot on the circumference of the circle). These triangles allow us to calculate the angles α and β, and the sum of those angles, along with the pie chart radius, allows us to calculate the offset of the red dot from the centre of the pie chart.
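The trigonometry can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the bar is taken to be vertically centred on the pie (so b_y equals c_y), α is taken to be the angle at the pie centre between the horizontal and the line to the bar corner, and β the angle at the pie centre between that line and the radius to the tangent point. The function name tangent_point and all of the numeric values are hypothetical.

```r
# Tangent point on the circle for a line drawn from the upper corner of the bar.
# Assumes the bar is vertically centred on the pie and lies outside the circle.
tangent_point <- function(cx, cy, r, bx, h) {
  d     <- bx - cx          # horizontal distance from pie centre to bar edge
  D     <- sqrt(d^2 + h^2)  # distance from pie centre to the bar corner
  alpha <- atan2(h, d)      # angle at the centre up to the bar corner
  beta  <- acos(r / D)      # angle at the centre from the corner line to the tangent radius
  theta <- alpha + beta
  c(x = cx + r * cos(theta), y = cy + r * sin(theta))
}

# Hypothetical values (in inches), for illustration only
centre <- c(2, 3)
r      <- 1.5
bar_x  <- 5
h      <- 2

upper <- tangent_point(centre[1], centre[2], r, bar_x, h)
# The lower tangent point is the mirror image about the horizontal through the centre
lower <- c(x = upper[["x"]], y = 2 * centre[2] - upper[["y"]])
print(rbind(upper, lower))
```

A quick sanity check on a result like this is that the radius to the tangent point is perpendicular to the line from the tangent point to the bar corner.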

On the other hand, the solution is quite challenging because it is not straightforward to determine the locations and dimensions of the pie chart and the bar plot - the essential values that we need to perform those trigonometric calculations.

The problem is made more challenging by the fact that the pie chart and the bar plot are drawn by calling someone else's code. The bar plot is produced by calling the barplot function from the 'graphics' package and the pie chart is drawn by calling functions from the 'plotrix' package. One consequence of this fact is that we do not know exactly how, including exactly where, the pie chart and the bar plot have been drawn. We have made use of the convenience of calling existing functions so that we do not have to perform a lot of calculations ourselves to position and size the polygons and rectangles and text that make up the pie chart and the bar plot, but this comes at the cost of not knowing exactly where all of those polygons and rectangles and text have been placed.

This problem is compounded by the fact that the functions that we are calling are based on the 'graphics' package (the "base" graphics system in R) and that means it is much harder to find out where everything has been drawn. To make this idea more concrete, the following code just draws the bar plot by itself (the result is shown below the code).

A nice feature of the base graphics system is that, after drawing a plot, the coordinate system of the plot is available to add further output. Unfortunately, the barplot function has a complicated algorithm for determining its coordinate system so the result is not very intuitive. For example, the following code asks for the coordinate system from the barplot (the first two values are the limits of the x-axis scale). It is not obvious how to determine where the left edge of the bar is based on that coordinate system.

The barplot function does return some useful information about the locations of its bars; the value that we assigned to bar.x is the mid-points of the bars. We can use that to do things like label the bars, as in the following code.
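The base graphics steps described above (drawing the bar plot, querying its coordinate system, and labelling the bar from the returned mid-points) might look something like the following sketch. The data values and the labels are hypothetical, and the plot is drawn on an off-screen PDF device only so that the code runs non-interactively.

```r
pdf(NULL)  # off-screen device; nothing is written to disk

# Hypothetical data: a single stacked bar made of two rectangles
counts <- matrix(c(3, 7), ncol = 1)
bar.x <- barplot(counts, axes = FALSE)

# The coordinate system of the plot: c(xmin, xmax, ymin, ymax)
usr <- par("usr")
print(usr)

# barplot() returns the x mid-point of the bar, which is enough to label it;
# the y positions are the vertical mid-points of the two stacked rectangles
mid.y <- cumsum(counts[, 1]) - counts[, 1] / 2
text(bar.x, mid.y, labels = c("A", "B"))

invisible(dev.off())
```

With the default bar width and spacing, the bar itself spans 0.2 to 1.2 on the x-axis, while the reported x-limits are slightly wider, which illustrates why the left edge of the bar cannot simply be read off par("usr").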

However, for the problem that we are addressing, we need more information about what has been drawn.

2. Using 'grid' and 'gridGraphics'

The 'grid' package provides more tools for querying the locations and sizes of what has been drawn and the 'gridGraphics' package can turn a 'graphics' plot into a 'grid' one. The following code does this for the simple bar plot by calling the grid.echo function from the 'gridGraphics' package.
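A minimal sketch of this conversion is shown below. It assumes that the 'gridGraphics' package is installed, it passes the plotting code to grid.echo as a function (one of the forms that grid.echo accepts), and it uses prefix="bar" purely as an assumption so that the grob names resemble the ones quoted in the text. The data and the helper name draw_bar are hypothetical.

```r
library(grid)
library(gridGraphics)  # assumed to be installed

pdf(NULL)  # off-screen device

# Hypothetical stand-in for the simple bar plot
draw_bar <- function() {
  barplot(matrix(c(3, 7), ncol = 1), axes = FALSE)
}

# Redraw the 'graphics' plot as 'grid' output on the current device
grid.echo(draw_bar, prefix = "bar")

# List the grobs that now exist in the scene
grobs <- grid.ls(print = FALSE)
print(grobs$name)

invisible(dev.off())
```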

The output is the same as the original bar plot, but we now have 'grid' grobs to play with. The grid.ls function from 'grid' lists all grobs in the current scene. In this case, there is a single grob called "bar-plot-1-rect-1" that represents the two rectangles drawn in the bar plot.

The 'grid' package also allows us to query grobs. For example, the following code expresses a location that is on the left edge of the bar plot (180 degrees counter-clockwise from the positive x-axis).

That location only makes sense in the viewport that the barplot rectangle was drawn within, but grid.ls can also show us all of the viewports in the current scene. The output tells us that the grob "bar-plot-1-rect-1" was drawn within a viewport called "bar-window-1-1".

We can navigate to the correct viewport using the downViewport function and then calculate exactly the location of the edge of the bar plot rectangle. The following code uses this idea to draw a small circle in the middle of the left edge of the bar plot.
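The querying and navigation steps in this section can be tried on a small stand-in scene built with 'grid' alone. In the sketch below, the viewport name "bar-window" and the grob name "bar-rect" are hypothetical substitutes for the names that 'gridGraphics' generates, and the sizes are arbitrary.

```r
library(grid)

pdf(NULL, width = 7, height = 7)  # off-screen device
grid.newpage()

# Stand-in scene: a named rectangle drawn inside a named viewport
pushViewport(viewport(x = 0.75, width = 0.3, height = 0.7, name = "bar-window"))
grid.rect(x = 0.5, width = 0.5, name = "bar-rect")
upViewport()

# Show grobs and viewports together, to see which viewport holds the grob
grid.ls(viewports = TRUE, fullNames = TRUE)

# Navigate to that viewport; only there does the edge location make sense
downViewport("bar-window")
left.x <- grobX("bar-rect", 180)  # 180 degrees = left edge
left.y <- grobY("bar-rect", 180)
grid.circle(left.x, left.y, r = unit(1, "mm"), gp = gpar(fill = "black"))

# The same location expressed as a proportion of the viewport width
left.npc <- convertX(left.x, "npc", valueOnly = TRUE)
upViewport()

invisible(dev.off())
```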

3. More problems

A further complication in this example is the fact that the pie chart and the bar plot are arranged together on the page using a layout. The following code defines a layout with a large square region on the left (for the pie chart) and a tall thin region on the right (for the bar plot), with a half-inch gap in between the regions.
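A layout along these lines might be defined as in the following sketch. The relative width of the bar region (0.2) and the use of respect=TRUE to keep the left region square are assumptions made for illustration; only the half-inch gap comes from the description above.

```r
library(grid)

pdf(NULL, width = 7, height = 7)  # off-screen device
grid.newpage()

# Three columns: pie region, half-inch gap, bar region.
# respect = TRUE keeps "null" widths and heights in proportion, so the
# first column (1null wide by 1null high) comes out square.
lay <- grid.layout(nrow = 1, ncol = 3,
                   widths = unit(c(1, 0.5, 0.2), c("null", "in", "null")),
                   heights = unit(1, "null"),
                   respect = TRUE)

pushViewport(viewport(layout = lay))

# Outline the pie region and record its size in inches
pushViewport(viewport(layout.pos.col = 1))
grid.rect(gp = gpar(col = "grey"))
pie.w <- convertWidth(unit(1, "npc"), "in", valueOnly = TRUE)
pie.h <- convertHeight(unit(1, "npc"), "in", valueOnly = TRUE)
upViewport()

# Outline the bar region
pushViewport(viewport(layout.pos.col = 3))
grid.rect(gp = gpar(col = "grey"))
upViewport(2)

invisible(dev.off())
```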

Using a layout makes it easier to specify the arrangement that we want, but makes it much harder to tell exactly where the plots will finally end up. That is the point of using a layout: we do not want to have to figure out exactly where things go; we want the layout to figure that out for us.

The following code uses this layout to draw the bar plot on the right-hand side of the page. First, we push a viewport with the layout, then we push a viewport in column 3 of the layout, and then we push another viewport that is only 70% of the height of that third column (in the output below the code, the viewports are shown as grey rectangles). The next step is to define a function that draws the bar plot (using functions from the 'graphics' package). We can then give that function as the first argument in a call to grid.echo to draw a 'grid' version of the bar plot inside the current viewport. It is important that we specify newpage=FALSE so that grid.echo does not start a new page. We also provide a prefix argument so that we can more easily identify the grobs that are created when the bar plot is drawn.
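The sequence of steps in the paragraph above could be sketched as follows. This assumes that 'gridGraphics' is installed; the layout proportions, the data, and the helper name draw_bar are hypothetical. The call to par(mar=...) inside the function is also an assumption, made because the default margins would not fit within such a narrow region.

```r
library(grid)
library(gridGraphics)  # assumed to be installed

pdf(NULL, width = 7, height = 7)  # off-screen device
grid.newpage()

lay <- grid.layout(1, 3,
                   widths = unit(c(1, 0.5, 0.2), c("null", "in", "null")),
                   respect = TRUE)

# Layout viewport, then column 3, then 70% of the height of that column
pushViewport(viewport(layout = lay))
pushViewport(viewport(layout.pos.col = 3))
pushViewport(viewport(height = 0.7))
grid.rect(gp = gpar(col = "grey"))

# A function that draws the bar plot with 'graphics' functions
draw_bar <- function() {
  par(mar = rep(0, 4))  # no margins, so the plot fits the narrow region
  barplot(matrix(c(3, 7), ncol = 1), axes = FALSE)
}

# Draw a 'grid' version inside the current viewport, without a new page,
# and with a prefix that makes the resulting grobs easy to identify
grid.echo(draw_bar, newpage = FALSE, prefix = "bar")

upViewport(0)  # return to the root viewport
bar.names <- grid.ls(print = FALSE)$name
print(bar.names)

invisible(dev.off())
```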

We can do something very similar to draw the pie chart within the large square region on the left side of the layout.

The problem is the same as before: we want to find out where the edge of the bar plot is and where the centre of the circle is. It is just much more complicated now because there are more viewports and grobs in the scene.

4. More 'grid' tools

When scenes become more complicated with lots of grobs and lots of viewports, the grid.grep function can be useful to search for grobs and viewports by name. For example, in this case, we know we want a rectangle for the bar plot, so we can try code like the following. The argument grep=TRUE means that grid.grep will treat its first argument as a regular expression. The argument viewports=TRUE means that we will search for viewports as well as grobs.

Including viewports in the search has the benefit that any grob matches also have their viewport path returned as an attribute, as shown below. This means that we know the name of the grob that we want and the name of the viewport that the grob was drawn within.
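The search can be tried on a small stand-in scene, as in the following sketch. The grob and viewport names are hypothetical, chosen only to resemble the kind of names that a prefixed call to grid.echo produces, and the exact form of the viewport path that is printed may differ between scenes.

```r
library(grid)

pdf(NULL)  # off-screen device
grid.newpage()

# Stand-in scene with names chosen for illustration
pushViewport(viewport(width = 0.5, name = "bar-window"))
grid.rect(name = "bar-plot-rect")
grid.text("label", name = "bar-plot-text")
upViewport()

# Regular-expression search over both grobs and viewports
barRect <- grid.grep("bar.+rect", grep = TRUE, viewports = TRUE)
print(barRect)

# For a grob match, the path to its viewport is attached as an attribute
bar.vp <- attr(barRect, "vpPath")
print(bar.vp)

invisible(dev.off())
```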

This programmatic approach is also better for capturing the steps involved in the calculations so that we can record and reuse them in the future.

With this information, we can again find the left edge of the bar plot exactly. The following code does this and draws a dot on the left edge. Notice how this code records the return value from the downViewport function, which gives the number of viewports that downViewport descended. This value is then given to upViewport to "reverse" the call to downViewport after drawing.
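The pattern of recording the depth and then reversing the navigation can be sketched on a stand-in scene with two nested viewports. The names "outer", "bar-window" and "bar-rect" are hypothetical.

```r
library(grid)

pdf(NULL)  # off-screen device
grid.newpage()

# Stand-in scene: the rectangle sits two viewports below the root
pushViewport(viewport(width = 0.8, name = "outer"))
pushViewport(viewport(width = 0.5, name = "bar-window"))
grid.rect(name = "bar-rect")
upViewport(2)

# downViewport() returns the number of viewports that it descended ...
depth <- downViewport("bar-window")
grid.circle(grobX("bar-rect", 180), grobY("bar-rect", 180),
            r = unit(1, "mm"), gp = gpar(fill = "black"))

# ... which is exactly what upViewport() needs to undo the navigation
upViewport(depth)

# current.vpPath() is NULL when we are back at the root viewport
at.root <- is.null(current.vpPath())

invisible(dev.off())
```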

We can do something similar to find the centre and edge of the pie chart, though a little more detective work is required. The grob listing above shows that the pie chart contains three polygons, with no clear indication of which is which. In this sort of situation, we may need a little exploration to get things right. The following code uses grid.edit from the 'grid' package to modify the border colour for the three polygons and we can see that the first is the wedge on the left, the second is the wedge on the right, and the third is the complete pie.
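This sort of exploration might look like the following sketch, which uses three hand-drawn polygons as hypothetical stand-ins for the pieces of the pie chart. The polygon names and colours are assumptions for illustration.

```r
library(grid)

pdf(NULL)  # off-screen device
grid.newpage()

# Three stand-in polygons: a left piece, a right piece, and the whole shape
grid.polygon(c(0.2, 0.5, 0.5), c(0.5, 0.8, 0.2), name = "pie-polygon-1")
grid.polygon(c(0.8, 0.5, 0.5), c(0.5, 0.8, 0.2), name = "pie-polygon-2")
grid.polygon(c(0.2, 0.5, 0.8, 0.5), c(0.5, 0.8, 0.5, 0.2),
             name = "pie-polygon-3")

# Give each polygon a distinctive border so that we can see which is which
cols <- c("red", "green", "blue")
for (i in 1:3) {
  grid.edit(paste0("pie-polygon-", i), gp = gpar(col = cols[i], lwd = 3))
}

# Confirm that the edit was recorded on the grob itself
edited.col <- grid.get("pie-polygon-3")$gp$col

invisible(dev.off())
```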

That third polygon makes it easy to calculate both the centre and edge of the pie chart. The following code places dots at the centre and right edge of the pie.

5. The final problem and the final solution

Unfortunately, we are still not quite where we want to be. The previous code shows that we can navigate down to the pie chart viewport and calculate the location of the pie polygon within that viewport. However, that does not tell us where the edge of the pie chart is relative to the edge of the bar plot. For that we need to know the location of the pie polygon within the page (and, similarly, we need the location of the bar plot rectangle within the page).

This is where two new 'grid' functions come in. The deviceLoc function takes a location within a viewport and converts it to a location (in inches) relative to the device (or, equivalently, the "root" viewport). The deviceDim function is similar, but its two arguments (and its return value) are a width/height pair rather than an x/y location.

The following code uses the deviceLoc function to calculate the centre of the pie chart and the edge of the bar rectangle relative to the device (and draws dots and a line between them to show that it gets it right). Notice that we must first descend to the pie chart viewport (because that is where the pie polygon is drawn) and do the conversion there, then do the same for the bar rectangle in its viewport, and finally navigate back up to the "root" viewport to use the results for drawing (because that is where the result of deviceLoc makes sense). The deviceDim function can be used to calculate the height of the bar rectangle.
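The descend, convert, ascend pattern can be sketched with 'grid' alone, using a circle and a rectangle as hypothetical stand-ins for the pie and the bar. All names and sizes below are assumptions; the sketch also takes the pie centre to be the centre of its viewport, which is true here only by construction. It requires a version of R that provides deviceLoc and deviceDim.

```r
library(grid)

pdf(NULL, width = 7, height = 7)  # off-screen device
grid.newpage()

# Stand-in scene: a circle for the pie and a rectangle for the bar
pushViewport(viewport(x = 0.3, width = 0.5, height = 0.5, name = "pie-window"))
grid.circle(r = 0.5, name = "pie")
upViewport()
pushViewport(viewport(x = 0.85, width = 0.1, height = 0.7, name = "bar-window"))
grid.rect(name = "bar")
upViewport()

# Centre of the pie, converted while inside the pie viewport
depth <- downViewport("pie-window")
pie.centre <- deviceLoc(unit(0.5, "npc"), unit(0.5, "npc"), valueOnly = TRUE)
upViewport(depth)

# Left edge and size of the bar, converted while inside the bar viewport
depth <- downViewport("bar-window")
bar.left <- deviceLoc(grobX("bar", 180), grobY("bar", 180), valueOnly = TRUE)
bar.dim <- deviceDim(grobWidth("bar"), grobHeight("bar"), valueOnly = TRUE)
upViewport(depth)

# Back at the root viewport, the results can be used directly as inches
grid.circle(c(pie.centre$x, bar.left$x), c(pie.centre$y, bar.left$y),
            r = unit(1, "mm"), default.units = "in",
            gp = gpar(fill = "black"))
grid.segments(pie.centre$x, pie.centre$y, bar.left$x, bar.left$y,
              default.units = "in", gp = gpar(col = "red"))

invisible(dev.off())
```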

This finally gives us the information that we need. We can calculate the centre of the pie, the radius of the pie, the left edge of the bar, and the height of the bar, all as locations and dimensions on the device. That information can then be used to carry out the trigonometric calculations that determine the point on the edge of the circle from which we should draw a line to the corner of the bar. The final result is reproduced below, along with a listing of the grobs and viewports that are involved, which show that the lines from pie to bar (upper-tangent and lower-tangent) are both drawn within the "root" viewport (because that is the common coordinate system within which we calculated the important locations of both pie and bar).

6. Discussion

This report has described how to solve a specific R graphics customisation example using tools from the 'grid' and 'gridGraphics' packages. As a specific example, this provides a useful demonstration of some tools in the 'grid' graphics world that may not be very well known. It also demonstrates the value of converting a high-level plot that was drawn using the base graphics system in R into a 'grid' version, to gain access to the tools within the 'grid' graphics system. More generally, it demonstrates that quite extreme customisations are possible within the 'grid' graphics system.

The importance of code

An important feature of the solutions outlined in this report is that they are code based. One way that this sort of annotation could be solved is by drawing the pie chart and the bar plot with R, saving to PDF or SVG format, and then manually adding the lines with an editor like Adobe Illustrator or Inkscape (The Inkscape Team, 2018).

By performing the annotation entirely within R, we get the usual benefits of code-based solutions: maintaining a record, easy reuse, reproducibility, sharing, etc. In this case, we also get the benefit of accuracy. Any manual attempt to draw the lines so that they are tangent to the circumference of the pie cannot hope to guarantee accuracy in the same way as a mathematical calculation in code. This is also an excellent example where tiny adjustments to the image, such as shifting the relative positions of the pie chart and the bar plot, will be trivially accommodated by the code solution, but would rapidly induce apoplexy if manual corrections had to be repeated.

How well do device locations nest?

An important feature of the solution outlined in this report was the conversion of locations and dimensions of shapes within the scene to locations and dimensions in terms of inches on the device (or "root" viewport), using deviceLoc and deviceDim. This ties the resulting drawing to absolute device coordinates, which means, for example, that it is only valid for the current device size.

This sort of absolute solution is undesirable in the 'grid' graphics system because, ideally, any drawing that we do is not relative to the device, but relative to the current viewport. This is to allow our drawing to be nested or embedded within other people's drawing.

So is it possible to use the solution outlined in this report as a sub-plot within a more complex graphic? The answer is "yes", in at least two ways.

The code that does the actual drawing of the pie chart and bar plot with connecting lines has been encapsulated within a piebar function so that we can experiment with different approaches. To create the final graphic, we can just call that function, as shown below.

The following code shows that we can successfully call that function within another viewport. In the resulting image, the whole page is represented by a grey rectangle, and the viewport within which we are doing our drawing is represented by a white rectangle.

Although this "just works", it is dependent on several important details within the piebar function. When we use grid.grep to find viewport paths to grobs within the page, the viewport path that we get starts from the very top-level "root" viewport. Furthermore, once we have locations from deviceLoc, those locations only make sense within the very top-level "root" viewport. As a consequence, the piebar function must use upViewport(0) to navigate up to the "root" viewport, to account for the possibility that there are viewports above those that the piebar itself has set up. Another point to be careful about is the pattern given to grid.grep to search for grobs by name. If there is other drawing on the page, then there will be other grobs and we will not know their names, so it is possible to get name conflicts and for grid.grep to return an unexpected result.

Another approach, which avoids some of the issues above, is to draw the output of piebar on its own device and then "copy" it into the current viewport. This means that the piebar code really can assume that it is the only output on the device. This approach is possible by using the grid.grabExpr function, which runs an expression on its own device and then captures the result as a 'grid' gTree that can subsequently be drawn via grid.draw. The following code demonstrates this approach.
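A minimal sketch of the capture-then-draw approach is given below. The function draw_scene is a hypothetical stand-in for piebar (it simply draws a circle and a rectangle), and the grey and white rectangles mimic the page and viewport described earlier.

```r
library(grid)

pdf(NULL)  # off-screen device for the final drawing

# Hypothetical stand-in for piebar(): it assumes that it owns the whole device
draw_scene <- function() {
  grid.circle(x = 0.3, r = 0.2, name = "pie")
  grid.rect(x = 0.8, width = 0.1, height = 0.6, name = "bar")
}

# Run the drawing code on its own off-screen device and capture the result
scene <- grid.grabExpr(draw_scene())

# Draw the captured gTree inside a viewport on the current device
grid.newpage()
grid.rect(gp = gpar(fill = "grey"))
pushViewport(viewport(width = 0.5, height = 0.5))
grid.rect(gp = gpar(fill = "white"))
grid.draw(scene)
upViewport()

invisible(dev.off())
```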

Why not a "din" unit?

The solution described in this report makes use of a new function deviceLoc. This is a function that converts from a location expressed in units relative to a viewport to a location in inches relative to the graphics device. How is this different from the existing convertX function (and its ilk)?

One important difference is that convertX only converts between two different coordinate systems within the same viewport. A unit object is always relative to the coordinate systems of the viewport that it is evaluated within. For example, unit(.5, "npc") means half way across (or half way up) the current viewport. With convertX, we can convert that to a number of inches across (or up) the same viewport, but that is all. By comparison, deviceLoc converts from the coordinate systems of the current viewport to a location in inches within the "root" viewport. So convertX converts within the same viewport and deviceLoc converts between different viewports.
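The difference can be seen with a small experiment, sketched below on a 7-inch-wide page with a viewport that occupies the right half of the page. The viewport position and size are arbitrary choices for illustration, and the sketch requires a version of R that provides deviceLoc.

```r
library(grid)

pdf(NULL, width = 7, height = 7)  # off-screen device
grid.newpage()

# A viewport covering the right half of the page (3.5 inches wide)
pushViewport(viewport(x = 0.75, width = 0.5))

# Half way across the viewport, in inches within the *viewport*
within.vp <- convertX(unit(0.5, "npc"), "in", valueOnly = TRUE)

# The same location, in inches on the *device*
on.device <- deviceLoc(unit(0.5, "npc"), unit(0.5, "npc"),
                       valueOnly = TRUE)$x
upViewport()

print(c(within.vp = within.vp, on.device = on.device))

invisible(dev.off())
```

Here convertX reports 1.75 inches (half of the viewport width), whereas deviceLoc reports 5.25 inches, because the viewport itself starts 3.5 inches from the left edge of the device.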

A second difference is that convertX only converts a value relative to the x-dimension coordinates. It ignores the y-dimension coordinates. This is possible because all units are relative to the current viewport and the current viewport is always rectangular and the current viewport only has Cartesian coordinate systems. By comparison, the deviceLoc function converts a location - a pair of x/y units. This is because it is possible for a viewport to be rotated. For example, consider the image below, which shows a device (grey rectangle) with a rotated viewport (white rectangle). The dashed line represents constant x-values within the viewport and it is clear that we cannot convert an x-value within the viewport on its own to an x-value on the device; we also need to know a y-value within the viewport (e.g., the dot within the diagram below) in order to convert an x/y location within the viewport to an x/y location on the device.
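The effect of rotation can be checked numerically, as in the following sketch. The viewport size and the 30 degree angle are arbitrary choices for illustration. Two locations with the same x-value within the rotated viewport, one at the bottom and one at the top, end up at different x-values on the device.

```r
library(grid)

pdf(NULL, width = 7, height = 7)  # off-screen device
grid.newpage()

# A 3.5 inch square viewport, rotated by 30 degrees about its centre
pushViewport(viewport(width = 0.5, height = 0.5, angle = 30))

# Same x-value within the viewport, different y-values
bottom <- deviceLoc(unit(0.5, "npc"), unit(0, "npc"), valueOnly = TRUE)
top    <- deviceLoc(unit(0.5, "npc"), unit(1, "npc"), valueOnly = TRUE)
upViewport()

print(c(bottom.x = bottom$x, top.x = top$x))

invisible(dev.off())
```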

Because of those differences, it is not possible to add a general-purpose "din" (device inches) unit to 'grid'.

Is there a better way?

For this particular example, an alternative approach would be to draw the pie chart and the bar plot ourselves with direct 'grid' calls, e.g., grid.polygon and grid.rect, rather than relying on calls to high-level 'graphics' functions. That would possibly simplify the problem because we could have greater control over the placement of the pie and the bar. However, this would not always be the case. For example, if we were combining output from a more complex 'graphics' function, such as the plot.dendrogram function, it would be much harder to replace the high-level function call with our own direct calls to 'grid' functions. The scenario considered in this report represents a general class of graphics problems where we want the convenience of using someone else's high-level plotting functions combined with the ability to query and revisit the low-level details of what those functions drew.

Another possibility to consider is producing the image with something other than R. What makes this sort of customisation possible in R is that there are high-level graphics packages like 'lattice' and 'ggplot2' (plus 'graphics' via 'gridGraphics') for drawing complete plots, but they do their drawing using a lower-level graphics system, 'grid', that records the graphical objects and coordinate systems and provides tools for modifying and querying and revisiting those lower-level details of the complete plot. Two non-R graphics systems that bear some resemblance to this arrangement are, within the TeX world, the PGF/TikZ package (Tantau, 2015) with PGFPLOTS (Feuersanger, 2012) built on top and, within the JavaScript world, the D3.js library (Bostock et al., 2011) and plotting systems built on top of it, like C3.js (Tanaka, 2018).

The similarity in the case of D3 lies in the fact that it provides a higher-level interface for generating HTML and SVG (and CSS) images, but the image that it produces is pure HTML and SVG (and CSS). This means that it is possible to use low-level DOM (Document Object Model) tools to customise an image that was created by D3. For example, the following JavaScript code uses the C3.js library (which uses D3.js) to create a simple stacked barplot from a high-level description.

The next JavaScript code uses low-level DOM tools to determine the location of the bars that the C3.js library drew and adds a black dot on the left edge.

PGF/TikZ is a very powerful and flexible low-level graphics system and PGFPLOTS provides a high-level interface for producing plots with PGF/TikZ, but the result can be manipulated using PGF/TikZ itself. For example, the following LaTeX code uses PGFPLOTS to draw a simple stacked barplot (the axis environment and the \addplot commands), but then adds a black dot using a low-level PGF/TikZ node, with the location of the node being calculated by querying the PGFPLOTS system (/pgfplots/xmin).

In comparison to both cases, 'grid' (plus 'lattice' or 'ggplot2' or 'graphics'/'gridGraphics') is unusual and possibly unique in its explicit support for revisiting coordinate systems and providing transformations between coordinate systems. Also, with 'grid' being in the R world, there are many more tools for data processing and calculations.

7. Summary

This report describes a complex R graphics customisation with the following important features: two 'graphics' based plots are combined together on the same page, then further drawing is added that spans the coordinate systems of both plots.

The solution consists of the following steps: convert the 'graphics' plots to 'grid' plots using grid.echo from the 'gridGraphics' package; use 'grid' functions grid.ls and grid.grep to determine the names of important grobs and the names of the viewports that they are drawn within; navigate down to those viewports and use grobX, grobY, and a new function deviceLoc to calculate important locations within each plot in terms of inches on the graphics device; navigate back up to the "root" viewport to draw the annotations that span both plots based on the locations in terms of inches on the graphics device.

8. Technical requirements

The examples and discussion in this document are mostly relevant to any recent R version (e.g., anything in the 3.* series). However, the functions deviceLoc and deviceDim are only available in the development version of R (revision r74634), which will become R version 3.6.0.

This report was generated within a Docker container (see Resources section below).

9. Resources

The raw source file for this report, a valid XML transformation of the source file, a 'knitr' document generated from the XML file, two R files and the BibTeX file that are used to generate the table of contents and reference sections, two XSL files and an R file that are used to transform the XML to the 'knitr' document, and a Makefile that contains code for the other transformations and coordinates everything. These materials are also available on GitHub.
The file pie.R from Diana Kriese that contains R code to draw an "enhanced" pie chart.
This report was generated within a Docker container. The Docker command to build the report is included in the Makefile above. The Docker image for the container is available from Docker Hub; alternatively, the image can be rebuilt from its Dockerfile.

How to cite this document

Murrell, P. (2018). "Extreme Makeover: R Graphics Edition." Technical Report 2018-04, Department of Statistics, The University of Auckland.

10. References

[Bostock et al., 2011]: Bostock, M., Ogievetsky, V., and Heer, J. (2011). D3: Data-driven documents. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis).
[Feuersanger, 2012]: Feuersanger, C. (2012). Manual for Package pgfplots.
[Murrell, 2011]: Murrell, P. (2011). R Graphics, Second Edition. Chapman & Hall/CRC The R Series. Chapman & Hall/CRC Press, Boca Raton, FL.
[Murrell, 2015]: Murrell, P. (2015). The gridGraphics Package. The R Journal, 7(1):151-162.
[R Core Team, 2018]: R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[Tanaka, 2018]: Tanaka, M. (2018). c3: A D3-based reusable chart library.
[Tantau, 2015]: Tantau, T. (2015). The TikZ and PGF Packages.
[The Inkscape Team, 2018]: The Inkscape Team (2018). Inkscape.