Speeding up gridSVG

by Paul Murrell http://orcid.org/0000-0002-3224-8858

Sunday 29 October 2017

Creative Commons License
This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.

This report describes changes in version 1.6-0 of the 'gridSVG' package for R. The most important result of these changes is that 'gridSVG' is now much faster at generating SVG files (in situations where it used to be painfully slow).


1. Introduction

The 'gridSVG' package (Murrell and Potter, 2014) can be used to convert an R plot to an SVG format. It differs from the built-in svg graphics device in several ways: 'gridSVG' only works on plots drawn using 'grid' (including 'lattice' plots and 'ggplot2' plots), though the 'gridGraphics' package (Murrell, 2015) provides a pathway for 'graphics'-based plots; the SVG that 'gridSVG' generates contains more labelling and hierarchical structure (which turns out to be useful for accessibility, for example in the 'BrailleR' package; Godfrey, 2013, 2017; Godfrey and Murrell, 2016); and 'gridSVG' provides access to SVG features that are not available in normal R graphics (e.g., masks, fill patterns and gradients, and filters).

One major drawback of the 'gridSVG' package has been its poor performance, particularly when converting plots that contain many individual shapes, such as a scatterplot with a thousand points.

For the purposes of demonstration, 'gridSVG' version 1.5-1 has been installed as the 'gridSVGslow' package and an example of its performance is shown below.

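library(lattice)
p <- xyplot(runif(1000) ~ runif(1000))
print(p)
# Export the plot using the old version (the file name is illustrative)
system.time(gridSVGslow::grid.export("slow.svg"))
     user  system elapsed 
    6.704   0.000   6.743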


Version 1.6-0 of 'gridSVG' includes some internal changes that significantly speed up the export of this sort of plot.

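print(p)
# Export the same plot using the new version (the file name is illustrative)
system.time(gridSVG::grid.export("fast.svg"))
     user  system elapsed 
    0.536   0.000   0.587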

2. Internal changes

This section describes the idea behind the internal changes for 'gridSVG' version 1.6-0.

Tauno Metsalu (private correspondence) was the first to identify that the source of the poor performance of 'gridSVG' was a for loop within the code that exports points to SVG. Profiling data showed that a large percentage of the time was being spent in repeated calls to the newXMLNode function from the 'XML' package.
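A profile of this kind can be collected with base R's Rprof() and summaryRprof() functions. The sketch below is an assumption about how such a measurement might look, not the profiling code that was actually used: it profiles a hypothetical helper, slowLoop(), that adds nodes one at a time in the same way as the pre-1.6-0 code, and lists the functions that account for the most time.

```r
library(XML)

# Hypothetical helper (not gridSVG code): add n <rect> nodes to a document
# one at a time, mirroring the old node-by-node approach
slowLoop <- function(n) {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    for (i in seq_len(n)) {
        newXMLNode("rect", parent=root)
    }
    doc
}

# Record a sampling profile while the loop runs
profFile <- tempfile(fileext=".out")
Rprof(profFile, interval=0.01)
doc <- slowLoop(2000)
Rprof(NULL)

# Functions ranked by total time (including time spent in the functions they call)
prof <- summaryRprof(profFile)
head(prof$by.total, 10)
```

The by.total table is the useful one here, because the time is not spent in newXMLNode itself so much as in the functions it calls for every single node.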

The following code demonstrates a simplified version of the problem. We create a dummy XML document and add 1000 <rect> elements to it by calling newXMLNode and specifying the parent argument to add the elements to the root node of the document. This is the essence of the internal 'gridSVG' code prior to version 1.6-0.

library(XML)
buildXML <- function() {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    # Add each <rect> to the root node one at a time
    for (i in 1:1000) {
        newXMLNode("rect", parent=root)
    }
    doc
}
system.time(loopXML <- buildXML())
     user  system elapsed 
    3.632   0.000   3.635

One way to speed this process up is to generate all of the <rect> elements first and then add them all at once, with the addChildren function.

buildXMLchildren <- function() {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    # Create all 1000 <rect> nodes first, without a parent ...
    rects <- lapply(1:1000, function(i) { newXMLNode("rect") })
    # ... then attach them to the root node in a single call
    addChildren(root, kids=rects)
    doc
}
system.time(childrenXML <- buildXMLchildren())
     user  system elapsed 
    0.128   0.000   0.127
identical(saveXML(loopXML), saveXML(childrenXML))
  [1] TRUE

Even faster is to generate the <rect> elements as text and then parse them with the xmlParse function (and then add them all at once with the addChildren function). In the code below, <rect> has been entered as \074rect\076 to avoid conflicting with the HTML syntax of this document.

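buildChar <- function() {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    # Build all of the <rect> elements as one string, wrapped in a temporary
    # <temp> element so that the string is a well-formed XML document
    xmlChar <- paste("\074temp\076",
                     paste(rep("\074rect/\076", 1000), collapse=""),
                     "\074/temp\076", sep="")
    # Parse the text once, then move the parsed <rect> nodes to the root
    rects <- xmlParse(xmlChar)
    addChildren(root, kids=xmlChildren(xmlRoot(rects)))
    doc
}
system.time(charXML <- buildChar())
     user  system elapsed 
    0.020   0.000   0.024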
identical(saveXML(loopXML), saveXML(charXML))
  [1] TRUE

This faster approach has been implemented in 'gridSVG' for generating <rect> elements, <circle> elements, and <use> elements (which are used to export data symbols to SVG). These are the only elements that are generated in large quantities for standard plots, so they are the main bottlenecks that have slowed down 'gridSVG' in the past.
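In practice, the elements that 'gridSVG' generates carry attributes such as locations and sizes. The following sketch shows one way the text-based approach might be applied to data symbols drawn as <circle> elements: sprintf() is vectorised, so a single call builds the markup for every circle, which is then parsed once and attached with addChildren(). The helper name circlesFromText() and the choice of attributes are assumptions for illustration; this is not the actual 'gridSVG' source code.

```r
library(XML)

# Hypothetical helper: add one <circle> per (x, y) location to 'parent',
# generating all of the markup as a single string and parsing it once
circlesFromText <- function(parent, x, y, r=2) {
    # sprintf() recycles r, producing one <circle> string per location
    circles <- sprintf('<circle cx="%.2f" cy="%.2f" r="%.2f"/>', x, y, r)
    # Wrap in a temporary element so that the text is a well-formed document
    xmlText <- paste0("<temp>", paste(circles, collapse=""), "</temp>")
    parsed <- xmlParse(xmlText, asText=TRUE)
    # Move the parsed <circle> nodes into the target document in one call
    addChildren(parent, kids=xmlChildren(xmlRoot(parsed)))
    invisible(parent)
}

doc <- newXMLDoc(node=newXMLNode("svg"))
svgRoot <- xmlRoot(doc)
set.seed(1)
circlesFromText(svgRoot, x=runif(1000, 0, 100), y=runif(1000, 0, 100))
length(xmlChildren(svgRoot))
```

Because every attribute value here is a formatted number, no escaping is needed; text values that might contain characters such as & or < would have to be escaped before being pasted into the markup.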

The approach of generating XML text and then parsing it was not applied more widely within 'gridSVG' for two reasons: it would have required a large number of very invasive changes to the source code; and there are some very useful benefits from generating XML internal nodes (via newXMLNode). An example of the latter is the ability to build SVG content out of order, which makes it easier to create reusable SVG content, such as <symbol> elements, that appears at the start of the SVG document and is linked to from <use> elements later in the document.
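The following sketch illustrates how internal nodes for document structure can be combined with the text-based approach for bulk content. Because the <defs> node is created first and a reference to it is kept, a <symbol> can be added to it after the <use> elements that refer to it have been generated, yet the <symbol> still appears near the start of the document. The element contents, and the use of a plain href attribute (rather than xlink:href, which would require an XML namespace declaration), are simplifying assumptions; this is not the code that 'gridSVG' uses.

```r
library(XML)

doc <- newXMLDoc(node=newXMLNode("svg", attrs=c(width="100", height="100")))
svgRoot <- xmlRoot(doc)
# Create <defs> up front as an internal node and keep a handle on it
defs <- newXMLNode("defs", parent=svgRoot)

# Generate the <use> elements in bulk as text, all referring to "#pt"
n <- 5
uses <- sprintf('<use href="#pt" x="%d" y="%d"/>', seq_len(n), 2L * seq_len(n))
parsed <- xmlParse(paste0("<temp>", paste(uses, collapse=""), "</temp>"),
                   asText=TRUE)
addChildren(svgRoot, kids=xmlChildren(xmlRoot(parsed)))

# Later, go back and define the symbol that the <use> elements link to;
# it goes inside <defs>, so it precedes the <use> elements in the output
newXMLNode("symbol", attrs=c(id="pt"),
           newXMLNode("circle", attrs=c(cx="0", cy="0", r="2")),
           parent=defs)

cat(saveXML(doc))
```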

3. Benchmarking

This section presents the results of a wider range of timings to evaluate the performance of the new version of 'gridSVG'. The time taken to export a simple scatterplot was measured for both old and new versions of 'gridSVG', with the number of points on the scatterplot increasing from 10^1 to 10^3.2 (roughly 10 to 1585 points), with the exponent increasing in steps of 0.2.

The plot on the left below shows the timings for both old and new versions, with the old version in blue and the new version in black. This shows that the gain in performance increases as we export more points. We can also see that the performance of the old version deteriorates exponentially as the number of points increases.

The plot on the right below shows just the timings for the new version of 'gridSVG'. This suggests that not only is the performance improved overall, but the time taken now grows only linearly as the number of points increases (so the gain in performance grows exponentially with the number of points). Based on profiling with the 'profvis' package (Chang and Luraschi, 2017), it appears that the instability in timings at low numbers of points is due to automatic byte compilation (the first time the 'gridSVG' functions are called) and to garbage collection.
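The shape of this benchmark can be reproduced at a smaller scale without 'gridSVG' itself. The sketch below times the two XML-building strategies from Section 2 (node-by-node insertion versus generating text, parsing it once, and adding the nodes in one call) over the same sequence of sizes. The helpers byNode() and byText() are hypothetical stand-ins for the old and new code paths, and the timings they produce will depend on the machine; they are not the results plotted here.

```r
library(XML)

# Hypothetical stand-in for the old approach: one newXMLNode() call per element
byNode <- function(n) {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    for (i in seq_len(n)) {
        newXMLNode("rect", parent=root)
    }
    doc
}

# Hypothetical stand-in for the new approach: build text, parse once, add once
byText <- function(n) {
    doc <- newXMLDoc(node=newXMLNode("dummy"))
    root <- xmlRoot(doc)
    xmlText <- paste0("<temp>", strrep("<rect/>", n), "</temp>")
    parsed <- xmlParse(xmlText, asText=TRUE)
    addChildren(root, kids=xmlChildren(xmlRoot(parsed)))
    doc
}

# Number of elements: 10^1 to 10^3.2, with the exponent in steps of 0.2
sizes <- round(10^seq(1, 3.2, by=0.2))
# One row per size: elapsed seconds for each strategy
timings <- t(sapply(sizes, function(n) {
    c(n=n,
      byNode=system.time(byNode(n))[["elapsed"]],
      byText=system.time(byText(n))[["elapsed"]])
}))
timings
```

Plotting the byNode and byText columns against n (or against log10(n), as in the figures described above) gives a comparison of how each strategy scales.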

4. Summary

Version 1.6-0 of the 'gridSVG' package includes internal changes that have resulted in significant speed improvements when exporting plots that contain a large number of individual elements, such as a scatterplot with many data symbols.

5. Technical requirements

The examples and discussion in this document relate to version 1.6-0 of the 'gridSVG' package.

This report was generated within a Docker container (see Resources section below).

6. Resources

How to cite this document

Murrell, P. (2017). Speeding up gridSVG. Technical Report 2017-04, University of Auckland.

7. References

[Chang and Luraschi, 2017]
Chang, W. and Luraschi, J. (2017). profvis: Interactive Visualizations for Profiling R Code. R package version 0.3.3.
[Godfrey, 2013]
Godfrey, A. J. R. (2013). Statistical Software from a Blind Person's Perspective. The R Journal, 5(1):73-79.
[Godfrey, 2017]
Godfrey, A. J. R. (2017). BrailleR: Improved access for blind users. Massey University. R package version 0.27.1.
[Godfrey and Murrell, 2016]
Godfrey, A. J. R. and Murrell, P. (2016). Statistical graphs made tactile. In The 3rd International Workshop on "Digitization and E-Inclusion in Mathematics and Science 2016" DEIMS2016.
[Murrell, 2015]
Murrell, P. (2015). The gridGraphics Package. The R Journal, 7(1):151-162.
[Murrell and Potter, 2014]
Murrell, P. and Potter, S. (2014). The gridSVG Package. The R Journal, 6(1):133-143.
