A Structured Approach for Generating SVG

Simon Potter simon.potter@auckland.ac.nz and Paul Murrell p.murrell@auckland.ac.nz

Department of Statistics, University of Auckland

Abstract: The gridSVG package exports grid images to an SVG format for viewing on the web. This article describes a new development in the way that gridSVG produces the SVG output. The result is greater flexibility in how the SVG output is produced and increased opportunities to manipulate the SVG output, which creates new possibilities for generating more complex and sophisticated dynamic and interactive R graphics for the web.

Introduction

grid is an alternative graphics system to the traditional base graphics system provided by R [1]. Two key features of grid distinguish it from the base graphics system: graphics objects (grobs) and viewports.

Viewports are how grid defines a drawing context and plotting region. All drawing occurs relative to the coordinate system within a viewport. Viewports have a location and dimension and set scales on the horizontal and vertical axes. Crucially, they also have a name so we know how to refer to them.

Graphics objects (grobs) store information necessary to describe how a particular object is to be drawn. For example, a grid circleGrob contains the information used to describe a circle, in particular its location and its radius. As with viewports, graphics objects also have names.

The task that gridSVG [2] performs is to translate viewports and graphics objects into SVG [3] equivalents. In particular, the exported SVG image retains the naming information on viewports and graphics objects. The advantage of this is that we can refer to the same objects, by the same names, in both grid and SVG. In addition, we are able to annotate grid grobs to take advantage of SVG features such as hyperlinking and animation.

This document describes a new development in gridSVG that changes the mechanism used to convert grid grobs and viewports to an SVG representation.

The Old Method of Creating SVG

In order to aid our explanation, a simple grid plot will be drawn using the code below.
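
A minimal sketch of such a plot is given here. The name example-rect comes from the discussion that follows; the position and size of the rectangle are assumptions chosen only for illustration.

```r
library(grid)

# Start a fresh page and draw a single named rectangle.
# The location and dimensions are illustrative assumptions.
grid.newpage()
grid.rect(x = unit(0.5, "npc"), y = unit(0.5, "npc"),
          width = unit(0.8, "npc"), height = unit(0.8, "npc"),
          name = "example-rect")

# List the grobs on the display list; the listing is also
# returned invisibly so that it can be inspected.
listing <- grid.ls()
```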

The output from grid.ls() shows the grid display list. This represents the list of grobs that have been plotted on the current graphics device. The display list shows that the rectangle has been drawn and we can see that it is named example-rect. When gridSVG translates example-rect into SVG, the rectangle translates into the following markup:

Prior to the recent development, gridSVG would create this SVG by concatenating strings. The first step involved creating an SVG group (g). This group needs to have all of its appropriate attributes inserted, which always include an id attribute, but can also include attributes related to animation or hyperlinking, or custom attributes added by “garnishing” a grob. In R, string concatenation is accomplished using the paste() function. A fragment of pseudo-code follows, which would generate the SVG group markup.
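
The string-pasting idea can be sketched as follows. The helper names (svgStartGroup(), indent(), incIndent()) and the two-space indentation step are hypothetical and are not taken from the gridSVG source.

```r
# Hypothetical helpers for illustration; not gridSVG's actual internals.
indentLevel <- 0
indent <- function() strrep(" ", indentLevel)
incIndent <- function() indentLevel <<- indentLevel + 2

# Build the opening tag of an SVG group by pasting strings together.
# '...' stands for optional, pre-formatted attributes (e.g. hyperlinking).
svgStartGroup <- function(id, ...) {
  tag <- paste(indent(), "<g id=\"", id, "\"", ..., ">\n", sep = "")
  incIndent()  # children of this group are indented one level deeper
  tag
}

cat(svgStartGroup("example-rect"))
```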

In this case the ... represents the optional attributes applied to a group, e.g. hyperlinking. We can see already that the code to produce the SVG markup is reasonably complex compared to the markup itself. Note that we have also increased the level of indentation so that children of this group are clearly observed to be children of this particular group.

The next step is to add a child <rect /> element to this SVG group. We need to first indent to the correct position on a new line, and then draw the rectangle. The code that would be used to produce the rectangle is shown below.
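
A sketch of this step, under the same assumptions as before, is shown here. The function svgRect(), the id example-rect.1 and the coordinate values are hypothetical.

```r
# Hypothetical indentation helper: a string of spaces for the current depth.
indentLevel <- 2  # we are inside one <g> element
indent <- function() strrep(" ", indentLevel)

# Paste location and dimension values into a <rect /> element.
# '...' stands for further optional attributes (not demonstrated).
svgRect <- function(id, x, y, width, height, ...) {
  paste(indent(),
        "<rect id=\"", id, "\"",
        " x=\"", x, "\" y=\"", y, "\"",
        " width=\"", width, "\" height=\"", height, "\"",
        ..., " />\n", sep = "")
}

cat(svgRect("example-rect.1", x = 50, y = 50, width = 400, height = 400))
```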

We can clearly see how attribute values are inserted into the SVG output, in particular with our location and dimension attributes. Again, the ... represents other attributes that may be inserted (though not demonstrated). What is also being shown here is how we are applying the indentation. This is done by calling a function that returns a character vector with the correct number of spaces to indent our <rect /> element.

Once all children have been added to the SVG group, we can close the group so that all <rect /> elements are contained within it. Because we are closing an element, we need to decrease the level of indentation to preserve the hierarchical structure of the SVG markup. This means when closing any element, we need to do something similar to the following code which closes an SVG group.
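
Closing the group might look like the following sketch, where svgEndGroup() and decIndent() are again hypothetical names.

```r
# Hypothetical helpers; we assume we are currently one level deep.
indentLevel <- 2
indent <- function() strrep(" ", indentLevel)
decIndent <- function() indentLevel <<- indentLevel - 2

# Close an SVG group: step back out to the group's own depth first,
# then emit the closing tag at that depth.
svgEndGroup <- function() {
  decIndent()
  paste(indent(), "</g>\n", sep = "")
}

cat(svgEndGroup())
```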

We have shown how SVG images are built using a series of concatenated strings. It is important to note that these strings are written directly to a file (specified when calling gridToSVG()). This means each time an SVG fragment is created using paste(), it is appended to a specified file name.

This approach has a few limitations. For instance, we cannot guarantee that the output that is produced is valid SVG markup. We are also writing directly to a file, which means that we need to read the file to observe its contents; we do not retain the SVG content in resident memory. Finally, but less importantly, performance is a concern when generating output using repeated string concatenation as it is known to be a slow operation (this is less important because the drawing of the original image by grid, before export, is also slow).
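
The first two limitations can be illustrated with a small sketch that uses only base R. The svgWrite() helper and the temporary file are assumptions made for this example; the point is that nothing stops us from forgetting a closing tag, and that the only way to inspect the result is to read the file back in.

```r
svgFile <- tempfile(fileext = ".svg")

# Hypothetical writer: append one pasted fragment to the output file.
svgWrite <- function(...) cat(..., file = svgFile, sep = "", append = TRUE)

svgWrite("<g id=\"example-rect\">\n")
svgWrite("  <rect id=\"example-rect.1\" x=\"50\" y=\"50\"",
         " width=\"400\" height=\"400\" />\n")
# Suppose the closing </g> is forgotten here: no error is raised.

# The content is not held in memory, so we must read the file to see it.
written <- readLines(svgFile)
print(written)

# Count opening and closing group tags to expose the mismatch.
opened <- sum(grepl("<g ", written, fixed = TRUE))
closed <- sum(grepl("</g>", written, fixed = TRUE))
```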

To remedy these limitations a rewrite of the markup generating component of gridSVG was undertaken.

Structured Output with the XML package

The rewrite of part of the gridSVG package was achieved by utilising the XML [4] package. The XML package is an R wrapper for the libxml2 [5] XML parser and generator. The key feature that the XML package provides us with is a way of representing an SVG image as a collection of SVG nodes (elements), instead of a long character vector. We simply need to define the structure of the document; the XML package will take care of how this will be exported to text.

Image Construction

To define the structure of an SVG image, we need to establish how elements relate to each other. In the case of gridSVG, the only relationship of importance is the parent/child relationship. The earlier example with the rectangle will be recreated using the XML package to demonstrate the differences between the two approaches. The code that creates an SVG group is shown below. Notice that when we print out the node itself, the markup is generated for us by the XML package.
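
A minimal sketch of creating the group with newXMLNode() from the XML package is:

```r
library(XML)

# Create a <g> element whose id is the name of the grob it represents.
g <- newXMLNode("g", attrs = c(id = "example-rect"))

# Printing the node shows the markup that the XML package generates.
print(g)
```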

The group is given the name of the grob that it is going to be representing. Because we wish to add children to this <g> element, we set it as the current parent node with a call to the setParentNode() function.

The next piece of code creates a <rect /> element. It is important to note that the parent argument is set to the result of calling getParentNode(). Earlier we set the current parent node to be the <g> element. This means that the <rect /> element will be a child of the <g> element.
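
These two steps can be sketched as follows. The functions setParentNode() and getParentNode() are named in the text, but the implementations given here are simple stand-ins and should not be read as gridSVG's own; the id and coordinate values are also assumptions.

```r
library(XML)

# Stand-in implementations (assumed): remember one "current parent" node.
parentEnv <- new.env()
setParentNode <- function(node) assign("parent", node, envir = parentEnv)
getParentNode <- function() get("parent", envir = parentEnv)

# Create the group and make it the current parent.
g <- newXMLNode("g", attrs = c(id = "example-rect"))
setParentNode(g)

# Create the rectangle as a child of whatever the current parent is.
rect <- newXMLNode("rect", parent = getParentNode(),
                   attrs = c(id = "example-rect.1",
                             x = 50, y = 50, width = 400, height = 400))

# The <rect/> now appears nested inside the <g>.
print(g)
```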

We can now see how the document is beginning to build up as the <rect /> is added to the <g>.

A complete SVG document must have a root <svg> element. This has been left out of the examples so far, but it is worth mentioning here because, with the XML approach, we include several namespace definitions in the <svg> element. This allows the XML package to ensure that we are producing valid SVG output.

This <svg> element is made the parent node so that the <g> element we created earlier can be made a child of the root <svg> element.
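
A sketch of a root element with namespace definitions is given here. The SVG and XLink namespace URIs are the standard ones; the width, height and rectangle attributes are assumptions, and for brevity the children are attached directly through the parent argument.

```r
library(XML)

# Root <svg> element with a default (SVG) namespace and an xlink namespace.
svgRoot <- newXMLNode("svg",
  namespaceDefinitions = c("http://www.w3.org/2000/svg",
                           xlink = "http://www.w3.org/1999/xlink"),
  attrs = c(width = "500px", height = "500px", version = "1.1"))

# The root plays the role of the current parent for the group,
# and the group is in turn the parent of the rectangle.
g <- newXMLNode("g", attrs = c(id = "example-rect"), parent = svgRoot)
rect <- newXMLNode("rect", parent = g,
                   attrs = c(id = "example-rect.1",
                             x = 50, y = 50, width = 400, height = 400))

print(svgRoot)
```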

If we print out the <svg> node now we see the <g> and <rect> elements nested neatly within it.

As a final step, we can write out the root SVG node. This will be inserted directly into this document.

This demonstrates how SVG images can be built up in a more reliable way than with simple string concatenation. It is clear that the way in which we define our SVG image is less prone to error in creating markup, and it also ensures that images are both well-formed (conform to XML syntax) and valid (conform to SVG syntax).

In-Memory Images

The node-based approach to SVG creation offers more advantages than just being a cleaner way of building up an image. We are saving the root node (and thus its descendants) after the image has been created. This means we can keep the image in memory until we want to save it to disk or send it to some other output. One example where this is useful is in producing this article: plots are written out directly within the HTML document as inline SVG (rather than having to create an external file and then link to that file from the HTML document).
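
As a rough sketch of this idea, the root node below is serialised to a character string with saveXML() and embedded in an HTML fragment, and is only written to a file as an optional last step. The surrounding HTML and the file name are assumptions for illustration.

```r
library(XML)

# A small image held entirely in memory.
svgRoot <- newXMLNode("svg",
  namespaceDefinitions = c("http://www.w3.org/2000/svg"),
  attrs = c(width = "100", height = "100"))
rect <- newXMLNode("rect", parent = svgRoot,
                   attrs = c(x = 10, y = 10, width = 80, height = 80))

# Serialise to a character string; no file is involved.
svgText <- saveXML(svgRoot)

# Embed the markup inline in a (hypothetical) HTML page.
html <- paste("<html><body>", svgText, "</body></html>", sep = "\n")

# Writing to disk remains possible whenever we choose to do so.
outFile <- tempfile(fileext = ".svg")
saveXML(svgRoot, file = outFile)
```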

XPath

Another advantage is that because we are dealing with XML nodes, we can manipulate those nodes using other powerful XML tools such as XPath [6]. For example, we can retrieve and add subsets within the SVG image.

We will demonstrate this idea using a ggplot2 [7] plot (the ggplot2 package uses grid for rendering so a ggplot2 plot consists of a large number of grid viewports and grobs).

We can reduce this image by removing the legend, so that only the plot is shown. This code relies on standard functionality from the XML package for identifying and removing nodes; all we have to do is provide the XPath that describes the node that we want (in this case, a <g> element that has a specific id attribute).
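
The mechanism can be sketched on a small stand-in document rather than a real ggplot2 plot. The ids panel and legend are hypothetical; the ids that gridSVG generates for a ggplot2 legend will differ.

```r
library(XML)

# A toy SVG document standing in for an exported plot (assumed structure).
svgText <- '<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
  <g id="panel"><rect x="10" y="10" width="180" height="180"/></g>
  <g id="legend"><rect x="210" y="80" width="20" height="20"/></g>
</svg>'
doc <- xmlParse(svgText, asText = TRUE)

# SVG elements live in the SVG namespace, so the XPath needs a prefix.
ns <- c(svg = "http://www.w3.org/2000/svg")

# Find the <g> element with the chosen id and remove it from the tree.
legend <- getNodeSet(doc, "//svg:g[@id='legend']", namespaces = ns)
invisible(removeNodes(legend))

print(doc)
```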

Alternatively, we could extract just the legend from the plot and use it to create a new image.
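
One possible way to do this, sketched on the same kind of stand-in document (with a hypothetical legend id), is to clone the selected node and add the copy to a new root element. Cloning leaves the original image untouched.

```r
library(XML)

# A toy SVG document standing in for an exported plot (assumed structure).
svgText <- '<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
  <g id="panel"><rect x="10" y="10" width="180" height="180"/></g>
  <g id="legend"><rect x="210" y="80" width="20" height="20"/></g>
</svg>'
doc <- xmlParse(svgText, asText = TRUE)
ns <- c(svg = "http://www.w3.org/2000/svg")

# Select the legend group from the original image.
legend <- getNodeSet(doc, "//svg:g[@id='legend']", namespaces = ns)[[1]]

# Create a new, empty image and add a copy of the legend to it.
newSvg <- newXMLNode("svg",
  namespaceDefinitions = c("http://www.w3.org/2000/svg"),
  attrs = c(width = "300", height = "200"))
invisible(addChildren(newSvg, xmlClone(legend)))

print(newSvg)
```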

These simple examples demonstrate the basic idea of extracting and combining arbitrary subsets of an SVG image. More complex applications are possible, such as combining the contents of two or more plots together. It is also important to note that these manipulations are made more convenient because the SVG produced by gridSVG has a clear and labelled structure; these tasks would be considerably more difficult if we had to work with the SVG output from the standard R svg() device.

Inserting Nodes

Another advantage of the new approach is that when we create an XML node, it can then be inserted into the SVG document at any location. Previously, with the string concatenation approach, we were forced to simply append to the document. Now we have the option of inserting nodes at any point in the document.

A case where this is useful is within gridSVG itself. When gridToSVG() is called, there are three parameters of particular interest: the filename, export.coords and export.js. The latter two parameters determine how JavaScript code is to be included within an SVG image, if at all. If we are going to be including JavaScript code, the SVG image is first generated. Once the image is created, we insert new <script> node(s) into the root <svg> element. This demonstrates the ability to insert nodes at any location because, rather than being forced to append to the document, we are able to add the nodes as children of the root <svg> element.
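
The following sketch shows the general pattern of adding a <script> node to an image that has already been built. The script content and its type attribute are placeholders, not the JavaScript that gridSVG actually generates.

```r
library(XML)

# The image is generated first.
svgRoot <- newXMLNode("svg",
  namespaceDefinitions = c("http://www.w3.org/2000/svg"),
  attrs = c(width = "100", height = "100"))
g <- newXMLNode("g", attrs = c(id = "example-rect"), parent = svgRoot)

# Afterwards, a <script> node is created and made a child of the root.
# The JavaScript here is a hypothetical placeholder.
script <- newXMLNode("script",
                     newXMLCDataNode("var exampleCoords = {};"),
                     attrs = c(type = "application/ecmascript"))
invisible(addChildren(svgRoot, script))

print(svgRoot)
```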

Tree Simplification

One particular case where the XML package gives us some advantages is when saving an XML document. The function saveXML() provides a boolean option, indent. This determines whether there is going to be any visual structure in the form of indentation and line breaks or none at all. An example of its effect is shown below.
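
A small sketch of the comparison, using an assumed toy image, is:

```r
library(XML)

# A toy image with two levels of nesting.
svgRoot <- newXMLNode("svg",
  namespaceDefinitions = c("http://www.w3.org/2000/svg"),
  attrs = c(width = "100", height = "100"))
g <- newXMLNode("g", attrs = c(id = "example-rect"), parent = svgRoot)
rect <- newXMLNode("rect", parent = g,
                   attrs = c(x = 10, y = 10, width = 80, height = 80))

# Serialise with and without indentation.
pretty  <- saveXML(svgRoot, indent = TRUE)
compact <- saveXML(svgRoot, indent = FALSE)

cat(pretty, "\n")
cat(compact, "\n")

# Compare the number of characters in each form.
c(indented = nchar(pretty), compact = nchar(compact))
```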

We can see that the output without indentation present is much more compact. In complex SVG images, particularly those with a deep hierarchical structure, this could reduce the size of the resulting file greatly, which would improve the delivery speed of gridSVG plots being sent over the web by reducing the amount of data that needs to be transferred.

Another case where removing indentation is useful is when manipulating the SVG image in the browser using JavaScript. When parsing the SVG DOM with indentation present, the whitespace used for indentation is counted as a “node”. This makes it difficult to traverse the DOM as it forces us to check whether the node that we have encountered is simply whitespace text or not. When indentation is removed, we no longer have this problem and can be certain that all nodes are either elements, or actual content within them.

Conclusion

This article describes changes to the mechanism used by the gridSVG package to convert grid viewports and grobs to SVG representations. Instead of pasting strings together to generate SVG code as text within an external file, the gridSVG package now uses the XML package to create XML nodes in resident memory. The advantages of this approach include: guaranteed validity of the SVG representation; greater flexibility in the production of the SVG representation; improved access to the SVG representation; and greater flexibility in the formatting of the SVG code. There are also possible speed benefits from these changes.

These advantages have been demonstrated through simple examples, but they also have an impact on much more complex scenarios. For example, if R is being used to serve web content to a browser, it is now possible for gridSVG to provide SVG fragments (rather than complete plots) and to supply them directly from resident memory (rather than having to generate an external file as an intermediate step).

Downloads

This document is licensed under a Creative Commons Attribution 3.0 New Zealand License. The code is freely available under the GPL. The features described in this article were added to version 1.0-0 of gridSVG, which is available on R-Forge (if not on CRAN).

Code used to generate article (note, requires knitr, available on CRAN)

References

[1] R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
[2] Murrell, P. and Potter, S. (2012). gridSVG: Export grid graphics as SVG. https://r-forge.r-project.org/projects/gridsvg/. R package version 1.0-0.
[3] W3C (2011). Scalable Vector Graphics (SVG) 1.1 (Second Edition) Specification. http://www.w3.org/TR/SVG/.
[4] Lang, D. T. (2012). XML: Tools for parsing and generating XML within R and S-Plus. http://www.omegahat.org/RSXML/. R package version 3.95-0.
[5] libxml2: The XML C parser and toolkit of Gnome. http://www.xmlsoft.org/.
[6] W3C (1999). XML Path Language (XPath) Version 1.0 Specification. http://www.w3.org/TR/xpath/.
[7] Wickham, H. and Chang, W. (2012). ggplot2: An implementation of the Grammar of Graphics. http://ggplot2.org/. R package version 0.9.2.1.