Rendering HTML Content in R Graphics

by Paul Murrell http://orcid.org/0000-0002-3224-8858

Version 3

Version 1: original publication
Version 2: fixed up 'gridExtra' citation
Version 3: update pdf.js code (for displaying PDFs)

library(knitr)
opts_chunk$set(comment=" ", tidy=FALSE)
options(width=100)
## For wonky desktop set up
options(bitmapType="cairo")
library(grid)

This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.

This report describes several R packages that allow HTML content to be rendered as part of an R plot. The core package is called 'layoutEngine', but that package requires a "backend" package to perform HTML layout calculations. Three example backends are demonstrated: 'layoutEngineCSSBox', 'layoutEnginePhantomJS', and 'layoutEngineDOM'. We also introduce two new font packages, 'gyre' and 'courier'.

Introduction

The aim of the 'layoutEngine' package is to support rendering of HTML content within R graphics. The following code provides a simple demonstration. We start with a standard 'lattice' plot.

library(lattice)
xyplot(mpg ~ disp, mtcars)

Next, we generate some HTML, in this case using xtable from the 'xtable' package.

library(xtable)
html <- print(xtable(head(mtcars[1:3])), type="html", print.results=FALSE)

Finally, we navigate to the main 'lattice' panel viewport and call grid.html from the 'layoutEngine' package, using the 'layoutEnginePhantomJS' backend, to render the HTML table (in the top right corner of the plot).

library(layoutEnginePhantomJS)
downViewport("plot_01.panel.1.1.vp")
grid.html(html,
          x=unit(1, "npc") - unit(2, "mm"),
          y=unit(1, "npc") - unit(2, "mm"),
          just=c("right", "top"))

We could do the same thing in a single step using a 'lattice' "panel function". The following code produces exactly the same result as the plot above.

xyplot(mpg ~ disp, mtcars,
       panel=function(...) {
           panel.xyplot(...)
           grid.html(html,
                     x=unit(1, "npc") - unit(2, "mm"),
                     y=unit(1, "npc") - unit(2, "mm"),
                     just=c("right", "top"))
       })

The next two sections describe the main functions in the 'layoutEngine' package through a series of examples. Subsequent sections go into the underlying design details of the package.

Why render HTML in R graphics?

One reason for creating the 'layoutEngine' package is to be able to generate graphics that are easy (or easier) to describe in HTML compared to R graphics.

For example, in R graphics, we can normally only draw text with a single font face. The following code uses grid.html to draw text containing plain, italic, and bold font faces.

grid.html("<p>plain, <i>italic</i>, and <b>bold</b></p>")

The above example is actually possible in R graphics if we (mis)use a mathematical expression, like this ...

grid.text(expression(paste("plain, ", italic(italic), ", and ", bold(bold))))

... but something that really is not possible in R graphics is the use of more than one font family in a single piece of text. The following code shows how we can do this with grid.html.

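Rfonts <- c("sans", "serif", "mono", "Carlito")
CSSfonts <- cssFontFamily(Rfonts)
grid.html(paste0('<p><span style="font-family: ', CSSfonts[1], '">sans</span>, ',
                 '<span style="font-family: ', CSSfonts[2], '">serif</span>, ',
                 '<span style="font-family: ', CSSfonts[3], '">mono</span>, and ',
                 '<span style="font-family: ', CSSfonts[4], '">Carlito</span>.</p>'),
          fonts=Rfonts)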

The code above demonstrates an important point about using grid.html. If we are rendering HTML that includes text content, we must specify the fonts that we are using for that text. This font specification must happen in two places, and the two specifications must match: in R, via the fonts argument to grid.html, and in CSS within the HTML that we are rendering, typically via style attributes. The cssFontFamily function maps R fonts to CSS fonts, to make sure that we are specifying the same font in both R and CSS. The Section on defining fonts contains much more information on this topic. We did not have to specify any fonts in the previous examples because grid.html defaults to using a "sans" font.

The introduction showed another example use of 'layoutEngine', which is to take advantage of the typesetting features of HTML to arrange text in a table, especially when packages like 'xtable' have already been written to generate the required HTML. The following code shows a more complex table example from the 'formattable' package.

library(formattable)
products <- data.frame(id = 1:5,
                       price = c(10, 15, 12, 8, 9),
                       rating = c(5, 4, 4, 3, 4),
                       market_share = percent(c(0.1, 0.12, 0.05, 0.03, 0.14)),
                       revenue = accounting(c(55000, 36400, 12000, -25000, 98100)),
                       profit = accounting(c(25300, 11500, -8200, -46000, 65000)))
sign_formatter <- formatter("span",
                            style = x ~ style(color = ifelse(x > 0, "green",
                                                      ifelse(x < 0, "red", "black"))))
table <- formattable(products,
                     list(price = color_tile("transparent", "lightpink"),
                          rating = color_bar("lightgreen"),
                          market_share = color_bar("lightblue"),
                          revenue = sign_formatter,
                          profit = sign_formatter))
grid.html(as.character(table))

The result of the rendering in the example above is not exactly the same as the rendering that a browser produces. The layout of the table is correct and the colours are correct, but in the browser the coloured backgrounds are rounded rectangles (because the HTML that 'formattable' generated uses the CSS property border-radius). This imperfect rendering reflects the fact that the 'layoutEngine' package currently only supports some basic CSS properties. For much more information on the limitations of the 'layoutEngine' package, see the Section on limitations.

In a DisplayR Blog post, Tim Bock extolls the relative simplicity of generating tables in HTML, using an example that required wrapped text for both row and column headers. In his post, he provided code for generating the HTML table from R; the following code renders that HTML code back in R graphics.

options(layoutEngine.backend=phantomjsEngine)
html <- readLines("displayr.html")
htmlDoc <- htmlDocument(html)
grid.html(htmlDoc)

The new feature in the code above is a call to the htmlDocument function. This alternative to the htmlElement function is intended for when we have a complete HTML document (rather than just an HTML fragment) that we want to render in R. We have to call htmlDocument explicitly because grid.html assumes that character input is only an HTML fragment (and calls htmlElement itself).

Drawing tables of values can be achieved directly in R graphics (e.g., using grid.table from the 'gridExtra' package), but HTML (combined with CSS) provides many other typesetting features that are not covered by R graphics packages. The following code shows an example of the CSS shape-outside property (combined with the float property), which allows us to flow text around a non-rectangular shape (in this case a circle).

For this example, we switch to the 'layoutEngineDOM' backend because the 'layoutEnginePhantomJS' backend does not support the shape-outside property. The 'layoutEngineDOM' backend provides access to the default system web browser (which is a recent Firefox for this report), which means that we can make use of the most recent CSS properties.

library(layoutEngineDOM)
## Give browser plenty of time to do its work
options(DOM.limit=10)
html <- c('

', '

This text flows around a circle! Try doing that in R!

', '

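')
grid.html(html)

## Redraw with viewports, make them visible on the display list,
## then outline the circle that the text flows around
grid.html(html, viewports=TRUE)
grid.force()
downViewport("BODY.1.DIV.2.DIV.2.vp")
grid.circle(gp=gpar(fill=NA))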

Mixing HTML and R Graphics

The real value of being able to render HTML content in R is not just so that we can reproduce what would appear in a browser. This facility becomes more useful when we combine the HTML content with other graphical output that R is good at, such as plots. An example of this was shown in the introduction, where an HTML table was drawn within a 'lattice' plot. This section demonstrates some other possibilities.

The code below uses HTML rendering to provide an axis label for a 'lattice' plot, where the label contains more than one font (the R variable name is drawn with a typewriter font). One difference in this code is the use of htmlGrob rather than grid.html, because we need to specify the label to xyplot as a 'grid' grob (rather than just draw it immediately). We also call the flow function to lay out the HTML content and pass the result to htmlGrob (rather than passing the raw HTML). This is because xyplot queries the label grob several times (e.g., for its height and width) and we do not want to have to perform the HTML layout for every query. The Section on the lower-level interface discusses the flow function in more detail.

For this example, we switch to the CSS Box backend, 'layoutEngineCSSBox'. We use this backend for this example because it produces a better result than the PhantomJS backend and it is "quieter" than the 'DOM' backend. The Section on 'layoutEngine' backends discusses the strengths and weaknesses of the different backends in more detail.

library(layoutEngineCSSBox)
html <- paste0('Engine Displacement ', ' (disp)', "")
flowedHTML <- flow(html)
xyplot(mpg ~ disp, mtcars, xlab=htmlGrob(flowedHTML))

The next example, from a Stack Overflow post, is very similar. This example involves adding a caption to a 'ggplot2' plot, using gridExtra::grid.arrange to arrange the plot above the caption. The caption does not involve multiple font families, but it does involve multiple font faces and it involves text split across two lines (which is a step too far for R's limited plotmath facility). This is a good example of a typesetting task that looks very simple, and is simple in HTML, but is not normally possible in R graphics.

library(ggplot2)
library(gridExtra)
note <- '

Note: Market concentration averages in the United States, United Kingdom, and the
Netherlands are, respectively, 1920, 1388, and 1244

'
gg <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
caption <- htmlGrob(flow(note))
grid.arrange(gg, bottom=caption)

The next example demonstrates that, having rendered HTML content in R (with 'grid') in combination with an R plot (based on 'grid'), it is possible to make use of 'grid' tools to further integrate the HTML output with the R plot output. In the following code, we draw a 'lattice' plot, add an HTML table at the top-right of the plot, then draw an arrow from the left edge of a piece of text within the HTML table to the corresponding data point within the 'lattice' plot.

options(layoutEngine.backend=phantomjsEngine)
xyplot(mpg ~ disp, mtcars)
html <- print(xtable(head(mtcars[1:3])), type="html", print.results=FALSE)
flowedhtml <- flow(html)
downViewport("plot_01.panel.1.1.vp")
grid.html(flowedhtml,
          x=unit(1, "npc") - unit(2, "mm"),
          y=unit(1, "npc") - unit(2, "mm"),
          just=c("right", "top"),
          viewports=TRUE)
## The line starts at the first data point (native coordinates)
grid.move.to(mtcars$disp[1], mtcars$mpg[1], default.units="native")
upViewport(0)
## Find the viewport for the first <span> within a <td>
vp <- grid.grep("TD.+SPAN.+vp", grep=TRUE, viewports=TRUE)
downViewport(vp)
## ends="first" puts the arrowhead at the data point
arr <- arrow(ends="first", angle=15, length=unit(5, "mm"), type="closed")
## Finish the line just to the left of the table text
grid.line.to(unit(-1, "mm"), .5, arrow=arr, gp=gpar(fill="black"))

There are a couple of interesting features in the code above. One new feature is the use of viewports=TRUE in the call to grid.html. This means that, in addition to drawing the HTML content, 'grid' viewports are created that correspond to each piece of HTML content. This is what allows us to use grid.grep to find the viewport that corresponds to the first <SPAN> element within a <TD> element within the HTML table (so that we can navigate down to that viewport to specify one end of the arrow).

Another feature is the use of the flow function. The reason for calling flow in this example is slightly different from the reason in the previous example. This time we are making sure that the 'grid' grobs and viewports that are created when the HTML content is rendered are visible on the 'grid' display list (so that we can navigate to the viewports that correspond to the HTML content and add further drawing). If we had passed the raw HTML directly to grid.html, we would have had to call grid.force (and trigger another HTML layout calculation) in order to make the 'grid' grobs and viewports visible on the 'grid' display list. The Section on the lower-level interface explains the difference between flowed HTML and raw HTML in more detail.

The next example also demonstrates the use of viewports=TRUE to generate 'grid' viewports from HTML content. This example is a variation on the example from the previous section that used shape-outside to flow text around a non-rectangular shape. This time, rather than specifying a circle to flow text around, we will specify an image, and the image will be an R plot (a pie chart). We specify a transparent background for the image because shape-outside flows text around the non-transparent component of the image.

pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
png("assets/pie.png", width=200, height=200, bg="transparent")
par(mar=rep(0, 4))
pie(pie.sales, radius=.95, labels=NA)
dev.off()

The following code uses HTML to flow text around the pie chart, then navigates to the <div> that was flowed around and draws the R pie chart within the corresponding viewport.

options(layoutEngine.backend=DOMEngine)
## Give browser plenty of time to do its work
options(DOM.limit=10)
html <- c('

', '

Pie charts may not be the best data visualisation tool, but they are fantastic fun to flow text around!

', '

')
flowedhtml <- flow(html, assets=file.path(getwd(), "assets", "pie.png"))
grid.html(flowedhtml, viewports=TRUE)
vp <- grid.grep("pie", grep=TRUE, grobs=FALSE, viewports=TRUE)
downViewport(vp)
library(gridGraphics)
grid.echo(function() {
              par(mar=rep(0, 4))
              pie(pie.sales, radius=.95, labels=NA)
          }, newpage=FALSE)

One interesting feature of the code above is that we have HTML content that refers to an external resource: the file assets/pie.png. In order to lay out this sort of HTML, we must use the assets argument to flow, so that the layout engine can find the external resource. This only applies to resources on the local file system; external resources (URLs) should be resolved by the layout engine itself. If we provide raw HTML directly to grid.html, we can supply the assets argument there instead.

Another detail about the code above is that we must use the 'gridGraphics' package to draw the pie chart, because the 'layoutEngine' package works in the 'grid' graphics system, whereas the pie function is based on the 'graphics' system.

The next example further embraces the use of 'grid' viewports based on an HTML layout. In this example, we use CSS Grid Layout to produce an arrangement of regions and then draw R plots within those regions. The following code describes HTML that contains a set of <div> elements, some of which are empty (we will use those to draw plots), and some of which contain text captions. This is followed by CSS code that specifies the layout of the <div> elements, which consists of two columns and four rows, where the heights of the second and fourth rows are based on the size of the text captions.

Having rendered this combination of HTML and CSS, we navigate to the empty <div> viewports and draw an R plot in each one. We switch back to the 'layoutEngineDOM' backend because we need support for CSS Grid Layouts.

options(layoutEngine.backend=DOMEngine)
html <- '

Figure 1: This caption takes up more than one row

Figure 2: This does not

Figure 3: One row

Figure 4: More than one row

'
css <- '
div {
  border-style: solid;
  border-width: 1px;
}
#main {
  width: 400px;
  height: 400px;
  display: grid;
  grid-template-columns: 1fr 1fr;
  grid-template-rows: repeat(2, 1fr auto);
  grid-auto-flow: row;
}
'
flowedhtml <- flow(html, css=css)
grid.html(flowedhtml, viewports=TRUE)
figvps <- grid.grep("fig", grep=TRUE, global=TRUE, grobs=FALSE, viewports=TRUE)
for (i in 1:4) {
    downViewport(figvps[[i]])
    grid.echo(function() {
                  par(mar=rep(0, 4))
                  pie(1:i, labels=NA)
              }, newpage=FALSE)
    upViewport(0)
}

This layout would be tricky-to-impossible with a 'grid' layout because the heights of the second and fourth rows depend on the number of lines in the typeset text captions.

A new feature in the code above is the css argument in the call to flow. This is how we can provide CSS code separately from HTML code for laying out HTML content.

The next example further emphasises the value of using a web browser as the backend. In this example, we render a complete HTML page that contains a table that is styled and augmented by the DataTables javascript library. Most of this result is just CSS, but the "3 of 3 entries" line below the table is generated by javascript code. This demonstrates the idea that we can benefit from a backend that can evaluate javascript code to render HTML content that is (even partially) generated by a javascript library.

htmlDoc <- htmlDocument(readLines("DataTable.html"))
grid.html(htmlDoc)

The final example really mixes things up. This time, to generate HTML content, we use an (admittedly trivial) R Markdown document (shown below).

cat(readLines("example.Rmd"), sep="\n")

The 'rmarkdown' package is used to produce HTML, including running the embedded R code and including the R output in the final HTML, then that HTML is rendered at top-right within a 'lattice' plot.

library(rmarkdown)
render("example.Rmd", "html_document")

## Run rmarkdown::render() in separate R process, rather than within
## a document that is already being knit()ed
system("Rscript -e 'library(rmarkdown); render(\"example.Rmd\", \"html_document\")'")

options(layoutEngine.backend=phantomjsEngine)
html <- readLines("example.html")
htmlDoc <- htmlDocument(html)
xyplot(mpg ~ disp, mtcars,
       panel=function(...) {
           panel.xyplot(...)
           grid.html(htmlDoc,
                     x=unit(1, "npc") - unit(1, "mm"),
                     y=unit(1, "npc") - unit(1, "mm"),
                     width=unit(3, "in"),
                     just=c("right", "top"))
       })

This ends the high-level examples that demonstrate basic usage of the 'layoutEngine' package. The remaining sections delve into the underlying design of the package and lower-level details of how things work.

Defining fonts

A fundamental requirement of the 'layoutEngine' package is that exactly the same fonts are used both to layout HTML content in a backend engine (e.g., 'layoutEngineCSSBox') and to render the HTML content in R. This is a requirement because the layout of HTML content in almost all cases will depend on the typesetting of text within HTML containers (paragraphs, table cells, etc). If the fonts do not match exactly, the rendered text in R will not align with other rendered HTML content (e.g., table borders). This section describes how this matching is achieved in the 'layoutEngine' package.

Because the end goal is to render content in R graphics, we want to start with an R font specification and map that to a CSS font specification. A font specification in R consists of a font family name (e.g., "Carlito"), a font face (plain, bold, italic, or bold-italic), and a font size (e.g., 12pt).

In HTML, the font used for text can be controlled by the CSS properties font-family (e.g., "Carlito"), font-weight (e.g., "bold" or "normal"), font-style (e.g., "italic" or "normal"), and font-size (e.g., 12pt) (and font-variant, but we are going to ignore that).
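As an illustration of how the pieces of an R font specification line up with these CSS properties, the following is a minimal sketch in base R. The helpers fontToCSS and cssDeclarations are hypothetical (they are not part of 'layoutEngine'), and the face names are those accepted by the fontface setting of grid's gpar.

```r
## Hypothetical helper (not part of 'layoutEngine'): translate an R font
## specification (family, face, size in points) into CSS property values
fontToCSS <- function(family,
                      face=c("plain", "bold", "italic", "bold.italic"),
                      size=12) {
    face <- match.arg(face)
    list("font-family"=family,
         "font-weight"=if (face %in% c("bold", "bold.italic")) "bold" else "normal",
         "font-style"=if (face %in% c("italic", "bold.italic")) "italic" else "normal",
         "font-size"=paste0(size, "pt"))
}

## Collapse a list of CSS properties into a declaration string,
## e.g., for use in a style attribute
cssDeclarations <- function(props) {
    paste0(names(props), ": ", unlist(props), ";", collapse=" ")
}

cssDeclarations(fontToCSS("Carlito", "bold.italic", 12))
## "font-family: Carlito; font-weight: bold; font-style: italic; font-size: 12pt;"
```

Note that the font family passes straight through in this sketch; the rest of this section explains why, in practice, the family also has to be tied to an exact font file.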

The 'layoutEngine' package assumes that the HTML content that is to be rendered either contains no font family information or only font families that map to R fonts.

A complication is the fact that HTML and CSS are designed to be flexible with respect to fonts so that a web browser can render content with whatever fonts are available. For example, the following CSS specification means that Helvetica will be used if it is available, then DejaVu Sans if it is available, then finally whatever sans-serif font the web browser can get its hands on.

font-family: Helvetica, DejaVu Sans, sans-serif

By contrast, we want to tell the 'layoutEngine' backend to use an exact font, so instead we generate a CSS @font-face rule. This allows us to specify an actual font file and associate it with a font family and other properties. For example, the following CSS specifies very precisely that the font family "TeXGyreHeros", with normal style and bold weight, corresponds to the font file found at the location assets/qhvb.pfb (relative to the location of the HTML document).

@font-face {
  font-family: "TeXGyreHeros";
  font-style:  normal;
  font-weight: bold;
  src:         url('assets/qhvb.pfb');
}
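The following sketch shows one way to generate an @font-face rule like the one above from a font family name and a font file path. The function fontFaceRule is hypothetical, not the function that 'layoutEngine' uses internally; it simply formats strings with sprintf, and because sprintf is vectorised it produces one rule per font file when given vectors.

```r
## Hypothetical helper: build a CSS @font-face rule that ties a CSS
## font family (plus style and weight) to one exact font file
fontFaceRule <- function(family, file, weight="normal", style="normal") {
    sprintf(paste0("@font-face {\n",
                   "  font-family: \"%s\";\n",
                   "  font-style:  %s;\n",
                   "  font-weight: %s;\n",
                   "  src:         url('%s');\n",
                   "}"),
            family, style, weight, file)
}

rule <- fontFaceRule("TeXGyreHeros", "assets/qhvb.pfb", weight="bold")
cat(rule, "\n")
```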

In order to generate an @font-face rule, we need to know the location of the font file that we want to use. The 'layoutEngine' package makes use of the 'gdtools' package and the 'extrafont' package to achieve this.

library(gdtools)
library(extrafont)

The sys_fonts function from the 'gdtools' package is useful because it provides information about system fonts, including the location of font files.

options(width=100)
sf <- sys_fonts()
head(sf[c("family", "weight", "slant", "file")])
options(width=80)

This is useful for rendering HTML on R graphics devices that are based on Cairo graphics, such as the default screen device on Linux and the cairo_pdf device, because on those devices we can specify a system font just by its family name (e.g., "Gillius ADF"; look at the 'g' in the output below).

cairo_pdf("cairo-font.pdf", width=1, height=.5)
grid.text("Testing", gp=gpar(fontfamily="Gillius ADF"))
dev.off()

Unfortunately, specifying fonts on the standard pdf graphics device (and the postscript device) is less straightforward; we have to use the Type1Font function to associate AFM files with a font family name and then call pdfFonts (or postscriptFonts) to register the font for use with R. The 'extrafont' package makes this easier with its functions font_import and loadfonts, which automatically register system fonts. The fonttable function can then be used to provide information about registered fonts, including the locations of font files.
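For comparison with what 'extrafont' automates, the following sketch registers a Type 1 font by hand using base R's Type1Font and pdfFonts. To stay self-contained, it reuses the AFM metric files that R already ships for Helvetica; the family name "MyHelvetica" is made up for illustration. R only needs the AFM files to lay out the text; it does not embed the font itself.

```r
library(grid)

## Reuse the AFM metric files that R ships for Helvetica and register
## them under a new (illustrative) family name
helvetica <- pdfFonts("Helvetica")$Helvetica
myFont <- Type1Font("MyHelvetica", helvetica$metrics)
pdfFonts(MyHelvetica=myFont)

## The new family name can now be used on the pdf device
pdfFile <- tempfile(fileext=".pdf")
pdf(pdfFile, width=2, height=1)
grid.text("Testing", gp=gpar(fontfamily="MyHelvetica"))
invisible(dev.off())
```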

options(width=100)
ft <- fonttable()
head(ft[c("FamilyName", "Bold", "Italic", "fontfile")])
options(width=80)

The only problem is that 'extrafont' only registers TrueType system fonts. However, 'extrafont' also provides a font_install function that can be used to install special font packages, and those packages can contain Type 1 fonts. As examples, the 'gyre' package has been created to bundle the TeX Gyre fonts and the 'courier' package bundles the IBM Courier font.

The following code repeats the text-with-multiple-font-families example from earlier, but performs the rendering on the standard pdf device. This requires the 'gyre' and 'courier' packages to be loaded. Notice also that the mappings from R fonts to CSS fonts have to be recalculated because we are rendering to a different graphics device than before.

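library(gyre)
library(courier)
pdf("multifont.pdf", width=3, height=.5)
CSSfontsPDF <- cssFontFamily(Rfonts)
grid.html(paste0('<p><span style="font-family: ', CSSfontsPDF[1], '">sans</span>, ',
                 '<span style="font-family: ', CSSfontsPDF[2], '">serif</span>, ',
                 '<span style="font-family: ', CSSfontsPDF[3], '">mono</span>, and ',
                 '<span style="font-family: ', CSSfontsPDF[4], '">Carlito</span>.</p>'),
          fonts=Rfonts)
dev.off()
embed_fonts("multifont.pdf")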

A final wrinkle is that in R it is possible to specify one of three generic font families: "sans", "serif", and "mono". In order to map these to a specific font (that both R and CSS will use), the 'layoutEngine' package uses match_family from the 'gdtools' package (on Cairo-based devices, this is after first mapping "sans" to "Helvetica", "serif" to "Times", and "mono" to "Courier", to match what R graphics does internally).
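The first, Cairo-only step of that mapping can be sketched as a simple lookup. The helper preMapFamily is hypothetical; the subsequent call to gdtools::match_family, which finds an actual installed font, is deliberately left out so that the sketch runs without 'gdtools'.

```r
## Hypothetical sketch of the Cairo pre-mapping step: replace R's
## generic families with the names R uses internally, and leave any
## other family name unchanged
preMapFamily <- function(family) {
    generic <- c(sans="Helvetica", serif="Times", mono="Courier")
    out <- family
    hit <- family %in% names(generic)
    out[hit] <- generic[family[hit]]
    out
}

preMapFamily(c("sans", "serif", "mono", "Carlito"))
## "Helvetica" "Times" "Courier" "Carlito"
```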

The cssFontFamily function from the 'layoutEngine' package can be used to show how an R font family maps to a CSS font-family, though the mapping is dependent on the graphics device in use.

cssFontFamily(c("sans", "serif", "mono"), device="pdf")
cssFontFamily(c("sans", "serif", "mono"), device="cairo_pdf")

In summary, when we call grid.html (or flow), we must specify which fonts we want to use by providing one or more R font family names (the default is just "sans"). These font family names are converted to CSS @font-face rules that associate a CSS font-family with an exact font file. For this to work (because we need paths to font files), we must only use R font family names that exist within the gdtools::sys_fonts font table (for Cairo-based devices) or within the extrafont::fonttable font table (for pdf or postscript). For Cairo-based devices we should be able to use any system font. For pdf or postscript we can use any TrueType system font, and we can add further Type 1 fonts by creating a font package or by making use of an existing font package like 'gyre' or 'courier'.

When we are working with a pdf or postscript graphics device, it is a good idea to embed fonts in the final document (using extrafont::embed_fonts), otherwise we risk all of our hard work being undone if a PDF viewer cannot find the fonts that we have used and is forced to substitute different fonts.

Specifying fonts in HTML code

In addition to providing exact font specifications in CSS @font-face rules, we need to specify which font is to be used for text within the HTML content that we wish to render. By default, a CSS rule is added that specifies the <body> font to be the first font given in the fonts argument to grid.html (or flow), something like the CSS code below.

body { font-family: "TeXGyreHeros" }

The default behaviour has been set up so that, on Linux at least, the default font used for HTML layout matches the default font used by R graphics. For example, in the very first plot in this report, the font used in the 'lattice' plot and the font used in the HTML table that is added to the plot are identical.

When we generate HTML code with more than one font family, we must use cssFontFamily to determine the correct font-family to use within our HTML code.
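A small helper can make it harder for the R and CSS font names to get out of step. The sketch below is hypothetical: fontSpan is not part of 'layoutEngine', and the CSS family names in cssFamilies are placeholders standing in for whatever cssFontFamily returns on the current device.

```r
## Hypothetical helper: wrap a piece of text in a <span> that selects a
## CSS font family; 'cssFamily' should come from cssFontFamily() so that
## it matches an R font passed via the 'fonts' argument. The text is
## inserted as-is (it is not HTML-escaped).
fontSpan <- function(text, cssFamily) {
    sprintf("<span style=\"font-family: '%s'\">%s</span>", cssFamily, text)
}

## Placeholder CSS family names, standing in for cssFontFamily() output
cssFamilies <- c(sans="TeXGyreHeros", mono="TeXGyreCursor")
html <- paste0("<p>Engine Displacement ",
               fontSpan("(disp)", cssFamilies[["mono"]]), "</p>")
html
```

The same R font names would then be passed to grid.html or flow, e.g. fonts=c("sans", "mono").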

Typesetting text

In addition to needing identical fonts for both HTML layout and R rendering, when drawing text we also need to know how the text has been broken across lines (when that happens).

This is not a problem for the 'layoutEngineCSSBox' backend because that generates separate layout information for each line of text. However, for backends built on web browser layout engines ('layoutEnginePhantomJS' and 'layoutEngineDOM'), extra work is required.

The problem is that, while web browser layout engines provide an API for querying the bounding box for laid out text, they do not provide an API for querying exactly where each letter or word of the text has been placed.

The solution adopted by both 'layoutEnginePhantomJS' and 'layoutEngineDOM' is to wrap each individual word of text content within a <span> element. This means that we can obtain the layout information for each individual word, though it also means that each individual word is drawn separately when the text is rendered in R.
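A minimal sketch of the word-wrapping idea, assuming it is applied to the content of a single text node (not to markup): every run of non-whitespace characters is wrapped in a <span>, and the whitespace between words is left in place so that the browser can still break lines between words. The function name wrapWords is hypothetical, and the real backends may treat punctuation or non-breaking spaces differently.

```r
## Wrap each run of non-whitespace characters in its own <span>,
## keeping the original whitespace between the words
wrapWords <- function(text) {
    gsub("(\\S+)", "<span>\\1</span>", text, perl=TRUE)
}

wrapWords("This text flows")
## "<span>This</span> <span>text</span> <span>flows</span>"
```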

The lower-level interface

The high-level interface provided by the 'layoutEngine' package means that we can simply call grid.html and provide it with raw HTML code and it will be rendered. This section looks at the lower-level functions that underlie that convenient interface and that provide finer control of the process.

The 'layoutEngine' package internally works with HTML as an "htmlElement" object. All raw HTML code passes through the htmlElement function to turn it into a list containing an "xml_document" object, optionally including CSS code (within a <head> element), and any external assets.

html <- htmlElement('<p>test</p>', css="p { text-align: center }", assets="assets/pie.png")
html

It is assumed that the HTML content says nothing about which font family to use for text - that is supplied in the next step, when the HTML layout is calculated - but it is expected that the HTML content may contain font-weight, font-style, or font-size styling, either explicitly or implicitly (e.g., via a <th> element). If the HTML content does contain font-family styling, it is assumed that it will match one of the fonts supplied in the HTML layout step.

It is also possible to call htmlDocument instead of htmlElement, if our HTML code is a complete document, rather than just an HTML fragment.

Once we have an "htmlElement" object, we can call the flow function to generate layout information. We can specify the size of the page for the layout, but it defaults to the current 'grid' viewport size. We can also specify the fonts at this stage so, for simple HTML content at least, it is possible to generate different layouts by specifying different fonts. The mapping of fonts to CSS depends on the R graphics device we want to render onto, so we can also specify the intended output graphics device, though this defaults to the currently active graphics device. Finally, we can specify which backend layout engine we want to use. This is set by default when a backend package is loaded, but it can be overridden.

The result of an HTML layout is a "flowedhtml" object, which contains layout information for each node within the HTML content (both elements and text nodes). This is a mixture of location information and styling information. The location information is in px units, which is interpreted as 1/96in.

The layout information also includes a unique name for each node, which is used to name 'grid' grobs and viewports during rendering.

flow(html)
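To make the px units concrete, the following sketch converts a layout box into a named 'grid' viewport. The layout rows, their column names, and the helper nodeViewport are hypothetical (they do not reproduce the package's "flowedhtml" format); the sketch assumes, as in CSS, that boxes are measured from the top-left of the page, so the y location has to be flipped for 'grid', which measures upwards from the bottom.

```r
library(grid)

## Hypothetical layout information for two nodes: boxes in px
## (1px = 1/96in), measured from the top-left corner of the page
boxes <- data.frame(name=c("BODY.1", "P.2"),
                    x=c(0, 8), y=c(0, 8),
                    width=c(384, 368), height=c(96, 18),
                    stringsAsFactors=FALSE)

pxToIn <- function(px) px/96

## Turn one node's box into a named 'grid' viewport; y is flipped
## because 'grid' measures upwards from the bottom of the page
nodeViewport <- function(node, pageHeight) {
    viewport(x=unit(pxToIn(node$x), "in"),
             y=unit(pxToIn(pageHeight - node$y - node$height), "in"),
             width=unit(pxToIn(node$width), "in"),
             height=unit(pxToIn(node$height), "in"),
             just=c("left", "bottom"),
             name=paste0(node$name, ".vp"))
}

## Viewport for the <p> node on a 1-inch (96px) tall page
vp <- nodeViewport(boxes[2, ], pageHeight=96)
```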

The main rendering workhorse is the htmlGrob function; the grid.html function calls htmlGrob and draws the resulting grob. The htmlGrob function will accept raw HTML code or a "flowedhtml" object. In the latter case, a 'grid' gTree with all of its child grobs and child viewports is generated. In the former case, a 'grid' gTree is created with just the HTML code, plus font, device, and backend engine information; generation of child grobs is delayed until rendering time. The difference is that, in the former case, the HTML will reflow (the layout will be recalculated) every time that the gTree is drawn (or queried). This means that the HTML can be redrawn within different contexts and it will adapt to those contexts, but it will also reflow unnecessarily within the same context (e.g., if it is queried repeatedly for its size).

Put another way, the HTML layout becomes fixed when the flow function is called. If we know the context in which we want to draw the HTML content, we should call flow and pass the result to htmlGrob. On the other hand, if we want to use the HTML layout within different contexts, we should pass raw HTML code to htmlGrob.
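The trade-off between raw and flowed HTML can be mimicked with plain 'grid'. In the sketch below, fakeLayout is a hypothetical stand-in for an expensive HTML layout; the "raw" grob defers it to drawing time through grid's makeContent hook, while the "flowed" grob runs it once and stores the result. This only illustrates the design; it is not how htmlGrob is written.

```r
library(grid)

## Counts how often the (hypothetical) layout step runs
layoutCount <- 0

## Stand-in for an expensive HTML layout calculation
fakeLayout <- function(label) {
    layoutCount <<- layoutCount + 1
    textGrob(label)
}

## "Raw HTML" analogue: keep only the input and lay it out at
## drawing time, via grid's makeContent() hook
lazyGrob <- function(label) {
    gTree(label=label, cl="lazyLayout")
}
makeContent.lazyLayout <- function(x) {
    setChildren(x, gList(fakeLayout(x$label)))
}

## "Flowed HTML" analogue: lay out once, up front, and store the result
fixedGrob <- function(label) {
    gTree(children=gList(fakeLayout(label)))
}

pdf(NULL)                  # a device that discards its output
lazy <- lazyGrob("hello")
grid.draw(lazy)
grid.draw(lazy)
lazyCount <- layoutCount   # one layout per draw

layoutCount <- 0
fixed <- fixedGrob("hello")
grid.draw(fixed)
grid.draw(fixed)
fixedCount <- layoutCount  # a single layout, at creation time
invisible(dev.off())
```

Drawing the deferred grob twice runs the layout twice, whereas the pre-computed grob runs it once, however often it is drawn.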

'layoutEngine' backends

The 'layoutEngine' package provides the main interface for laying out and rendering HTML content in R, but it relies on a backend package to perform the actual HTML layout. This section describes general information about creating a backend and the important design details of the three backends that have been implemented so far.

Each backend has its strengths and weaknesses and it is useful to have different options to try if one backend is not producing the desired result.

Creating a backend

A 'layoutEngine' backend package only needs to export one object: a "layoutengine" object that is created by calling the layoutEngine::makeEngine function.

The makeEngine function has only one required argument, called layout, which should be a function. That layout function is called by layoutEngine::flow to perform the HTML layout.

The layout function is provided with the following arguments: html, which is the "htmlElement" object to lay out; width and height, which specify the dimensions of the page for the layout (in inches); fonts, which names R font families (that will be used to render the layout); and device, which provides the name of the device for the layout. The last two arguments are mainly provided so that they can be used in calls to the helper functions described below.

How the HTML layout is performed is up to the individual backends, but there are several helper functions provided by the 'layoutEngine' package to help with that task.

The layoutEngine::fontFiles function can be used to obtain paths to font files based on the fonts argument that is given to the layout function. This can be useful to, for example, place copies of the font files in a location that the backend can access during the HTML layout.

The layoutEngine::copyAssets function can be used to copy external resources for the HTML layout to a specified directory. Again, this can be useful for the backend to set up files within the local file system for the HTML layout.
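As a rough idea of the kind of preparation a backend might do, the sketch below copies local files into a scratch directory. The function stageAssets is hypothetical and is not copyAssets; in particular, the assumption that assets should land directly inside the target directory may not match what a given backend expects.

```r
## Hypothetical sketch: copy local asset files into a directory that the
## layout engine will use, and return the paths of the copies
stageAssets <- function(assets, dir) {
    dir.create(dir, recursive=TRUE, showWarnings=FALSE)
    ok <- file.copy(assets, dir, overwrite=TRUE)
    if (!all(ok))
        stop("could not copy: ", paste(assets[!ok], collapse=", "))
    file.path(dir, basename(assets))
}

## Usage with a throwaway file standing in for something like pie.png
src <- tempfile(fileext=".png")
writeBin(as.raw(1:10), src)
staged <- stageAssets(src, file.path(tempdir(), "layout-assets"))
```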

The layoutEngine::makeLayout function must be used to generate the final layout result. This takes a number of arguments, the names of which are available via names(layoutEngine::layoutFields). The idea is that each argument contains a different piece of layout information for all of the nodes in the HTML. All arguments should have the same length, but many of the arguments can contain NA, and several should (for example, there are arguments that contain information about text nodes that should contain NA values for non-text nodes).

This layout result format is the most fragile part of the 'layoutEngine' package and the most likely to experience change. The makeLayout function provides some level of protection by making it very likely that any incompatibility between the result format expected by the 'layoutEngine' package and the result format generated by a backend package will result in immediate and spectacular failure.
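The layout result is essentially column-oriented: one vector per field, one element per node. A minimal sketch of that shape, and of the kind of length check that makes a mismatch fail immediately, using hypothetical field names (type, x, y, text); the real field names are those given by names(layoutEngine::layoutFields), and this is not the actual makeLayout code.

```r
## Hypothetical stand-in for the checks performed by makeLayout()
checkLayout <- function(...) {
    fields <- list(...)
    n <- lengths(fields)
    if (length(unique(n)) != 1)
        stop("all layout fields must have the same length; got ",
             paste(names(fields), n, sep = "=", collapse = ", "))
    as.data.frame(fields, stringsAsFactors = FALSE)
}

## Three nodes: a <div>, a text node inside it, and a <span>.
## The text-specific field is NA for the two non-text nodes.
layout <- checkLayout(type = c("DIV", "TEXT", "SPAN"),
                      x    = c(0, 4, 40),
                      y    = c(0, 4, 4),
                      text = c(NA, "Hello", NA))
```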

An optional second argument to the layoutEngine::makeEngine function, called cssTransforms, allows a backend to specify a list of functions. If any of these are provided, the layoutEngine::flow function calls them to transform specific CSS properties when generating @font-face rules. At the time of writing, flow looks for two transformation functions: fontWeight and fontFile. Examples of their use are described in the sections on individual backends below.
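A sketch of how such transforms might be applied when a @font-face rule is generated, assuming that each transform is a function that takes a single CSS value and returns its replacement. The function fontFaceRule, the transform signatures, and the font file name are hypothetical; this is not the 'layoutEngine' implementation.

```r
## Hypothetical: build one @font-face rule, applying optional transforms
fontFaceRule <- function(family, file, weight = "normal",
                         transforms = list()) {
    ## [[ ]] avoids the partial name matching that $ performs on lists
    if (!is.null(transforms[["fontWeight"]]))
        weight <- transforms[["fontWeight"]](weight)
    if (!is.null(transforms[["fontFile"]]))
        file <- transforms[["fontFile"]](file)
    sprintf('@font-face { font-family: "%s"; font-weight: %s; src: url("%s"); }',
            family, weight, file)
}

## Without transforms, the values pass through unchanged
cat(fontFaceRule("TeXGyreHeros", "texgyreheros-bold.pfb", "700"), "\n")
```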

The CSSBox backend

The 'layoutEngineCSSBox' backend is based on the CSSBox Java library. The motivation for this backend is that CSSBox was designed for the purpose of generating HTML layout information (rather than for rendering HTML itself). In other words, it is a standalone HTML layout engine.

One advantage that arises from this concerns the layout of text within HTML: CSSBox generates information for every line of text (after layout). This is more detailed than the information available through the API provided by most web browsers (as discussed in an earlier section).

One important detail about this backend is that it generates text layout information at different levels of accuracy based on what sort of device the HTML will be rendered on. For print devices (PDF or PostScript output in R), it will use so-called "fractional metrics" for text layout, but for pixel-based devices (screen or PNG), the text layout information is rounded to whole pixels. This is done so that text positioning looks right on both print and on-screen rendering.
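The practical difference between the two modes can be illustrated with a small sketch, assuming text positions in px and a hypothetical (and incomplete) list of device names treated as print devices. This only mimics the effect on positions; it is not how CSSBox computes its font metrics.

```r
## Hypothetical illustration: keep fractional positions for print devices,
## round to whole pixels for pixel-based devices
printDevices <- c("pdf", "postscript", "cairo_pdf", "cairo_ps")

snapPositions <- function(x, device) {
    if (device %in% printDevices) x else round(x)
}

snapPositions(c(10.25, 31.7), "pdf")  # fractional values kept
snapPositions(c(10.25, 31.7), "png")  # rounded to whole pixels
```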

The major downside to CSSBox is that it does not implement as many CSS properties, or support them as well, as modern browsers. For example, CSSBox does not support numeric font-weight values, so this backend defines a fontWeight function (via the cssTransforms argument to makeEngine) that converts numeric font-weight values to "bold" or "normal".
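A sketch of such a conversion, assuming that the transform receives CSS font-weight values as character strings and that weights of 600 and above count as bold (a threshold chosen for illustration; the cutoff used by the backend is not stated here). Non-numeric values such as "bold" pass through unchanged. The function name is hypothetical.

```r
## Hypothetical fontWeight transform: numeric CSS weights -> keywords
## 'weight' is a character vector of CSS font-weight values
fontWeightToKeyword <- function(weight) {
    w <- suppressWarnings(as.numeric(weight))
    ## NA means the value was already a keyword, so keep it as is
    ifelse(is.na(w), weight, ifelse(w >= 600, "bold", "normal"))
}

fontWeightToKeyword(c("400", "700", "bold"))
```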

This backend has no support for modern CSS properties like shape-outside or CSS Grid Layout and it has no javascript engine.

The PhantomJS backend

The 'layoutEnginePhantomJS' backend is based on the PhantomJS program, a headless web browser based on the WebKit browser engine.

The motivation for this backend is that it is based on a modern web browser engine, so it has good support for HTML and CSS features and includes a javascript engine, yet it does not require a graphical user interface; it can perform HTML layout off-screen.

The major downside to this backend is that development of PhantomJS has recently ceased. Even before that, it was always based on a slightly old WebKit engine, so it lacked the most up-to-date HTML and CSS features.

The 'DOM' backend

The 'layoutEngineDOM' backend is based on the 'DOM' package, which provides communication between R and a web browser (via web sockets). A specific web browser can be selected using options(browser=).

The motivation for this backend is that it provides access to the latest web browsers and thereby the latest HTML and CSS features (and javascript).

The major downside to this backend is that, every time an HTML layout occurs, the HTML is rendered in the web browser that R is communicating with, so a new browser window or tab is created on screen every time (although this can be incredibly useful for debugging).

One important detail about this backend is that (at least some) web browsers do not support Type 1 fonts (.pfb or .pfa font files). This is not unreasonable given that the W3C Recommendation only mentions TrueType, OpenType, WOFF (1.0 or 2.0), and SVG Fonts, but it means that, if we are rendering in R onto a standard pdf or postscript graphics device, the Type 1 font files that R will use are not suitable for the web browser to use.

For this reason, the 'layoutEngineDOM' backend automatically converts Type 1 fonts to TrueType fonts (using fontforge) and it defines a fontFile function (via the cssTransforms argument to makeEngine) that converts the font file suffixes from .pfb or .pfa to .ttf.
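A sketch of both steps, assuming that fontforge is installed and on the PATH. The suffix rewrite is a plain regular expression; the conversion uses fontforge's legacy scripting language via its -lang=ff and -c options, where Open() reads the input font and Generate() writes an output font whose format is chosen from the file extension. Both function names are hypothetical, and this is not the 'layoutEngineDOM' code.

```r
## Hypothetical fontFile transform: .pfb/.pfa file name -> .ttf file name
ttfName <- function(path) {
    sub("\\.pf[ab]$", ".ttf", path, ignore.case = TRUE)
}

## Hypothetical converter: Type 1 font -> TrueType font via fontforge
convertType1 <- function(path, outdir = dirname(path)) {
    fontforge <- Sys.which("fontforge")
    if (!nzchar(fontforge))
        stop("fontforge was not found on the PATH")
    out <- file.path(outdir, basename(ttfName(path)))
    ## $1 and $2 refer to the two file arguments that follow the script;
    ## shQuote() stops the shell from expanding them
    status <- system2(fontforge,
                      c("-lang=ff", "-c", shQuote("Open($1); Generate($2)"),
                        shQuote(path), shQuote(out)))
    if (status != 0)
        stop("fontforge failed to convert ", path)
    out
}
```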

Limitations

The 'layoutEngine' package and its three backends provide the ability to render HTML content within R graphics. The examples in the early sections of this document demonstrate that these packages can already perform some useful tasks. However, the packages are all at an early stage of development. They suffer from a number of issues and limitations, which are discussed in this section.

Platform dependency

The packages have been developed and tested on Linux only. There is no obvious reason why 'layoutEngineCSSBox' should not also work on Windows, but the other backends rely on software tools that may not be available on Windows or are at least cumbersome to install (e.g., fontforge for 'layoutEngineDOM').

Device dependency

The packages have only been designed to work on Cairo-based R graphics devices, plus pdf and postscript.

Some devices will be extremely difficult to support, not least because of font compatibility. For example, it would be hard to match or convert the X11 fonts used by the X11 device to fonts that layout backends can use.

On the other hand, it should be possible to extend support to more devices, for example the svg device, which is based on Cairo graphics but is not yet explicitly supported by 'layoutEngine'.

Support for native Windows and MacOS graphics devices is one of the obstacles to overcome for cross-platform support.

Unsupported CSS properties

We have already seen that the 'layoutEngine' package only has support for basic CSS properties at this stage. There is a very long list of properties that could be added. The CSS transform property is just one example of a property that may take significant effort to support.

However, the situation is not quite as dire as that long list of CSS properties would suggest, because the 'layoutEngine' package only needs to provide explicit support for a subset of CSS properties. Some CSS properties, like text-align, only affect where HTML content is rendered; 'layoutEngine' relies on the backend packages to implement those properties (and simply positions its rendering based on the layout result). 'layoutEngine' only needs to worry about properties, such as border-left-style, that affect how the HTML content is rendered.

That is not to say that the 'layoutEngine' package has a small task to perform. Even apparently straightforward properties like borders have only partial support at present. For example, the following code renders a <div> element with a border that has a different width and colour on top than on the three other sides. The result (at the corners) is not what a web browser produces.

grid.html('

There are also a number of border styles, such as groove, ridge, inset, and outset that are not yet supported (they currently get converted to solid, with a warning).

Hyphenation

In HTML layout, the CSS property hyphens can be used to control hyphenation. By default, the value is manual, so a line break will only occur where there is already an explicit hyphen or where a special "soft" hyphen (&shy;) has been specified. However, it is possible to specify the value auto, in which case (assuming that the web browser has a hyphenation algorithm/database) the web browser can decide where to break text all by itself. (Note that the lang attribute must be set in the <html> element to specify the language of the document before hyphenation will work.)

This CSS property is not only unsupported at present, but it is not clear how support could be added.

Unfortunately, CSSBox does not support the hyphens property, and the technique of wrapping each word in <span> tags does not help with hyphenation for the other backends. This means that none of the existing backends can produce layout information for individual pieces of hyphenated text (and may never be able to). The following HTML and CSS code provides a simple demonstration (the correct behaviour is shown below the incorrect result from 'layoutEngine').

HTML <- '
<div>
An extremely long English word.
</div>
'
CSS <- '
div {
    hyphens: auto;
}
'
html <- htmlElement(HTML, CSS)
grid.html(html)

## Not run because DOM:::render() does not produce a file
## (so just using a manually-generated image for now)
library(DOM)
library(xml2)
oldwd <- getwd()
setwd(file.path(tempdir(), "layoutEngineDOM"))
body <- xml_find_first(html$doc, "body")
head <- xml_find_first(html$doc, "head")
page <- htmlPage(body, head, client=phantomClient)
## 'lang' attribute MUST be set for hyphenation to work
htmlNode <- getElementsByTagName(page, "html", nodePtr())
setAttribute(page, htmlNode, "lang", "en")
setwd(oldwd)
DOM:::render(page, "phantom-hyphen.png")
closePage(page)

Pixel resolution

Many of the values that are returned by HTML layout backends are in "px" units. The 'layoutEngine' package assumes that 1px = 1/96in. This is a simplification of the more subtle CSS definition of "px" units, but should work when the backend is run on a standard computer monitor.

For the layout results from a backend (run on a standard computer monitor) to render properly on a pixel-based R graphics device, such as the png device, the resolution of the graphics device should be set to 96 dpi.
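The arithmetic behind this assumption is simple. A sketch using base R and 'grid' (the helper names are hypothetical):

```r
library(grid)

## 'layoutEngine' assumes 1px = 1/96in
pxToInches <- function(px) px / 96

## A 480 x 288 px layout occupies 5 x 3 inches
pxToInches(c(480, 288))  # returns c(5, 3)

## The same conversion as a grid unit, for positioning output
pxUnit <- function(px) unit(pxToInches(px), "in")
```

On a png device opened with png(filename, width = 5, height = 3, units = "in", res = 96), those 5 x 3 inches are exactly 480 x 288 device pixels, so layout px and device pixels coincide; at any other res they would not.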

Performance

Rendering of HTML content in R is not fast. The potential value of the 'layoutEngine' package is to expand what is possible within R graphics, but it comes at a speed cost.

The rendering performance is doomed to some extent because it is based on a layout engine flowing the HTML content and then R rendering it, which is almost double the normal amount of work. R graphics rendering is also much slower than rendering in a web browser. There is also the problem that, at least with some backends, individual words of text are rendered in R one at a time.

There are probably some places where performance could be improved. For example, the 'layoutEngineDOM' backend converts Type 1 fonts to TrueType fonts every time it performs a layout; some sort of caching mechanism could reduce that workload. However, the 'layoutEngine' package is never going to be the fastest way to render HTML content.
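One possible caching mechanism, sketched under stated assumptions: converted fonts are stored in a cache directory and reused unless the source font is newer than the cached copy. The function name and cache layout are hypothetical, and keying on file name and modification time is a simplification (a more robust cache might key on file content). The convert argument is any function of (from, to) that writes the converted file, for example a wrapper around fontforge.

```r
## Hypothetical cache: reuse a converted font unless the source is newer
cachedConvert <- function(path, convert,
                          cacheDir = file.path(tempdir(), "ttf-cache")) {
    dir.create(cacheDir, showWarnings = FALSE, recursive = TRUE)
    out <- file.path(cacheDir,
                     sub("\\.pf[ab]$", ".ttf", basename(path), ignore.case = TRUE))
    ## Convert only if there is no cached copy or it is out of date
    if (!file.exists(out) || file.mtime(out) < file.mtime(path)) {
        convert(path, out)
    }
    out
}
```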

Discussion

The 'layoutEngine' package (together with its backend packages) makes it possible to render HTML content within R graphics. This is useful for producing graphical effects, especially layouts, that are easy or easier to achieve in HTML (even if the HTML code is generated from R) compared to achieving the same result directly in R graphics.

Related work

To the author's knowledge, no other package provides this facility for R, though there are many R packages that perform related functions. In particular, R packages that generate HTML output, such as 'xtable' and 'htmltools', are rich sources of HTML code that the 'layoutEngine' package can then render in R.

There are several R packages that provide an R interface to javascript layout libraries (for arranging HTML content), for example 'RagGrid' and 'htmllayout', but these are aimed at producing HTML rendering in a web browser, not in R.

The 'Rcssplot' package provides CSS styling for R plots, but this is an alternative interface to controlling the appearance of existing R graphics facilities, rather than an interface to new HTML graphics facilities.

Claus Wilke is working on a 'gridtext' package to improve R's text rendering facilities, including support for rich text (simple HTML markup). This overlaps with some of the HTML features that the 'layoutEngine' package provides, though with a narrower focus. On the other hand, 'gridtext' requires far less infrastructure and fewer dependencies than the 'layoutEngine' package.

Future work

The 'layoutEngine' package is far from complete. There are many CSS properties for which support could be added or improved, and there are issues such as breaking lines of text and text encodings that require further work (as discussed in an earlier section).

There are also many other ways that a backend could be developed for the 'layoutEngine' package. For example, a backend based on Selenium (or the 'RSelenium' package) might be a significant improvement on the PhantomJS backend. Another PhantomJS alternative is the SlimerJS headless browser. Instead of the 'DOM' backend, something based on 'shiny' would almost certainly provide a more stable and mature browser interface (and potentially integration with RStudio).

Technical requirements

The examples and discussion in this document relate to version 0.1-0 of the 'layoutEngine' package, version 0.1-0 of the 'layoutEngineCSSBox' package, version 0.1-0 of the 'layoutEnginePhantomJS' package, and version 0.1-0 of the 'layoutEngineDOM' package.

The report is also dependent on version 1.0-0 of the 'gyre' font package and version 1.0-0 of the 'courier' font package.

The report is also dependent on the 'cssbox-4.14-mod' branch of a fork of CSSBox, which is required to allow fractional metrics in the HTML layout calculation.

The report is also dependent on version 0.6-0 of the 'DOM' package. This contains updates for compatibility with recent versions of the 'httpuv' package, a new head argument for the htmlPage function (so that we can inject CSS @font-face rules into a page), and new support for serving files from an assets directory (so that we can provide external resources such as font files and image files to a page).

The report is also dependent on version 0.18 of the 'extrafont' package (which is a fork of Winston Chang's package), to allow install_font to work with font packages that contain .pfa files (in addition to the existing support for .pfb files). This is required for the 'courier' font package.

This report was generated within a Docker container (see Resources section below).

Resources

The raw source file for this report, a valid XML transformation of the source file, a 'knitr' document generated from the XML file, two R files and the bibtex file that are used to generate the table of contents and reference sections, two XSL files and an R file that are used to transform the XML to the 'knitr' document, and a Makefile that contains code for the other transformations and coordinates everything. These materials are also available on github.
This report was generated within a Docker container. The Docker command to build the report is included in the Makefile above. The Docker image for the container is available from Docker Hub; alternatively, the image can be rebuilt from its Dockerfile.
The example Rmd file.
Tim Bock's HTML table code.
The DataTables HTML document.

How to cite this document

Murrell, P. (2018). "Rendering HTML Content in R Graphics." Technical Report 2018-13, Department of Statistics, The University of Auckland.

References
