Revisiting Mathematical Equations in R:
The 'dvir' package

by Paul Murrell http://orcid.org/0000-0002-3224-8858

Version 2

Version 1: original publication
Version 2: update pdf.js code (for displaying PDFs)


This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.

This report describes an R package called 'dvir' that aims to use TeX as a layout engine, but performs all rendering within R. The package reads DVI files that are produced from TeX files and renders the content using the R package 'grid'.

Mathematical Equations in R

The image below shows a 'lattice' line plot of the Standard Normal probability density function, with a text annotation showing the general form of the Gaussian function. This image was drawn with R using the "plotmath" feature, which makes it possible to annotate a plot with mathematical equations.

library(lattice)
x <- seq(-4, 4, length.out=100)
y <- dnorm(x)
xyplot(y ~ x, type="l", ylim=c(0, .6),
       panel=function(...) {
           panel.xyplot(...)
           ltext(0, .5,
                 expression(g(x) == frac(1, sigma*sqrt(2*pi))*e^{-frac(1, 2)*(frac(x - mu, sigma))^2}))
       })

The basic text-drawing functions in R graphics all accept, in addition to a simple character value, an R expression. An R expression is interpreted as a mathematical equation, with certain symbols, such as mu and sigma, converted to Greek characters, and certain functions, such as frac and sqrt, treated as layout instructions similar to the \frac and \sqrt operators in TeX mathematical expressions. The following R code provides a simple example and the result is shown below the code.

expr <- expression(bgroup("(", frac(x - mu, sigma), ")"))
library(grid)
grid.text(expr)

The algorithm used to draw the mathematical equations in R attempts to mimic the algorithm used by TeX, but unfortunately the result is nowhere near the quality of the real thing.

One significant difference arises from the fact that R does not use the TeX math fonts, but it is possible to make use of the TeX fonts with the 'fontcm' extension to the 'extrafont' package, as shown below.

library(extrafont)
font_install('fontcm')
loadfonts("pdf")
pdf("fontcm.pdf", width=1, height=1)
grid.text(expr, gp=gpar(fontfamily="CM Roman"))
dev.off()
embed_fonts("fontcm.pdf", outfile="fontcm-embed.pdf")

While this shows a small improvement (the Greek symbols are TeX's math italic variants), it is still some distance from the TeX result, which is shown below.

writeLines(c("\\documentclass{standalone}",
             "\\begin{document}",
             "$(\\frac{x - \\mu}{\\sigma})$",
             "\\end{document}"),
           "fragment.tex")
system("latex fragment.tex")
system("dvipdfm fragment.dvi")

A different approach to including mathematical equations within R plots is to use the 'tikzDevice' package. This allows us to specify an equation using TeX syntax within character values. For example, we can write code like the following.

grid.text("$\\frac{x - \\mu}{\\sigma}$")

The following code reproduces the plot from the start of this section with the full Gaussian function annotation.

library(tikzDevice)
options(tikzDocumentDeclaration = "\\documentclass[12pt]{article}")
tikz("tikz.tex", standAlone=TRUE, height=4)
## Gaussian function in TeX syntax (backslashes are doubled inside R strings)
tex <- "$g(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}}e^{-\\frac{1}{2}(\\frac{x - \\mu}{\\sigma})^2}$"
xyplot(y ~ x, type="l", ylim=c(0, .6),
       panel=function(...) {
           panel.xyplot(...)
           ltext(0, .5, tex)
       })
dev.off()
system("pdflatex tikz.tex")

This produces a full-quality TeX version of the mathematical equation because the 'tikzDevice' package generates a TeX version (actually a PGF/TikZ version) of the entire plot. This is evident in the fact that the axis and tick labels on the plot are also rendered using TeX fonts.

TeX fonts everywhere is a nice feature if the plot is to be used within a TeX document, but it can be undesirable if all we want is the equation in TeX format.

This report introduces an R package called 'dvir' that allows the plot to be normal R graphics with just the equation rendered in full-quality TeX layout and fonts. The package is in the early stages of development, but it can reproduce plain LaTeX output within R, on a range of graphics devices, on Linux.

The next section describes the convenient high-level interface that the 'dvir' package provides for rendering LaTeX equations in R graphics. Subsequent sections document the lower-level interface and internal design of the 'dvir' package.

The 'dvir' package

The simplest interface provided by the 'dvir' package is the grid.latex function. The first argument to this function is a character value, which is interpreted as LaTeX code. This can be just plain text, but it can also contain, for example, TeX mathematical expressions. The following code provides a simple demonstration.

library(dvir)
grid.latex("$x - \\mu$")

It is also possible to use standard LaTeX commands, as in the following example.

grid.latex("plain, {\\it italic}, and {\\bf bold}")

The following code shows how the grid.latex function can be used to generate a complete plot with Gaussian function annotation (the LaTeX string tex was defined in the 'tikzDevice' example above).

xyplot(y ~ x, type="l", ylim=c(0, .6),
       panel=function(...) {
           panel.xyplot(...)
           grid.latex(tex, 0, .5, default.units="native")
       })

In the following example, we use the 'xtable' package to generate LaTeX code for a table and then 'dvir' to draw the table within a 'lattice' plot.

library(xtable)
xyplot(mpg ~ disp, mtcars,
       panel=function(...) {
           panel.xyplot(...)
           tex <- print(xtable(head(mtcars[1:3])), floating=FALSE)
           grid.latex(tex,
                      x=unit(1, "npc") - unit(2, "mm"),
                      y=unit(1, "npc") - unit(2, "mm"),
                      just=c("right", "top"))
       })

The grid.latex function is just a thin wrapper around a call to LaTeX, which generates a DVI file, followed by calls to functions that read the DVI file and render its contents in R. These functions that read and render DVI files make up the real heart of the 'dvir' package and provide the focus for the remainder of this report.
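The wrapper described above can be pictured with a minimal sketch. This is not the 'dvir' source code: the helper names latexDocument and latexSketch are hypothetical, and the use of the standalone document class is an assumption borrowed from the earlier fragment.tex example. The sketch only runs LaTeX and draws when both the latex program and the 'dvir' package are available.

```r
## Hypothetical helper: wrap a LaTeX fragment in a complete document
## (the 'standalone' class is an assumption, not necessarily what 'dvir' uses)
latexDocument <- function(tex) {
    c("\\documentclass{standalone}",
      "\\begin{document}",
      tex,
      "\\end{document}")
}

## Hypothetical sketch of the workflow: LaTeX -> DVI -> 'grid' drawing
latexSketch <- function(tex) {
    texFile <- tempfile(fileext=".tex")
    writeLines(latexDocument(tex), texFile)
    ## Run latex in the temporary directory so that the DVI file lands there
    oldwd <- setwd(dirname(texFile))
    on.exit(setwd(oldwd))
    system(paste("latex", shQuote(basename(texFile))))
    dviFile <- sub("[.]tex$", ".dvi", texFile)
    ## readDVI and grid.dvi are the 'dvir' functions described in this report
    dvir::grid.dvi(dvir::readDVI(dviFile))
}

## Only attempt the full workflow when the required tools are present
if (nzchar(Sys.which("latex")) && requireNamespace("dvir", quietly=TRUE)) {
    latexSketch("$x - \\mu$")
}
```

Separating the document template from the rendering step keeps the first part usable (and checkable) on a machine without a TeX installation.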

Reading DVI files

A standard LaTeX workflow consists of writing a source file containing LaTeX text and code (suffix .tex) and then running pdflatex on that file to produce a PDF document. An alternative is to run latex, which produces an intermediate DVI file (suffix .dvi), and then dvips or dvisvgm (or even dvipdfm) to produce a PostScript or SVG (or PDF) document from the DVI file.

A DVI file is a device-independent description of the placement of individual characters on a page. It contains instructions that move the current location across or up and down, define fonts, select fonts, place characters at the current location, and draw vertical and horizontal rectangles.

The 'dvir' package provides a function readDVI to read a DVI file into R. For example, the following LaTeX code describes a very simple document that just contains the word "Hello". This code has been saved in a text file called "simple.tex".

cat(readLines("simple.tex"), sep="\n")

If we run latex on that file ...

system("latex simple.tex")

... we get a DVI file called simple.dvi and the following code reads that DVI file into R.

dvi <- readDVI("simple.dvi")
dvi

The DVI format is a binary format, so 'dvir' uses the 'hexView' package to define memory blocks for each possible DVI operation and to read those memory blocks from the DVI file. The result of readDVI is a "DVI" object, which is a list of 'hexView' "rawFormat" objects ...

class(dvi[[1]])
dvi[[1]]

... and it is easy to run through the list of DVI operations simply by calling lapply (or sapply) on this list. For example, the following code generates a numeric vector containing all of the operation codes from the DVI file.

sapply(dvi,
       function(op) {
           hexView::blockValue(op$blocks$op.opcode)
       })
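Numeric operation codes are hard to read on their own. The following sketch, which is independent of 'dvir', maps a code to the family of DVI operation that it belongs to. The function name dviOpName is hypothetical, the ranges are written from the standard DVI opcode table, and the labels are family names (for example, "fnt_def" covers all four font definition opcodes) rather than the exact labels that 'dvir' prints.

```r
## Hypothetical helper: map DVI opcodes (0 to 255) to operation families
dviOpName <- function(code) {
    ## Each break is the first opcode of a family
    breaks <- c(0, 128, 132, 133, 137, 138, 139, 140, 141, 142,
                143, 147, 148, 152, 153, 157, 161, 162, 166, 167,
                171, 235, 239, 243, 247, 248, 249, 250)
    labels <- c("set_char", "set", "set_rule", "put", "put_rule",
                "nop", "bop", "eop", "push", "pop",
                "right", "w0", "w", "x0", "x",
                "down", "y0", "y", "z0", "z",
                "fnt_num", "fnt", "xxx", "fnt_def",
                "pre", "post", "post_post", "undefined")
    labels[findInterval(code, breaks)]
}

## Made-up sequence of opcodes, for illustration only
dviOpName(c(247, 139, 243, 171, 72, 140))
## "pre" "bop" "fnt_def" "fnt_num" "set_char" "eop"
```

Applied to the vector of opcodes extracted from a "DVI" object, a helper like this gives a quick overview of which operations a DVI file contains, for example via table().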

Rendering DVI files

The grid.dvi function renders a "DVI" object, by converting the DVI instructions from a DVI file into 'grid' drawing on an R graphics device.

grid.dvi(dvi)

The essential steps in faithfully rendering the DVI file are as follows:

coordinate systems: Convert DVI locations and distances into an R graphics coordinate system
font mappings: Convert DVI font definitions into R graphics font specifications
character encodings: Convert DVI characters into R character values

Coordinate systems

The DVI coordinate system has (0, 0) at top-left and the scale of locations and distances is defined in the first "preamble" operation in the DVI file.

dvitxt <- capture.output(dvi)
cat(dvitxt[1:2], sep="\n")
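To make the preamble concrete, the following base R sketch builds the bytes of a preamble in memory and reads them back. It does not use 'dvir' or 'hexView'. The function names int4bytes and readPreamble are hypothetical, the byte layout (opcode 247, a format id byte, three 4-byte big-endian integers, and a length-prefixed comment) follows the DVI format description, and the numerator and denominator are the values that TeX conventionally writes; the comment text is made up.

```r
## Hypothetical helper: a 4-byte big-endian integer as raw bytes
int4bytes <- function(x) {
    writeBin(as.integer(x), raw(), size=4, endian="big")
}

## Build an example preamble in memory
comment <- "example comment"
pre <- c(as.raw(247),              # opcode for "pre"
         as.raw(2),                # DVI format id
         int4bytes(25400000),      # numerator
         int4bytes(473628672),     # denominator
         int4bytes(1000),          # magnification
         as.raw(length(charToRaw(comment))),
         charToRaw(comment))

## Hypothetical helper: decode the fields of a preamble from raw bytes
readPreamble <- function(bytes) {
    byte <- function(i) as.integer(bytes[i])
    int4 <- function(i) {
        readBin(bytes[i:(i + 3)], "integer", n=1, size=4, endian="big")
    }
    k <- byte(15)                  # length of the comment in bytes
    list(opcode=byte(1), id=byte(2),
         num=int4(3), den=int4(7), mag=int4(11),
         comment=rawToChar(bytes[15 + seq_len(k)]))
}

str(readPreamble(pre))
```

The three integers recovered here are the numerator, denominator, and magnification that determine the scale of every location and distance in the rest of the file.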

We multiply a location or distance by the numerator, divide by the denominator, multiply by the magnification, and divide by 1000 to get a value in units of 10^(-7) m (that is, 10^(-4) mm).
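The arithmetic can be checked with a short sketch. The function name dviToMM is hypothetical, the default numerator and denominator are the values that TeX conventionally writes (so that one DVI unit is one scaled point, with 65536 scaled points per TeX point), and the scaled result is assumed to be in units of 10^(-7) metres, as the DVI format defines it.

```r
## Hypothetical helper: convert DVI units to millimetres
dviToMM <- function(x, num=25400000, den=473628672, mag=1000) {
    ## x * num / den * mag / 1000 is in units of 10^(-7) m;
    ## multiplying by 10^(-7) gives metres and by 1000 gives mm
    x * (num / den) * (mag / 1000) * 1e-7 * 1000
}

## 12 TeX points expressed in DVI units
dviToMM(12 * 65536)

## The same length calculated directly (72.27 TeX points per inch)
12 * 25.4 / 72.27
```

Both expressions give the same length, roughly 4.22 mm, which is consistent with a 12pt font size.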

The 'grid' package can specify locations and dimensions in mm via the unit function. What the 'dvir' package actually does is calculate a bounding box from the DVI operations, create a 'grid' viewport based on the size of that bounding box (in mm), with an x-scale and a y-scale that encompass the DVI operations, and render the DVI operations using "native" coordinates within the viewport.
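The following sketch shows one way that such a viewport could be set up with 'grid' alone. It is not taken from the 'dvir' source: the bounding box values and the function name dviViewport are hypothetical, and reversing the y-scale is an assumption about how to accommodate the DVI convention of y values that increase down the page.

```r
library(grid)

## Hypothetical bounding box in DVI units (y increases downwards)
bbox <- list(left=0, right=2000000, top=0, bottom=800000)

## mm per DVI unit, assuming TeX's usual numerator and denominator
## and a magnification of 1000
mmPerUnit <- 25400000 / 473628672 * 1e-4

## Hypothetical helper: a viewport the size of the bounding box
dviViewport <- function(bbox, mmPerUnit) {
    viewport(width=unit((bbox$right - bbox$left) * mmPerUnit, "mm"),
             height=unit((bbox$bottom - bbox$top) * mmPerUnit, "mm"),
             xscale=c(bbox$left, bbox$right),
             ## Reversed so that larger DVI y values are lower on the page
             yscale=c(bbox$bottom, bbox$top))
}

pdf(NULL)  # off-screen device, so that no file is created
pushViewport(dviViewport(bbox, mmPerUnit))
grid.rect(gp=gpar(col="grey"))
## DVI locations can now be used directly as "native" coordinates
grid.text("x", x=unit(1000000, "native"), y=unit(400000, "native"))
popViewport()
invisible(dev.off())
```

With the scales set up this way, no per-operation conversion is needed: each DVI location is passed to 'grid' unchanged and the viewport takes care of the mapping to physical size.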

Font mappings

The most important part of a DVI font definition is the fontname. In our simple example, this name is cmr12 (a Computer Modern Roman serif font at 12pt size).

fontdef <- grep("fnt_def_1", dvitxt)[1] + 0:1
cat(dvitxt[fontdef], sep="\n")

We must generate an R graphics font specification from just this font name, a task that is complicated by the fact that font specifications are different for different graphics devices in R.

In the case of the pdf graphics device, we specify a font by giving the name of a Type 1 Font definition and we define a Type 1 Font by specifying a path to an AFM (Adobe Font Metrics) file. We also need to find a path to a PFB (Printer Font Binary) file so that we can embed the actual font within the final PDF file.

The 'dvir' package uses the kpsewhich program to first find the font mapping file pdftex.map, which contains information on mappings from font names (as seen in the DVI file) to actual font files, and then kpsewhich again to find the actual font files. Some typical results on an Ubuntu system with TeX Live are shown below. First, we have the location of the pdftex.map file ...

mapfile <- system("kpsewhich pdftex.map", intern=TRUE)
mapfile

... and the line in this file for the cmr12 DVI font name shows the name of an actual font file at the end of the line ...

system("grep ^cmr12 $(kpsewhich pdftex.map)", intern=TRUE)
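Extracting the font file name from such a line can be sketched in base R. The example line below is hypothetical and is only assumed to follow the usual pdftex.map layout (TeX font name, PostScript font name, then entries prefixed with "<"); the function name mapFontFile is also hypothetical, and this is not necessarily how 'dvir' parses the file.

```r
## Hypothetical map file line, for illustration only
mapline <- "cmr12 CMR12 <cmr12.pfb"

## Hypothetical helper: extract PFB file names from a map file line
mapFontFile <- function(line) {
    fields <- strsplit(line, "[[:space:]]+")[[1]]
    ## File entries are the fields that start with "<"
    files <- sub("^<+", "", grep("^<", fields, value=TRUE))
    ## Keep only font files (lines may also name encoding files)
    grep("[.]pfb$", files, value=TRUE)
}

mapFontFile(mapline)
## "cmr12.pfb"
```

The result is a file name without a directory, which is why a further kpsewhich call is needed to locate the file itself.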

We can get the location of this PFB file ...

pfbfile <- system("kpsewhich cmr12.pfb", intern=TRUE)
pfbfile

... and the location of the corresponding AFM file ...

afmfile <- system("kpsewhich cmr12.afm", intern=TRUE)
afmfile

This gives us enough information to create a Type 1 Font definition called "cmr12" in R ...

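## Register the definition with the pdf device so that "cmr12" can be used as a font family
## (the same AFM file is used for all four font faces)
pdfFonts(cmr12=Type1Font("cmr12", rep(afmfile, 4)))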

... and we can then draw text with 'grid' using that font (on a pdf device) by specifying "cmr12" as the font family ...

pdf("test.pdf", width=1, height=.5)
grid.text("Test", gp=gpar(fontfamily="cmr12"))
dev.off()

For the resulting PDF file to display properly, it is best to embed the fonts in the PDF file with the embedFonts function. This requires us to specify the locations of the PFB files to embed.

embedFonts("test.pdf", outfile="test-embed.pdf",
           options=paste0("-sFONTPATH=", pfbfile))
