by Paul Murrell http://orcid.org/0000-0002-3224-8858
Version 2:
Version 1: original publication
Version 2: update pdf.js code (for displaying PDFs)
This document by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.
This report describes an R package called 'dvir' that aims to use TeX as a layout engine, but performs all rendering within R. The package reads DVI files that are produced from TeX files and renders the content using the R package 'grid'.
The image below shows a 'lattice' line plot of the Standard Normal probability density function, with a text annotation showing the general form of the Gaussian function. This image was drawn with R using the "plotmath" feature that makes it possible to annotate a plot with mathematical equations.
The basic text-drawing functions in R graphics all accept, in addition to a simple character value, an R expression. An R expression is interpreted as a mathematical equation, with certain symbols, such as mu and sigma, converted to greek characters, and certain functions, such as frac and sqrt, treated as layout instructions similar to the \frac and \sqrt operators in TeX mathematical expressions.
The following R code provides a simple
example and the result is shown below the code.
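As a minimal sketch of this idea (the particular expression is an illustrative choice), the following passes an R expression, rather than a character value, to grid.text so that it is laid out as an equation.

```r
library(grid)

# An R expression, not a character value: 'mu', 'sigma' and 'pi' become
# greek letters, while frac() and sqrt() act as layout instructions.
eq <- expression(frac(1, sigma * sqrt(2 * pi)) ~~
                 e^{-frac((x - mu)^2, 2 * sigma^2)})

grid.newpage()
grid.text(eq)
```

The same expression could be given to other text-drawing functions, for example as an axis label or plot title.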
The algorithm used to draw the mathematical equations in R attempts to mimic the algorithm used by TeX, but unfortunately the result is nowhere near the quality of the real thing.
One significant difference arises from the fact that R does not use the TeX math fonts, but it is possible to make use of the TeX fonts, with the 'fontcm' extension to the 'extrafont' package, as shown below.
While this shows a small improvement (the greek symbols are TeX's math italic variants), it is still some distance from the TeX result, which is shown below.
A different approach to including mathematical equations within R plots is to use the 'tikzDevice' package. This allows us to specify an equation using TeX syntax within character values. For example, we can write code like the following.
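A minimal sketch of such a character value is given below. The name tex and the exact form of the equation are assumptions made for illustration; with 'tikzDevice', a string like this would be supplied as a label while a device opened with its tikz function is active.

```r
# TeX source for a Gaussian function, stored as an ordinary R character
# value; backslashes are doubled because of R's string escaping rules.
tex <- "$\\frac{1}{\\sigma\\sqrt{2\\pi}} e^{-\\frac{(x - \\mu)^2}{2\\sigma^2}}$"

# cat() shows the string as TeX would see it, with single backslashes.
cat(tex, "\n")
```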
The following code reproduces the plot from the start of this section with the full Gaussian function annotation.
This produces a full-quality TeX version of the mathematical equation because the 'tikzDevice' package generates a TeX version (actually a PGF/TikZ version) of the entire plot. This is evident in the fact that the axis and tick labels on the plot are also rendered using TeX fonts.
TeX fonts everywhere is a nice feature if the plot is to be used within a TeX document, but it can be undesirable if all we want is the equation in TeX format.
This report introduces an R package called 'dvir' that allows the plot to be normal R graphics with just the equation rendered in full-quality TeX layout and fonts. The package is in the early stages of development, but it can reproduce plain LaTeX output within R, on a range of graphics devices, on Linux.
The next section describes the convenient high-level interface that the 'dvir' package provides for rendering LaTeX equations in R graphics. Subsequent sections document the lower-level interface and internal design of the 'dvir' package.
The simplest interface provided by the 'dvir' package is the grid.latex function. The first argument to this function is a character value, which is interpreted as LaTeX code. This can be just plain text, but it can also contain, for example, TeX mathematical expressions. The following code provides a simple demonstration.
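A call of this kind might look like the sketch below. The LaTeX string is an illustrative assumption, and the call is guarded because it only works if the 'dvir' package is installed and a latex command is available; only the first argument of grid.latex, as described above, is relied upon.

```r
# Plain text mixed with a TeX mathematical expression (illustrative only).
tex <- "The mean is $\\bar{x} = \\frac{1}{n}\\sum_{i=1}^{n} x_i$"

# Only draw when both 'dvir' and a 'latex' command can be found.
if (requireNamespace("dvir", quietly = TRUE) && nzchar(Sys.which("latex"))) {
  grid::grid.newpage()
  dvir::grid.latex(tex)
}
```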
It is also possible to use standard LaTeX commands, as in the following example.
The following code shows how the grid.latex function can be used to generate a complete plot with Gaussian function annotation (the LaTeX string tex was defined in the 'tikzDevice' example above).
In the following example, we use the 'xtable' package to generate LaTeX code for a table and then 'dvir' to draw the table within a 'lattice' plot.
The grid.latex function is just a thin wrapper around a call to LaTeX, which generates a DVI file, followed by calls to functions that read the DVI file and render its contents in R. These functions that read and render DVI files make up the real heart of the 'dvir' package and provide the focus for the remainder of this report.
A standard LaTeX workflow consists of writing a source file containing LaTeX text and code (suffix .tex) and then running pdflatex on that file to produce a PDF document. An alternative is to run latex, which produces an intermediate DVI file (suffix .dvi), and then dvips or dvisvgm (or even dvipdfm) to produce a PostScript or SVG (or PDF) document from the DVI file.
A DVI file is a device-independent description of the placement of individual characters on a page. It contains instructions that move the current location across or up and down, define fonts, select fonts, place characters at the current location, and draw vertical and horizontal rectangles.
The 'dvir' package provides a function readDVI to read a DVI file into R. For example, the following LaTeX code describes a very simple document that just contains the word "Hello". This code has been saved in a text file called "simple.tex".
If we run latex on that file ...
... we get a DVI file called simple.dvi and the following code reads that DVI file into R.
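The whole sequence can be sketched from within R as follows. The contents of the LaTeX document and the use of a temporary directory are assumptions made for illustration, and the external steps are guarded because they need a latex command and the 'dvir' package.

```r
# Write a very small LaTeX document to a temporary directory.
# (The exact document contents are an assumption.)
work_dir <- tempfile("dvi-demo")
dir.create(work_dir)
tex_file <- file.path(work_dir, "simple.tex")
writeLines(c("\\documentclass[12pt]{article}",
             "\\begin{document}",
             "Hello",
             "\\end{document}"),
           tex_file)

# Run 'latex' (not 'pdflatex') so that the output is simple.dvi.
if (nzchar(Sys.which("latex"))) {
  old_dir <- setwd(work_dir)
  system2("latex", c("-interaction=nonstopmode", "simple.tex"),
          stdout = FALSE)
  setwd(old_dir)
  dvi_file <- file.path(work_dir, "simple.dvi")
  # Read the DVI file into R if it was produced and 'dvir' is installed.
  if (file.exists(dvi_file) && requireNamespace("dvir", quietly = TRUE)) {
    dvi <- dvir::readDVI(dvi_file)
  }
}
```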
The DVI format is a binary format, so 'dvir' uses the 'hexView' package to define memory blocks for each possible DVI operation and to read those memory blocks from the DVI file.
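To give a feel for what reading a binary DVI operation involves, the sketch below decodes a "preamble" operation using base R's readBin instead of 'hexView' (so it is not how 'dvir' itself does the job). The bytes are built in memory rather than read from a file, and the values used are assumptions chosen to resemble typical TeX output. The layout follows the DVI format: a one-byte operation code (247), a one-byte format version, three four-byte big-endian integers (num, den, mag), a one-byte comment length, and the comment itself.

```r
# Build a synthetic preamble (illustrative values).
comment <- charToRaw(" TeX output")
pre <- c(as.raw(247), as.raw(2),
         writeBin(25400000L,  raw(), size = 4, endian = "big"),
         writeBin(473628672L, raw(), size = 4, endian = "big"),
         writeBin(1000L,      raw(), size = 4, endian = "big"),
         as.raw(length(comment)), comment)

# Decode the fields in order from a connection onto the raw bytes.
con <- rawConnection(pre, "r")
opcode  <- readBin(con, "integer", n = 1, size = 1, signed = FALSE)
version <- readBin(con, "integer", n = 1, size = 1, signed = FALSE)
num     <- readBin(con, "integer", n = 1, size = 4, endian = "big")
den     <- readBin(con, "integer", n = 1, size = 4, endian = "big")
mag     <- readBin(con, "integer", n = 1, size = 4, endian = "big")
k       <- readBin(con, "integer", n = 1, size = 1, signed = FALSE)
text    <- rawToChar(readBin(con, "raw", n = k))
close(con)
```

A real DVI file could be handled the same way by opening it with file(path, "rb") in place of the raw connection.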
The result of readDVI is a "DVI" object, which is a list of 'hexView' "rawFormat" objects ...
... and it is easy to run through the list of DVI operations simply by calling lapply (or sapply) on this list. For example, the following code generates a numeric vector containing all of the operation codes from the DVI file.
The grid.dvi function renders a "DVI" object by converting the DVI instructions from a DVI file into 'grid' drawing on an R graphics device.
The essential steps in faithfully rendering the DVI file are as follows:
The DVI coordinate system has (0, 0) at top-left and the scale of locations and distances is defined in the first "preamble" operation in the DVI file.
We multiply a location or distance by the numerator (num), divide by the denominator (den), multiply by the magnification (mag), and divide by 1000 to get a value in units of 10^(-7) m (that is, 10^(-4) mm).
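The scaling can be checked with a small worked example. The values of num, den and mag below are assumptions (they are the values that TeX normally writes), the function name dvi_to_mm is hypothetical, and the scaled result is taken to be in units of 10^(-7) metres, as the DVI format specifies, so a further factor of 10^(-4) gives mm.

```r
# Scaling values from a DVI preamble (assumed, typical of TeX output).
num <- 25400000
den <- 473628672
mag <- 1000

# Convert DVI units to mm: scale to units of 1e-7 m, then use the
# fact that 1e-7 m is 1e-4 mm.
dvi_to_mm <- function(x, num, den, mag) {
  x * num / den * mag / 1000 * 1e-4
}

# With these values, 65536 DVI units make up one TeX point.
one_pt_mm <- dvi_to_mm(65536, num, den, mag)
```

With these values one TeX point comes out at about 0.3515 mm, which agrees with 25.4 / 72.27.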
The 'grid' package can specify locations and dimensions in mm via the unit function. What the 'dvir' package actually does is calculate a bounding box from the DVI operations, create a 'grid' viewport based on the size of that bounding box (in mm), with an x-scale and a y-scale that encompass the DVI operations, and render the DVI operations using "native" coordinates within the viewport.
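The viewport idea can be sketched as follows. The bounding box values, the conversion factor and the reversed y-scale (so that larger DVI values are lower on the page) are all assumptions made for illustration; the code inside 'dvir' may differ in its details.

```r
library(grid)

# Hypothetical bounding box in DVI units, with (0, 0) at top-left
# and y increasing down the page.
bbox <- list(left = 0, right = 3000000, top = 0, bottom = 800000)

# mm per DVI unit for typical TeX values of num and den (mag = 1000).
mm_per_unit <- 25400000 / 473628672 * 1e-4

vp <- viewport(width  = unit((bbox$right - bbox$left) * mm_per_unit, "mm"),
               height = unit((bbox$bottom - bbox$top) * mm_per_unit, "mm"),
               xscale = c(bbox$left, bbox$right),
               # Reversed y-scale: larger values are lower on the page.
               yscale = c(bbox$bottom, bbox$top))

grid.newpage()
pushViewport(vp)
# Locations can now be given directly in DVI units ("native").
grid.rect(gp = gpar(col = "grey"))
grid.text("Hello", x = unit(0, "native"), y = unit(600000, "native"),
          just = c("left", "bottom"))
popViewport()
```

Working in "native" coordinates means that only the viewport size needs converting to mm; every location within it stays in DVI units.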
The most important part of a DVI font definition is the fontname. In our simple example, this name is cmr12 (a Computer Modern Roman serif font at 12pt size).
We must generate an R graphics font specification from just this font name, a task that is complicated by the fact that font specifications are different for different graphics devices in R.
In the case of the pdf graphics device, we specify a font by giving the name of a Type 1 Font definition and we define a Type 1 Font by specifying a path to an AFM (Adobe Font Metrics) file. We also need to find a path to a PFB (Printer Font Binary) file so that we can embed the actual font within the final PDF file.
The 'dvir' package uses the kpsewhich program to first find the font mapping file pdftex.map, which contains information on mappings from font names (as seen in the DVI file) to actual font files, and then kpsewhich again to find the actual font files.
Some typical results on an Ubuntu system with TeX Live are shown below.
First, we have the location of the pdftex.map file ...
... and the line in this file for the cmr12 DVI font name shows the name of an actual font file at the end of the line ...
We can get the location of this PFB file ...
... and the location of the corresponding AFM file ...
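These lookups can be sketched in R as below. The helper name find_tex_file is hypothetical, and the pattern used to pick the font file name out of the map line assumes that the line ends with an entry of the form <cmr12.pfb. Every step falls back to NA if kpsewhich or the file is not available.

```r
# Look up a file in the TeX installation; NA if 'kpsewhich' is missing
# or the file cannot be found.
find_tex_file <- function(name) {
  if (!nzchar(Sys.which("kpsewhich"))) return(NA_character_)
  out <- suppressWarnings(system2("kpsewhich", shQuote(name),
                                  stdout = TRUE, stderr = FALSE))
  if (length(out) > 0) out[1] else NA_character_
}

map_file <- find_tex_file("pdftex.map")

# Find the mapping line for the DVI font name and take the PFB file
# named at the end of that line.
pfb_name <- NA_character_
if (!is.na(map_file)) {
  map_line <- grep("^cmr12 ", readLines(map_file), value = TRUE)
  if (length(map_line) > 0) {
    pfb_name <- sub("^.*<([^ <]+\\.pfb)\\s*$", "\\1", map_line[1])
  }
}

pfb_file <- if (is.na(pfb_name)) NA_character_ else find_tex_file(pfb_name)
afm_file <- find_tex_file("cmr12.afm")
```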
This gives us enough information to create a Type 1 Font definition called "cmr12" in R ...
... and we can then draw text with 'grid' using that font (on a pdf device) by specifying "cmr12" as the font family ...
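A sketch of these two steps follows. The AFM path is a hypothetical example of what kpsewhich might report, the default encoding is assumed to be adequate for plain letters, and drawing only happens if the AFM file really exists.

```r
library(grid)

# Hypothetical path; in practice this comes from 'kpsewhich cmr12.afm'.
afm_file <- "/usr/share/texlive/texmf-dist/fonts/afm/public/amsfonts/cm/cmr12.afm"

# A Type 1 font needs four metric files (plain, bold, italic and
# bold-italic); the same AFM file is reused for all four here.
cmr12 <- Type1Font("cmr12", rep(afm_file, 4))
pdfFonts(cmr12 = cmr12)

# Only draw if the AFM file is present on this system.
if (file.exists(afm_file)) {
  pdf(file.path(tempdir(), "cmr12.pdf"))
  grid.text("Hello", gp = gpar(fontfamily = "cmr12", fontsize = 12))
  dev.off()
}
```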
For the resulting PDF file to display properly, it is best to embed the fonts in the PDF file with the embedFonts function. This requires us to specify the locations of the PFB files to embed.
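One way to wrap this step is sketched below. The function name embed_tex_fonts and the naming of the output file are assumptions; embedFonts itself relies on Ghostscript being installed, so the function is only defined here, not called.

```r
# Embed fonts in a PDF file, given the PFB files that the PDF uses.
# The directories holding the PFB files are passed as font paths.
embed_tex_fonts <- function(pdf_file, pfb_files,
                            out_file = sub("\\.pdf$", "-embedded.pdf",
                                           pdf_file)) {
  embedFonts(pdf_file, outfile = out_file,
             fontpaths = unique(dirname(pfb_files)))
  out_file
}
```

A call would then take the form embed_tex_fonts("cmr12.pdf", pfb_file), where pfb_file is the path reported by kpsewhich for the PFB file.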