Including fancy glyphs in R Graphics PDF output
by Paul Murrell

The Problem

An example of the problem was presented by Ivo Welch (R-help 2010-08-18). He wanted to produce a lowercase-script-l, ℓ ("\ell" in LaTeX parlance). This is pretty straightforward on recent screen devices: you just use the appropriate Unicode input. For example, ...

grid.text("A lowercase-script-l: \u2113")

... draws the label on screen with the ℓ glyph rendered as expected.

However, it is not that straightforward on the standard PDF device ...

> pdf("ell.pdf")
> grid.text("A lowercase-script-l: \u2113")
Warning messages:
1: In grid.Call.graphics("L_text", as.graphicsAnnot(x$label), x$x,  :
  conversion failure on 'A lowercase-script-l: ℓ' in 'mbcsToSbcs': dot substituted for <e2>
2: In grid.Call.graphics("L_text", as.graphicsAnnot(x$label), x$x,  :
  conversion failure on 'A lowercase-script-l: ℓ' in 'mbcsToSbcs': dot substituted for <84>
3: In grid.Call.graphics("L_text", as.graphicsAnnot(x$label), x$x,  :
  conversion failure on 'A lowercase-script-l: ℓ' in 'mbcsToSbcs': dot substituted for <93>

The problem is that the standard PDF device can only cope with single-byte encodings, which means that all multi-byte text has to be converted to a single-byte equivalent. As the warning messages above indicate, this conversion is not always possible (because a single-byte encoding cannot accommodate as many characters as a multi-byte encoding).

A Simple Solution

There's an easy solution: Use a Cairo-based PDF device, which can cope with multi-byte text.

On Linux or MacOS X (with Cairo [and Pango?] capability), something like this:

grid.text("A lowercase-script-l: \u2113")

On Windows, with the Cairo package installed, something like this:

grid.text("A lowercase-script-l: \u2113")

NOTE that these depend on having appropriate fonts installed.
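
The two cases differ only in how the device is opened. As a minimal sketch (the output file name is an assumption, and which branch runs depends on how your R was built and whether the Cairo package is installed), the complete sequence might look like this:

```r
library(grid)

# Output file name is an assumption for this sketch.
out <- file.path(tempdir(), "ell-cairo.pdf")

haveCairo <- isTRUE(unname(capabilities("cairo")))
havePkg   <- requireNamespace("Cairo", quietly = TRUE)

if (haveCairo) {
    # Built-in Cairo-based PDF device
    cairo_pdf(out)
} else if (havePkg) {
    # PDF device provided by the add-on Cairo package
    Cairo::CairoPDF(out)
}

if (haveCairo || havePkg) {
    grid.text("A lowercase-script-l: \u2113")
    dev.off()
} else {
    message("No Cairo-based PDF device is available in this R session")
}
```

Whether the glyph actually appears still depends on the fonts that Cairo can find on the system.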

A Harder Solution

There are several reasons to look further. The cairo_pdf() device is still "experimental"; you may not want to install the Cairo package (or your work policies may not allow it); and you may not have the right fonts installed. Besides, although it is complex, the alternative is quite interesting: it is possible to get this to work on the standard PDF graphics device with just a little bit of work.

There are two things we need to do: find a single-byte encoding that contains a lowercase-script-l (so that we can convert the multi-byte text to that encoding); and find a font that contains a lowercase-script-l.

The first step is unfortunately a bit hard. Standard single-byte encodings like ISO-8859-1 (Latin1) and WinAnsi do not include lowercase-script-l. In fact, as far as I can tell, there is no single-byte encoding that has it. But, the good news is that we can quite easily make one (for use with R's PDF device).

Take a copy of something standard, like ISOLatin1.enc from $R_HOME/library/grDevices/enc/, which has some /.notdef entries, and modify it to replace one of the /.notdef entries with /lscript (we will see why that is "/lscript" later). The file special.enc shows an example (see the entry at location \200; we will see why that is "\200" later). This is a custom encoding file that can be used with R's PDF device.
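
The edit can be made by hand in a text editor, but it can also be scripted. The following is only a sketch: the helper replaceEncodingSlot() and the encoding name /SpecialEncoding are hypothetical, and it assumes the encoding file has the usual layout of a name, an opening "[", 256 glyph names and a closing "]". It rewrites the file rather than editing it in place, so the layout of the result may differ from the special.enc mentioned above.

```r
# Hypothetical helper: return the lines of a new encoding file in which
# one unused slot has been given a glyph name.
# 'slot' is the 0-based character code (128 is octal \200).
replaceEncodingSlot <- function(lines, slot, glyph,
                                name = "/SpecialEncoding") {
    lines  <- sub("%.*$", "", lines)              # drop comments
    tokens <- unlist(strsplit(paste(lines, collapse = " "), "[[:space:]]+"))
    tokens <- tokens[nzchar(tokens)]
    open   <- match("[", tokens)
    stopifnot(!is.na(open), length(tokens) >= open + 256)
    glyphs <- tokens[open + 1:256]                # the 256 glyph names
    if (glyphs[slot + 1] != "/.notdef")
        stop("slot ", slot, " is already used by ", glyphs[slot + 1])
    glyphs[slot + 1] <- glyph
    # The encoding gets a name of its own (an assumption of this sketch),
    # so that it is not mistaken for the standard encoding it was copied from.
    c(paste(name, "["),
      apply(matrix(glyphs, ncol = 8, byrow = TRUE), 1, paste, collapse = " "),
      "]")
}

# Usage: start from R's own ISOLatin1.enc; the result is written to a
# temporary directory here, so copy it to wherever it is needed.
encFile <- system.file("enc", "ISOLatin1.enc", package = "grDevices")
if (nzchar(encFile)) {
    writeLines(replaceEncodingSlot(readLines(encFile), 128, "/lscript"),
               file.path(tempdir(), "special.enc"))
}
```

The function stops rather than overwriting a slot that is already in use, because replacing a real glyph would silently break ordinary text drawn with the same encoding.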

The second step (finding a font) is not too bad. The tricky part is finding out what glyph name is used for lowercase-script-l. We know that LaTeX can produce a lowercase-script-l (via \ell), so one of the Computer Modern fonts is bound to include that glyph. Turns out the Math Italic font is the one.

The AMS provides Type1 versions of the Computer Modern fonts, so we can get .afm and .pfb files from there. Inspection of the .afm file and matching up with the TeX font tables reveals that lowercase-script-l is named lscript in the .afm file, which is why we put "/lscript" in our special encoding file.

The next step is to create a Type1Font description in R that uses this font with our special encoding and register this font with the R PDF device (for simplicity, this code assumes that the .afm and the .enc files are in the current working directory).

lscriptFont <- Type1Font(family="special",
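
The call above is cut short in this copy. As a hedged sketch of what a complete definition and registration could look like: the file names cmmi10.afm (for the Computer Modern Math Italic metrics) and special.enc are assumptions, and the leading "./" is there because file names without a path are looked up in R's own afm and enc directories rather than in the working directory.

```r
# Assumed file names, both in the current working directory:
#   cmmi10.afm   metrics for Computer Modern Math Italic
#   special.enc  the custom encoding file
lscriptFont <- Type1Font(family = "special",
                         # the same metrics file serves for plain, bold,
                         # italic and bold-italic in this sketch
                         metrics = rep("./cmmi10.afm", 4),
                         encoding = "./special.enc")

# Register the font with the PDF device under the name "special"
pdfFonts(special = lscriptFont)
```

The files are not read at this point; they are only needed once text is actually drawn with the font.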

This font is now ready to use with the PDF device, but there is an extra complication. If we are working in a multi-byte locale (e.g., my Linux system is en_NZ.UTF-8) then R will try to convert multi-byte text like "\u2113" to single-byte text for us. The problem is that it will NOT directly use the special encoding file that we created above — it will use iconv to do the translation, just using the name of our encoding file ("special"). BUT iconv does not know anything about an encoding called "special", so it will fail.
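
The failure can be seen directly by asking iconv for the same conversion. This is a small sketch that assumes, as on a typical system, that no encoding called "special" is known to iconv:

```r
# Try to convert the Unicode character to an encoding that iconv
# has never heard of; capture the error instead of stopping.
res <- tryCatch(iconv("\u2113", from = "UTF-8", to = "special"),
                error = function(e) e)

# Report what happened
if (inherits(res, "error")) {
    message("iconv failed: ", conditionMessage(res))
}
```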

What we have to do is work in a single-byte locale instead, so that there is no need for the multi-byte to single-byte conversion. Here's some code that switches my system to a (single-byte) ISO-8859-1 locale.

Sys.setlocale("LC_CTYPE", "en_NZ.iso-8859-1")
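
Locale names vary between systems, so the name above may not exist on yours. A minimal sketch that checks whether the switch worked and whether the new locale really is single-byte (here the original locale is restored at the end, whereas in the real workflow you would restore it only after the drawing is done):

```r
# Remember the current setting so that it can be restored
old <- Sys.getlocale("LC_CTYPE")

# Sys.setlocale() returns "" (with a warning) if the locale is unavailable
new <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_NZ.iso-8859-1"))

if (nzchar(new)) {
    # MBCS is FALSE when the locale uses a single-byte encoding
    isMultiByte <- l10n_info()$MBCS
    message("Switched to ", new, "; multi-byte: ", isMultiByte)
} else {
    message("Locale en_NZ.iso-8859-1 is not available on this system")
}

# Put things back the way they were
Sys.setlocale("LC_CTYPE", old)
```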

With this locale in place, R will not try to convert my text for the PDF device as long as I use only single-byte text. So we cannot use the Unicode specification, "\u2113", for lowercase-script-l. But, if we draw text using the special font that we defined above, we can use a byte-specification that corresponds to the correct character in the special encoding file that the font uses. We modified entry \200 in the encoding file, so we can use "\200" to get our special character. For example ...

grid.text("A lowercase-script-l: \200", 
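
The call above is cut short in this copy. As a sketch of the whole drawing step, under several assumptions: the file names cmmi10.afm and special.enc are hypothetical, the session is already in a single-byte locale, and the font is selected through the fontfamily graphical parameter. Nothing is drawn unless both files are found.

```r
library(grid)

# Assumed file names in the current working directory
needed <- c("./cmmi10.afm", "./special.enc")
ready  <- all(file.exists(needed))

if (ready) {
    # Register the special font (repeated here so the sketch stands alone)
    pdfFonts(special = Type1Font(family = "special",
                                 metrics = rep("./cmmi10.afm", 4),
                                 encoding = "./special.enc"))
    pdf("ell.pdf")
    # "\200" is the byte that the custom encoding maps to lscript
    grid.text("A lowercase-script-l: \200",
              gp = gpar(fontfamily = "special"))
    dev.off()
} else {
    message("Font metric or encoding file not found; nothing drawn")
}
```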

(that should have produced some warnings — we'll come back to those).

A final step is required so that the PDF can be viewed with any viewer and printed anywhere: the font must be embedded in the file. The following code assumes that the .pfb file is in the current working directory.

embedFonts("ell.pdf", out="ellEmbedded.pdf",
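
The call above is cut short in this copy. A hedged sketch of a complete call: it assumes Ghostscript is installed (embedFonts() relies on it), that the font outline file is called cmmi10.pfb (a hypothetical name), and that pointing fontpaths at the working directory is enough for Ghostscript to find it.

```r
# Only attempt the embedding if the input PDF and the (assumed) font
# outline file are both present in the current working directory.
canEmbed <- file.exists("ell.pdf") && file.exists("cmmi10.pfb")

if (canEmbed) {
    embedFonts("ell.pdf", outfile = "ellEmbedded.pdf",
               fontpaths = getwd())
} else {
    message("ell.pdf or the .pfb file is missing; nothing embedded")
}
```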

The resulting file is available here.

Final caveat

The good news is that the file shows the lowercase-script-l that we wanted. However, it also shows some issues that we may have to deal with. By making a special font with a special encoding, we have to be careful just to use it for drawing the special text that we set it up for (in this case the lowercase-script-l). Other text may not look right in the font we have chosen (in this case all text comes out italic) and, much worse, the font may not contain all glyphs mentioned in the encoding file that we used (in this case, the font does not contain a "dash" character or a "colon" character — that was the source of the warnings).