This document provides information on how to create and modify an XML document in R with the 'xml2' package.
library(xml2)
We will use SVG as the example XML dialect because it makes pretty pictures :) The SVG code below shows a simple SVG file that draws a circle. The resulting image is shown below the SVG code.
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1"
width="100" height="100">
<circle cx="50" cy="52" r="45"/>
</svg>
If a document already exists, we can get it into R
easily with read_xml().
svg <- read_xml("circle.svg")
The result is an "xml_document" that can be used with the other 'xml2' functions to add/remove/modify the XML code.
svg
{xml_document}
<svg version="1.1" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
[1] <circle cx="50" cy="52" r="45"/>
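read_xml() also accepts a literal string of XML, which is handy when there is no file on disk yet. A minimal sketch, assuming we paste the SVG code from above into a string (the names `svg_text` and `svg_from_text` are just for illustration):

```r
library(xml2)

# The same SVG code as above, held in a character string
svg_text <- '<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1"
     width="100" height="100">
  <circle cx="50" cy="52" r="45"/>
</svg>'

# read_xml() treats a string that contains markup as literal XML, not a path
svg_from_text <- read_xml(svg_text)

# Name of the root element
xml_name(xml_root(svg_from_text))  # "svg"
```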
An important detail is that this document has a "namespace" (which says that this is not just any XML document; it is specifically an SVG document).
xml_ns(svg)
d1 <-> http://www.w3.org/2000/svg
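We can remove that namespace information with
xml_ns_strip(). This is not something we
would always want to do, but for simple cases like the one we
are dealing with, it will make our lives easier: while the default
namespace is in place, an XPath expression such as //circle does not
match the <circle> element, and we would have to write //d1:circle instead.
We will see later how to put the namespace back again.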
xml_ns_strip(svg)
We can use XPath expressions to find existing elements within an "xml_document".
xml_find_first() finds the first matching element
and xml_find_all() finds all matching elements.
For example, the following
code finds the first <circle> element anywhere
within the "xml_document".
The XPath "//circle" means that we can start from anywhere in the document
(//) and we want to find a <circle> element
(circle).
circle <- xml_find_first(svg, "//circle")
The result is an "xml_node". This is another object
that we can add/remove/edit with other 'xml2' functions.
circle
{xml_node}
<circle cx="50" cy="52" r="45">
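To see why stripping the namespace helped, here is a small sketch on a throwaway document (the name `doc` is ours). Before stripping, a plain element name matches nothing and xml_find_first() returns an "xml_missing" object; the d1 prefix reported by xml_ns() does match, because the find functions use xml_ns() of the document as their default `ns` argument.

```r
library(xml2)

doc <- read_xml('<svg xmlns="http://www.w3.org/2000/svg"><circle r="45"/></svg>')

# Default namespace in place: the unprefixed name finds nothing
xml_find_first(doc, "//circle")      # an "xml_missing" object

# The prefix from xml_ns(doc) does match
xml_ns(doc)
xml_find_first(doc, "//d1:circle")

# After stripping the namespace (done in place), the plain name works
xml_ns_strip(doc)
found <- xml_find_first(doc, "//circle")
found
```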
The following code finds the <svg> element at the
root of the "xml_document" AND any <circle>
elements anywhere in the document (though in this case there is
only one <circle> element).
This XPath means that we start at the root of the document
(/) and want to find an <svg> element there
(svg) OR (|) we can start anywhere
and we want to find a <circle> element (//circle).
elements <- xml_find_all(svg, "/svg|//circle")
The result now is an "xml_nodeset", which is
essentially a list of nodes.
elements
{xml_nodeset (2)}
[1] <svg version="1.1" width="100" height="100">\n <circle cx="50" cy="52" r ...
[2] <circle cx="50" cy="52" r="45"/>
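A nodeset can be measured and inspected like a list. In this sketch (a hypothetical document with two circles) the union query returns the matches in document order, and a query that matches nothing gives an empty nodeset rather than an error.

```r
library(xml2)

doc <- read_xml('<svg version="1.1"><circle r="45"/><circle r="20"/></svg>')

nodes <- xml_find_all(doc, "/svg|//circle")
length(nodes)     # 3: the <svg> element plus both <circle> elements
xml_name(nodes)   # "svg" "circle" "circle"

# No <rect> elements exist, so this is an empty nodeset
none <- xml_find_all(doc, "//rect")
length(none)      # 0
```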
Normal list-handling functions work on "xml_nodeset"
objects. For example, the following code gets the
attributes from each element
in the "xml_nodeset" using the xml_attrs()
function.
lapply(elements, xml_attrs)
[[1]]
version   width  height
  "1.1"   "100"   "100"

[[2]]
  cx   cy    r
"50" "52" "45"
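Many 'xml2' functions are also vectorised over a nodeset, so lapply() is not always needed. A minimal sketch on a small stand-in document: xml_attr() returns one value per node, with NA for nodes that lack the attribute.

```r
library(xml2)

doc <- read_xml('<svg version="1.1" width="100"><circle cx="50" r="45"/></svg>')
elements <- xml_find_all(doc, "/svg|//circle")

# One value per node; NA where the attribute is absent
xml_attr(elements, "width")   # "100" NA
xml_attr(elements, "r")       # NA "45"

# sapply() simplifies per-node results into a character vector
sapply(elements, xml_name)    # "svg" "circle"
```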
If you prefer CSS Selectors to XPath expressions,
the css_to_xpath() function from the 'selectr' package
can convert them for you.
library(selectr)
xml_find_first(svg, css_to_xpath("circle"))
{xml_node}
<circle cx="50" cy="52" r="45">
xml_find_all(svg, css_to_xpath("svg, circle"))
{xml_nodeset (2)}
[1] <svg version="1.1" width="100" height="100">\n <circle cx="50" cy="52" r ...
[2] <circle cx="50" cy="52" r="45"/>
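It can help to look at the XPath that css_to_xpath() generates, and attribute selectors translate as well. A sketch on a hypothetical document; the exact XPath text may differ between 'selectr' versions, so we only check what it finds.

```r
library(xml2)
library(selectr)

doc <- read_xml('<svg version="1.1"><circle r="45"/><rect width="90"/></svg>')

# Inspect the generated XPath for a selector group
xpath <- css_to_xpath("circle, rect")
xpath

both <- xml_find_all(doc, xpath)

# An attribute selector: circles whose r attribute is "45"
big <- xml_find_all(doc, css_to_xpath("circle[r='45']"))
big
```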
We can edit an existing node with, for example,
xml_set_attr().
xml_set_attr(circle, "fill", "blue")
An important detail is that this has destructively modified the "xml_node" AND the "xml_document"; the whole XML document is now different. (This is different from R's normal copy-on-modify semantics.)
circle
{xml_node}
<circle cx="50" cy="52" r="45" fill="blue">
svg
{xml_document}
<svg version="1.1" width="100" height="100">
[1] <circle cx="50" cy="52" r="45" fill="blue"/>
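Because changes happen in place, we may want a snapshot to compare against. One approach (our suggestion, not something described above) is to serialise the document and parse it again with read_xml(as.character(...)), which yields an independent copy. A minimal sketch:

```r
library(xml2)

doc <- read_xml('<svg version="1.1"><circle r="45"/></svg>')

# Serialise and re-parse to get an independent copy
backup <- read_xml(as.character(doc))

# Editing a node found in 'doc' changes 'doc' itself (no assignment needed)
xml_set_attr(xml_find_first(doc, "//circle"), "fill", "blue")
xml_attr(xml_find_first(doc, "//circle"), "fill")      # "blue"

# The re-parsed copy is untouched
xml_attr(xml_find_first(backup, "//circle"), "fill")   # NA
```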
We can add new XML elements with xml_add_child().
Notice that we just need to give the name of the element
we want to add, and any further unnamed arguments become the
content of that element.
xml_add_child(circle, "title", "I am a circle")
Again, it is important to realise that this change has destructively modified the node and the entire document.
circle
{xml_node}
<circle cx="50" cy="52" r="45" fill="blue">
[1] <title>I am a circle</title>
svg
{xml_document}
<svg version="1.1" width="100" height="100">
[1] <circle cx="50" cy="52" r="45" fill="blue">\n <title>I am a circle</titl ...
The full document now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.1" width="100" height="100">
<circle cx="50" cy="52" r="45" fill="blue">
<title>I am a circle</title>
</circle>
</svg>
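xml_add_child() invisibly returns the node it created, so we can keep a reference to it. A sketch on a stand-in document, adding a <title> and a <desc> (the `id` value is hypothetical); named arguments become attributes, and new children are appended after existing ones.

```r
library(xml2)

doc <- read_xml('<svg version="1.1"><circle r="45"/></svg>')
circle <- xml_find_first(doc, "//circle")

# Unnamed argument: text content; named argument: attribute
title <- xml_add_child(circle, "title", "I am a circle")
desc <- xml_add_child(circle, "desc", "A round shape", id = "circle-desc")

xml_text(title)                  # "I am a circle"
xml_attr(desc, "id")             # "circle-desc"
xml_name(xml_children(circle))   # "title" "desc"
```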
It is also possible to insert a new parent element between
an existing node and its current parent with
xml_add_parent().
Notice that named arguments, after the new element name,
become attributes of the new element.
xml_add_parent(circle, "a", href="https://www.w3.org/svg")
The full document now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.1" width="100" height="100">
<a href="https://www.w3.org/svg">
<circle cx="50" cy="52" r="45" fill="blue">
<title>I am a circle</title>
</circle>
</a>
</svg>
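We can confirm the new nesting with XPath. A minimal sketch on a stand-in document: after xml_add_parent(), the path /svg/a/circle exists and the <a> element carries the href attribute.

```r
library(xml2)

doc <- read_xml('<svg version="1.1"><circle r="45"/></svg>')
circle <- xml_find_first(doc, "//circle")

# Wrap the circle in a link; the named argument becomes an attribute of <a>
xml_add_parent(circle, "a", href = "https://www.w3.org/svg")

# The circle now sits inside <a>, which sits inside <svg>
linked <- xml_find_first(doc, "/svg/a/circle")
link <- xml_find_first(doc, "/svg/a")
xml_attr(link, "href")   # "https://www.w3.org/svg"
```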
We can also start a new "xml_document" with
xml_new_root().
svg2 <- xml_new_root("svg", version="1.1", width="100", height="100", xmlns="http://www.w3.org/2000/svg")
The new document contains only an empty root <svg> element, but we can use the functions above to add content to it.
svg2
{xml_document}
<svg version="1.1" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
xml_add_child(svg2, "rect", x="5", y="5", width="90", height="90")
svg2
{xml_document}
<svg version="1.1" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
[1] <rect x="5" y="5" width="90" height="90"/>
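Repeated calls to xml_add_child() append each new element after the existing children, so a picture can be built up shape by shape. A sketch with hypothetical shapes and colours; we leave out xmlns here so that plain XPath names would work on this document.

```r
library(xml2)

# A new root without a namespace
pic <- xml_new_root("svg", version = "1.1", width = "100", height = "100")

# Each call appends a new last child of the root element
xml_add_child(pic, "rect", x = "5", y = "5", width = "90", height = "90",
              fill = "yellow")
xml_add_child(pic, "circle", cx = "50", cy = "50", r = "30", fill = "red")

xml_name(xml_children(xml_root(pic)))   # "rect" "circle"
```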
We can create an XML file from an "xml_document" with
write_xml(). The following code does that
for the XML document that we created from scratch
in the previous section.
write_xml(svg2, "rect.svg")
Our new SVG file looks like this (and the image is shown below the code) ...
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">
<rect x="5" y="5" width="90" height="90"/>
</svg>
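A quick way to check what write_xml() produced is to read the file back. This sketch writes to a temporary file (via tempfile(), so nothing lands in the working directory) instead of "rect.svg". Because the document declares the SVG namespace, the re-read document needs the d1 prefix in XPath.

```r
library(xml2)

doc <- xml_new_root("svg", version = "1.1", width = "100", height = "100",
                    xmlns = "http://www.w3.org/2000/svg")
xml_add_child(doc, "rect", x = "5", y = "5", width = "90", height = "90")

# Write to a temporary file
path <- tempfile(fileext = ".svg")
write_xml(doc, path)

# Read it back; the namespace is present, so use the d1 prefix
back <- read_xml(path)
xml_attr(xml_find_first(back, "//d1:rect"), "width")   # "90"
```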
Going back to the original XML document that we started with (and have since modified in various ways): before we save it, we must put its namespace back.
xml_set_attr(xml_root(svg), "xmlns", "http://www.w3.org/2000/svg")
write_xml(svg, "circle-mod.svg")
Now our modification of the original SVG file looks like this (and the image is shown below the code) ...
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">
<a href="https://www.w3.org/svg">
<circle cx="50" cy="52" r="45" fill="blue">
<title>I am a circle</title>
</circle>
</a>
</svg>
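The whole strip, edit, restore cycle can be checked end to end. A minimal sketch on a stand-in document: after setting xmlns on the root again, we serialise and re-parse (our choice of check), and the SVG namespace is reported once more, with the edit intact.

```r
library(xml2)

doc <- read_xml('<svg xmlns="http://www.w3.org/2000/svg" version="1.1"><circle r="45"/></svg>')

# Strip the namespace and edit using plain XPath names
xml_ns_strip(doc)
xml_set_attr(xml_find_first(doc, "//circle"), "fill", "blue")

# Put the default namespace back on the root element
xml_set_attr(xml_root(doc), "xmlns", "http://www.w3.org/2000/svg")

# Re-parse the serialised text to see the document as a reader of the file would
reparsed <- read_xml(as.character(doc))
xml_ns(reparsed)
xml_attr(xml_find_first(reparsed, "//d1:circle"), "fill")   # "blue"
```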
Overall, we have changed the original circle from black to blue, added a tooltip (visible by hovering the mouse over the circle), and created a hyperlink so that if we click on the circle, the web browser will navigate to the World Wide Web Consortium's (W3C) web page for SVG.
