Creating and modifying XML documents

This document describes how to create and modify an XML document in R with the 'xml2' package.

library(xml2)

We will use SVG as the example XML dialect because it makes pretty pictures :) The SVG code below shows a simple SVG file that draws a circle. The resulting image is shown below the SVG code.

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" 
     width="100" height="100">
  <circle cx="50" cy="52" r="45"/>
</svg>

Reading an existing XML document

If a document already exists as a file, we can read it into R with read_xml().

svg <- read_xml("circle.svg")

The result is an "xml_document" that can be used with the other 'xml2' functions to add/remove/modify the XML code.

svg
{xml_document}
<svg version="1.1" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
[1] <circle cx="50" cy="52" r="45"/>
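read_xml() also accepts the XML itself as a character string, which is handy for trying things out when no file is at hand. The following is a minimal sketch; the names svg_text and svg_from_text are made up for this illustration, and the string is just an inline copy of the circle example.

```r
library(xml2)

# Hypothetical inline copy of the circle SVG, so that no file is needed
svg_text <- '<svg xmlns="http://www.w3.org/2000/svg" version="1.1"
     width="100" height="100">
  <circle cx="50" cy="52" r="45"/>
</svg>'

# A string that contains markup is parsed directly rather than treated as a path
svg_from_text <- read_xml(svg_text)

class(svg_from_text)       # includes "xml_document"
xml_name(svg_from_text)    # name of the root element: "svg"
xml_length(svg_from_text)  # number of child elements of the root: 1
```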

An important detail is that this document has a "namespace" (which says that this is not just any XML document, it is specifically an SVG document).

xml_ns(svg)
d1 <-> http://www.w3.org/2000/svg

We can remove that namespace information with xml_ns_strip(). This is not something we would always want to do, but for simple cases like the one we are dealing with, it will make our lives easier. We will see later how to put the namespace back again.

xml_ns_strip(svg)
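To see why stripping the namespace makes life easier, the sketch below compares XPath searches before and after the strip. It uses a small stand-in document (the name doc is only for this illustration) and relies on the d1 prefix that xml_ns() reports for the default namespace.

```r
library(xml2)

# Small stand-in for the circle document, including its default namespace
doc <- read_xml('<svg xmlns="http://www.w3.org/2000/svg"><circle r="45"/></svg>')

# While the namespace is in place, an unprefixed element name matches nothing
n_plain_before <- length(xml_find_all(doc, "//circle"))

# The d1 prefix reported by xml_ns() does match
n_prefixed <- length(xml_find_all(doc, "//d1:circle"))

# After stripping the namespace, the simpler unprefixed XPath works
xml_ns_strip(doc)
n_plain_after <- length(xml_find_all(doc, "//circle"))

c(n_plain_before, n_prefixed, n_plain_after)  # 0 1 1
```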

Finding existing elements

We can use XPath expressions to find existing elements within an "xml_document".

xml_find_first() finds the first matching element and xml_find_all() finds all matching elements. For example, the following code finds the first <circle> element anywhere within the "xml_document". This XPath means that we can start from anywhere in the document (//) and we want to find a <circle> element (circle).

circle <- xml_find_first(svg, "//circle")

The result is an "xml_node". This is another object that we can add/remove/edit with other 'xml2' functions.

circle
{xml_node}
<circle cx="50" cy="52" r="45">
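XPath can also filter on attributes, and it is worth knowing what the two find functions return when nothing matches. The following is a small sketch on a made-up document with two circles; the document and the variable names are assumptions for illustration only.

```r
library(xml2)

# Hypothetical document with two circles of different sizes
doc <- read_xml('<svg>
  <circle cx="50" cy="52" r="45"/>
  <circle cx="20" cy="20" r="5"/>
</svg>')

# A predicate in square brackets selects on an attribute value
big <- xml_find_first(doc, "//circle[@r='45']")
xml_attr(big, "cx")  # "50"

# No <rect> exists: xml_find_all() gives an empty nodeset ...
no_rects <- xml_find_all(doc, "//rect")
length(no_rects)     # 0

# ... whereas xml_find_first() gives a placeholder of class "xml_missing"
no_rect <- xml_find_first(doc, "//rect")
inherits(no_rect, "xml_missing")  # TRUE
```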

The following code finds the <svg> element at the root of the "xml_document" AND any <circle> elements anywhere in the document (though in this case there is only one <circle> element). This XPath means that we can start at the top element in the document (/) and we want to find an <svg> element (svg) OR (|) we can start anywhere and we want to find a <circle> element (//circle).

elements <- xml_find_all(svg, "/svg|//circle")

The result now is an "xml_nodeset", which is essentially a list of nodes.

elements
{xml_nodeset (2)}
[1] <svg version="1.1" width="100" height="100">\n  <circle cx="50" cy="52" r ...
[2] <circle cx="50" cy="52" r="45"/>

Normal list-handling functions work on "xml_nodeset" objects. For example, the following code gets the attributes from each element in the "xml_nodeset" using the xml_attrs() function.

lapply(elements, xml_attrs)
[[1]]
version   width  height 
  "1.1"   "100"   "100" 

[[2]]
  cx   cy    r 
"50" "52" "45" 

If you prefer CSS Selectors to XPath expressions, the css_to_xpath() function from the 'selectr' package can convert them for you.

library(selectr)
xml_find_first(svg, css_to_xpath("circle"))
{xml_node}
<circle cx="50" cy="52" r="45">
xml_find_all(svg, css_to_xpath("svg, circle"))
{xml_nodeset (2)}
[1] <svg version="1.1" width="100" height="100">\n  <circle cx="50" cy="52" r ...
[2] <circle cx="50" cy="52" r="45"/>
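As an alternative to the lapply() call shown earlier in this section, many 'xml2' accessors are vectorised over an "xml_nodeset" and return one value per node. The sketch below rebuilds a namespace-free copy of the circle document (the names doc and nodes are only for this illustration).

```r
library(xml2)

# Namespace-free stand-in for the circle document
doc <- read_xml('<svg version="1.1" width="100" height="100">
  <circle cx="50" cy="52" r="45"/>
</svg>')
nodes <- xml_find_all(doc, "/svg|//circle")

# One element name per node, in document order
node_names <- xml_name(nodes)
node_names   # "svg" "circle"

# One attribute value per node; NA where the attribute is absent
widths <- xml_attr(nodes, "width")
widths       # "100" NA
```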

Modifying existing elements

We can edit an existing node with, for example, xml_set_attr().

xml_set_attr(circle, "fill", "blue")

An important detail is that this has destructively modified the "xml_node" AND the "xml_document"; the whole XML document is now different. (This is different from R's normal copy-on-modify semantics.)

circle
{xml_node}
<circle cx="50" cy="52" r="45" fill="blue">
svg
{xml_document}
<svg version="1.1" width="100" height="100">
[1] <circle cx="50" cy="52" r="45" fill="blue"/>
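Because modification happens in place, plain assignment does not protect a document from later changes. One way to keep an independent snapshot, sketched below, is to serialise the document with as.character() and parse the result again; this round trip is an assumption of convenience rather than the only way to copy a document, and the variable names are made up for the example.

```r
library(xml2)

doc <- read_xml('<svg><circle r="45"/></svg>')

# Plain assignment only creates a second name for the same document
alias <- doc

# Serialising and re-parsing creates a genuinely separate document
snapshot <- read_xml(as.character(doc))

# Modify the circle through the original document
xml_set_attr(xml_find_first(doc, "//circle"), "fill", "blue")

fill_alias    <- xml_attr(xml_find_first(alias, "//circle"), "fill")
fill_snapshot <- xml_attr(xml_find_first(snapshot, "//circle"), "fill")

fill_alias     # "blue": the alias sees the change
fill_snapshot  # NA: the snapshot is unaffected
```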

Adding new elements

We can add new XML elements with xml_add_child(). Notice that we just need to give the name of the element we want to add, and any further unnamed arguments become the content of that element.

xml_add_child(circle, "title", "I am a circle")

Again, it is important to realise that this change has destructively modified the node and the entire document.

circle
{xml_node}
<circle cx="50" cy="52" r="45" fill="blue">
[1] <title>I am a circle</title>
svg
{xml_document}
<svg version="1.1" width="100" height="100">
[1] <circle cx="50" cy="52" r="45" fill="blue">\n  <title>I am a circle</titl ...

The full document now looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.1" width="100" height="100">
  <circle cx="50" cy="52" r="45" fill="blue">
    <title>I am a circle</title>
  </circle>
</svg>

It is also possible to add a parent element in between an existing node and its current parent, with xml_add_parent(). Notice that named arguments, after the new element name, become attributes of the new element.

xml_add_parent(circle, "a", href="https://www.w3.org/svg")

The full document now looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.1" width="100" height="100">
  <a href="https://www.w3.org/svg">
    <circle cx="50" cy="52" r="45" fill="blue">
      <title>I am a circle</title>
    </circle>
  </a>
</svg>
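Besides children and parents, a new element can be placed next to an existing one with xml_add_sibling(), and an unwanted element can be taken out again with xml_remove(). The following sketch uses a stripped-down document; the <rect> element and its attributes are invented for the illustration.

```r
library(xml2)

doc <- read_xml('<svg><circle cx="50" cy="52" r="45"/></svg>')
circle <- xml_find_first(doc, "//circle")

# Insert a hypothetical <rect> immediately before the circle
xml_add_sibling(circle, "rect", width = "10", height = "10", .where = "before")
names_after_add <- xml_name(xml_children(doc))
names_after_add     # "rect" "circle"

# Remove the <rect> again; as before, the document is modified in place
xml_remove(xml_find_first(doc, "//rect"))
names_after_remove <- xml_name(xml_children(doc))
names_after_remove  # "circle"
```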

Creating XML from scratch

We can also start a new "xml_document" with xml_new_root().

svg2 <- xml_new_root("svg",
                     version="1.1",
                     width="100", height="100",
                     xmlns="http://www.w3.org/2000/svg")

The new document is empty, but we can use the functions above to add content to it.

svg2
{xml_document}
<svg version="1.1" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
xml_add_child(svg2, "rect", x="5", y="5", width="90", height="90")
svg2
{xml_document}
<svg version="1.1" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
[1] <rect x="5" y="5" width="90" height="90"/>
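Since each call to xml_add_child() modifies the document in place, content can be generated in an ordinary loop. The sketch below draws three concentric circles; the radii and the styling attributes are made-up values for illustration.

```r
library(xml2)

doc <- xml_new_root("svg",
                    version = "1.1",
                    width = "100", height = "100",
                    xmlns = "http://www.w3.org/2000/svg")

# Hypothetical radii; attribute values are passed as character strings
radii <- c(45, 30, 15)
for (r in radii) {
  xml_add_child(doc, "circle",
                cx = "50", cy = "50", r = as.character(r),
                fill = "none", stroke = "black")
}

n_circles <- xml_length(doc)
n_circles   # 3
r_values <- xml_attr(xml_children(doc), "r")
r_values    # "45" "30" "15"
```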

Writing a new XML document

We can create an XML file from an "xml_document" with write_xml(). The following code does that for the XML document that we created from scratch in the previous section.

write_xml(svg2, "rect.svg")

Our new SVG file looks like this (and the image is shown below the code) ...

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">
  <rect x="5" y="5" width="90" height="90"/>
</svg>
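A quick way to check what write_xml() produced is to read the file straight back in. This sketch writes to a temporary file rather than to rect.svg, so that it leaves nothing behind; the variable names are only for the illustration.

```r
library(xml2)

doc <- xml_new_root("svg",
                    version = "1.1",
                    width = "100", height = "100",
                    xmlns = "http://www.w3.org/2000/svg")
xml_add_child(doc, "rect", x = "5", y = "5", width = "90", height = "90")

# Write to a temporary file and read it back
path <- tempfile(fileext = ".svg")
write_xml(doc, path)
reread <- read_xml(path)

child_names <- xml_name(xml_children(reread))
child_names  # "rect"
rect_width <- xml_attr(xml_children(reread)[[1]], "width")
rect_width   # "90"

unlink(path)  # clean up the temporary file
```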

Now we return to the original XML document that we started with (and have modified in various ways). Before we save that document, we must put its namespace back.

xml_set_attr(xml_root(svg), "xmlns", "http://www.w3.org/2000/svg")
write_xml(svg, "circle-mod.svg")

Now our modification of the original SVG file looks like this (and the image is shown below the code) ...

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">
  <a href="https://www.w3.org/svg">
    <circle cx="50" cy="52" r="45" fill="blue">
      <title>I am a circle</title>
    </circle>
  </a>
</svg>
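To confirm that the namespace really has been restored, we can parse the serialised document again and inspect it. The sketch below repeats the strip-and-restore steps on a small stand-in document and, for simplicity, re-parses the output of as.character() instead of a saved file.

```r
library(xml2)

svg_uri <- "http://www.w3.org/2000/svg"
doc <- read_xml('<svg xmlns="http://www.w3.org/2000/svg"><circle r="45"/></svg>')

# Strip the namespace, then put it back on the root element
xml_ns_strip(doc)
xml_set_attr(xml_root(doc), "xmlns", svg_uri)

# Re-parse the serialised document, as a program reading the saved file would
reparsed <- read_xml(as.character(doc))

has_svg_ns <- svg_uri %in% unclass(xml_ns(reparsed))
has_svg_ns          # TRUE

# The circle is once again found only through the namespace prefix
n_prefixed_circles <- length(xml_find_all(reparsed, "//d1:circle"))
n_prefixed_circles  # 1
```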

Overall, we have changed the original circle from black to blue, added a tooltip (visible by hovering the mouse over the circle), and created a hyperlink so that if we click on the circle, the web browser will navigate to the World Wide Web Consortium's web page for SVG.


This work is licensed under a Creative Commons Attribution 4.0 International License.