'DOM' Version 0.4

by Paul Murrell http://orcid.org/0000-0002-3224-8858

Wednesday 09 November 2016


Creative Commons License
'DOM' version 0.4 by Paul Murrell is licensed under a Creative Commons Attribution 4.0 International License.


This report describes changes in version 0.4 of the 'DOM' package for R. The main change in this version is the addition of new functions that allow control over the Cascading Style Sheet (CSS) content of a web page. This provides programmatic control over the styling of HTML and SVG content on a page.

Element Style

For demonstration purposes, we will work with a web page consisting of a single paragraph (a more complex example is provided later).

library(DOM)
page <- htmlPage("<p>A paragraph</p>")

Because we will be working with this paragraph multiple times, the following code creates a pointer to the paragraph element. We will be able to use this to refer to the paragraph from now on.

p <- getElementsByTagName(page, "p", response=nodePtr())

A simple way to apply CSS styling to an element on a web page is to define a style attribute for the element. The existing setAttribute function in the 'DOM' package already provides support for this. The following code sets the style attribute for the paragraph so that the text turns red.

setAttribute(page, p, "style", "color: red")

However, this setAttribute approach is heavy-handed and does not provide fine control over the CSS styling because the entire style attribute has to be specified. For example, the following modification of the CSS styling replaces the previous setting; the text is now italic, but it is no longer red.

setAttribute(page, p, "style", "font-style: italic")
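Using only attributes, one workaround is to read the existing attribute back (with getAttribute, another new function in version 0.4), merge the new declaration into it in R, and write the complete string back. The helper mergeStyle below is a hypothetical sketch of that merge. It is not part of 'DOM', and it assumes a simple style string with no semicolons inside property values.

```r
## Hypothetical helper (not part of 'DOM'): merge declarations into the
## text of a style attribute, e.g., "font-style: italic"
mergeStyle <- function(style, ...) {
    updates <- c(...)
    ## Split into "name: value" declarations, dropping empty pieces
    decls <- trimws(strsplit(style, ";", fixed=TRUE)[[1]])
    decls <- decls[nzchar(decls)]
    ## Split each declaration at its first colon
    current <- trimws(sub("^[^:]*:", "", decls))
    names(current) <- trimws(sub(":.*$", "", decls))
    ## Add new properties, or replace the values of existing ones
    if (length(updates))
        current[names(updates)] <- updates
    paste0(names(current), ": ", current, collapse="; ")
}

mergeStyle("font-style: italic", color="red")
## [1] "font-style: italic; color: red"

## With a live page, the merged text could be written back, e.g.:
## setAttribute(page, p, "style",
##              mergeStyle(getAttribute(page, p, "style"), color="red"))
```

The setProperty function, described in the next section, avoids this string manipulation altogether by letting the browser merge the change.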

Properties versus Attributes

Another way to access the CSS styling on an element is through the style property of the element. Version 0.4 of 'DOM' adds two new functions, getProperty and setProperty, that allow us to access and modify element properties. The following code gets the style property for the paragraph.

style <- getProperty(page, p, "style")
style
  An object of class "DOM_CSSStyleDeclaration_ptr"
  [1] "1"
  Slot "pageID":
  [1] 1

The result is a DOM_CSSStyleDeclaration_ptr. Compare that result to what we get from getAttribute (another new function in version 0.4), which is just a character vector.

getAttribute(page, p, "style")
  [1] "font-style: italic"

With getProperty, we get a pointer to a style object, rather than just the text value for a style attribute. The advantage of the style object is that we can access and set individual properties of that object. For example, the following code accesses the font-style property of the paragraph style.

getProperty(page, style, "font-style")
  [1] "italic"

The following code sets the color property of the style. The advantage of this, compared to setting an attribute, is that we only set the color property of the style; the font-style property (italic) is untouched.

setProperty(page, style, "color", "red")

There is also a shorthand, using the $ operator, for getting and setting properties.

p$style$color
  [1] "red"
p$style$color <- "green"

In summary, with the new ability to get and set properties, we can easily access and modify individual CSS properties within the style property of an HTML element on a web page.

Style sheets

Another way to use CSS styling on an element is to add a style sheet to the web page, with a CSS rule that targets the element. For the examples in this section, we will start with a fresh page, because CSS styling via a style sheet has a lower priority than inline CSS styling via a style attribute, so the inline styles set above would mask the effect of a style sheet.

page <- htmlPage("<p>A paragraph</p>")

A style sheet can be added to a page by adding a <style> element to the <head> element of the web page. Another option would be to add a <link> element (to point to an external style sheet). The existing appendChild function can do this for us.

appendChild(page,
          htmlNode('<style type="text/css">p { color: red; }</style>'),
          parent=css("head"))

The style sheet consists of zero or more rules. In this case, there is a single rule:

p { color: red; }
  

Each rule consists of a selector and zero or more style declarations. The selector specifies the target of the rule (in this case, the selector p means that the rule will apply to all <p> elements in the page) and the style declarations have the same format as in the style attribute of an element: a CSS property name, followed by a colon, followed by a CSS property value (with a semi-colon between multiple style declarations).
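Since a rule is just a selector plus a set of name/value declarations, its text is easy to assemble in R. The function cssRule below is a hypothetical helper (not part of 'DOM'), shown only as a sketch; the string it returns could be used as the content of a <style> element, as above, or passed to insertRule, which is introduced in the next section.

```r
## Hypothetical helper (not part of 'DOM'): build the text of a CSS rule
## from a selector and named declarations
cssRule <- function(selector, ...) {
    decls <- c(...)
    if (length(decls) == 0)
        return(paste(selector, "{ }"))
    ## One "name: value;" declaration per property
    body <- paste0(names(decls), ": ", decls, ";", collapse=" ")
    paste0(selector, " { ", body, " }")
}

cssRule("p", color="red")
## [1] "p { color: red; }"
cssRule("p", color="red", "font-style"="italic")
## [1] "p { color: red; font-style: italic; }"
```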

We can add more than one style sheet to a page and we can remove style sheets (with removeChild), but, as with style attributes, this is heavy-handed and does not allow fine control of the details of a style sheet.

CSS Rules

The new styleSheets function provides access to the current style sheets on a page. The result is a DOM_CSSStyleSheet_ptr, which contains one or more pointers to the style sheet objects in the browser.

sheets <- styleSheets(page)
sheets
  An object of class "DOM_CSSStyleSheet_ptr"
  [1] "0"
  Slot "pageID":
  [1] 2

Having access to these style sheet objects is useful because we can use them with the new insertRule and deleteRule functions to add/remove individual rules to/from a style sheet. For example, the following code adds a new CSS rule that also applies to <p> elements, so that the paragraph text becomes italic as well as red. The final argument gives the position at which to insert the new rule, counting from 0, so 0 places it at the start of the style sheet.

insertRule(page, sheets[1], "p { font-style: italic; }", 0)

However, adding and removing entire rules is still a fairly coarse level of control. Even better would be control of the components of a rule: the selector and the style declarations.

CSS Style Rules

The cssRules property of a style sheet produces a DOM_CSSRule_ptr object: a vector of pointers to individual CSS rules. In this case, there are two CSS rules in the style sheet.

sheets[1]$cssRules
  An object of class "DOM_CSSRule_ptr"
  [1] "1" "2"
  Slot "pageID":
  [1] 2

We can access the style property of a CSS style rule, which gives us a DOM_CSSStyleDeclaration_ptr (just like we got from accessing the style property of an HTML element). We can then get and set the properties of that object to access and modify the style declarations of the CSS rule in the style sheet. In the following code, we use CSS rule number 2 to get the rule that controls color, because the rule that we inserted above to control font-style was inserted at index 0 (i.e., BEFORE the color rule that was already in the style sheet). Note that insertRule counts positions from 0, whereas subsetting the rules in R counts from 1.

sheets[1]$cssRules[2]$style$color
  [1] "red"
sheets[1]$cssRules[2]$style$color <- "green"
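The index arithmetic may be easier to see with a toy model in plain R, where a character vector of rule text stands in for the style sheet (this is not a 'DOM' object). Inserting at 0-based position 0, as insertRule did above, behaves like R's append() with after=0, which pushes the existing rule from R index 1 to R index 2.

```r
## Toy model: a "style sheet" as a character vector of rule text
rules <- "p { color: red; }"
## Like insertRule(page, sheets[1], "p { font-style: italic; }", 0):
## the new rule goes in front of the rule at 0-based position 0
rules <- append(rules, "p { font-style: italic; }", after=0)
rules[1]
## [1] "p { font-style: italic; }"
rules[2]
## [1] "p { color: red; }"
```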

The function propertyNames can be used to get the names of all properties in a style declaration. This does not correspond to a DOM method; it is just a convenience function.

propertyNames(page, sheets[1]$cssRules[2]$style)
  [1] "color"

We can remove an existing property from a style declaration with the removeProperty function, which returns the value that the property had before it was removed.

removeProperty(page, sheets[1]$cssRules[2]$style, "color")
  [1] "green"
propertyNames(page, sheets[1]$cssRules[2]$style)
  character(0)

It is also possible to access the selector for a CSS rule, but this cannot be modified; if we want a rule to control a different target, we should make a new rule.

sheets[1]$cssRules[2]$selectorText
  [1] "p"

Similarly, we can view (but not edit) the full text for a CSS rule via the cssText property.

sheets[1]$cssRules[2]$cssText
  [1] "p { }"

In summary, several new functions, combined with the ability to get and set properties, allow us to access and modify entire style sheets for a web page. This means that we can programmatically control the appearance of entire sets of elements at once.

Building style from scratch

Most of the examples so far have involved working with a ready-made element with a style attribute, or with a ready-made style sheet. This section briefly demonstrates how to build a style sheet for a web page from the ground up.

We will again start with a web page containing a single paragraph and no CSS styling.

page <- htmlPage("<p>A paragraph</p>")

The first step is to create an empty style sheet. We can do this by creating an empty <style> element and adding that to the page.

styleElement <- createElement(page, "style")
appendChild(page, styleElement, parent=css("head"))
  An object of class "DOM_node_HTML"
  [1] "<style></style>"

We can access the style sheet via the sheet property of the <style> element. The first thing we do with the style sheet is disable it so that we can build it up without affecting the page.

styleSheet <- styleElement$sheet
styleSheet$disabled <- TRUE

The next step is to add an empty rule to the style sheet. This allows us to specify just the selector for the rule.

insertRule(page, styleSheet, "p { }", 0)
  [1] 0

We now create a shortcut to the new rule, to save on typing, and add style declarations to the rule.

rule1 <- styleSheet$cssRules[1]
rule1$style$color <- "red"
rule1$style$"font-style" <- "italic"

The last step is to enable the style sheet so that it can have an effect on the contents of the page.

styleSheet$disabled <- FALSE

CSS and SVG

All of the examples so far have involved styling HTML elements. Styling SVG elements is very similar, but with the added complication that individual SVG elements have presentation attributes in addition to a style attribute.

For an HTML element, a style declaration in the style attribute will override any style declarations in a style sheet that target the element. For an SVG element, a style declaration in the style attribute will override any style declarations in a style sheet that target the element, which in turn will override any presentation attributes on the SVG element.
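The precedence for an SVG element can be summarised as "the first source that supplies a value wins". The function cascade below is a toy model in plain R (not part of 'DOM'); it deliberately ignores selector specificity, !important declarations and inheritance, all of which also affect the result in a real browser.

```r
## Toy model (not part of 'DOM'): the value used for an SVG property such
## as fill, given the (possibly missing) value from each source
cascade <- function(styleAttribute=NA, styleSheet=NA,
                    presentationAttribute=NA) {
    ## Sources in decreasing order of priority
    candidates <- c(styleAttribute, styleSheet, presentationAttribute)
    candidates[!is.na(candidates)][1]
}

cascade(presentationAttribute="blue")
## [1] "blue"
cascade(styleSheet="green", presentationAttribute="blue")
## [1] "green"
cascade(styleAttribute="red", styleSheet="green",
        presentationAttribute="blue")
## [1] "red"
```

These three calls mirror the three steps in the demonstration that follows: blue from the presentation attribute, then green from a style sheet, then red from a style attribute.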

The following code demonstrates these rules. First of all, we add an SVG image to the page and then we set the presentation attribute fill on the element (so that it is filled blue).

appendChild(page,
          svgNode('<svg xmlns="http://www.w3.org/2000/svg" 
                        width="50" height="50">
                     <circle id="c" r="50"/>
                   </svg>'),
          ns=TRUE)
circle <- getElementById(page, "c", response=nodePtr())
setAttribute(page, circle, "fill", "blue")

Now if we add a style sheet to the page that targets that SVG element and has a style declaration for fill, it overrides the element's presentation attribute, and the circle turns green.

appendChild(page, htmlNode("<style>#c { fill: green }</style>"), parent=css("head"))
  An object of class "DOM_node_HTML"
  [1] "<style>#c { fill: green }</style>"

Finally, if we add a style attribute to the SVG element, that overrides both the style sheet and the element's presentation attribute, and the circle turns red.

setAttribute(page, circle, "style", "fill: red")

A more complex example

This section provides a brief demonstration of the new 'DOM' features on a more realistically sized example. The following code creates a web page and adds a 'lattice' plot to the page as SVG content.

library(lattice)
library(gridSVG)
library(DOM)
library(XML)
xyplot(mpg ~ disp, mtcars, pch=16, cex=2)
svg <- grid.export(NULL)$svg
page <- htmlPage()
appendChild(page, svgNode(saveXML(svg)), ns="SVG")

The following code adds and builds a style sheet for the page that modifies the styling of the data symbols in the plot (so that they all turn red). This code takes advantage of the fact that all data symbols in the SVG that is generated by 'gridSVG' are <use> elements.

styleElement <- createElement(page, "style")
appendChild(page, styleElement, parent=css("head"))
styleSheet <- styleElement$sheet
insertRule(page, styleSheet, "use { }", 0)
styleSheet$cssRules[1]$style$fill <- "red"

Lower-level details

Previous sections have focused on the new user-facing features in version 0.4 of the 'DOM' package: how the package should work when we use it to perform the actions that it is designed to support. This section looks at some of the lower-level and internal changes to the package, which can still affect us if we attempt to use the package for tasks that are not explicitly supported.

Object pointers

The previous version of the 'DOM' package introduced the idea of DOM node pointers, which are R objects that contain a pointer to a DOM node object (an HTML or SVG element) in the browser. This idea has been generalised in version 0.4 to allow for pointers to any DOM object. In addition to things like HTML elements, we can now have an R object that points to, for example, a style sheet object within the browser.

The class hierarchy for 'DOM' version 0.4 is as follows. There is now a DOM_obj root class that subsumes the existing DOM_node class. DOM objects are either a primitive type (DOM_string, DOM_number, or DOM_boolean), or something more complex (DOM_obj_ref). For everything other than HTML and SVG nodes, a DOM object in R is a pointer to a DOM object in the browser (DOM_obj_ptr). Some DOM objects (so far, only ones related to CSS) have their own specific representation in R (e.g., DOM_CSSStyleSheet_ptr).

Because we can now represent any DOM object in R, and we can get and set properties on these objects, it is now possible to access any part of the web page (as long as it is accessible via object properties). For example, if a web page has a <style> element, we can access the style sheet object relating to that element through its sheet property.

page <- htmlPage()
appendChild(page, htmlNode('<style id="s1"/>'), parent=css("head"))
  An object of class "DOM_node_HTML"
  [1] "<style id=\"s1\"></style>"
styleElement <- getElementById(page, "s1", response=nodePtr())
styleElement$sheet
  An object of class "DOM_CSSStyleSheet_ptr"
  [1] "1"
  Slot "pageID":
  [1] 5

In the above example, there is a specific class to represent the DOM object (in this case, DOM_CSSStyleSheet_ptr), but the 'DOM' package does not have a specific representation for all possible DOM objects. For example, the following code adds a CSS media rule to the style sheet of the page and then attempts to access the media property of this rule.

insertRule(page, styleElement$sheet,
         "@media screen { body { background-color: #AAA } }", 0)
  [1] 0
styleElement$sheet$cssRules[1]$media
  An object of class "DOM_obj_ptr"
  [1] "3"
  Slot "pageID":
  [1] 5

The result is a generic DOM_obj_ptr. The 'DOM' package does not know exactly what sort of DOM object this is, and this has two consequences. First, the 'DOM' package does not know anything about the properties of the object, so it provides less protection against doing something silly, like trying to set a read-only property. This should normally just result in an error, which is not a big problem, but in some cases the attempt might silently do nothing, which is more dangerous. A larger problem is that the 'DOM' package may not provide functions corresponding to the methods of the object. For example, the 'DOM' package knows about CSSStyleSheet objects, so it provides an insertRule function to mirror the method of that name for CSSStyleSheet objects. But the 'DOM' package does not have a specific representation for media list objects (which is what we have accessed in the code above), and that is reflected in the fact that there are no functions for working with media list objects in the 'DOM' package.

Future versions of the 'DOM' package may expand the set of supported DOM objects to cover some of these holes.

In the meantime, the generalised access to DOM objects and their properties does still allow a much greater scope for exploring and interacting with a web page. The example below shows that, even though 'DOM' does not make a distinction between CSS style rules and CSS media rules, we can access the CSS style rule within the CSS media rule just through the properties of the DOM_obj_ptr object.

styleElement$sheet$cssRules[1]$cssRules[1]$style$"background-color"
  [1] "rgb(170, 170, 170)"
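The browser reports this colour in rgb() form rather than as the "#AAA" that was written in the rule; in CSS, the three-digit hex form #AAA is shorthand for #AAAAAA, and hex AA is 170 in decimal. The helper cssHexToRgb below is a hypothetical sketch (not part of 'DOM') of the same conversion in R, using grDevices::col2rgb, which may be handy when comparing values read back from the browser with values written from R. It only handles opaque three- and six-digit hex colours.

```r
## Hypothetical helper (not part of 'DOM'): convert a CSS hex colour
## such as "#AAA" or "#AAAAAA" to the "rgb(r, g, b)" form
cssHexToRgb <- function(hex) {
    digits <- sub("^#", "", hex)
    ## CSS shorthand: each of the three digits is doubled
    if (nchar(digits) == 3)
        digits <- paste(rep(strsplit(digits, "")[[1]], each=2),
                        collapse="")
    channels <- unname(grDevices::col2rgb(paste0("#", digits))[, 1])
    sprintf("rgb(%d, %d, %d)", channels[1], channels[2], channels[3])
}

cssHexToRgb("#AAA")
## [1] "rgb(170, 170, 170)"
```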

The following code shows that we can now extract the complete HTML code for a page just through object properties. As an interesting side note, the result demonstrates that modifications to the DOM style sheet object are NOT reflected in the HTML content of the web page.

body <- getElementsByTagName(page, "body", response=nodePtr())
body$parentNode$outerHTML
  [1] "<html><head><style id=\"s1\"></style></head><body></body></html>"

Summary

Version 0.4 of the 'DOM' package introduces several new classes and functions for working with CSS content in a web page. The most important change is the ability to get and set properties on DOM objects in the browser, including style properties on individual HTML and SVG elements. In addition, the styleSheets function provides access to style sheets on a web page, the insertRule function allows CSS rules to be added to a style sheet, and then the ability to get and set properties allows us to control the style properties of CSS rules within style sheets. The overall result is the ability to programmatically control the styling of HTML and SVG content in a web page.

Technical requirements

The examples and discussion in this document relate to version 0.4 of the 'DOM' package.

This report was generated within a Docker container (see Resources section below).

Resources

