

7.3 Querying XML

The direct counterpart to SQL for querying XML documents is a language called XQuery. A full discussion of XQuery is beyond the scope of this book, so this section focuses only on XPath, a language that underpins XQuery as well as a number of other XML-related technologies.

The XPath language provides a way to specify a particular subset of an XML document. XPath makes it easy to identify a coherent set of data values that are distributed across multiple elements within an XML document.


7.3.1 XPath syntax

An XPath expression specifies a subset of elements and attributes from within an XML document. We will look at the basic structure of XPath expressions via an example.

7.3.2 Case study: Point Nemo (continued)

Figure 7.7 shows the temperature data at Point Nemo in an XML format (this is a reproduction of Figure 5.13 for convenience).

Figure 7.7: The first few lines of the surface temperature at Point Nemo in an XML format. This is a reproduction of Figure 5.16.
 

<?xml version="1.0"?>
<temperatures>
    <variable>Mean TS from clear sky composite (kelvin)</variable>
    <filename>ISCCPMonthly_avg.nc</filename>
    <filepath>/usr/local/fer_dsets/data/</filepath>
    <subset>93 points (TIME)</subset>
    <longitude>123.8W(-123.8)</longitude>
    <latitude>48.8S</latitude>
    <case date="16-JAN-1994" temperature="278.9" />
    <case date="16-FEB-1994" temperature="280" />
    <case date="16-MAR-1994" temperature="278.9" />
    <case date="16-APR-1994" temperature="278.9" />
    <case date="16-MAY-1994" temperature="277.8" />
    <case date="16-JUN-1994" temperature="276.1" />

    ...

</temperatures>

This XML document demonstrates the idea that values from a single variable in a data set may be scattered across separate XML elements. For example, the temperature values are represented as attributes of case elements; they are not assembled together within a single column or a single block of memory within the file.

We will use some XPath expressions to extract useful subsets of the data set from this XML document.

The most basic XPath expressions consist of element names separated by forward slashes. The following XPath selects the temperatures element from the XML document. In each of the XPath examples in this section, the elements or attributes selected by the XPath expression are shown below the XPath code. If there are too many, only the first few are shown, followed by ... to indicate that the remaining results have been left out.

/temperatures



<temperatures>
  <variable>Mean TS from clear sky composite (kelvin)</variable>
  <filename>ISCCPMonthly_avg.nc</filename>
 ...

More specifically, because the expression begins with a forward slash, it selects the root element called temperatures. To select elements below the root element, we must either specify a complete path to those elements or start the expression with a double forward slash (//).
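
The XPath examples in this section need software to evaluate them. As a minimal sketch only, Python's standard xml.etree.ElementTree module can stand in for the simplest cases, with the caveat that it implements just a limited subset of XPath and writes its paths relative to an element rather than from the document root. The code assumes the Figure 7.7 excerpt (without the ... line) is held in a string called DOC, a name chosen here purely for illustration. Parsing returns the root element, which is the element that /temperatures selects.

```python
import xml.etree.ElementTree as ET

# Excerpt of Figure 7.7; DOC is a hypothetical name used for illustration
DOC = """<?xml version="1.0"?>
<temperatures>
    <variable>Mean TS from clear sky composite (kelvin)</variable>
    <filename>ISCCPMonthly_avg.nc</filename>
    <filepath>/usr/local/fer_dsets/data/</filepath>
    <subset>93 points (TIME)</subset>
    <longitude>123.8W(-123.8)</longitude>
    <latitude>48.8S</latitude>
    <case date="16-JAN-1994" temperature="278.9" />
    <case date="16-FEB-1994" temperature="280" />
    <case date="16-MAR-1994" temperature="278.9" />
    <case date="16-APR-1994" temperature="278.9" />
    <case date="16-MAY-1994" temperature="277.8" />
    <case date="16-JUN-1994" temperature="276.1" />
</temperatures>"""

# Parsing returns the root element: the one that /temperatures selects
root = ET.fromstring(DOC)
print(root.tag)  # temperatures

# The first two children, matching the start of the output shown above
for child in list(root)[:2]:
    print(child.tag, "=", child.text)
```

ElementTree does not treat a leading / as the document root the way XPath does, which is why the sketch takes the root from the result of parsing instead of locating it with a path.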

Both of the following expressions select all case elements from the XML document. In the first example, we specify case elements that are directly nested within the (root) temperatures element:



/temperatures/case



<case date="16-JAN-1994" temperature="278.9"/>
<case date="16-FEB-1994" temperature="280"/>
<case date="16-MAR-1994" temperature="278.9"/>
 ...
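
Continuing the ElementTree sketch (again assuming the Figure 7.7 excerpt is held in a string, here called DOC for illustration), a path evaluated from the root element plays the role of /temperatures/case: the relative path "case" matches only case elements that are direct children of temperatures.

```python
import xml.etree.ElementTree as ET

# Excerpt of Figure 7.7 (hypothetical variable name DOC)
DOC = """<?xml version="1.0"?>
<temperatures>
    <variable>Mean TS from clear sky composite (kelvin)</variable>
    <longitude>123.8W(-123.8)</longitude>
    <latitude>48.8S</latitude>
    <case date="16-JAN-1994" temperature="278.9" />
    <case date="16-FEB-1994" temperature="280" />
    <case date="16-MAR-1994" temperature="278.9" />
    <case date="16-APR-1994" temperature="278.9" />
    <case date="16-MAY-1994" temperature="277.8" />
    <case date="16-JUN-1994" temperature="276.1" />
</temperatures>"""

root = ET.fromstring(DOC)

# Relative to the root, "case" means: case children of temperatures,
# which corresponds to the XPath /temperatures/case
cases = root.findall("case")
print(len(cases))  # 6 in this excerpt

# Serialise the first three; strip() removes the trailing whitespace
# (the element's "tail") that ElementTree includes in the output
for case in cases[:3]:
    print(ET.tostring(case, encoding="unicode").strip())
```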

The second approach selects case elements no matter where they are within the XML document.



//case



<case date="16-JAN-1994" temperature="278.9"/>
<case date="16-FEB-1994" temperature="280"/>
<case date="16-MAR-1994" temperature="278.9"/>
 ...
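
In the Point Nemo document both expressions return the same elements, because every case element is a child of the root. The difference only shows up when elements sit at different depths. The sketch below uses a small made-up document (not part of the Point Nemo data) in which one case element is wrapped in a hypothetical group element, and compares ElementTree's "case" path, which behaves like /temperatures/case, with its ".//case" path, which behaves like //case.

```python
import xml.etree.ElementTree as ET

# A made-up document: one case element is nested one level deeper
DOC = """<temperatures>
    <case date="16-JAN-1994" temperature="278.9" />
    <group>
        <case date="16-FEB-1994" temperature="280" />
    </group>
</temperatures>"""

root = ET.fromstring(DOC)

# Like /temperatures/case: only case elements directly below the root
children_only = root.findall("case")

# Like //case: case elements at any depth below the root
anywhere = root.findall(".//case")

print(len(children_only), len(anywhere))  # 1 2
```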

An XPath expression may also be used to select attributes rather than entire elements. Attributes are selected by specifying the appropriate name, preceded by an @ character. The following example selects the temperature attribute of the case elements.

/temperatures/case/@temperature



278.9
280
278.9
 ...
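
This is also where the scattered values mentioned earlier get assembled into a single column. ElementTree paths cannot end in an attribute step such as @temperature, so in this sketch (which again assumes the case elements of Figure 7.7 are held in a string called DOC, a hypothetical name) the attribute is read from each selected case element with Element.get().

```python
import xml.etree.ElementTree as ET

# The case elements from Figure 7.7 (hypothetical variable name DOC)
DOC = """<temperatures>
    <case date="16-JAN-1994" temperature="278.9" />
    <case date="16-FEB-1994" temperature="280" />
    <case date="16-MAR-1994" temperature="278.9" />
    <case date="16-APR-1994" temperature="278.9" />
    <case date="16-MAY-1994" temperature="277.8" />
    <case date="16-JUN-1994" temperature="276.1" />
</temperatures>"""

root = ET.fromstring(DOC)

# Equivalent of /temperatures/case/@temperature: one value per case element
temps = [case.get("temperature") for case in root.findall("case")]
print(temps[:3])  # ['278.9', '280', '278.9']

# Attribute values are always strings; convert them for numeric work
values = [float(t) for t in temps]
print(values)  # [278.9, 280.0, 278.9, 278.9, 277.8, 276.1]
```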

Several separate paths may also be specified, separated by a vertical bar. This next XPath selects both longitude and latitude elements from anywhere within the XML document.



//longitude | //latitude



<longitude>123.8W(-123.8)</longitude>
<latitude>48.8S</latitude>
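
ElementTree has no | operator, so a union has to be emulated. One possible approach, sketched below under the same assumption that the document is held in a string called DOC (a hypothetical name), is to walk the tree once and keep every element whose tag is one of the wanted names. Walking the tree preserves document order, which matches the order in which an XPath union returns its nodes.

```python
import xml.etree.ElementTree as ET

# Excerpt of Figure 7.7 (hypothetical variable name DOC)
DOC = """<temperatures>
    <variable>Mean TS from clear sky composite (kelvin)</variable>
    <filename>ISCCPMonthly_avg.nc</filename>
    <longitude>123.8W(-123.8)</longitude>
    <latitude>48.8S</latitude>
    <case date="16-JAN-1994" temperature="278.9" />
</temperatures>"""

root = ET.fromstring(DOC)

# Emulate //longitude | //latitude: iter() visits every element
# (at any depth) in document order
wanted = {"longitude", "latitude"}
both = [elem for elem in root.iter() if elem.tag in wanted]

for elem in both:
    print(ET.tostring(elem, encoding="unicode").strip())
```

Concatenating root.findall(".//longitude") and root.findall(".//latitude") would find the same elements, but the results would be grouped by name rather than kept in document order.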

It is also possible to specify predicates, which are conditions that must be met for an element to be selected. These are placed within square brackets. In the following example, only case elements where the temperature attribute has the value 280 are selected.



/temperatures/case[@temperature=280]



<case date="16-FEB-1994" temperature="280"/>
<case date="16-MAR-1995" temperature="280"/>
<case date="16-MAR-1997" temperature="280"/>
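
ElementTree supports a predicate of the form [@name='value'], but only as a string comparison. In XPath, [@temperature=280] compares the attribute's value as a number, so a value written as "280.0" would also match there. The sketch below, which assumes the first six case elements of Figure 7.7 are held in a string called DOC (a hypothetical name), shows both the string predicate and a numeric test written in Python. Because the excerpt only covers the first six months, just one of the three matches shown above appears.

```python
import xml.etree.ElementTree as ET

# The first six case elements from Figure 7.7 (hypothetical name DOC)
DOC = """<temperatures>
    <case date="16-JAN-1994" temperature="278.9" />
    <case date="16-FEB-1994" temperature="280" />
    <case date="16-MAR-1994" temperature="278.9" />
    <case date="16-APR-1994" temperature="278.9" />
    <case date="16-MAY-1994" temperature="277.8" />
    <case date="16-JUN-1994" temperature="276.1" />
</temperatures>"""

root = ET.fromstring(DOC)

# String equality predicate, supported directly by ElementTree
by_string = root.findall("case[@temperature='280']")

# Numeric comparison, closer to XPath's [@temperature=280]
by_number = [case for case in root.findall("case")
             if float(case.get("temperature")) == 280]

print([c.get("date") for c in by_string])  # ['16-FEB-1994']
print([c.get("date") for c in by_number])  # ['16-FEB-1994']
```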

More examples of XPath expressions are given later in Section 9.7.7, which also includes an example of software that can be used to run XPath code.

Paul Murrell

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.