6.1 XML syntax

The first line of an XML document should be a declaration that this is an XML document, including the version of XML being used.

<?xml version="1.0"?>

It is also useful to include a statement of the encoding used in the file.

<?xml version="1.0" encoding="UTF-8"?>
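
As a rough illustration of why the encoding statement is useful, the sketch below parses raw bytes with Python's standard xml.etree.ElementTree module. The choice of Python, the name element, and the sample text are assumptions made for this example and are not part of the original document.

```python
import xml.etree.ElementTree as ET

# The byte 0xE9 is "é" in ISO-8859-1, but it is not valid UTF-8 on its own.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'
decoded = ET.fromstring(latin1_doc).text

# With no declared encoding, the parser assumes UTF-8 and rejects the byte.
try:
    ET.fromstring(b'<?xml version="1.0"?><name>caf\xe9</name>')
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(decoded, undeclared_ok)
```

The same bytes are read correctly only when the declaration names the encoding that was actually used to write the file.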

The main content of an XML document consists entirely of XML elements. An element usually consists of a start tag and an end tag, with plain text content or other XML elements in between.

A start tag is of the form <elementName> and an end tag has the form </elementName>.

The following code shows an example of an XML element.

<filename>ISCCPMonthly_avg.nc</filename>

The components of this XML element are shown below.

element:   <filename>ISCCPMonthly_avg.nc</filename>
start tag: <filename>
content:   ISCCPMonthly_avg.nc
end tag:   </filename>
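
The same breakdown can be recovered programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not something the original text uses); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Parse the single element shown above.
element = ET.fromstring("<filename>ISCCPMonthly_avg.nc</filename>")

name = element.tag      # the name shared by the start tag and the end tag
content = element.text  # the plain text between the two tags

print(name, content)
```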

The start tag may include attributes of the form attrName="attrValue". The attribute value must be enclosed in quotes; double quotes are the usual choice, although single quotes are also permitted.

The names of XML elements and XML attributes are case-sensitive.
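
Case sensitivity can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the Date attribute and the Filename tag are hypothetical variations invented for the example.

```python
import xml.etree.ElementTree as ET

# "date" and "Date" are two different attributes, so both may appear.
mixed = ET.fromstring('<case date="16-JAN-1994" Date="17-JAN-1994"/>')
attr_names = sorted(mixed.attrib)

# A start tag and an end tag that differ only in case do not match.
try:
    ET.fromstring("<Filename>ISCCPMonthly_avg.nc</filename>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

print(attr_names, mismatch_accepted)
```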

It is also possible to have an empty element, which consists of a single tag, optionally with attributes, and has no content. In this case, the tag has the form <elementName />.

The following code shows an example of an empty XML element with two attributes.

<case date="16-JAN-1994" 
      temperature="278.9" />

The components of this XML element are shown below.

element name:    case
attribute name:  date
attribute value: 16-JAN-1994
attribute name:  temperature
attribute value: 278.9
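
A parser exposes these components directly. This is a small sketch, assuming Python's standard xml.etree.ElementTree module; note that attribute values always arrive as strings, so the conversion to a number is an explicit step added for illustration.

```python
import xml.etree.ElementTree as ET

case = ET.fromstring('<case date="16-JAN-1994"\n      temperature="278.9" />')

attributes = dict(case.attrib)  # attribute names mapped to attribute values
has_content = case.text is not None or len(case) > 0

# Attribute values are plain text; convert them explicitly when needed.
temperature = float(case.get("temperature"))

print(case.tag, attributes, has_content, temperature)
```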

XML elements may be nested; an XML element may have other XML elements as its content. An XML document must have a single root element, which contains all other XML elements in the document.

The following code shows a very small, but complete, XML document. The root element of this document is the temperatures element. The filename and case elements are nested within the temperatures element.

<?xml version="1.0"?>
<temperatures>
    <filename>ISCCPMonthly_avg.nc</filename>
    <case date="16-JAN-1994" 
          temperature="278.9"/>
</temperatures>
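
The nesting and the single-root rule can both be demonstrated with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the second snippet, with two top-level elements, is a deliberately invalid example invented for the purpose.

```python
import xml.etree.ElementTree as ET

# The declaration must be the very first thing in the document.
document = """<?xml version="1.0"?>
<temperatures>
    <filename>ISCCPMonthly_avg.nc</filename>
    <case date="16-JAN-1994"
          temperature="278.9"/>
</temperatures>
"""
root = ET.fromstring(document)
child_names = [child.tag for child in root]  # elements nested in the root

# A second top-level element breaks the single-root rule.
try:
    ET.fromstring("<temperatures/><temperatures/>")
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

print(root.tag, child_names, two_roots_ok)
```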

A comment in XML is anything between the delimiters <!-- and -->.
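
As a hedged illustration, the sketch below shows how one particular parser, Python's standard xml.etree.ElementTree (an assumption of this example), treats comments: by default they are discarded and do not appear in the parsed tree. It also shows that a double hyphen is not allowed inside a comment. The comment text is invented.

```python
import xml.etree.ElementTree as ET

commented = """<temperatures>
    <!-- monthly averages -->
    <filename>ISCCPMonthly_avg.nc</filename>
</temperatures>"""
root = ET.fromstring(commented)
child_names = [child.tag for child in root]  # the comment is not included

# The sequence "--" may not appear inside a comment.
try:
    ET.fromstring("<temperatures><!-- a -- b --></temperatures>")
    double_hyphen_ok = True
except ET.ParseError:
    double_hyphen_ok = False

print(child_names, double_hyphen_ok)
```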

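For the benefit of human readers, the contents of an XML element are usually indented. However, whitespace within the content of an element is preserved, so indentation is not always appropriate for elements with plain text content: any spaces and line breaks that are added become part of that content.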
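
The effect of indentation on plain text content can be seen in the following sketch, which assumes Python's standard xml.etree.ElementTree module. Stripping the whitespace afterwards is one possible remedy, shown only as an illustration.

```python
import xml.etree.ElementTree as ET

# Indenting the text content adds a line break and spaces to that content.
indented = ET.fromstring("<filename>\n    ISCCPMonthly_avg.nc\n</filename>")
raw_text = indented.text

# Software reading the file has to remove the extra whitespace itself.
clean_text = raw_text.strip()

print(repr(raw_text), repr(clean_text))
```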

In XML code, certain characters, such as the greater-than and less-than signs, have special meanings. Table 6.1 lists these special characters and also gives the escape sequence required to produce the normal, literal meaning of the characters.


Table 6.1: The predefined XML entities.
Character   Description         Entity
<           less-than sign      &lt;
>           greater-than sign   &gt;
&           ampersand           &amp;
"           quotation mark      &quot;
'           apostrophe          &apos;
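
When XML is generated by software, the escaping is usually left to a library. The sketch below assumes Python's standard library, using escape from xml.sax.saxutils and the xml.etree.ElementTree parser; the condition element and the sample text are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "temperature < 280 & pressure > 1000"

# escape() replaces &, < and > with the corresponding entities.
escaped = escape(raw)

# A parser turns the entities back into the literal characters.
element = ET.fromstring(f"<condition>{escaped}</condition>")
round_trip = element.text

# The quotation mark and apostrophe entities are decoded in the same way.
quoted = ET.fromstring("<q>&quot;hi&quot; &apos;there&apos;</q>").text

print(escaped)
print(round_trip)
print(quoted)
```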

A special syntax is provided for escaping an entire section of plain text content, which is useful when the text contains many special characters. Any text between the delimiters <![CDATA[ and ]]> is treated as literal text.
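
The following sketch, assuming Python's standard xml.etree.ElementTree module, suggests that a CDATA section and the equivalent escaped text produce the same content once parsed. The test element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# The special characters are written literally inside the CDATA section.
cdata = ET.fromstring("<test><![CDATA[if (a < b && b > c)]]></test>")

# The same content written with one entity per special character.
escaped = ET.fromstring("<test>if (a &lt; b &amp;&amp; b &gt; c)</test>")

same_content = cdata.text == escaped.text

print(cdata.text, same_content)
```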

Paul Murrell

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.