6.2 Document Type Definitions

An XML document that obeys the rules of the previous section is described as well-formed.

It is also possible to specify additional rules for the structure and content of an XML document, via a schema for the document. If the document is well-formed and also obeys the rules given in a schema, then the document is described as valid.

The Document Type Definition language (DTD) is a language for describing the schema for an XML document. DTD code consists of element declarations and attribute declarations.

6.2.1 Element declarations

An element declaration should be included for every different type of element that will occur in an XML document. Each declaration describes what content is allowed inside a particular element. An element declaration is of the form:

<!ELEMENT elementName elementContents>

The elementContents specifies whether an element can contain plain text, or other elements (and if so, which elements, in what order), or whether the element must be empty. Some possible values are shown below.

EMPTY
 
The element is empty.

ANY
 
The element may contain anything (other elements, plain text, or both).

(#PCDATA)
 
The element may contain only plain text (no other elements).

(elementA)
 
The element must contain exactly one elementA element. The parentheses, ( and ), are essential in this example and all others below.

(elementA*)
 
The element may contain zero or more elementA elements. The asterisk, *, indicates “zero or more”.

(elementA+)
 
The element must contain one or more elementA elements. The plus sign, +, indicates “one or more”.

(elementA?)
 
The element may contain at most one elementA element. The question mark, ?, indicates “zero or one”.

(elementA,elementB)
 
The element must contain exactly one elementA element and exactly one elementB element. The element names are separated from each other by commas.

(elementA|elementB)
 
The element must contain either exactly one elementA element or exactly one elementB element. The vertical bar, |, indicates alternatives.

(elementA|elementB*)
 
The element may contain either a single elementA element or zero or more elementB elements. The asterisk, *, is inside the parentheses so only applies to the elementB element.

(#PCDATA|elementA|elementB)*
 
The element may contain plain text mixed with zero or more elementA and elementB elements, in any order. The asterisk, *, is outside the parentheses so applies to everything within the parentheses. This is the only form in which #PCDATA can be combined with element names: #PCDATA must come first and the asterisk must follow the closing parenthesis.
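
As an illustration, the following sketch combines several of the content forms above in one small document. The element names (report, title, author, note, para, em) are invented for this example and do not come from elsewhere in the chapter.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
    <!-- one title, then one or more authors, an optional note,
         then zero or more paragraphs, in that order -->
    <!ELEMENT report (title, author+, note?, para*)>
    <!ELEMENT title  (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT note   EMPTY>
    <!-- plain text mixed with any number of em elements -->
    <!ELEMENT para   (#PCDATA|em)*>
    <!ELEMENT em     (#PCDATA)>
]>
<report>
    <title>Monthly temperatures</title>
    <author>A. Writer</author>
    <author>B. Writer</author>
    <note/>
    <para>Values are in <em>kelvin</em>.</para>
</report>
```

Removing the title element, or placing an author element before it, would leave the document well-formed but no longer valid.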


6.2.2 Attribute declarations

An attribute declaration should be included for every different type of element that can have attributes. The declaration describes which attributes an element may have, what sorts of values the attribute may take, and whether the attribute is optional or compulsory. An attribute declaration is of the form:

<!ATTLIST elementName
    attrName attrType attrDefault
    ...
>

The attrType controls what value the attribute can have. It can have one of the following forms:

CDATA
 
The attribute can take any value. Attribute values must always be plain text and escape sequences (XML entities) must be used for special XML characters (see Table 6.1).

ID
 
The value of this attribute must be unique within the document (i.e., a unique identifier): no two elements, of any type, may have the same ID value. This is similar to a primary key in a database table.

The value of an ID attribute must be a valid XML name, so it must not start with a digit.

IDREF
 
The value of this attribute must match the value of an ID attribute on some element in the document. This is similar to a foreign key in a database table.

(option1|option2)
 
This form provides a list of the possible values for the attribute. The list of options is given, separated by vertical bars, |. This is a good way to limit an attribute to only valid values (e.g., only "male" or "female" for a gender attribute).

<!ATTLIST elementName
    gender (male|female) #REQUIRED>

The attrDefault either provides a default value for the attribute or states whether the attribute is optional or required (i.e., must be specified). It can have one of the following forms:

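"value"
 
This is the default value for the attribute, written within quotes. It is used whenever an element of this type does not specify the attribute.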

#IMPLIED
 
The attribute is optional. It is valid for elements of this type to contain this attribute, but it is not required.

#REQUIRED
 
The attribute is required so it must appear in all elements of this type.
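
The attribute types and defaults described above might be combined as in the following sketch. The element and attribute names (stations, station, reading, code, status, and so on) are invented for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE stations [
    <!ELEMENT stations (station+, reading*)>
    <!ELEMENT station EMPTY>
    <!ELEMENT reading EMPTY>
    <!-- code must be present and unique; name is optional;
         status is "active" unless stated otherwise -->
    <!ATTLIST station
        code   ID              #REQUIRED
        name   CDATA           #IMPLIED
        status (active|closed) "active">
    <!-- station must match the code of some station element -->
    <!ATTLIST reading
        station IDREF #REQUIRED
        value   CDATA #REQUIRED>
]>
<stations>
    <station code="s1" name="Auckland"/>
    <station code="s2" status="closed"/>
    <reading station="s1" value="278.9"/>
</stations>
```

A reading element with station="s3" would make this document invalid, because no element has an ID of s3. So would a second station element with code="s1", because ID values must be unique.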

6.2.3 Including a DTD

A DTD can be embedded within an XML document or the DTD can be located within a separate file and referred to from the XML document.

The DTD information is included within a DOCTYPE declaration following the XML declaration. An inline DTD has the form:

<!DOCTYPE rootElementName [
    DTD code
]>

An external DTD stored in a file called file.dtd would be referred to as follows:

<!DOCTYPE rootElementName SYSTEM "file.dtd">

The name following the keyword DOCTYPE must match the name of the root element in the XML document.
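
As a minimal sketch of the external form, suppose file.dtd contains only the declarations below. The element names (note, to, body) are invented for this example.

```xml
<!ELEMENT note (to, body)>
<!ELEMENT to   (#PCDATA)>
<!ELEMENT body (#PCDATA)>
```

An XML document stored in the same directory could then refer to that file as follows. The name after DOCTYPE is note, matching the root element.

```xml
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "file.dtd">
<note>
    <to>Reader</to>
    <body>The DTD is kept in a separate file.</body>
</note>
```

Keeping the DTD in its own file allows several XML documents to share one set of rules.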


6.2.4 An example

Figure 6.1 shows a very small, well-formed and valid XML document with an embedded DTD.

Figure 6.1: A well-formed and valid XML document, with an embedded DTD. The line numbers used in the discussion below count from the first line of the listing.
 

<?xml version="1.0"?>
<!DOCTYPE temperatures [
    <!ELEMENT temperatures (filename, case)>
    <!ELEMENT filename (#PCDATA)>
    <!ELEMENT case EMPTY>
    <!ATTLIST case 
        date        CDATA  #REQUIRED
        temperature CDATA  #IMPLIED>
 ]>
<temperatures>
    <filename>ISCCPMonthly_avg.nc</filename>
    <case date="16-JAN-1994" 
          temperature="278.9"/>
</temperatures>

Line 1 is the required XML declaration.

Lines 2 to 9 provide a DTD for the document. This DTD specifies that the root element for the document must be a temperatures element (line 2). The temperatures element must contain one filename element and one case element (line 3). The filename element must contain only plain text (line 4) and the case element must be empty (line 5).

The case element must have a date attribute (line 7) and may also have a temperature attribute (line 8). The values of both attributes can be arbitrary text (CDATA).

The elements within the XML document that mark up the actual data values are on lines 10 to 14.
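
For contrast, the following variation on Figure 6.1 is a sketch of a document that is still well-formed but is not valid. The DTD is unchanged and only the data at the bottom has been altered for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE temperatures [
    <!ELEMENT temperatures (filename, case)>
    <!ELEMENT filename (#PCDATA)>
    <!ELEMENT case EMPTY>
    <!ATTLIST case
        date        CDATA  #REQUIRED
        temperature CDATA  #IMPLIED>
]>
<temperatures>
    <!-- invalid: case appears before filename -->
    <!-- invalid: the required date attribute is missing -->
    <case temperature="278.9"/>
    <filename>ISCCPMonthly_avg.nc</filename>
</temperatures>
```

Every tag is properly closed and nested, so a parser that only checks for well-formedness will accept this document. A validating parser should report both problems. For example, assuming the xmllint tool from libxml2 is installed, its --valid option checks a document against its DTD.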


Paul Murrell

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.