2.2 Syntax

The first thing we need to learn about a computer language is the correct syntax for the language.

Syntax is essentially the punctuation and grammar rules for a computer language. Certain characters and words have special meanings and must appear in a particular order for the computer code to make any sense.

A simple example from line 3 in Figure 2.2 is the piece of HTML code <head>. The < character is special in HTML because it indicates the start of a keyword. The > character is also special; it marks the end of the keyword. Also, this particular keyword, <head>, must appear in a particular place: near the start of the HTML code, inside the html element and before the body element.

These rules and special meanings for specific characters and words make up the syntax for a computer language.

Computers are extremely fussy when it comes to syntax, so computer code must follow every syntax rule exactly before it will work.

The next section describes the basic syntax rules for the HTML language.


2.2.1 HTML syntax

HTML has a very simple syntax.

HTML code consists of two basic components: elements, which are special HTML keywords, and content, which is just normal everyday text.

There are a few elements that have to go in every HTML document (Figure 2.3 shows the smallest possible HTML document), and then it is up to the author to decide on the main contents of the web page.

Figure 2.3: A minimal HTML document. This is the basic code that must appear in any HTML document. The main content of the web page is described by adding further HTML elements within the body element.
 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title></title>
    </head>
    <body>
    </body>
</html>

An HTML element consists of a start tag, an end tag and some content in between.

As an example, we will look closely at the title element from line 4 of Figure 2.2.

<title>Poles of Inaccessibility</title>

This code is broken down into its separate components below, with one component shown on each row.

start tag: <title>
content: Poles of Inaccessibility
end tag: </title>

The less-than and greater-than signs have a special meaning in HTML code; they mark the start and end of HTML tags. In this title element, the characters with a special meaning are the < and > around each tag, together with the / that marks the end tag.

Some HTML elements may be empty, which means that they only consist of a start tag (no end tag and no content). An example is the img (short for “image”) element from Figure 2.2, which inserts the plot in the web page.

<img src="poleplot.png">

The entire img element consists of this single tag.

There is a fixed set of valid HTML elements and only those elements can be used within HTML code. We will encounter several important elements in this chapter and a more comprehensive list is provided in Chapter 3.
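As a small illustration of the difference between the two kinds of element, the sketch below places an element that has content next to an empty element. The p (paragraph) and br (line break) elements are standard HTML elements that have not yet been introduced in this chapter, and the text content is made up for the example.

```html
<!-- An element with content: start tag, content, end tag -->
<p>This is some everyday text.</p>

<!-- An empty element: a single start tag, with no content and no end tag -->
<br>
```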

2.2.1.1 Attributes

HTML elements can have one or more attributes, which provide more information about the element. An attribute consists of the attribute name, an equals sign, and the attribute value, which is surrounded by double-quotes. Attributes only appear in the start tag of an element. We have just seen an example in the img element above. The img element has an attribute called src that describes the location of a file containing the picture to be drawn on the web page. In the example above, the attribute is src="poleplot.png".

The components of this HTML element are shown below.

HTML tag: <img src="poleplot.png">
element name: img
attribute: src="poleplot.png"
attribute name: src
attribute value: poleplot.png

Again, some of the characters in this code are part of the HTML syntax: the < and > that enclose the tag, the = between the attribute name and the attribute value, and the double-quotes around the attribute value.

Many attributes are optional, and if they are not specified, a default value is provided.
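For instance, an element can carry several attributes at once, and an element that has an end tag still has its attributes in the start tag only. In the sketch below, the alt and width attributes of img and the href attribute of the a (anchor) element are standard HTML attributes, but the values given to them here are assumptions made for illustration.

```html
<!-- Three attributes, each written as name="value" inside the single tag -->
<img src="poleplot.png" alt="Plot of the poles" width="400">

<!-- The attribute appears in the start tag; the end tag has none -->
<a href="poleplot.png">View the plot</a>
```

If width were left out, the browser would fall back on a default, which for an image is normally its natural size.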

2.2.1.2 Element order

HTML tags must be ordered properly. All elements must nest cleanly and some elements are only allowed inside specific other elements. For example, a title element can only be used inside a head element, and the title element must start and end within the head element. The following HTML code is invalid because the title element does not finish within the head element.

<head>
    <title>
    Poles of Inaccessibility
</head>
    </title>

To be correct, the title element must start and end within the head element, as in the code below.

<head>
    <title>
    Poles of Inaccessibility
    </title>
</head>

Finally, there are a few elements that must occur in an HTML document: there must be a DOCTYPE declaration, which states what computer language we are using; there must be a single html element, with a single head element and a single body element inside; and the head element must contain a single title element. Figure 2.3 shows a minimal HTML document.
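Putting these requirements together, a complete document might look like the sketch below. It follows Figure 2.3, with the title from Figure 2.2 filled in; the p element and its text are placeholders added for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <!-- Exactly one title element, starting and ending inside head -->
        <title>Poles of Inaccessibility</title>
    </head>
    <body>
        <!-- The main content of the web page goes inside body -->
        <p>Placeholder content for the web page.</p>
    </body>
</html>
```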

HTML is defined by a standard, so there is a single, public specification of HTML syntax. Unfortunately, as is often the case, there are several different versions of HTML, each with its own standard, so it is necessary to specify exactly which version of HTML we are working with. We will focus on HTML version 4.01 in this book. This is specified in the DOCTYPE declaration used in all examples.

These are the basic syntax rules of HTML. With these rules, we can write correct HTML code. In Section 2.3 we will look at the next step, which is what the code will do when we give it to the computer to run.

2.2.2 Escape sequences

As we have seen in HTML, certain words or characters have a special meaning within the language. These are sometimes called keywords or reserved words to indicate that they are reserved by the language for special use.

Because of this, these words or characters cannot be used directly for their normal, natural-language meaning when writing in a formal computer language; a special code must be used instead.

For example, the < character marks the start of a tag in HTML, so it cannot be typed directly to mean “less than”.

If we need to have a less-than sign within the content of an HTML element, we have to type &lt; instead. This is an example of what is called an escape sequence.

Another special character in HTML is the greater-than sign, >. To produce one of these in the content of an HTML element, we must type &gt;.

In HTML, there are several escape sequences of this form that all start with an ampersand, &. This means of course that the ampersand is itself a special character, with its own escape sequence, &amp;. A larger list of special characters and escape sequences in HTML is given in Section 3.1.2.
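As a brief sketch, the line below uses all three of these escape sequences in the content of an element. The p element and the text are made up for the example. A web browser would display the content as: x < 10 & y > 5

```html
<!-- Each escape sequence starts with an ampersand and ends with a semicolon -->
<p>x &lt; 10 &amp; y &gt; 5</p>
```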

We will meet this idea of escape sequences again in the other computer languages that we encounter later in the book.

Recap

Computer code is just text, but with certain characters or words having special meanings.

The punctuation and grammar rules of a computer language are called the syntax of the language.

Computer code must have all syntax rules correct before it can be expected to work properly.

An escape sequence is a way of getting the normal meaning for a character or word that has a special meaning in the language.

HTML consists of elements, which consist of a start tag and an end tag, with content in between.

HTML elements may also have attributes within the start tag.

Paul Murrell

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.