How this book is organized

This book is designed to be accessible and practical, with an emphasis on useful, applicable information. To this end, each topic is introduced via one or more case studies, which help to motivate the need for the relevant ideas and tools. Practical examples are used to demonstrate the most important points and there is a deliberate avoidance of minute detail. Separate reference chapters then provide a more structured and detailed description of a particular technology, which is more useful for finding specific information once the big picture has been obtained. These reference chapters are still not exhaustive, so pointers to further reading are also provided.

The main topics are organized into four core chapters, with supporting reference chapters, as described below.

Chapter 2: Writing Computer Code
This chapter discusses how to write computer code, using the HyperText Markup Language, HTML, as a concrete example. It introduces a number of important ideas and terms that apply to working with any computer language, and it includes guidelines and advice on the practical aspects of writing computer code in a disciplined way. HTML provides a way to produce documents that can be viewed in a web browser and published on the World Wide Web.

Chapters 3 and 4 provide support in the form of reference material for HTML and Cascading Style Sheets.

Chapter 5: Data Storage
This chapter covers a variety of data storage topics. It starts with a range of different file formats, including a brief discussion of how data values are stored in computer memory, moves on to the eXtensible Markup Language, XML, and ends with the structure and design issues of relational databases.

Chapter 6 provides reference material for XML and the Document Type Definition language.

Chapter 7: Data Queries
This chapter focuses on accessing data, with particular emphasis on extracting data from a relational database using the Structured Query Language, SQL. There is also a brief mention of the XPath language for accessing data in XML documents.

Chapter 8 provides reference material for SQL, including additional uses of SQL for creating and modifying relational databases.

Chapter 9: Data Processing
This chapter is by far the largest. It covers a number of tools and techniques for searching, sorting, and tabulating data, plus several ways to manipulate data into new forms. It also introduces some very basic programming concepts and the R language for statistical computing.

Chapter 10 provides reference material for R and Chapter 11 provides reference material for regular expressions, a language for processing text data.

Chapter 12 provides a brief wrap-up of the main ideas in the book.

There is an overall progression through the book from writing simple computer code with straightforward computer languages to more complex tasks with more sophisticated languages. The core chapters also build on each other to some extent. For example, Chapter 9 assumes that the reader has a good understanding of data storage formats and is comfortable writing computer code. Furthermore, examples and case studies are carried over between different chapters in an attempt to illustrate how the different technologies need to be combined over the lifetime of a data set. There are also occasional “flashbacks” to a previous topic to make explicit connections between similar ideas that recur in different settings. In this way, the book is set up to be read in order from start to finish.

However, every effort has been made to ensure that individual chapters can be read on their own. Where necessary, figures are reproduced and descriptions are repeated so that it is not necessary to jump back and forth within the book in order to acquire a complete understanding of a particular section.

Much of the information in this book will require practice in order to gain a full understanding. The reader is encouraged to make use of the exercises on the book's web site.

Paul Murrell

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.