Visual Perception

Introduction
Readings:
- Visual Perception and visual tasks
- Colour
References:
Bibliography:

The purpose of this topic is to learn about some fundamental properties of visual perception and to apply that knowledge to creating data visualisations that communicate clearly and accurately.

This topic rests on a more formal basis than the guidelines of the previous section: the rules and guidelines described here are based on scientific research, rather than just common sense and experience.

We will use ‘ggplot2’ to produce data visualisations, but we will need to control the fine details of plots using themes, labelling, and even ‘grid’ if necessary.

This topic also covers colour and we will make use of the ‘colorspace’ package for generating and assessing colour palettes.

Introduction

When we create a data visualisation, we must not set out to deliberately deceive the viewer. However, good intentions and a pure heart will not on their own guarantee that a data visualisation will correctly convey information to the viewer. We must also be aware of how a data visualisation will be perceived.

Comparing numbers

Consider the two circles below. The circle on the right has an area twice as large as the circle on the left, but most people have difficulty perceiving the circle on the right to be exactly twice as large.
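The arithmetic behind such a pair of circles can be sketched with ‘grid’. This is only an illustration: the radius of 1 cm, the positions, and the grey fill are assumed values, not taken from the original figure. The point is that doubling the area scales the radius by the square root of 2, not by 2.

```r
library(grid)

# Radius of the left-hand circle, in cm (an arbitrary illustrative value)
r_left <- 1
# Doubling the area means scaling the radius by sqrt(2), not by 2
r_right <- r_left * sqrt(2)

circle_area <- function(r) pi * r^2
area_ratio <- circle_area(r_right) / circle_area(r_left)

# Draw the two circles side by side on a fresh page
grid.newpage()
grid.circle(x = 0.3, y = 0.5, r = unit(r_left, "cm"),
            gp = gpar(fill = "grey70"))
grid.circle(x = 0.7, y = 0.5, r = unit(r_right, "cm"),
            gp = gpar(fill = "grey70"))
```

The right-hand radius is only about 1.41 times the left-hand radius, which may be part of why the doubling of area is hard to judge by eye.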

We could, with good intentions, draw a scatter plot where the area of the circular symbols represents the value of a variable, but that would produce a misleading data visualisation, or at least a visualisation that is difficult to interpret, because of how viewers perceive the scatter plot.

If our purpose is to compare numbers (a continuous variable), then area is not the best way to represent the values. We are much better at comparing lengths. For example, most people accurately perceive that the top horizontal line below is twice the length of the bottom horizontal line.
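As a rough sketch of the difference between the two encodings, the code below draws the same pair of values twice with ‘ggplot2’: once as symbol area and once as bar length. The data frame `d`, its two values, and the `max_size` setting are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Two hypothetical values, one twice the other
d <- data.frame(item = c("A", "B"), value = c(1, 2))

# Value encoded as symbol area; scale_size_area() keeps the
# area of each point proportional to the value
p_area <- ggplot(d, aes(x = item, y = 0, size = value)) +
  geom_point() +
  scale_size_area(max_size = 20)

# The same values encoded as bar length (flipped to horizontal bars)
p_length <- ggplot(d, aes(x = item, y = value)) +
  geom_col() +
  coord_flip()
```

Printing `p_area` and `p_length` in turn lets us compare how easily the two-to-one ratio can be read from each plot.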

On the other hand, context is very important. For example, it is much harder for people to perceive that the top horizontal line below is the same length as the bottom horizontal line. This is known as the Müller-Lyer illusion.

Even if we add further cues to show that the lines are the same length, the visual illusion is hard to shake.

Another important idea is that our visual perception is generally relative rather than absolute. For example, we perceive “lighter” or “darker” rather than an absolute level of luminance. This can also mislead us, in an effect called colour contrast or luminance contrast.

For example, in the diagram below, the smaller squares both have the same luminance. They just appear to be different because of the contrast provided by the larger squares (one very dark and one very light).
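A diagram of this kind can be approximated with ‘grid’, as in the sketch below. The specific grey levels and square sizes are assumptions made for illustration; what matters is that both small squares are drawn with exactly the same fill.

```r
library(grid)

# The same mid-grey fill is used for both small squares (an assumed value)
inner_fill <- "grey50"
# Surrounds: one very dark, one very light (assumed values)
outer_fill <- c("grey10", "grey90")

grid.newpage()
for (i in 1:2) {
  x <- c(0.25, 0.75)[i]
  # Large surrounding square ("snpc" units keep it square on any page shape)
  grid.rect(x = x, y = 0.5,
            width = unit(0.4, "snpc"), height = unit(0.4, "snpc"),
            gp = gpar(fill = outer_fill[i], col = NA))
  # Small inner square with an identical fill in both panels
  grid.rect(x = x, y = 0.5,
            width = unit(0.1, "snpc"), height = unit(0.1, "snpc"),
            gp = gpar(fill = inner_fill, col = NA))
}
```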

Similarly, the smaller squares below are both the same colour (hue). They just appear to be different because of the contrast provided by the larger squares.

All of this means that it is important for us to have some understanding of human visual perception, so that we can produce data visualisations that represent data accurately for the viewer.

The Readings from Healy demonstrate some other visual illusions and discuss how best to represent data values, e.g., length versus area. In the terminology used in that reading, we need to choose the correct channel to represent data values. You should be able to see some strong parallels with aesthetic mappings in ‘ggplot2’.

Identifying groups

Comparing numbers is one sort of visual task that we may perform when viewing a data visualisation. For example, this is the sort of thing we need to do in order to extract information from a barplot. Another example of a visual task is identifying groups. For example, we may want to identify which points in a scatter plot belong to the same group.

Again, there are better and worse ways to represent a grouping variable (or categorical variable).
In the plot below, colour (hue) is used to represent different groups and it is very easy to identify points that belong to the same group.

In the plot below, different groups are given different shapes for the data symbols. This makes it a lot harder to identify points that belong to the same group.

The ability to immediately identify objects with the same colour is called preattentive pop-out; some features of a data visualisation will come to our attention rapidly and without conscious effort. Knowledge of these preattentive capabilities will allow us to produce more efficient data visualisations. Using shapes to represent group membership is not dishonest or misleading; it just makes it harder, and slower, for the viewer to perceive the message in our data visualisation.
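The two ways of representing a grouping variable can be compared with a sketch like the following. It uses the built-in `iris` data purely as a stand-in (the data behind the original plots are not known); the only difference between the two plots is whether the group is mapped to colour or to shape.

```r
library(ggplot2)

# Grouping variable mapped to colour (hue)
p_colour <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                             colour = Species)) +
  geom_point()

# Same data, with the grouping variable mapped to symbol shape instead
p_shape <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                            shape = Species)) +
  geom_point()
```

Printing `p_colour` and then `p_shape` gives a feel for how much faster the groups can be picked out when hue is used.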

The image below shows another example where we naturally perceive groups of objects. Most people see rows of dots on the left, but columns of dots on the right, even though the horizontal and vertical spacings between the dots are quite similar in both cases. This is an example of what are known as Gestalt rules; we tend to perceive objects that are close to each other as belonging to the same group (the dots on the left are slightly closer to the dots to either side than they are to the dots above and below, and vice versa for the dots on the right).
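The proximity effect can be reproduced with a few lines of base R. In this sketch the helper `dot_grid()` and the spacings of 1.0 and 1.3 are hypothetical choices; any pair of slightly different spacings should produce a similar impression.

```r
# Build a grid of dots with separate horizontal and vertical spacing
# (a hypothetical helper written for this illustration)
dot_grid <- function(dx, dy, n = 5) {
  expand.grid(x = (seq_len(n) - 1) * dx, y = (seq_len(n) - 1) * dy)
}

# Left: dots closer horizontally than vertically, so rows tend to be seen
rows <- dot_grid(dx = 1.0, dy = 1.3)
# Right: dots closer vertically than horizontally, so columns tend to be seen
cols <- dot_grid(dx = 1.3, dy = 1.0)

# Draw both panels with a fixed aspect ratio so the spacing is not distorted
op <- par(mfrow = c(1, 2), mar = c(1, 1, 1, 1))
plot(rows$x, rows$y, pch = 16, asp = 1, axes = FALSE, xlab = "", ylab = "")
plot(cols$x, cols$y, pch = 16, asp = 1, axes = FALSE, xlab = "", ylab = "")
par(op)
```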

Although we cannot (or should not) control the placement of dots on a scatter plot, we will naturally perceive groupings in the placement of the dots. We can also use this idea of proximity deliberately, for example, to arrange a text label close to the data values that it relates to.

The Readings from Healy discuss the idea of preattentive pop-out and describe several more Gestalt rules.

Colour

As we have just seen, colour (hue) can be a powerful tool when creating a data visualisation. However, it can be difficult to decide on which colours to use, particularly when we have to select several colours at once.

The ‘ggplot2’ package provides useful default colours, but we may want to control the colour selection. For example, the bar plot below shows the proportion of people with different eye colours (blue, brown, green, or hazel) for both males and females, with the fill colour of the bars corresponding to the eye colour that is represented. Deliberately selecting colours like this can help to make it easier and faster for viewers to comprehend a data visualisation. For example, we may be able to avoid a legend altogether so that the viewer does not have to switch back and forth between a legend and the bars in order to understand which group is which.
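A plot along these lines could be sketched as follows. This is not necessarily how the original plot was made: it assumes the built-in `HairEyeColor` dataset as the data source, and the hex values in `eye_fill` are hand-picked guesses at colours that resemble each eye colour.

```r
library(ggplot2)

# Collapse the built-in HairEyeColor table to Eye x Sex counts
eye_sex <- margin.table(HairEyeColor, c(2, 3))
# Convert counts to proportions within each sex (each sex sums to 1)
eye_prop <- as.data.frame(prop.table(eye_sex, 2), responseName = "prop")

# Fill colours chosen to resemble each eye colour (assumed hex values)
eye_fill <- c(Brown = "#7B4B2A", Blue = "#4F81BD",
              Hazel = "#A08C5B", Green = "#5B8C5A")

# One panel per sex; the legend is dropped because the fill colours
# and the axis labels already identify each group
p_eyes <- ggplot(eye_prop, aes(x = Eye, y = prop, fill = Eye)) +
  geom_col() +
  facet_wrap(vars(Sex)) +
  scale_fill_manual(values = eye_fill, guide = "none") +
  labs(x = NULL, y = "Proportion")
```

Setting `guide = "none"` removes the legend, which is reasonable here only because the colours themselves carry the group identity.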

The Readings from Wilke discuss what we can sensibly use colour for and what sorts of colours to use in different scenarios.
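As a small sketch of generating and assessing palettes with ‘colorspace’, the code below builds one palette of each common type and then simulates how the qualitative one might appear to a viewer with deuteranope vision. The palette names and the choice of five colours are arbitrary illustrative choices.

```r
library(colorspace)

# Qualitative palette: distinct hues for unordered groups
pal_qual <- qualitative_hcl(5, palette = "Dark 3")
# Sequential palette: ordered values from low to high
pal_seq <- sequential_hcl(5, palette = "Blues 3")
# Diverging palette: values either side of a neutral midpoint
pal_div <- diverging_hcl(5, palette = "Blue-Red")

# Simulate deuteranope vision to assess the qualitative palette
pal_qual_deutan <- deutan(pal_qual)

# Show the palettes as swatches for visual comparison
swatchplot("Qualitative" = pal_qual, "Deutan" = pal_qual_deutan,
           "Sequential" = pal_seq, "Diverging" = pal_div)
```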

The “Colours in R” Reference describes some important R functions for generating sets of colours.
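For a flavour of what base R offers, the sketch below uses three colour-generating functions from ‘grDevices’. These are examples that we know exist in recent versions of R (4.0 or later); they are not necessarily the functions that the Reference covers.

```r
# Named palette from base R (grDevices); no extra packages needed
pal_okabe <- palette.colors(4, palette = "Okabe-Ito")

# HCL-based palette from base R
pal_hcl <- hcl.colors(4, palette = "viridis")

# Colours built directly from hue, chroma, and luminance coordinates
pal_manual <- hcl(h = c(0, 90, 180, 270), c = 50, l = 70)

# List some of the palette names available to each generator
head(palette.pals())
head(hcl.pals())
```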

Readings:

Visual Perception and visual tasks

Sections 1.3, 1.4, and 1.5 of “Data Visualization: A practical introduction” by Kieran Healy.

This gives a nice overview of important visual perception concepts, including preattention and Gestalt rules, and it describes the “visual tasks” involved in decoding a data visualisation.

Colour

Chapters 4 and 19 of “Fundamentals of Data Visualization” by Claus O. Wilke.

These chapters focus specifically on the use of colour in data visualisations and have lots of practical advice. They also cover colour-vision deficiency (“colour blindness”). You should already be familiar with the latter chapter from the previous topic.

References:

“Colours in R”.

This document describes R functions for selecting colours.
The ‘colorspace’ package by Ross Ihaka et al.
‘ggplot2’ function documentation.
Help pages for ‘grid’ functions for R version 4.1.2.

Bibliography:

“The elements of graphing data” and “Visualizing data” by William S. Cleveland. These are pretty hard to get hold of these days, but The University of Auckland library has a copy of “The elements of graphing data”.
“Information Visualization” by Colin Ware. The University of Auckland has the 3rd Edition in electronic version plus older editions in hard copy.
“Coloring in R’s Blind Spot” by Achim Zeileis and Paul Murrell. More details on R’s colour palettes.

This work is licensed under a Creative Commons Attribution 4.0 International License.