The purpose of this lab is to apply principles of visual perception to evaluate and improve data visualisations.
The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz website.
We can read the data into an R data frame with read.csv().
We will focus on youth crime (offenders aged 15 to 19 inclusive).
   Age.Lower Police.District                                 ANZSOC.Division    SEX       Date      Month
1         15          Tasman                   Acts Intended to Cause Injury Female 2015-12-01 2015-12-01
5         15   Auckland City                   Acts Intended to Cause Injury Female 2015-12-01 2015-12-01
6         15   Auckland City                   Acts Intended to Cause Injury Female 2015-12-01 2015-12-01
22        15   Auckland City Dangerous or Negligent Acts Endangering Persons Female 2015-12-01 2015-12-01
23        15   Auckland City Dangerous or Negligent Acts Endangering Persons Female 2015-12-01 2015-12-01
41        15   Auckland City                           Illicit Drug Offences Female 2015-12-01 2015-12-01
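The rows above come from the youth data frame, but the code that builds it is not shown here. The following is only a minimal sketch, assuming the subset is selected on the Age.Lower column and that Age.Lower marks the start of a five-year age band (so 15 covers ages 15 to 19). The toy data frame and the names toy, csvFile and crime are made up so that the example runs on its own.

```r
# Toy stand-in for nzpolice-proceedings.csv (values are made up for illustration)
toy <- data.frame(Age.Lower = c(10, 15, 15, 20, 25),
                  ANZSOC.Division = c("Theft and Related Offences",
                                      "Acts Intended to Cause Injury",
                                      "Illicit Drug Offences",
                                      "Public Order Offences",
                                      "Theft and Related Offences"),
                  Date = "2015-12-01")
csvFile <- tempfile(fileext = ".csv")
write.csv(toy, csvFile, row.names = FALSE)

# With the real data this would be read.csv("nzpolice-proceedings.csv")
crime <- read.csv(csvFile)

# Assumption: Age.Lower is the lower bound of a five-year age band,
# so the band starting at 15 covers ages 15-19
youth <- subset(crime, Age.Lower == 15)
head(youth)
```

If the real file records single years of age instead, the condition would need to be a range such as Age.Lower >= 15 & Age.Lower <= 19.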
The following code reorders the levels of the ANZSOC.Division factor according to the overall counts for each type of crime and stores the result as Type. It also generates typeLabels, which are line-wrapped versions of the reordered ANZSOC.Division levels.
divLevels <- levels(factor(youth$ANZSOC.Division))
divLabels <- unlist(lapply(strwrap(divLevels, width=30, simplify=FALSE),
                           function(x) {
                               if (length(x) < 3)
                                   x <- c(x, rep(" ", 3 - length(x)))
                               paste(x, collapse="\n")
                           }))
types <- table(youth$ANZSOC.Division)
typeLevels <- names(types)[order(types, decreasing=TRUE)]
typeLabels <- unlist(lapply(strwrap(typeLevels, width=30, simplify=FALSE),
                            function(x) {
                                if (length(x) < 3)
                                    x <- c(x, rep(" ", 3 - length(x)))
                                paste(x, collapse="\n")
                            }))
youth$Type <- factor(youth$ANZSOC.Division, levels=typeLevels)

The following code generates a table of counts for the number of incidents for youth broken down by type of crime.
Var1 Freq
1 Theft and Related Offences 23280
2 Acts Intended to Cause Injury 17456
3 Traffic and Vehicle Regulatory Offences 15737
4 Public Order Offences 12358
5 Dangerous or Negligent Acts Endangering Persons 11144
6 Property Damage and Environmental Pollution 9534
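The call that produced the table above is not shown. A minimal sketch, assuming it is as.data.frame() applied to table() on the reordered factor (which would explain the Var1 and Freq column names), is given here. The toy data frame and the name typeCounts are made up for illustration.

```r
# Toy stand-in for the youth data frame (made-up values)
toy <- data.frame(ANZSOC.Division = c("Public Order Offences",
                                      "Theft and Related Offences",
                                      "Theft and Related Offences",
                                      "Acts Intended to Cause Injury",
                                      "Theft and Related Offences",
                                      "Acts Intended to Cause Injury"))

# Order the levels from most to least frequent
counts <- table(toy$ANZSOC.Division)
lev <- names(counts)[order(counts, decreasing = TRUE)]
toy$Type <- factor(toy$ANZSOC.Division, levels = lev)

# as.data.frame() on a one-way table gives one row per level,
# with columns Var1 (the level) and Freq (the count)
typeCounts <- as.data.frame(table(toy$Type))
head(typeCounts)
```

Because the levels are already sorted by count, the rows of the resulting data frame come out in decreasing order of Freq without any further sorting.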
The following code reorders the levels of the ANZSOC.Division factor according to the count in the first month for each type of crime and stores the result as monthType. It also generates monthTypeLabels, which are line-wrapped versions of the reordered ANZSOC.Division levels.
monthTypes <- table(youth$ANZSOC.Division, youth$Month)[,1]
monthTypeLevels <- names(types)[order(monthTypes, decreasing=TRUE)]
monthTypeLabels <- unlist(lapply(strwrap(monthTypeLevels, width=30,
                                         simplify=FALSE),
                                 function(x) {
                                     if (length(x) < 3)
                                         x <- c(x, rep(" ", 3 - length(x)))
                                     paste(x, collapse="\n")
                                 }))
youth$monthType <- factor(youth$ANZSOC.Division, levels=monthTypeLevels)

The following code generates a table of counts for the number of incidents per month, broken down by type of crime.

youthMonthType <- as.data.frame(table(youth$Month, youth$monthType))
youthMonthType$Month <- as.Date(youthMonthType$Var1)
head(youthMonthType)

        Var1                       Var2 Freq      Month
1 2014-07-01 Theft and Related Offences  393 2014-07-01
2 2014-08-01 Theft and Related Offences  369 2014-08-01
3 2014-09-01 Theft and Related Offences  343 2014-09-01
4 2014-10-01 Theft and Related Offences  367 2014-10-01
5 2014-11-01 Theft and Related Offences  369 2014-11-01
6 2014-12-01 Theft and Related Offences  348 2014-12-01
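The line-wrapping step appears three times in the code above with only the input changing. As an optional tidy-up, it could be factored into a helper. The name wrapLabels below is hypothetical and not part of the lab code; the sketch is meant to keep the same behaviour (wrap at a width of 30, then pad short labels to three lines so that all labels have the same height).

```r
# Hypothetical helper: wrap each label and pad it to at least `lines` lines
wrapLabels <- function(labels, width = 30, lines = 3) {
    wrapped <- strwrap(labels, width = width, simplify = FALSE)
    vapply(wrapped,
           function(x) {
               # Pad with single-space lines, as the original code does
               if (length(x) < lines)
                   x <- c(x, rep(" ", lines - length(x)))
               paste(x, collapse = "\n")
           },
           character(1))
}

labels <- wrapLabels(c("Public Order Offences",
                       "Dangerous or Negligent Acts Endangering Persons"))
cat(labels, sep = "\n---\n")
```

With such a helper, each of the three label vectors could be built with a single call, for example wrapLabels(typeLevels).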
For youth crimes, we are interested in comparisons between different types of crime.
Amongst the three most common crimes:
We will also make a specific comparison between “Public Order Offences” and “Dangerous or Negligent Acts Endangering Persons”:
The data visualisation below shows a pie chart of the number of incidents for each type of crime. This was modelled on a figure from page 11 of the Youth Justice Indicators Summary Report (i.e., a data visualisation of this sort was published in an official government report).
Write R code to produce this pie chart.
Comment on what this data visualisation tells us about the questions of interest.
Identify the visual channel or channels that are being used to represent the data values in this data visualisation and comment on whether they are appropriate channels.
Identify a perceptual problem with this data visualisation that is related to “contrast” effects. If you add a white border to the segments of the pie chart, does this fix the problem? Explain why or why not.
Write R code to produce two different data visualisations of the data from the preceding question that make use of different visual channels: one should make use of at least one better visual channel and one should make use of at least one worse visual channel.
Comment on the changes that you have made and whether they make it easier or harder to answer the questions of interest.
The plot below shows the number of incidents per month for each type of crime, with two of the crimes highlighted by drawing a white line behind the normal (coloured) line.
Write R code to produce this data visualisation.
Comment on what this data visualisation tells us about the questions of interest.
Identify at least one example of preattentive pop out and at least three examples of Gestalt Rules in this data visualisation.
The plot below shows the number of incidents per month for each type of crime, with two of the crimes highlighted by desaturating the default colours for most lines while retaining full saturation for the two highlighted lines (i.e., with a custom colour scale).
Write R code to produce this data visualisation.
Write R code to produce a version of the data visualisation from the previous question that shows what a viewer with deuteranomaly would see.
Comment on whether the data visualisation would be effective for a viewer with deuteranomaly.
The plot below shows the number of incidents per month for each type of crime, with a separate panel for each type of crime.
NOTE that each panel has the Public Order and Dangerous or Negligent Acts data plotted in grey in addition to the relevant panel data (in colour). Also, the x-axis labels have been customised.
Write R code to produce this data visualisation.
Comment on what this data visualisation tells us about the questions of interest.
Identify the visual channel or channels that are being used to represent the data values in this data visualisation and comment on whether they are appropriate channels.
No marks will be given for this question.
Can you produce the plot below? Does this improve on any of the previous plots in any way? What visual perception concepts are being used here?
Your submission should consist of a knitted R Markdown document, in HTML format, submitted via Canvas.
Your report should include:
Don’t forget to also complete the Canvas Quiz!
Marks will be lost for:
This work is licensed under a Creative Commons Attribution 4.0 International License.