The Data Set

The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz web site.

We can read the data into an R data frame with read.csv().

crime <- read.csv("nzpolice-proceedings.csv")

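The following code generates a data frame, crimeSex, containing the number of incidents per month for both male and female offenders. It also calculates the proportion of incidents each month that involve female offenders (crimeProp).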

crimeTable <- table(crime$Date, crime$SEX)
crimeProp <- as.data.frame(apply(crimeTable, 1, function(x) x[1]/sum(x)))
names(crimeProp) <- "Prop"
crimeProp$Month <- as.Date(rownames(crimeProp))
crimeSex <- as.data.frame(crimeTable)
names(crimeSex) <- c("Date", "Sex", "Freq")
crimeSex$Month <- as.Date(crimeSex$Date)
head(crimeSex)
        Date    Sex Freq      Month
1 2014-07-01 Female 2755 2014-07-01
2 2014-08-01 Female 2671 2014-08-01
3 2014-09-01 Female 2663 2014-09-01
4 2014-10-01 Female 2701 2014-10-01
5 2014-11-01 Female 2586 2014-11-01
6 2014-12-01 Female 2667 2014-12-01
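
The proportion calculation above uses x[1], which relies on Female being the first column of the table. A minimal sketch of an equivalent calculation that selects the column by name is shown below. It uses a small made-up data frame, `toy`, rather than the police data, and all of the `toy*` names are hypothetical.

```r
## Toy data (made up for illustration): one row per incident
toy <- data.frame(Date=c("2014-07-01", "2014-07-01", "2014-07-01", 
                         "2014-08-01", "2014-08-01"),
                  SEX=c("Female", "Male", "Male", "Female", "Male"))
toyTable <- table(toy$Date, toy$SEX)
## Row-wise proportions: each row (month) sums to 1
toyRowProps <- prop.table(toyTable, margin=1)
## Select the Female column by name rather than by position
toyProp <- data.frame(Prop=toyRowProps[, "Female"],
                      Month=as.Date(rownames(toyRowProps)))
toyProp
```

Selecting by name means the result does not change if the order of the factor levels changes.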

The following code generates a data frame containing the number of incidents per month for both male and female offenders, broken down by the type of crime. The Type factor has levels in decreasing order of the number of incidents for males in the first month of data.

crimeTypeSexList <- lapply(split(crime, 
                                 list(crime$Date, 
                                      crime$SEX, 
                                      crime$ANZSOC.Division)),
                           nrow)
crimeTypeSex <- cbind(as.data.frame(do.call(rbind, crimeTypeSexList)),
                      as.data.frame(do.call(rbind, 
                                            strsplit(names(crimeTypeSexList), 
                                                     "[.]"))))
rownames(crimeTypeSex) <- NULL
colnames(crimeTypeSex) <- c("Count", "Month", "Sex", "Type")
crimeTypeSex$Month <- as.Date(crimeTypeSex$Month)
monthFirst <- subset(crimeTypeSex, Month == "2014-07-01" & Sex == "Male")
monthLevels <- monthFirst$Type[order(monthFirst$Count, decreasing=TRUE)]
monthLabels <- unlist(lapply(strwrap(monthLevels, width=30, simplify=FALSE),
                             function(x) {
                                 if (length(x) < 3)
                                     x <- c(x, rep(" ", 3 - length(x)))
                                 paste(x, collapse="\n")
                             }))
crimeTypeSex$Type <- factor(crimeTypeSex$Type, levels=monthLevels)
head(crimeTypeSex)
  Count      Month    Sex                                                              Type
1   121 2014-07-01 Female Abduction, Harassment and Other Related Offences Against a Person
2    89 2014-08-01 Female Abduction, Harassment and Other Related Offences Against a Person
3   129 2014-09-01 Female Abduction, Harassment and Other Related Offences Against a Person
4   112 2014-10-01 Female Abduction, Harassment and Other Related Offences Against a Person
5   110 2014-11-01 Female Abduction, Harassment and Other Related Offences Against a Person
6   116 2014-12-01 Female Abduction, Harassment and Other Related Offences Against a Person
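
The code above recovers the month, sex, and crime type by splitting the list names on ".", which assumes that none of the values contains a full stop. A possible alternative, sketched here on made-up data (the `toy`, `toyCounts`, and `first` names are hypothetical), is to count the combinations with table() and convert the result directly to a data frame.

```r
## Toy data (made up for illustration): one row per incident
toy <- data.frame(Date=c("2014-07-01", "2014-07-01", "2014-07-01", 
                         "2014-07-01", "2014-08-01", "2014-08-01"),
                  SEX=c("Male", "Male", "Male", "Female", "Male", "Female"),
                  ANZSOC.Division=c("Theft", "Theft", "Fraud", 
                                    "Fraud", "Fraud", "Theft"))
## Count every Month/Sex/Type combination (zero counts are included)
toyCounts <- as.data.frame(table(Month=toy$Date, 
                                 Sex=toy$SEX, 
                                 Type=toy$ANZSOC.Division),
                           responseName="Count",
                           stringsAsFactors=FALSE)
toyCounts$Month <- as.Date(toyCounts$Month)
## Order the Type levels by the male counts in the first month
first <- subset(toyCounts, Month == as.Date("2014-07-01") & Sex == "Male")
toyCounts$Type <- factor(toyCounts$Type,
                         levels=first$Type[order(first$Count, decreasing=TRUE)])
levels(toyCounts$Type)
```

The columns come out in a different order (Month, Sex, Type, Count), but the content should match the split-based version.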

Questions of Interest

We already know that there are many more incidents involving Male offenders than Female offenders.

library(ggplot2)
ggplot(crime) +
    geom_bar(aes(SEX)) +
    labs(title="Number of Incidents (thousands)", 
         subtitle="July 2014 to Dec 2022") +
    scale_y_continuous(breaks=seq(0, 600000, 200000), labels=seq(0, 600, 200)) +
    theme(axis.title.y=element_blank(),
          axis.title.x=element_blank())
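
The bar plot above shows the y-axis in thousands by supplying matching breaks and labels by hand. One alternative, sketched here on made-up data, is to give scale_y_continuous() a function that derives the labels from the breaks, so the two cannot get out of step. The `thousands` helper and the `toy` data frame are hypothetical names introduced for this example.

```r
library(ggplot2)
## Hypothetical helper: convert break values to thousands
thousands <- function(x) x / 1000
## Toy data (made up for illustration): one row per incident
toy <- data.frame(SEX=rep(c("Female", "Male"), times=c(2000, 5000)))
p <- ggplot(toy) +
    geom_bar(aes(SEX)) +
    labs(title="Number of Incidents (thousands)") +
    ## Labels are computed from whatever breaks ggplot2 chooses
    scale_y_continuous(labels=thousands) +
    theme(axis.title=element_blank())
p
```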

In this lab, we will focus on differences between Males and Females over time.

Data Visualisations

  1. The following code produces a line plot of the number of incidents over time, broken down by the sex of the offender.

    We can see that there are more than twice as many incidents for male offenders as for female offenders and, while both groups show a decreasing trend, female offending appears to be decreasing at a slightly slower rate.

    In terms of CRAP guidelines:

    • There is some contrast between the colourful lines and the otherwise grey background and labels, though it is not particularly strong. The variation in text size does NOT create a strong contrast. Overall, attention is drawn to the main data lines, but there could be much greater contrast.
    • There is repetition in the use of colours between the main plot lines and the legend.
    • There is alignment of the main title with the left edge of the plot panel.
    • There is proximity between the elements of the legend that make it easy to see the legend as a single, coherent item.

     

    ggplot(crimeSex) +
        geom_line(aes(Month, Freq, colour=Sex)) +
        labs(title="Number of Incidents") 

  2. The following code produces a modified version of the previous plot by applying the following CRAP design guidelines:

    • The grey background has been removed and the text labels and the remaining grid lines have been drawn grey, which produces a much stronger contrast between the data lines and other elements of the plot.

    • The legend has been replaced with direct labels and the labels are drawn in the same colour as the lines (repetition).

    • There is much more alignment, particularly of text labels: in addition to the aligned title, the sex labels are right-aligned with the right edge of the plot panel, and the y-axis labels have been bottom-aligned with the horizontal grid lines.

    • The direct labels have been placed right next to the ends of the lines (proximity).

    The end result is arguably cleaner and simpler than the original (less clutter), with a stronger emphasis on the data lines. This makes it easier to concentrate on the most important elements of the plot, which are the trends in the data lines.

    library(gggrid)  # also loads the 'grid' package
    ## Draw a single label just above the right-hand end of each line
    label <- function(data, coords) {
        textGrob(data$sex[1], 
                 x=1, 
                 y=unit(coords$y[nrow(coords)], "npc") + unit(5, "mm"),
                 just="right",
                 gp=gpar(col=data$colour[1]))
    }
    ggplot(crimeSex) +
        geom_line(aes(Month, Freq, colour=Sex), linewidth=1) +
        grid_group(label, aes(Month, Freq, colour=Sex, sex=Sex)) +
        labs(title="Number of Incidents") +
        scale_x_date(expand=expansion(0)) +
        theme(panel.background=element_blank(),
              panel.grid.major.y=element_line(colour="grey", linewidth=.1),
              legend.position="none",
              text=element_text(colour="grey"),
              plot.title=element_text(colour="black", size=16, face="bold"),
              axis.title=element_blank(),
              axis.text=element_text(colour="grey", vjust=0),
              axis.ticks=element_blank(),
              plot.margin=unit(rep(1, 4), "cm"))
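
Direct labels can also be approximated with 'ggplot2' alone. The following is a rough sketch on made-up data, not the 'gggrid' approach used above: it adds a geom_text() layer that only sees the last observation for each sex. The `toy` and `lastMonth` names are hypothetical.

```r
library(ggplot2)
## Toy data (made up for illustration): monthly counts for each sex
toy <- data.frame(Month=rep(seq(as.Date("2014-07-01"), by="month", 
                                length.out=6), 2),
                  Sex=rep(c("Female", "Male"), each=6),
                  Freq=c(27, 26, 26, 27, 25, 26, 90, 88, 89, 87, 86, 85))
## One row per sex: the last observation, where the label will sit
lastMonth <- subset(toy, Month == max(Month))
p <- ggplot(toy, aes(Month, Freq, colour=Sex)) +
    geom_line(linewidth=1) +
    ## Right-align each label with the end of its line, nudged upwards
    geom_text(aes(label=Sex), data=lastMonth, hjust=1, vjust=-1) +
    labs(title="Number of Incidents") +
    theme(legend.position="none")
p
```

The trade-off is that vjust offsets the label relative to the height of the text, whereas the 'grid' version can specify an exact offset in millimetres.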

  3. The following code produces a facetted plot with one panel for each type of crime and separate lines for males and females within each panel.

    We can see that the overall pattern of many more incidents involving Male offenders than Female offenders repeats across all types of crime (though it is difficult to see with the low numbers in the bottom row). There are also some remarkable consistencies in the details of the trends over time for several types of crime: where there are pronounced peaks or troughs for Male offenders, there are similar peaks or troughs for Female offenders. Whatever is driving those variations appears to have no regard for sex.

    Looking in more detail, the separation between the Male and Female lines is noticeably different for some crimes. For example, Abduction is more dominated by Males than Fraud is. It would be useful to plot the proportions of Males vs Females to explore this further.

    The following examples of CRAP design are being employed:

    • The repetition of colours between the title and the lines (no need for a legend!). Similarly, all of the text labels are the same muted grey.
    • The alignment of labels, particularly the left-alignment of the direct crime type labels with the left edge of the panels.
    • The proximity of the direct labels with the data lines, which helps to identify the separate panels. There is also a clear gap between the plot title and the plot panels (the words in the title are closer to each other than to the plot panels).
    • The contrast of the bright colours for the data lines versus the light colours for the labels and grid lines.

     

    NOTE that we need the ‘ggtext’ package to achieve the colours in the title.

    library(gggrid)
    library(ggtext)
    ## Label each panel with its crime type, just above the start of the Male line
    typeLabel <- function(data, coords) {
        males <- subset(coords, sex == "Male")
        label <- paste(strwrap(data$type[1], width=25), collapse="\n")
        textGrob(label,
                 x=0,
                 y=unit(males$y[1], "npc") + unit(4, "mm"),
                 just=c("left", "bottom"),
                 gp=gpar(col="grey50", fontsize=9))
    }
    cols <- scales::hue_pal()(2)
    title <- paste0('Number of Incidents for <span style="color: ', cols[2],
                    '">Males</span> and <span style="color: ', cols[1], 
                    '">Females</span>')
    ggplot(crimeTypeSex) +
        grid_panel(typeLabel, aes(y=Count, type=Type, sex=Sex)) +
        grid_panel(segmentsGrob(0, 0, 1, 0, gp=gpar(col="grey50"))) +
        geom_line(aes(Month, Count, colour=Sex)) +
        scale_x_date(expand=expansion(0)) +
        scale_y_continuous(expand=expansion(0), 
                           breaks=seq(0, 900, 300)) +
        coord_cartesian(ylim=c(0, 1500)) +
        labs(title=title, subtitle=" ") +
        facet_wrap("Type") +
        theme(strip.text=element_blank(),
              axis.text=element_text(colour="grey50"),
              axis.title=element_blank(),
              axis.text.y=element_text(vjust=0),
              plot.background=element_rect(fill=grey(.1)),
              panel.background=element_rect(fill=grey(.1)),
              panel.grid=element_blank(),
              panel.spacing.x=unit(5, "mm"),
              legend.position="none",
              plot.title=element_markdown(size=16, colour="grey50"),
              plot.margin=unit(rep(1, 4), "cm"))
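
The discussion for this plot suggests looking at the proportions of Males vs Females for each type of crime. A minimal sketch of that calculation, on made-up data with the same layout as crimeTypeSex (the `toy` and `toyMale` names are hypothetical), might look like this.

```r
## Toy data (made up for illustration), same layout as crimeTypeSex
toy <- data.frame(Count=c(10, 30, 20, 20),
                  Month=as.Date("2014-07-01"),
                  Sex=c("Female", "Male", "Female", "Male"),
                  Type=c("Abduction", "Abduction", "Fraud", "Fraud"))
## Total incidents within each Month/Type combination
toy$Total <- ave(toy$Count, toy$Month, toy$Type, FUN=sum)
## Proportion of incidents for each sex, then keep only the Male rows
toy$Prop <- toy$Count / toy$Total
toyMale <- subset(toy, Sex == "Male")
toyMale[, c("Month", "Type", "Prop")]
```

The resulting Prop column could then be drawn with geom_line() in a facetted plot like the one above. Note that a Month/Type combination with no incidents at all would produce NaN.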

Challenge

  1. The following paragraph and modified version of the plot above make use of the same font: the Google Font Gruppo. See the .Rmd file for the code that sets up the font in the paragraph of text.

    ## Same as typeLabel(), but drawing the label in bold Gruppo
    typeLabelGruppo <- function(data, coords) {
        males <- subset(coords, sex == "Male")
        label <- paste(strwrap(data$type[1], width=25), collapse="\n")
        textGrob(label,
                 x=0,
                 y=unit(males$y[1], "npc") + unit(4, "mm"),
                 just=c("left", "bottom"),
                 gp=gpar(col="grey50", fontsize=9, 
                         fontfamily="Gruppo", lineheight=1, fontface="bold"))
    }
    ggGruppo <- ggplot(crimeTypeSex) +
        grid_panel(typeLabelGruppo, aes(y=Count, type=Type, sex=Sex)) +
        grid_panel(segmentsGrob(0, 0, 1, 0, gp=gpar(col="grey50"))) +
        geom_line(aes(Month, Count, colour=Sex)) +
        scale_x_date(expand=expansion(0)) +
        scale_y_continuous(expand=expansion(0), 
                           breaks=seq(0, 900, 300)) +
        coord_cartesian(ylim=c(0, 1500)) +
        labs(title=title, subtitle=" ") +
        facet_wrap("Type") +
        theme(strip.text=element_blank(),
              axis.text=element_text(colour="grey50"),
              axis.title=element_blank(),
              axis.text.y=element_text(vjust=0, family="Gruppo", face=2),
              plot.background=element_rect(fill=grey(.1)),
              panel.background=element_rect(fill=grey(.1)),
              panel.grid=element_blank(),
              panel.spacing.x=unit(5, "mm"),
              legend.position="none",
              plot.title=element_markdown(size=16, colour="grey50",
                                          family="Gruppo", face=2),
              plot.margin=unit(rep(1, 4), "cm"))

    This paragraph of text and the plot below BOTH use the same font (the Google Font Gruppo). The plot is the same as in the last question above, just with the Gruppo font applied. The width of the paragraph has also been set to be the same as the width of the plot (8 inches).

Summary

In this lab we have practised assessing data visualisations in terms of CRAP design guidelines. We have also attempted to modify data visualisations using CRAP design guidelines. Hopefully we have demonstrated that graphic design is not just about making an image pretty, but that it can have a genuine impact on the clarity and focus of a data visualisation.

We have explored the crime data some more as well, learning that the differences between Males and Females do depend on the type of crime.

We have also gained more experience with ‘ggplot2’ and ‘grid’, particularly in terms of controlling the details of a data visualisation and modifying the default choices that ‘ggplot2’ makes for us.