Lab 5: Graphic Design


The purpose of this lab is to apply graphic design guidelines to evaluate and improve data visualisations.

The Data Set

The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz website.

We can read the data into an R data frame with read.csv().

crime <- read.csv("nzpolice-proceedings.csv")

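The following code generates a data frame, crimeSex, containing the number of incidents per month for both male and female offenders. It also builds crimeProp, which holds, for each month, the proportion of incidents that fall in the first SEX column of the table (Female, going by the output below).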

crimeTable <- table(crime$Date, crime$SEX)
crimeProp <- as.data.frame(apply(crimeTable, 1, function(x) x[1]/sum(x)))
names(crimeProp) <- "Prop"
crimeProp$Month <- as.Date(rownames(crimeProp))
crimeSex <- as.data.frame(crimeTable)
names(crimeSex) <- c("Date", "Sex", "Freq")
crimeSex$Month <- as.Date(crimeSex$Date)
head(crimeSex)
        Date    Sex Freq      Month
1 2014-07-01 Female 2755 2014-07-01
2 2014-08-01 Female 2671 2014-08-01
3 2014-09-01 Female 2663 2014-09-01
4 2014-10-01 Female 2701 2014-10-01
5 2014-11-01 Female 2586 2014-11-01
6 2014-12-01 Female 2667 2014-12-01
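
To see what each step above is doing, here is a minimal sketch on a tiny made-up data frame. The toy data and the names toy, toyTable, toyProp and toySex are assumptions for illustration only; they are not part of the lab data.

```r
# Toy stand-in for the crime data (values invented for illustration only)
toy <- data.frame(
    Date = c("2014-07-01", "2014-07-01", "2014-07-01",
             "2014-08-01", "2014-08-01"),
    SEX  = c("Female", "Male", "Male", "Female", "Male")
)

# Cross-tabulate: one row per Date, one column per SEX
toyTable <- table(toy$Date, toy$SEX)

# x[1] is the first column of each row (Female here, because the
# columns are in alphabetical order), so this is the proportion of
# incidents in each month that have a Female offender
toyProp <- apply(toyTable, 1, function(x) x[1]/sum(x))

# as.data.frame() on a table gives "long" format: one row per cell
toySex <- as.data.frame(toyTable)
names(toySex) <- c("Date", "Sex", "Freq")
toySex$Month <- as.Date(toySex$Date)
toySex
```

The long format (one row per Month and Sex combination) is usually the more convenient shape for plotting one line per group.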

The following code generates a data frame containing the number of incidents per month for both male and female offenders, broken down by the type of crime. The Type factor has levels in decreasing order of the number of incidents for males in the first month of data.

crimeTypeSexList <- lapply(split(crime, 
                                 list(crime$Date, 
                                      crime$SEX, 
                                      crime$ANZSOC.Division)),
                           nrow)
crimeTypeSex <- cbind(as.data.frame(do.call(rbind, crimeTypeSexList)),
                      as.data.frame(do.call(rbind, 
                                            strsplit(names(crimeTypeSexList), 
                                                     "[.]"))))
rownames(crimeTypeSex) <- NULL
colnames(crimeTypeSex) <- c("Count", "Month", "Sex", "Type")
crimeTypeSex$Month <- as.Date(crimeTypeSex$Month)
monthFirst <- subset(crimeTypeSex, Month == "2014-07-01" & Sex == "Male")
monthLevels <- monthFirst$Type[order(monthFirst$Count, decreasing=TRUE)]
monthLabels <- unlist(lapply(strwrap(monthLevels, width=30, simplify=FALSE),
                             function(x) {
                                 if (length(x) < 3)
                                     x <- c(x, rep(" ", 3 - length(x)))
                                 paste(x, collapse="\n")
                             }))
crimeTypeSex$Type <- factor(crimeTypeSex$Type, levels=monthLevels)
head(crimeTypeSex)
  Count      Month    Sex                                                              Type
1   121 2014-07-01 Female Abduction, Harassment and Other Related Offences Against a Person
2    89 2014-08-01 Female Abduction, Harassment and Other Related Offences Against a Person
3   129 2014-09-01 Female Abduction, Harassment and Other Related Offences Against a Person
4   112 2014-10-01 Female Abduction, Harassment and Other Related Offences Against a Person
5   110 2014-11-01 Female Abduction, Harassment and Other Related Offences Against a Person
6   116 2014-12-01 Female Abduction, Harassment and Other Related Offences Against a Person
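
The split-and-count idea and the ordering of the Type levels can be sketched on a tiny made-up data frame. The toy data, the short crime type names, and the object names below are assumptions for illustration only.

```r
# Toy stand-in for the crime data (values invented for illustration only)
toy <- data.frame(
    Date = rep("2014-07-01", 6),
    SEX  = c("Male", "Male", "Male", "Male", "Female", "Female"),
    ANZSOC.Division = c("Theft", "Theft", "Theft", "Fraud", "Fraud", "Theft")
)

# split() on a list of variables makes one group per combination,
# named by pasting the values together with "."
groups <- split(toy, list(toy$Date, toy$SEX, toy$ANZSOC.Division))
counts <- sapply(groups, nrow)

# Recover the three variables by splitting the group names on "."
# (this only works if none of the values contain a "." themselves)
parts <- do.call(rbind, strsplit(names(counts), "[.]"))
toyTypeSex <- data.frame(Count = unname(counts),
                         Month = as.Date(parts[, 1]),
                         Sex = parts[, 2],
                         Type = parts[, 3])

# Order the Type levels by the Male counts in the first month
first <- subset(toyTypeSex, Month == as.Date("2014-07-01") & Sex == "Male")
typeLevels <- first$Type[order(first$Count, decreasing=TRUE)]
toyTypeSex$Type <- factor(toyTypeSex$Type, levels=typeLevels)
levels(toyTypeSex$Type)
```

Setting the factor levels explicitly matters because many plotting functions arrange panels or legend entries in level order rather than in the order the values appear in the data.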

Questions of Interest

We already know that there are many more incidents involving Male offenders than Female offenders.

In this lab, we will focus on differences between Males and Females over time:

  • Are the trends in the number of incidents over time the same for Male and Female offenders?
  • Are the trends over time the same for Male and Female offenders for different types of crime?
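
Because the Male counts are much larger than the Female counts, comparing trends means comparing relative change rather than raw counts. One possible way to think about this, sketched here on made-up numbers, is to divide each series by its own first value. The toy data and the Index column are assumptions for illustration; this is not a required part of the lab.

```r
# Toy monthly counts (values invented for illustration only)
toy <- data.frame(
    Month = rep(as.Date(c("2014-07-01", "2014-08-01", "2014-09-01")), 2),
    Sex   = rep(c("Female", "Male"), each=3),
    Freq  = c(100, 90, 80, 400, 360, 320)
)

# Divide each count by the first-month count for the same sex,
# so both series start at 1 and only the relative change remains.
# This assumes the rows are already in month order within each sex.
toy$Index <- ave(toy$Freq, toy$Sex, FUN=function(x) x/x[1])
toy
```

In this toy example the two groups differ by a factor of four in raw counts but have identical Index values, which is what "the same trend" would look like.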

Data Visualisations

  1. Write R code to produce the plot below.

    Comment on what this data visualisation tells us about the questions of interest.

    Comment on the design of this plot. For each of the CRAP design guidelines, give at least one example where the guideline is being applied.

  2. Write R code to produce a modified version of the plot from the previous question that includes at least one additional example of applying each of the CRAP guidelines.

    Comment on each change that you make and explain which guideline you are using for each change.

    Comment on whether the changes have made it easier or harder to answer the questions of interest.

  3. Write R code to produce the data visualisation shown below.

    Comment on what this data visualisation tells us about the questions of interest.

    Identify at least ONE example of each of the CRAP design guidelines that is being employed in this data visualisation.

    NOTE that the image is 8 inches square.
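
As a reminder of how image size can be controlled, the sketch below opens a PNG graphics device that is 8 inches square. The file path, the resolution, and the placeholder plot are assumptions for illustration. In an R Markdown document the same effect is normally achieved with the knitr chunk options fig.width=8 and fig.height=8 instead.

```r
# Hypothetical output file; a temporary path keeps the sketch self-contained
outFile <- tempfile(fileext=".png")

# width and height are in inches because units="in";
# res (pixels per inch) is an arbitrary choice here
png(outFile, width=8, height=8, units="in", res=96)
plot(1:10, main="Placeholder plot")
dev.off()
```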

Challenge

  1. No marks will be given for this question.

    The paragraph and the plot below show a useful example of repetition: the plot uses the same font as the main text. Can you reproduce this example?

    The font is the Google Font Gruppo.

    This paragraph of text and the plot below BOTH use the same font (the Google Font Gruppo). The plot is the same as the last question above, just with the Gruppo font applied. The width of the paragraph has also been set to be the same as the width of the plot (8 inches).

The Report

Your submission should consist of a knitted R Markdown document, in HTML format, submitted via Canvas.

Your report should include:

  • A brief description of the data and the questions we are trying to answer.
  • For each data visualisation, R code AND a brief text commentary.
  • A brief overall summary.

Don’t forget to also complete the Canvas Quiz!

Marking

Marks will be lost for:

  • Plagiarism.
  • A section of the report is missing.
  • The summary is too short or does not make sense.
  • Significantly poor R (or other) code.
  • Overly verbose code, output, or commentary.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.