The Data Set

The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz web site.

We can read the data into an R data frame with read.csv().

crime <- read.csv("nzpolice-proceedings.csv")
head(crime)
  Age.Lower Police.District                                                   ANZSOC.Division
1        15          Tasman                                     Acts Intended to Cause Injury
2        20   Auckland City Abduction, Harassment and Other Related Offences Against a Person
3        40   Auckland City Abduction, Harassment and Other Related Offences Against a Person
4        10   Auckland City                                     Acts Intended to Cause Injury
5        15   Auckland City                                     Acts Intended to Cause Injury
6        15   Auckland City                                     Acts Intended to Cause Injury
     SEX       Date
1 Female 2015-12-01
2 Female 2015-12-01
3 Female 2015-12-01
4 Female 2015-12-01
5 Female 2015-12-01
6 Female 2015-12-01

Questions of Interest

We will focus on just youth crime, where we define “youth” as aged 15-19 inclusive.

youth <- subset(crime, Age.Lower == 15)

We will look at the proportion of male versus female offenders in this age group and we will look at the trends over time in the number of incidents, broken down by type of crime.

Data Visualisations

We will begin with just drawing data visualisations entirely using ‘grid’.

library(grid)
  1. The following code reproduces a simple bar visualisation from page 9 of the Youth Justice Indicators Summary Report that shows the proportion of male versus female offenders.

    We first generate the proportion of males versus females and define the colours that we will be using.

    youthSex <- table(youth$SEX)/nrow(youth)
    female <- "#E46C0A"
    male <- "#0070C0"

    The next code defines a function to draw the bars. We do this so that we can reuse this code easily in other questions. The function takes a vector of two proportions and draws two rectangles the full width of the current viewport and 1cm high. The rectangles have no border, but they are filled with the colours taken from the Youth Justice Report. The text labels are generated from the proportions and drawn 2mm in from either side of the current viewport (and they are bold).

    sexBar <- function(props, height=unit(1, "cm")) {
        grid.rect(c(0, props[1]), width=props, height=height,
                  just="left", gp=gpar(col=NA, fill=c(female, male)))
        grid.text(paste0("Female ", round(100*props[1]), "%"),
                  unit(2, "mm"), just="left",
                  gp=gpar(fontface = "bold"))
        grid.text(paste0("Male ", round(100*props[2]), "%"),
                  unit(1, "npc") - unit(2, "mm"), just="right",
                  gp=gpar(fontface = "bold"))
    }

    Finally, we draw the bars using the proportions calculated previously. A grey rectangle is drawn to show the extent of the image (the bars do not cover the entire height of the image). This clearly shows that the majority of offenders are male; the labels allow us to see quite precise values for the split.

    grid.newpage()
    grid.rect(gp=gpar(col=NA, fill="grey90"))
    sexBar(youthSex)

  2. The following code draws the same bars from the previous question, using exactly the same code (to draw the bars), but this time within a viewport that is 2cm narrower than the entire image. Having written a function in the previous question makes it very easy and efficient to use the same code (to draw the bars) in this question. A grey rectangle is drawn to show the extent of the image.

    grid.newpage()
    grid.rect(gp=gpar(col=NA, fill="grey90"))
    pushViewport(viewport(width=unit(1, "npc") - unit(2, "cm")))
    sexBar(youthSex)
    popViewport()

  3. The following code produces an array of bars that shows the proportion of male versus female offenders broken down by year. We first generate proportions of male versus female offenders per year.

    youthSexYear <- t(apply(table(substr(youth$Date, 1, 4), youth$SEX), 1, 
                            function(x) x/sum(x)))

    Now we create a viewport with margins (1in to the left and 1cm all other sides) and a y-scale that goes from 0 to 9. The y-scale will make it easy to place the per-year bars. We then loop over the years and push a viewport for each year; we can again reuse the sexBar() function that we wrote to draw the bars. Having the height argument

    We define a new function, sexYear(), so that we can draw this data visualisation at different sizes.

    sexYear <- function() {
        pushViewport(viewport(x=unit(1, "in"), 
                              width=unit(1, "npc") - 
                                    unit(1, "in") - unit(1, "cm"),
                              height=unit(1, "npc") - unit(2, "cm"), 
                              just="left",
                              yscale=c(0, 9)))
        for (i in 1:9) {
            pushViewport(viewport(y=unit(i - .5, "native")))
            sexBar(youthSexYear[i,], height=.08)
            grid.text(rownames(youthSexYear)[i], x=unit(-2, "mm"), just="right")
            popViewport()
        }
        popViewport()
    }

    Finally, we draw the array of bars. This shows us that the proportion of male versus female offenders has remained quite stable over time, though there is a suggestion of a small trend even more towards males in the last few years.

    grid.newpage()
    grid.rect(gp=gpar(col=NA, fill="grey90"))    
    sexYear()

    We draw the array of bars at a different size to show that the margins stay the same size and the bars just shrink.

    grid.newpage()
    grid.rect(gp=gpar(col=NA, fill="grey90"))    
    sexYear()

Combining ‘grid’ with ‘ggplot2’

We now turn our attention to combining ‘grid’ output with ‘ggplot2’ output. We already know that the number of incidents shows a downward trend over time with a slight suggestion of an increase since the start of 2022 (shown below). We will look in more detail at these trends by breaking crimes down by type of crime.

library(ggplot2)
    youthTrend <- as.data.frame(table(youth$Date))
    youthTrend$Date <- as.Date(youthTrend$Var1)
    ggplot(youthTrend) +
        geom_line(aes(Date, Freq))

  1. The following code draws a line plot of the number of incidents over time with a facet for each type of crime. First we generate counts of incidents per month for each type of crime.

    youth$Abbrev <- 
        gsub(",", "",
             unlist(lapply(strsplit(as.character(youth$ANZSOC.Division), " "),
                           function(x) x[1])))
    youthTrendType <- as.data.frame(table(youth$Date, youth$Abbrev))
    youthTrendType$Date <- as.Date(youthTrendType$Var1)
    ## Order crime types
    types <- table(youth$Abbrev)
    newlevels <- names(types)[order(types, decreasing=TRUE)]
    youthTrendType$Type <- factor(youthTrendType$Var2, levels=newlevels)

    We have abbreviated the crime type labels so that the facet labels are not too long; the output below provides a “legend” to decode facet labels back to full crime type labels.

    print(unique(youth[c("Abbrev", "ANZSOC.Division")]), 
          right=FALSE, row.names=FALSE)
     Abbrev        ANZSOC.Division                                                  
     Acts          Acts Intended to Cause Injury                                    
     Dangerous     Dangerous or Negligent Acts Endangering Persons                  
     Illicit       Illicit Drug Offences                                            
     Offences      Offences Against Justice Procedures, Govt Sec and Govt Ops       
     Property      Property Damage and Environmental Pollution                      
     Public        Public Order Offences                                            
     Robbery       Robbery, Extortion and Related Offences                          
     Theft         Theft and Related Offences                                       
     Traffic       Traffic and Vehicle Regulatory Offences                          
     Prohibited    Prohibited and Regulated Weapons and Explosives Offences         
     Fraud         Fraud, Deception and Related Offences                            
     Sexual        Sexual Assault and Related Offences                              
     Unlawful      Unlawful Entry With Intent/Burglary, Break and Enter             
     Abduction     Abduction, Harassment and Other Related Offences Against a Person
     Miscellaneous Miscellaneous Offences                                           
     Homicide      Homicide and Related Offences                                    

    The next code produces the facetted line plot. We can see several differences between types of crime: some crimes are much more common than others (e.g., Theft vs Homicide); some crimes are reducing at a much greater rate (e.g., Theft); Dangerous Acts Endangering Persons are actually slowing increasing; there is a strange peak in Offences Against Justice in early 2021; only some types show the up-tick from the start of 2022, with the most obvious being Traffic and Vehicle! Overall, there is increased evidence of a recent increase in youth crime, particularly associated with traffic-related crimes.

    gg <- ggplot(youthTrendType) +
        geom_line(aes(Date, Freq)) +
        facet_wrap("Type") 
    gg

  2. The following code adds bars showing the proportion of males versus females (from earlier questions) above the facetted plot. We do this by drawing the ‘ggplot2’ plot within a ‘grid’ viewport to make space for the bars.

    pushViewport(viewport(y=0, height=.9, just="bottom"))
    print(gg, newpage=FALSE)
    popViewport()
    pushViewport(viewport(y=1, height=.1, just="top"))
    sexBar(youthSex, height=1)
    popViewport()

  3. The following code uses ‘gggrid’ to draw a ‘grid’ rectangle within each panel (to highlight the right-hand end of each time series).

    library(gggrid)    

    This just requires providing grid_panel() with a grob to draw. The rectangle position is just 85% across the panel, so it does not align exactly with 2022-01-01.

    highlight <- rectGrob(.85, width=.15, just="left",
                          gp=gpar(col=NA, fill=adjustcolor(4, alpha=.5)))
    ggplot(youthTrendType) +
        grid_panel(highlight, aes(x=Date)) +
        geom_line(aes(Date, Freq)) +
        facet_wrap("Type")    

  4. The following code repeats the previous question, but only draws a rectangle that bounds the data from 2022-01-01 onwards. This requires passing a function to grid_panel() so that a different rectangle can be drawn in each panel.

    This does more effectively highlight the relevant data, almost making a mini-plot within each panel. That in turn makes the recent trend in Theft look more significant (although it is a jump followed by a plateau). The trend in Traffic still looks strongest.

    highlightDate <- function(data, coords) {
        subset <- coords[data$x >= as.Date("2022-01-01"), ]
        rectGrob(x=min(subset$x),
                 width=diff(range(subset$x)),
                 y=min(subset$y),
                 height=diff(range(subset$y)),
                 just=c("left", "bottom"), 
                 gp=gpar(col=adjustcolor(4, alpha=.5), 
                         fill=adjustcolor(4, alpha=.5)))
    }
    ggplot(youthTrendType) +
        grid_panel(highlightDate, aes(Date, Freq)) +
        geom_line(aes(Date, Freq)) +
        facet_wrap("Type")    

Challenge

  1. The following code produces a line plot of the number of incidents over time, broken down by the type of crime and further broken down by the sex of the offender. A mini bar has been drawn on the right side of each panel that represents the proportion of male versus female offenders within each crime type. Horizontal lines are drawn on these bars at 0.1 to 0.5 in steps of 0.1 to assist with comparisons between panels. Finally, a custom legend has been drawn to the right of the panels using ‘grid’.

    We first prepare the counts that we need.

    youthTrendTypeSex <- as.data.frame(table(youth$Date, youth$Abbrev,
                                             youth$SEX))
    youthTrendTypeSex$Date <- as.Date(youthTrendTypeSex$Var1)
    youthTrendTypeSex$Type <- factor(youthTrendTypeSex$Var2, levels=newlevels)

    Next we define a function that will draw the mini bars in each panel.

    sexGrob <- function(data, coords) {
        counts <- unlist(lapply(split(data$y, data$sex), sum))
        props <- counts/sum(counts)
        bars <- segmentsGrob(1, c(0, props[1]), 1, cumsum(props),
                             gp=gpar(col=unique(data$colour), lwd=5))
        ticks <- segmentsGrob(1, seq(.1, .5, .1), 
                              unit(1, "npc") - unit(1, "mm"), seq(.1, .5, .1),
                              gp=gpar(col="white"))
        gTree(children=gList(bars, ticks))
    }  

    Finally, we draw the facetted plot with the ‘grid’ legend alongside.

    grid.newpage()
    gg <- ggplot(youthTrendTypeSex) +
        grid_panel(sexGrob, aes(y=Freq, sex=Var3, colour=Var3)) +
        geom_line(aes(Date, Freq, colour=Var3)) +
        facet_wrap("Type") +
        theme(legend.position="none")
    pushViewport(viewport(x=0, width=.9, just="left"))
    print(gg, newpage=FALSE)
    popViewport()
    pushViewport(viewport(x=1, width=.1, just="right", height=.5))
    grid.rect(y=c(0, youthSex[1]), height=youthSex, just="bottom",
              width=unit(1, "lines"),
              gp=gpar(col=NA, fill=c("#F8766D", "#00BFC4")))
    grid.text("female", y=unit(2, "mm"), rot=90, just="left")
    grid.text("male", y=unit(1, "npc") - unit(2, "mm"), rot=90, just="right")
    popViewport()

Summary

In this lab we have gained some experience with the ‘grid’ graphics system, drawing shapes, using different coordinate systems, and using viewports to arrange components. We have also combined ‘grid’ output with ‘ggplot2’ output, both by drawing a ‘ggplot2’ plot within a ‘grid’ viewport and by using the ‘gggrid’ package. The latter approach allows us to align the ‘grid’ output with the ‘ggplot2’ output.

We have also learned a little more about youth crime. Overall, males are committing crimes much more than females. Having broken down the data by type of crime, we can see that any recent increases in crime have only occurred in one or two types of crime, one of which is traffic-related crime.