The data set is a CSV file, nzpolice-proceedings.csv,
which was derived from “Dataset 5” of Proceedings
(offender demographics) on the policedata.nz
web site.
We can read the data into an R data frame with
read.csv().
crime <- read.csv("nzpolice-proceedings.csv")
head(crime)
Age.Lower Police.District ANZSOC.Division
1 15 Tasman Acts Intended to Cause Injury
2 20 Auckland City Abduction, Harassment and Other Related Offences Against a Person
3 40 Auckland City Abduction, Harassment and Other Related Offences Against a Person
4 10 Auckland City Acts Intended to Cause Injury
5 15 Auckland City Acts Intended to Cause Injury
6 15 Auckland City Acts Intended to Cause Injury
SEX Date
1 Female 2015-12-01
2 Female 2015-12-01
3 Female 2015-12-01
4 Female 2015-12-01
5 Female 2015-12-01
6 Female 2015-12-01
We will focus on just youth crime, where we define “youth” as aged 15-19 inclusive.
youth <- subset(crime, Age.Lower == 15)
We will look at the proportion of male versus female offenders in this age group and we will look at the trends over time in the number of incidents, broken down by type of crime.
We will begin with just drawing data visualisations entirely using ‘grid’.
library(grid)
The following code reproduces a simple bar visualisation from page 9 of the Youth Justice Indicators Summary Report that shows the proportion of male versus female offenders.
We first generate the proportion of males versus females and define the colours that we will be using.
youthSex <- table(youth$SEX)/nrow(youth)
female <- "#E46C0A"
male <- "#0070C0"
The next code defines a function to draw the bars. We do this so that we can reuse this code easily in other questions. The function takes a vector of two proportions and draws two rectangles the full width of the current viewport and 1cm high. The rectangles have no border, but they are filled with the colours taken from the Youth Justice Report. The text labels are generated from the proportions and drawn 2mm in from either side of the current viewport (and they are bold).
sexBar <- function(props, height=unit(1, "cm")) {
grid.rect(c(0, props[1]), width=props, height=height,
just="left", gp=gpar(col=NA, fill=c(female, male)))
grid.text(paste0("Female ", round(100*props[1]), "%"),
unit(2, "mm"), just="left",
gp=gpar(fontface = "bold"))
grid.text(paste0("Male ", round(100*props[2]), "%"),
unit(1, "npc") - unit(2, "mm"), just="right",
gp=gpar(fontface = "bold"))
}
Finally, we draw the bars using the proportions calculated previously. A grey rectangle is drawn to show the extent of the image (the bars do not cover the entire height of the image). This clearly shows that the majority of offenders are male; the labels allow us to see quite precise values for the split.
grid.newpage()
grid.rect(gp=gpar(col=NA, fill="grey90"))
sexBar(youthSex)
The following code draws the same bars from the previous question, using exactly the same code (to draw the bars), but this time within a viewport that is 2cm narrower than the entire image. Having written a function in the previous question makes it very easy and efficient to use the same code (to draw the bars) in this question. A grey rectangle is drawn to show the extent of the image.
grid.newpage()
grid.rect(gp=gpar(col=NA, fill="grey90"))
pushViewport(viewport(width=unit(1, "npc") - unit(2, "cm")))
sexBar(youthSex)
popViewport()
The following code produces an array of bars that shows the proportion of male versus female offenders broken down by year. We first generate proportions of male versus female offenders per year.
youthSexYear <- t(apply(table(substr(youth$Date, 1, 4), youth$SEX), 1,
function(x) x/sum(x)))
Now we create a viewport with margins (1in to the left and 1cm all
other sides) and a y-scale that goes from 0 to 9. The y-scale will make
it easy to place the per-year bars. We then loop over the years and push
a viewport for each year; we can again reuse the sexBar()
function that we wrote to draw the bars. Having the height
argument
We define a new function, sexYear(), so that we can draw
this data visualisation at different sizes.
sexYear <- function() {
pushViewport(viewport(x=unit(1, "in"),
width=unit(1, "npc") -
unit(1, "in") - unit(1, "cm"),
height=unit(1, "npc") - unit(2, "cm"),
just="left",
yscale=c(0, 9)))
for (i in 1:9) {
pushViewport(viewport(y=unit(i - .5, "native")))
sexBar(youthSexYear[i,], height=.08)
grid.text(rownames(youthSexYear)[i], x=unit(-2, "mm"), just="right")
popViewport()
}
popViewport()
}
Finally, we draw the array of bars. This shows us that the proportion of male versus female offenders has remained quite stable over time, though there is a suggestion of a small trend even more towards males in the last few years.
grid.newpage()
grid.rect(gp=gpar(col=NA, fill="grey90"))
sexYear()
We draw the array of bars at a different size to show that the margins stay the same size and the bars just shrink.
grid.newpage()
grid.rect(gp=gpar(col=NA, fill="grey90"))
sexYear()
We now turn our attention to combining ‘grid’ output with ‘ggplot2’ output. We already know that the number of incidents shows a downward trend over time with a slight suggestion of an increase since the start of 2022 (shown below). We will look in more detail at these trends by breaking crimes down by type of crime.
library(ggplot2)
youthTrend <- as.data.frame(table(youth$Date))
youthTrend$Date <- as.Date(youthTrend$Var1)
ggplot(youthTrend) +
geom_line(aes(Date, Freq))
The following code draws a line plot of the number of incidents over time with a facet for each type of crime. First we generate counts of incidents per month for each type of crime.
youth$Abbrev <-
gsub(",", "",
unlist(lapply(strsplit(as.character(youth$ANZSOC.Division), " "),
function(x) x[1])))
youthTrendType <- as.data.frame(table(youth$Date, youth$Abbrev))
youthTrendType$Date <- as.Date(youthTrendType$Var1)
## Order crime types
types <- table(youth$Abbrev)
newlevels <- names(types)[order(types, decreasing=TRUE)]
youthTrendType$Type <- factor(youthTrendType$Var2, levels=newlevels)
We have abbreviated the crime type labels so that the facet labels are not too long; the output below provides a “legend” to decode facet labels back to full crime type labels.
print(unique(youth[c("Abbrev", "ANZSOC.Division")]),
right=FALSE, row.names=FALSE)
Abbrev ANZSOC.Division
Acts Acts Intended to Cause Injury
Dangerous Dangerous or Negligent Acts Endangering Persons
Illicit Illicit Drug Offences
Offences Offences Against Justice Procedures, Govt Sec and Govt Ops
Property Property Damage and Environmental Pollution
Public Public Order Offences
Robbery Robbery, Extortion and Related Offences
Theft Theft and Related Offences
Traffic Traffic and Vehicle Regulatory Offences
Prohibited Prohibited and Regulated Weapons and Explosives Offences
Fraud Fraud, Deception and Related Offences
Sexual Sexual Assault and Related Offences
Unlawful Unlawful Entry With Intent/Burglary, Break and Enter
Abduction Abduction, Harassment and Other Related Offences Against a Person
Miscellaneous Miscellaneous Offences
Homicide Homicide and Related Offences
The next code produces the facetted line plot. We can see several differences between types of crime: some crimes are much more common than others (e.g., Theft vs Homicide); some crimes are reducing at a much greater rate (e.g., Theft); Dangerous Acts Endangering Persons are actually slowing increasing; there is a strange peak in Offences Against Justice in early 2021; only some types show the up-tick from the start of 2022, with the most obvious being Traffic and Vehicle! Overall, there is increased evidence of a recent increase in youth crime, particularly associated with traffic-related crimes.
gg <- ggplot(youthTrendType) +
geom_line(aes(Date, Freq)) +
facet_wrap("Type")
gg
The following code adds bars showing the proportion of males versus females (from earlier questions) above the facetted plot. We do this by drawing the ‘ggplot2’ plot within a ‘grid’ viewport to make space for the bars.
pushViewport(viewport(y=0, height=.9, just="bottom"))
print(gg, newpage=FALSE)
popViewport()
pushViewport(viewport(y=1, height=.1, just="top"))
sexBar(youthSex, height=1)
popViewport()
The following code uses ‘gggrid’ to draw a ‘grid’ rectangle within each panel (to highlight the right-hand end of each time series).
library(gggrid)
This just requires providing grid_panel() with a grob to
draw. The rectangle position is just 85% across the panel, so it does
not align exactly with 2022-01-01.
highlight <- rectGrob(.85, width=.15, just="left",
gp=gpar(col=NA, fill=adjustcolor(4, alpha=.5)))
ggplot(youthTrendType) +
grid_panel(highlight, aes(x=Date)) +
geom_line(aes(Date, Freq)) +
facet_wrap("Type")
The following code repeats the previous question, but only draws
a rectangle that bounds the data from 2022-01-01 onwards. This requires
passing a function to grid_panel() so that a different
rectangle can be drawn in each panel.
This does more effectively highlight the relevant data, almost making a mini-plot within each panel. That in turn makes the recent trend in Theft look more significant (although it is a jump followed by a plateau). The trend in Traffic still looks strongest.
highlightDate <- function(data, coords) {
subset <- coords[data$x >= as.Date("2022-01-01"), ]
rectGrob(x=min(subset$x),
width=diff(range(subset$x)),
y=min(subset$y),
height=diff(range(subset$y)),
just=c("left", "bottom"),
gp=gpar(col=adjustcolor(4, alpha=.5),
fill=adjustcolor(4, alpha=.5)))
}
ggplot(youthTrendType) +
grid_panel(highlightDate, aes(Date, Freq)) +
geom_line(aes(Date, Freq)) +
facet_wrap("Type")
The following code produces a line plot of the number of incidents over time, broken down by the type of crime and further broken down by the sex of the offender. A mini bar has been drawn on the right side of each panel that represents the proportion of male versus female offenders within each crime type. Horizontal lines are drawn on these bars at 0.1 to 0.5 in steps of 0.1 to assist with comparisons between panels. Finally, a custom legend has been drawn to the right of the panels using ‘grid’.
We first prepare the counts that we need.
youthTrendTypeSex <- as.data.frame(table(youth$Date, youth$Abbrev,
youth$SEX))
youthTrendTypeSex$Date <- as.Date(youthTrendTypeSex$Var1)
youthTrendTypeSex$Type <- factor(youthTrendTypeSex$Var2, levels=newlevels)
Next we define a function that will draw the mini bars in each panel.
sexGrob <- function(data, coords) {
counts <- unlist(lapply(split(data$y, data$sex), sum))
props <- counts/sum(counts)
bars <- segmentsGrob(1, c(0, props[1]), 1, cumsum(props),
gp=gpar(col=unique(data$colour), lwd=5))
ticks <- segmentsGrob(1, seq(.1, .5, .1),
unit(1, "npc") - unit(1, "mm"), seq(.1, .5, .1),
gp=gpar(col="white"))
gTree(children=gList(bars, ticks))
}
Finally, we draw the facetted plot with the ‘grid’ legend alongside.
grid.newpage()
gg <- ggplot(youthTrendTypeSex) +
grid_panel(sexGrob, aes(y=Freq, sex=Var3, colour=Var3)) +
geom_line(aes(Date, Freq, colour=Var3)) +
facet_wrap("Type") +
theme(legend.position="none")
pushViewport(viewport(x=0, width=.9, just="left"))
print(gg, newpage=FALSE)
popViewport()
pushViewport(viewport(x=1, width=.1, just="right", height=.5))
grid.rect(y=c(0, youthSex[1]), height=youthSex, just="bottom",
width=unit(1, "lines"),
gp=gpar(col=NA, fill=c("#F8766D", "#00BFC4")))
grid.text("female", y=unit(2, "mm"), rot=90, just="left")
grid.text("male", y=unit(1, "npc") - unit(2, "mm"), rot=90, just="right")
popViewport()
In this lab we have gained some experience with the ‘grid’ graphics system, drawing shapes, using different coordinate systems, and using viewports to arrange components. We have also combined ‘grid’ output with ‘ggplot2’ output, both by drawing a ‘ggplot2’ plot within a ‘grid’ viewport and by using the ‘gggrid’ package. The latter approach allows us to align the ‘grid’ output with the ‘ggplot2’ output.
We have also learned a little more about youth crime. Overall, males are committing crimes much more than females. Having broken down the data by type of crime, we can see that any recent increases in crime have only occurred in one or two types of crime, one of which is traffic-related crime.