The Data Set

The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz web site.

We can read the data into an R data frame with read.csv().

crime <- read.csv("nzpolice-proceedings.csv")
crime$Month <- as.Date(crime$Date)
crime$Year <- as.POSIXlt(crime$Date)$year + 1900
typeCount <- table(crime$ANZSOC.Division)
crime$Type <- factor(crime$ANZSOC.Division,
                     levels=names(typeCount)[order(typeCount)])

For this lab we will drop the year 2014 (for which we only have partial data).

crime <- subset(crime, Year >= 2015)

Some questions will only look at data for youth crime.

youth <- subset(crime, Age.Lower <= 15)

Questions of Interest

In this lab we are interested in a variety of questions (most data visualisations will only be relevant to one of these):

Data Visualisations

library(grid)
library(ggplot2)
library(magick)
Linking to ImageMagick 6.9.10.23
Enabled features: fontconfig, freetype, fftw, lcms, pango, webp, x11
Disabled features: cairo, ghostscript, heic, raw, rsvg
Using 12 threads
library(gganimate)
  1. The following code generates an animation of the proportion of female versus male youth crimes using the ‘magick’ package.

    We can see that the proportions remain pretty stable over time, only dropping (fewer females) towards the end (in 2022).

    The code to draw the bar is copied from Lab 2; the only addition within each frame is the text label that shows the changing month.

    youthSexMonth <- t(apply(table(youth$Month, youth$SEX), 1, 
                             function(x) x/sum(x)))
    female <- "#E46C0A"
    male <- "#0070C0"
    library(grid)
    sexBar <- function(props, height=unit(1, "cm")) {
        grid.rect(c(0, props[1]), width=props, height=height,
                  just="left", gp=gpar(col=NA, fill=c(female, male)))
        grid.text(paste0("Female ", round(100*props[1]), "%"),
                  unit(2, "mm"), just="left",
                  gp=gpar(fontface = "bold"))
        grid.text(paste0("Male ", round(100*props[2]), "%"),
                  unit(1, "npc") - unit(2, "mm"), just="right",
                  gp=gpar(fontface = "bold"))
    }
    drawFrame <- function(i) {
        grid.newpage()
        pushViewport(viewport(y=.5, height=unit(1, "cm"), just="top"))
        sexBar(youthSexMonth[i, ])
        popViewport()
        grid.text(rownames(youthSexMonth)[i], 
                  y=unit(1, "npc") - unit(1, "lines"))
    }
    library(magick)
    img <- image_graph(400, (2/2.54)*100, res = 100)
    for (i in 1:nrow(youthSexMonth)) {
        drawFrame(i)
    }
    dev.off()
    image_animate(img, fps = 10, optimize = TRUE, loop = 1)

  2. The following code produces a ‘gganimate’ version of the previous animation.

    There are several important details: we have to specify a constant value for x to get a single bar and we have to set state_length to avoid a pause at each month. There is also a specific syntax for adding the title that shows the month changing.

    longYouthSexMonth <- reshape2::melt(youthSexMonth)
    names(longYouthSexMonth) <- c("Month", "Sex", "Prop")
    library(gganimate)
    gg <- ggplot(longYouthSexMonth) +
        geom_col(aes(x=1, y=Prop, fill=Sex)) +
        coord_flip() + 
        labs(title="{closest_state}") +
        transition_states(Month, state_length=0)
    animate(gg)

  3. The following code produces an animation of the number of incidents for each type of crime, per year.

    The main feature is the drop overall in crimes over time, particularly in 2021 and 2022. The effect is made quite obvious by seeing the bars drawn at the same location as the preceding year.

    An important detail in this code is the use of group=Year. This means that the bars in each state we transition between are separate from the bars in all other states, which means that the bars in one year “exit” before the next year and the next set of bars “enters”. Without that grouping, there is no exit/entry to control.

    gg <- ggplot(crime) +
        geom_bar(aes(Type, group=Year)) +
        coord_flip() +
        labs(title="{closest_state}") +
        xlab("") +
        theme(axis.title.x=element_blank()) +
        exit_fade() + enter_fade() +
        transition_states(Year)
    animate(gg)

  4. The following code produces a line plot of the number of crimes per year for different types of crime, with two of the crimes highlighted by drawing a line and point that move as the years change.

    We can clearly see the overall decrease in number of crimes, although the steep decline in 2021/2022 is perhaps less obvious. The animation is very effective at highlighting the Public Order and Dangerous Acts lines.

    The space for the horizontal crime type labels is created by hand-tuning the x-axis scale.

    I could only get this to work using the ‘magick’ approach (because of the combination of point, which is a “state” transition, and line, which is a “reveal” transition).

    crimeYearType <- as.data.frame(table(crime$Year, crime$ANZSOC.Division))
    crimeYearType$Year <- as.numeric(as.character(crimeYearType$Var1))
    crimeYearType$Type <- crimeYearType$Var2
    lastCount <- subset(crimeYearType, Year == 2022)
    public <- crimeYearType$Type == "Public Order Offences"
    dangerous <- 
        crimeYearType$Type == "Dangerous or Negligent Acts Endangering Persons"
    cols <- colorspace::desaturate(scales::hue_pal()(16), .5)
    drawFrame <- function(i) {
        print(ggplot(crimeYearType) + 
                  geom_line(aes(Year, Freq, colour=Type, group=Type)) +
                  geom_line(data=subset(crimeYearType, public & Year <= i),
                            aes(Year, Freq, colour=Type), linewidth=1) +
                  geom_line(data=subset(crimeYearType, dangerous & Year <= i),
                            aes(Year, Freq, colour=Type), linewidth=1) +
                  geom_point(data=subset(crimeYearType, public & Year == i),
                             aes(Year, Freq, colour=Type), size=2) +
                  geom_point(data=subset(crimeYearType, dangerous & Year == i),
                             aes(Year, Freq, colour=Type), size=2) +
                  geom_text(data=lastCount, x=2022,
                            aes(label=Type, y=Freq, colour=Type), 
                            hjust=0, size=3) +
                  scale_colour_manual(values=cols) +
                  scale_x_continuous(limits=c(2015, 2030)) +
                  theme(legend.position="none"))
    }
    library(magick)
    img <- image_graph(800, (3/2.54)*400, res = 100)
    for (i in sort(unique(crimeYearType$Year))) {
        drawFrame(i)
    }
    dev.off()
    image_animate(img, fps = 1, optimize = TRUE)

    These are the best results I could get from ‘gganimate’; I can get red lines and I can get red points, but NOT both at once (requires transition_states and transition_reveal at the same time).

    crimeYearType2 <- subset(crimeYearType, Type == "Public Order Offences")
    crimeYearType2$Year2 <- crimeYearType2$Year
    ggplot(crimeYearType) + 
        geom_line(aes(Year, Freq, group=Type, colour=Type)) +
        geom_line(data=crimeYearType2, 
                  aes(Year2, Freq, group=Type, colour=Type),
                  linewidth=1) +
        geom_text(data=lastCount, x=2022,
                  aes(label=Type, y=Freq, colour=Type), 
                  hjust=0, size=3) +
        scale_colour_manual(values=cols) +
        scale_x_continuous(limits=c(2015, 2030)) +
        theme(legend.position="none") +
        transition_reveal(Year2)

    crimeYearType3 <- subset(crimeYearType, Type == "Public Order Offences")
    crimeYearType3$Year3 <- crimeYearType3$Year
    ggplot(crimeYearType) + 
        geom_line(aes(Year, Freq, group=Type, colour=Type)) +
        geom_point(data=crimeYearType3, 
                   aes(Year3, Freq, group=Year3, colour=Type)) +
        geom_text(data=lastCount, x=2022,
                  aes(label=Type, y=Freq, colour=Type), 
                  hjust=0, size=3) +
        scale_colour_manual(values=cols) +
        scale_x_continuous(limits=c(2015, 2030)) +
        theme(legend.position="none") +
        transition_states(Year3)

  5. The following code produces a static plot of the number of incidents for each age group, for each type of crime and for each year.

    This is our first view of the change over time for less frequent crimes. We can see the the decrease in crimes over time only occurs for some crimes. In fact there are some crimes where there is a significant increase (e.g., Miscellaneous Offences). We have presumably not seen this before because these are less common crimes (so they have not been visible and the decreases in the more common crimes have overwhelmed these smaller increases in the overall counts).

    An important detail is that we have specified a border colour for the bars so that they are visible.

    ggplot(crime) +
        geom_bar(aes(Age.Lower, fill=Type, group=Year), 
                 colour="black", linewidth=.1) + 
        facet_grid(Type ~ Year, scales="free_y") +
        theme(legend.position="none",
              panel.grid=element_blank(),
              axis.title.y=element_blank(),
              axis.text.x=element_blank(),
              axis.ticks.x=element_blank(),
              strip.text.y=element_text(angle=0, size=8, hjust=0))

  6. The following code produces an animated version of the previous question, with transitions between years.

    It may be easier to see that some crime types actually increase in frequency in some years (we have not seen this previously because they are just not very common crimes). The horizontal shift (to the right) in the peak for some crimes is also more visible in this animation (e.g., Prohibited and Regulated Weapons). This may be a result of both the animation (items changing in place) and the extra space we get by not showing all years at once.

    gg <- ggplot(crime) +
        geom_bar(aes(Age.Lower, group=Year, fill=Type), colour="black") + 
        facet_grid("Type", scales="free_y") +
        labs(title="Year = {previous_state}") +
        theme(aspect.ratio=.25, 
              panel.spacing.y=unit(0, "mm"),
              legend.position="none",
              panel.grid=element_blank(),
              axis.title.y=element_blank(),
              strip.text.y=element_text(angle=0, size=8, hjust=0,
                                        margin=margin(0, 10, 0, 10))) +
        exit_fade() + enter_fade() +
        transition_states(Year)
    gg

Challenge

  1. The following code produces an animated version of the complex static plot, with age groups spread over time.
    This does not provide any further insight and is much worse in some ways because the missing values mean that some of the bars appear to jump around all over the place. The one benefit is labelling of types, but otherwise I am not happy.

    NOTE that it is also quite hard to hold the number of facets constant.

    ages <- split(crime, crime$Age.Lower)
    ## Try to make sure that every crime type appears in every split
    ## (so that number of facets does not change)
    ages <- lapply(ages, 
                   function(x) { 
                       rbind(x, 
                             data.frame(Age.Lower=x$Age.Lower[1], 
                                        Police.District=NA,
                                        ANZSOC.Division=NA,
                                        SEX=NA,
                                        Date=NA,
                                        Month=as.Date("2015-01-01"),
                                        Year=NA,
                                        Type=levels(crime$Type)))
                   })
    drawFrame <- function(i) {
        print(
        ggplot(ages[[i]]) + 
            geom_bar(aes(Month, colour=SEX, fill=SEX), 
                     position="fill") +
            facet_grid("Type", drop=FALSE) + 
            labs(title=names(ages)[i]) +
            scale_x_date(limits=range(crime$Month, na.rm=TRUE)) +
            theme(legend.position="none",
                  panel.grid=element_blank(),
                  axis.title.y=element_blank(),
                  axis.text.y=element_blank(),
                  axis.ticks.y=element_blank(),
                  strip.text.y=element_text(angle=0, size=8, hjust=0)) 
        )
    }   
    img <- image_graph(800, (3/2.54)*600, res = 100)
    for (i in seq_along(ages)) {
        drawFrame(i)
    }
    dev.off()
    image_animate(img, fps = 2, optimize = TRUE)    

Summary

In this lab we have demonstrated the generation of animations with both low-level, ‘magick’, and high-level, ‘gganimate’, approaches. We have seen some small examples of animation providing a better view of some features of the data (with bars changing in place). There is also the advantage of being able to see more data in less physical space, by using time to show one of the variables.

In terms of the data set, we have seen some detailed features that we have not seen before, such as the increase in crimes over time for some of the rare types of crime. This is partly due to viewing more variables at once (and partly due to using “free” y-axis scales).