This lab is broken into two sections, requiring TWO submissions: a non-shiny part and a shiny part. This is the non-shiny part.
The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz web site.
We can read the data into an R data frame with read.csv().
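# Read the data: one row per incident
crime <- read.csv("nzpolice-proceedings.csv")
# Convert the Date column to R's Date class
crime$Month <- as.Date(crime$Date)
# POSIXlt stores years as an offset from 1900
crime$Year <- as.POSIXlt(crime$Date)$year + 1900
# Order the crime types from least to most frequent
typeCount <- table(crime$ANZSOC.Division)
crime$Type <- factor(crime$ANZSOC.Division,
                     levels=names(typeCount)[order(typeCount)])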
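To see what the derived columns look like, here is a minimal sketch that applies the same steps to a hypothetical toy data frame (the dates and type labels are made up, not taken from nzpolice-proceedings.csv). Because table() sorts its names alphabetically, order() then rearranges them by increasing count; ties keep their alphabetical order.

```r
# Hypothetical toy records (NOT from the real data set)
toy <- data.frame(
  Date = c("2021-01-01", "2021-01-01", "2021-02-01",
           "2021-02-01", "2021-03-01", "2021-03-01"),
  ANZSOC.Division = c("Theft", "Theft", "Theft",
                      "Homicide", "Fraud", "Fraud")
)
toy$Month <- as.Date(toy$Date)
# POSIXlt years count from 1900, so add 1900 to get the calendar year
toy$Year <- as.POSIXlt(toy$Date)$year + 1900
# table() names are alphabetical: Fraud (2), Homicide (1), Theft (3)
typeCount <- table(toy$ANZSOC.Division)
# order() reorders those names by increasing count
toy$Type <- factor(toy$ANZSOC.Division,
                   levels = names(typeCount)[order(typeCount)])
print(levels(toy$Type))
# [1] "Homicide" "Fraud"    "Theft"
```

Ordering the levels by frequency means that plots and legends list the crime types from rarest to most common rather than alphabetically.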
For this lab we will drop the year 2014 (for which we only have partial data).
crime <- subset(crime, Year >= 2015)
Some questions will use the “raw” crime data above, with one row per incident, and some questions will use the table-of-counts version of the data below, with one row per combination of crime type and month.
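# Cross-tabulate type by month; as.data.frame() returns one row per
# (type, month) combination, including combinations with zero incidents
counts <- as.data.frame(table(crime$Type, crime$Month))
names(counts) <- c("Type", "Month", "Freq")
# table() turns Month into a factor, so convert it back to a Date
counts$Month <- as.Date(counts$Month)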
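A useful property of as.data.frame(table(...)) is that it produces the full grid of combinations, so a crime type with no incidents in a month still gets a row with Freq equal to 0. The sketch below shows this on a hypothetical three-incident example (made-up data, not from the lab's CSV file).

```r
# Hypothetical toy incidents: Fraud has no incidents in January
toyType  <- factor(c("Theft", "Theft", "Fraud"), levels = c("Fraud", "Theft"))
toyMonth <- as.Date(c("2021-01-01", "2021-02-01", "2021-02-01"))

toyCounts <- as.data.frame(table(toyType, toyMonth))
names(toyCounts) <- c("Type", "Month", "Freq")
# Month comes back as a factor; as.Date() converts it via its character labels
toyCounts$Month <- as.Date(toyCounts$Month)
print(toyCounts)
# 4 rows (2 types x 2 months); the Fraud/January row has Freq 0
```

Keeping the zero rows means each line in a time-series plot has a value for every month instead of silently skipping months with no incidents.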
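# Abbreviate each type to the text before the first ",", "and", "With",
# "Offences" or "Endangering" that is followed by more text;
# trimws() drops the trailing space left in front of the keyword
counts$Abbrev <- counts$Type
levels(counts$Abbrev) <- trimws(sub("(.+?)(,|and|With|Offences|Endangering)(.+)",
                                    "\\1", levels(counts$Abbrev)))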
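The regular expression can be checked on a few strings before it is applied to the factor levels. The sketch below uses three example names in the style of ANZSOC divisions (they are assumptions; compare them against levels(crime$Type) for the real labels) and wraps the same sub() call in trimws() to remove the trailing space.

```r
# Example division-style names (assumed; check levels(crime$Type) for the real ones)
divisions <- c("Homicide and Related Offences",
               "Public Order Offences",
               "Dangerous or Negligent Acts Endangering Persons")

# Keep the text before a keyword that is followed by at least one more character
abbrev <- trimws(sub("(.+?)(,|and|With|Offences|Endangering)(.+)", "\\1", divisions))
print(abbrev)
# [1] "Homicide"                    "Public Order Offences"
# [3] "Dangerous or Negligent Acts"
```

Note that "Public Order Offences" is left unchanged: its only keyword is the final "Offences", and the trailing (.+) group needs at least one character after the keyword, so the pattern does not match.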
For each data visualisation in this lab, we will be interested in answering the following question:
For specific data visualisations there may be additional specific questions of interest.
library(ggplot2)
library(plotly)
The following code produces an interactive plot of crime frequencies over time for different types of crime.
The abbreviated type labels are used in the legend to save space, but the text aesthetic is used to add the full type labels to the tooltips.
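gg <- ggplot(counts, aes(Month, Freq)) +
  # 'text' is not a standard ggplot2 aesthetic (ggplot2 warns that it is
  # ignored), but ggplotly() adds it to the tooltips
  geom_line(aes(colour=Abbrev, group=Abbrev, text=Type))
ggplotly(gg)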
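By default ggplotly() uses tooltip = "all", which lists every mapped aesthetic, including the abbreviated label used for colour and group. A minimal sketch of restricting the tooltip to the x, y and text mappings is shown below; it uses a hypothetical two-type toy data frame with the same column names as counts (made-up values, not the police data).

```r
library(ggplot2)
library(plotly)

# Hypothetical toy counts with the same columns as 'counts' (NOT real data)
toy <- data.frame(
  Month  = rep(as.Date(c("2021-01-01", "2021-02-01", "2021-03-01")), 2),
  Freq   = c(10, 12, 9, 3, 5, 4),
  Type   = factor(rep(c("Theft and Related Offences",
                        "Homicide and Related Offences"), each = 3)),
  Abbrev = factor(rep(c("Theft", "Homicide"), each = 3))
)

gg <- ggplot(toy, aes(Month, Freq)) +
  geom_line(aes(colour = Abbrev, group = Abbrev, text = Type))

# Show only the Month (x), Freq (y) and full Type (text) values in tooltips
p <- ggplotly(gg, tooltip = c("x", "y", "text"))
# Print p at the console to open the interactive plot
```

The legend still uses the short Abbrev labels; only the hover text changes.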
Zooming and panning were used to explore the trends in less frequent crimes. A snapshot view is shown below.
Interactions with the legend were used to isolate the trends for Public Order Offences and Dangerous or Negligent Acts. A snapshot view is shown below.
Tooltips were used to identify that the sudden dip in Dangerous or Negligent Acts occurred in April 2020.
The following interesting features and comparisons in trends were identified:
In this section we make use of ‘plotscaper’ to generate linked plots.
library(plotscaper)
For this question we restrict the exploration to January 2021 onwards (to keep the data size manageable).
crimeRecent <- subset(crime, Year >= 2021)
The following code produces three (linked) interactive bar plots.
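# Layout: first two plots side by side on top, third plot across the bottom
layout <- matrix(c(1:3, 3), byrow=TRUE, ncol=2)
# Plots added to the same schema are linked to each other
schema <- create_schema(crimeRecent)
bar <- add_barplot(schema, "Type")
bar2 <- add_barplot(bar, "Age.Lower")
bar3 <- add_barplot(bar2, "Date")
plot <- set_layout(bar3, layout)
render(plot, width=800, height=600)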
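The layout matrix is easiest to understand by printing it. Assuming set_layout() reads it the way base R's layout() does (each cell holds the index of the plot occupying it, numbered in the order the plots were added), the Type and Age.Lower bar plots share the top row and the Date bar plot spans the whole bottom row.

```r
# The same matrix() call as in the plotscaper code, printed to show the grid
layout <- matrix(c(1:3, 3), byrow = TRUE, ncol = 2)
print(layout)
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    3
```

Giving the Date plot two cells makes it twice as wide, which suits a variable with many more bars than Type or Age.Lower.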