Lab 8: Maps


The purpose of this lab is to practise generating maps and, in particular, adding representations of data values to a map.

The Data Set

In this lab, we have more code than usual for setting up the data set. This is because we need data about the map we want to draw (the outline of New Zealand Police regions) in addition to the crime data that we have been using throughout the course. We then need to create data sets that combine the map data and the crime data.

Crime data

The data set is a CSV file, nzpolice-proceedings.csv, which was derived from “Dataset 5” of Proceedings (offender demographics) on the policedata.nz web site.

We can read the data into an R data frame with read.csv().

crime <- read.csv("nzpolice-proceedings.csv")

The following code generates a column of real dates, generates a Year column, and makes a tweak to the Police.District column (which will be useful later when we merge this crime data with the map outline data).

crime$Month <- as.Date(crime$Date)
crime$Year <- as.POSIXlt(crime$Date)$year + 1900
crime$Police.District <- gsub("Of", "of", crime$Police.District)
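
As a small illustration of what these three lines do, the sketch below applies them to a made-up two-row data frame. The toy values (including the ISO-style date strings and the "Bay Of Plenty" spelling) are assumptions for illustration, not rows from the real file.

```r
# Toy data (hypothetical values, not taken from nzpolice-proceedings.csv)
toy <- data.frame(Date = c("2014-07-01", "2015-03-01"),
                  Police.District = c("Bay Of Plenty", "Waikato"))

# Character strings become real Date values
toy$Month <- as.Date(toy$Date)

# POSIXlt stores the year as an offset from 1900, hence the "+ 1900"
toy$Year <- as.POSIXlt(toy$Date)$year + 1900

# Replace "Of" with "of" in the district names
toy$Police.District <- gsub("Of", "of", toy$Police.District)

toy
```

The same pattern should carry over to the full crime data frame, provided its Date column is stored in a format that as.Date() recognises by default.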

For this lab we will drop the year 2014 (for which we only have partial data).

crime <- subset(crime, Year >= 2015)

The following code generates total counts per Police District.

library(dplyr)
crimePerDistrict <- count(crime, Police.District)

The following code generates total counts for each year per Police District.

crimeYearPerDistrict <- count(crime, Police.District, Year)

The following code generates total counts for each type of crime per Police District.

crimeTypePerDistrict <- count(crime, Police.District, ANZSOC.Division)
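
It may help to see what count() returns before using these data frames in a plot. The sketch below uses a made-up three-row data frame (hypothetical values). It shows that count() tallies the number of rows for each combination of the grouping columns and stores the tally in a column called n.

```r
library(dplyr)

# Toy data (hypothetical values): one row per recorded proceeding
toy <- data.frame(Police.District = c("Waikato", "Waikato", "Tasman"),
                  Year = c(2015, 2016, 2016))

# One row per district; the tally is in column "n"
perDistrict <- count(toy, Police.District)

# One row per district-by-year combination
perDistrictYear <- count(toy, Police.District, Year)

perDistrict
perDistrictYear
```

The column n is therefore the one to map to colour or size when drawing the counts.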

Map data

Map data for the Police Districts was obtained from Koordinates (https://koordinates.com/layer/105480-nz-police-district-boundaries-29-april-2021/).

library(sf)
districts <- st_read("nz-police-district-boundaries-29-april-2021.shp")
Reading layer `nz-police-district-boundaries-29-april-2021' from data source 
  `/home/pmur002/Uni of Auckland Dropbox/r-project/Files/Teaching/STATS787/2025/Labs/nz-police-district-boundaries-29-april-2021.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 12 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XYZ
Bounding box:  xmin: 1067061 ymin: 4701317 xmax: 2114868 ymax: 6242140
z_range:       zmin: 0 zmax: 0
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000

The following code adds centroids per region (as X and Y).

centroids <- st_coordinates(st_centroid(st_geometry(districts)))
districts$X <- centroids[,1]
districts$Y <- centroids[,2]
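
To see the centroid calculation in isolation, the sketch below applies the same three steps to a single made-up square (a hypothetical shape with no coordinate reference system), where the expected centroid is easy to check by eye.

```r
library(sf)

# A 2 x 2 square with corners at (0,0) and (2,2); hypothetical toy shape
square <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
shapes <- st_sf(name = "square", geometry = st_sfc(square))

# st_centroid() returns one point per feature;
# st_coordinates() turns those points into a numeric matrix
xy <- st_coordinates(st_centroid(st_geometry(shapes)))

# The first two columns hold the x and y locations
shapes$X <- xy[, 1]
shapes$Y <- xy[, 2]

shapes
```

For the Police Districts, X and Y will be in the units of the map projection reported by st_read() rather than in longitude and latitude.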

Combined data

The following code combines the map data with the different crime counts.

crimeDistricts <- inner_join(districts, crimePerDistrict,
                             by=join_by(DISTRICT_N == Police.District))

crimeTypeDistricts <- inner_join(districts, crimeTypePerDistrict,
                                 by=join_by(DISTRICT_N == Police.District))

crimeYearDistricts <- inner_join(districts, crimeYearPerDistrict,
                                 by=join_by(DISTRICT_N == Police.District))
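
Because inner_join() silently drops rows whose keys do not match, it can be worth checking the keys before trusting the combined data. The sketch below is one possible check, using made-up plain data frames (hypothetical values, not the real map or crime data) in which one name differs only in capitalisation. It assumes a version of dplyr that provides join_by(), as the code above already does.

```r
library(dplyr)

# Toy stand-ins for the map names and the crime counts (hypothetical values)
mapNames <- data.frame(DISTRICT_N = c("Bay of Plenty", "Waikato"))
counts <- data.frame(Police.District = c("Bay Of Plenty", "Waikato"),
                     n = c(10, 20))

# inner_join() keeps only the rows with an exact key match
joined <- inner_join(mapNames, counts,
                     by = join_by(DISTRICT_N == Police.District))

# anti_join() lists the rows of the first table that found no match
unmatched <- anti_join(mapNames, counts,
                       by = join_by(DISTRICT_N == Police.District))

joined
unmatched
```

An empty result from anti_join() suggests that every region in the first table found a partner in the second.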

Questions of Interest

Each data visualisation in this lab will address at least one of the following questions:

  • Which Police Districts have the most incidents?
  • How does the number of incidents change over time in each District?
  • What types of crime are more common than others in each District?
  • Are there obvious differences between North Island and South Island crime?

Data Visualisations

  1. Write R code to produce a map of the New Zealand Police Districts, with a label for each district.

    HINT: I used hjust to shift the labels out of each other's way.

  2. Write R code to produce a map of the New Zealand Police Districts, with each region coloured to represent the number of incidents in the region.

    Comment on what this map tells us about the questions of interest. Are the visual channels used in this data visualisation helping or hindering us?

    Comment on the major substantive problem with this map (hint: we read about substantive problems in week 1).

  3. Write R code to produce a map of the New Zealand Police Districts, with a dot within each region and the area of the dot representing the number of incidents.

    NOTE: the dots are semitransparent.

    HINT: the dots are drawn at the centroids of the regions.

    Comment on what this map tells us about the questions of interest. Does the different visual channel help with answering the questions?

  4. Write R code to produce an animated map that shows the number of incidents in each region over time (one frame per year).

    NOTE: there is a year label above the map.

    Comment on what this map tells us about the questions of interest.

  5. Write R code to produce the data visualisation below of the number of incidents per region over time.

    Comment on what this data visualisation tells us about the questions of interest. Are there features that are easier to see in this plot versus the animated map? Are there features that are easier to see in the animated map versus this plot?

  6. Write R code to produce a facetted plot of regions with one facet for each type of crime.

    NOTE: there is no legend, there are no axis ticks or labels, and the strip labels are left-aligned.

    Comment on what this map tells us about the questions of interest. What could we do to the colour scale to improve the effectiveness of this data visualisation?

  7. Write R code to produce a non-map data visualisation that shows the number of incidents in each region for each type of crime.

    Comment on whether it is easier or harder to answer the questions of interest with this data visualisation compared to the previous question and explain why.

Challenge

  1. No marks will be given for this question.

    Write R code to produce a map of Police Districts with a simple embedded line plot for each region that shows the number of incidents over time for each region.

    Is this a better data visualisation than the ones in Questions 4 and 5? What visual channels are we employing here?

The Report

Your submission should consist of a knitted R Markdown document, in HTML format, submitted via Canvas.

Your report should include:

  • A brief description of the data and the questions we are trying to answer.
  • For each data visualisation, R code AND a brief text commentary.
  • A brief overall summary.

Don’t forget to also complete the Canvas Quiz!

Marking

Marks will be lost for:

  • Plagiarism.
  • A section of the report is missing.
  • The summary is too short or does not make sense.
  • Significantly poor R (or other) code.
  • Overly verbose code, output, or commentary.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.