The purpose of this lab is to practice producing data visualisations using the ‘ggplot2’ package and its Grammar of Graphics concepts.
The data set is a CSV file, nzpolice-proceedings.csv,
which was derived from “Dataset 5” of Proceedings
(offender demographics) on the policedata.nz
web site.
We can read the data into an R data frame with
read.csv().
  Age.Lower Police.District ANZSOC.Division                                                     SEX    Date
1 15        Tasman          Acts Intended to Cause Injury                                       Female 2015-12-01
2 20        Auckland City   Abduction, Harassment and Other Related Offences Against a Person  Female 2015-12-01
3 40        Auckland City   Abduction, Harassment and Other Related Offences Against a Person  Female 2015-12-01
4 10        Auckland City   Acts Intended to Cause Injury                                       Female 2015-12-01
5 15        Auckland City   Acts Intended to Cause Injury                                       Female 2015-12-01
6 15        Auckland City   Acts Intended to Cause Injury                                       Female 2015-12-01
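A minimal sketch of the read step follows. It writes three of the rows listed above to a temporary file so that it runs without the real data; in the lab itself the call would point at nzpolice-proceedings.csv instead. The name crime matches the data frame used later in this lab; the temporary file and the exact header spelling are assumptions.

```r
# Stand-in for nzpolice-proceedings.csv: a few rows copied from the listing above.
# The header spelling here is an assumption; the real file may differ.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
    "Age.Lower,Police.District,ANZSOC.Division,SEX,Date",
    "15,Tasman,Acts Intended to Cause Injury,Female,2015-12-01",
    "20,Auckland City,\"Abduction, Harassment and Other Related Offences Against a Person\",Female,2015-12-01",
    "10,Auckland City,Acts Intended to Cause Injury,Female,2015-12-01"
), tmp)

# In the lab: crime <- read.csv("nzpolice-proceedings.csv")
crime <- read.csv(tmp)
head(crime)   # first rows of the data frame
str(crime)    # column types; note that Date is not yet of class Date
```

Because read.csv() does not parse dates by default, the Date column needs converting with as.Date() before it can be used as a time axis.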
Each row contains information on a single incident that the Police
handled. The Age.Lower column gives the lower bound of a
5-year age band of the offender, the SEX column gives the
sex of the offender, and the Date column gives the year and
month of the incident (all incidents are marked as occurring on the
first day of the month). There are over 800,000 incidents recorded
between 2014 and 2022.
Our main interest is in trends over time in youth offending (up to age 19), particularly at the end of 2021 and the beginning of 2022.
We are also interested in a comparison of youth offending with adult offending, and in any differences between males and females.
Write R code to produce a bar plot of the number of incidents in each age group.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this data visualisation tells us about the questions of interest.
Note the detail that each bar is left-aligned with
the lower bound of the age band. Reading the help page
?geom_bar should reveal an argument that will help you to
do that.
The following code creates a table of counts from the data.
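One plausible way to build such a table, modelled on the table() calls used later in this lab, is sketched below on a toy crime data frame. The toy values are made up, and the exact form of the call and the resulting column names (Var1, Freq) are assumptions rather than the lab's own code.

```r
# Toy stand-in for the crime data frame (values are illustrative only)
crime <- data.frame(Age.Lower = c(10, 15, 15, 20, 15, 40))

# table() counts the rows for each distinct age band;
# as.data.frame() turns the counts into columns Var1 (a factor) and Freq
crimeTab <- as.data.frame(table(crime$Age.Lower))
crimeTab
```

Note that Var1 comes back as a factor, not a number, which matters when it is mapped to a continuous axis.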
Write R code that uses this crimeTab
data frame to produce the same plot as in the previous question.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Write R code to produce a stacked bar plot of
the number of incidents in each age group broken down by the sex of the
offender. We are back to using the crime data frame in this
question.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this data visualisation tells us about the questions of interest.
Write R code to produce three variations on the bar plot from the previous questions that use “dodge”, “identity” and “fill” positioning of the bars.
Comment on what these data visualisations tell us about the questions of interest.
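As a rough illustration of how the position argument changes a bar plot, the sketch below uses a small made-up data frame. The names toy, ageBand and sex are hypothetical and are not part of the lab data.

```r
library(ggplot2)

# Toy data (hypothetical): one row per incident
toy <- data.frame(
    ageBand = c(10, 10, 10, 15, 15, 15, 15, 20),
    sex     = c("Female", "Male", "Male", "Female", "Female", "Male", "Male", "Male")
)

base <- ggplot(toy, aes(x = ageBand, fill = sex))

pStack    <- base + geom_bar()                                     # default: bars stacked
pDodge    <- base + geom_bar(position = "dodge")                   # bars side by side
pIdentity <- base + geom_bar(position = "identity", alpha = 0.5)   # bars drawn over each other
pFill     <- base + geom_bar(position = "fill")                    # each stack rescaled to height 1

# layer_data() shows the rectangles that ggplot2 would draw
layer_data(pFill)[, c("x", "ymin", "ymax")]
```

With "identity" the bars for each group start at zero and overlap, which is why some transparency is used in this sketch.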
The following code creates a data frame containing the number of incidents in each month.
Write R code to produce a plot of the number of incidents per month.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this plot tells us about the questions of interest.
Note the scale on the y-axis.
The following code creates a data frame containing the number of incidents per month broken down by the sex of the offender.
crimeTrendSex <- as.data.frame(table(crime$Date, crime$SEX))
crimeTrendSex$Date <- as.Date(crimeTrendSex$Var1)
Write R code to produce a plot showing the number of incidents per month with separate lines for males and females.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this plot tells us about the questions of interest.
Note the colours of the lines and the thickness of
the lines. The blue is the colour 4 in R and the pink is
the colour "pink" in R.
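To see what the crimeTrendSex code above produces, the same two lines can be run on a toy data frame. The toy values are made up; the column names Var1, Var2 and Freq are the defaults that as.data.frame() gives to a two-way table.

```r
# Toy stand-in for crime (illustrative values only)
crime <- data.frame(
    Date = c("2021-11-01", "2021-11-01", "2021-12-01", "2021-12-01", "2021-12-01"),
    SEX  = c("Female", "Male", "Male", "Male", "Female")
)

# One row per Date/SEX combination, with the count in Freq
crimeTrendSex <- as.data.frame(table(crime$Date, crime$SEX))

# Var1 is a factor of date strings; as.Date() converts it to a real Date
crimeTrendSex$Date <- as.Date(crimeTrendSex$Var1)
crimeTrendSex
```

The result is in "long" form, with one row per month and sex, which is the shape ggplot2 expects when a variable such as Var2 is mapped to an aesthetic.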
The following code creates a data frame containing the number of incidents per month for each age group.
crimeTrendAge <- as.data.frame(table(crime$Date, crime$Age.Lower))
crimeTrendAge$Date <- as.Date(crimeTrendAge$Var1)
crimeTrendAge$Age <- as.numeric(as.character(crimeTrendAge$Var2))
Write R code that produces a plot of the number of incidents per month with a separate line for each age group.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this plot tells us about the questions of interest.
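The crimeTrendAge code above converts Var2 with as.numeric(as.character(...)) rather than as.numeric() alone. A small illustration of why, using a made-up factor (ageFactor is a hypothetical name):

```r
# A factor stores integer codes plus a set of labels (levels)
ageFactor <- factor(c("10", "15", "40", "15"))

as.numeric(ageFactor)                 # level codes: 1 2 3 2
as.numeric(as.character(ageFactor))   # the ages themselves: 10 15 40 15
```

Going through as.character() first recovers the labels, so the numeric result holds the actual age bounds instead of the internal codes.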
Write R code to produce a plot of the number of incidents per month, with a different facet for each age group.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this plot tells us about the questions of interest.
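As a general illustration of faceting, the sketch below splits a line plot into one panel per group using facet_wrap(). The data frame toyTrend and its columns month, band and n are hypothetical stand-ins, not the lab data.

```r
library(ggplot2)

# Toy monthly counts (hypothetical values and column names)
toyTrend <- data.frame(
    month = rep(as.Date(c("2021-10-01", "2021-11-01", "2021-12-01")), times = 2),
    band  = rep(c(10, 15), each = 3),
    n     = c(5, 7, 6, 20, 18, 25)
)

# One panel per value of band; all panels share the same axes by default
pFacet <- ggplot(toyTrend, aes(x = month, y = n)) +
    geom_line() +
    facet_wrap(vars(band))
pFacet
```

Shared axes make the panels directly comparable; facet_wrap() also accepts scales = "free_y" when the shape of each trend matters more than its level.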
The following code adds a Year column to the
original crime data frame.
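One way such a column could be derived, assuming Date holds strings like 2015-12-01, is sketched below on a toy data frame. This is an illustration under that assumption and may differ from the lab's own code.

```r
# Toy stand-in for crime (illustrative values only)
crime <- data.frame(Date = c("2014-07-01", "2021-12-01", "2022-01-01"))

# Extract the four-digit year from each date; the result is a character vector
crime$Year <- format(as.Date(crime$Date), "%Y")
crime
```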
Write R code to produce a bar plot of the number of incidents in each age group, with a different facet for each year.
Identify the geoms and stats and aesthetic mappings that you are using in this plot.
Comment on what this plot tells us about the questions of interest.
No marks will be given for this question.
Can you produce the data visualisation below? This shows the number of incidents for different types of crime within each age group. Can you see any interesting features?
Your submission should consist of a knitted R Markdown document, in HTML format, submitted via Canvas.
Your report should include:
Don’t forget to also complete the Canvas Quiz!
Marks will be lost for:
This work is licensed under a Creative Commons Attribution 4.0 International License.