Maps


The purpose of this topic is to look at visualising data with a geographic component, which means drawing maps. In addition to drawing the map itself, our challenge will be to add representations of data values to the map.

The main advantage of using maps is that we can represent geographic locations very effectively, because the representation of a geographic location is less abstract (it corresponds to the actual physical location) and because the shapes and arrangements of map regions are often familiar to viewers. Maps also allow us to represent categorical variables such as country, state, city, suburb, etc., simultaneously, in a single Cartesian coordinate system.

The main challenge when using maps is to decide how to accurately represent non-geographic variables, like population, average income, etc. The problem is that the best “visual channels”, like “position”, are already taken by the map regions, so we are left with less effective visual channels like colour.

There are also technical challenges involved with producing maps, such as map data formats, merging map data with non-geographic data, and map projections. We will see examples of some of these issues, but they will largely be handled for you in this course.

Introduction

We will use ‘ggplot2’ to draw our maps, but the functions that we use will depend on the format of the map data and how accurate we want the map to be.

When the map data is in a very simple format, and we are not concerned about the accuracy of the map, we can just use geom_polygon() and coord_quickmap(). For example, the following code uses simple map data from the ‘maps’ package to draw the outline of New Zealand.

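library(ggplot2)
# map_data() needs the 'maps' package to be installed
nz <- map_data("nz")
ggplot(nz) +
    # 'group' keeps each island as a separate polygon
    geom_polygon(aes(long, lat, group = group),
                 colour = "black", fill = "white") +
    coord_quickmap()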
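
To see what the "very simple format" amounts to, the following sketch builds a tiny data frame by hand with the same kind of columns that the code above uses (long, lat, and group). The two squares are hypothetical shapes made up for illustration, not real map data.

```r
library(ggplot2)

# Two hypothetical unit squares; 'group' records which rows belong to which polygon
squares <- data.frame(
    long  = c(0, 1, 1, 0,  2, 3, 3, 2),
    lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
    group = rep(c(1, 2), each = 4)
)

# Same layers as the New Zealand example, applied to the toy data
p <- ggplot(squares) +
    geom_polygon(aes(long, lat, group = group),
                 colour = "black", fill = "white") +
    coord_quickmap()
p
```

Without the group aesthetic, all eight rows would be treated as the corners of a single polygon, so the two squares would be joined together.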

However, if the map data format is more complex, for example “shape files” or a “geodatabase”, and/or we desire greater map accuracy, we need to use geom_sf() and coord_sf(). The example below works with a set of shape files for the New Zealand coastline from Stats NZ.

The read_sf() function from the ‘sf’ package can handle these more complex map formats and creates a “simple features” data frame, which behaves a lot like a normal data frame, but has a column that contains map data.

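library(sf)
# 'path' is assumed to hold the directory that contains the map data
nz <- read_sf(file.path(path, "coastline"))
# Show the features in alphabetical order of name
nz[order(nz$name),]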
## Simple feature collection with 9133 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 165.869 ymin: -52.62088 xmax: 183.8457 ymax: -29.23134
## Geodetic CRS:  NZGD2000
## # A tibble: 9,133 × 8
##    name           macronated grp_macron TARGET_FID grp_ascii grp_name name_ascii
##    <chr>          <chr>      <chr>           <dbl> <chr>     <chr>    <chr>     
##  1 Abbey Rocks    N          N                3948 <NA>      <NA>     Abbey Roc…
##  2 Adams Island   N          N                   0 Auckland… Aucklan… Adams Isl…
##  3 Adams Rocks    N          N                   0 Auckland… Aucklan… Adams Roc…
##  4 Ahoroa Rock    N          N                7067 <NA>      <NA>     Ahoroa Ro…
##  5 Aiguilles Isl… N          N                6629 <NA>      <NA>     Aiguilles…
##  6 Albert Stack   N          N                   0 Snares I… Snares … Albert St…
##  7 Alhambra Rock  N          N                3889 <NA>      <NA>     Alhambra …
##  8 Allports Isla… N          N                4367 <NA>      <NA>     Allports …
##  9 Amerikiwhati … N          N                4396 <NA>      <NA>     Amerikiwh…
## 10 Amherst Rock   N          N                   0 Auckland… Aucklan… Amherst R…
## # ℹ 9,123 more rows
## # ℹ 1 more variable: geometry <MULTIPOLYGON [°]>
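
To get a feel for the structure of a "simple features" data frame, the following sketch builds a very small one by hand. The square "island" and its name are hypothetical, and the choice of CRS (EPSG:4326, plain longitude and latitude) is an assumption made only for illustration.

```r
library(sf)

# Hypothetical square "island"; the first point is repeated to close the ring
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# st_sf() combines ordinary columns with a geometry column
toy <- st_sf(name = "Toy Island",
             geometry = st_sfc(square, crs = 4326))

class(toy)              # an "sf" object that is also a "data.frame"
names(toy)              # the ordinary column plus "geometry"
st_geometry_type(toy)   # the kind of shape stored in each row
```

The coastline data above has the same structure, just with more columns and thousands of rows.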

The following code draws a map of the New Zealand coastline using this data.

ggplot(nz) + 
    geom_sf() +
    coord_sf()
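
One reason to prefer coord_sf() is that it can draw the data in a different map projection. The following is a minimal sketch, assuming two point locations with approximate, illustrative coordinates and assuming that EPSG:2193 (NZGD2000 / New Zealand Transverse Mercator 2000) is a suitable projection; neither comes from the course data.

```r
library(sf)
library(ggplot2)

# Approximate longitude/latitude for two cities (illustrative values only)
pts <- st_as_sf(data.frame(name = c("Auckland", "Wellington"),
                           long = c(174.76, 174.78),
                           lat  = c(-36.85, -41.29)),
                coords = c("long", "lat"), crs = 4326)

# The data stay in longitude/latitude; coord_sf() reprojects them for drawing
p <- ggplot(pts) +
    geom_sf() +
    coord_sf(crs = st_crs(2193))
p
```

Leaving out the crs argument, as in the code above, draws the map in the coordinate system of the data.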

Official map data of the sort we are working with here will often provide more information than we need, but the familiar tools can be used to select or filter subsets of interest. For example, the following code eliminates the thousands of tiny islands that we are not interested in.

library(dplyr)
nzmain <- nz %>% 
    filter(grepl("(North|South|Stewart) Island", name))
ggplot(nzmain) + 
    geom_sf()

Geographic databases can contain information about other sorts of features besides coastlines. For example, whereas a coastline defines closed areas, a road defines a series of lines. We can add multiple geom_sf() layers to draw multiple features, and the geom_sf_label() function can be used to draw text labels. If we have point data, like a city location, stored as ordinary coordinate columns, we can use the more familiar geom_point() to add dots.

highway1 <- read_sf(file.path(path, "roads")) %>%
    filter(hway_num == "1")
cities <- read_sf(file.path(path, "cities")) %>%
    filter(grepl("(Auckland|Wellington|Christchurch) City", name))
ggplot(nzmain) + 
    geom_sf(fill = "white", lwd = .1) +
    geom_sf(data = highway1) +
    geom_sf_label(data = cities, aes(label = name), size = 2)
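
The code above does not use geom_point(), so here is a minimal sketch of that case. It assumes that the point locations are stored in an ordinary data frame with longitude and latitude columns; the rectangular region, the town names, and the coordinates are all hypothetical.

```r
library(sf)
library(ggplot2)

# Hypothetical rectangular region in longitude/latitude
rect <- st_polygon(list(rbind(c(174, -42), c(176, -42), c(176, -36),
                              c(174, -36), c(174, -42))))
region <- st_sf(geometry = st_sfc(rect, crs = 4326))

# Made-up point locations in an ordinary (non-sf) data frame
towns <- data.frame(name = c("Town A", "Town B"),
                    long = c(174.5, 175.5),
                    lat  = c(-37, -41))

p <- ggplot(region) +
    geom_sf(fill = "white") +
    # An ordinary layer; x and y are read as longitude and latitude
    geom_point(data = towns, aes(long, lat))
p
```

This works because the point coordinates are in the same coordinate system as the map layer.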

Besides the map itself, we will typically want to add data values to a map as well. When the data values correspond to a region, a common solution is to fill the region with a colour representing the data value. This produces a choropleth map.

For example, the following code reads a geodatabase map of electorate boundaries for New Zealand and a CSV file of data that records which political party won each electorate in 2020. It then uses dplyr::inner_join() to combine the two data sets, keeping only the electorates that appear in both, so that we have both map data and statistical data in one object.

electorates <- read_sf("nzmap/electorates/general-electorates-2020.gdb")
winners <- read.csv("nzmap/winners.csv")
election <- inner_join(electorates, winners,
                       by = c("GED2020_V1_00_NAME" = "Electorate"))
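
Because inner_join() silently drops rows that have no match, it can be worth checking for names that are spelled differently in the two data sets. The following sketch uses small hypothetical data frames (made-up electorate and party names) in place of the real files, and dplyr::anti_join() to list the rows that would be lost.

```r
library(dplyr)

# Hypothetical stand-ins for the map attributes and the results file
electorates_toy <- data.frame(
    GED2020_V1_00_NAME = c("Electorate A", "Electorate B", "Electorate C"))
winners_toy <- data.frame(
    Electorate = c("Electorate A", "Electorate B", "Electorat C"),  # deliberate typo
    Winner = c("Party X", "Party Y", "Party X"))

key <- c("GED2020_V1_00_NAME" = "Electorate")

# Rows that match in both data sets
joined <- inner_join(electorates_toy, winners_toy, by = key)

# Rows of the map data with no matching result
unmatched <- anti_join(electorates_toy, winners_toy, by = key)

nrow(joined)
unmatched
```

An empty result from anti_join() suggests that every region on the map will receive a data value.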

We can now draw a map and specify an aesthetic mapping of fill colour to the electorate winner.

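ggplot(nzmain) +
    # Main islands as a white background layer
    geom_sf(lwd = .1, fill = "white") +
    # Electorates filled according to the winning party
    geom_sf(data = election, aes(fill = Winner), alpha = .5) +
    # Numbers are indices into R's colour palette(), assigned to the values of Winner in order
    scale_fill_manual(values = c(1, 3, 2, 4)) +
    # Restrict the horizontal range to longitudes 166 to 179
    coord_sf(xlim = c(166, 179))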

The above technique can be effective when we want to represent a categorical data value and there are only a few different categories. It can be less effective when we want to represent a continuous variable because, as we know from the Perception Topic, colour is not the best visual channel for this purpose.

For example, the following code reads a different CSV file that records the percentage of voters in each electorate who voted for Labour (in the party vote). We can still map the data to fill colour, but patterns in the data may be harder to see.

# Keep only the first two columns (the electorate name and the Labour percentage)
labourVote <- read.csv("nzmap/election-results.csv")[, 1:2]
percentLabour <- inner_join(electorates, labourVote,
                            by = c("GED2020_V1_00_NAME" = "Electorate"))
ggplot(nzmain) +
    geom_sf(lwd = .1, fill = "white") +
    geom_sf(data = percentLabour, aes(fill = Labour)) +
    # Both ends are red (hue 0); higher values are lighter and more colourful
    scale_fill_gradient(low = hcl(0, 40, 40), high = hcl(0, 80, 80)) +
    coord_sf(xlim = c(166, 179))
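
One possible response, which is not part of the code above, is to divide the continuous variable into a small number of classes so that fill colour only has to represent a few ordered categories. The following sketch uses hypothetical square regions, made-up percentages, and arbitrary class breaks.

```r
library(sf)
library(ggplot2)

# Helper (hypothetical) that makes a unit square starting at x
make_square <- function(x) {
    st_polygon(list(rbind(c(x, 0), c(x + 1, 0), c(x + 1, 1), c(x, 1), c(x, 0))))
}

# Four toy regions with made-up percentages
toy <- st_sf(Labour = c(32, 47, 51, 68),
             geometry = st_sfc(lapply(0:3, make_square)))

# cut() converts the continuous values into ordered classes
toy$LabourClass <- cut(toy$Labour, breaks = c(0, 40, 50, 60, 100))

p <- ggplot(toy) +
    geom_sf(aes(fill = LabourClass)) +
    scale_fill_brewer(palette = "Reds")
p
```

The cost of this approach is that differences within a class are no longer visible, and the choice of breaks affects the impression that the map gives.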

When we have more than one variable to represent on the map, the problem becomes harder still. Some solutions that we can try are techniques that we have learned in other topics, like faceting, animation, and interaction.
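
As a minimal sketch of the first of those techniques, the following code draws one small map per variable. It assumes that the data are arranged with one row per region per variable; the regions, party names, and percentages are all made up.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Helper (hypothetical) that makes a unit square starting at x
make_square <- function(x) {
    st_polygon(list(rbind(c(x, 0), c(x + 1, 0), c(x + 1, 1), c(x, 1), c(x, 0))))
}

# Two toy regions
regions <- st_sf(region = c("A", "B"),
                 geometry = st_sfc(lapply(0:1, make_square)))

# One row per region per party (made-up values)
values <- data.frame(region  = rep(c("A", "B"), times = 2),
                     party   = rep(c("Party X", "Party Y"), each = 2),
                     percent = c(60, 35, 30, 55))

# Each region's geometry is repeated once per party
long <- inner_join(regions, values, by = "region")

p <- ggplot(long) +
    geom_sf(aes(fill = percent)) +
    facet_wrap(~ party)
p
```

Each panel repeats the same map, so the viewer compares fill colours across panels rather than trying to read two variables from a single map.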

Software

We will use the ‘ggplot2’ package for drawing maps in R because it allows us to build on our knowledge of ‘ggplot2’ for other data visualisations.

There are several alternatives, e.g., the ‘tmap’ package, and there are also packages that produce interactive maps, e.g., ‘leaflet’. However, they require learning a different style of interface for describing the data and how it should be added to a map.

Reading

  • The Maps chapter in “ggplot2: Elegant Graphics for Data Analysis” by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen.

    This provides the most up-to-date advice for writing ‘ggplot2’ code to produce a map.

    We will not be looking at “Raster maps” so you can ignore Section 6.5.

  • Chapter 15 of “Fundamentals of Data Visualization” by Claus O. Wilke.

    This describes some of the basic concepts and challenges of visualising data with a geographic element.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.