4 Maps
There are numerous ways to make a map with plotly – each with it’s own strengths and weaknesses. Generally speaking the approaches fall under two categories: integrated or custom. Integrated maps leverage plotly.js’ built-in support for rendering a basemap layer. Currently there are two supported ways of making integrated maps: either via Mapbox or via an integrated d3.js powered basemap. The integrated approach is convenient if you need a quick map and don’t necessarily need sophisticated representations of geo-spatial objects. On the other hand, the custom mapping approach offers complete control since you’re providing all the information necessary to render the geo-spatial object(s). Section 4.2 covers making sophisticated maps (e.g., cartograms) using the sf R package, but it’s also possible to make custom plotly maps via other tools for geo-computing (e.g., sp, ggmap, etc).
4.1 Integrated maps
4.1.1 Overview
If you have fairly simple latitude/longitude data and want to make a quick map, you may want to try one of plotly’s integrated mapping options (i.e., plot_mapbox()
and plot_geo()
). Generally speaking, you can treat these constructor functions as a drop-in replacement for plot_ly()
and get a dynamic basemap rendered behind your data. Furthermore, all the scatter-based layers we learned about in Section 3 work as you’d expect it to with plot_ly()
.11 For example, Figure 4.1 uses plot_mapbox()
and add_markers()
to create a bubble chart:
plot_mapbox(maps::canada.cities) %>%
add_markers(
x = ~long,
y = ~lat,
size = ~pop,
color = ~country.etc,
colors = "Accent",
text = ~paste(name, pop),
hoverinfo = "text"
)
The Mapbox basemap styling is controlled through the layout.mapbox.style
attribute. The plotly package comes with support for 7 different styles, but you can also supply a custom URL to a custom mapbox style. To obtain all the pre-packaged basemap style names, you can grab them from the official plotly.js schema()
:
styles <- schema()$layout$layoutAttributes$mapbox$style$values
styles
#> [1] "basic" "streets"
#> [3] "outdoors" "light"
#> [5] "dark" "satellite"
#> [7] "satellite-streets"
Any one of these values can be used for a mapbox style. Figure 4.2 demonstrates the satellite earth imagery basemap.
layout(
plot_mapbox(),
mapbox = list(style = "satellite")
)
Figure 4.3 demonstrates how to create an integrated plotly.js dropdown menu to control the basemap style via the layout.updatemenus
attribute. The idea behind an integrated plotly.js dropdown is to supply a list of buttons (i.e., menu items) where each button invokes a plotly.js method with some arguments. In this case, each button uses the relayout method to modify the layout.mapbox.style
attribute.12
style_buttons <- lapply(styles, function(s) {
list(label = s, method = "relayout", args = list("mapbox.style", s))
})
layout(
plot_mapbox(),
mapbox = list(style = "dark"),
updatemenus = list(
list(y = 0.8, buttons = style_buttons)
)
)
The other integrated mapping solution in plotly is plot_geo()
. Compared to plot_mapbox()
, this approach has support for different mapping projections, but styling the basemap is limited and can be more cumbersome. Figure 4.4 demonstrates using plot_geo()
in conjunction with add_markers()
and add_segments()
to visualize flight paths within the United States. Whereas plot_mapbox()
is fixed to a mercator projection, the plot_geo()
constructor has a handful of different projection available to it, including the orthographic projection which gives the illusion of the 3D globe.
Click to show code
library(plotly)
library(dplyr)
# airport locations
air <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_us_airport_traffic.csv')
# flights between airports
flights <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_aa_flight_paths.csv')
flights$id <- seq_len(nrow(flights))
# map projection
geo <- list(
projection = list(
type = 'orthographic',
rotation = list(lon = -100, lat = 40, roll = 0)
),
showland = TRUE,
landcolor = toRGB("gray95"),
countrycolor = toRGB("gray80")
)
plot_geo(color = I("red")) %>%
add_markers(
data = air, x = ~long, y = ~lat, text = ~airport,
size = ~cnt, hoverinfo = "text", alpha = 0.5
) %>%
add_segments(
data = group_by(flights, id),
x = ~start_lon, xend = ~end_lon,
y = ~start_lat, yend = ~end_lat,
alpha = 0.3, size = I(1), hoverinfo = "none"
) %>%
layout(geo = geo, showlegend = FALSE)
One nice thing about plot_geo()
is that it automatically projects geometries into the proper coordinate system defined by the map projection. For example, in Figure 4.5 the simple line segment is straight when using plot_mapbox()
yet curved when using plot_geo()
. It’s possible to acheive the same effect using plot_ly()
or plot_mapbox()
, but the relevant marker/line/polygon data has to be put into an sf data structure before rendering (see Section 4.2.1 for more details).
map1 <- plot_mapbox() %>%
add_segments(x = -100, xend = -50, y = 50, yend = 75) %>%
layout(
mapbox = list(
zoom = 0,
center = list(lat = 65, lon = -75)
)
)
map2 <- plot_geo() %>%
add_segments(x = -100, xend = -50, y = 50, yend = 75) %>%
layout(geo = list(projection = list(type = "mercator")))
library(htmltools)
browsable(tagList(map1, map2))
4.1.2 Choropleths
In addition to scatter-based layers, the plot_geo()
constructor also supports a choropleth layer. Figure 4.6 shows the population density of the U.S. via a choropleth, and also layers on markers for the state center locations, using the U.S. state data from the datasets package (R Core Team 2016). By simply providing a z
attribute, plotly_geo()
objects will try to create a choropleth, but you’ll also need to provide locations
and a locationmode
. It’s worth noting that the locationmode
is currently limited to countries and US states, so if you need to a different geo-unit (e.g., counties, muncipalities, etc), you can use the the custom mapping approach discussed in Section 4.2.
density <- state.x77[, "Population"] / state.x77[, "Area"]
g <- list(
scope = 'usa',
projection = list(type = 'albers usa'),
lakecolor = toRGB('white')
)
plot_geo() %>%
add_trace(
z = ~density, text = state.name, span = I(0),
locations = state.abb, locationmode = 'USA-states'
) %>%
layout(geo = g)
Figure 4.6 helps illuminate a problem with choropleths from a graphical perception point of view. We typically use the color in choropleths to encode a numeric variable (e.g., GDP, net exports, average SAT score, etc) and the eye naturally perceives the area that a particular color covers as proportional to its overall effect. This ends up being misleading since the area the color covers typically has no sensible relationship with the data encoded by the color. A classic example of this misleading effect in action is in US election maps – the proportion of red to blue coloring is not representative of the overall popular vote (Newman 2016).
Cartograms are an approach to reducing this misleading effect and grants another dimension to encode data through the size of geo-spatial features. Section 4.2.2 covers how to render cartograms in plotly using sf and cartogram.
4.2 Custom maps
4.2.1 Simple features (sf)
The sf R package is a modern approach to working with geo-spatial data structures based on tidy data principles (Pebesma 2018); (Wickham 2014b). The key idea behind sf is that it stores geo-spatial geometries in a list-column of a data frame. This allows each row to represent the real unit of observation/interest – whether it’s a polygon, multi-polygon, point, line, or even a collection of these features – and as a result, works seamlessly inside larger tidy workflows.13 The sf package itself does not really provide geo-spatial data – it provides the framework and utilties for storing and computing on geo-spatial data structures in an opinionated way.
There are numerous packages for accessing geo-spatial data as simple features data structures. A couple notable examples include rnaturalearth and USAboundaries. The rnaturalearth package is better for obtaining any map data in the world via an API provided by https://www.naturalearthdata.com/ (South 2017). The USAboundaries package is great for obtaining map data for the United States at any point in history (Mullen and Bratt 2018). It doesn’t really matter what tool you use to obtain or create an sf object – once you have one, plot_ly()
knows how to render it:
library(rnaturalearth)
world <- ne_countries(returnclass = "sf")
class(world)
#> [1] "sf" "data.frame"
plot_ly(world, color = I("gray90"), stroke = I("black"), span = I(1))
How does plot_ly()
know how to render the countries? It’s because the geo-spatial features are encoded in special (geometry) list-column. Also, meta-data about the geo-spatial structure are retained as special attributes of the data. Figure 4.8 augments the print method for sf to data frames to demonstrate that all the information needed to render the countries (i.e., polygons) in Figure 4.7 is contained within the world
data frame. Note also, that sf provides special dplyr methods for this special class of data frame so that you can treat data manipulations as if it were a ‘tidy’ data structure. One thing about this method is that the special ‘geometry’ column is always retained – if we try to just select the name
column, then we get both the name and the geometry.
library(sf)
world %>%
select(name) %>%
print(n = 4)
There are actually 4 different ways to render sf objects with plotly: plot_ly()
, plot_mapbox()
, plot_geo()
, and via ggplot2’s geom_sf()
. These functions render multiple polygons using a single trace by default, which is fast, but you may want to leverage the added flexibility of multiple traces. For example, a given trace can only have one fillcolor
, so it’s impossible to render multiple polygons with different colors using a single trace. For this reason, if you want to vary the color of multiple polygons, make sure the split
by a unique identifier (e.g. name
), as done in Figure 4.9. Note that, as discussed for line charts in Figure 3.2, using multiple traces automatically adds the ability to filter name
via legend entries.
canada <- ne_states(country = "Canada", returnclass = "sf")
plot_ly(canada, split = ~name, color = ~provnum_ne)
Another important feature for maps that may require you to split
multiple polygons into multiple traces is the ability to display a different hover-on-fill for each polygon. By providing text
that is unique within each polygon and specifying hoveron='fills'
, the tooltip behavior is tied to the trace’s fill (instead of displayed at each point along the polygon).
plot_ly(
canada,
split = ~name,
color = I("gray90"),
text = ~paste(name, "is \n province number", provnum_ne),
hoveron = "fills",
hoverinfo = "text",
showlegend = FALSE
)
Although the integrated mapping approaches (plot_mapbox()
and plot_geo()
) can render sf objects, the custom mapping approaches (plot_ly()
and geom_sf()
) are more flexible because they allow for any well-defined mapping projection. Working with and understanding map projections can be intimatidating for a causal map maker. Thankfully, there are nice resources for searching map projections in a human-friendly interface, like http://spatialreference.org/. Through this website, one can search desirable projections for a given portion of the globe and extract commands for projecting their geo-spatial objects into that projection. One way way to perform the projection is to supply the relevant PROJ4 command to the st_transform()
function in sf (PROJ contributors 2018).
# filter the world sf object down to canada
canada <- filter(world, name == "Canada")
# coerce cities lat/long data to an official sf object
cities <- st_as_sf(
maps::canada.cities,
coords = c("long", "lat"),
crs = 4326
)
# A PROJ4 projection designed for Canada
# http://spatialreference.org/ref/sr-org/7/
# http://spatialreference.org/ref/sr-org/7/proj4/
moll_proj <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"
# perform the projections
canada <- st_transform(canada, moll_proj)
cities <- st_transform(cities, moll_proj)
# plot with geom_sf()
p <- ggplot() +
geom_sf(data = canada) +
geom_sf(data = cities, aes(size = pop), color = "red", alpha = 0.3)
ggplotly(p)
Some geo-spatial objects have an unnecessarily high resolution for a given visualization. In these cases, you may want to consider simplifying the geo-spatial object to improve the speed of the R code and responsiveness of the visualization. For example, we could recreate Figure 4.7 with a much higher resolution by specifying scale = "large"
in ne_countries()
this gives us a sf object with over 50 times more spatial coordinates than the default scale. The higher resolution allows us to zoom in better on more complex geo-spatial regions, but it allow leads to slower R code, larger HTML files, and slower responsiveness. Sievert (2018b) explores this issue in more depth and demonstrates how to use the st_simplify()
function from sf to simplify features before plotting them.
sum(rapply(world$geometry, nrow))
#> [1] 10586
world_large <- ne_countries(scale = "large", returnclass = "sf")
sum(rapply(world_large$geometry, nrow))
#> [1] 548121
Analogous to the discussion surrounding 3.2, it pays to be aware of the tradeoffs involved with rendering plotly graphics using one or many traces, and knowledgable about how to leverage either approach. Specifically, by default, plotly attempts to render all simple features in a single trace, which is performant, but doesn’t have a lot of interactivity.
plot_mapbox(world_large, color = NA, stroke = I("black"), span = I(0.5))
For those interested in learning more about geocomputation in R with sf and other great R packages like sp and raster, Robin Lovelace (2019) provides lots of nice and freely available learning resources (Pebesma and Bivand 2005); (Hijmans 2019).
4.2.2 Cartograms
Cartograms distort the size of geo-spatial polygons to encode a numeric variable other than the land size. There are numerous types of cartograms and they are typically categorized by their ability to perserve shape and maintain contingous regions. Cartograms has been shown to be an effective approach to both encode and teach about geo-spatial data, though the effects certainly vary by cartogram type (Nusrat S, Alam MJ, Kobourov S. 2018). The R package cartogram provides an interface to several popular cartogram algorithms (Jeworutzki 2018). A number of other R packages provide cartogram algorithms, but the great thing about cartogram is that all the functions can take an sf (or sp) object as input and return an sf object. This makes it incredibly easy to go from raw spatial objects, to transformed objects, to visual. Figure 4.12 demonstrates a continuous area cartogram of US population in 2014 using a rubber sheet distortion algorithm from James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer (1985).
library(cartogram)
library(albersusa)
us_cont <- cartogram_cont(usa_sf("laea"), "pop_2014")
plot_ly(us_cont) %>%
add_sf(
color = ~pop_2014,
split = ~name,
span = I(1),
text = ~paste(name, scales::number_si(pop_2014)),
hoverinfo = "text",
hoveron = "fills"
) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "Population \n 2014")
Figure 4.13 demonstrates a non-continuous Dorling cartogram of US population in 2014 from Dorling, D (1996). This cartogram does not try to preserve the shape of polygons (i.e., states), but instead uses circles instead to represent each geo-spatial object, then encodes the variable of interest (i.e., population) using the area of the circle.
us <- usa_sf("laea")
us_dor <- cartogram_dorling(us, "pop_2014")
plot_ly(stroke = I("black"), span = I(1)) %>%
add_sf(
data = us,
color = I("gray95"),
hoverinfo = "none"
) %>%
add_sf(
data = us_dor,
color = ~pop_2014,
split = ~name,
text = ~paste(name, scales::number_si(pop_2014)),
hoverinfo = "text",
hoveron = "fills"
) %>%
layout(showlegend = FALSE)
Figure 4.14 demonstrates a non-continuous cartogram of US population in 2014 from Olson, J. M. (1976). In contrast to the Dorling cartogram, this approach does preserve the shape of polygons. The implementation behind Figure 4.14 is to simply take the implementation of Figure 4.13 and change cartogram_dorling()
to cartogram_ncont()
.
A popular class of contiguous cartograms that do not preserve shape are sometimes referred to as tile catograms (aka tilegrams). At the time of writing, there doesn’t seem to be a great R package for computing tilegrams, but Pitch Interactive provides a nice web service where you can generate tilegrams from existing or custom data https://pitchinteractiveinc.github.io/tilegrams/. Moreover, the service allows you to download a TopoJSON file of the generated tilegram, which we can read in R and convert into an sf object via geojsonio (Chamberlain and Teucher 2018). Figure 4.15 demonstrates a tilegram of U.S. Population in 2016 exported directly from Pitch’s free web service.
library(geojsonio)
tiles <- geojson_read("~/Downloads/tiles.topo.json", what = "sp")
tiles_sf <- st_as_sf(tiles)
plot_ly(tiles_sf, split = ~name)
References
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Newman, Mark. 2016. “Maps of the 2016 Us Presidential Election Results.” Blog. http://www-personal.umich.edu/~mejn/election/2016/.
Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.
Wickham, Hadley. 2014b. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.
South, Andy. 2017. Rnaturalearth: World Map Data from Natural Earth. https://CRAN.R-project.org/package=rnaturalearth.
Mullen, Lincoln A., and Jordan Bratt. 2018. “USAboundaries: Historical and Contemporary Boundaries of the United States of America.” Journal of Open Source Software 3 (23): 314. https://doi.org/10.21105/joss.00314.
PROJ contributors. 2018. PROJ Coordinate Transformation Software Library. Open Source Geospatial Foundation. https://proj4.org/.
Sievert, Carson. 2018b. “Learning from and Improving Upon Ggplotly Conversions.” Blog. https://blog.cpsievert.me/2018/01/30/learning-improving-ggplotly-geom-sf/.
Robin Lovelace, Jannes Muenchow, Jakub Nowosad. 2019. Geocomputation with R. Chapman and Hall/CRC.
Pebesma, Edzer J., and Roger S. Bivand. 2005. “Classes and Methods for Spatial Data in R.” R News 5 (2): 9–13. https://CRAN.R-project.org/doc/Rnews/.
Hijmans, Robert J. 2019. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.
Nusrat S, Alam MJ, Kobourov S. 2018. “Evaluating Cartogram Effectiveness.” IEEE Trans Vis Comput Graph. https://doi.org/10.1109/TVCG.2016.2642109.
Jeworutzki, Sebastian. 2018. Cartogram: Create Cartograms with R. https://CRAN.R-project.org/package=cartogram.
James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer. 1985. “An Algorithm to Construct Continuous Area Cartograms.” The Professional Geographer.
Dorling, D. 1996. “Area Cartograms: Their Use and Creation.” Concepts and Techniques in Modern Geography (CATMOG).
Olson, J. M. 1976. “Noncontiguous Area Cartograms.” The Professional Geographer.
Chamberlain, Scott, and Andy Teucher. 2018. Geojsonio: Convert Data from and to ’Geojson’ or ’Topojson’. https://CRAN.R-project.org/package=geojsonio.
Unfortunately, non-scatter traces currently don’t work with
plot_mapbox()
/plot_geo()
meaning that, for one, raster (i.e., heatmap) maps are not natively supported.↩To see more examples of creating and using plotly.js’ integrated dropdown functionality to modify graphs, see https://plot.ly/r/dropdowns/↩
This is way more intuitive compared to older workflows based on, say using
ggplot2::fortify()
to obtain a data structure where a row to represents particular point along a feature and having another column track which point belongs to each feature (for example).↩