6 Boxplots
Boxplots encode the five-number summary of a numeric variable, and are a more space-efficient way to compare many numeric distributions than trellis displays of histograms. The add_boxplot() function requires one numeric variable and guarantees that boxplots are oriented correctly, regardless of whether the numeric variable is placed on the x or y scale. As Figure 6.1 shows, on the axis orthogonal to the numeric axis, you can provide a discrete variable (for conditioning) or supply a single value (to name the axis category).
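library(plotly)

# shared base plot: price on the y axis
p <- plot_ly(diamonds, y = ~price, color = I("black"),
             alpha = 0.1, boxpoints = "suspectedoutliers")
# a single value on x names the one category
p1 <- p %>% add_boxplot(x = "Overall")
# a discrete variable on x conditions on cut
p2 <- p %>% add_boxplot(x = ~cut)
subplot(
  p1, p2, shareY = TRUE,
  widths = c(0.2, 0.8), margin = 0
) %>% hide_legend()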
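As a rough illustration of what each box encodes, the five-number summary can be computed in base R with fivenum(). This is only a sketch, not what plotly.js runs: fivenum() returns Tukey's hinges, and plotly.js may compute its quartiles slightly differently, so the values can differ a little from those drawn in the figure. The name five_num is introduced here for illustration.

```r
library(ggplot2)  # provides the diamonds data set

# Five-number summary of price within each level of cut.
# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum.
five_num <- sapply(split(diamonds$price, diamonds$cut), fivenum)
rownames(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

# One column per cut, one row per summary statistic
five_num
```

Each column of the resulting matrix corresponds to one box in the conditioned plot.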
If you want to partition by more than one discrete variable, you could map the interaction of those variables to the discrete axis and color by the nested variable, as Figure 6.2 does with diamond clarity and cut. Another approach would be to use a trellis display, similar to Figure 13.9.
plot_ly(diamonds, x = ~price, y = ~interaction(clarity, cut)) %>%
  add_boxplot(color = ~clarity) %>%
  layout(yaxis = list(title = ""))
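The trellis alternative mentioned above might look something like the following sketch, which draws one panel per cut and places clarity on the discrete axis within each panel. It is not the code behind Figure 13.9; the names panels and trellis are introduced here for illustration, and the layout choices (one row per cut, a shared x axis) are assumptions.

```r
library(plotly)

# One data frame per level of cut
by_cut <- split(diamonds, diamonds$cut)

# One boxplot panel per cut: price on x, clarity on the discrete axis
panels <- lapply(by_cut, function(d) {
  plot_ly(d, x = ~price, y = ~clarity) %>%
    add_boxplot(name = as.character(d$cut[1]))
})

# Stack the panels vertically and share the price axis across them
trellis <- subplot(panels, nrows = length(panels), shareX = TRUE)
trellis
```

Sharing the x axis keeps prices comparable across panels, at the cost of a taller figure than the single-panel interaction approach.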
It is also helpful to sort the boxplots according to something meaningful, such as the median price. Figure 6.3 presents the same information as Figure 6.2, but sorts the boxplots by their median, which makes it immediately clear that diamonds with a clarity of "SI2" tend to have the highest median price.
library(dplyr)

d <- diamonds %>%
  mutate(cc = interaction(clarity, cut))

# interaction levels sorted by median price
lvls <- d %>%
  group_by(cc) %>%
  summarise(m = median(price)) %>%
  arrange(m) %>%
  pull(cc)

plot_ly(d, x = ~price, y = ~factor(cc, lvls)) %>%
  add_boxplot(color = ~clarity) %>%
  layout(yaxis = list(title = ""))
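The same ordering can be sketched more compactly with base R's reorder(), which sorts factor levels by a summary of another variable. This is an alternative to the dplyr pipeline above rather than the method used for Figure 6.3, and the name p_sorted is introduced here for illustration. Where two groups share the same median, the relative order of those groups may differ from the dplyr version.

```r
library(plotly)

d <- diamonds
d$cc <- interaction(d$clarity, d$cut)

# Reorder the levels of cc by the median price within each level
d$cc <- reorder(d$cc, d$price, FUN = median)

p_sorted <- plot_ly(d, x = ~price, y = ~cc) %>%
  add_boxplot(color = ~clarity) %>%
  layout(yaxis = list(title = ""))
p_sorted
```

Swapping median for another function, such as IQR, sorts the boxes by spread instead of by center.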
Similar to add_histogram(), add_boxplot() sends the raw data to the browser and lets plotly.js compute the summary statistics. Unfortunately, at the time of writing, plotly.js does not allow precomputed statistics for boxplots.[18]
[18] Follow the issue at https://github.com/plotly/plotly.js/issues/1059
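One way to see that the raw values, rather than summary statistics, are what gets sent is to inspect the trace that plotly would serialize. The sketch below uses plotly_build() and assumes that the box trace is the first element of the built object's data list; the names built and trace are introduced here for illustration.

```r
library(plotly)

p <- plot_ly(diamonds, y = ~price) %>%
  add_boxplot(x = "Overall")

# Build the plot specification without rendering it
built <- plotly_build(p)
trace <- built$x$data[[1]]  # assumed to be the box trace

trace$type        # trace type as it will be sent to plotly.js
length(trace$y)   # number of raw values attached to the trace
nrow(diamonds)    # number of rows in the source data, for comparison
```

For large data sets this means the size of the serialized plot grows with the number of observations, not with the number of boxes.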