19 Working with JSON
JavaScript and other web technologies can be intimidating and time-consuming to learn, but by borrowing some knowledge of R’s data structures (see note 34), we can get up and running with useful examples fairly quickly. JavaScript Object Notation (JSON) is a popular data-interchange format that JavaScript uses to work with data. As it turns out, working with JSON in JS is somewhat similar to working with list()s in R: both are recursive and heterogeneous data structures that have similar semantics for accessing values. In JSON, there are three basic building blocks: objects, arrays, and primitive data types (number, string, boolean, and null). Note that undefined is a JS value, but it has no representation in JSON.
Loosely speaking, a JSON array is similar to an un-named list() in R and a JSON object is similar to a named list(). In fact, if you’re already comfortable creating and subsetting named and un-named list()s in R, you can transfer some of that knowledge to JSON arrays and objects.
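To make the analogy concrete, here is a minimal sketch that parses a small JSON document with the built-in JSON.parse() and reads values back out of it. The field names (name, values, nested) are made up for illustration; in R terms the result is roughly list(name = "zoo", values = list(1, "a", TRUE, NULL), nested = list(x = 10)).

```javascript
// A JSON document is plain text; JSON.parse() turns it into JS values.
// The field names below are hypothetical, chosen only for this example.
var text = '{"name": "zoo", "values": [1, "a", true, null], "nested": {"x": 10}}';
var parsed = JSON.parse(text);

// Objects are accessed by name, arrays by (zero-based) index.
console.log(parsed.name);       // "zoo"
console.log(parsed.values[1]);  // "a"
console.log(parsed.nested.x);   // 10

// typeof reports "object" for arrays too, so use Array.isArray()
// to tell an array apart from a plain object.
console.log(Array.isArray(parsed.values));  // true
console.log(Array.isArray(parsed.nested));  // false
```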
19.1 Assignment, subsetting, and iteration
In R, the <- operator assigns a value to a name, and the [[ operator extracts a list element by index:
arr <- list("hello", "world", 10)
arr[[1]]
#> [1] "hello"
In JS, the = operator assigns a value to a name. When declaring a new name, you should include the var keyword (or the newer let or const) to avoid creating a global variable. The [ operator extracts array elements by index, but be careful: indexing in JS starts at 0 (not 1)!
var arr = ["hello", "world", 10];
arr[0]
// "hello"
In R, the $ and [[ operators can be used to extract list elements by name. The difference is that $ does partial matching of names, while [[ requires the exact name.
obj <- list(x = c("hello", "world"), zoo = 10)
obj$z
#> [1] 10
obj[["zoo"]]
#> [1] 10
In JS, the . and [ operators can be used to extract object values by name. In either case, the name must match exactly.
var obj = {
  x: ["hello", "world"],
  zoo: 10
};
obj.zoo
// 10
obj['zoo']
// 10
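As a small sketch of what exact matching means in practice: an inexact name in JS quietly returns undefined rather than a partial match or an error. The example also shows that bracket notation accepts a name stored in a variable, and that Object.keys() plays a role similar to names() in R. The variable key is introduced here only for illustration.

```javascript
var obj = { x: ["hello", "world"], zoo: 10 };

// No partial matching: an inexact name yields undefined, not an error.
console.log(obj.z);      // undefined

// Bracket notation accepts any expression that evaluates to a name.
var key = "zoo";
console.log(obj[key]);   // 10

// Accessors can be chained to reach nested values.
console.log(obj.x[1]);   // "world"

// Object.keys() lists the names of an object, much like names() in R.
console.log(Object.keys(obj));  // ["x", "zoo"]
```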
Unlike R list()s, arrays and objects in JS come with properties and methods that can be accessed via the . operator. Arrays, in particular, have a length property and a map() method for applying a function to each array element:
arr.length
// 3
arr.map(function(item) { return item + 1; });
// ["hello1", "world1", 11]
In R, both the lapply() and purrr::map() families of functions provide a similar functional interface. Also, note that operators like + in JS do even more type coercion than R, so although item + 1 works for strings in JS, it would throw an error in R (and that’s ok; most of the time you probably don’t want to add a string to a number)! If, instead, you wanted to add 1 only to numeric values, you could use is.numeric() in R within an if else statement.
purrr::map(arr, function(item) if (is.numeric(item)) item + 1 else item)
#> [[1]]
#> [1] "hello"
#>
#> [[2]]
#> [1] "world"
#>
#> [[3]]
#> [1] 11
In JS, you can use the typeof operator to get the data type and the conditional (ternary) operator (condition ? exprT : exprF) to achieve the same task.
arr.map(function(item) { return typeof item == "number" ? item + 1 : item; });
// ["hello", "world", 11]
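For reference, the following sketch shows what typeof reports for a few common values and how + behaves when strings are involved. It uses only built-in JS behavior; the variable name types is introduced just for this example.

```javascript
var arr = ["hello", "world", 10];

// typeof returns a string naming the type of each value.
var types = arr.map(function(item) { return typeof item; });
console.log(types);  // ["string", "string", "number"]

// + concatenates when either operand is a string; otherwise it adds.
console.log("hello" + 1);  // "hello1"
console.log(10 + 1);       // 11
console.log("10" + 1);     // "101"

// Two quirks worth knowing: arrays and null both report "object".
console.log(typeof [1, 2]);  // "object"
console.log(typeof null);    // "object"
```

Because arrays report "object", Array.isArray() is the more reliable check when you need to distinguish an array from a plain object.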
There are a handful of other useful array and object methods, but to keep things focused, we’ll only cover what’s required to follow section 20. A couple of examples in that section use the filter() method, which, like map(), applies a function to each array element, but expects that function to return a boolean and returns only the elements that meet the condition.
arr.filter(function(item) { return typeof item == "string"; });
// ["hello", "world"]
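Since filter() and map() each return a new array, they can be chained. The sketch below keeps only the strings and then upper-cases them, which is roughly comparable to purrr::keep() followed by purrr::map() in R. The variable name shouted is made up for illustration.

```javascript
var arr = ["hello", "world", 10];

// Keep only the strings, then upper-case each one.
var shouted = arr
  .filter(function(item) { return typeof item === "string"; })
  .map(function(item) { return item.toUpperCase(); });
console.log(shouted);  // ["HELLO", "WORLD"]

// Neither method modifies the original array.
console.log(arr.length);  // 3
```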
19.2 Mapping R to JSON
In R, unlike JSON, there is no distinction between scalars and vectors of length 1. That means there is ambiguity as to what a vector of length 1 in R should map to in JSON. The jsonlite package defaults to an array of length 1, but this can be avoided by setting auto_unbox = TRUE.
jsonlite::toJSON("A string in R")
#> ["A string in R"]
jsonlite::toJSON("A string in R", auto_unbox = TRUE)
#> "A string in R"
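To see why the distinction matters on the receiving end, this sketch parses the two JSON strings produced above with JSON.parse(). The variable names boxed and unboxed are chosen here for illustration only.

```javascript
// The two strings below are the outputs of the toJSON() calls above.
var boxed = JSON.parse('["A string in R"]');
var unboxed = JSON.parse('"A string in R"');

// The default output arrives as an array holding one string...
console.log(Array.isArray(boxed));  // true
console.log(boxed[0]);              // "A string in R"

// ...while the unboxed output arrives as a plain string.
console.log(typeof unboxed);        // "string"

// Indexing a string returns a single character, so code written for
// one shape can silently misbehave when handed the other.
console.log(unboxed[0]);            // "A"
```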
It’s worth noting that plotly.js, which consumes JSON objects, has specific expectations and rules about scalars versus arrays of length 1. If you’re calling the plotly.js library directly in JS, as we’ll see later in section 20, you’ll need to be mindful of the difference between scalars and arrays of length 1. Some attributes, like text and marker.size, accept both scalars and arrays and apply different rules based on the difference. Some other attributes, like x, y, and z, only accept arrays and will error out if given a scalar. To learn about these rules and expectations, you can use the schema() function from R to inspect the plotly.js specification, as shown in Figure 19.1. Note that attributes with a valType of 'data_array' require an array, while attributes with an arrayOk: true field accept either scalars or arrays.
schema()
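When preparing data on the JS side, one defensive approach is to wrap scalars in an array before handing them to an attribute that requires one. The sketch below is only an illustration: ensureArray() is a hypothetical helper written for this example, not part of plotly.js, and the trace object is a made-up stand-in for data that was serialized with scalar x and y values.

```javascript
// Hypothetical helper (not a plotly.js function): wrap a scalar in an
// array and leave existing arrays untouched.
function ensureArray(value) {
  return Array.isArray(value) ? value : [value];
}

// A made-up trace-like object whose x and y arrived as scalars.
var trace = { x: 1, y: 2, text: "a point" };

// x and y must be arrays; text is left as a scalar.
trace.x = ensureArray(trace.x);
trace.y = ensureArray(trace.y);

console.log(JSON.stringify(trace));  // {"x":[1],"y":[2],"text":"a point"}
```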
In JSON, unlike R, there is no distinction between a heterogeneous and homogeneous collection of data types. In other words, in R, there is an important difference between list(1, 2, 3) and c(1, 2, 3) (the latter is an atomic vector and has a different set of rules). In JSON, there is no strict notion of a homogeneous collection, so working with JSON arrays is essentially like being forced to use list() in R. This subtle fact can lead to some surprising results when trying to serialize R vectors as JSON arrays. For instance, if you wanted to create a JSON array, say [1,"a",true], using R objects, you may be tempted to do the following:
jsonlite::toJSON(c(1, "a", TRUE))
#> ["1","a","TRUE"]
But this creates an array of strings instead of the array with a number, a string, and a boolean that we want. The problem lies in the fact that c() coerces the collection of values into an atomic vector, so every value becomes a string before serialization. Instead, you should use list() rather than c():
jsonlite::toJSON(list(1, "a", TRUE), auto_unbox = TRUE)
#> [1,"a",true]
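The difference is easy to confirm from the JS side. This sketch parses both outputs shown above and reports the type of each element; typesOf() is a small helper defined here for illustration.

```javascript
// Output of toJSON(c(1, "a", TRUE)): every value was coerced to a string.
var fromVector = JSON.parse('["1","a","TRUE"]');
// Output of toJSON(list(1, "a", TRUE), auto_unbox = TRUE): types preserved.
var fromList = JSON.parse('[1,"a",true]');

// Illustrative helper: report the type of each array element.
function typesOf(values) {
  return values.map(function(v) { return typeof v; });
}

console.log(typesOf(fromVector));  // ["string", "string", "string"]
console.log(typesOf(fromList));    // ["number", "string", "boolean"]

// Serializing the parsed array reproduces the original JSON text.
console.log(JSON.stringify(fromList));  // [1,"a",true]
```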
34. If you’d like a succinct overview of the topic, see http://adv-r.had.co.nz/Data-structures.html