Chapter 5. Data

Data is an extremely broad term, only slightly less vague than the nearly all-encompassing information. What is data? (What isn’t data?) What kinds of data are there, and what can we use with D3?

Broadly speaking, data is structured information with potential for meaning.

In the context of programming for visualization, data is stored in a digital file, typically in either text or binary form. Of course, potentially every piece of digital ephemera may be considered “data”—not just text, but bits and bytes representing images, audio, video, databases, streams, models, archives, and anything else.

Within the scope of D3 and browser-based visualization, however, we will limit ourselves to text-based data. That is, anything that can be represented as numbers and strings of alpha characters. If you can get your data into a .txt plain text file, a .csv comma-separated value file, or a .json JSON document, then you can use it with D3.

Whatever your data, it can’t be made useful and visual until it is attached to something. In D3 lingo, the data must be bound to elements within the page. Let’s address how to create new page elements first. Then attaching data to those elements will be a cinch.

Typically, when using D3 to generate new DOM elements, the new elements will be circles, rectangles, or other visual forms that represent your data. But to avoid confusing matters, we’ll start with a simple example and create a lowly p paragraph element.

Begin by creating a new document with our simple HTML template from the last chapter. You can find it in the sample code files as 01_empty_page_template.html, and it looks like the following code. (Eagle-eyed viewers will notice that I’ve modified the src path here to work with the directory structure of the code samples. If that doesn’t mean anything to you, don’t worry about it.)

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>D3 Page Template</title>
        <script type="text/javascript" src="../d3/d3.v3.js"></script>
    </head>
    <body>
        <script type="text/javascript">
            // Your beautiful D3 code will go here
        </script>
    </body>
</html>

Open that page in your web browser. Make sure you’re accessing the page via your local web server, as we discussed in Chapter 4. So the URL in your browser’s location bar should look something like this:

http://localhost:8888/d3-book/chapter_05/01_empty_page_template.html

If not viewed through a web server, the URL path will start with file:/// instead of http://. Confirm that the URL does not look like this:

file:///…/d3-book/chapter_05/01_empty_page_template.html

Once you’re viewing the page, pop open the web inspector. (As a reminder, see Developer Tools on how to do that.) You should see an empty web page, with the DOM contents shown in Figure 5-1.

Back in your text editor, replace the comment between the script tags with:

d3.select("body").append("p").text("New paragraph!");

Save and refresh, and voilà! There is text in the formerly empty browser window, and the web inspector will look like Figure 5-2.

See the difference? Now in the DOM, there is a new paragraph element that was generated on the fly! This might not be exciting yet, but you will soon use a similar technique to dynamically generate tens or hundreds of elements, each one corresponding to a piece of your dataset.

Let’s walk through what just happened. (You can follow along with 02_new_element.html.) To understand that first line of D3 code, you must first meet your new best friend, chain syntax.

Let’s deconstruct each line in this chain of code:

d3
References the D3 object, so we can access its methods. Our D3 adventure begins here.
.select("body")
Give the select() method a CSS selector as input, and it will return a reference to the first element in the DOM that matches. (Use selectAll() when you need more than one element.) In this case, we just want the body of the document, so a reference to body is handed off to the next method in our chain.
.append("p")
append() creates whatever new DOM element you specify and appends it to the end (but just inside) of whatever selection it’s acting on. In our case, we want to create a new p within the body. We specified "p" as the input argument, but this method also sees the reference to body that was passed down the chain from the select() method. So an empty p paragraph is appended to the body. Finally, append() hands off a reference to the new element it just created.
.text("New paragraph!")
text() takes a string and inserts it between the opening and closing tags of the current selection. Because the previous method passed down a reference to our new p, this code just inserts the new text between <p> and </p>. (In cases where there is existing content, it will be overwritten.)
;
The all-important semicolon indicates the end of this line of code. Chain over.

The Hand-off

Many, but not all, D3 methods return a selection (actually, a reference to a selection), which enables this handy technique of method chaining. Typically, a method returns a reference to the element that it just acted on, but not always.

So remember this: when chaining methods, order matters. The output type of one method has to match the input type expected by the next method in the chain. If adjacent inputs and outputs are mismatched, the hand-off will function more like a dropped baton in a middle-school relay race.

When sussing out what each function expects and returns, the API reference is your friend. It contains detailed information on each method, including whether or not it returns a selection.
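
A toy example may make the hand-off concrete. The sketch below uses a made-up ToySelection object (hypothetical, and not how D3 is implemented) whose text() method returns the selection when given a value and a plain string when called with no argument, which mirrors how D3's text() behaves as a getter. Chaining works only while each method hands back something the next method can act on.

```javascript
// Hypothetical stand-in for a selection; this is not D3's implementation.
function ToySelection(name) {
    this.name = name;
    this.content = "";
}

// With an argument: store the text and return the selection (chain continues).
// With no argument: return the stored text as a plain string (chain ends).
ToySelection.prototype.text = function(value) {
    if (arguments.length === 0) {
        return this.content;
    }
    this.content = value;
    return this;
};

// Create a child and hand off a reference to it, not to the parent.
ToySelection.prototype.append = function(childName) {
    return new ToySelection(childName);
};

var p = new ToySelection("body").append("p").text("New paragraph!");
var handedOff = p.text();   // a string, not a selection

var chainBroke = false;
try {
    handedOff.append("span");   // strings have no append() method
} catch (e) {
    chainBroke = true;          // the dropped baton
}
```

Swapping the order of two calls in a real chain can fail in the same way, which is why it pays to check what each method returns.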

What is binding, and why would I want to do it to my data?

Data visualization is a process of mapping data to visuals. Data in, visual properties out. Maybe bigger numbers make taller bars, or special categories trigger brighter colors. The mapping rules are up to you.

With D3, we bind our data input values to elements in the DOM. Binding is like “attaching” or associating data to specific elements, so that later you can reference those values to apply mapping rules. Without the binding step, we have a bunch of data-less, unmappable DOM elements. No one wants that.

We use D3’s selection.data() method to bind data to DOM elements. But there are two things we need in place first, before we can bind data:

  • The data
  • A selection of DOM elements

Let’s tackle these one at a time.

Data

D3 is smart about handling different kinds of data, so it will accept practically any array of numbers, strings, or objects (themselves containing other arrays or key/value pairs). It can handle JSON (and GeoJSON) gracefully, and even has a built-in method to help you load in CSV files.

But to keep things simple, for now we will start with a boring array of five numbers. Here is our sample dataset:

var dataset = [ 5, 10, 15, 20, 25 ];

If you’re feeling adventurous, or already have some data in CSV or JSON format that you want to play with, here’s how to do that. Otherwise, just skip ahead to Please Make Your Selection.

CSV stands for comma-separated values. A CSV data file might look something like this:

Food,Deliciousness
Apples,9
Green Beans,5
Egg Salad Sandwich,4
Cookies,10
Vegemite,0.2
Burrito,7

Each line in the file has the same number of values (two, in this case), and values are separated by a comma. The first line in the file often serves as a header, providing names for each of the “columns” of data.

If you have data in an Excel file, it probably follows a similar structure of rows and columns. To get that data into D3, open it in Excel, then choose Save as… and select CSV as the file type.

If we saved the preceding CSV data into a file called food.csv, then we could load the file into D3 by using the d3.csv() method:

d3.csv("food.csv", function(data) {
    console.log(data);
});

csv() takes two arguments: a string representing the path of the CSV file to load in, and an anonymous function, to be used as a callback function. The callback function is “called” only after the CSV file has been loaded into memory. So you can be sure that, by the time the callback is called, the file has finished loading and the parsed data is available.

When called, the anonymous function is handed the result of the CSV loading and parsing process; that is, the data. Here I’m naming it data, but this could be called whatever you like. You should use this callback function to do all the things you can do only after the data has been loaded. In the preceding example, we are just logging the value of the data array to the console, to verify it, as shown in Figure 5-3. (See 03_csv_loading_example.html in the example code.)

You can see that data is an array (because of the square brackets [] on either end) with six elements, each of which is an object. By toggling the disclosure triangles next to each object, we can see their values (see Figure 5-4).

Aha! Each object has both a Food property and a Deliciousness property, the values of which correspond to the values in our CSV! (There is also a third property, __proto__, but that has to do with how JavaScript handles objects, and you can ignore it for now.) D3 has employed the first row of the CSV for property names, and subsequent rows for values. You might not realize it, but this just saved you a lot of time.
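
To get a feel for the work being saved, here is a rough plain-JavaScript sketch of turning that CSV text into an array of objects. The function name parseSimpleCsv is made up for illustration, and this is not D3's parser: it ignores quoted values, embedded commas, and other details that a real CSV parser has to handle.

```javascript
var csvText = [
    "Food,Deliciousness",
    "Apples,9",
    "Green Beans,5",
    "Egg Salad Sandwich,4",
    "Cookies,10",
    "Vegemite,0.2",
    "Burrito,7"
].join("\n");

// Hypothetical, simplified parser: first line gives the property names,
// every later line becomes one object.
function parseSimpleCsv(text) {
    var lines = text.split("\n");
    var header = lines[0].split(",");
    return lines.slice(1).map(function(line) {
        var values = line.split(",");
        var row = {};
        header.forEach(function(name, i) {
            row[name] = values[i];   // values stay as strings
        });
        return row;
    });
}

var rows = parseSimpleCsv(csvText);
```

Each entry in rows has a Food and a Deliciousness property, matching the structure described above.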

One more thing to note is that each value from the CSV is stored as a string, even the numbers. (You can tell because 9 is surrounded by quotation marks, as in "9" and not simply 9.) This could cause unexpected behavior later, if you try to reference your data as a numeric value but it is still typed as a string.
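
A small sketch of that pitfall, and one common way around it, follows. The two sample rows are typed by hand here to stand in for what the loader would return; converting with the unary + operator is just one option (Number() or parseFloat() would also work).

```javascript
// Hand-typed stand-in for two rows of loaded CSV data
var data = [
    { Food: "Apples", Deliciousness: "9" },
    { Food: "Vegemite", Deliciousness: "0.2" }
];

// With a string, + means concatenation rather than addition
var surprising = data[0].Deliciousness + 1;

// Convert each value to a number once, up front
data.forEach(function(d) {
    d.Deliciousness = +d.Deliciousness;
});

// Now + performs arithmetic
var sensible = data[0].Deliciousness + 1;
```

Before the conversion, the sum is the string "91"; afterward, it is the number 10.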

Verifying your data is a great use of the csv() callback function, but typically this is where you’d call other functions that construct the visualization, now that the data is available, as in:

var dataset;  //Declare global var

d3.csv("food.csv", function(data) {

    //Hand CSV data off to global var,
    //so it's accessible later.
    dataset = data;

    //Call some other functions that
    //generate your visualization, e.g.:
    generateVisualization();
    makeAwesomeCharts();
    makeEvenAwesomerCharts();
    thankAwardsCommittee();

});
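
One consequence of this pattern is worth seeing in action: code placed after the loading call runs before the callback does, so the global variable is still empty at that point. The sketch below imitates that timing with a made-up fakeCsv() function (hypothetical, built on setTimeout) so it can run without a web server or D3.

```javascript
var dataset;   // Declare global var, as above
var log = [];

// Hypothetical stand-in for a loader: it returns right away
// and invokes the callback later, once the "file" is ready.
function fakeCsv(path, callback) {
    setTimeout(function() {
        callback([ { Food: "Apples", Deliciousness: "9" } ]);
    }, 0);
}

fakeCsv("food.csv", function(data) {
    dataset = data;
    log.push("callback: " + dataset.length + " row(s) loaded");
});

// This line runs first, before the callback has fired
log.push("after the call: dataset is " + typeof dataset);
```

This is why anything that depends on the data belongs inside the callback, or in functions called from it.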

One more tip: if you have tab-separated data in a TSV file, try the d3.tsv() method, which otherwise behaves exactly like d3.csv().

The data is ready to go. As a reminder, we are working with this simple array:

var dataset = [ 5, 10, 15, 20, 25 ];

Now you need to decide what to select. That is, what elements will your data be associated with? Again, let’s keep it super simple and say that we want to make a new paragraph for each value in the dataset. So you might imagine something like this would be helpful:

d3.select("body").selectAll("p")

and you’d be right, but there’s a catch: the paragraphs we want to select don’t exist yet. And this gets at one of the most common points of confusion with D3: how can we select elements that don’t yet exist? Bear with me, as the answer might require bending your mind a bit.

The answer lies with enter(), a truly magical method. See this code, which I’ll explain:

d3.select("body").selectAll("p")
    .data(dataset)
    .enter()
    .append("p")
    .text("New paragraph!");

View the example code 04_creating_paragraphs.html and you should see five new paragraphs, each with the same content, as shown in Figure 5-5.

Here’s what’s happening:
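
The chain can be read one step at a time. As a rough mental model, the plain-JavaScript sketch below imitates each step with ordinary arrays and objects. It is not D3's actual implementation, and the variable names (existing, placeholders, paragraphs) are made up for illustration; the comments show which method of the chain each part stands in for.

```javascript
var dataset = [ 5, 10, 15, 20, 25 ];

// d3.select("body").selectAll("p")
// The page has no paragraphs yet, so this selection is empty.
var existing = [];

// .data(dataset)
// Values are paired with elements by position; here, no value finds a partner.
// .enter()
// Each unpaired value gets a placeholder that carries it.
var placeholders = dataset.slice(existing.length).map(function(d) {
    return { __data__: d };
});

// .append("p") and .text("New paragraph!")
// Every placeholder gets a real element that keeps the bound value,
// and each element receives the same text.
var paragraphs = placeholders.map(function(placeholder) {
    return {
        tagName: "p",
        __data__: placeholder.__data__,
        textContent: "New paragraph!"
    };
});
```

Five data values and zero existing paragraphs lead to five placeholders, and therefore five new paragraphs.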

All right! Our data has been read, parsed, and bound to new p elements that we created in the DOM. Don’t believe me? Take another look at 04_creating_paragraphs.html and whip out your web inspector, shown in Figure 5-6.

Okay, I see five paragraphs, but where’s the data? Switch to the JavaScript console, type in the following code, and press Enter. The results are shown in Figure 5-7:

console.log(d3.selectAll("p"))

An array! Or, really, an array containing another array. Click the gray disclosure triangle to reveal its contents, shown in Figure 5-8.

You’ll notice the five ps, numbered 0 through 4. Click the disclosure triangle next to the first one (number zero), which results in the view shown in Figure 5-9.

See it? Do you see it? I can barely contain myself. There it is (Figure 5-10).

Our first data value, the number 5, is showing up under the first paragraph’s __data__ property. Click into the other paragraph elements, and you’ll see they also contain __data__ values: 10, 15, 20, and 25, just as we specified.

You see, when D3 binds data to an element, that data doesn’t appear in the DOM’s markup, but it does exist in memory as a __data__ property of that element. And the console is where you can go to confirm whether or not your data was bound as expected.
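
If clicking through disclosure triangles gets tedious, a few lines of JavaScript can gather the bound values in one go. The helper below, boundValues, is a made-up name for illustration; it simply reads the __data__ property from each item in a list. The stand-in objects let it be tried anywhere, and the closing comment shows how it might be used on a real page.

```javascript
// Hypothetical helper: collect the value bound to each element in a list
function boundValues(elements) {
    return Array.prototype.map.call(elements, function(element) {
        return element.__data__;
    });
}

// Stand-in objects so the helper can be tried outside a browser
var fakeParagraphs = [ { __data__: 5 }, { __data__: 10 }, { __data__: 15 } ];
var values = boundValues(fakeParagraphs);

// In the browser console, on a page with bound paragraphs, the same
// helper could be given real elements:
//   boundValues(document.querySelectorAll("p"))
```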

The data is ready. Let’s do something with it.

You might be wondering why you have to write out function(d) { … } instead of just d on its own. For example, this won’t work:

.text("I can count up to " + d);

In this context, without wrapping d in an anonymous function, d has no value. Think of d as a lonely little placeholder value that just needs a warm, containing hug from a kind, caring function’s parentheses. (Extending this metaphor further, yes, it is creepy that the hug is being given by an anonymous function, but that only confuses matters.)

Here is d being held gently and appropriately by a function:

.text(function(d) {  // <-- Note tender embrace at left
    return "I can count up to " + d;
});

The reason for this syntax is that text(), attr(), and many other D3 methods can take a function as an argument. For example, text() can take either simply a static string of text as an argument:

.text("someString")

or the result of a function:

.text(someFunction())  // Presumably, someFunction() would return a string

or an anonymous function itself can be the argument, such as when you write:

.text(function(d) {
    return d;
})

Here, you are defining an anonymous function. If D3 sees a function there, it will call that function, while handing off the current datum d as the function’s argument. Here, I’ve named the argument d just by convention. You could call it datum or info or whatever you like. All D3 is looking for is any argument name, into which it can pass the current datum. Throughout this book, we’ll use d because it is concise and familiar from many of the other D3 examples found online.

In any case, without that function in place, D3 couldn’t relay the current data value. Without an anonymous function and its argument there to receive the value of d, D3 could get confused and even start crying. (D3 is more emotional than you’d expect.)

At first, this might seem silly and like a lot of extra work to just get at d, but the value of this approach will become clear as we work on more complex pieces.
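
To see why the function wrapper matters, it can help to imagine what a method like text() does with its argument. The sketch below uses a made-up helper, resolveText (hypothetical, and far simpler than anything inside D3): a static value is reused for every element, whereas a function is called once per element and handed that element's datum.

```javascript
// Hypothetical helper imitating how a method might treat its argument
function resolveText(data, value) {
    return data.map(function(d) {
        if (typeof value === "function") {
            return value(d);   // called once per element, with its datum
        }
        return value;          // a static value is used as-is every time
    });
}

var dataset = [ 5, 10, 15, 20, 25 ];

// A static string: every element gets the same text
var staticText = resolveText(dataset, "New paragraph!");

// An anonymous function: each element gets text built from its own datum
var dynamicText = resolveText(dataset, function(d) {
    return "I can count up to " + d;
});
```

With the string, all five results are identical; with the function, each result reflects its own value, starting with "I can count up to 5".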