This chapter sets the context for the rest of this book with an introductory discussion on data visualization and visual data storytelling. It explores how these two concepts are similar and different and how both practices have been transformed in the digital era by new technologies and bigger, more diverse, and more dynamic data. Lastly, the chapter explores the value of visual data storytelling for data communication and establishes why data storytelling is well suited to bridging the broad gap between business and IT.
A data revolution is happening across the globe. From academics to politics and everywhere in between, the world’s stories are being told through their data points. Although using visualization to tell stories about data isn’t particularly new (in fact, as you’ll soon discover, we’ve been doing it for quite some time), we are now telling them in more influential and impactful ways than ever before.
Today, the resurgence in the power of data visualization—alongside a virtual gold rush of bigger, more diverse, and more dynamic data—is providing new tools and innovative techniques to help us transform raw data into compelling visual data narratives. Propelled by this newfound horsepower in data visualization, we are recreating the entire analytic process. We’re also making it increasingly more visual—from how we explore data to discover new insights all the way to how we curate dashboards, storyboards, and interactive visualizations to share the fruits of our labor. We are always looking for new ways to show off the messages hidden within our data, and we’re getting pretty good at it, too. Charts and graphs created five years ago in Excel do not compare to the incredible visuals we are now producing with best-of-breed tools like Tableau, or scripting with dynamic JavaScript libraries like D3.js (see Figure 1.1).
Our newest breed of data visualizations are moving beyond the classic bar, line, and pie charts of the past, and pushing beyond the boundaries of traditional information displays to powerful new territories of graphic representation. With determination and a healthy spirit of curiosity and adventure, we are visually representing our data on everything from massive, mural-sized visualizations like the Affinity Map,1 a 250-square meter visualization produced by the Swiss Federal Institute of Technology in Lausanne, to interactive visualizations like Trendalyzer,2 a statistical animation visualization developed by the late Hans Rosling’s Gapminder Foundation, to streaming visualizations that bring data to life with real-time movement, to fluid, customizable dashboards that toggle between form factors from the desktop to the smartphone with pixel-perfect rendering. If Gene Roddenberry, creator of the science fiction series Star Trek, had scripted today’s visual analytics movement, he might have said we are boldly going where no viz (visualization) has gone before—and he’d be right.
However, all of these visualizations, from the most dynamic to the most static, need more than just data to make the leap from information representation to resonation. They need a story—something to show, or, more aptly, to “tell” visually—and finding this tale isn’t always obvious when digging through a data set. It takes exploration, curiosity, and a shift in mindset to move from creating a data visualization to scripting a data narrative. They are similar, but not identical, skill sets.
Scripting a data narrative might sound like a vague or even an overwhelming process. After all, many of us would consider ourselves analysts first, “data people” before storytellers. We enjoy numbers and analytics and computation more so than the artsy craft of writing stories. Nevertheless, the two are fundamentally intertwined: We must know our data, its context, and the results of analytics in order to extrapolate these into meaning for an audience who doesn’t. That’s all a story is, really—one person sharing something new and unknown with another in a way that is easily understandable and relatable. The good news? There’s no single way to do it. We can use several proven narrative frameworks to design a data storyboard, and numerous quintessential examples exist where a data storyteller has exercised a generous amount of creative liberty and done something entirely new. After all, like any kind of story, data stories require a certain amount of creativity—and although tools and technology can do much with our data for us, creativity is a uniquely human contribution to any narrative (see Figure 1.2). We’ll take a look at some of these examples as we go forward.
note
Data visualization is the practice of graphically representing data to help people see and understand patterns, insights, and other discoveries hidden inside information. Data storytelling translates seeing into meaning by weaving a narrative around the data to answer questions and support decision making.
Data visualization and data storytelling are not the same thing; however, they are two sides of the same coin. A true data story utilizes data visualizations as a literary endeavor would use illustrations—proof points to support the narrative. However, there’s a little bit of a role reversal here: whereas data visualizations provide the “what” in the story, the narrative itself answers the “why.” As such, the two work together in tandem to translate raw data into something meaningful for its audience. So, to be a proper data storyteller you need to know how to do both: curate effective data visualizations and frame a storyboard around them. This starts with learning how to visualize data, and more importantly, how to do so in the best way for communication rather than purely analytical purposes. As discussed later in this book, visualizations for analysis versus presentation are not always the same thing in data storytelling.
One of the most common clichés in the viz space is that “data visualizations are only as effective as the insights they reveal.” In this context, effectiveness is a function of careful planning. Any meaningful visualization is a two-pronged one. It requires analytical perfection and correct rendering of statistical information, as well as a well-orchestrated balance of visual design cues (color, shape, size, and so on) to encode that data with meaning. The two are not mutually exclusive.
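To make the idea of encoding concrete, here is a minimal sketch in Python (not a Tableau feature) that maps each record's fields onto two visual channels, size and color. The sample records, the field names, the `Mark` type, and the `encode_marks` helper are all hypothetical and chosen only for illustration; a real tool would offer many more channels and scales.

```python
from dataclasses import dataclass

# Hypothetical sample records; the regions and numbers are invented for illustration.
SALES = [
    {"region": "East", "revenue": 120.0, "growth": 0.08},
    {"region": "West", "revenue": 300.0, "growth": -0.03},
    {"region": "South", "revenue": 210.0, "growth": 0.00},
]


@dataclass
class Mark:
    label: str
    size: float  # marker size, driven by revenue
    color: str   # hue, driven by the sign of growth


def encode_marks(records, min_size=20.0, max_size=200.0):
    """Map each record onto two visual channels: size and color."""
    revenues = [r["revenue"] for r in records]
    lo, hi = min(revenues), max(revenues)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    marks = []
    for r in records:
        # Linear scale: smallest revenue -> min_size, largest -> max_size.
        size = min_size + (r["revenue"] - lo) / span * (max_size - min_size)
        # Diverging color: gains, losses, and flat results each get their own hue.
        if r["growth"] > 0:
            color = "blue"
        elif r["growth"] < 0:
            color = "orange"
        else:
            color = "gray"
        marks.append(Mark(r["region"], size, color))
    return marks


marks = encode_marks(SALES)
for m in marks:
    print(f"{m.label}: size={m.size:.0f}, color={m.color}")
```

The statistical side (choosing a scale that does not distort the values) and the design side (choosing hues that read clearly) both appear in the same few lines, which is the balance the paragraph above describes.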
Data visualization is a place where science meets art, although the jury is still out on whether the practice is more a scientific endeavor or an artistic one. Although experts agree that a compelling visual requires both, it tends to be something of a chicken-and-egg scenario. We haven’t quite come to a consensus as to whether science comes before design or we design for the science, and the answer changes depending on whom you ask, who is creating the visualization, and who its audience is. That said, whichever side of the argument you land on, the result is the same. We need statistical understanding of the data, its context, and how to measure it; otherwise, we risk faulty analysis and skewed decision making that eventually lead to poor outcomes. Likewise, our highly visual cognitive system demands a way to encode numbers with meaning, and so we rely on colors and shapes to help automate these processes for us. An effective visual must strike the right balance of both to accurately and astutely deliver on its goal: intuitive insight at a glance.
This might sound like an easy task, but it’s not. Learning to properly construct correct and effective data visualization isn’t something you can accomplish overnight. It takes as much time to master this craft as it does any other, as well as a certain dedication to patience, practice, and keeping abreast of changes in software. In addition, like so many other things in data science, data visualization and storytelling tend to evolve over time, so an inherent need exists for continuous learning and adaptation, too. The lessons in this book will guide you as you begin your first adventures in data storytelling using data visualizations in Tableau.
With all the current focus on data visualization as the best (and sometimes only) way to see and understand today’s biggest and most diverse data, it’s easy to think of the practice as a relatively new way of representing data and other statistical information. In reality, the practice of graphing information, and of communicating visually, reaches back all the way to some of our earliest prehistoric cave drawings, where we charted minutiae of early human life, through initial mapmaking, and into more modern advances in graphic design and statistical graphics. Along the way, the practice has been aided by advancements in visual design and cognitive science as well as in technology and business intelligence, and together these have led to our current state of data visualization.
In today’s data-driven business environment, an emerging new approach to storytelling attempts to combine data with graphics and tell the world’s stories through the power of information visualization. For as far back as we can trace the roots of data visualization, storytelling stretches further. Storytelling has been dubbed the world’s oldest profession. Likewise, it is now and has always been an integral part of the human experience. There’s even evidence of the cognitive effects of storytelling in our neurology. It’s a central way that we learn, remember, and communicate information—which has important implications when the goal of a visualization or visual data story is to prepare business decision makers to leave a data presentation with a story in their head that helps them both remember your message and take action on it. We’ll discuss the cognitive and anthropological effects of stories more in later chapters.
Graphing stories is the intersection of data visualization and storytelling. American author Kurt Vonnegut is quoted as having famously said, “There is no reason that the simple shapes of stories can’t be fed into a computer—they have beautiful shapes.” Likewise, we could restate this to say that data stories provide the shapes to communicate information in ways that facts and figures alone can’t. Just as much as today’s approach to data visualization has changed the way we see and understand our data, data storytelling has equally—if not more—been the catalyst that has radically changed the way we talk about our data.
Learning to present insights and deliver the results of analysis in visual form involves working with data, employing analytical approaches, choosing the most appropriate visualization techniques, applying visual design principles, and structuring a compelling data narrative. Also, although crafting an effective and compelling visual data story is, like traditional storytelling, a uniquely human experience, tools and software exist that can help. Referring back to Vonnegut’s quote, stories have shapes. In visual data storytelling, we find the shape of the story through exploration of the data, conduct analysis to discover the sequence of the data points, and use annotations to layer knowledge to tell a story.
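The three steps named above (find the shape, discover the sequence, layer an annotation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monthly values are invented, and `largest_change` and `annotate` are hypothetical helpers, not part of Tableau or any library.

```python
# Hypothetical monthly values, deliberately out of order; invented for illustration.
MONTHLY = [("Mar", 130), ("Jan", 100), ("Feb", 104), ("Apr", 128), ("May", 90)]
MONTH_ORDER = ["Jan", "Feb", "Mar", "Apr", "May"]


def largest_change(points):
    """Return (start_label, end_label, delta) for the biggest step between neighbors."""
    best = None
    for (a_label, a_val), (b_label, b_val) in zip(points, points[1:]):
        delta = b_val - a_val
        if best is None or abs(delta) > abs(best[2]):
            best = (a_label, b_label, delta)
    return best


def annotate(points):
    """Build a one-line annotation that names the most notable step in the series."""
    start, end, delta = largest_change(points)
    direction = "rose" if delta > 0 else "fell"
    return f"Value {direction} by {abs(delta)} between {start} and {end}."


# Step 1: put the points in sequence so the shape of the series is visible.
ordered = sorted(MONTHLY, key=lambda p: MONTH_ORDER.index(p[0]))
# Steps 2 and 3: analyze the sequence for its most notable step, then annotate it.
note = annotate(ordered)
print(note)
```

The annotation is where the human contribution enters: the code can locate the largest step, but deciding that this step is the point of the story, and explaining why it happened, remains the storyteller's job.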
To visualize the data storytelling process, consider the graphic shown in Figure 1.3. This is the process we’ll follow throughout this book. It’s worthwhile to note that this process isn’t always as straightforward or linear as it might initially appear. In reality, this process is, like all discovery processes, iterative. For example, as a result of analysis we might need to revisit data wrangling (for example, if we find that we are missing a required attribute that we need for our proposed model). Further, as we find insights we might need to revisit the analysis or adjust the data. Finally, as the story unfolds we might need to revisit previous steps to support claims we did not originally plan to make.
Before we move into building skills and competencies in visual data storytelling, let’s take a moment to pause and think about why we are doing this. We’ve danced around this question already, and while we could make a convincing argument that mastering new tools and ways to interact with data is an inevitable result of the big data era, that would only be half of the reason. Data will continue to grow, technologies will continue to adapt and innovate, and analytical approaches will chart new territory in how we work with and try to uncover meaning and value hidden within our data. The real value in becoming a data storyteller is gaining the ability to share, and to communicate about, our data.
So far, I’ve put data visualization first and communication second, because that is the order you follow when you structure your visual analysis—you have to explore and build something before you can tell a story about it. However, we shouldn’t underestimate the communication that happens before you ever touch your data. Communication skills are a prerequisite listed on every job description, but just how important are these skills in data analysis and visual data storytelling—and why?
In 2012, academic researchers with the AIS Special Interest Group on Decision Support, Knowledge, and Data Management Systems (SIG DSS) and Teradata University Network (TUN) formed the Business Intelligence Congress 3 to survey and assess the state of business intelligence and analytics. They surveyed more than 400 recruiters from technical companies, asking what skills and competencies they looked for in new analytic hires.
Their number one answer? Communication skills3 (see Figure 1.4).
The BI Congress survey isn’t the only piece of data to pinpoint the importance of communication skills in analysis. A second recent piece of research comes from data research and advisory firm Gartner.4 It conducted a research study to determine why big data projects fail: specifically, what percentage of big data projects fail due to organizational problems, like communication, and what percentage fail due to technical problems, like programming or hardware. Only about 1% of companies responded that technical issues alone were the point of failure of their data analytics projects. The other 99% of companies said that at least half of the reasons their data analytics projects failed were due to poor organizational skills, namely communication, and not technical skills.
Of course, there isn’t a perfect correlation between organizational skills and communication, but the reality is that one of the most important organizational skills is the ability to communicate, hence its inclusion in every business academic program and on every aforementioned job posting. Although communication might live on the softer side of the skill set, it is nonetheless critical for success, particularly when helping others to see the story within data. However, sharing a story isn’t enough. Anyone can do that. If we can’t communicate, we can’t inspire change or action. Real communication is a two-way dialogue between a sender and a receiver, or receivers. It prompts an action, supports a decision, or generates understanding.
When we discuss the importance of communication skills within the context of data storytelling, we are looking at it from an audience-first perspective. This means putting the audience’s needs ahead of the storyteller’s. Successful communication hinges on the ability to influence the people who matter the most: the stakeholders in your analysis, be they executives, teachers, the general public, or anyone else. Ultimately, how data (visual or otherwise) is interpreted is fundamentally influenced by context. Context is a multifaceted thing. It is driven in part by your audience, but just as important to your story is the part of the context driven by you: your assumptions, your goals, and what you already know.
Understanding the importance of context is the focus of Chapter 4. For now, to answer the question I posed earlier—how important are communication skills in visual data storytelling?—they are paramount.
A NOTE ABOUT “DESSERT CHARTS”
After more than 200 years of use (the first being credited to William Playfair’s Statistical Breviary of 1801), what have come to be called “dessert charts” (those circular visualizations, including pie and donut charts, that “slice” data into wedges reminiscent of our favorite sweets) have had a bit of a fall from grace. Although they are still widely in use, many visualization experts and educators preach against the use of these types of charts, myself included. However, it should be noted that the dislike of pie charts is not merely an opinion: empirical research provides the basis for why these types of charts just don’t work analytically. That said, there are ways to use them productively, particularly as mechanisms for data storytelling, if a few words of caution are followed. We’ll take a deeper look at how to best curate “dessert charts” for visual data storytelling in Chapter 7, “Preparing Data for Storytelling.”
DATA SCIENCE EDUCATION GETS ON THE MAP
By now we are all in agreement: The business of data is changing. Business users are more empowered to work with data; IT is shifting its focus to be less about control and more about enablement. New data science job descriptions—like data scientist and visual data artist—are springing up as companies look for the right people with the right skill sets to squeeze more value from their data. Data itself is getting bigger, hardware more economical, and analytical software more “self-service.” We’ve embraced the paradigm shift from traditional BI to iterative data discovery. We’re depending on data visualization and data storytelling to see, understand, and share data in ways like never before. It’s the visual imperative in action.
As you might expect, these changes have a significant effect on how people work in data science, be they executives, data scientists, researchers, analysts, or even data storytellers. There are a lot of skills available and a very big toolbox to choose tools from, and we are all learning together. Adding to that, over the past few years we’ve been reminded that data workers are in high demand, and we’ve seen firsthand how limited the current supply is. There’s the familiar U.S. Bureau of Labor Statistics projection of 1.4 million computer science jobs by 2020. Another familiar statistic, from the McKinsey Global Institute, estimates that there will be 140,000 to 180,000 unfilled data scientist positions in the market in 2018. That’s a lot of empty seats to fill. So, we are faced with two challenges: 1) we need more capable data people and 2) we need them with deeper, more dynamic skill sets. This means we have to start thinking about cultivating talent rather than recruiting it, and training an incoming workforce isn’t something that an industry can do alone, no matter how many specialized software training programs, MOOCs, conferences, and excellent publications we produce. To enact lasting change and build a sustainable funnel of competent data workers suited to the new era of the data industry, we need to move further down the pipeline to that place where we all discovered we wanted to be data people in the first place: the classroom.
That’s exactly what we’re doing. The academic community has been tasked with creating new educational programs that can develop the skills needed by new data science professionals. These university information science programs (called business analytics, data science, professional business science, or any of the dozen or so other terms used by academia) are only just beginning to be sorted out. However, they are growing rapidly across the country, and so far enrollment is promising.
Different universities are taking different approaches to structuring a new kind of data science education. Some are developing entirely new pedagogy focused on the fluid and dynamic fields of data science. Others are reshaping existing curricula by unifying across academic silos to integrate disciplines of study, particularly among business and IT domains. Still others are forming academic alliance programs to give students learning experiences with contemporary industry tools and creating projects that expose students to analytical problems within real-world business contexts.
Nevertheless, all universities are listening to campus recruiters, who are clearly saying that we need people with more data skills and knowledge, and they’re working hard to fill that gap. More importantly, there are a few things that these programs have in common. They’re focused on real-world applications of data problems. They’re doing their best to keep pace with fluid changes in technology adoption, new programming languages, and on-the-market software packages. They’re also putting a premium on data visualization and data storytelling. Vendors like Tableau with its Tableau for Teaching program are helping, too.
So just how big is data science education? Over the past couple of years, the number of new business analytics program offerings has significantly increased. In 2010 there were a total of 131 confirmed, full-time BI/BA university degree programs, including 47 undergraduate-level programs. Today, that number has nearly tripled and continues to rise with new and improved programs at the undergraduate, graduate, and certificate levels—both on and off campus—springing up at accredited institutions across the country (see Figure 1.5). So, while we might not have access to all this new data talent yet, if academia has anything to say about it, help is on the way.
note
This dataset is regularly updated and maintained by Ryan Swanstrom, and is available via GitHub at https://github.com/ryanswanstrom/awesome-datascience-colleges.
This chapter focused on providing an introductory discussion on data visualization and visual data storytelling by taking a look at how these concepts are similar and different, and how both have been transformed in the digital era. The next chapter takes a closer look at the power of visual data stories to help us understand what makes them so powerful and important in today’s data deluge.
_____________
1. https://actu.epfl.ch/news/the-world-s-largest-data-visualization/
2. https://www.gapminder.org/tag/trendalyzer/
3. Wixom, Barbara; Ariyachandra, Thilini; Douglas, David; Goul, Michael; Gupta, Babita; Iyer, Lakshmi; Kulkarni, Uday; Mooney, John G.; Phillips-Wren, Gloria; and Turetken, Ozgur (2014). “The Current State of Business Intelligence in Academia: The Arrival of Big Data,” Communications of the Association for Information Systems: Vol. 34, Article 1.