Chapter 7
Best Practice #6
Focus on Descriptive Analytics for Data Literacy
“Over 85% of Data Analytics in the Industry is Descriptive Analytics.”
Gartner
As discussed in the preceding sections, businesses today generate and capture enormous amounts of data, yet many companies struggle to become data-driven enterprises. Research published in Harvard Business Review found that companies are failing in their efforts to become data-driven: the percentage of firms identifying themselves as data-driven has declined in each of the past three years, from 37.1% in 2017 to 32.4% in 2018 to 31.0% in 2019 [Bean and Davenport, 2019]. Some alarming results from the research are:
One of the main reasons preventing companies from becoming data-driven is the lack of data literacy, that is, the ability to understand and communicate data and insights. In fact, according to Gartner, data literacy is the second key reason preventing companies from becoming data-driven, and Gartner predicted that by 2020, 50% of organizations will lack sufficient AI and data literacy skills to achieve business value [Gartner, 2019]. The key findings of Gartner’s CDO research are below.
So, how can data literacy be inculcated in the business enterprise? While there are many strategies to drive data literacy, one key strategy is leveraging the execution of descriptive analytics. But what exactly is descriptive analytics in the context of data literacy? Descriptive analytics is the interpretation of historical data to better understand past business performance. In simple words, descriptive analytics answers the question, “What happened?” using historical business data. Examples include: What were our sales last quarter? Who are the top five vendors based on dollar spend? Which product had the most defects? Questions like these form the foundation of the entire analytics strategy, as these basic questions and the associated key performance indicators (KPIs) form the basis of enterprise business performance.
Descriptive analytics is the most common of the three types of analytics in business, the other two being predictive and prescriptive analytics. According to business analytics experts Piyanka Jain and Puneet Sharma, 80% of the analytics reports used in enterprises are descriptive in nature [Jain and Sharma, 2014]. According to Gartner, just 13% of organizations use predictive reports and 3% use prescriptive reports; only 16% of the reports involve advanced analytics, a combination of predictive and prescriptive analytics [Williamson, 2015].
A holistic analytics solution should address the insight needs of analysts, managers, and executives.
Descriptive analytics is technically realized with dashboards and reports using the MAD (Monitor-Analyze-Detail) framework, which is explained below.
Within the MAD framework, the monitor function serves senior management, the analyze function mainly serves managers, and the detail function serves analysts. The MAD insight-consumption framework is shown in Figure 7.2.
The table below contains the mapping of the MAD framework to the user type and the three types of analytics.
Why is this a best practice?
So, how can descriptive analytics help with data literacy in the business? As discussed earlier, data literacy means enabling business stakeholders to work with both data and insights for better business performance. If organizations can get reliable, quick, and easy access to both data and insights, and if they practice and work in that insights-based environment long enough, then data literacy can potentially be achieved. According to an article in HBR, it takes time and deliberate practice to become an expert [Ericsson et al., 2007]. According to the American educator Edgar Dale, who developed the Cone of Experience (also known as the Learning Pyramid), the more experiential the learning, the greater the individual’s retention of what they are learning. Malcolm Gladwell, in his bestselling book Outliers, argued that one needs 10,000 hours of practice to gain expertise. The general premise is that systematic and constant exposure to data and insights will enhance data literacy in the organization, and descriptive analytics can offer that promise.
But how does descriptive analytics offer that promise? While data and insights can be provided by any of the three types of analytics, descriptive, predictive, and prescriptive, why is descriptive analytics preferred over the others? The reason is that descriptive analytics facilitates reliable, quick, and easy access to data and insights. There are five main reasons for this.
Research by IBM shows that basic reporting and dashboarding capabilities, that is, descriptive analytics capabilities, can improve the return on investment (ROI) by 188%, and that improving data quality can further boost the ROI to as high as 1209% [IBM, 2017]. For all these reasons, descriptive analytics can be a key enabler for building data literacy in the company.
Realizing the best practice
So, how can a business enterprise use descriptive analytics for data literacy? To implement descriptive analytics and build data literacy, a business enterprise must harness three key capabilities:
Use data to build the data-driven culture
The foundation for data literacy is a data-driven culture. Organizations need to promote a data-first culture that encourages data-driven decision making (3DM). But how can businesses realize the data-driven culture on the ground? One technique to build the data-driven culture is to use the data to do the talking, and this can be done by building quality data sets. In this regard, below are the three key steps for building a data-driven culture in the enterprise.
Profile the data and fix data quality issues
In chapter 4, we discussed data profiling in detail. Data profiling is the process of examining the available data and collecting statistics about it. Assuming the business data is approximately normally distributed, these profiling statistics should cover not only database-related parameters but also metrics such as the standard deviation (for data accuracy), standard error (for data precision), range (for variation in the data), averages (mean, median, and mode), and z-scores (for outliers in the data set).
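As an illustration, the profiling metrics above can be sketched in a few lines of Python; the sample values and the z-score threshold are hypothetical choices, not prescribed settings:

```python
import statistics

def profile_column(values, z_threshold=2.0):
    """Compute basic profiling statistics for a numeric column."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)         # spread, a proxy for data accuracy
    std_error = stdev / len(values) ** 0.5   # precision of the estimated mean
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": stdev,
        "std_error": std_error,
        "range": max(values) - min(values),
        # values whose z-score exceeds the threshold are flagged as outliers
        "outliers": [v for v in values if abs((v - mean) / stdev) > z_threshold],
    }

# Hypothetical purchase-order amounts; 950 stands out from the rest.
order_amounts = [120, 125, 118, 122, 121, 119, 950]
print(profile_column(order_amounts))
```

Running the profile on such a sample immediately surfaces the suspicious value, which is exactly the kind of quick feedback loop that builds trust in the data.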
In addition, profiling might surface data quality issues, specifically along the 12 data quality dimensions. Here are three key tactics to fix data quality, especially in the “after-the-event” situation.
Once the data quality is improved, ensure that the right stakeholders have access to the right data. This can be done by classifying data according to the compliance view, which has three main categories: public, confidential, and restricted data. If the data is sensitive, it must be protected; if not, access to the data can be opened.
Open the access to non-sensitive data
Once the enterprise data is classified based on the compliance view, provide access to non-sensitive data to everyone in the company. When more data is at the users’ disposal, the chances of building a data-driven culture and data literacy are enhanced. Public data in the business is usually non-sensitive. In addition, reference data and master data are typically non-sensitive compared to transactional data, which is usually sensitive due to its contextual nature. For example, while the dollar value in a purchase order (transactional data) is sensitive, the vendor and item data in the purchase order (master data) are not very sensitive. The table below shows the categorization of non-sensitive and sensitive data.
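To make the compliance view concrete, here is a minimal sketch in Python; the attribute names, category assignments, and access policies are illustrative assumptions, not a prescribed standard (real policies come from the data governance team):

```python
# Illustrative policy per compliance category (assumed, not standardized).
SENSITIVITY = {
    "public": "open access",
    "confidential": "role-based access",
    "restricted": "need-to-know access",
}

# Hypothetical classification of purchase-order attributes.
ATTRIBUTE_CLASS = {
    "vendor_name": "public",         # master data, non-sensitive
    "item_code": "public",           # reference/master data
    "order_amount": "confidential",  # transactional, contextual
    "bank_account": "restricted",    # sensitive payment detail
}

def access_policy(attribute):
    """Return the access policy for an attribute, defaulting to restricted."""
    category = ATTRIBUTE_CLASS.get(attribute, "restricted")
    return SENSITIVITY[category]

print(access_policy("vendor_name"))   # non-sensitive master data is opened
print(access_policy("order_amount"))  # sensitive transactional data is protected
```

Note the defensive default: an unclassified attribute is treated as restricted, which errs on the side of protection.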
Empower users to use data
One key strategy to empower business users is to leverage Self-Service Analytics (SSA). With SSA, business users run queries and generate reports and dashboards on their own, without relying much on IT developers. SSA promises data democratization and faster data-based decisions. However, SSA does not eliminate IT-business collaboration, as some amount of training, governance, and change management from IT is still required. The two main issues with SSA are data security and the licensing of analytics tools.
Building data pipelines
A data pipeline extracts discrete and/or time-series data from multiple data sources and loads it into a data warehouse or data lake for analytics. The extraction is often implemented with SQL stored procedures, which enable the pipeline to quickly and efficiently extract data from the transactional source systems, transform it, and ingest it into the data warehouse or data lake for deriving insights, thereby improving data literacy. Fundamentally, data pipelines bring reliability to the data integration process, improving trust in the way data is sourced, transformed, and ingested.
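As a minimal extract-transform-load sketch, the following uses SQLite in place of the real source system and warehouse; the table names, columns, and figures are illustrative:

```python
import sqlite3

# Stand-in for a transactional source system (e.g., an ERP order table).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 1999), (2, 2550), (3, 1200)])

# Stand-in for the analytics warehouse fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, amount_usd REAL)")

# Extract from the transactional source...
rows = source.execute("SELECT id, amount_cents FROM orders").fetchall()
# ...transform (convert cents to dollars)...
transformed = [(order_id, cents / 100.0) for order_id, cents in rows]
# ...and load into the warehouse for analytics.
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)

total = warehouse.execute("SELECT SUM(amount_usd) FROM fact_orders").fetchone()[0]
print(round(total, 2))
```

A production pipeline adds scheduling, error handling, and incremental loads, but the extract-transform-load shape stays the same.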
The data pipeline architecture has three layers, where each layer feeds into the next until the data reaches its destination, which can be a data warehouse, a data lake, or a data hub.
Implementing reports and dashboards
Once the data is made available, the third capability for improving data literacy in the business is building reports and dashboards, the two pillars of descriptive analytics. Fundamentally, a report is a list of data attributes generated based on defined criteria. Reports can be tabular reports, where the data typically comes from transactional IT systems such as an ERP or CRM system, or BI reports, where the data comes from a data warehouse or data lake. Tabular reports are presented as views in the data visualization layer; a view is the result set of a SQL query on the data. Tabular reports have four key characteristics.
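A tabular report as a SQL view can be sketched as follows, again using SQLite; the purchase-order table and the filter criteria are hypothetical:

```python
import sqlite3

# Stand-in for an ERP-style transactional table.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE purchase_orders (
    po_id INTEGER, vendor TEXT, amount REAL, status TEXT)""")
db.executemany("INSERT INTO purchase_orders VALUES (?, ?, ?, ?)", [
    (1, "Acme", 500.0, "open"),
    (2, "Globex", 1200.0, "closed"),
    (3, "Acme", 300.0, "open"),
])

# A tabular report is simply a view: the result set of a query with criteria.
db.execute("""CREATE VIEW open_po_report AS
    SELECT po_id, vendor, amount FROM purchase_orders
    WHERE status = 'open' ORDER BY amount DESC""")

for row in db.execute("SELECT * FROM open_po_report"):
    print(row)
```

Because the view is defined once, every consumer of the report sees the same criteria applied the same way, which is part of what makes descriptive reporting reliable.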
The second type of report is the BI report. BI reports provide features to sort, filter, group or aggregate, and visualize data across multiple dimensions. In BI systems, the data warehouses are denormalized databases, so the data extraction query does not have to join multiple tables to get the right data. This saves considerable querying time, thereby improving the speed of data rendering. BI reports are presented as cubes; a cube stores data in a multi-dimensional form. Like tabular reports, BI reports are based on historical data and have four key characteristics.
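The multi-dimensional aggregation behind a BI report can be approximated with a GROUP BY over a denormalized fact table; the schema and revenue figures below are illustrative:

```python
import sqlite3

# Denormalized fact table: every row already carries its dimensions,
# so no joins are needed at query time.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sales_fact (
    region TEXT, product TEXT, quarter TEXT, revenue REAL)""")
db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("East", "Widget", "Q1", 100.0),
    ("East", "Widget", "Q2", 150.0),
    ("West", "Widget", "Q1", 200.0),
    ("West", "Gadget", "Q1", 250.0),
])

# Aggregating across two dimensions (region x product) approximates
# one face of a cube.
cube_slice = db.execute("""
    SELECT region, product, SUM(revenue)
    FROM sales_fact GROUP BY region, product
    ORDER BY region, product""").fetchall()
print(cube_slice)
```

A real OLAP cube pre-computes and stores such aggregates for every dimension combination, which is what makes ad-hoc slicing and dicing fast.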
So, when does one go for tabular reports, and when does one go for BI reports? If the requirement is for detailed or granular data, then the reports should be transactional, with the data coming from the transactional systems. The need for transactional reports usually comes from analysts. On the other hand, if the business users need aggregated and multi-dimensional data quickly or in an ad-hoc way, then BI reports are the better choice. This need usually comes from middle and senior management. The image on the facing page shows the comparison between transactional and BI reports.
While reports (transactional and BI) present the data on database attributes, some users, especially managers and senior managers, require specific information to be presented quickly and visually as KPIs. To address the insight needs of these “monitor” users, the best practice is to use dashboards. A dashboard fundamentally presents insights or KPIs in a visual manner. A typical dashboard covers four key types of insights: comparisons, trends, distributions, and relationships, visually presented using charts. The relationship between the four key types of dashboard insights and the 13 key visuals or charts is shown below. The figure below is adapted from the work of Abela, A [Abela, 2009].
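As a sketch, the insight-to-chart mapping can be expressed as a simple lookup; the chart choices below are a simplified, illustrative subset in the spirit of Abela's chart chooser, not the full set of 13 visuals:

```python
# Simplified mapping of the four dashboard insight types to common
# chart choices (illustrative subset, not the complete chooser).
CHART_CHOICES = {
    "comparison": ["bar chart", "column chart"],
    "trend": ["line chart", "area chart"],
    "distribution": ["histogram", "scatter plot"],
    "relationship": ["scatter plot", "bubble chart"],
}

def suggest_charts(insight_type):
    """Suggest chart types for a dashboard insight; empty if unknown."""
    return CHART_CHOICES.get(insight_type.lower(), [])

print(suggest_charts("trend"))
print(suggest_charts("comparison"))
```

Encoding the mapping this way makes the choice explicit and reviewable, rather than leaving chart selection to each dashboard author's taste.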
Conclusion
In the same way literacy has contributed to human progress, data literacy is essential to the progress and growth of the organization in today’s data-centric world. Even more important, understanding data and deriving insights in business operations is no longer a skill reserved for data scientists and IT experts. It is an essential, core skill for every knowledge worker if the company wants to leverage data for improved business performance and become data-driven.
References