Chapter 12
Conclusion
“Great things are not done by impulse, but by a series of small things brought together.”
Vincent van Gogh
Most business enterprises today are data-rich but insight-poor. For example, the oil and gas industry has historically captured data for operations and compliance, and today it is aggressively building capabilities to convert that captured data into insights [Wethe, 2018]. In transforming data into insights, this book offers practical guidance through ten key analytics best practices for successfully delivering analytics initiatives in an organization:
These ten best analytics practices were applied in an analytics program at an OFS (Oil Field Services) company. The first section of this chapter shares how these best practices were applied at this OFS company. Throughout the book, the discussion has focused on delivering good analytics and insights. If successful analytics depends on senior management support, stakeholder alignment, data architecture, quality data, the right algorithms, and change management, what does bad analytics look like? The second section of this chapter is on bad analytics. Lastly, deriving insights relies on statistical models. So, what are the key statistical tools required to derive business insights? In all, this chapter looks at three main topics – a case study, what bad analytics looks like, and the key statistical tools for analytics.
Before we look at these three topics, Figure 12.1 summarizes the different types of business analytics discussed so far.
Case study: Data insight product for Payload Technologies
The ten best analytics practices discussed in the previous chapters were implemented in Payload (PL) Technologies (https://www.payload.com/), a Canadian Oil Field Services (OFS) technology company based in Calgary, Canada. Payload has two flagship cloud SaaS (Software-as-a-Service) products – eTicket and eManifest. These two products, which are used for digitizing oil movement regulatory documents, can be accessed as a web application or as a mobile application.
Payload’s current operating model is a digital network platform. A digital network platform is a technology-enabled business model that creates value by facilitating interactions between two or more interdependent groups. Technically, digital platforms have three distinct features.
Payload’s digital network platform is shown below. It comprises three interdependent groups – the E&P companies, the trucking companies, and the drivers. The E&P companies search for oil reserves and then drill wells to extract oil. Once the oil is extracted, the oil products are transported by the trucking companies from the wells to collection points for further shipment to the refineries. In Payload’s digital network platform, seven E&P companies have contracted over 120 trucking companies, which in turn work with over 1,900 drivers for the oil product movement.
Against this backdrop, Payload (PL) has captured operational and compliance data on oil movements from the E&P companies, trucking companies, and drivers for over five years. This data comprises approximately 425,000 field tickets and manifests, covering the transportation of about 65 million barrels of crude oil (and its derivative products such as emulsion and condensate) over 86 million kilometers, representing over C$3.5 billion in trade for the Canadian oil industry.
Payload wants to monetize this data and offer insights to the E&P and trucking companies as a new data analytics product. The data product will be valuable to these companies because it will give them insights to:
Payload’s business operating model, captured through value stream mapping (VSM), was mapped to create the enterprise data model (EDM). The EDM shown below is based on the conceptual and integrated value chain of four key elements – Projects, Orders, Field Tickets, and Master Tickets – and uses different types of reference data, master data, and transactional data elements.
As the first step in building the new data product, the Payload analytics team identified the stakeholder personas and their key value propositions, essentially tying the stakeholders’ goals to questions and KPIs.
Based on the value proposition of the stakeholders, the analytics team worked on addressing the following key business questions:
The answers, or the data related to these key questions, were captured in the eTicket and eManifest applications, and the data was stored in PostgreSQL, an open-source relational database management system. The data from PostgreSQL was transferred every hour to the Snowflake Cloud Data Platform (CDP), which served as the data warehouse for reporting. Because the data was captured in the eTicket and eManifest applications in a structured format with data integrity rules, the data quality level was high. Hence, the entire population data set was considered for analytics.
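The book does not spell out the mechanics of this hourly transfer, so the following is only a minimal sketch of such a load, assuming a hypothetical field_ticket table and the standard psycopg2 and Snowflake Python connectors:

# Minimal sketch of the hourly PostgreSQL -> Snowflake load (illustrative only).
# The table and column names (field_ticket, loaded_at) are assumptions, not
# Payload's actual schema, and the credentials are placeholders.
import pandas as pd
import psycopg2
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

def load_last_hour():
    # Pull the last hour of field tickets from the operational PostgreSQL database
    with psycopg2.connect("dbname=payload user=etl password=***") as pg_conn:
        df = pd.read_sql(
            "SELECT * FROM field_ticket WHERE loaded_at >= NOW() - INTERVAL '1 hour'",
            pg_conn,
        )

    # Append the increment to the Snowflake data warehouse used for reporting
    sf_conn = snowflake.connector.connect(
        account="payload_account", user="etl_user", password="***",
        warehouse="REPORTING_WH", database="PL_DWH", schema="STAGING",
    )
    try:
        write_pandas(sf_conn, df, "FIELD_TICKET", auto_create_table=True)
    finally:
        sf_conn.close()

if __name__ == "__main__":
    load_last_hour()

In practice, a job like this would be triggered by a scheduler or orchestrator; the point is simply that the operational database and the reporting warehouse stay in sync on an hourly cadence.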
After extensive discussion with Payload’s senior management and key subject matter experts (SMEs), the data product strategy was to build the following five analytics offerings. The five data products branded as PL Insights are:
The roadmap for deploying these data products, PL Insights, was mapped to the three types of data products – data enhancing, data exchanging, and data experiencing products – as shown in Figure 12.4. The first wave of data product development focused on the two data experiencing products – Basic Data Products for the Trucking Companies and Basic Data Products for the E&P Companies. The priority was to first improve data literacy and adoption in the user ecosystem, or digital network, with descriptive analytics (reports and dashboards) before embarking on advanced analytics solutions, that is, predictive and prescriptive analytics. Hence, the development of the data exchanging and data enhancing products was deferred until market success was realized with the two basic data analytics products.
Basic data analytics products include dashboards and reports. While the reports can be transactional or BI reports, the analytics team blended the key features of transactional and BI reports by leveraging the technical capabilities of the Snowflake data warehouse (DWH) platform. Snowflake stores data in a columnar format and is designed for fast analytic queries. These queries are saved as views and embedded into the eTicket and eManifest products using Sigma Computing, a cloud data analysis and visualization software.
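As an illustration of such a saved view, here is a minimal sketch in Python; the table and column names (FIELD_TICKET, CARRIER_ID, VOLUME_M3, and so on) are hypothetical, and Sigma Computing would then be pointed at a view like this to render the report:

# Minimal sketch of a descriptive-analytics view in the Snowflake DWH (illustrative).
# All object names are assumptions; the view aggregates ticket volumes per carrier per month.
import snowflake.connector

VIEW_SQL = """
CREATE OR REPLACE VIEW ANALYTICS.MONTHLY_CARRIER_VOLUME AS
SELECT
    CARRIER_ID,
    DATE_TRUNC('month', TICKET_DATE) AS TICKET_MONTH,
    COUNT(*)                         AS TICKET_COUNT,
    SUM(VOLUME_M3)                   AS TOTAL_VOLUME_M3,
    SUM(DISTANCE_KM)                 AS TOTAL_DISTANCE_KM
FROM PL_DWH.CORE.FIELD_TICKET
GROUP BY CARRIER_ID, DATE_TRUNC('month', TICKET_DATE)
"""

conn = snowflake.connector.connect(account="payload_account", user="analytics", password="***")
conn.cursor().execute(VIEW_SQL)
conn.close()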
The Basic Data Product for the Trucking companies appears in Figure 12.6, and for E&P companies in Figure 12.7.
To help Trucking and E&P companies succeed in consuming these data products in their business operations, PL Insights was delivered as a holistic solution – a combination of data products and services. While the basic data products for the Trucking and E&P companies included the dashboard and the descriptive analytics reports, the services included:
Data governance was mainly handled by the Operations team at Payload. The data governance activities included training users from the E&P and trucking companies on using the right descriptive analytics solution, that is, the dashboard and reports; assigning users to the right reports based on RBAC (Role-Based Access Control); data cleansing and validation; and so on.
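As a minimal sketch of what the RBAC assignment could look like in the Snowflake warehouse, the statements below map a trucking-company role and an E&P role to their respective report views; all role, view, and user names are hypothetical:

# Minimal sketch of role-based access control (RBAC) on the reporting views (illustrative).
# Roles separate what trucking users and E&P users can see; all names are assumptions.
import snowflake.connector

RBAC_STATEMENTS = [
    "CREATE ROLE IF NOT EXISTS TRUCKING_ANALYST",
    "CREATE ROLE IF NOT EXISTS EP_ANALYST",
    # Trucking companies see the carrier-facing views
    "GRANT SELECT ON VIEW ANALYTICS.MONTHLY_CARRIER_VOLUME TO ROLE TRUCKING_ANALYST",
    # E&P companies see the producer-facing views
    "GRANT SELECT ON VIEW ANALYTICS.MONTHLY_WELL_SHIPMENTS TO ROLE EP_ANALYST",
    # A user from a trucking company is mapped to the trucking role
    "GRANT ROLE TRUCKING_ANALYST TO USER CARRIER_DISPATCHER_01",
]

conn = snowflake.connector.connect(account="payload_account", user="governance", password="***")
cur = conn.cursor()
for stmt in RBAC_STATEMENTS:
    cur.execute(stmt)
conn.close()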
About bad analytics
While there is a lot of discussion on good analytics, what does bad analytics look like? What exactly is bad analytics, and what are its key characteristics? Bad analytics is more than a lack of good insights. Here are the eleven key characteristics of bad analytics, which will help you identify and prevent bad analytics in your company.
a) Confirmation bias. Confirmation bias is favoring insights that confirm previously existing beliefs or findings – that is, insights that are already proven or known.
b) Availability bias. Getting good quality data for analytics is challenging. Availability bias is the tendency to share insights that come readily to mind instead of thoroughly analyzing the issue.
c) Selection bias. Selection bias occurs when the data sample does not represent the population: the sample is not representative, and there is no proper randomization in how the data was selected.
d) Anchoring bias. Anchoring bias is fixating on initial information and failing to adjust for subsequent information. This becomes an important issue if the analytics is not based on the most recent data.
e) Framing bias. Framing bias is the tendency to be influenced by the way a problem is formulated or defined to suit one’s interests.
f) Sunk cost bias. Sunk cost bias is the tendency to “honor” resources already spent, especially time and money. It happens when investments have been made on bad insights and businesses, not wanting to lose the time or money already invested, fail to make the decision that would give them the best outcome going forward.
g) Authority bias. Authority bias is deferring to the highest-paid person’s opinion (HiPPO). The highest-paid person usually has the most power and the highest designation in the room. Once his or her opinion is out, dissent is shut down, thereby preventing a thorough analysis of the problem and the solution.
Overall, bad analytics does not support evidence-based and data-driven decision-making (3DM) for business results. Analytics should be designed for a purpose, and the best way to avoid bad analytics is to work on real problems for actual customers or stakeholders – in other words, to tie stakeholder insight needs to goals, questions, and quality data.
Selecting statistical tools for analytics
The decision of which statistical test to use depends on the business question, the distribution of the data, and the data type of the variable. As discussed in Chapter 11, there are four main types of business questions from a statistical perspective – composition, comparison, relationship, and distribution. Business data is often approximately normally distributed, in which case parametric tests are used; if the data is not normally distributed, non-parametric alternatives should be used instead. The data type of the analytics variable can be nominal, ordinal, or continuous. The image below lists the key statistical tests and their typical use cases.
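As a minimal sketch of this decision process, the following Python example uses simulated haul cycle-time data for two carriers: a normality check decides between a parametric t-test and its non-parametric alternative, the Mann-Whitney U test:

# Minimal sketch: let the data's distribution drive the choice of comparison test (illustrative).
# The cycle-time samples below are simulated, not Payload data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=4.2, scale=0.6, size=40)   # haul cycle times (hours), carrier A
group_b = rng.normal(loc=4.6, scale=0.7, size=40)   # haul cycle times (hours), carrier B

# Step 1: is each sample plausibly normal?
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

# Step 2: pick the comparison test based on the distribution
if normal_a and normal_b:
    test_name, result = "independent t-test", stats.ttest_ind(group_a, group_b)
else:
    test_name, result = "Mann-Whitney U test", stats.mannwhitneyu(group_a, group_b)

print(f"{test_name}: statistic={result.statistic:.3f}, p-value={result.pvalue:.4f}")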
Closing thoughts
The amount of data generated by businesses today is unprecedented. As this growth continues, so do the opportunities for organizations to derive insights from their data analytics initiatives and gain a sustainable competitive advantage. Given the complexity of business operations today, decision making must inevitably rely on the insights derived from data analytics. Analytics is seen as the next frontier for innovation and productivity in business. But achieving a sustainable competitive advantage from analytics is a complex endeavor and demands a great deal of commitment from the organization.
As discussed in Chapter 1, the ten best analytics practices are implemented in a playbook fashion – a combination of strategic and tactical elements to deliver the greatest value to the business. From the strategic perspective, this means enabling organizations to develop analytics talent, culture, data literacy, discipline, and organization structure. From the tactical perspective, it means implementing the ten best analytics practices in ways that reflect the process workflows, standard operating procedures (SOPs), and cultural values.
In implementing the ten best practices, the analytics team will run into many challenges. If there is no data, or no quality data, for validating the hypothesis, one option is to rework the hypothesis. If the team is challenged with acquiring data internally, one approach is to get data from external sources. If there is no good data for analytics, one strategy is to leverage sampling or feature engineering techniques. If there is no precise or accurate business data, one solution is to use ranges and confidence intervals. The bottom line is that analytics is a probabilistic process, not a deterministic one. One cannot expect a perfect situation in analytics initiatives; it simply doesn’t exist. Overall, the analytics implementation is an evolutionary process, just like the business entity itself. The insight needs of the business constantly change, the organizational capabilities continuously mature, the data sets grow, improve, and sometimes even degrade, and the technological capabilities to process the data improve over time.
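As a concrete illustration of the ranges-and-confidence-intervals tactic mentioned above, here is a minimal sketch that reports an average with a 95% confidence interval rather than a single point estimate; the haul-distance values are made up:

# Minimal sketch: report a range (95% confidence interval) rather than a point estimate (illustrative).
import numpy as np
from scipy import stats

haul_distances_km = np.array([182, 175, 198, 210, 167, 189, 205, 173, 194, 181])

mean = haul_distances_km.mean()
sem = stats.sem(haul_distances_km)   # standard error of the mean
low, high = stats.t.interval(0.95, len(haul_distances_km) - 1, loc=mean, scale=sem)

print(f"Average haul distance: {mean:.1f} km (95% CI: {low:.1f} to {high:.1f} km)")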
Reference