Chapter 16
IN THIS CHAPTER
Assessing the current state
Selecting a data science use case
Analyzing the data skill gap
Evaluating AI ethics
Selecting the best data science use case to implement first (or next) is one of the most important aspects of setting up your projects for success. To be in a position to do that successfully, you need to assess your company based on the evidence you’ve collected about its current state. Assuming that you’re following the steps prescribed by the STAR framework, which I discuss in Chapter 15, at this point you’ve been authorized to propose a new data science project, you’ve surveyed the industry, and you’ve taken stock of your company’s current state, as shown in Figure 16-1.
At this point in the process, you’re ready to assess your company and identify the lowest-hanging-fruit data use case — the focus of this chapter. In Chapter 17, you can see how to make recommendations based on your assessment.
FIGURE 16-1: Assessing your company’s current state.
If you follow the instructions I lay out in Chapter 15 for researching your company, you end up with a lot of documentation describing the inner and outer workings of your company. That’s great! That’s just the strong foundation you need for meaningfully and accurately assessing your company’s current state. If you’re not familiar with this term, it’s just consulting vernacular that refers to the state of your company as it is now. This term is juxtaposed with a company’s future state — the status of the company at some defined point in the future. When a company hires a consultant — data or otherwise — it’s hiring someone to provide advice, strategic plans, or training (or all three) to help the company bridge the gap between its current state and future state.
After you have all that documentation on hand, it’s time to review it thoroughly. You can produce a SWOT analysis as you work through it, in order to make note of the key findings that jump out at you. (In case the term SWOT is new to you, it’s just a simple set of notes about your company’s strengths, weaknesses, opportunities, and threats that you notice while you’re working your way through the documentation.) These notes help you summarize your findings so that you can more easily identify the lowest-hanging-fruit data science use case.
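If you like to keep your notes in a more structured form, here's a minimal sketch of how you might capture SWOT findings in Python as you work through the documentation. Every entry shown is a hypothetical placeholder, not a finding from any real review:

```python
# A simple structure for capturing SWOT notes during a documentation review.
# All example entries are hypothetical placeholders; record your own findings.
swot = {
    "strengths": ["Mature customer data warehouse"],
    "weaknesses": ["No in-house machine learning skills"],
    "opportunities": ["Churn prediction could reduce customer losses"],
    "threats": ["Competitors already personalize offers with AI"],
}

# Print a quick summary you can paste into your assessment notes.
for category, findings in swot.items():
    print(f"{category.upper()}:")
    for finding in findings:
        print(f"  - {finding}")
```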
After you have thoroughly reviewed all the documentation you’ve collected and have produced a high-level SWOT analysis, you should be in a good position to select some contenders for the lowest-hanging-fruit data science use case.
You can do that by bringing to mind the data science use cases you surveyed during Part 1 of the STAR framework. (For more on the STAR framework, see Chapter 15.) Consider all the data resources, technologies, and skillsets that are required in order to implement each case. If you conclude, based on your documentation review, that your company lacks (or cannot easily acquire) the data resources, technologies, and skillsets required in order to implement a particular data science use case, throw it out. It isn’t the lowest-hanging-fruit use case you’re looking for.
After you’ve sifted through relevant data science use cases and removed the ones that aren’t promising, you then have to apply your best judgment to select the three use cases that seem the most promising for your company — based on what you know about its current state. These three use cases are the final contenders for the lowest-hanging-fruit data science use case for your company’s next successful data science project.
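One way to make this sifting concrete is to record each use case's requirements and compare them against the inventory from your current-state review. Here's a minimal sketch, with every use case name and resource label invented purely for illustration:

```python
# Hypothetical inventory of resources, technologies, and skillsets
# that your current-state review says the company already has.
available = {"customer data warehouse", "Python developers", "BI dashboards"}

# Each candidate use case mapped to what it requires (all names illustrative).
use_cases = {
    "churn prediction": {"customer data warehouse", "Python developers"},
    "computer vision QA": {"labeled image data", "deep learning engineers"},
    "sales forecasting": {"customer data warehouse", "BI dashboards"},
}

# Keep only the use cases whose requirements the company already meets.
feasible = {
    name: needs
    for name, needs in use_cases.items()
    if needs <= available  # subset test: every requirement is available
}
print(sorted(feasible))  # ['churn prediction', 'sales forecasting']
```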
After you’ve narrowed the list to a set of three promising data science use cases, you then select the absolute best, most promising data science use case. If this is your first time being strategic about your data science project planning, you most definitely want this use case to be a quick win for your company. (When I say “quick win,” I mean that you want the project to produce a positive, quantifiable return on investment, or ROI, in its first three months — this kind of success earns you the confidence of business leaders that you need in order to lead even bigger data science projects.)
To assess these top three data science use cases against your company’s current state, prudently produce an alternatives analysis, or at least some draft versions of one. For each of the potential data science use cases, first consider your working knowledge of your company as well as your findings from the documentation review, and then answer the following questions:
Answering these questions should help narrow your data science use case options, but I suggest whittling them down further by using a POTI model.
Before doing a final data science use case selection, you need to assess how each of the potential use cases will affect your company’s POTI — its processes, organization, technology, and information, in other words. Let’s look at each of these factors in greater detail:
Based on your answers to these questions, it should be rather straightforward to pinpoint a data science use case that offers the greatest potential ROI for your company, in the shortest amount of time, with the lowest level of risk.
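If you want to see how such a comparison might look on paper, here's a minimal sketch of an alternatives-analysis scoring matrix in Python. The use cases, scores, and weights are all hypothetical; plug in your own judgments:

```python
# Hypothetical 1-5 scores per finalist use case; higher is always better,
# so "risk" is scored with 5 meaning lowest risk.
scores = {
    "churn prediction":      {"roi": 4, "speed": 5, "risk": 4},
    "sales forecasting":     {"roi": 3, "speed": 4, "risk": 5},
    "recommendation engine": {"roi": 5, "speed": 2, "risk": 2},
}
weights = {"roi": 0.5, "speed": 0.3, "risk": 0.2}  # adjust to your priorities

# Rank the use cases by weighted score, best first.
ranked = sorted(
    scores,
    key=lambda uc: sum(weights[k] * scores[uc][k] for k in weights),
    reverse=True,
)
print(ranked[0])  # the most promising candidate under these assumptions
```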
From there, I suggest one final assessment question: Does the data science use case support your company in better reaching its business mission and vision? If the answer is yes, let this use case be your lowest-hanging-fruit data science use case. If the answer is no, fall back to the second-most-promising data science use case.
If you’ve made it to this point in your data science journey, congratulations! You’ve identified the lowest-hanging-fruit data science use case for your company. That’s a huge deal — and you’ve accomplished a lot of work to reach this point. If you’ve followed the documentation collection-and-review processes as I’ve described them, your decision-making will be well supported with ironclad evidence. Furthermore, you created draft assessments before selecting a final use case. Be sure to keep all your documentation and draft assessments in a safe place. I recommend that you make them addendums to your data science project plan so that you have a full, comprehensive body of evidence to support any recommendations you might make. Before you reach that point, however, you still need to complete the assessment phase. Selecting your use case marks only the halfway point when it comes to assessments within the STAR framework.
The type of assessments you need depends heavily on the data science use case you’ve selected. Though you won’t find a one-size-fits-all approach, I offer you some suggestions for assessments that I feel would be helpful in a wide variety of data science projects.
The following assessment protocols are plug-and-play suggestions that you can adapt to your company’s needs.
The STAR framework, which I talk about in Chapter 15, offers you a clear, repeatable process that you can use whenever you need to plan out a data science project. As part of this framework, you take stock of the data skillsets of the people who work at your company. Because you’ve already surveyed these individuals, you have a pretty good idea of the existing competencies to be found within the relevant human capital at your company. You also have already chosen a data science use case. With that use case selection, you’ve also zeroed in on the range of skills you’ll need in order to support the data science project. In the best-case scenario, your company already has people with those skills, and those people have the capacity to help support your data science project. Another favorable outcome is that your company has people with the basic prerequisites needed to learn the data skills your project requires. And, in the worst-case scenario, your company needs to make one (or a few) new hires.
Assessing the data skillset requirements for the project is straightforward. You need to look at the technologies the project will require and the data science modeling approaches that are implicit in the use case you selected. With that step out of the way, all you then need to do is cross-reference those skillset requirements with your survey results to see who at your company can help deliver this project.
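Here's a minimal sketch of that cross-referencing step in Python. The names, skills, and requirements are hypothetical stand-ins for your own survey results:

```python
# Hypothetical survey results: each person mapped to their data skills.
survey = {
    "Avery": {"SQL", "Python", "dashboarding"},
    "Sam":   {"SQL", "spreadsheet modeling"},
    "Jo":    {"Python", "machine learning", "SQL"},
}

# Skills implied by the selected use case's technologies and modeling approach.
required = {"SQL", "Python", "machine learning"}

# For each person, show which required skills they have and which they lack.
for person, skills in survey.items():
    have = skills & required
    missing = required - skills
    print(f"{person}: has {sorted(have)}, missing {sorted(missing)}")
```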
As you gain more experience and trust with respect to planning and leading data science projects, you earn a little more of an allowance in terms of resource allocation (or reallocation, as it were). In this situation, if your company has people who have the exact skills you need, you also need to address the other sticking point — their availability. If those individuals are completely locked down in supporting other teams and projects, adding one more project to their plate will be a tough sell to their superiors. You have to decide whether it's worth the risk to hire someone new to help. In other cases, you may find people with the available capacity who just don't have the exact skills that are needed. If it’s possible that, with training, these individuals could support your data science project, that would eliminate the need for making a costly new hire. It would also create more value from that person’s time (and your company’s investment in retaining that time). In this case, you’d want to assess your survey results and produce a training plan for each of these workers in order to take them from their current competency to the higher competency levels your project requires.
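To sketch what producing such a training plan might look like, assume a hypothetical competency scale of 1 (novice) to 5 (expert); the names, skills, and levels below are all placeholders:

```python
# Hypothetical current competency levels from your survey results.
current = {"Sam": {"Python": 2, "machine learning": 1}}

# Hypothetical competency levels the project requires.
target = {"Python": 3, "machine learning": 3}

# Draft a per-person training plan for every skill below its target level.
for person, levels in current.items():
    plan = {
        skill: f"level {levels.get(skill, 0)} -> {goal}"
        for skill, goal in target.items()
        if levels.get(skill, 0) < goal
    }
    print(person, plan)
```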
If you’re working at a company that truly supports its mission, vision, and values, and whose leaders are data literate, those leaders will back initiatives that inquire about and reinforce higher ethical standards with respect to the company’s AI solutions. Unfortunately, most leaders aren’t all that data literate (which represents another opportunity for improvement in most companies' data strategies). In such cases, speak to them in a language they understand. Speaking strictly from a business perspective, gaps in AI ethics represent significant reputational risk to the company. If your company uses AI technology in a way that produces inequitable and/or biased outcomes and that grievance is discovered, you can expect the company to appear in media headlines — somehow, somewhere.
In Chapter 15, I talk about how important it is for you to itemize all your company’s active AI solutions as well as to collect reports and documentation that describe the machine learning processes and model metadata that each of these AI solutions deploys. I also mention that you should collect, for each AI solution, a user manual (so to speak) that explains how the solution works. Lastly, I make it clear in Chapter 15 that you have to gather any information that references potential biases produced by these solutions. After you’ve gathered all this requisite information, you can use it to do a preliminary assessment of your company’s AI ethics. The next few sections point you in the right direction.
Accountable, explainable, unbiased: These words represent what your company’s AI solutions need to be in order for them to be truly ethical. But what does “accountable, explainable, unbiased” AI actually mean, and why does it matter to real-life human beings? Let me explain with a true story.
Imagine yourself as a healthcare provider who’s been tasked with treating Mr. Smith, a 65-year-old who has already been diagnosed with lung cancer and who experiences severe bleeding. You’re working with one of the leading oncology clinics in the US, so you’re privy to all the latest-and-greatest technologies. The latest gizmo acquired by your clinic is IBM Watson for Oncology. You’ve been instructed to consult with this cutting-edge, costly software when making your treatment recommendation.
You follow orders, so you go in and feed Mr. Smith’s patient data into the machine and await its recommendations. A few minutes later, it spits out a recommendation that the chemotherapy drug Bevacizumab should be administered to Mr. Smith as a form of treatment for his lung cancer. You stand back in complete shock because, experienced oncology specialist that you are, you know that Bevacizumab has a black box warning that it should never, ever, be used to treat cancer patients who experience severe bleeding.
This expensive technology is supposed to improve the quality and safety of the medical recommendations you make. Instead, its recommendation is downright dangerous. What if you hadn’t been educated and aware of the medical contraindication yourself? Do you see what a huge liability this machine could be setting you up for? Not to mention the negative impact on the lives of the people who depend on you to survive. I am sorry to break it to you, friends, but this scenario actually happened in real life. It’s just one example of the real and present dangers you expose yourself to when you depend on AI solutions in the healthcare industry. Implicit risks like these, however, are baked into AI solutions that are used across every industry in existence. As users and beneficiaries of AI solutions, everyone must be extremely vigilant about their implicit risk.
What does that look like? I mean, what do you look for to determine whether an AI solution is trustworthy? Well, for starters, you have to take proactive measures to make sure that your AI system is, as I say a little earlier, accountable, explainable, and unbiased. (For more on this topic, see Chapter 15.)
How can you identify whether your AI system is indeed accountable? You start by thoroughly reviewing the documentation you collected about the accountability of your company’s AI solutions. Those are the reports you collected that describe the machine learning processes and model metadata that each of these AI solutions deploys. First read the documentation yourself, and jot down notes inside a draft SWOT analysis, just to get your own thoughts recorded, in a first pass, on paper. Then make another pass through the documentation — this time, answering the following questions:
You’ve probably guessed it, but your answer to all these questions should be a resounding yes! If any of the answers comes out as no, you have a gap in the accountability of your company’s AI solutions. This gap represents risk to the business and should be remedied.
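If you like, you can track your answers in a simple checklist so that the gaps fall out automatically. The questions below are hypothetical stand-ins; substitute the ones from your own documentation review:

```python
# Hypothetical accountability checklist; True means the answer is yes.
answers = {
    "Is a named owner assigned to each AI solution?": True,
    "Is the model metadata documented and kept current?": False,
    "Is there a process for auditing each model's decisions?": True,
}

# Any "no" answer marks an accountability gap to remedy.
gaps = [question for question, is_yes in answers.items() if not is_yes]
for question in gaps:
    print(f"Gap to remedy: {question}")
```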
Let me bring up the elephant in the room here: the General Data Protection Regulation, otherwise known as GDPR. I talk about data privacy in Chapters 14 and 15, but never go so far as to name exact regulations. (There are a lot of them!) As elephants go, though, GDPR is a mammoth, and it’s one of the main drivers behind the need for explainable AI, so it pays to look at it in greater detail.
GDPR asserts data privacy rights for people in the EU, and it applies to any organization that processes their personal data, regardless of where in the world that organization is based. According to Recital 71 of GDPR, “[the data subject should have] the right … to obtain an explanation of the decision reached” if their personal data was used in any part of reaching that decision — period. End of story.
GDPR extends to the people it protects the right to an explanation anytime a predictive model is used in making a judgment about them. They can demand an explanation for any automated judgment that impacts them, whether the context is credit risk scoring, autonomous vehicle behavior, healthcare decisions, or what-have-you.
As for the financial risk to companies found in violation of GDPR, Article 83(4) of GDPR states that infringements shall be “subject to administrative fines up to 10 000 000 EUR, or … up to 2% of the total worldwide annual turnover of the preceding financial year, whichever is higher.” Enough said?
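If you want to see the arithmetic behind that clause, here's a quick sketch; the turnover figure is made up for illustration:

```python
def gdpr_article_83_4_cap(annual_turnover_eur: float) -> float:
    """Maximum administrative fine under GDPR Article 83(4):
    10,000,000 EUR or 2% of worldwide annual turnover, whichever is higher."""
    return max(10_000_000, 0.02 * annual_turnover_eur)

# Example: a company with 2 billion EUR in worldwide annual turnover.
print(f"{gdpr_article_83_4_cap(2_000_000_000):,.0f} EUR")  # 40,000,000 EUR
```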
Now that you understand the gravity of this explainable AI matter, I want to show you a couple of the ways that you can identify explainability in your company’s AI systems. Read the documentation you’ve collected, and then answer the following questions:
Designated representatives of your company must be able to explain how its AI solutions work. If your company uses a vendor AI solution, the vendor who supplies the AI system must provide a plain-language manual explaining how it works, in a way that makes sense to non-data people. If the vendor can’t, it’s time for you to seek alternatives.
Having this manual comes in handy when the inevitable questions arise about how and why recommendations are being made. In that case, the appropriate decision-maker can simply explain their judgment and how the AI solution impacted it, providing a copy of the plain-language manual to supplement that response. In most cases, this type of explanation should be more than sufficient.
One more characteristic of ethical AI remains: Your AI system needs to produce unbiased results. This characteristic is tough to explain, so let me start with a fictitious example.
An elderly man named Tom has been admitted to the hospital for medical testing related to a bronchitis diagnosis. The testing takes longer than expected, and Tom’s loving family desperately wants him to be released to their care at home. In a few days, when it comes time for Tom to be released, he is discharged into a state-run nursing facility and isn’t permitted to go home. When asked about this decision, the healthcare provider thoughtfully explains that, because Tom’s household income is low, the AI system has predicted that he won’t receive adequate support and care at home. The AI has determined that Tom should be discharged to a state care facility — where he will subsequently incur higher costs of services and face a greater chance of readmission to the hospital. This is an example of an incredibly biased outcome.
Bias can be integrated into a predictive model in two main ways, when either of these statements is true:
The cognitive bias of data professionals whose work impacts the project has made its way into the model’s design.
The data that trains the model, or the variables the model relies on, encode demographic characteristics that skew its results.
Examples of variables that are at high risk of producing bias are race, gender, skin color, religion, national origin, marital status, sexual orientation, education background, income, and age. Generally speaking, if your model requires these variables, be sure to evaluate its results for unfair bias. If you find bias, you have to go back to the drawing board and find another way to generate results — one that doesn’t unfairly discriminate against people based on their demographics.
As to what to look for when assessing whether your AI systems are biased, your best bet is to rigorously test the solution in trials that approximate real life as much as possible and then make some personal judgments about its outputs. It would be worth your time to assemble a user group to evaluate the level of potential bias. Here are some questions to consider when evaluating bias in AI outputs:
If, by exploring these questions, you find troubling answers, I suggest that you take it on yourself to start brainstorming which measures your company can put into place to ensure that future AI projects are accountable, explainable, and unbiased. These are exactly the types of suggestions you’d want to include when you produce recommendations, a topic I cover thoroughly in Chapter 17.
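One common heuristic your user group can apply when reviewing trial outputs is the four-fifths rule: if one group's rate of favorable outcomes falls below 80 percent of the best-off group's rate, the result deserves scrutiny. Here's a minimal sketch with made-up numbers:

```python
# Hypothetical counts of favorable AI outcomes per demographic group.
outcomes = {
    "group_a": {"favorable": 80, "total": 100},
    "group_b": {"favorable": 50, "total": 100},
}

# Compute each group's favorable-outcome rate.
rates = {g: o["favorable"] / o["total"] for g, o in outcomes.items()}
best = max(rates.values())

# Four-fifths rule: flag groups whose rate is under 80% of the best group's.
for group, rate in rates.items():
    ratio = rate / best
    flag = "REVIEW" if ratio < 0.8 else "ok"
    print(f"{group}: rate={rate:.2f}, ratio to best={ratio:.2f} [{flag}]")
```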
Here’s the simple truth: You can’t have ethical AI without good data governance supporting it. Why? Because in order for your AI to be explainable, unbiased, and accountable, it must be built from high-quality data that has been properly documented and maintained. If it helps you grasp this concept, think of data governance as a type of chain of custody that secures the integrity and reliability of your organization’s data. To build explainable AI, you need to understand and, more importantly, trust the data that goes into it.
Being in a position to explain your AI to any doubters is one of the main benefits of insisting on good data governance, but there are other benefits as well, such as these:
You can assess your company’s data governance by simply looking to see whether it exhibits the characteristics of good data governance I just outlined. Some signs of overly lax governance policies are duplicate data sets running amok, poor data quality, and overall unmanageability in a company’s data operations. Excessively strict data governance often makes itself known in the form of data bottlenecks, where users can’t access data without submitting a formal request to the IT department — a bureaucratic hoop that takes a long time to jump through. The result is that you and all the other business users wait in line to gain access to the basic data you need to do your jobs. If you see gaping data governance problems inside your company, you definitely need to address them in the planning and recommendations of your data science project.
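The duplicate-data-set symptom, at least, is easy to check for programmatically. Here's a minimal sketch that hashes every file under a directory and reports byte-identical copies; the "data" path is a hypothetical stand-in for your own data store:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicate_files(root: str) -> dict:
    """Group byte-identical files under root by their SHA-256 digest."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests that more than one file shares.
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}

# "data" is a hypothetical directory; point this at your own data store.
for digest, paths in find_duplicate_files("data").items():
    print(f"{len(paths)} identical copies: {[str(p) for p in paths]}")
```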
As you might expect, data governance policies don’t appear out of thin air — somebody has to come up with them. That’s where the data governance council — a team of elected individuals who together decide on a company’s data governance policies — comes into play. Any mature company has a data governance council in place, but if your company is smaller or newer, you probably need to form one. When doing so, assess and identify the individuals best suited to serve on the council.
For starters, you should select people who have some mixture of the following characteristics:
Capable of communicating well, with a knack for getting along with others
Experienced in training (so that the person can help educate the rest of the organization about the importance of data governance)
Data governance policies — or data policies, for short — are simply policies that document the rules and processes your company should follow in order for its data resources to remain consistent and of high quality. Document these rules and processes for every major data-intensive activity that happens within your business. Such rules and processes are often referred to as data governance standards. A mature company should be in a position to bring together the documentation for the entire set of major business activities into one data governance document — that’s your organization’s data policy. After being compiled, the data policy must be maintained and adjusted as your organization changes. And again, data policy is but one of the fundamental constituents of solid data governance.
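If it helps to see the shape of one such standard, here's a minimal sketch of a single data governance standard recorded as structured data. Every field name and value is illustrative, not a prescription:

```python
# One illustrative data governance standard for a single business activity.
standard = {
    "activity": "customer onboarding",
    "data_owner": "Head of Sales Operations",
    "quality_rules": ["email addresses must be validated",
                      "no duplicate customer IDs"],
    "access": {"read": ["sales", "support"], "write": ["sales-ops"]},
    "retention": "7 years",
    "last_reviewed": "2023-01-15",
}

# Your organization's data policy is simply the maintained collection
# of such standards, one per major data-intensive business activity.
data_policy = [standard]
print(f"This draft policy covers {len(data_policy)} business activity.")
```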
Document any potential gaps, omissions, or risks you can think of so that you can address them in the recommendations you’ll surely make in line with your upcoming data science project. (For more on how to fashion these recommendations, check out Chapter 17.)