CHAPTER 17

HUMAN-GENERATED DATA

17.1 Establish policies to mask sensitive human-generated data.

17.2 Use unstructured human-generated data to improve the quality of structured data.

17.3 Manage the lifecycle of human-generated data to reduce costs and comply with regulations.

17.4 Extract insights from unstructured human-generated data to enrich MDM.

These best practices are discussed in detail in the rest of this chapter.

17.1 Establish Policies to Mask Sensitive Human-Generated Data

Human-generated data might contain personally identifiable information that needs to be masked. Case Study 17.1 discusses the big data governance policies to mask sensitive information within voice data at call centers.

Case Study 17.1: Big data governance policies to mask sensitive information within voice data

Many call centers record some or all of their calls. They then analyze these recordings, either after the fact or in real time, for several reasons:

Operational efficiency (post-processing of voice calls)—Call centers analyze voice data to understand why clients need to speak to a customer service representative (CSR). The call centers then implement techniques such as changing the prompts in the automated voice response menu to encourage customers to use self-service options rather than the more expensive option of talking to a CSR.

Quality assurance (post-processing of voice calls)—Quality assurance personnel or management listen to a sample of the voice recordings to ensure that CSRs are handling calls politely and in accordance with policies.

Cross-sell and up-sell (real-time processing of voice calls)—A small number of call centers use real-time voice analytics so that CSRs can make offers while they are on the phone with customers.

Call centers need to protect the privacy of callers when voice recordings contain sensitive information. This is especially true in financial services, insurance, and healthcare. For example, call centers might need to mask sensitive information such as name, Social Security number, account number, and address within voice recordings.
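When recordings are passed through speech-to-text, one common approach is to detect sensitive tokens in the time-stamped transcript and use their timestamps both to redact the text and to identify audio segments to silence. The sketch below illustrates that idea under stated assumptions: the `Word` structure, the `mask_transcript` function, and both regular expressions are hypothetical; digits are assumed to have been normalized by the transcription step ("123-45-6789" rather than spoken words); and names are caught only when supplied from a known record such as the CRM entry for the caller. Addresses and spelled-out numbers would need more robust entity recognition than this.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word from a speech-to-text transcript (hypothetical format)."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


# Illustrative patterns only; assumes digits were normalized during transcription.
SSN_PATTERN = re.compile(r"^\d{3}-?\d{2}-?\d{4}$")
ACCOUNT_PATTERN = re.compile(r"^\d{8,16}$")


def mask_transcript(words, known_terms=()):
    """Return (redacted_text, spans), where spans are (start, end) audio ranges to mute.

    known_terms holds caller-specific values, such as the caller's name taken
    from the customer record; matching against them is case-insensitive.
    """
    terms = {t.lower() for t in known_terms}
    redacted, spans = [], []
    for w in words:
        token = w.text.strip(".,?!")  # ignore trailing punctuation when matching
        if (SSN_PATTERN.match(token) or ACCOUNT_PATTERN.match(token)
                or token.lower() in terms):
            redacted.append("[REDACTED]")
            spans.append((w.start, w.end))
        else:
            redacted.append(w.text)
    return " ".join(redacted), spans


if __name__ == "__main__":
    call = [Word("This", 0.0, 0.2), Word("is", 0.2, 0.3), Word("Jane", 0.3, 0.6),
            Word("my", 0.7, 0.8), Word("SSN", 0.8, 1.0), Word("is", 1.0, 1.1),
            Word("123-45-6789.", 1.1, 2.5)]
    text, mute = mask_transcript(call, known_terms={"Jane"})
    print(text)  # This is [REDACTED] my SSN is [REDACTED]
    print(mute)  # [(0.3, 0.6), (1.1, 2.5)]
```

Returning the time spans alongside the redacted text lets the same detection pass drive both transcript redaction and audio muting, so the two stay consistent.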

17.2 Use Unstructured Human-Generated Data to Improve the Quality of Structured Data

Human-generated data can also provide insights that are not available within structured data. Case Study 17.2 discusses the use of call center agents’ notes and nurses’ notes to improve the overall quality of data for predictive modeling at a health plan.

Case Study 17.2: Leveraging big data for health and wellness programs at a health plan

A large health plan had an information governance program that had been in place for several years. The information governance program was led by the director of business intelligence, who reported to the CIO. The director of business intelligence was a visionary who understood the value of big data but needed a substantial project to generate buy-in from the business.

The director of business intelligence sponsored a big data initiative that used text analytics to extract keywords such as “congestive heart failure” from notes made by call center agents and nurses. The health plan then fed this data into its predictive models to help determine whether certain members were at high risk and therefore warranted advanced care management. As a result of this exercise, the health plan found that a large number of members were not following their doctors’ instructions, which increased the overall cost of care. The plan gained this insight from key phrases such as “I don’t like my doctor” and “I don’t like my medication.” This project enabled the director of business intelligence to generate an initial level of buy-in from the business.
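The core idea, turning free-text notes into structured fields that a predictive model can consume, can be sketched as simple phrase flagging. This is a minimal illustration and not the health plan's actual implementation: the `PHRASES` dictionary, the feature names, and the functions are hypothetical. Plain substring matching also ignores negation ("no history of congestive heart failure" would still be flagged), synonyms, and misspellings, which commercial text analytics tools are designed to handle.

```python
import re

# Hypothetical phrase dictionary: feature name -> trigger phrases (lowercase).
PHRASES = {
    "chf_mention": ["congestive heart failure"],
    "doctor_dissatisfaction": ["i don't like my doctor"],
    "medication_dissatisfaction": ["i don't like my medication"],
}


def normalize(note):
    """Lowercase, straighten curly apostrophes, and collapse runs of whitespace."""
    note = note.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", note)


def note_flags(notes):
    """Return a 0/1 flag per feature: 1 if any single note contains any trigger phrase."""
    cleaned = [normalize(n) for n in notes]
    return {
        name: int(any(p in n for n in cleaned for p in phrases))
        for name, phrases in PHRASES.items()
    }


def enrich_member(record, notes):
    """Return a copy of a structured member record with note-derived flags added."""
    return {**record, **note_flags(notes)}


if __name__ == "__main__":
    notes = ["Member said: I don\u2019t like my medication.",
             "Hx of Congestive  Heart Failure per nurse call."]
    print(enrich_member({"member_id": "M001", "age": 67}, notes))
```

Each note is checked separately so that a phrase cannot be assembled accidentally from the end of one note and the start of the next.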

As of the publication of this book, the marketing team was exploring text analytics on customer surveys, and the operations department was considering voice analytics within the call centers.

17.3 Manage the Lifecycle of Human-Generated Data to Reduce Costs and Comply with Regulations

Organizations must comply with regulatory requirements for retaining human-generated data such as email messages and voice recordings. Consider the action that the United States Securities and Exchange Commission (SEC) brought against Morgan Stanley & Co. in May 2006. The SEC charged Morgan Stanley with failing to produce thousands of emails and other electronic records during the course of investigations because of inadequate retention procedures. Morgan Stanley settled the case and agreed to pay a fine of $15 million.
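Lifecycle management usually comes down to a retention schedule plus legal holds: records move to cheaper storage as they age, become eligible for deletion once their retention period expires, and are never deleted while subject to a hold. The sketch below shows that decision logic under loudly stated assumptions. The periods in `RETENTION_DAYS` and `ARCHIVE_AFTER_DAYS` are placeholders, not regulatory values, and holds are modeled per custodian only, whereas real holds are often scoped by matter and date range. The `Record` type and `disposition` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical schedule in days. Real periods must come from legal counsel and the
# regulations that apply to each record type and jurisdiction.
ARCHIVE_AFTER_DAYS = 90
RETENTION_DAYS = {"email": 3 * 365, "voice_recording": 2 * 365}


@dataclass
class Record:
    record_id: str
    record_type: str
    created: date
    custodian: str


def disposition(record, today, custodians_on_hold):
    """Return the lifecycle action for one record."""
    # A legal hold overrides the schedule: held records must never be deleted.
    if record.custodian in custodians_on_hold:
        return "retain: legal hold"
    period = RETENTION_DAYS.get(record.record_type)
    if period is None:
        # Unclassified types are kept until a steward assigns a schedule.
        return "retain: unclassified"
    age = today - record.created
    if age > timedelta(days=period):
        return "delete"
    if age > timedelta(days=ARCHIVE_AFTER_DAYS):
        # Still within retention but rarely accessed: move to cheaper storage.
        return "archive"
    return "keep online"
```

Archiving reduces storage cost, but archived records still have to be searchable and producible on request; otherwise the organization faces the same exposure as in the Morgan Stanley case.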

17.4 Extract Insights from Unstructured Human-Generated Data to Enrich MDM

Unstructured human-generated data may yield useful information about customer relationships and other attributes that can be used to enrich MDM. Case Study 11.4 discusses the use of email to enrich customer MDM.
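As a narrow, hypothetical illustration of this kind of enrichment, an email signature block can fill a missing phone number in a master customer record. The sketch assumes the master data is a simple dictionary keyed by lowercase email address, which stands in for real MDM matching; `enrich_from_email` and `PHONE_PATTERN` are illustrative names, and the pattern covers only simple North American formats. Two governance choices are built in: existing master values are never overwritten, and each new value records its source so a data steward can review it.

```python
import re

# Hypothetical, deliberately simple North American phone pattern.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}")


def enrich_from_email(master, sender, body):
    """Fill a missing phone number in a master customer record from an email signature.

    master maps a lowercase email address to a customer record (a dict) and is
    updated in place. Existing master values are never overwritten, and each new
    value is tagged with its source so a data steward can review it.
    """
    record = master.get(sender.lower())
    if record is None:
        return None  # unmatched sender: route to a steward rather than create a record
    # Treat the text after the last "--" separator line as the signature block.
    signature = body.rsplit("\n--", 1)[-1]
    match = PHONE_PATTERN.search(signature)
    if match and not record.get("phone"):
        record["phone"] = match.group()
        record.setdefault("provenance", {})["phone"] = "email_signature"
    return record


if __name__ == "__main__":
    master = {"jane.doe@example.com": {"name": "Jane Doe", "phone": None}}
    body = "Thanks for the update.\n--\nJane Doe\nVP Finance\n(555) 123-4567"
    print(enrich_from_email(master, "Jane.Doe@Example.com", body))
```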

Summary

