Human beings generate vast quantities of data, such as call center agents’ notes, voice recordings, emails, paper documents, surveys, and electronic medical records. Most human-generated data is unstructured, as summarized in Table 17.1.
Here are some of the best practices relating to the governance of human-generated data:
17.1 Establish policies to mask sensitive human-generated data.
17.2 Use unstructured human-generated data to improve the quality of structured data.
17.3 Manage the lifecycle of human-generated data to reduce costs and comply with regulations.
17.4 Extract insight from unstructured human-generated data to enrich MDM.
These best practices are discussed in detail in the rest of this chapter.
Human-generated data might contain personally identifiable information that needs to be masked. Case Study 17.1 discusses big data governance policies for masking sensitive information within voice data at call centers.
Many call centers record some or all of their calls. These organizations then analyze the calls, either after the fact or in real time, for a variety of reasons:
Call centers need to protect the privacy of callers when voice recordings contain sensitive information. This is especially true in financial services, insurance, and healthcare. For example, call centers might need to mask sensitive information such as name, Social Security number, account number, and address within voice recordings.
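As an illustration only, the sketch below masks two of these identifiers in a call transcript. It assumes the recording has already been converted to text by a speech-to-text system. The patterns are assumptions that a real deployment would need to tune: Social Security numbers are taken to be nine digits, optionally grouped 3-2-4, and account numbers follow a hypothetical rule of 8 to 12 digits after the words "account number." Names and addresses do not follow fixed patterns, so they usually call for entity-recognition tools rather than regular expressions.

```python
import re

# Assumed formats; real transcripts may spell digits out ("one two three"),
# so these patterns would need tuning for a given speech-to-text system.
# Social Security numbers: nine digits, optionally grouped 3-2-4.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")
# Hypothetical rule: account numbers are 8-12 digits following "account number".
ACCOUNT_PATTERN = re.compile(r"(account number\s*(?:is\s*)?)(\d{8,12})\b", re.IGNORECASE)


def mask_transcript(text: str) -> str:
    """Replace account numbers and SSNs in a transcript with placeholders."""
    # Mask account numbers first so a nine-digit account number is not
    # mislabeled as an SSN; group 1 keeps the words that introduced it.
    text = ACCOUNT_PATTERN.sub(lambda m: m.group(1) + "[ACCOUNT]", text)
    return SSN_PATTERN.sub("[SSN]", text)


print(mask_transcript("My account number is 123456789 and my SSN is 123-45-6789."))
# My account number is [ACCOUNT] and my SSN is [SSN].
```

Applying the patterns in this order means that a nine-digit number spoken right after "account number" is labeled as an account rather than as a Social Security number.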
In addition, many call centers record voice conversations to comply with regulatory requirements. However, these recordings might contain Payment Card Industry (PCI) data, such as the three-digit or four-digit card verification code and the primary account number (PAN). The Payment Card Industry Data Security Standard (PCI DSS) stipulates that card verification codes must not be retained after authorization, even if encrypted, and that PANs must not be stored unless they are protected, for example by being rendered unreadable.
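A common technique for finding PANs in text is to look for runs of roughly 13 to 19 digits and keep only those that pass the Luhn checksum used by payment card numbers. The minimal sketch below applies that idea to a call transcript (again assuming speech has already been transcribed) and keeps only the last four digits. It is a hypothetical illustration, not a PCI DSS control in itself: card verification codes must not be stored at all, and masking a transcript does nothing about the underlying audio.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CANDIDATE_PAN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_pans(text: str) -> str:
    """Mask Luhn-valid card numbers, keeping only the last four digits."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        if not luhn_valid(digits):
            return match.group(0)  # not a plausible PAN; leave unchanged
        return "*" * (len(digits) - 4) + digits[-4:]

    return CANDIDATE_PAN.sub(replace, text)


print(mask_pans("Card 4111 1111 1111 1111, thanks"))
# Card ************1111, thanks
```

The Luhn check filters out many digit runs that are not card numbers, such as most phone or order numbers, so fewer legitimate values are masked by mistake.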
The PCI Security Standards Council has issued guidance on protecting telephone-based payment card data.1 The guidelines suggest that organizations do one of the following:
Human-generated data can also provide insights that are not available within structured data. Case Study 17.2 discusses the use of call center agents’ notes and nurses’ notes to improve the overall quality of data for predictive modeling at a health plan.
A large health plan had an information governance program that had been in place for several years. The program was led by the director of business intelligence, who reported to the CIO. The director of business intelligence was a visionary who understood the value of big data but needed a substantial project to generate buy-in from the business.
The director of business intelligence sponsored a big data initiative that used text analytics technology to extract key phrases such as “congestive heart failure” from notes made by call center agents and nurses. The health plan then fed this data into its predictive models to help determine whether certain members were at high risk and, therefore, warranted advanced care management. As a result of this exercise, the health plan found that a large number of members were not following their doctors’ instructions, which increased the overall cost of care. It gained this insight from key phrases such as “I don’t like my doctor” and “I don’t like my medication.” Based on this project, the director of business intelligence was able to generate an initial level of buy-in from the business.
As of the publication of this book, the marketing team was exploring text analytics on customer surveys, and the operations department was considering voice analytics within the call centers.
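The case study does not describe the health plan's text analytics tooling. As a much-simplified stand-in, the hypothetical sketch below turns a member's free-text notes into 0/1 features by exact phrase matching, using the phrases quoted in the case study; the feature names are invented for illustration. Commercial text analytics would also handle negation ("no sign of congestive heart failure"), synonyms, abbreviations, and misspellings, none of which simple matching does.

```python
import re
from typing import Dict, List

# Hypothetical feature names mapped to the phrases quoted in the case study.
KEY_PHRASES = {
    "chf": "congestive heart failure",
    "dislikes_doctor": "i don't like my doctor",
    "dislikes_medication": "i don't like my medication",
}


def normalize(text: str) -> str:
    """Lowercase, convert curly apostrophes to straight ones, collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def phrase_features(notes: List[str]) -> Dict[str, int]:
    """Return a 0/1 flag per key phrase found in any of a member's notes."""
    normalized = [normalize(n) for n in notes]
    # Check each note separately so a phrase cannot span two notes.
    return {
        name: int(any(phrase in note for note in normalized))
        for name, phrase in KEY_PHRASES.items()
    }


notes = [
    "Member reports congestive heart failure symptoms.",
    "Caller said: I don\u2019t like my medication.",
]
print(phrase_features(notes))
# {'chf': 1, 'dislikes_doctor': 0, 'dislikes_medication': 1}
```

Flags like these could be joined to a member's structured claims data as additional model inputs.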
Organizations need to adhere to regulatory requirements regarding human-generated data such as email messages and voice recordings. Let’s consider the action that the United States Securities and Exchange Commission (SEC) brought against Morgan Stanley & Co. in May 2006 for failing to produce thousands of emails and other electronic records during the course of its investigations, owing to inadequate retention procedures. Morgan Stanley ultimately settled the case out of court and agreed to pay a fine of $15 million.
Voice data should also be subject to information lifecycle management. For example, the UK Financial Services Authority requires firms that manage client orders to record relevant telephone conversations and retain those recordings for at least six months.2 Chapter 12, on managing the lifecycle of big data, deals with issues such as regulatory compliance and cost reduction. These principles also apply to human-generated data such as email messages and voice recordings.
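One way to express a retention rule like this in code is as a configurable minimum retention period combined with a legal-hold flag, so that records needed for litigation or a regulatory inquiry are never disposed of automatically. The sketch below is hypothetical: the `Recording` type and the function names are invented, six months is taken from the FSA example and measured in calendar months, and "eligible for disposal" means only that the minimum period has elapsed, not that the recording must be deleted.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class Recording:
    recording_id: str
    recorded_on: date
    legal_hold: bool = False  # hypothetical flag: preserved for litigation or a regulator


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def eligible_for_disposal(rec: Recording, today: date, retention_months: int = 6) -> bool:
    """True once the minimum retention period has elapsed and no legal hold applies."""
    if rec.legal_hold:
        return False
    return today >= add_months(rec.recorded_on, retention_months)


rec = Recording("call-001", date(2024, 1, 15))
print(eligible_for_disposal(rec, today=date(2024, 7, 14)))  # False
print(eligible_for_disposal(rec, today=date(2024, 7, 15)))  # True
```

Keeping the retention period as a parameter lets the same check serve different regulations or record types, each with its own minimum.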
Unstructured human-generated data may yield useful information about customer relationships and other attributes that can be used to enrich MDM. Case Study 11.4 discusses the use of email to enrich customer MDM.
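As a simple, hypothetical illustration, the sketch below treats customers who appear together on the same email as candidate relationships. It assumes a lookup (`master_by_email`, invented for this example) from email address to master customer ID, and it uses only Python's standard `email` package. The output is a set of candidate links for data stewards to review, not confirmed relationships.

```python
from email import message_from_string
from email.utils import getaddresses
from itertools import combinations
from typing import Dict, Set, Tuple


def customer_links(raw_email: str, master_by_email: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return pairs of master customer IDs that appear together on one email."""
    msg = message_from_string(raw_email)
    # Collect every address in the From, To, and Cc headers.
    headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    ids = {
        master_by_email[addr.lower()]
        for _, addr in getaddresses(headers)
        if addr.lower() in master_by_email
    }
    # Each unordered pair of known customers is a candidate relationship.
    return set(combinations(sorted(ids), 2))


raw = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com, Carol <carol@example.org>\n"
    "Subject: Joint account\n"
    "\n"
    "Hello"
)
master = {"alice@example.com": "C001", "bob@example.com": "C002"}
print(customer_links(raw, master))  # {('C001', 'C002')}
```

Addresses that do not match a master record (Carol's, here) are simply ignored; a fuller approach might route them to a matching process as potential new customers.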
Human-generated data includes call center agents’ notes, voice recordings, email messages, paper documents, surveys, and electronic medical records. The big data governance disciplines relating to privacy, master data integration, data quality, and information lifecycle management also apply to human-generated data.
1. “Protecting Telephone-Based Payment Card Data.” PCI DSS 2.0 Information Supplement, March 2011.
2. “Telephone Recording: Recording of voice conversations and electronic communications.” Financial Services Authority Policy Statement 08/1, March 2008.