This chapter includes significant contributions from Mika Nikolopoulou.
This chapter examines best practices for governing the large volumes of smart meter data generated by utilities. Let’s begin with a primer on smart meters.
Smart meters can be used to measure the consumption of electricity, gas, and water. These meters are often coupled with wireless capabilities to enable automated meter reading (AMR). In addition, smart meters are equipped with real-time sensors that monitor power outages and power quality.
Traditional electricity and gas meters are only read on a monthly or quarterly basis. These meters measure only gross consumption and give no insights into when the energy was actually consumed.
Several utilities either have rolled out, or are in the process of rolling out, smart meters. These smart meters typically capture usage data every 15 to 60 minutes for residential and commercial customers and communicate that information on a daily basis to the utility for billing and analytics.
Smart meters offer a number of advantages to consumers, including these:
Smart meters also offer a number of advantages to utilities, including these:
Case Study 19.1 discusses a large water utility that was rolling out a smart meter program. This case study has been disguised.
A water utility serving a large metropolitan area was in the process of rolling out a wireless meter program across more than a million customers. The objective of the program was to reduce peak water usage through tiered pricing. As shown in Figure 19.1, the solution architecture for the smart meter program consisted of five tiers:
The utility company had 800,000 MTUs in its operating area. There was one MTU per building. These units made wireless transmissions of very simple information every six hours. This information consisted of the location identifier that was embedded in the MTU firmware, the timestamp, and the actual meter reading. The system generated more than three million unique meter readings per day. Each MTU could hold up to three days’ worth of meter data if the readings could not be sent upstream.
The MTUs sent their readings to 380 collection MTUs. The system had built-in redundancy, with each MTU sending its readings to three or four collection units to ensure that no reading was ever lost.
Each collection MTU sent files containing 10, 50, or 300 readings to the data collection center for further processing. The data collection center consisted of eight servers that received the files from the collection MTUs. The servers parsed the files, read the MTU identifier, and used lookup tables to map each reading to the corresponding account number before inserting it into the table. Over the course of 24 hours, the data center processed upwards of one million raw data files.
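A minimal sketch of this parsing-and-lookup step follows. The file layout, field names, and the in-memory lookup table are assumptions for illustration, not the utility’s actual implementation.

```python
# Hypothetical sketch of the collection-center parsing step.
import csv

# Assumed lookup table: MTU location identifier -> account number
mtu_to_account = {"MTU-000123": "ACCT-778901"}

def parse_reading_file(path):
    """Parse one collection-MTU file of readings into account-keyed records."""
    records = []
    with open(path, newline="") as f:
        for mtu_id, timestamp, reading in csv.reader(f):
            account = mtu_to_account.get(mtu_id)
            if account is None:
                continue  # unknown MTU; in practice, route to an exception queue
            records.append({"account": account,
                            "timestamp": timestamp,
                            "reading": float(reading)})
    return records
```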
Data was then propagated to the analytics environment in nightly batch mode from 1:00 a.m. to 6:00 a.m.
The billing application was on the mainframe and contained the customer master data for account number, name, and address. Address data on the mainframe was validated against an online address standardization system. The data in the analysis center was augmented with additional attributes from the mainframe, including account number, name, address, and additional information for commercial customers. This information was presented to residential and commercial customers via the web and call center.
The smart meter program had to deal with a number of big data governance challenges, which are described below.
Because each MTU sent its readings to three or four collection points, the system ingested a large volume of redundant readings that could cause data quality problems if not appropriately addressed.
The utility established a process that continuously monitored redundant readings so that only a single unique reading was inserted for each timestamp for a given building or account. During the initial phase, the application inserted a row into the table and then checked whether the reading already existed by searching for duplicates. Because this approach was inefficient, the utility moved to using an index to search for duplicates prior to inserting rows into the table.
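The sketch below illustrates the index-assisted approach, using SQLite purely for illustration; the utility’s actual database, table, and column names are not described at this level of detail.

```python
# Hypothetical sketch: search for duplicates via an index before inserting.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (mtu_id TEXT, ts TEXT, reading REAL)")
# The index turns the duplicate search into an index seek instead of a table scan.
conn.execute("CREATE INDEX ix_readings ON readings (mtu_id, ts)")

def insert_if_new(mtu_id, ts, reading):
    """Insert a reading only if no row exists for this MTU and timestamp."""
    exists = conn.execute(
        "SELECT 1 FROM readings WHERE mtu_id = ? AND ts = ? LIMIT 1",
        (mtu_id, ts)).fetchone()
    if exists:
        return False  # redundant reading; discard it
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (mtu_id, ts, reading))
    conn.commit()
    return True
```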
Out of the more than 10 million raw readings received each day, only about three million were unique. As a result, the application had to search for duplicates in a table of 270 million rows (3 million x 90 days). To improve the overall performance of the system, the utility deployed streaming analytics so that the data could be processed dynamically in real time. That way, only the unique meter readings would actually flow into the database.
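The following is not IBM InfoSphere Streams code; it is only a Python sketch of the in-flight de-duplication idea, with the key structure and retention window as assumptions.

```python
# Sketch of in-flight de-duplication: keep a bounded set of (mtu_id, timestamp)
# keys already seen and let only the first copy of each reading through.
from collections import OrderedDict

seen = OrderedDict()
MAX_KEYS = 3_000_000 * 3  # assumption: roughly three days of unique readings

def deduplicate(stream):
    """Yield only the first occurrence of each (mtu_id, timestamp) reading."""
    for mtu_id, ts, value in stream:
        key = (mtu_id, ts)
        if key in seen:
            continue                  # redundant copy from another collection unit
        seen[key] = True
        if len(seen) > MAX_KEYS:
            seen.popitem(last=False)  # evict the oldest key
        yield mtu_id, ts, value
```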
Several database tables used the timestamp of the meter reading as the primary key. Because timestamps were not unique across readings from different meters, inserts violated the primary key constraint, and a number of rows were rejected. This, in turn, led to manual processing of these rejected rows and, ultimately, to higher maintenance costs.
To resolve this, the IT team deployed a composite primary key consisting of the MTU location identifier and the timestamp.
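A minimal sketch of the corrected table definition appears below; the table and column names are assumptions.

```python
# Hypothetical DDL: the primary key combines the MTU location identifier with
# the timestamp, so readings from different meters taken at the same instant
# no longer collide.
COMPOSITE_KEY_DDL = """
CREATE TABLE meter_readings (
    mtu_location_id  VARCHAR(32)   NOT NULL,
    reading_ts       TIMESTAMP     NOT NULL,
    reading_value    DECIMAL(12,3),
    PRIMARY KEY (mtu_location_id, reading_ts)
);
"""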
Even in an automated environment, a small percentage of the three million daily readings were bound to be incorrect. For example, a building might send the following four readings for the day: 500, 520, 600, 10,000. The last reading is most likely a mistake, a sign of a leak, or some other serious problem.
The smart meter application calculated running totals and a daily average consumption for each MTU, which established a pattern of normal usage for each account. The application then checked the readings for errors and outliers. When it found one, it discarded the “erroneous” reading and either substituted the prior reading (500, 520, 600, 600) or created an estimate based on consumption history (500, 520, 600, 630). The application flagged this calculated entry as an estimate rather than a real reading. Because the anomalies might actually be caused by water leaks, the system flagged these readings for further research if the problem persisted.
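The sketch below illustrates this outlier handling in simplified form. The threshold (a multiple of the running daily average) and the substitution rule are assumptions chosen only to demonstrate the idea.

```python
# Hypothetical sketch of outlier detection and substitution for one account.
def clean_readings(readings, history_avg, threshold=5.0):
    """Replace readings far above the account's average and flag them as estimates."""
    cleaned = []
    for value in readings:
        if history_avg and value > threshold * history_avg:
            estimate = cleaned[-1][0] if cleaned else history_avg
            cleaned.append((estimate, "ESTIMATE"))  # flagged, not a real reading
        else:
            cleaned.append((value, "ACTUAL"))
    return cleaned

# Example: the 10,000 reading is replaced by the prior reading and flagged.
print(clean_readings([500, 520, 600, 10_000], history_avg=550))
```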
The system also had to deal with inconsistent addresses for the same customer across the billing application (mainframe) and the data collection and analysis centers (distributed).
The billing system on the mainframe used a live application to standardize customer addresses. In case of any inconsistencies between the customer addresses in the mainframe and distributed environments, the billing application was considered the golden copy. The billing application received only a subset of all the daily meter readings using the following business rules:
The utility needed to archive smart meter readings so that it did not accumulate large volumes of data that hindered performance during row inserts or when running queries.
The utility implemented the following policies to govern the lifecycle of its meter data:
Although the analysis environment stored two years of data, the utility found that most reports used only a few months of data. The utility explored various options to partition the data by time, by location, or by both, to speed up queries and reduce overhead. Ultimately, the utility partitioned the data by month, with about 30 partitions for the years 2009 through 2011.
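The sketch below shows why monthly partitioning helps: a query covering a few months only has to touch the partitions for that range. The partition naming scheme and date handling are assumptions for illustration.

```python
# Hypothetical sketch of partition pruning for a monthly partitioning scheme.
from datetime import date

def partitions_for_range(start, end):
    """Return the monthly partition names a date-range query must scan."""
    names, year, month = [], start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"readings_{year}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names

# A two-month report touches 2 partitions instead of roughly 30.
print(partitions_for_range(date(2011, 5, 1), date(2011, 6, 30)))
```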
There have been several newspaper reports of the potential privacy threats posed by smart meters. For example, smart meter data could possibly tell an observer everything a subscriber does in his or her home, down to how often microwave dinners are eaten, how often towels are washed, and even what brand of washer-dryer they’re washed in.1
As part of the overall roadmap, the utility planned to establish policies to monitor access to the smart meter data by privileged users such as database administrators.
The proposed technical architecture for the smart meter project is shown in Figure 19.2. Key components of this technical architecture are described in the following pages.
It was more efficient to process the meter readings in-flight between the collection units and the data collection center. The utility deployed IBM InfoSphere® Streams to process millions of semi-structured meter readings in flight and to do the following:
After the data was cleansed and de-duplicated, the historical and aggregated data had to be stored in the right data model for reporting and analytical purposes. The utility selected IBM Netezza® for its analytical warehouse to support audit activities, respond to advocacy groups regarding water shortages, analyze consumption patterns, conduct financial analyses, and report on customer consumption patterns.
The utility in the case study deployed archiving technology to lower storage, hardware, and maintenance costs and improve system performance by ensuring that queries would run against smaller datasets. The utility selected IBM InfoSphere Optim™ Data Growth as its archiving platform. The applications could still access current and archived data using reports, ODBC/JDBC connectors, relational databases, and even Microsoft Excel®. The applications team had to treat the data as complete business objects. For example, it had to archive not only a set of rows, but also the entire database structure related to those rows that had business relevance. So, a set of meter readings might be archived together with the billing information, historical trends and patterns, and other relevant customer data.
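The sketch below illustrates the “complete business object” idea: an archive record bundles the meter readings with the related billing and customer data that give them business meaning. The field names and structure are assumptions, not the InfoSphere Optim format.

```python
# Hypothetical sketch of archiving readings as a complete business object.
import json
from datetime import date

def build_archive_object(account, readings, billing_rows, customer):
    """Bundle everything needed to interpret the readings years later."""
    return {
        "archived_on": date.today().isoformat(),
        "account": account,
        "customer": customer,      # name, service address, customer class
        "billing": billing_rows,   # invoices covering the reading period
        "readings": readings,      # the rows being retired from the warehouse
    }

archive = build_archive_object(
    account="ACCT-778901",
    readings=[("2009-03-01T06:00", 512.4)],
    billing_rows=[{"invoice": "INV-1", "amount": 38.20}],
    customer={"name": "Example Customer", "class": "residential"},
)
print(json.dumps(archive, indent=2))
```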
As part of its roadmap, the utility wanted to establish multiple archiving tiers. For example, data that was three to five years old could sit on faster storage, data that was five to ten years old could move to less expensive storage, and older data could be archived to tape, compact disc, or optical storage devices. Native application access was generally most useful in the first two to three years of the information management lifecycle, while archived data was used mainly for historical trend analysis. The oldest data was retained chiefly to satisfy legal discovery requests.
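A tiny sketch of this tiering policy follows; the cut-offs come from the roadmap described above, while the tier labels are assumptions.

```python
# Hypothetical mapping of data age to an archiving tier.
def storage_tier(age_in_years):
    """Map the age of meter data to a storage tier per the planned policy."""
    if age_in_years <= 3:
        return "warehouse / native application access"
    if age_in_years <= 5:
        return "faster archive storage"
    if age_in_years <= 10:
        return "less expensive archive storage"
    return "tape or optical media (kept mainly for legal discovery)"
```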
This chapter reviewed the case study of a water utility that governed machine-to-machine (M2M) data from its smart meters. The utility had to address big data governance issues pertaining to data quality, privacy, and information lifecycle management.
1. “Smart Meters Raise Privacy Concerns.” Smart Money Magazine, June 3, 2010.