8
How to Create Profit Out of Data

8.1 Introduction

Undoubtedly data can provide a means of creating profit, but data essentially describes the present and the past. Focussing too much on data can be a barrier to new thoughts and ideas. It is important to be open to the sudden, disruptive cross‐overs to totally new concepts that occur from time to time. For example, however much a candle is improved by data analysis it will not morph into a light bulb. The light bulb has the same role as the candle but is a totally different technology with much more versatility and functionality.

Sometimes we can make a conceptual leap and completely change the way things are done. This has happened with the sharing economy, which has opened up the way we use our spare time, our commute to work and our property. The gradual evolution of the internet has had significant collateral effects. Changed working practices have increased opportunities to work from home, with the knock‐on effect of opening up the countryside for people to live in and create businesses away from the cities.

There are many pathways to turning data into money. The first step is motivating data owners and companies to think about what can be done (see Figure 8.1).

Diagram depicting the pathways to monetizing data. A circle labeled “Big data monetization impulses” is linked to a clockwise arrow with circle markers for ask the right questions, detect patterns, etc. — **Figure 8.1** Pathways to monetising data.

Customers can benefit from the information hidden in their data although they may not be aware of their needs. For example, if you show interest in a certain item in a shop, this information can be combined with what you bought before and analysed to reveal your latent interests. These can then be matched with other products that share the same latent trait, so that suitable recommendations can be made. The latent trait may imply affinities to other related areas; if you like fast cars, you may also be interested in extreme sports and travel to unusual places.
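The matching of latent traits described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production recommender: the catalogue, the trait labels and the function name `recommend` are all hypothetical, and in practice latent traits would be inferred statistically from purchase and browsing data rather than assigned by hand.

```python
# Hypothetical catalogue: each product is tagged with assumed latent traits.
PRODUCT_TRAITS = {
    "sports car magazine": {"thrill_seeking"},
    "skydiving voucher": {"thrill_seeking"},
    "expedition holiday": {"thrill_seeking", "exploration"},
    "garden furniture": {"home_comfort"},
}


def recommend(history, catalogue=PRODUCT_TRAITS):
    """Suggest products that share a latent trait with the customer's history."""
    # Pool the traits of everything the customer has viewed or bought.
    customer_traits = set()
    for item in history:
        customer_traits |= catalogue.get(item, set())

    # Score every other product by the number of traits it shares.
    scored = []
    for name, traits in catalogue.items():
        shared = customer_traits & traits
        if name not in history and shared:
            scored.append((len(shared), name))

    # Most shared traits first; ties are broken alphabetically.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked]


print(recommend(["sports car magazine"]))
```

A customer whose history contains only the sports car magazine is offered the two other products tagged with the same assumed trait, while the garden furniture is left out.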

The data scientist should extract sample information to demonstrate the potential in the data because there is always resistance to believing in its value. This is where data visualisation is so important as it can educate and inspire data owners to expend the requisite effort to explore their data. For example, the segmentation features in Figure 8.2 help elicit ideas for how to attract and retain customers.

Illustration of the segmentation features of walk-in customers, with a clustered bar graph (left) and 9 boxes for average age, gender, quality index, etc. which indicates the description of customers (right). — **Figure 8.2** Segmentation features of walk‐in customers.

Thinking about what influences potential customers may prompt ideas about how additional input could make data more useful at minimal extra cost. Noting the personnel on duty each day or the weather can provide extra variables to explain and predict your sales. Integrating the data with information about the demographics, footfall and parking in the area obtained from open data sources is always a useful exercise and enables you to compare your trade with the wider picture.
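As a rough sketch of how such extra variables might be used, the Python below compares average daily sales across weather conditions and staff teams. The records are invented purely for illustration, and the field names (`sales`, `weather`, `staff`) and the helper `mean_sales_by` are assumptions; a real analysis would use far more days and would check whether the differences are more than chance.

```python
from collections import defaultdict
from statistics import mean

# Invented daily records: sales plus two cheaply recorded extra variables.
daily_records = [
    {"sales": 1200, "weather": "rain", "staff": "team_a"},
    {"sales": 1500, "weather": "sun", "staff": "team_a"},
    {"sales": 1100, "weather": "rain", "staff": "team_b"},
    {"sales": 1700, "weather": "sun", "staff": "team_b"},
    {"sales": 1300, "weather": "rain", "staff": "team_a"},
    {"sales": 1600, "weather": "sun", "staff": "team_b"},
]


def mean_sales_by(records, key):
    """Average sales for each level of one explanatory variable."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["sales"])
    return {level: mean(values) for level, values in groups.items()}


by_weather = mean_sales_by(daily_records, "weather")
by_staff = mean_sales_by(daily_records, "staff")
print(by_weather)
print(by_staff)
```

Even a grouped average like this can prompt questions about what drives sales before any formal model is fitted.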

Some managers are remarkably naïve about what simple data analysis can reveal. If they are shown a plot of the frequency of demand of their different products, they may be surprised to learn that some products are highly sought after and others are rarely chosen, even though you would expect this to be well known. Similarly, with online communication, senders may assume that each action is equally effective, whereas in fact there are often large differences that can be explained by an examination of the digital environment of the communication. The success may be affected by what was sent out at the same time, the style used and the likely audience at the time and location. A particular communication style attracts particular people because it is relevant to a specific audience.

A common misconception among front‐line staff is that all products take equally long to produce, so they quote a standard lead time for all orders. A simple data analysis may show that most products can easily be accessed and it is only a few of the rarely requested items that take a long time to obtain.
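The kind of simple analysis meant here might look like the following Python sketch, which counts demand per product and computes a median lead time for each. The order log and product names are invented for illustration.

```python
from collections import Counter, defaultdict
from statistics import median

# Invented order log for illustration: (product, lead time in days).
orders = [
    ("standard widget", 1), ("standard widget", 2), ("standard widget", 1),
    ("standard widget", 1), ("common gadget", 2), ("common gadget", 1),
    ("common gadget", 2), ("rare part", 14),
]

# Frequency of demand for each product.
demand = Counter(product for product, _ in orders)

# Median lead time per product, which is less sensitive to outliers than the mean.
lead_times = defaultdict(list)
for product, days in orders:
    lead_times[product].append(days)
median_lead_time = {product: median(days) for product, days in lead_times.items()}

for product, count in demand.most_common():
    print(f"{product}: {count} orders, median lead time {median_lead_time[product]} days")
```

With figures like these, quoting a single standard lead time would overstate the wait for the popular items and understate it for the rarely requested one.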

Similarly, in the digital world, some communications are the result of extensive resources being expended and we would hope that these are the communications that have the most impact, but this needs to be checked and verified. For business success, it is important to spend money on actions that give the best return on investment. To most readers this is rather obvious, familiar ground, but we know from business experience that people still do not seek information from their data and do not act on the information they do have; businesses that do so have a big competitive advantage.

Organisations can use their data as fodder for analytics, from which they can create value, such as a report, a description or a score attached to a single data line. Alternatively, they can see their data as a product in itself and earn money from selling it, or earn more money by combining it with their knowledge or by embellishing it with related metadata and displaying it on maps.

It is more obvious to consider monetisation in the positive sense of companies or individuals gaining from selling their data. However, we should also consider monetisation in the negative sense. For example, self‐confessed smoking habits or lack of physical activity can affect insurance premiums or financial borrowing costs. If you refuse to give this data, it can go against you in terms of increased costs. Recorded driving behaviour or number of steps walked can be given to secure more favourable rates, implying that they can also have a negative effect. With increased opportunities for self‐quantification this type of exchange can only become more prevalent. Recorded activity levels on social media can influence rates charged for insurance because all these measures go into score cards used to decide whether you are eligible for preferential treatment or not.

8.2 Business Models for Monetising Data

8.2.1 Introduction

How intrinsic is the data to the business? In some cases, the company is deliberately set up to collect business intelligence, for example launching a lottery and collecting addresses and demographic data about the customer; the apparent nature of the business is just a cover hiding the core business of data collection. Other companies have a specific business but collecting data is easily as important. For example, music and movie pick list providers also benefit from collecting data on the demographics and preferences of their customers. The importance of the data is firstly to improve their own business and secondly as an additional source of income. This is a major motivation for other companies to become interested in the opportunities of big data. They find that although they didn’t consciously collect the data, they are in possession of this valuable resource. Their business can exist without the data but data is a sideline that can be very lucrative. Analytics provides extra revenue for retailers, publishers, telecommunications companies, finance businesses and others. These companies have a great opportunity and many have exploited it. See Figure 8.3.

Diagram listing the business opportunities for acquiring, growing, and retaining customers, for optimizing processes and minimizing fraud, for maximizing insights and improving economics, etc. — **Figure 8.3** Business opportunities.

Financial advisers, health advisers and expert system providers in general who have collected data as part of their core business are in an enviable position to monetise it and add another income stream to their business. They can do this at a number of levels:

the ground level, where businesses collect data and offer a service or entertainment as a means of payment; this may be referred to as the ‘lotto level’, where a service is offered and your data are collected
companies that collect data to empower their own non‐data‐related core business
companies in which collecting data is a vital part of the company, such as providers of expert systems or search engines that ask for personal details so that recommendations can be made
companies that buy in data as a resource and/or dig around for data and/or pre‐process it, and sell on the pre‐processed information.

Both companies and people can monetise their data because individuals have a valuable data resource in themselves that they can give or sell as they choose, even though they may not be aware of having this resource. We can look at the data exchange process from the points of view of the company and the user, as in Table 8.1.

Table 8.1 Business models for types of exchange.

Type of exchange	Example	What the user gets from the company in return for their data	What the company core offer is	What monetisation the company gets in addition to the core offer
1	Social network	Esteem and belonging	Service, entertainment	Money from advertisers
2	B2C selling goods and services	Extra services, e.g. gifts, vouchers	Retail	Optimise their own advertising and money from reports on brand usage, etc.
3	Advisers selling knowledge	Advice and time saved	Advice	Knowledge about the customer and money from selling on information to companies in their sector
4	Service providers	No payment or less payment	Providing technical solutions and knowledge to turn the data into valuable assets	No additional

We consider each of these types of exchange in more detail below.

8.2.2 Social Network Business Models

Relatively new social network companies were quick to realise the value of the detailed data they hold about people and their relationships with each other. Indeed, they have become ever more creative in utilising the information. One big change is their expansion from the original catchment group; for example, from university friends to the whole world; from business colleagues to a wide range of professionals and aspiring business people.

Some social networks focus on a subset of particular media, such as photographs, short comments with a limited number of characters or special interest groups. The latter provide a popular service and also a ready catchment group of people with more or less similar and predictable interests.

8.2.3 B2C Selling of Goods and Services

In this category we consider retailers, major supermarkets, small shops, mail order houses and agencies providing services such as locating transport, food, accommodation or companionship. These companies all have a distinctive, clear main core business but can also collect vast amounts of data. They enjoy the benefits of their data as they can sell or rent it, either as datasets or as reports on brand usage, and so on. They use data to optimise their own advertising and to design tailored offers. They have one‐to‐one communication with their customers. The companies differ in their business sub‐model, as shown in Table 8.2.

Table 8.2 Business models for B2C selling.

Type of exchange	Example	Data giving is compulsory or voluntary	What the company core offer is	What monetisation the company gets in addition to the core offer and standard benefits of data
1	B2C selling goods locally, e.g. shops and hotels	Data giving is not compulsory, but if given results in extra services, e.g. gifts, vouchers	Goods and service; delivery is optional	They can offer local‐based services
2	B2C selling goods online or by mail order, e.g. providing a service, such as banking	Data has to be given by the individual because the buying is remote, not face‐to‐face.	Goods and delivery	No additional
3	Online market place, e.g. online sales	Data is compulsory.	Goods from different industries and areas are delivered with convenience, e.g. easy payment methods.	The marketplace means that there is an overview of consumer interest in different areas
4	Agencies, e.g. taxis and accommodation	Data is compulsory; the company does not own goods but they do own the customer relationship.	The company offers the link only.	The nature of the agency means that they have much more detailed knowledge of day to day living and personal insights

The financial sector may be considered to have a type 2 exchange because data must be given before a bank account can be opened. The hotel is also type 2, assuming identity has to be given before the customer can stay. Agencies overlap somewhat with our next category in the following section because during the selling process they are also offering advice, making use of their knowledge of a particular area.

Companies, such as those in the sports sector, offer as a perk self‐quantification devices that allow the user to measure themselves. Users who opt for the data collected by the devices to be stored by the provider have the advantage of retaining and viewing it. Those who give the necessary permissions are giving a valuable source of information, which can be analysed and built on using predictive analytics.

8.2.4 Advisers Selling Knowledge

People can undertake extensive research themselves when they want to find out about something, or they can ask an expert. The expert has already spent a lot of time researching available options and learning relevant techniques and therefore it is reasonable for them to be paid by the person commissioning the advice. Their payment can come from a range of other sources as well. For example, an expert advising on equipment may receive payment from the equipment suppliers, and also from people with a vested interest in making sure that people have the right equipment, as well as from the person who needs the equipment.

Some advisers who intermediate between customers and providers offer services free at point of use and get paid by the providers, for example by a click‐through payment. They also attract banner advertising and can charge for this niche access. Some companies act as adviser and agent, for example in recommending accommodation and then also booking it for customers.

8.2.5 Service Providers

Service providers that turn raw data into more tailored offerings can operate because of the increase in open data and semi‐open data such as is available via web scraping and social media. These companies have monetisation as their main business so they have no additional layer to their business. We distinguish cases with slightly different business sub‐models. All of them aim to make it more convenient for companies and people to enjoy the benefits of big data. The differences arise in the way they do this, the type of data and its source. These are summarised in Table 8.3.

Table 8.3 Business models for service providers.

Type of exchange	Example	What the user gets from the company in return for their data	What the company core offer is
1	Using official statistics and public data	Provider gets recognition and justification of costs, and in some cases money; other public sources may have no payment; privately owned sources get money	Market view and collated reports from multiple sources; some may be free but some may require a subscription
2	Using privately owned data and integrating it with public data	Services and money to those who own bulk data, payments or gifts or entertainment, entering into raffles, etc for people completing surveys	Accumulate data on an individual level for B2B and B2C and rent the information to interested companies who may use it to optimise advertising, marketing and sales processes
3	Delivering an IT environment	No payment as the data is just stored	Providing IT infrastructure that enables big data processing and individualised marketing information
4	Using public data with a subtle theme	No payment as only web available public data is used	Providing IT infrastructure that enables big data processing and individualised marketing information

These companies all provide the bespoke service of sifting through diverse appropriate data sources with an expert eye to assemble a valuable product. They may offer web mining services for specific keywords or offer to find out how people think about different companies (tonality) or about new ideas that are emerging. They focus on mining text published on the web and in the social web to turn words into quantitative information: numbers with interpretation and meaning. There are also companies that will search for data to integrate with a specific company’s data to give better insights.

8.3 Data Product Design

We now consider ways in which data as a product is designed and shaped. It can be sold raw, cleaned, summarised, tabulated into frequency tables, categorised, processed into KPIs, interpreted in the business context, or it can be analysed to give a useful product like a forecast or a business rule for targeted advertising.

The data itself can be sold as a product or can be more and more processed to give a more refined product that can be sold for a higher price. As a good parallel, data is like any other resource, such as potatoes which are more valuable when transformed into crisps, or oil, which is worth more when refined and processed into a plastic bag!

Making better use of data to improve or develop your own business requires reliable data handling and the powerful triptych of statistics, IT and business knowledge synthesised in data science. Innovative, creative thinking should be encouraged. New ideas about unlocking the potential of the data emerge all the time. What is profitable and what should be developed will change with time as new ideas come into fashion or become more important. Technical advances are a major factor and can enable data products that were previously out of the question.

8.4 Value of Data

8.4.1 Introduction

We now consider the value of the data and look in particular at four dimensions that give value to the data: accessibility, rarity, quality and utility.

8.4.2 Accessibility

From the data owner’s point of view, data extraction may take the same effort for each of the data items, but it is evident that the consumer sees some information as much more valuable. This implies that some aspects of extraction can be charged at a higher price than others.

Data that is difficult to access is more valuable. The time and effort to produce the data product is important and depends on knowledge and expert skill to process it. This follows the same law as for other products. For example, a garment or sculpture that is difficult to make is more valuable.

8.4.3 Rarity

The concept of rarity in the context of data implies that the data does not occur very often, belongs to a very small group, happened at a particular moment in time or is very personal and sensitive. For example, information about personal income is sensitive and in some countries real data is hard to come by. Often estimates based on a sample are used instead of a full census. This is core business for official statistics institutions. Much of their data is rare and only approved researchers can work with it. Analysts must apply for approved researcher status and may then be granted access to it, but only in the controlled environment of a micro‐data laboratory. They must receive training and can only take out collated results and not the original data. Even for approved researchers the data may still be anonymised.

8.4.4 Quality

Quality of data, as with other products, is a major issue if absent or lower than expected. Quality implies fitness for purpose and we must look at

missing values
de‐duplication
obvious fraud
error.

Missing values occur structurally as well as through mistakes; they upset the balance of data and reduce the power of the analysis and risk making it unrepresentative. De‐duplication is a major task, especially as people update their profiles and may change their title or add a middle name or write their address differently. It takes effort and therefore adds value when complete. Obvious fraud includes malicious and playful deceit, such as people putting 10 Downing Street or Buckingham Palace as their address and calling themselves Donald Duck or John Smith when this is not their name. Errors can result from honest or dishonest misinterpretation of a question and giving unreasonable answers, such as earning €50,000 per week when it should be per year or giving one’s age as 150, or 21 when it is not.
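The four checks can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the records are invented, the lists of suspect names and addresses and the plausibility thresholds are arbitrary choices made for the example, and real de‐duplication usually needs fuzzier matching than the simple normalisation used here.

```python
import re

# Invented records for illustration only.
records = [
    {"name": "Ann Lee", "address": "1 High St", "age": 34, "weekly_income": 900},
    {"name": "ann  lee", "address": "1 High St.", "age": 34, "weekly_income": 900},
    {"name": "Donald Duck", "address": "10 Downing Street", "age": 150, "weekly_income": 50000},
    {"name": "Bo Chan", "address": None, "age": 41, "weekly_income": None},
]

# Assumed rules for the example; a real system would maintain these carefully.
SUSPECT_NAMES = {"donald duck"}
SUSPECT_ADDRESSES = {"10 downing street", "buckingham palace"}
MAX_PLAUSIBLE_AGE = 120
MAX_PLAUSIBLE_WEEKLY_INCOME = 20000


def normalise(text):
    """Lower-case, drop punctuation and collapse spaces so variants compare equal."""
    if text is None:
        return ""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def count_missing(rows):
    """Number of missing (None) values per field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            if value is None:
                counts[field] = counts.get(field, 0) + 1
    return counts


def deduplicate(rows):
    """Keep the first record for each normalised (name, address) pair."""
    seen, unique = set(), []
    for row in rows:
        key = (normalise(row["name"]), normalise(row["address"]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def quality_flags(row):
    """Flag a record for manual review; a flag is a prompt, not a verdict."""
    flags = []
    if (normalise(row["name"]) in SUSPECT_NAMES
            or normalise(row["address"]) in SUSPECT_ADDRESSES):
        flags.append("possible fraud")
    if row["age"] is not None and not 0 <= row["age"] <= MAX_PLAUSIBLE_AGE:
        flags.append("implausible age")
    income = row["weekly_income"]
    if income is not None and income > MAX_PLAUSIBLE_WEEKLY_INCOME:
        flags.append("implausible income")
    return flags


missing = count_missing(records)
unique_records = deduplicate(records)
flagged = {r["name"]: quality_flags(r) for r in unique_records if quality_flags(r)}
print(missing, len(unique_records), flagged)
```

Flags are kept as prompts for review rather than automatic deletions, since a name such as John Smith or an age of 21 may well be genuine.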

Clearly quality is of paramount importance to ensure maximum monetisation. This means that management and data processes should be in place to carry out quality control of data collection instruments and quality improvement prior to analysis.

In the digital environment, we need to consider both human and non‐human data sources, each with their specific quality issues.

8.4.5 Utility

To be useful, data has to have high information content, which is tantamount to having high information quality. Information quality (InfoQ) is considered to have eight dimensions:

data resolution
data structure
data integration
temporal relevance
chronology of data and goal
generalisability
operationalisation
communication.

InfoQ is defined in relation to the goal of the analysis. Data should be at the right level in all of the dimensions if InfoQ is to be high.

The four major marketing utilities are form utility, time utility, place utility and possession utility. More recent studies include psychological utility. In addition, the data needs to be sensitively priced.

8.5 Charging Mechanisms

8.5.1 Introduction

In this section, we look at how data arises and becomes the possession of the data owner. We then consider how the data can be sold on and how the price can be fixed. We identify a number of revenue strategies.

8.5.2 Data Acquisition

Most data is acquired as part of everyday business functions, such as inventory, invoicing, logistics and transactional processing. This is the valuable data resource that is at the heart of the new data analytics revolution.

Contact lists can be built up by a company as part of their trading history. The lists need to be refreshed periodically and more names can be purchased from companies specialising in identifying potential customers. Names can also be traded between businesses with a similar customer base, a transaction in which no money changes hands and both companies gain.

Some data is acquired as a legal requirement, as with census data. Businesses have a requirement to submit data routinely to government. They are also required to complete periodic business surveys when asked. This free donation of data does not need to go unrewarded if informed use is made of the combined results of the survey. Businesses are often reluctant to conform and refer to this data collection as a business burden.

Much data is acquired as a secondary input in addition to the core business of the company, say from tracking customers. Information is drawn in and intelligence is the product. These types of data have been discussed in terms of the profit they can bring when suitably monetised.

8.5.3 Revenue Strategies

There are different dimensions to data, often corresponding to database key variables. These include time, product and customer. Some revenue‐earning transactions involve all dimensions, some focus on just one or two.

Organisations whose value proposition is the data they hold may be referred to as ‘data brokers’. Examples are the companies who sell data to recruitment agencies. Where the value proposition is the insight from the analysis of the data that it holds, the company may be referred to as an ‘insight innovator’. Examples are telecommunication companies that sell insight around people’s communications and wellbeing. The insight innovator business model involves an interplay between statistics and business modelling to ensure that the analysis is deep and also business focused.

Data brokers simply sell data, whereas insight innovators preprocess the data before it is sold on in a more easily accessible and integrated form. The market for data products includes buyers who differ depending on their core business and the purpose for which they want the insight.

We can consider four main revenue strategies for data and data products:

Tariff model: There are different pay scales, such as gold, silver and bronze. Each scale functions as a further level of insight. This model is useful if it is decided that some data dimensions are much more valuable than others. However, if there are differences in the value within a data dimension then this method would yield poor outcomes.
Per data model: A traditional model for data brokers, where the level of information an organisation receives is determined by how much they are willing to pay. For insight innovators, for the buyer to be able to get more useful data insights, they would have to pay for more packets of data. This is an interesting model, but it would have to be deployed strictly to avoid giving ‘false’ insights.
Points model: This can be viewed as the ‘bitcoin’ model or a gift model. This is where insight is priced at different mark‐ups; the user pays for currency to use on the website, and then this currency is equivalent to a certain amount of insight or pay‐as‐you‐go insight. This is useful when different buyers value insight differently. It is also useful when the buyer has different interests across the various dimensions of data.
Advertising model: This model is traditionally used for many dotcom businesses where an advertising aggregator shows adverts, the content and relevance of which have been generated through analysis of the user’s search history. However, these advertisements can be viewed as a nuisance and the use of ad‐blocking technology results in the site generating minimal or no revenue from these adverts. A better advertising model would be to have targeted advertising relevant to the subject matter of the website.
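Two of the strategies above, the tariff model and the points model, can be sketched in Python to show how they differ mechanically. Everything here is an assumption made for illustration: the mapping of bronze, silver and gold to the time, product and customer dimensions, the class name `PointsAccount` and the prices in points are invented, not taken from any real pricing scheme.

```python
# Assumed tariff: each pay scale unlocks a further data dimension.
TARIFF_DIMENSIONS = {
    "bronze": {"time"},
    "silver": {"time", "product"},
    "gold": {"time", "product", "customer"},
}


def tariff_allows(tier, requested_dimensions):
    """True if every requested dimension is included in the buyer's pay scale."""
    return set(requested_dimensions) <= TARIFF_DIMENSIONS[tier]


class PointsAccount:
    """Pre-paid site currency that is spent on individual insights."""

    def __init__(self, balance):
        self.balance = balance

    def buy_insight(self, price_in_points):
        """Deduct the price if affordable; return whether the purchase succeeded."""
        if price_in_points > self.balance:
            return False
        self.balance -= price_in_points
        return True


print(tariff_allows("silver", ["time", "product"]))   # within the silver scale
print(tariff_allows("bronze", ["customer"]))          # needs a higher scale

account = PointsAccount(balance=100)
print(account.buy_insight(60), account.balance)       # affordable
print(account.buy_insight(60), account.balance)       # not enough points left
```

The tariff model fixes access by scale, whereas the points model lets each buyer decide how to spread a budget across insights, which is why it suits buyers who value insight differently.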

The successful data broker or insight innovator will combine and adjust these different revenue models to maximise sales. It is of paramount importance to ensure that the data or insight gives a pure message and does not accidentally mislead. For example, if the buyer requests a time series of market interest in a certain product, it has to be noted that the underlying population can be changing over time so that an increase in interest could be due to an increase in a certain subset of people using the company generating the data. This complicated issue is addressed in the next chapter.
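The population effect can be illustrated with a small Python sketch. The monthly figures are invented, and dividing by the number of active users is only one simple adjustment offered as an assumption here, not the treatment given in the next chapter.

```python
# Invented monthly figures for illustration.
months = ["Jan", "Feb", "Mar"]
interested_users = [200, 300, 450]         # users showing interest in the product
active_users = [10_000, 15_000, 22_500]    # all users of the company generating the data

# Interest expressed as a share of the underlying population.
interest_rate = [i / n for i, n in zip(interested_users, active_users)]

# Growth over the period, before and after adjusting for population.
raw_growth = interested_users[-1] / interested_users[0] - 1
rate_growth = interest_rate[-1] / interest_rate[0] - 1

for month, count, rate in zip(months, interested_users, interest_rate):
    print(f"{month}: {count} interested, {rate:.1%} of active users")
print(f"Raw growth {raw_growth:.0%}, population-adjusted growth {rate_growth:.0%}")
```

In this invented case the raw count more than doubles while the share of interested users does not change, so selling the raw series alone would mislead the buyer.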

There are reputational risks in selling data. The quality of the data and of the transaction reflect directly on the seller. There needs to be a follow‐up process and open exchange with the buyer in the same way as for any other business transaction.

8.5.4 Quantifying the Baseline

Before any monetisation exercise is started, the baseline costs and benefits need to be evaluated so that the change, hopefully an increase in value, can be monitored. The company needs to decide on suitable key performance indicators (KPIs).

Each piece of data has a cost to the producer and owner arising from storage, processing and other related functions. The number of times each dataset is used gives an indication of its value. It is not unusual to find that some data items are collected but are of no interest or use to anyone, although this may change in the future. The number and type of errors in the data are an important indication of the data quality and utility; monitoring this KPI feeds into the data improvement cycle. At the start of the monetisation process, it could be said that a particular set of data items brings in no revenue and only has a cost. As monetisation progresses, this will change and the revenue from actual sales can be monitored as well as the more intangible benefits of increased market presence, customer trust and loyalty, and changed business focus and reputation.
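A baseline of this kind could be recorded along the following lines. The sketch is in Python; the class `DatasetBaseline`, its fields and all of the figures are hypothetical and chosen only to mirror the quantities mentioned above (cost, usage, errors and revenue).

```python
from dataclasses import dataclass


@dataclass
class DatasetBaseline:
    """Baseline KPIs for one dataset before monetisation starts."""
    name: str
    annual_cost: float            # storage, processing and related functions
    times_used: int               # how often the dataset was used in the period
    records: int
    records_with_errors: int
    annual_revenue: float = 0.0   # no revenue at the start of monetisation

    @property
    def error_rate(self):
        return self.records_with_errors / self.records if self.records else 0.0

    @property
    def net_value(self):
        return self.annual_revenue - self.annual_cost


# Invented figures for illustration.
baseline = [
    DatasetBaseline("transactions", 5000.0, 120, 100_000, 500),
    DatasetBaseline("legacy survey", 800.0, 0, 2_000, 300),
]

unused = [d.name for d in baseline if d.times_used == 0]
for d in baseline:
    print(f"{d.name}: error rate {d.error_rate:.1%}, net value {d.net_value:.2f}")
print("Collected but unused:", unused)
```

Recomputing the same figures at regular intervals shows whether revenue begins to offset cost and whether the error rate is falling.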

8.5.5 Statistical Process Control for KPIs

KPIs need to be collected at suitable time intervals and should be analysed using the methods of statistical process control (SPC). As noted in Section 5.6.8, manufacturing industry, process industries, healthcare, finance and other sectors all make considerable use of SPC. In the first phase, the method involves plotting KPIs to visualise their patterns and changes over time. Where appropriate, expected values and control limits are constructed from representative data and then the SPC chart gives a powerful improvement tool. KPIs within the control limits indicate that the process is running smoothly with no change; points outside of the control limits imply that a real improvement (or decline) has taken place. The bibliography contains some useful references.
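As a minimal sketch of the idea, the Python below builds an individuals control chart for a single KPI, one common SPC chart among several. Control limits are set at three estimated standard deviations either side of the mean of a representative baseline, with the standard deviation estimated from the average moving range divided by the standard constant 1.128. The KPI values are invented and the function names are assumptions.

```python
from statistics import mean

D2_FOR_MOVING_RANGE_OF_2 = 1.128  # standard SPC constant for ranges of two points


def individuals_limits(baseline):
    """Lower limit, centre line and upper limit from representative baseline data."""
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_FOR_MOVING_RANGE_OF_2
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat


def out_of_control(points, lower, upper):
    """Positions and values of points falling outside the control limits."""
    return [(i, x) for i, x in enumerate(points) if x < lower or x > upper]


# Invented KPI readings for illustration.
baseline_kpi = [100, 102, 98, 101, 99, 100, 103, 97]
lower, centre, upper = individuals_limits(baseline_kpi)

new_kpi = [101, 95, 110, 99]
signals = out_of_control(new_kpi, lower, upper)

print(f"Centre {centre:.1f}, limits {lower:.2f} to {upper:.2f}")
print("Signals:", signals)
```

Points inside the limits are treated as routine variation; a point outside them is a prompt to investigate whether a real change has taken place.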

8.6 Connectivity as an Opportunity for Streamlining a Business

Another way to monetise data is to take advantage of it to run the business more efficiently. The organisation can employ smartphone technology to cut its own running costs in the workplace, use smart climate control and lighting, reduce dependence on paper, or save office space. All of these benefits can be monitored as key performance indicators, showing where there are improvements and where action is needed. It is also popular to capture ongoing responses of customers and employees by inviting feedback via smartphone and other devices, for example when you go through customs and are presented with an electronic pad showing five faces ranging from miserable to happy and are invited to tap on whichever one of them expresses how you feel. On the surface, this step looks to be for the benefit of the respondents, allowing them to express themselves, but it also potentially allows the site operator to gather data about the number of users, timing and location, and monitor the flow through the service, assuming most people interact.

Changed methods of connecting with the customer also lead to opportunities for companies to exploit them. This has been seen in banking, where the transition from cheque books and paper to online banking was encouraged by making everything free initially. Now that there is no going back and most people have converted away from using cheques, charges are added for these transactions.

In health and other insurance products, self‐measurement data starts out being optional and beneficial to prices, but increasingly becomes the norm and incurs a cost if not provided. The cost can be in monetary terms but also in convenience. For example, if we agree to our data being added to the national data bank we are given immediate feedback about our health and prospects, but if we do not donate the data, the threat is that our feedback will only come in the form of aggregated reports.