Gerhard Drexler
Mondi UFP
Andrej Duh
University of Maribor
Andreas Kornherr
Mondi UFP
Dean Korošak
University of Maribor
Big Data refers to mostly unstructured datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. The emergence of the ability to gather and analyze these massive amounts of data is one of the major shifts caused by Big Data. This chapter will provide insight into this shift and offer a set of approaches toward the utilization of Big Data in terms of strategic foresight.
In today's world of widely distributed knowledge, innovative companies can no longer afford to base their development solely on their own knowledge and research. Open Innovation (OI) in particular requires intuitive tools that integrate data into day-to-day processes and translate them into tangible business actions. When combining internal and external resources, ideas, and technologies, organizations often face difficulties in identifying external knowledge. Open Innovation depends largely on the ability to scan, interpret, and incorporate both external knowledge and external partners. Companies have started to look for new ways to increase the efficiency and effectiveness of their innovation processes. For instance, innovation activities may be fostered through an active search for new technologies and ideas outside of the firm, but also through cooperation with suppliers and competitors. Another important aspect is the further development or out-licensing of ideas and technologies that do not fit the company's own strategy. A number of activities have already been implemented to achieve these tasks, e.g., checking publications in scientific journals, analyzing patents, and attending conferences and seminars. In addition, managers have to tackle further issues like searching for new ideas, evaluating the market potential of a given opportunity, recruiting prospective partners, capturing value through commercialization, and extending innovative activities together with external partners. As the identification and use of external knowledge for technical advancement is an important part of the OI process, the interaction with external sources of knowledge is crucial for firms in order to raise their innovative performance. If substantial resources have to be dedicated to these tasks, companies face a real dilemma: how to focus on the most promising sources of external knowledge while keeping an eye on more visionary and radical inputs.
In other words, it is both important to pay attention to knowledge and ideas that are most likely to generate short-term profits and to scan for more future-oriented options which could result in breakthrough innovations.
Open Innovation, as a process of co-creating and co-developing together with outside partners, strives to combine internal and external knowledge and technologies. Managers responsible for Open Innovation may address these challenges in basically two ways: one is by introducing organizational changes in the innovation environment, and the other is by introducing and using Big Data technologies.
Reorganizing the innovation environment in a world of huge amounts of mostly unstructured data involves establishing data science methods within the organization or using external data science teams. A number of organizations have already appointed senior executive positions related to data science, for example, a CDO (Chief Data Officer). While the term “data scientist” has indeed become one of the buzzwords related to Big Data and many organizations are considering hiring or setting up data science teams, it is important for managers to know exactly what they are looking for when establishing such positions. One of the possible descriptions of a data scientist is given by IBM (www-01.ibm.com/software/data/infosphere/data-scientist/): “The data scientist will sift through all incoming data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.”
Social media technologies and data mining tools bear the potential for idea generation, idea sharing, and measuring the responses to ideas. The product development process can be enhanced using mobile computing for distributing work, fast prototyping, and testing. Advanced analytics can be used to support the market launch phase by measuring the impact of the new products, the sentiment of customers, and the response of users.
Data science comprises a set of principles that support and guide the targeted and efficient extraction of information and knowledge from data. It involves principles, processes, and techniques for identifying and understanding phenomena through the analysis of data.
One of the emerging key approaches is accessing digital resources including media such as email, blogs, Facebook, and Twitter, together with the integration of previously unrelated and unstructured data from various sources. This is the “Big Data” approach. The availability of these huge amounts of data, along with both proven and novel statistical tools, offers a new way of understanding the world (see Figure 11.1).
Figure 11.1: Worldwide Amount of Big Data Streams from Different Sources in 2012
According to a recent joint survey of more than 1100 business and IT executives, most organizations are in the early stages of incorporating Big Data solutions (47 percent are planning Big Data activities, 28 percent are in a pilot phase, and 24 percent have not yet commenced Big Data activities). Almost half of the respondents (49 percent) answered that the top priority of their organizations, when it comes to Big Data, is customer-centric objectives (the other functional objectives were: operational optimization (18 percent), risk/financial management (15 percent), new business models (14 percent), and employee collaboration (4 percent)). Organizations therefore see Big Data enabling “a more complete picture of customers' preferences and demands; through this deeper understanding, organizations of all types are finding new ways to engage with existing and potential customers” (Schroeck et al., 2012).
The question is: How do organizations or enterprises use Big Data to sense inputs for prospective innovations, gain new insights for early decision making, or identify the best partners for joint research and development? This chapter focuses on the utilization of various data channels and social media for the automatic generation of interactive knowledge data streams. It describes the design and implementation of open collective intelligence to detect important messages from a huge amount of various data channels and social network data. Organizations must be able to collect, classify, interpret, and exploit information from unstructured multimedia sources in order to obtain structured knowledge and information about developments with high predictive value in defined business sectors. These processes are discussed in detail, supported by two case studies featuring industrial applications and their outcomes. The utilization of new tools also enables companies to overcome their blind spots. Typically, companies either concentrate on the details of a certain market segment and try to catch as much information as possible, or they screen several segments trying to get a holistic view. Thus one gets trapped either by a very narrow field of view, overlooking most other challenges and opportunities, or by a wide-field analysis, neglecting details and staying on the surface only.
Figure 11.2 depicts that dilemma in the form of a radar screen that comprises six different segments of high importance for the success of a company—trends, technology, customers, gap analysis, competitors, and market. For example, trends in the form of habits or behaviors prevalent among customers should be identified as early and exactly as possible to make sure that appropriate new goods and services are made available in a timely manner. Similarly, technology-based companies have to sustain and develop their technological competences. Techniques to determine what steps need to be taken in order to move from a current state to a more favorable future state (e.g., gap analysis), require the identification of characteristic factors of the present situation and the factors needed to achieve future objectives.
Figure 11.2: Graph Symbolizing the Narrow and Wide Field View on the Six Different Fields Being Relevant for Innovation and Sustainability of a Company
Many people and organizations still trust in experts' statements, forecasts, and arguments. The reason for such behavior can be traced back to the so-called “halo effect”: the more convincingly an expert communicates his or her belief, the more we are willing to trust him or her. Normally, people are fascinated by experts who explain why something is going to happen—they give the public easily understandable explanations and bring order into a very complex and complicated world. What experts basically suggest is that, based on their experience and knowledge, they are able to give a precise forecast. Unfortunately, such forecasts are in many cases simply wrong. Nate Silver, an American data scientist and writer, impressively demonstrates this phenomenon in his recent book The Signal and the Noise: Why So Many Predictions Fail—But Some Don't. He analyzed the accuracy of the forecasts of TV pundits in the show “The McLaughlin Group.” Half of the experts' forecasts were wrong—in other words, listening to experts may sometimes be not much better than tossing a coin.
Of course, this is not to say that society does not need any experts. Scientists, doctors, technicians, and the like, who usually build their expertise on knowledge-based models, experience, and data, will still be of tremendous importance. However, it will become increasingly important to replace mere gut feeling, often called experience, with data-driven approaches. In politics, economics, management, science, and medicine, both the why and the what will be important. If the inputs for such activities, namely the what, are based on data-driven correlations which include not just some but a huge amount of data, the results may exhibit completely new inherent properties. It is not just another piece of statistics; it is—when using appropriate mathematical and statistical models—the beginning of the utilization of Big Data.
So far, there is no standardized definition of Big Data, but most people agree that Big Data science deals with data of such large quantity that they no longer fit into a single spreadsheet or a relational database, nor can they be analyzed by hand. Accordingly, new tools have to be developed to handle these enormous amounts of data and to analyze and interpret them. Mayer-Schoenberger and Cukier (2013) state that “Big Data refers to things one can do at large scale that cannot be done at a smaller one, extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.”
Many leading organizations strive for improvement of their search processes by information and communication tools for screening scientific and patent databases, identifying trends, generating and evaluating ideas, identifying prospective partners, and analyzing social networks. A number of examples based on the analysis of internal and external networks have been described by Drexler and Janse (2013). But Big Data goes beyond the analysis of single data sources or simple network structures. For the purpose of this chapter we will simply take Big Data to mean datasets that are too large for traditional data-processing systems and that therefore require new technologies. Extracting useful knowledge from data to solve business problems can be treated systematically by following a process with reasonably well-defined stages. Figure 11.3 depicts how data input from a variety of sources may be utilized to provide input to the front end of Open Innovation. Probably the broadest current business applications are in marketing, such as targeted marketing, online advertising, customer behavior analysis, and recommendations for cross-selling. Other applications focus on competitive intelligence and technology foresight. But what is going to matter most is the dynamic analysis of issues related to new market demands, emergence of new and convergence of existing technologies, and the quick identification of competent and reliable partners for Open Innovation.
Figure 11.3: How Big Data Approaches Provide Input into the Front End of Open Innovation
Over the past years, many applications have been developed based on Big Data, and many more continue to evolve, as a look at Google Trends shows. This tool displays a continuing and sharp increase in search volume for the term “Big Data,” especially since late 2011 (Figure 11.4).
Figure 11.4: Google Trends (www.google.com/trends) Results for the Search Term “Big Data” in the Period Jan. 2004–Aug. 2013
Big Data isn't just a description of huge amounts of data; it is about identifying and understanding the relations and correlations among pieces of information, and it's about predictions. Big Data analytics enable organizations to capture new data sources in order to gain the insights that they offer.1
Ideally, Big Data analytics provide an overview on emerging issues, grouped into different topics of interest, available in real time, and applicable to in-depth analysis when necessary. This makes them perfectly suited for fostering Open Innovation.
Figure 11.5 summarizes the idea for the top layer of a Big Data tool. Information and communication technologies provide us with convenience and ease of access to information. Very current and important pieces of information can be derived from the analysis of huge amounts of unstructured data and shown on the computer screens or mobile devices of top executives on demand, based on their interests and in accordance with company strategy and predefined search fields. The topics may range from technology screening to trend scouting, customer responses, and much more, symbolized by the six different tag clouds in Figure 11.5. (The tag clouds are similar to the six different innovation fields in Figure 11.2.) The basic idea is that such Big Data tools provide structured knowledge and information on a daily basis in a very condensed form. In the best case, such a tool takes you only five minutes per day to digest a specific amount of new knowledge. It does not need your attention for longer than the time one needs to drink a cup of coffee, which is why we call it a “Cup of Information” (CoI).
Figure 11.5: Graphical Visualization of the Cup of Information Approach Described in this Chapter; A Functionality Which Provides Executives With Information Based on Big Data
Big Data is characterized by four Vs: Volume, Velocity, Variety, and Veracity. Data volume has been an issue for a while, and most organizations still struggle with the increasing size of their databases. Excessive volume is not only a storage issue but also a complex analytical problem. Velocity has two aspects: how fast data is being produced and how fast the data must be processed. While the first two Vs already posed problems in the early 1990s, for example in credit-card processing, Big Data has significantly expanded horizons by integrating data from a variety of sources (Variety). A three Vs model was proposed by Douglas Laney (2001). Veracity is a very important issue, as most Big Data comes from sources outside a company. Veracity represents both the credibility of the data source and the suitability of the data for the target analysis. There are many cases where the four Vs can be handled using traditional tools and technologies; Big Data actually happens when every incoming observation is evaluated against all prior observations.
Consumer-generated media buzz is well known for its significant influence on brands and an excellent example of the importance of the four Vs for modern business. It has forced companies to communicate with consumers through consumer-generated channels such as social media. As IBM writes on its social business website (www.ibm.com/social-business/us/en/): “Businesses move from liking to leading when they look beyond social media to see how social technologies drive real business value. From marketing and sales to product and service innovation, social media is changing the way people connect and the way organizations succeed.”
The predictive power of social media such as Twitter has also been demonstrated. Analyses of tweets have been used to predict box-office movie revenues, where the attention a movie receives has been found to be strongly correlated with its future ranking. Sentiment analysis is important in improving prediction; specifically, collective mood states detected from daily Twitter feeds were used to predict changes of DJIA (Dow Jones Industrial Average) closing values (Bollen, 2011). The tweets were analyzed using two sentiment analysis tools: OpinionFinder (http://mpqa.cs.pitt.edu/opinionfinder/) and the Google-Profile of Mood States (GPOMS), which evaluates mood states against six mood dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). For example, the day-to-day DJIA values were found to correlate with the “Calm” dimension time series lagged by three days. Such results indicate a speculative possibility that the collective mood states of the general public could affect the stock market, but such claims need further careful examination and testing (http://sellthenews.tumblr.com/post/21067996377/noitdoesnot).
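To make the lag analysis concrete, the following Python sketch computes the correlation between a mood time series and later day-to-day index changes. The data is synthetic and the function name is our own invention; this is not Bollen's code, only an illustration of the lagged-correlation technique.

```python
import numpy as np

def lagged_correlation(mood, index_changes, lag):
    """Pearson correlation between a mood time series and day-to-day
    index changes, with the mood series leading by `lag` days."""
    x = mood[:len(mood) - lag]  # mood observed `lag` days earlier
    y = index_changes[lag:]     # the later index changes
    return np.corrcoef(x, y)[0, 1]

# Synthetic illustration: a "calm" series drives index changes 3 days later.
rng = np.random.default_rng(0)
calm = rng.normal(size=200)
changes = np.empty(200)
changes[:3] = rng.normal(size=3)
changes[3:] = 0.8 * calm[:-3] + 0.2 * rng.normal(size=197)

# Scanning candidate lags recovers the planted 3-day lead.
best_lag = max(range(6), key=lambda k: lagged_correlation(calm, changes, k))
print(best_lag)
```

Scanning a range of lags in this way is how one would discover that a mood dimension leads the index rather than merely tracking it.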
Instead of considering particular sectors or industries (such as the financial sector in the previous examples), T. Preis and colleagues (2012) focused on quantifying the link between online traits and economic indicators of entire countries. In their study “Quantifying the Advantage of Looking Forward,” the authors defined the “future orientation index.” For each year, the index is defined as the ratio of the search volume for the coming year to the search volume for the previous year (i.e., for the year 2009, the ratio of the volume for the search term “2010” to that for “2008”). The future orientation index was computed for 45 countries and the results compared with the countries' GDP. There was a strong correlation between the future-oriented search habits of users from a particular country and that country's GDP. The authors suggest two possible explanations for these results: “Firstly, these findings may reflect international differences in attention to the future and the past, where a focus on the future supports economic success. Secondly, these findings may reflect international differences in the type of information sought online, perhaps due to economic influences on available Internet infrastructure.”
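The index itself is just a ratio of two search volumes, which can be sketched as follows (the volumes and the function name `future_orientation_index` are illustrative assumptions, not taken from the paper):

```python
def future_orientation_index(volumes, year):
    """Ratio of search volume for the coming year to that for the
    previous year, for searches made during `year`. `volumes` maps a
    year-number search term (e.g. "2010") to its observed volume."""
    return volumes[str(year + 1)] / volumes[str(year - 1)]

# Hypothetical search volumes observed during 2009:
volumes_2009 = {"2008": 80, "2010": 120}
print(future_orientation_index(volumes_2009, 2009))  # prints 1.5
```

An index above 1 indicates that users search for the coming year more often than for the past one, which the study found to correlate with higher GDP.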
For product development, especially in the early stages of research and development, it may be even more complicated to identify the appropriate data sources. Accordingly, it is very important that companies rely on a holistic approach, as described earlier as the Cup of Information principle. Only by considering all segments that are relevant to a certain business does it become more feasible to develop novel and successful products. Sustainable product development is therefore much more than listening to some tweet streams or catching the latest trends with some IT tools—it is the combination of all of these and much more data to get a precise, but still wide enough, picture.
John Grinder, Ford's Big Data chief, underlines that the U.S. car producer is looking to leverage Big Data to make its products better. “The fundamental assumption of Big Data is that the amount of that data is only going to grow and there's an opportunity for us to combine that external data with our own internal data in new ways. For better forecasting or for better insights into product design, there are many, many opportunities. We recognize that the data on the Internet is potentially insightful for understanding what our customers or our potential customers are looking for and what their attitudes are, so we do some sentiment analysis around blog posts, comments, and other types of content on the Internet” (Hirner, 2012). Another source of Big Data for Ford, and virtually every other car producer, is the mass of sensors built into modern cars, sensing temperature, pressure, humidity, and local gas concentrations, not to forget all the cameras and many more devices that now, and especially in the future, will monitor many parameters in and around a car. As Grinder further mentions, “That's a huge unexplored opportunity for us. Can you build better weather forecasts? Can you make better traffic predictions? Can you help asthmatics avoid certain areas? Can you control the airflow in the car? Never before did we have all of this data available to us nor did we have the computing power to handle it all. The killer app may be one that we haven't really anticipated yet.” It is very likely that in the future the automobile industry will offer completely new services, and former classical IT companies like Google will perhaps become well known for their self-driving cars, utilizing their immense geospatial data.
The examples mentioned above have demonstrated that the analysis of massive data streams offers exciting possibilities for extending an organization's foresight. Is it not then important for organizations to recognize the need to use Big Data analytics methods also for their internal development and management processes, and to manage their own structured, “small” data?
In addition, Jeff Jonas (2007) noticed that organizations sometimes suffer from “enterprise amnesia,” a situation “when an organization misses the obvious (e.g., when other relevant information is trapped elsewhere in their organization) and then takes incorrect action . . . or simply forgetting what was known or should have been known.”
Figure 11.6 shows that the divergence between a fast-increasing observation space (data streams) and the slower rise of sense-making algorithms opens the gap for enterprise amnesia. Jonas suggests that the cure for enterprise amnesia requires perpetual analytics in the form of “a capability whereby the data actively finds the data,” which in our view corresponds to implementing technologies that use both internal and external data streams.
Figure 11.6: Enterprise Amnesia as a Divergence Effect Between Faster Increasing Observation Space (Data Streams) and Slower Rise of Sense-Making Algorithms
After Jonas (2007)
In the Wired article “Leveraging Big Data to Reach Today's Mobile Consumer,” Nelson Estrada (2013) argues that using Big Data is the marketer's response to an ever-increasing number of mobile users, expected to surpass desktop users in 2014. Specifically, “companies are taking advantage of big data analytics to help them better cater to today's consumer” because “today's consumer wants businesses to understand their needs more in depth.” Geo-specific advertising and shopping recommendations are two examples of how companies can influence mobile customers using Big Data.
Geo-specific advertising depends on the geo-location information of customers. Anonymized mobile phone datasets have already been used to explore whether human mobility is predictable. The uniqueness of human mobility traces was recently demonstrated in the study “Unique in the Crowd: The Privacy Bounds of Human Mobility,” by de Montjoye and co-workers (2013). Using a massive set of human mobility data, they showed that “uniqueness of human mobility traces is high” and that “uniqueness means that little outside information is needed to re-identify the trace of a targeted individual even in a sparse, large-scale, and coarse mobility dataset.” Only four random spatio-temporal points are sufficient to uniquely identify almost all (95 percent) mobility traces.
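A toy version of that uniqueness measure can be sketched in a few lines of Python; the traces and the helper function below are invented for illustration and are vastly smaller and simpler than the study's actual mobility dataset.

```python
import random

def fraction_unique(traces, k=4, seed=0):
    """Fraction of traces uniquely pinned down by k randomly chosen
    spatio-temporal points drawn from that trace (toy version of the
    uniqueness measure in de Montjoye et al.)."""
    rng = random.Random(seed)
    unique = 0
    for trace in traces:
        points = set(rng.sample(sorted(trace), k))
        # A trace is "re-identified" if it is the only one containing
        # all k observed points.
        matches = [t for t in traces if points <= t]
        if len(matches) == 1:
            unique += 1
    return unique / len(traces)

# Toy traces: each is a set of (hour, cell-tower) observations.
traces = [
    {(h, (7 * i + 3 * h) % 10) for h in range(8)}
    for i in range(5)
]
print(fraction_unique(traces, k=4))  # prints 1.0
```

In this contrived example every trace is unique given four points; the striking finding of the study is that real, coarse mobility data behaves almost the same way.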
The shopping recommendations strategy uses customers' Internet activity data to offer new or similar products or services based on their views, clicks, or choices. The study “Private Traits and Attributes are Predictable from Digital Records of Human Behavior” has shown that, using just Facebook Likes, a wide range of personal attributes, from age and gender to sexual orientation and religious and political views, can be predicted. The matrix of user-Like binary values was first reduced to a user-components matrix (with 100 components), from which a prediction model was constructed. The authors suggest that “the relevance of marketing and product recommendations could be improved by adding psychological dimensions to current user models” but, on the other hand, warn that “the predictability of individual attributes from digital records of behavior may have considerable negative implications, because it can easily be applied to large numbers of people without obtaining their individual consent and without them noticing” (Kosinski et al., 2013).
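The two-step pipeline described above (reduce the binary user-Like matrix to components, then fit a prediction model) can be sketched with synthetic data. Everything below is an illustrative assumption: the planted trait, the ten components, and the simple least-squares classifier; the study itself used 100 SVD components and regression models on real Likes data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical user-Like matrix: rows are users, columns are Likes (1 = liked).
# A hidden binary trait shifts the Like probabilities of the first ten items,
# so the trait is implicitly encoded in the Like pattern.
n_users, n_likes, n_components = 300, 50, 10
trait = rng.integers(0, 2, n_users)
probs = np.full((n_users, n_likes), 0.1)
probs[trait == 1, :10] = 0.6
likes = (rng.random((n_users, n_likes)) < probs).astype(float)

# Step 1: reduce the binary matrix to a user-components matrix via SVD.
U, s, Vt = np.linalg.svd(likes, full_matrices=False)
user_components = U[:, :n_components] * s[:n_components]

# Step 2: fit a simple linear model predicting the trait from the components.
X = np.column_stack([user_components, np.ones(n_users)])  # add intercept
w, *_ = np.linalg.lstsq(X, trait, rcond=None)
predicted = (X @ w) > 0.5
accuracy = (predicted == trait).mean()
print(round(accuracy, 2))
```

Because the trait shifts Like probabilities strongly, the components capture it and the linear model recovers it with high accuracy, mirroring the study's finding that low-rank structure in Likes is enough to predict personal attributes.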
Instead of tracking people and their traits, many innovators are interested in tracking ideas and content as they spread, supported by social media. In Spreadable Media: Creating Value and Meaning in a Networked Culture, Jenkins, Ford, and Green (2013) discuss the properties and processes of diffusing or spreading content through users' dynamic networking, sharing, and collaborating. In contrast to “stickiness,” which measures the influence of content by the volume of its users, spreadability “recognizes the importance of the social connections among individuals, connections increasingly made visible (and amplified) by social media platforms. This approach may still include quantitative measures of how frequently and broadly content travels, but it makes important actively listening to the way media texts are taken up by audiences and circulate through audience interactions.”
In an Open Innovation environment, ideas must travel, be shared, and be combined or, as Jenkins, Ford, and Green (2013) put it: “Our message is simple and direct: if it doesn't spread, it is dead.” We therefore present and discuss a simple but powerful model to describe the spreading of content driven by a social network of users. These users interact with the content (create, change, update, comment, like, tweet, cite, etc.). Content can take any form of text, image, video, sound, or a convergent combination of these media, including metadata about the content itself. Users also form or belong to a social network through which they can communicate. Consider an example where the contents are ideas in an Open Innovation world. We are interested in a measure for the transfer and absorption of an idea by others, a process that is expected to boost innovation by cross-pollination of ideas. We apply the radiation model that was recently developed to predict mobility patterns (Simini et al., 2012), where the basic underlying process is based on the emission and absorption of particles. A particle (in our case an idea, a part of it, a reference to an idea . . .) is emitted from location A with a certain absorption threshold, and it is absorbed at location B if the absorbency (a property of location B) is greater than the absorption threshold. It turns out that within this model the average flux of particles from A to B can be written explicitly.
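For reference, the closed-form average flux of the radiation model (Simini et al., 2012) reads as follows; interpreting the "populations" as amounts of content or numbers of users at each node is our adaptation, not part of the original paper:

```latex
% Average flux from location i to location j in the radiation model.
% T_i: total number of particles (ideas) emitted from i;
% m_i, n_j: populations at i and j (in our reading, e.g., the amount
% of content or the number of users at each node);
% s_{ij}: total population within a circle of radius r_{ij} centered
% at i, excluding the populations of i and j themselves.
\langle T_{ij} \rangle = T_i \,
  \frac{m_i \, n_j}{(m_i + s_{ij})\,(m_i + n_j + s_{ij})}
```

Notably, the flux contains no free parameters: once the populations and the total outflow are known, the expected flow of ideas between any two nodes follows directly.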
This model can also be viewed as the coupling of two networks: there is a social network of users interacting with the content and with each other, and then there is a content network characterized by links between content given by the measure of spreadability. The two networks are coupled and interdependent—the changes of the contents influence the interaction of users with it, and user actions change the contents. Implementation of this model into a Cup of Information approach promises valuable predictions for organizations. Besides suggesting the possible topics or content of interest and ranking them automatically or according to users' interests, a more important feature of the model is the ability to make predictions on the future influence of the content by estimating the spreading of the content. Figure 11.7 shows the flowchart of a Cup of Information tool that uses the analysis and prediction module described above. A Cup of Information is a condensed, clear, and comprehensive short report on daily trends, topics, and social media buzz, tailored to a user's (let's say a CXO of an organization) interest and needs. The report is automatically constructed from monitoring and analyzing external Big Data streams (corresponding to the six clouds depicted in Figure 11.5), connecting them with internal data streams. The starting point for the internal data stream is the user's website and social media data with an option to add additional keywords, so the application can basically run without any intervention from the user. The internal data streams are the input for the definition module, which constructs the users' interest based on the internal data flow. The output of the definition module is correlated with external Big Data sources in the analysis and prediction module. 
Based on the spreadable content model for the analytical part and a prediction module, a brief Cup of Information report is prepared (see Figure 11.7), which may be extended to a detailed, in-depth report on selected topics. The selection and focus steps are used to adjust the “radar screen” depicted in Figure 11.2. The information from the selection process can be used as a feedback mechanism.
Figure 11.7: Flowchart of the Application Model for the Cup of Information Approach for Leveraging
The Cup of Information approach has already been used to help boost the Open Innovation process in organizations, for example in an insurance company. The daily Cup of Information contains a lot of standard information, like new products of competitors, political news regarding the insurance branch, customer feedback on various products, etc., but the CoI also highlights new and upcoming topics and, most importantly, how they are connected. For example, one major trend is the higher mobility of seniors: people who are retired and use their leisure time for traveling, exploring foreign cultures, and visiting exotic countries. This trend is driven by a steadily growing number of offers for this target group by travel companies. As the CoI tool analyzes the internal database of the insurance company (see Figure 11.8), the software “knows” that traveling and travel insurance are relevant for that kind of company. Accordingly, it uses “travel” as one of many keywords when doing its semantic search of the external data streams. However, the new quality brought in by using a Big Data approach is the connection between this trend and other internal and massive external data.
Figure 11.8: Detailed View of the Tag Cloud Trends of the CoI for an Insurance Company on May 17, 2012
Taking a closer look at the tag cloud Trends of the insurance company's CoI shown in Figure 11.8, one can decide whether these trends might be of further interest (among other relevant trends; normally, between three and five themes are presented per cloud). If yes, a click on a specific trend (not shown here) opens a detailed analysis which shows the connections between “high mobility of seniors” and other tag clouds like Technology, Markets, Customers, etc. Surprisingly, the CXO of the insurance company realized that there seems to be a strong link to a technology called “paper microfluidics,” a term nobody inside an insurance company would ever search for. However, a closer look revealed that this technology is a potential key enabling technology for developing mobile diagnostic devices. By coupling such small but powerful devices with a smartphone, one could offer completely new diagnostic methods for seniors even at remote destinations. Accordingly, this insurance company started a project aimed at offering a completely new product for senior travelers: a package combining a classical travel insurance with a new, high-performance but low-cost diagnostic device, which can considerably reduce the risk for elderly people. The company may thus kill two birds with one stone: bringing down costs (seniors who use the device will have better control of critical medical parameters and thus avoid illness) while, at the same time, giving customers a feeling of security. As people will soon notice that this insurance company really cares for its clients, the acquisition of new contracts becomes more likely. A byproduct of this approach is the identification of partners for the whole Open Innovation value chain.
Another application of Big Data was studied in the paper industry. As in other industries, product developers have to continuously watch out for emerging technologies. This can be accomplished by screening patents, but unfortunately patents do not reveal if and when the respective technology will be commercialized. Another way to identify new printing devices is to monitor the announcements of original equipment manufacturers (OEMs) and other high-tech companies. Big OEMs like Canon, HP, or Xerox can be watched quite easily via their websites, but finding out what is going on inside their R&D pipelines is almost impossible. In addition, exploring the R&D activities of the large number of small but innovative companies requires significant effort, and potentially disruptive products of startups are also hard to detect early enough. Big Data methods can help here by screening the Internet, databases, and social media, identifying new terms and correlating them with already known terms like “paper,” “printer,” and “images.” As in the case of the insurance company, the internal input was derived automatically from the company's homepage by special software and refined by assigning priorities and adding further terms of interest. The external input comprised a science database, a patent database, the World Wide Web, blogs, and tweets. As described earlier in this chapter, the software detects related issues based on both the speed of spread and the correlation between internal and external terms. An operator is in charge of maintaining the accuracy of the software by checking internal and external terms for relevance and priority. This procedure is quite easy for the operator because the software provides a list of proposals derived from the company's homepage; the only tasks are to remove keywords that are outside the company's scope and to add new ones where necessary.
The operator also decides on the size of the correlation matrix, the number of correlated word pairs, and weighting factors such as the dynamics of a correlation. Dynamics is defined as the number of citations of the respective word pair per time unit, e.g., per week or month.
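To make the dynamics measure concrete, the definition above can be sketched as citations of a word pair per time unit. The citation log, field layout, and weekly unit below are illustrative assumptions, not the actual implementation of the CoI software.

```python
from collections import defaultdict
from datetime import date

# Hypothetical citation log: (internal term, external term, date of citation).
citations = [
    ("paper", "LumeJet", date(2012, 5, 1)),
    ("paper", "LumeJet", date(2012, 5, 3)),
    ("print head", "LumeJet", date(2012, 5, 4)),
    ("paper", "LumeJet", date(2012, 5, 10)),
]

def pair_dynamics(citations, days_per_unit=7):
    """Citations per time unit (here: per week) for each word pair."""
    dates = defaultdict(list)
    for internal, external, when in citations:
        dates[(internal, external)].append(when)
    dynamics = {}
    for pair, ds in dates.items():
        # Time span covered by the citations, clamped to at least one unit.
        span_units = max((max(ds) - min(ds)).days / days_per_unit, 1.0)
        dynamics[pair] = len(ds) / span_units
    return dynamics

dyn = pair_dynamics(citations)
# ("paper", "LumeJet"): 3 citations over 9 days, roughly 2.3 citations/week
```

A pair cited many times in a short span gets a high dynamics value, which is exactly the "speed of spread" signal the tool weights.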
Figure 11.9 depicts an excerpt of the resulting matrix. The rows list the internal keywords, derived from the company's own homepage and refined by the operator. The columns list the external terms correlating with those internal terms. The grayscale of each square represents a weighted value combining the strength of the correlation between the two terms and the dynamics of their occurrence. If the operator moves the mouse over a square, the system displays the respective pair of terms, the strength of their correlation, and their dynamics. This makes it quite easy for technology scouts to identify new issues in their fields of interest.
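A minimal sketch of how such a weighted grayscale value per matrix cell might be computed: the weights, the dynamics cap, and the character ramp are illustrative assumptions, not the tool's actual parameters.

```python
def cell_shade(correlation, dynamics, w_corr=0.6, w_dyn=0.4, max_dyn=10.0):
    """Weighted grayscale value for one matrix cell.
    0.0 = white (weak, slow pair), 1.0 = black (strong, fast pair).
    correlation is assumed normalized to [0, 1]; dynamics (citations per
    time unit) is capped at max_dyn before normalization."""
    return w_corr * correlation + w_dyn * min(dynamics / max_dyn, 1.0)

# Map a shade to a character for a quick text rendering of the matrix.
LEVELS = " .:+#"  # light -> dark

def glyph(shade):
    return LEVELS[min(int(shade * len(LEVELS)), len(LEVELS) - 1)]
```

With this scheme a pair like paper/LumeJet, with both high correlation and high dynamics, maps to the darkest glyph, mirroring the black squares described for Figure 11.9.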
Figure 11.9: Detail of the Word Matrix of the Cup of Information for a Paper Producer
The results depicted in Figure 11.9 indicate that a number of pairs of internal and external terms show both high correlation and high dynamics. Top results are indicated by black squares, followed by weaker results in lighter shades of gray. As an example, the matrix revealed that the new term “LumeJet” correlates well with a number of internal terms like paper, print head, and digital printer. Digging deeper into the files, it became obvious that a company called LumeJet had developed a new inkless digital printer for ultra-high-quality printed output.
From this point onward it was an easy task for the operator to extend this insight into an in-depth study of the technology in order to find out whether it poses opportunities or threats to the company.
In conclusion, methods like those described in the case studies of the insurance company and the paper industry provide a hands-on opportunity for the early identification of new knowledge as a basis for product development. Operators do not need any programming knowledge and can easily adjust the outcomes of the search to their requirements by adapting keywords and weighting factors. Their main task is to prioritize, to set the principles that guide the software-based extraction of information and knowledge from data, and to check the outcome of the procedure for emerging topics of interest.
One of the keys to success for organizations embarking on the Big Data world is certainly having a competent data science team: people working with Big Data and Big Data technologies. In his book Too Big to Ignore: The Business Case for Big Data, Phil Simon makes a compelling case for organizations to embrace and adopt Big Data technologies. Provided the process is approached with the necessary commitment, which holds equally for small and large companies, it is true that: “All else being equal, organizations that view and utilize information in this manner will realize greater benefits from Big Data than those that don't.” How should an organization start with Big Data? Phil Simon suggests taking small steps—setting simple short-term goals, such as gathering unstructured data on current and former customers and understanding customer behavior by looking at websites and asking current employees—while the long-term focus should be on the predictive capabilities of Big Data, such as predicting which products will gain traction (Simon, 2013). The authors of Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work warn that finding and hiring data scientists is not an easy task: “There are two related issues that we have seen when it comes to misunderstandings about the roles of data scientists. In one case, excessive hype leads people to expect miracles, and miracle-workers. In the other case, a lack of awareness about the variety of data scientists leads organizations to waste effort when trying to find talent” (Harris et al., 2013). What kind of background, skills, and knowledge are expected from a data scientist? As Loukides (2010) argues: “Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. 
They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: here's a lot of data, what can you make from it?”
Big Data inevitably puts the focus on data-driven approaches to searching for new solutions in product development, understanding customers, or exploring new markets. Data-driven approaches stand in sharp contrast to practices such as relying on gut feelings, routine, hunches, intuition, or corporate policy. That is exactly what managers have to keep in mind when dealing with Big Data and Open Innovation. We recommend the following steps to use Big Data techniques to enhance Open Innovation:
In a Big Data world “knowing what, not why, is good enough” (Mayer-Schoenberger and Cukier, 2013), meaning that correlating massive, fast, and versatile data streams yields fast and clear insights and helps develop new products or services, as in the case of Amazon's recommendation system, which looks for associations between products. However, to avoid pitfalls we should always remember that correlation does not necessarily imply causation.
Threats to privacy and security are often seen as the dark side of Big Data, but there is also a third danger, that of “falling victim to dictatorship of data, whereby we fetishize the information, the output of our analyses, and end up misusing it. Wielded unwisely, it can become an instrument of the powerful, who may turn it into a source of repression, either by simply frustrating customers and employees or, worse, by harming citizens” (Mayer-Schoenberger and Cukier, 2013).
In addition to the most widely debated issues and pitfalls of Big Data, we focus here on another point that has direct consequences for businesses: opinion spamming. Recent investigations into brand recognition show that many prominent brands use fake followers or fake opinions to promote themselves or to discredit their opponents (De Michelli and Stroppa, 2013). Fake followers and fake “Likes” are still quite easy to detect. Fake opinions, on the other hand, are very hard to recognize as fake just by reading them. Today's spammers and propagandists try to make us change the connections and values in our trust network: each of us keeps a mental trust network that helps us decide what to believe and what not to accept as fact. It is an entirely different matter to detect a fake opinion or fake review planted by a source within our circle of trust—for example, our favorite retail store.
Opinions are central influencers of our behavior and are mostly formed within a context of social interaction through text and talk. Whenever we have to make a decision, we would like to know others' opinions. In the past, when an individual needed an opinion, she asked friends or family; similarly, when an organization needed consumer opinions, it conducted surveys. Nowadays, individuals and organizations increasingly crosscheck opinions using consumer-generated media. Over the last 10 years we have gathered huge volumes of opinionated data recorded in digital form, a collection that is updated daily with opinions from review portals, forum discussions, blogs, microblogs, and social networks.
The detection of fake opinions usually relies on spotting patterns of duplication or extreme similarity across multiple reviews. However, an opinion spammer, or a group of them, can easily mislead algorithms that focus only on the content of reviews: with so many similar genuine opinions in circulation, a freshly written fake one can look perfectly regular. The consequence of unrecognized fake opinions can be misleading conclusions drawn from contaminated data. There is already some research on enhancing the detection of fake opinion profiles, based on the content those profiles generate, using additional features from quantitative psycholinguistic text analytics tools (Duh et al., 2013). In any case, it is of paramount importance to detect such fake activities to ensure that the web remains a trusted source of valuable information.
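A minimal sketch of the duplicate-spotting idea described above: bag-of-words cosine similarity between review pairs, with the reviews and the threshold chosen for illustration. Real detectors, including the psycholinguistic features of Duh et al. (2013), go well beyond this.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector for one review."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def flag_near_duplicates(reviews, threshold=0.9):
    """Indices of reviews suspiciously similar to another review."""
    vecs = [vectorize(r) for r in reviews]
    flagged = set()
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                flagged.update((i, j))
    return sorted(flagged)

# Illustrative data: two near-identical reviews and one distinct review.
reviews = [
    "Great product, works perfectly, highly recommend",
    "Great product works perfectly highly recommend it",
    "Arrived broken and support never replied",
]
```

This content-only check flags the first two reviews, but it also illustrates the weakness discussed above: a spammer who paraphrases a genuine review drops below the threshold and passes undetected.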
There is no doubt that Big Data is going to change the way organizations look at and understand data, or as Jeffrey Needham (2013) states in Disruptive Possibilities: “Big Data will bring disruptive changes to organizations and vendors, and will reach far beyond networks of friends to the social network that encompasses the planet. But with those changes come possibilities. Big data is not just this season's trendy hemline; it is a must-have piece that will last for generations.”
In this chapter, we have looked at several approaches for the analysis of massive, fast, and versatile data streams in order to understand and gain insight into financial trends, discover personal attributes from the digital fingerprints we leave on the web, or uniquely identify mobile phone users by tracking their daily movements. We have presented a model that can be used to predict the spreading of content and suggested applications—the Cup of Information. This or similar tools may help organizations leverage Big Data to develop new products, processes, and services, and to identify external partners and start new projects in an Open Innovation environment. The front-end skills of managers—networking, questioning, observing, associating, and experimenting—may be strongly supported by these tools. Our two case studies show in detail how Big Data applications can provide inputs for Open Innovation.
Most organizations are currently in the early stages of their Big Data activities, and while there are many pitfalls and dangers to be avoided along the road, Big Data is certainly too big to ignore. It is cool and scary at the same time. It is “concurrently beneficial and potentially malevolent [. . .]. Responsible organizations should take the requisite privacy- and security-related steps to minimize the chance that Big Data results in big headaches” (Simon, 2013). In addition, one should take care that the Big Data hype does not cause something like apophenia, originally a clinical term that is sometimes applied in a nonclinical manner to refer to the detection of patterns where none exist. Care should also be taken that the resources needed for Big Data analysis are carefully chosen and aligned with the needs of the organization. The good news for organizations is that smart and lean tools, especially those based on the CoI, need little effort and provide almost real-time results.
To effectively pursue customer-centric objectives in both a Big Data and an OI world, it is important for organizations to be present and active in social media, to utilize new tools for data analytics and the identification of new opportunities, and to apply gamification approaches to develop and test new products and to interact with existing and new customers. Businesses indeed move from liking to leading when they engage with, and at the same time look beyond, social media.
The authors wish to thank Marko and Urška Samec of Research and Arts Zone of University of Maribor (RAZ:UM) design team for the infographics design.