Chapter 11. A Future for Search

Writing about the future of search is a challenge because the rapid pace of technical development could make sections of this chapter look dated by the time the book is published. My objective in doing so is to illustrate that, after a long period of benign neglect, there appears to be a renaissance in enterprise search. The consensus view is that the rate of growth of enterprise information and data is now so high that action has to be taken to ensure that the organization can benefit from this information. As the adoption of enterprise search accelerates, search vendors will feel more comfortable investing in research and development to bring new functionality to the market.

This chapter summarizes some of the areas in which this investment will be most evident. In a period of rapid change it is even more important than in the past to have a search strategy that is grounded in business reality and user requirements, so that these developments can be assessed in terms of their possible impact on business performance.

In 2011 McKinsey Global Institute (MGI) published a report on Big Data which indicated that enterprises around the world used more than 7 exabytes of incremental disk drive data storage capacity in 2010; nearly 80 percent of that total appeared to duplicate data that had been stored elsewhere. MGI also analyzed data generation and storage at the level of sectors and individual firms. It estimated that, by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data per company (for companies with more than 1,000 employees) and that many sectors had more than 1 petabyte in mean stored data per company.

Low storage costs have combined with the lack of an information management strategy that takes a life-cycle view of information to identify what can be either archived or deleted. Add to this the rapid growth in the daily volume of emails, social media, rich media and other information categories that results from doing business in the 21st century, and the chances of finding any particular item of information are starting to get worryingly low. There are no quick fixes to this situation other than investing in information management applications such as enterprise search, text and data mining and business intelligence.

It is too early to gauge the full impact of the acquisitions made by HP, Oracle, IBM, and Lexmark in 2011 and 2012, but it is likely to be a positive one. These major IT companies maintain very close relationships with their enterprise customers and clearly see an opportunity to offer a wider range of search applications to these customers. Shareholders will be expecting to see a return on the investment in these acquisitions, even if in the case of HP it could take some time to achieve. Companies now have a higher degree of security of supply of these search applications, and in addition Microsoft and Google will continue to provide search solutions.

These acquisitions still leave a large number of independent search vendors, most of which are privately held. Investors in these companies can now see an exit strategy. If the technology is good enough then there could be a trade sale possibility, a much easier exit route than going public in the current economic climate.

The entire search industry is going to benefit from the marketing and sales efforts of the major IT vendors and the outcome of research surveys will hopefully convince companies that search is business critical and that the closest possible match between user requirements and technology is essential to maintain business performance.

Microsoft came late to enterprise search and has struggled without success to support the two search applications in SharePoint 2010 and the FAST ESP application it acquired in 2008. Now that the company has taken the decision to withdraw mainstream support from FAST ESP in 2013, it can focus on developing the search functionality of SharePoint. For a significant number of companies SharePoint 2010 has been the first time they have been able to offer employees a good search application, especially if the company has invested in FAST Search Server for SharePoint 2010 (FS4SP).

The next release of SharePoint is due in 2013 and in mid-July Microsoft released some initial information on FS4SP that is designed to improve both the functionality and administration of the search application. However it is important to remember that FS4SP is optimized for SharePoint 2010 and in the future SharePoint 2013, and that it is not being positioned as a replacement for FAST ESP.

Not only do SharePoint customers now have a better appreciation of the value of search, but Microsoft channel partners have also had to become much more familiar with the technology and use of search. This knowledge will gradually result in the emergence of a cadre of search experts who may wish to move out of the integrator role and into a corporate role as search managers and developers.

“Big Data” has appeared from nowhere to become one of the buzzwords of 2012. The Exalead definition is that a data collection is considered “Big Data” when it is so large that an organization cannot effectively or affordably manage or exploit it using conventional data management tools. The size is relative rather than absolute. It is not just a ‘Big Company’ issue. Another way of defining Big Data is in terms of the characteristics of Volume, Velocity, Variety and Variability. ‘Velocity’ takes into consideration both the rate of change of data sets and the impact that even a small data item may have on a much larger data set. ‘Variety’ is a reflection of the number of different database formats and master data management schemas that may be involved.

In the context of the future of enterprise search there are a number of issues and opportunities arising from the publicity around Big Data. It is putting enterprise search much higher up the list of ‘must have’ enterprise applications as senior managers start to focus on the ability of the company to find information, probably for the first time ever.

The major IT companies see the solution of Big Data problems as a very important market opportunity, hence the acquisitions by Oracle and IBM in particular. Google has launched its BigQuery web service, and Amazon and Microsoft offer similar services. Autonomy has had a private cloud service for some time.

Companies are starting to discover just how much information they have in databases, and are finding that not only are the existing tools inadequate to meet the potential demand for Big Data analysis but that they have no employees with the skills needed to develop these solutions. In the USA in particular the concept of the ‘data scientist’ is gaining ground very quickly.

However, it is important not to see enterprise search as the ‘answer’ to managing Big Data. Companies need to be able to find patterns in Big Data and this is where text analytics has a major role to play. With search there is no further transformation to the text when the results are presented to the user. For text analytics, by contrast, the text must be integrated and transformed before it can be analyzed. Some of the enterprise search vendors do offer text analytics capabilities and will undoubtedly be expanding these in the future, but there is also a substantial group of companies that specialize in text analytics, for example Attensity, Business Objects, Clarabridge, ClearForest, IBM, Lexalytics, SAS-Teragram and Synaptica.

Also on the edges of enterprise search are the vendors of business intelligence applications, including Business Objects, Information Builders, IBM, MicroStrategy, Microsoft, Oracle and SAP. These applications provide some degree of search capability but their primary role is in providing managers with access to reports and dashboards that enable them to track business performance on as near a real-time basis as possible. Again some of the search vendors, for example Exalead, also provide some dashboard interfaces, but as with text analytics a significant amount of processing effort is required to integrate, clean and standardize data and information prior to analysis and presentation. Because of the volume of changes that have to be made to the databases on a regular basis (perhaps hourly), business intelligence applications use sophisticated Extract-Load-Transform (ELT) applications supported by Complex Event Processing engines.

In 2008 the Forrester Group published a report on Unified Information Access, making the following observation in the introduction to the report:

Analysts at other major consulting companies, notably Sue Feldman at International Data Corporation (IDC), take a similar position. Probably the company doing more than anyone else to get UIA on the agenda of senior management groups is Attivio. The Attivio solution is based on the Apache Lucene open-source software but with a lot of proprietary code on top. Both CEO Ali Riaz and CTO Sid Probstein were at FAST Search and Transfer prior to its acquisition by Microsoft. It is indicative of the potential for UIA solutions that Attivio gained an investment of $37M late in 2012.

As with ‘Big Data’ the term ‘Unified Information Access’ has no concise definition but it is indicative of an increasing level of integration between text-based enterprise search, business intelligence, content analytics, text and data mining and big data applications.

Over the next few years the ‘edges’ between enterprise search, text analytics and business intelligence applications will become increasingly blurred but underneath the user interface they remain quite distinct applications and it is doubtful that any vendor, even IBM or Oracle, will be able, or even wish to be able, to offer a universal application.

We are only at the very beginning of the mobile revolution. In 2010 it looked as though it was all about corporate-supplied smartphones and just two years later it is about the corporate use of personal smartphones and tablets. Mobile access is all about search, and about delivering information, not just documents. For some years now search applications have extended across the entire desktop surface with facets and filters. This type of user interface has value for certain use cases, but not for mobile use. Screen space is at a minimum and the use of every pixel has to be optimized.

As a result mobile user interfaces are going to move in novel directions and in doing so will stimulate innovation in the desktop interface. For mobile use context is everything. This is not just about location-specific context but about searches that may have been carried out in the previous hours or days, and not necessarily on the mobile device itself. A sales manager may well have updated a set of customer profiles on a desktop or a tablet but now needs the latest possible information on the customer as they wait in a reception area with no more than a click or saying ‘Here’ into the smartphone.

This type of requirement is also going to increase the need to create and store search profiles and to be able to retrieve result sets from earlier searches, something that has not been given much attention.

Siri, the voice-command feature of the Apple iPhone and iPad, has remarkable capabilities even in its initial release, and mobile requirements will undoubtedly stimulate the development of natural human interfaces, such as gestures and eye-movement, which will be transferred to desktop devices sooner rather than later. Some search vendors, notably Isys-Search, have taken a bottom-up approach to designing mobile search applications, whereas others are still trying to adapt full-screen approaches.

The interface with mobile search will be either voice, a single finger or a wave of the hand. These natural interfaces will almost certainly migrate from mobile to the desktop. The office of the future may end up looking very like the vision presented in the US TV series Crime Scene Investigation (CSI) where the forensic police team can call up any number of applications through a touch of a screen and drill down into the data the same way.

Because relevance is defined in terms of a single user it is easy to ignore the situation where the same user is carrying out multiple searches on perhaps quite different topics and would value the search application being able to integrate the different searches together. A use case might be where an engineer has been presented with the need to design a particular type of bearing. There are many approaches to this problem and the engineer may want to explore each of these individually and then integrate the best of the solutions together in a desktop environment rather than cutting and pasting from a set of printed search results.

An extension of this use case is where members of a development team have conducted searches using their own particular skill and knowledge sets and now wish to integrate them for the use of their colleagues. As more work is carried out in virtual teams in multiple locations the ability to integrate multiple searches, and then have the master query and result set updated on a periodic or ad hoc basis is going to emerge as an important business requirement.

As with so many other aspects of search the term ‘social search’ is not well defined. The role of enterprise search in the effective use of social media is going to be increasingly important. As the number of blog and wiki channels increases and as more work is carried out in collaborative workspaces, the challenge of tracking new items of information that are relevant to any of the multiple tasks that we carry out each day is going to be increasingly difficult to manage using RSS and other alerting feeds. The solution will be to use search as a means of filtering perhaps a hundred different channels to provide a ranked list of newly added relevant information. This use of search is already well developed by companies providing press alerting services. In the enterprise the search application will need to cope with the language of social media, which will inevitably make use of colloquial language and shortened forms of words and expressions, especially in the case of microblogging.
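The idea of filtering many channels into one ranked list of new items can be illustrated with a minimal sketch. It assumes a very crude relevance measure (the number of words an item shares with a set of user interest terms); the function names, channel names and sample items are hypothetical and are not taken from any particular search product.

```python
from typing import Dict, List, Set, Tuple


def score(text: str, interests: Set[str]) -> int:
    """Count how many distinct interest terms appear in the item text."""
    words = set(text.lower().split())
    return len(words & interests)


def rank_new_items(
    channels: Dict[str, List[str]], interests: Set[str]
) -> List[Tuple[int, str, str]]:
    """Return (score, channel, item) for every matching item, best first."""
    ranked = []
    for channel, items in channels.items():
        for item in items:
            s = score(item, interests)
            if s > 0:  # drop items that match no interest term at all
                ranked.append((s, channel, item))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked


# Hypothetical channels and interest profile, for illustration only.
channels = {
    "wiki": ["bearing design review notes", "canteen menu"],
    "blog": ["new ceramic bearing supplier"],
}
interests = {"bearing", "design", "ceramic", "supplier"}
ranked = rank_new_items(channels, interests)
```

A sketch this simple splits on whitespace only, so it would not cope with the colloquial language and shortened word forms of microblogging mentioned above; a production system would need proper linguistic processing.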

Another view of social search might better be described as social context search, where the search application is taking account of documents and blogs that the user has written, recent searches they have carried out and meetings that are in their calendars. The aim is both to improve the quality of search results and to alert the user to relevant content. Managing the potential overload of information will be a challenge, and will require organizations to invest in training employees to get the best out of these applications.

Federated search, the ability to search for information across multiple repositories and applications and then provide an integrated set of results, is a fundamental requirement of enterprise search. The usual model is for a module of the master search application to send the query to the search applications in each of the target repositories. The results from each are then integrated in some way and presented to the user. In theory it is easy, and in practice it is extremely difficult. Each of the individual search applications will have calculated a relevance ranking on the basis of the content in its own repository, so normalizing the results to provide a rational overall ranking is not a reliable solution. There are almost certainly going to be performance delays, especially if the repositories are located around the world, and these performance delays will be exacerbated if the repositories have different security models. Single sign-on for all applications is still rarely achieved.
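The difficulty with normalizing relevance scores from different repositories can be seen in a small sketch. It assumes each repository returns a list of (document id, raw score) pairs and uses simple min-max rescaling, which is only one of several possible approaches; the repository names, document ids and scores are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, raw relevance score from one repository)


def normalize(hits: List[Hit]) -> List[Hit]:
    """Rescale one repository's raw scores to the range 0..1 (min-max)."""
    if not hits:
        return []
    scores = [s for _, s in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: treat every hit as equally good
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in hits]


def merge(results_by_source: Dict[str, List[Hit]]) -> List[Tuple[str, str, float]]:
    """Merge per-repository hits into one list ordered by normalized score."""
    merged = []
    for source, hits in results_by_source.items():
        for doc, score in normalize(hits):
            merged.append((source, doc, score))
    merged.sort(key=lambda r: r[2], reverse=True)
    return merged


# Hypothetical results: the two repositories score on very different scales.
results = {
    "sharepoint": [("sp-1", 12.0), ("sp-2", 3.0)],
    "email": [("em-1", 0.9), ("em-2", 0.8)],
}
merged = merge(results)
```

In this example the email hit `em-2` scored almost as highly as `em-1` within its own repository, yet after rescaling it drops to 0.0 and ties with `sp-2`. The rescaled numbers say nothing about how relevant a document is compared with documents in another repository, which is why normalization on its own is not a reliable basis for an overall ranking.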

Then there is the challenge of de-duplicating content from the various repositories. There are solutions for this when a single language is being used, but the situation is much more complex with multiple languages.
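As a minimal sketch of the single-language case, duplicates can be detected by comparing a fingerprint of each document's normalized text. The example assumes that hits arrive as (document id, text) pairs and that two documents count as duplicates only if their text is identical after lower-casing and collapsing whitespace; the function names and sample data are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List, Tuple


def fingerprint(text: str) -> str:
    """Hash the text after lower-casing and collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(hits: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first hit for each distinct fingerprint, preserving order."""
    seen = set()
    unique = []
    for doc_id, text in hits:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append((doc_id, text))
    return unique


# Hypothetical hits from two repositories holding the same report.
hits = [
    ("a", "Annual  Report 2011"),
    ("b", "annual report 2011 "),
    ("c", "Budget"),
]
unique = deduplicate(hits)
```

This approach catches only copies that differ in case or spacing. It would not recognize near-duplicates such as successive drafts, and it has no way of matching a document with its translation, which is where the multilingual case becomes much harder.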

Another option is to create a master index of all repositories, search the master index and then download the relevant items from each repository. This option runs into some serious index performance management challenges.

A substantial amount of research and development is being undertaken into achieving good performance from federated searching as this is a core requirement of unified information access and search-based applications. Despite the best endeavours of search vendors, high-performance federated search applications that provide access to a list of relevant, de-duplicated information are still some way in the future.

A very important factor in shaping the future direction of enterprise search is research from the information retrieval community. Although there are many academic institutions undertaking information retrieval research there are also very active research groups in Google, HP, IBM, Microsoft and Oracle. There are many conferences on information retrieval, including those organized by the Special Interest Group on Information Retrieval (known as SIGIR) of the Association for Computing Machinery. Of particular importance in developing solutions to enterprise search problems are the annual TREC conferences.

The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. The TREC workshop series has the following goals:

Each TREC conference consists of a set of Tracks, each of which focuses on a particular information retrieval problem, and the results are made publicly available. In the case of enterprise search, which from time to time has been a track at TREC, one of the fundamental problems is building a test collection that is representative of enterprises. Various surrogates have been used, such as a section of a public web site, but that is not going to emulate the complexity, homogeneity and scale of enterprise repositories or the requirements associated with federated searches across multiple applications.

Being able to gain access to enterprise repositories has been a major challenge for the information retrieval community because companies are concerned about the inadvertent release of confidential information in the published results from the research. In this respect the major IT companies are in a better position as they are able to use their own corporate repositories, but then have no way of knowing if the results of the research are being biased in any way.

Over the last ten years there have been a number of important meetings at which information retrieval research teams have tried to identify and prioritise areas for future research. The most recent of these (SWIRL2012) took place in Australia in early 2012 and in the preamble to the results of the workshop the observation was made:

The themes that emerged from this workshop were:

Inevitably there is a lag between information retrieval research outcomes and their inclusion in commercial and open-source systems. The point of this section on information retrieval is that there is a substantial amount of research taking place, and increasingly this research will focus on enterprise search opportunities and challenges. A question to ask search vendors and search developers is the extent to which they are aware of this research and are ready and able to incorporate this research into their applications.

To ask these questions the search support team needs to be monitoring developments in information retrieval as well as enterprise search technology. A good place to start is to subscribe to the Digital Library of the Association for Computing Machinery, which covers a very wide range of conferences, reports and journal articles on information retrieval and enterprise search, including the conference proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR).

Although there is a significant amount of research taking place into enterprise search there is an almost total lack of academic courses on information retrieval, let alone enterprise search. There are perhaps 200 universities teaching information science and informatics at undergraduate level but information retrieval is usually only one small element of the three-year course. There are many more universities teaching computer science but again the amount of time allocated to information retrieval is very limited. The outlook for companies seeking to recruit search professionals is quite bleak, and is likely to stay that way for some time to come.

One bright spot is the Lucid University, set up by LucidWorks, which offers training courses in Solr and Hadoop. However, these courses are intended for developers. The company indicates that system administrators are welcome to attend, but the courses are primarily designed for people who have experience developing web applications in Java, PHP, Ruby or similar languages.

The concept of the digital workplace is usually attributed to Jeffery Bier, who founded Instinctive Technologies in 1996. This company capitalized on the work that Bier had done at Lotus Corporation on collaborative applications, and in 2000 was re-launched as eRoom Technologies. A component of the branding was the concept of a digital workplace. Bier set out five criteria for a digital workplace which still hold good today:

Of great importance in understanding the value and challenges of digital workplaces is the rise (in 1997) and disappearance (by 2005) of Enterprise Information Portals. Merrill Lynch published a seminal report on the market in November 1998 which stated:

The vision was ahead not only of the technology but also of organizations realizing that they were not managing information effectively. This is now changing slowly and is beginning to open up an achievable vision of a digital workplace where search will be a very important enabling technology not just as a means of finding information but of integrating a wide range of applications.

There is an on-going debate about what the term ‘enterprise search’ means and whether there is a better description. In the planning stages of this book there was a discussion about whether ‘Enterprise Search’ was the best title, but none of the team behind this book could come up with a better title. I might argue that the concept is one of business intelligence but that term has already been taken, though arguably it has nothing to do with intelligence!

In the final analysis enterprise search is a vision and not one or more pieces of software. All employees should have effective access to the information that the organization has created and collected so that they can make well-informed decisions that benefit the organization and their own careers. It is inconceivable that a manufacturer would invest in a precision machine tool, put it in a shed on the factory site and not tell anyone of its existence. And yet every day that is the fate of digital information assets.

At long last organizations are recognizing the strategic and operational value of information and taking action. The biggest single barrier to effective implementation is finding people with the skills needed to understand how to get the best out of the sophisticated technology of search so that the technology does not stand between a query and an index but links them intuitively.

Without these people we may end up echoing the words of T.S. Eliot in the opening stanza of Choruses from “The Rock”:

“Where is the wisdom we have lost in knowledge?

Where is the knowledge we have lost in information?”

You'll find some additional information regarding the subject matter of this chapter in the Further Reading section in Appendix A.