Writing about the future of search is a challenge because the rapid pace of technical development could make sections of this chapter look dated by the time the book is published. My objective is to illustrate that, after a long period of benign neglect, enterprise search seems to be enjoying a renaissance. The consensus view is that enterprise information and data are now growing so quickly that action has to be taken now to ensure that the organization can benefit from this information. As the adoption of enterprise search accelerates, search vendors will feel more comfortable investing in research and development to bring new functionality to the market.
This chapter summarizes some of the areas in which evidence of this investment will be most evident. In a period of rapid change it is even more important than it has been in the past to have a search strategy that is grounded in business reality and user requirements so that these developments can be assessed in terms of the possible impact they could have on business performance.
In 2011 the McKinsey Global Institute (MGI) published a report on Big Data which indicated that enterprises around the world used more than 7 exabytes of incremental disk drive data storage capacity in 2010; nearly 80 percent of that total appeared to duplicate data that had been stored elsewhere. MGI also analyzed data generation and storage at the level of sectors and individual firms. It estimated that, by 2009, nearly all sectors in the US economy had on average at least 200 terabytes of stored data per company (for companies with more than 1,000 employees) and that many sectors had more than 1 petabyte in mean stored data per company.
Low storage costs, the lack of an information management strategy that takes a life-cycle view of information to identify what can be archived or deleted, and the rapid daily growth of emails, social media, rich media and other information categories generated by doing business in the 21st century all combine to make the chances of finding any particular item of information worryingly low. There are no quick fixes to this situation other than investing in information management applications such as enterprise search, text and data mining, and business intelligence.
It is too early to gauge the full impact of the acquisitions made by HP, Oracle, IBM, and Lexmark in 2011 and 2012, but it is likely to be a positive one. These major IT companies maintain very close relationships with their enterprise customers and clearly see an opportunity to offer a wider range of search applications to these customers. Shareholders will be expecting to see a return on the investment in these acquisitions, even if in the case of HP it could take some time to achieve. Companies now have a higher degree of security of supply of these search applications, and in addition Microsoft and Google will continue to provide search solutions.
These acquisitions still leave a large number of independent search vendors, most of which are privately held. Investors in these companies can now see an exit strategy. If the technology is good enough then there could be a trade sale possibility, a much easier exit route than going public in the current economic climate.
The entire search industry is going to benefit from the marketing and sales efforts of the major IT vendors and the outcome of research surveys will hopefully convince companies that search is business critical and that the closest possible match between user requirements and technology is essential to maintain business performance.
Microsoft came late to enterprise search and has struggled, without success, to support both the search applications in SharePoint 2010 and the FAST ESP application it acquired in 2008. Now that the company has decided to withdraw mainstream support for FAST ESP in 2013 it can focus on developing the search functionality of SharePoint. For a significant number of companies SharePoint 2010 has been the first time they have been able to offer employees a good search application, especially if the company has invested in FAST Search Server for SharePoint 2010 (FS4SP).
The next release of SharePoint is due in 2013, and in mid-July Microsoft released some initial information on FS4SP enhancements designed to improve both the functionality and administration of the search application. However it is important to remember that FS4SP is optimized for SharePoint 2010 and, in the future, SharePoint 2013, and that it is not being positioned as a replacement for FAST ESP.
Not only do SharePoint customers now have a better appreciation of the value of search, but Microsoft channel partners have also had to become much more familiar with the technology and use of search. This knowledge will gradually result in the emergence of a cadre of search experts who may wish to move out of the integrator role and into a corporate role as search managers and developers.
“Big Data” has appeared from nowhere to become one of the buzzwords of 2012. The Exalead definition is that a data collection is considered “Big Data” when it is so large that an organization cannot effectively or affordably manage or exploit it using conventional data management tools. The size is relative rather than absolute, and it is not just a ‘Big Company’ issue. Another approach defines Big Data through the characteristics of Volume, Velocity, Variety and Variability. ‘Velocity’ takes into consideration both the rate of change of data sets and the impact that even a small data item may have on a much larger data set. ‘Variety’ reflects the number of different database formats and master data management schemas that may be involved.
In the context of the future of enterprise search there are a number of issues and opportunities arising from the publicity around Big Data. It is pushing enterprise search much higher up the list of ‘must have’ enterprise applications as senior managers start to focus, probably for the first time ever, on the ability of the company to find information.
The major IT companies see the solution of Big Data problems as a very important market opportunity, hence the acquisitions by Oracle and IBM in particular. Google has launched its BigQuery web service, and Amazon and Microsoft offer similar services. Autonomy has had a private cloud service for some time.
Companies are starting to discover just how much information they have in databases, and are finding that not only are the existing tools inadequate to meet the potential demand for Big Data analysis but that they have no employees with the skills needed to develop these solutions. In the USA in particular the concept of the ‘data scientist’ is gaining ground very quickly.
However it is important not to see enterprise search as the ‘answer’ to managing Big Data. Companies need to be able to find patterns in Big Data, and this is where text analytics has a major role to play. With search, the text is presented to the user without further transformation; for analysis, the text must first be integrated and transformed. Some enterprise search vendors do offer text analytics capabilities and will undoubtedly expand these in the future, but there is also a substantial group of companies that specialize in text analytics, for example Attensity, Business Objects, Clarabridge, ClearForest, IBM, Lexalytics, SAS Teragram and Synaptica.
Also on the edges of enterprise search are the vendors of business intelligence applications, including Business Objects, Information Builders, IBM, MicroStrategy, Microsoft, Oracle and SAP. These applications provide some degree of search capability, but their primary role is in providing managers with access to reports and dashboards that enable them to track business performance on as near a real-time basis as possible. Again some of the search vendors, for example Exalead, also provide dashboard interfaces, but as with text analytics a significant amount of processing effort is required to integrate, clean and standardize data and information prior to analysis and presentation. Because of the volume of changes that have to be made to the databases on a regular basis (perhaps hourly), business intelligence applications use sophisticated Extract-Load-Transform (ELT) applications supported by Complex Event Processing engines.
In 2008 the Forrester Group published a report on Unified Information Access, making the following observation in the introduction to the report:
Search and business intelligence (BI) really are two sides of the same coin. Enterprise search enables people to access unstructured content like documents, blog and wiki entries, and emails stored in repositories across their organizations. BI surfaces structured data in reports and dashboards. As both technologies mature, the boundary between them is beginning to blur. Search platforms are beginning to perform BI functions like data visualization and reporting, and BI vendors have begun to incorporate simple to use search experiences into their products. Information and knowledge management professionals should take advantage of this convergence, which will have the same effect from both sides: to give businesspeople better context and information for the decisions they make every day.
Other major consulting companies take a similar position, as does Sue Feldman at International Data Corporation (IDC). Probably the company doing more than anyone else to get UIA on the agenda of senior management groups is Attivio. The Attivio solution is based on the Apache Lucene open-source software with a substantial amount of proprietary code on top. Both CEO Ali Riaz and CTO Sid Probstein were at FAST Search and Transfer prior to its acquisition by Microsoft. It is indicative of the potential for UIA solutions that Attivio gained an investment of $37M late in 2012.
As with ‘Big Data’ the term ‘Unified Information Access’ has no concise definition but it is indicative of an increasing level of integration between text-based enterprise search, business intelligence, content analytics, text and data mining and big data applications.
Over the next few years the ‘edges’ between enterprise search, text analytics and business intelligence applications will become increasingly blurred but underneath the user interface they remain quite distinct applications and it is doubtful that any vendor, even IBM or Oracle, will be able, or even wish to be able, to offer a universal application.
We are only at the very beginning of the mobile revolution. In 2010 it looked as though it was all about corporate-supplied smartphones; just two years later it is about the corporate use of personal smartphones and tablets. Mobile access is all about search, and about delivering information, not just documents. For some years now search applications have extended across the entire desktop surface with facets and filters. This type of user interface has value for certain use cases, but not for mobile use, where screen space is at a premium and the use of every pixel has to be optimized.
As a result mobile user interfaces are going to move in novel directions and in doing so will stimulate innovation in the desktop interface. For mobile use context is everything. This is not just about location-specific context but about searches that may have been carried out in the previous hours or days, and not necessarily on the mobile device itself. A sales manager may well have updated a set of customer profiles on a desktop or a tablet but now needs the latest possible information on the customer while waiting in a reception area, with no more than a click or by saying ‘Here’ into the smartphone.
This type of requirement is also going to increase the need to create and store search profiles and to retrieve result sets from earlier searches, something that has not been given much attention to date.
Siri, the voice-command feature of the Apple iPhone and iPad, has remarkable capabilities even in its initial release, and mobile requirements will undoubtedly stimulate the development of natural human interfaces, such as gestures and eye movement, which will be transferred to desktop devices sooner rather than later. Some search vendors, notably ISYS Search, have taken a bottom-up approach to designing mobile search applications, whereas others are still trying to adapt full-screen approaches.
The interface with mobile search will be either voice, a single finger or a wave of the hand. These natural interfaces will almost certainly migrate from mobile to the desktop. The office of the future may end up looking very like the vision presented in the US TV series CSI: Crime Scene Investigation, where the forensic police team can call up any number of applications through a touch of a screen and drill down into the data the same way.
Because relevance is defined in terms of a single user it is easy to ignore the situation where the same user is carrying out multiple searches on perhaps quite different topics and would value the search application being able to integrate the different searches together. A use case might be where an engineer has been presented with the need to design a particular type of bearing. There are many approaches to this problem and the engineer may want to explore each of these individually and then integrate the best of the solutions together in a desktop environment rather than cutting and pasting from a set of printed search results.
An extension of this use case is where members of a development team have conducted searches using their own particular skill and knowledge sets and now wish to integrate them for the use of their colleagues. As more work is carried out in virtual teams in multiple locations the ability to integrate multiple searches, and then have the master query and result set updated on a periodic or ad hoc basis is going to emerge as an important business requirement.
As with so many other aspects of search the term ‘social search’ is not well defined. The role of enterprise search in the effective use of social media is going to be increasingly important. As the number of blog and wiki channels increase and as more work is carried out in collaborative workspaces, the challenge of tracking new items of information that are relevant to any of the multiple tasks that we carry out each day is going to be increasingly difficult to manage using RSS and other alerting feeds. The solution will be to use search as a means of filtering perhaps a hundred different channels to provide a ranked list of newly added relevant information. This use of search is already well developed by companies providing press alerting services. In the enterprise the search application will need to cope with the language of social media, which will inevitably make use of colloquial language and shortened forms of words and expressions, especially in the case of microblogging.
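The filtering model described above can be sketched in a few lines: score each incoming item from the various channels against a weighted interest profile and return one ranked list. This is a minimal illustration, not any vendor's method; the channel names, profile terms and weights are invented for the example, and a production system would also need to normalize the colloquial spellings and shortened forms common in microblogging.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters. A real system
    # would also expand abbreviations and colloquial microblog spellings.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(item_text, interest_profile):
    """Score one feed item against a weighted interest profile."""
    tokens = Counter(tokenize(item_text))
    return sum(weight * tokens[term] for term, weight in interest_profile.items())


def rank_channels(items, interest_profile, top_n=10):
    """Filter many channels down to a single ranked list of relevant items.

    items: list of (channel_name, item_text) pairs drawn from blogs,
    wikis, microblogs and other feeds.
    """
    scored = [(score(text, interest_profile), channel, text)
              for channel, text in items]
    # Drop items with no match at all, then rank the rest by score.
    scored = [entry for entry in scored if entry[0] > 0]
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:top_n]
```

With a profile such as `{"bearing": 2.0, "design": 1.0}`, an engineer would see items mentioning bearings ranked above general chatter, regardless of which of the hundred channels they arrived on.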
Another view of social search might better be described as social context search, where the search application is taking account of documents and blogs that the user has written, recent searches they have carried out and meetings that are in their calendars. The aim is both to improve the quality of search results and to alert the user to relevant content. Managing the potential overload of information will be a challenge, and will require organizations to invest in training employees to get the best out of these applications.
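A minimal sketch of how such context signals might be blended into ranking is the following; the signal terms and boost weights are illustrative assumptions, not any vendor's actual method.

```python
def context_boost(doc_terms, context):
    """Boost factor for a document given signals about the user.

    context maps signal terms (drawn, for example, from documents the
    user has written, recent queries or calendar entries) to weights.
    The weights here are illustrative, not tuned values.
    """
    boost = 1.0
    for term, weight in context.items():
        if term in doc_terms:
            boost += weight
    return boost


def rank_with_context(results, context):
    """Re-rank base search results using the user's context.

    results: list of (doc_id, base_score, set_of_doc_terms).
    Returns (adjusted_score, doc_id) pairs, best first.
    """
    rescored = [(base * context_boost(terms, context), doc_id)
                for doc_id, base, terms in results]
    rescored.sort(reverse=True)
    return rescored
```

Two documents with identical base relevance would thus be separated by whichever one matches what the user has recently been working on.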
Federated search, the ability to search for information across multiple repositories and applications and then provide an integrated set of results, is a fundamental requirement of enterprise search. The usual model is for a module of the master search application to send the query to the search applications in each of the target repositories. The results from each are then integrated in some way and presented to the user. In theory this is easy; in practice it is extremely difficult. Each of the individual search applications will have calculated a relevance ranking on the basis of the content in its own repository, so normalizing the results to provide a rational overall ranking is not a reliable solution. There are almost certainly going to be performance delays, especially if the repositories are located around the world, and these delays will be exacerbated if the repositories have different security models. Single sign-on for all applications is still rarely achieved.
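The score normalization problem can be illustrated with a short sketch. Each repository's engine returns scores on its own scale, so a merging module can do little better than rescale each list (here with min-max normalization, one common heuristic) before interleaving; this is precisely why the overall ranking is not reliable. The repository names and scores below are invented for the example.

```python
def min_max_normalize(results):
    """Rescale one repository's scores to the range [0, 1].

    Each engine computes relevance against its own collection
    statistics, so raw scores are not comparable across repositories.
    Normalization makes the ranges comparable but cannot recover a
    true global ranking.
    """
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]


def merge_federated(result_sets):
    """Merge per-repository result lists into one ranked list.

    result_sets maps a repository name to its (doc_id, raw_score) list.
    Returns (normalized_score, repository, doc_id) tuples, best first.
    """
    merged = []
    for repo, results in result_sets.items():
        for doc, norm in min_max_normalize(results):
            merged.append((norm, repo, doc))
    merged.sort(reverse=True)
    return merged
```

Note that the top document from a repository of marginal content receives the same normalized score (1.0) as the top document from the most authoritative repository, which illustrates why normalization alone cannot produce a rational overall ranking.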
Then there is the challenge of de-duplicating content from the various repositories. There are solutions for this when a single language is being used, but the situation is much more complex with multiple languages.
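For the single-language case, one common family of techniques compares word shingles (overlapping runs of consecutive words) between documents and drops near-duplicates above a similarity threshold. The sketch below uses k-word shingles and Jaccard similarity; the shingle length and threshold are illustrative assumptions, and a production system would use hashed signatures rather than this quadratic comparison.

```python
def shingles(text, k=5):
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k])
            for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(documents, threshold=0.8):
    """Keep the first of each group of near-duplicate documents.

    Any document whose shingle similarity to an already-kept document
    reaches the threshold is discarded as a near-duplicate.
    """
    kept = []
    for doc in documents:
        sig = shingles(doc)
        if all(jaccard(sig, shingles(k)) < threshold for k in kept):
            kept.append(doc)
    return kept
```

The multilingual case is harder precisely because the same content rendered in two languages shares no shingles at all, so lexical comparison of this kind cannot detect the duplication.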
Another option is to create a master index of all repositories, search the master index and then download the relevant items from each repository. This option runs into some serious index performance management challenges.
A substantial amount of research and development is being undertaken into achieving good performance from federated searching, as this is a core requirement of unified information access and search-based applications. Despite the best efforts of search vendors, high-performance federated search applications providing access to a list of relevant, de-duplicated information are still some way in the future.
A very important factor in shaping the future direction of enterprise search is research from the information retrieval community. Although there are many academic institutions undertaking information retrieval research, there are also very active research groups in Google, HP, IBM, Microsoft and Oracle. There are many conferences on information retrieval, including those organized by the Special Interest Group on Information Retrieval (SIGIR) of the Association for Computing Machinery. Of particular importance in developing solutions to enterprise search problems are the annual TREC conferences.
The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. The TREC workshop series has the following goals:
To encourage research in information retrieval based on large test collections
To increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas
To speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems
To increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems
Each TREC conference consists of a set of Tracks, each of which focuses on a particular information retrieval problem, and the results are made publicly available. In the case of enterprise search, which from time to time has been a track at TREC, one of the fundamental problems is building a test collection that is representative of enterprises. Various surrogates have been used, such as a section of a public web site, but that is not going to emulate the complexity, heterogeneity and scale of enterprise repositories or the requirements associated with federated searches across multiple applications.
Being able to gain access to enterprise repositories has been a major challenge for the information retrieval community because companies are concerned about the inadvertent release of confidential information in the published results of the research. In this respect the major IT companies are in a better position as they are able to use their own corporate repositories, but they then have no way of knowing whether the results of the research are being biased in any way.
Over the last ten years there have been a number of important meetings at which information retrieval research teams have tried to identify and prioritize areas for future research. The most recent of these (SWIRL 2012) took place in Australia in early 2012, and in the preamble to the results of the workshop the observation was made:
Throughout the decade covered by those reports, the field of Information Retrieval has continued to change and grow: collections have become larger, computers have become more powerful, broadband and mobile internet is widely assumed, complex interactive search can be done on home computers or mobile devices, and so on. Furthermore, as large-scale commercial search companies find new ways to exploit the user data they collect, the gap between the types of research done in industry and academics has widened, leading to tension about “repeatability” and “public data” in publications. These changes in environment and shifts in attitude mean the time is ripe for the field to re-evaluate its assumptions, its purposes, its goals, and its methodologies.
The themes that emerged from this workshop were:
Not just a ranked list. This theme incorporates topics that move beyond the classic “single ad-hoc query and ranked list” approach, considering richer modes of querying, models of interaction, and approaches to answering.
Help for users. This theme brings together topics reflecting ways that Information Retrieval technology can be extended to support users more broadly, including ways to bring IR to inexperienced, illiterate, and disabled users.
Capturing context. This theme touches topics that look at ways to incorporate what is happening with and around a user to affect querying and result presentation. In particular, this theme treats people using search systems, their context, and their information needs as critical aspects needing exploration.
Information, not documents. This theme crosses topics that seek to push Information Retrieval research beyond document retrieval and into more complex types of data and more complicated results.
Domains. This theme is part of topics that consider information that is not simply text and that has not been thoroughly explored by information retrieval research so far – data with restricted access, collections of “apps,” and richly connected workplace data.
Evaluation. A perennial issue in Information Retrieval, evaluation remains important, particularly as the field expands into new challenges. This theme includes topics that require or suggest new techniques for evaluation as well as those that need evaluation in the context of new challenges.
Inevitably there is a lag between information retrieval research outcomes and their inclusion in commercial and open-source systems. The point of this section on information retrieval is that there is a substantial amount of research taking place, and increasingly this research will focus on enterprise search opportunities and challenges. A question to ask search vendors and search developers is the extent to which they are aware of this research and are ready and able to incorporate it into their applications.
To ask these questions the search support team needs to be monitoring developments in information retrieval as well as in enterprise search technology. A good place to start is to subscribe to the Digital Library of the Association for Computing Machinery, which covers a very wide range of conferences, reports and journal articles on information retrieval and enterprise search, including the conference proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR).
Although there is a significant amount of research taking place into enterprise search, there is an almost total lack of academic courses on information retrieval, let alone enterprise search. There are perhaps 200 universities teaching information science and informatics at undergraduate level, but information retrieval is usually only one small element of the three-year course. There are many more universities teaching computer science, but again the amount of time allocated to information retrieval is very limited. The outlook for companies seeking to recruit search professionals is quite bleak, and is likely to stay that way for some time to come.
One bright spot is the Lucid University, set up by LucidWorks, which offers training courses in Solr and Hadoop. However these courses are intended for developers. The company indicates that system administrators are welcome to attend, but the courses are primarily designed for people who have experience developing web applications in Java, PHP, Ruby or similar languages.
The concept of the digital workplace is usually attributed to Jeffery Bier, who founded Instinctive Technologies in 1996. This company capitalized on the work that Bier had done at Lotus Corporation on collaborative applications, and in 2000 was re-launched as eRoom Technologies. A component of the branding was the concept of a digital workplace. Bier set out five criteria for a digital workplace which still hold good today:
It must be comprehensible and have a minimal learning curve. If people have to learn a new tool, they will not use it, especially those people outside the firewall. The digital workplace needs to be as simple and obvious as e-mail or instant messaging.
It has to be contagious. The digital workplace must have clear benefits to all parties involved, to both distributed workers and the different enterprises interacting in these new workplaces. The workplace also has to be a trusted place, thus secure, both for the individual and the companies involved. People have to want to use it.
It must be cross-enterprise. The digital workplace must span company boundaries and geographic boundaries. It also must operate outside the corporate firewall with an organization’s customers, suppliers and other partners, and require very little IT involvement, or it will not gain acceptance.
The workplace has to be complete. All communication, document-sharing, issues-tracking, and decision-making needs to be captured and stored in one place.
The digital workplace must be connected. If not, it will not gain acceptance.
Of great importance in understanding the value and challenges of digital workplaces is the rise (in 1997) and disappearance (by 2005) of Enterprise Information Portals. Merrill Lynch published a seminal report on the market in November 1998 which stated:
We believe the power of the Enterprise Portal lies in the fact that from a single gateway, users will be able to find, extract and analyze all of this information. Furthermore, we also believe that these new EIP systems will shift the focus away from the actual content of the information to the context in which the end user consumes the information, whether the end user is an employee, customer or supplier. In this way information consumers will finally be able to benefit from data and information by accessing, mining and transferring it into disparate applications where it can be used again.
The vision was ahead not only of the technology but also of organizations realizing that they were not managing information effectively. This is now changing slowly and is beginning to open up an achievable vision of a digital workplace where search will be a very important enabling technology not just as a means of finding information but of integrating a wide range of applications.
There is an on-going debate about what the term ‘enterprise search’ means and whether there is a better description. In the planning stages of this book there was a discussion about whether ‘Enterprise Search’ was the best title, but none of the team behind this book could come up with a better title. I might argue that the concept is one of business intelligence but that term has already been taken, though arguably it has nothing to do with intelligence!
In the final analysis enterprise search is a vision, not one or more pieces of software. All employees should have effective access to the information that the organization has created and collected so that they can make well-informed decisions that benefit the organization and their own careers. It is inconceivable that a manufacturer would invest in a precision machine tool, put it in a shed on the factory site and not tell anyone of its existence. And yet every day that is the fate of digital information assets.
At long last organizations are recognizing the strategic and operational value of information and taking action. The biggest single barrier to effective implementation is finding people with the skills needed to understand how to get the best out of the sophisticated technology of search so that the technology does not stand between a query and an index but links them intuitively.
Without these people we may end up echoing the words of T.S. Eliot in the opening stanza of Choruses from “The Rock”:
“Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?”
You'll find some additional information regarding the subject matter of this chapter in the Further Reading section in Appendix A.