Compared to most sectors of the enterprise applications business, the enterprise search business is quite small. Total annual sales of search software probably amount to $3 billion at most, which in IT terms is a niche market. There are probably no more than 80 companies in the business at present, and these are listed in Appendix B. Most business and IT managers are unlikely to be aware of any of these companies, with the possible exception of Google. Most of them have revenues of less than $50 million, and many may have revenues of less than $10 million. The $3 billion estimate excludes the revenues of the search modules in the large enterprise suites from IBM, Oracle, SAP and Microsoft, as well as sales of Google search appliances.
Another way of looking at the search market is through the installed base of enterprise search. Excluding the enterprise suite market, probably only Ultraseek (acquired by Autonomy when it purchased Verity), Isys-Search and Google have reached an installed base of 10,000, but the majority of these implementations are relatively small-scale.
The good news is that a small installed base leaves significant potential for growth. One side effect of the acquisitions of Autonomy, Endeca and Isys-Search in 2011/2012 is that current and potential investors can now see a trade sale of a search vendor as an exit strategy that delivers a return on their investment. It does not take much to set up a search software business; it is mainly about brain power. Designers of search software need skills in probability theory, computational linguistics and database design. Previous experience in search development is useful but by no means essential, because search rests on well-established mathematical principles. Many specialized modules, such as document filters and language parsers, can be bought in. Another outcome of the 2011/2012 acquisition frenzy is that talented search developers may choose to pursue their careers outside large IT companies.
This chapter outlines the enterprise search business. Between mid-2011 and mid-2012 four significant acquisitions changed the business landscape, and their implications need to be understood when considering potential suppliers of search software.
There are three categories of search vendor. The first consists of a small number of large IT companies that offer enterprise search software to their customers, often bundled into large enterprise suites.
Dassault, the French enterprise systems company, acquired Exalead in 2010. Exalead had developed both a public web search service and a powerful enterprise search application that was one of the first to be available as a cloud-based application. Exalead has also been a leader in the development of search-based applications and solutions for ‘big data’.
HP acquired Autonomy in 2011 for around $8 billion. Autonomy was a UK-based company and, at the time, the only search vendor that was publicly listed. The company did much to raise the profile of enterprise search around the world. However, the fit between an entrepreneurial company of 2,000 employees and HP, with over 350,000 employees, was always going to be difficult to engineer. By mid-2012 many senior Autonomy managers had left the company, and the contract of Dr. Mike Lynch, the founder of Autonomy, had been terminated.
IBM has a long history of research and development in information retrieval. Over the years it has acquired many small companies with search expertise and rapidly integrated them into its core product range. IBM’s OmniFind search product is based on the Apache Lucene open-source search library. In 2012 IBM acquired Vivisimo, whose search application was originally developed at Carnegie Mellon University.
Lexmark was founded in 1991 when IBM divested its printer business. In 2012 it diversified into the search business, acquiring first Brainware and then Isys-Search. Isys-Search, an Australian company, was a leading vendor of mid-range enterprise search applications. Lexmark had previously acquired Perceptive Software, a supplier of enterprise content management and business process management software, in 2010. Although Isys-Search was a very visible acquisition, Brainware may well emerge as the flagship product; Brainware claims to have invested over $100 million in research and development.
Oracle has been in the search business for many years, launching Oracle Secure Enterprise Search in 2006. Unlike IBM, Oracle has been more than happy to sell the product to non-Oracle customers. In 2011, probably in reaction to HP’s acquisition of Autonomy, Oracle acquired Endeca, a search vendor that specialized in filtering search results for applications such as e-commerce sites. Oracle moved quickly to integrate Endeca into its product catalogue.
Choosing a search solution from any of these companies involves trade-offs between benefits and risks. The benefits are that these companies have the financial resources, sales teams and commercial incentives to recover the costs of their acquisitions as quickly as possible. They will almost certainly have existing contracts with major companies and will quickly be in a position to offer enhanced search applications. Procurement departments will feel comfortable that the companies will remain in existence for some years to come, and IT managers know that they will be able to call on service and support teams in most countries of the world.
There are also risks to be considered. Because search is such a small element of total sales, the sales teams will probably lack the detailed product knowledge needed to ensure that user requirements can be met by the vendor’s search solutions. In the case of IBM and Oracle, and to a lesser extent Lexmark, the companies have multiple search applications to offer. Organizations that have already implemented Autonomy, Isys-Search, Endeca or Vivisimo may find that the people they have been dealing with for some time no longer have the same commitment to the product, or are no longer able to commit to solving problems as quickly as they did in the past.
There are probably around 70 independent search vendors whose main line of business is the development of search applications. They are mostly funded by venture capital and private equity placements. Because they are small, they are not required to publish detailed revenue figures. Most of these independent search vendors have revenues of less than $20 million, and many operate largely in a single national market to reduce the costs of sales and customer support. Indeed, one outcome of the 2011/2012 acquisition frenzy was that the mid-market segment of the search industry, vendors with revenues of $50M to $150M, virtually disappeared.
The challenge these companies face is that they cannot afford much marketing and are virtually unknown to most IT managers. Procurement departments are also wary of potential suppliers that have no published accounts. Vendors will generally provide financial information under a non-disclosure agreement, but in many cases profits will be minimal because they are being ploughed back into the development of the software. In addition, only a small number of people are likely to have a full understanding of the search software code base.
The development of open-source software dates back to the early 1980s and the launch of the GNU Project by Richard Stallman at MIT. Some small-scale open-source search applications were developed in the 1990s, mainly for web site search. In 1999 Doug Cutting released Lucene as a SourceForge project, and in 2001 he donated the code to the Apache Software Foundation. A few years later Yonik Seeley developed Solr for CNET, and the code was donated to the Apache Software Foundation in 2006.
In December 2004 Google published a paper describing MapReduce, a programming model that allows very large computations to be run in parallel across large clusters of servers. Cutting, at that time working for Yahoo!, took the concepts in the paper and created the open-source Hadoop framework, which allows MapReduce applications to run on large server clusters.
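To make the MapReduce model more concrete, the following is a minimal single-process sketch in Python that builds a small inverted index, the core data structure of a search engine. It is not Hadoop’s API: the function names are hypothetical, the documents are a toy in-memory dictionary, and the map, shuffle and reduce phases run one after another rather than being distributed across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterable[Tuple[str, str]]:
    """Emit one (term, doc_id) pair for every token in the document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Collapse the values for one term into a sorted, de-duplicated postings list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    # On a cluster each map call and each reduce call could run on a different
    # node; here they simply run sequentially in a single process.
    mapped = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    docs = {
        "d1": "enterprise search software",
        "d2": "open source search",
        "d3": "enterprise content management",
    }
    index = build_inverted_index(docs)
    print(index["search"])      # ['d1', 'd2']
    print(index["enterprise"])  # ['d1', 'd3']
```

Because each map call depends only on its own document and each reduce call only on the values for its own key, the framework is free to spread both phases over many servers; that independence is what lets the model scale.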
Although there is now a high level of awareness of Apache Lucene/Solr as an open-source enterprise search application, there are a number of others, notably Xapian, which is based on the Muscat search application written by Dr. Martin Porter at the University of Cambridge in 1984. ElasticSearch is also of interest because the company that supports the software received a $10M investment in late 2012.
Open-source applications often require the integration of modules from other sources to meet particular user requirements. They are certainly not out-of-the-box applications: implementing them requires expertise not only in (usually) Java programming but also in the basic principles of information retrieval and enterprise search optimization.
One of the terms often used in the open-source community is ‘committer’. Committers are project founders, developers appointed by the project founders, or developers voted into the role by the community. They write the majority of the code for a project and review patches submitted by the community.
There are three business/implementation models:
Download the software code and develop the application with internal developers
Use a company with expertise in the development of the selected open-source application
Implement a ‘productized’ version of the open-source application, for example from LucidWorks.
LucidWorks currently employs about 25% of the committers to the Apache Lucene/Solr project, making it probably the largest supplier of productized open-source solutions. It has strategic partnerships with around 30 implementation partners worldwide.
The open-source enterprise search business is going to develop substantially over the next few years. The spate of acquisitions in 2011 and 2012 and the question marks over the future of FAST ESP have both caused organizations (other than perhaps those implementing SharePoint) to look more carefully at the potential benefits of a customized open-source development option. One justification often cited for open-source search software is that it enables the organization to avoid being tied to a single vendor. The Apache Lucene/Solr stack is fast becoming the de facto open-source search software, supported by a very vigorous global community. In principle there is no license-fee or proprietary-code lock-in, but in reality the challenges of changing from Lucene/Solr to another open-source solution should not be underestimated.
Although there is a substantial community supporting Apache Lucene, and Solr in particular, that does not mean the community will act as a virtual support team for the implementation and management of open-source search applications. Implementation support is provided by companies such as LucidWorks, Intrafind and Polyspot, which have built search applications on top of Lucene and Solr. The rule is that you get what you pay for.
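It is also important to note that although Java is available under an open-source license through the OpenJDK project, its development is controlled by Oracle. In 2011 changes made to Java by Oracle caused some problems for the open-source search community: the Java 7 release contained optimizer bugs that could crash applications or corrupt Lucene indexes. Lessons were learned, and a repeat is unlikely.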
A search appliance is a search application pre-installed, together with disk storage, in a standard rack-mounted casing. In principle it can be installed and switched on in perhaps 30 minutes. The product concept was made famous by Google with its Google Search Appliance, but Google was not the first company to offer an appliance product. The search appliance was pioneered by the US company Thunderstone in 2003, though the company itself was founded in 1981.
Google’s innovation was its pricing policy, which is based on the number of documents to be indexed and searched. License tiers start at 500,000 documents and extend to 30 million documents or more. The Google Search Appliance is offered on two- or three-year license terms, which include support, hardware replacement coverage and software updates. When the contract period ends, a new contract has to be negotiated. It is important to remember that customer support from Google is very limited indeed. Email is the standard communications channel, and finding the name of someone at Google with whom a dialogue can be established is almost impossible.
This means that the total cost of ownership has to be calculated carefully over a five-year period, the typical minimum life-span of a more conventional application. Most companies have no idea how much information they need to index, much less the number of documents. Multiple versions of the same document quickly increase the number being indexed. Another factor to consider is the cost of purchasing additional server licenses, both to provide redundancy in the event of a server failure and for development and test purposes.
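The five-year calculation can be sketched as follows. This is a minimal illustration under stated assumptions: all prices, the average number of versions per document and the class and function names are hypothetical, every renewal is assumed to cost the same as the first term (in practice each renewal is renegotiated), and the appliance count covers production plus one unit each for redundancy and development/testing.

```python
import math
from dataclasses import dataclass


@dataclass
class ApplianceQuote:
    """Hypothetical quote; real figures must come from the vendor."""
    term_years: int          # license term, e.g. 2 or 3 years
    price_per_term: float    # cost of one appliance license for one term
    tier_limit_docs: int     # maximum documents covered by the license tier


def documents_to_index(unique_docs: int, avg_versions: float) -> int:
    # Every stored version is indexed as a separate document.
    return math.ceil(unique_docs * avg_versions)


def fits_tier(quote: ApplianceQuote, docs: int) -> bool:
    return docs <= quote.tier_limit_docs


def total_cost(quote: ApplianceQuote, appliances: int, horizon_years: int = 5) -> float:
    # A term cannot be bought part-way, so round the number of terms up.
    # Assumes renewals are priced the same as the first term.
    terms_needed = math.ceil(horizon_years / quote.term_years)
    return terms_needed * quote.price_per_term * appliances


if __name__ == "__main__":
    docs = documents_to_index(unique_docs=400_000, avg_versions=1.5)
    two_year = ApplianceQuote(term_years=2, price_per_term=30_000, tier_limit_docs=500_000)
    three_year = ApplianceQuote(term_years=3, price_per_term=42_000, tier_limit_docs=500_000)
    appliances = 3  # production + redundancy + development/test
    print(docs, fits_tier(two_year, docs))     # 600000 False
    print(total_cost(two_year, appliances))    # 270000
    print(total_cost(three_year, appliances))  # 252000
```

With these made-up figures, 400,000 unique documents become 600,000 indexable items once versions are counted, which pushes the collection past a 500,000-document tier. Over a five-year horizon a three-year term also means paying for six years of coverage, so the term length matters as much as the headline price.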
In general search appliances offer very good processing performance because the software and hardware are fully integrated by the vendor. However it is usually difficult to tune appliances to improve relevancy, the range of connectors to other applications is limited and customer support is often restricted to a local partner.
Over the last few years the number of search appliances on the market has increased, and shortly after HP acquired Autonomy an appliance version of the Autonomy search software was announced. MaxxCAT, Perfect Search and Fabasoft Mindbreeze are three other search appliance vendors.
In mid-2012 Google withdrew its Google Mini appliance from the market. The enterprise appliances are still being sold and supported, but as always with Google a decision to withdraw from the market could be made with little warning. Google’s total revenues in 2011 were around $38 billion, and enterprise search probably accounts for only a very small fraction of this, so there is little commercial pressure on Google to stay in the market.
Microsoft came late to enterprise search. The search functionality of the 2007 release of SharePoint was poor and in an effort to catch up Microsoft acquired the Norwegian software company FAST Search and Transfer in 2008. FAST ESP was a very powerful enterprise search application running on Linux servers. Microsoft moved the application to Windows servers and used the expertise of the company’s developers to enhance the search functionality of SharePoint 2010.
The range of search applications available from Microsoft can be confusing:
Search Server Express is a free product that can be downloaded from Microsoft and installed on a single server. It can be used to index up to 300,000 items.
Search Server 2010 has the same functionality as Search Server Express but can index up to 10 million items per server, and up to 100 million items on multiple servers. There is a per-server license fee.
SharePoint Server 2010 is the entry-level search for SharePoint 2010. It is bundled into the Standard CAL (Client Access License).
FAST Search Server 2010 for SharePoint brings much (but certainly not all) of the functionality of FAST ESP to SharePoint search. The power and complexity of the product are both substantially greater than SharePoint Server 2010.
FAST ESP is one of the most powerful search applications on the market, but it has not been developed further since Microsoft purchased it in 2008. It goes out of mainstream support in July 2013, and there is no upgrade path from FAST ESP to FAST Search Server 2010 for SharePoint.
Many organizations are confused by the FAST prefix in FAST Search Server 2010 for SharePoint and think that they have purchased the FAST ESP product. That is not the case. FAST Search Server 2010 for SharePoint (FS4SP) certainly provides considerable search functionality, but it is configured to run inside SharePoint, and as a result the processing power and ability to customize the application are somewhat reduced. Compared with SharePoint Search Server, the features of FAST Search Server fall into four categories:
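The item limits quoted above for the two Search Server editions translate into a simple sizing rule. The sketch below is a minimal illustration that uses only the figures given in the text; it ignores extra servers for redundancy or testing, assumes each server can index up to its full quoted capacity, and the function name is hypothetical. Current limits should be checked against Microsoft’s own documentation.

```python
import math
from typing import Optional, Tuple

# Figures as quoted in the text for the 2010 product generation.
EXPRESS_MAX_ITEMS = 300_000                  # Search Server Express, single server only
SEARCH_SERVER_ITEMS_PER_SERVER = 10_000_000  # Search Server 2010, per server
SEARCH_SERVER_MAX_ITEMS = 100_000_000        # Search Server 2010, across multiple servers


def size_search_server(items: int) -> Optional[Tuple[str, int]]:
    """Return (edition, servers) for the smallest Search Server option that
    can index `items`, or None if the count exceeds the quoted maximum."""
    if items < 0:
        raise ValueError("item count cannot be negative")
    if items <= EXPRESS_MAX_ITEMS:
        return "Search Server Express", 1
    if items <= SEARCH_SERVER_MAX_ITEMS:
        servers = math.ceil(items / SEARCH_SERVER_ITEMS_PER_SERVER)
        return "Search Server 2010", servers
    return None


if __name__ == "__main__":
    print(size_search_server(250_000))      # ('Search Server Express', 1)
    print(size_search_server(25_000_000))   # ('Search Server 2010', 3)
    print(size_search_server(150_000_000))  # None
```

A result of None means the collection falls outside the Search Server range described here, and because Search Server 2010 carries a per-server license fee, the server count also gives a first indication of license cost.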
Features common to both products
Features that are in principle common to both products but are enhanced in FAST Search Server
Features unique to FAST Search Server
Features unique to SharePoint Search Server
Outside larger multinational implementations, the administration and tuning of SharePoint Search Server for SharePoint 2010 can probably be accomplished without a full-time search support team. That is not the case for FAST Search Server 2010 if the organization wants to gain the full return on the additional investment.
SharePoint 2010 was released in May 2010, and in July 2012 details started to emerge of changes to search in SharePoint 2013. It seems likely that FS4SP will have enhanced functionality and will also be used in Exchange Server. Although some elements of FS4SP administration will be improved, it will remain a complex application to manage. It is optimized for SharePoint implementations, and using it as the basis for a large-scale enterprise search implementation needs to be considered very carefully indeed. It is important to appreciate that implementing FS4SP will almost certainly require a significant investment in server hardware over and above the CAL and related software licenses.
The search functionality of SharePoint 2007 (often referred to as MOSS07) was poor, and many organizations used one of the many third-party solutions (such as those from BA-Insight, Coveo and Surfray) designed specifically for SharePoint. As a result few Microsoft Partners developed any expertise in search implementation, and they had to build this capability with the arrival of SharePoint 2010. Even now the skills needed to implement FAST Search Server 2010 for SharePoint are in short supply, and this should be taken into account when planning the development path for search within a SharePoint environment.
Some of the software modules used in enterprise search applications are highly specialized, particularly those that handle different languages. Two companies, Basis Technology and Teragram, are the market leaders in very sophisticated text analytics software. Both have developed techniques for parsing and indexing Arabic and Asian languages that are widely used within the search industry. Another important specialist sector is the development of document filters, which extract text from file formats such as PDF and Microsoft Office documents so that it can be indexed.
Another important development is the availability of cloud-based search-as-a-service applications. Hosted search services have been used for web sites for well over a decade, but they have failed to make inroads into the enterprise sector because of concerns over security management, data protection (especially for companies in the EU) and customisation to meet specific requirements. Over the last year a number of companies have started to highlight the benefits of cloud-based search services, mainly the ability to get something started quickly and then accommodate growth without needing to switch vendor.
During 2012 there were a number of important announcements from Amazon (CloudSearch) and Microsoft Azure. Search Technologies, a major search systems integrator, has set up a demonstration of Amazon CloudSearch using Wikipedia as the content source. Autonomy has a large private cloud service, which was probably one reason for its acquisition by HP in 2011.
This business model makes the costs of implementing search much more transparent than is the case with either commercial vendors (who never release even indicative license fees) or open-source developers.
There have been some negative comments about cloud-based search, many about the lack of security and the need to upload documents. These comments fail to take into account that cloud-based search, and indeed other cloud-based applications, are still in their infancy. There can be little doubt that the functionality of these search services will continue to grow, and it would not be surprising to see Google make some significant move into cloud-based search delivery in the near future.
Some vendors will provide a version of their search application to vendors of document management, customer relationship management and other enterprise applications. The version supplied may well have reduced functionality compared to the current version of the product, and may not follow the same upgrade roadmap as the stand-alone search product. In addition, it is highly unlikely that the embedded search application can be extended to search other repositories.
Smaller search vendors will often work directly with clients, especially where the software has been designed to work out of the box. There may be a need for two to three days of support, mainly for installing the software on the server, setting up and testing disaster recovery options, and configuring the crawl routines.
There are now a number of systems integration companies that specialize in search implementation projects, offering a range of services including defining the search requirements, managing the process of product selection and then supporting the implementation. Most of these companies tend to focus their business around a selection of search software applications, but will have the skills and expertise to handle almost any search implementation project.
In some cases the vendor itself may feel that the implementation process is too complex for it to support, especially in countries where it has little or no local office support, or where there are particular technical issues to be overcome. It will then partner with a local search systems integrator. This is usually a win-win situation for all concerned, though it is wise to make sure that the integration team is fully conversant with the version of the search software they are planning to implement.
Often a company has outsourced its IT services, or uses a systems integrator to support the implementation of new applications. Search implementation usually represents only a very small revenue opportunity for systems integrators, so there may not be many staff who can manage a search implementation. It is therefore not surprising that systems integrators work with only a small number of search vendors that can provide back-up support to their consultants.
e-Discovery applications are used to identify information that has to be produced in a court case. This is a specialized area of information retrieval and, with the exception of Autonomy, IBM and Recommind, the vendors in this sector focus on e-discovery processes alone. The market has largely been driven by the compliance requirements of the 2006 amendments to the U.S. Federal Rules of Civil Procedure. The industry has developed the Electronic Discovery Reference Model (EDRM) as a means of evaluating vendor applications; the model defines the following stages:
Information Management: Managing assets in order to mitigate risk and expense should e-discovery become an issue, from initial creation of electronically stored information (ESI) through its final disposition.
Identification: Locating potential sources of ESI and determining its scope, breadth & depth.
Preservation: Ensuring that ESI is protected against inappropriate alteration or destruction.
Collection: Gathering ESI for further use in the e-discovery process (processing, review, etc.).
Processing: Reducing the volume of ESI and converting it, if necessary, to forms more suitable for review and analysis.
Review: Evaluating ESI for relevance and privilege.
Analysis: Evaluating ESI for content and context, including key patterns, topics, people and discussion.
Production: Delivering ESI to others in appropriate forms & using appropriate delivery mechanisms.
Presentation: Displaying ESI before audiences (at depositions, hearings, trials, etc.), especially in native and near-native forms, to elicit further information, validate existing facts or positions, or persuade an audience.
The range of options now available to an organization wishing to upgrade its search application is very wide indeed. The integration of Autonomy, Endeca, Vivisimo and Isys-Search will probably not be completed until later in 2013, and in mid-2013 Microsoft will be rolling out the next version of SharePoint with an upgraded FS4SP application. Cloud services will continue to develop, and there will be increasing uptake of open-source search solutions. Meanwhile, as described in Chapter 11, the search business is changing from a focus on text search alone to providing some form of unified information access capability, and there will certainly be rapid development of mobile search applications.
You'll find some additional information regarding the subject matter of this chapter in the Further Reading section in Appendix A.