Imperial Library

Index
Apache Solr 3 Enterprise Search Server
Apache Solr 3 Enterprise Search Server Credits About the Authors Acknowledgement Acknowledgement About the Reviewers www.PacktPub.com
Discounts Free eBooks Newsletters Code Downloads, Errata and Support
PacktLib.PacktPub.com Preface
What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support
Downloading the example code Errata Piracy Questions
1. Quick Starting Solr
An introduction to Solr
Lucene, the underlying engine Solr, a Lucene-based search server Comparison to database technology
Getting started
Solr's installation directory structure Solr's home directory and Solr cores Running Solr
A quick tour of Solr
Loading sample data A simple query Some statistics The sample browse interface
Configuration files Resources outside this book Summary
2. Schema and Text Analysis
MusicBrainz.org One combined index or separate indices
One combined index
Problems with using a single combined index
Separate indices
Schema design
Step 1: Determine which searches are going to be powered by Solr Step 2: Determine the entities returned from each search Step 3: Denormalize related data
Denormalizing—'one-to-one' associated data Denormalizing—'one-to-many' associated data
Step 4: (Optional) Omit the inclusion of fields only used in search results
The schema.xml file
Defining field types Built-in field type classes
Numbers and dates Geospatial
Field options Field definitions
Dynamic field definitions
Our MusicBrainz field definitions Copying fields The unique key The default search field and query operator
Text analysis
Configuration Experimenting with text analysis Character filters Tokenization WordDelimiterFilter Stemming
Correcting and augmenting stemming
Synonyms
Index-time versus query-time, and to expand or not
Stop words Phonetic sounds-like analysis Substring indexing and wildcards
ReversedWildcardFilter N-grams N-gram costs
Sorting Text Miscellaneous token filters
Summary
3. Indexing Data
Communicating with Solr
Direct HTTP or a convenient client API Push data to Solr or have Solr pull it Data formats HTTP POSTing options to Solr Remote streaming
Solr's Update-XML format
Deleting documents
Commit, optimize, and rollback Sending CSV formatted data to Solr
Configuration options
The Data Import Handler Framework
Setup The development console Writing a DIH configuration file
Data Sources Entity processors Fields and transformers
Example DIH configurations
Importing from databases Importing XML from a file with XSLT Importing multiple rich document files (crawling)
Importing commands
Delta imports
Indexing documents with Solr Cell
Extracting text and metadata from files Configuring Solr Solr Cell parameters Extracting karaoke lyrics Indexing richer documents
Update request processors Summary
4. Searching
Your first search, a walk-through Solr's generic XML structured data representation Solr's XML response format
Parsing the URL
Request handlers Query parameters
Search criteria related parameters Result pagination related parameters Output related parameters Diagnostic related parameters
Query parsers and local-params Query syntax (the lucene query parser)
Matching all the documents Mandatory, prohibited, and optional clauses
Boolean operators
Sub-queries
Limitations of prohibited clauses in sub-queries
Field qualifier Phrase queries and term proximity Wildcard queries
Fuzzy queries
Range queries
Date math
Score boosting Existence (and non-existence) queries Escaping special characters
The Dismax query parser (part 1)
Searching multiple fields Limited query syntax Min-should-match
Basic rules Multiple rules What to choose
A default search
Filtering Sorting Geospatial search
Indexing locations Filtering by distance Sorting by distance
Summary
5. Search Relevancy
Scoring
Query-time and index-time boosting Troubleshooting queries and scoring
Dismax query parser (part 2)
Lucene's DisjunctionMaxQuery Boosting: Automatic phrase boosting
Configuring automatic phrase boosting Phrase slop configuration Partial phrase boosting
Boosting: Boost queries Boosting: Boost functions
Add or multiply boosts?
Function queries
Field references Function reference
Mathematical primitives Other math ord and rord Miscellaneous functions
Function query boosting
Formula: Logarithm Formula: Inverse reciprocal Formula: Reciprocal Formula: Linear
How to boost based on an increasing numeric field
Step by step… External field values
How to boost based on recent dates
Step by step…
Summary
6. Faceting
A quick example: Faceting release types
MusicBrainz schema changes
Field requirements Types of faceting Faceting field values
Alphabetic range bucketing
Faceting numeric and date ranges
Range facet parameters
Facet queries Building a filter query from a facet
Field value filter queries Facet range filter queries
Excluding filters (multi-select faceting) Hierarchical faceting Summary
7. Search Components
About components The Highlight component
A highlighting example Highlighting configuration
The regex fragmenter The fast vector highlighter with multi-colored highlighting
The SpellCheck component
Schema configuration Configuration in solrconfig.xml
Configuring spellcheckers (dictionaries)
IndexBasedSpellChecker options FileBasedSpellChecker options
Processing of the q parameter Processing of the spellcheck.q parameter
Building the dictionary from its source Issuing spellcheck requests Example usage for a misspelled query
Query complete / suggest
Query term completion via facet.prefix Query term completion via the Suggester Query term completion via the Terms component
The QueryElevation component
Configuration
The MoreLikeThis component
Configuration parameters
Parameters specific to the MLT search component Parameters specific to the MLT request handler Common MLT parameters
MLT results example
The Stats component
Configuring the stats component Statistics on track durations
The Clustering component Result grouping/Field collapsing
Configuring result grouping
The TermVector component Summary
8. Deployment
Deployment methodology for Solr
Questions to ask
Installing Solr into a Servlet container
Differences between Servlet containers
Defining solr.home property
Logging
HTTP server request access logs Solr application logging
Configuring logging output Logging using Log4j Jetty startup integration Managing log levels at runtime
A SearchHandler per search interface? Leveraging Solr cores
Configuring solr.xml
Property substitution Include fragments of XML with XInclude
Managing cores Why use multicore?
Monitoring Solr performance
Stats.jsp JMX
Starting Solr with JMX
Take a walk on the wild side! Use JRuby to extract JMX information
Securing Solr from prying eyes
Limiting server access
Securing public searches Controlling JMX access
Securing index data
Controlling document access Other things to look at
Summary
9. Integrating Solr
Working with included examples
Inventory of examples
Solritas, the integrated search UI
Pros and Cons of Solritas
SolrJ: Simple Java interface
Using Heritrix to download artist pages SolrJ-based client for Indexing HTML SolrJ client API
Embedding Solr Searching with SolrJ Indexing
Indexing POJOs
When should I use embedded Solr?
In-process indexing Standalone desktop applications Upgrading from legacy Lucene
Using JavaScript with Solr
Wait, what about security? Building a Solr powered artists autocomplete widget with jQuery and JSONP AJAX Solr
Using XSLT to expose Solr via OpenSearch
OpenSearch based Browse plugin
Installing the Search MBArtists plugin
Accessing Solr from PHP applications
solr-php-client Drupal options
Apache Solr Search integration module Hosted Solr by Acquia
Ruby on Rails integrations
The Ruby query response writer sunspot_rails gem
Setting up MyFaves project Populating MyFaves relational database from Solr Build Solr indexes from a relational database Complete MyFaves website
Which Rails/Ruby library should I use?
Nutch for crawling web pages Maintaining document security with ManifoldCF
Connectors Putting ManifoldCF to use
Summary
10. Scaling Solr
Tuning complex systems Testing Solr performance with SolrMeter Optimizing a single Solr server (Scale up)
Configuring JVM settings to improve memory usage
MMapDirectoryFactory to leverage additional virtual memory
Enabling downstream HTTP caching Solr caching
Tuning caches
Indexing performance
Designing the schema Sending data to Solr in bulk Don't overlap commits Disabling unique key checking Index optimization factors
Enhancing faceting performance Using term vectors Improving phrase search performance
Moving to multiple Solr servers (Scale horizontally)
Replication Starting multiple Solr servers
Configuring replication
Load balancing searches across slaves
Indexing into the master server Configuring slaves
Configuring load balancing Sharding indexes
Assigning documents to shards Searching across shards (distributed search)
Combining replication and sharding (Scale deep)
Near real time search
Where next for scaling Solr? Summary
A. Search Quick Reference
Quick reference
