Mastering ElasticSearch
Table of Contents
Mastering ElasticSearch
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to ElasticSearch
Introducing Apache Lucene
Getting familiar with Lucene
Overall architecture
Analyzing your data
Indexing and querying
Lucene query language
Understanding the basics
Querying fields
Term modifiers
Handling special characters
Introducing ElasticSearch
Basic concepts
Index
Document
Mapping
Type
Node
Cluster
Shard
Replica
Gateway
Key concepts behind ElasticSearch architecture
Working of ElasticSearch
The bootstrap process
Failure detection
Communicating with ElasticSearch
Indexing data
Querying data
Index configuration
Administration and monitoring
Summary
2. Power User Query DSL
Default Apache Lucene scoring explained
When a document is matched
The TF/IDF scoring formula
The Lucene conceptual formula
The Lucene practical formula
The ElasticSearch point of view
Query rewrite explained
Prefix query as an example
Getting back to Apache Lucene
Query rewrite properties
Rescore
Understanding rescore
Example Data
Query
Structure of the rescore query
Rescore parameters
To sum up
Bulk Operations
MultiGet
MultiSearch
Sorting data
Sorting with multivalued fields
Sorting with multivalued geo fields
Sorting with nested objects
Update API
Simple field update
Conditional modifications using scripting
Creating and deleting documents using the Update API
Using filters to optimize your queries
Filters and caching
Not all filters are cached by default
Changing ElasticSearch caching behavior
Why bother naming the key for the cache?
When to change the ElasticSearch filter caching behavior
The terms lookup filter
How does it work?
Performance considerations
Loading terms from inner objects
Terms lookup filter cache settings
Filter and scopes in ElasticSearch faceting mechanism
Example data
Faceting and filtering
Filter as a part of the query
The Facet filter
Global scope
Summary
3. Low-level Index Control
Altering Apache Lucene scoring
Available similarity models
Setting per-field similarity
Similarity model configuration
Choosing the default similarity model
Configuring the chosen similarity models
Configuring TF/IDF similarity
Configuring Okapi BM25 similarity
Configuring DFR similarity
Configuring IB similarity
Using codecs
Simple use cases
Let's see how it works
Available posting formats
Configuring the codec behavior
Default codec properties
Direct codec properties
Memory codec properties
Pulsing codec properties
Bloom filter-based codec properties
NRT, flush, refresh, and transaction log
Updating index and committing changes
Changing the default refresh time
The transaction log
The transaction log configuration
Near Real Time GET
Looking deeper into data handling
Input is not always analyzed
Example usage
Changing the analyzer during indexing
Changing the analyzer during searching
The pitfall and default analysis
Segment merging under control
Choosing the right merge policy
The tiered merge policy
The log byte size merge policy
The log doc merge policy
Merge policies configuration
The tiered merge policy
The log byte size merge policy
The log doc merge policy
Scheduling
The concurrent merge scheduler
The serial merge scheduler
Setting the desired merge scheduler
Summary
4. Index Distribution Architecture
Choosing the right amount of shards and replicas
Sharding and over allocation
A positive example of over allocation
Multiple shards versus multiple indices
Replicas
Routing explained
Shards and data
Let's test routing
Indexing without routing
Indexing with routing
Querying
Aliases
Multiple routing values
Altering the default shard allocation behavior
Introducing ShardAllocator
The even_shard ShardAllocator
The balanced ShardAllocator
The custom ShardAllocator
Deciders
SameShardAllocationDecider
ShardsLimitAllocationDecider
FilterAllocationDecider
ReplicaAfterPrimaryActiveAllocationDecider
ClusterRebalanceAllocationDecider
ConcurrentRebalanceAllocationDecider
DisableAllocationDecider
AwarenessAllocationDecider
ThrottlingAllocationDecider
RebalanceOnlyWhenActiveAllocationDecider
DiskThresholdDecider
Adjusting shard allocation
Allocation awareness
Forcing allocation awareness
Filtering
But what do those properties mean?
Runtime allocation updating
Index-level updates
Cluster-level updates
Defining total shards allowed per node
Inclusion
Requirements
Exclusion
Additional shard allocation properties
Query execution preference
Introducing the preference parameter
Using our knowledge
Assumptions
Data volume and queries specification
Configuration
Node-level configuration
Indices configuration
The directories layout
Gateway configuration
Recovery
Discovery
Logging slow queries
Logging garbage collector work
Memory setup
One more thing
Changes are coming
Reindexing
Routing
Multiple Indices
Summary
5. ElasticSearch Administration
Choosing the right directory implementation – the store module
Store type
The simple file system store
The new IO filesystem store
The MMap filesystem store
The memory store
Additional properties
The default store type
Discovery configuration
Zen discovery
Multicast
Unicast
Minimum master nodes
Zen discovery fault detection
Amazon EC2 discovery
EC2 plugin's installation
EC2 plugin's configuration
Optional EC2 discovery configuration options
EC2 nodes scanning configuration
Gateway and recovery configuration
Gateway recovery process
Configuration properties
Expectations on nodes
Local gateway
Backing up the local gateway
Recovery configuration
Cluster-level recovery configuration
Index-level recovery settings
Segments statistics
Introducing the segments API
The response
Visualizing segments information
Understanding ElasticSearch caching
The filter cache
Filter cache types
Index-level filter cache configuration
Node-level filter cache configuration
The field data cache
Index-level field data cache configuration
Node-level field data cache configuration
Filtering
Adding field data filtering information
Filtering by term frequency
Filtering by regex
Filtering by regex and term frequency
The filtering example
Clearing the caches
Index, indices, and all caches clearing
Clearing specific caches
Clearing fields-related caches
Summary
6. Fighting with Fire
Knowing the garbage collector
Java memory
The life cycle of Java objects and garbage collection
Dealing with garbage collection problems
Turning on logging of garbage collection work
Using JStat
Creating memory dumps
More information on garbage collector work
Adjusting garbage collector work in ElasticSearch
Using standard startup script
Service wrapper
Avoiding swapping on Unix-like systems
When it is too much for I/O – throttling explained
Controlling I/O throttling
Configuration
Throttling type
Maximum throughput per second
Node throttling defaults
Configuration example
Speeding up queries using warmers
Reason for using warmers
Manipulating warmers
Using the PUT Warmer API
Adding warmers during index creation
Adding warmers to templates
Retrieving warmers
Deleting warmers
Disabling warmers
Testing the warmers
Querying without warmers present
Querying with warmer present
Very hot threads
Hot Threads API usage clarification
Hot Threads API response
Real-life scenarios
Slower and slower performance
Heterogeneous environment and load imbalance
My server is under fire
Summary
7. Improving the User Search Experience
Correcting user spelling mistakes
Test data
Getting into technical details
Suggesters
Using the _suggest REST endpoint
Understanding the REST endpoint suggester response
Including suggestions requests in a query
Suggester response
The term suggester
Configuration
Common term suggester options
Additional term suggester options
The phrase suggester
The usage example
Configuration
Basic configuration
Configuring smoothing models
Stupid backoff
Laplace
Linear interpolation
Configuring candidate generators
Direct generators
Configuring direct generators
Completion suggester
The logic behind completion suggester
Using completion suggester
Indexing data
Querying data
Custom weights
Additional parameters
Improving query relevance
The data
The quest for improving relevance
The standard query
The Multi match query
Phrases come into play
Let's throw the garbage away
And now we boost
Making a misspelling-proof search
Drill downs with faceting
Summary
8. ElasticSearch Java APIs
Introducing the ElasticSearch Java API
The code
Connecting to your cluster
Becoming the ElasticSearch node
Using the transport connection method
Choosing the right connection method
Anatomy of the API
CRUD operations
Fetching documents
Handling errors
Indexing documents
Updating documents
Deleting documents
Querying ElasticSearch
Preparing a query
Building queries
Using the match all documents query
The match query
Using the geo shape query
Paging
Sorting
Filtering
Faceting
Highlighting
Suggestions
Counting
Scrolling
Performing multiple actions
Bulk
The delete by query
Multi GET
Multi Search
Percolator
ElasticSearch 1.0 and higher
The explain API
Building JSON queries and documents
The administration API
The cluster administration API
The cluster and indices health API
The cluster state API
The update settings API
The reroute API
The nodes information API
The node statistics API
The nodes hot threads API
The nodes shutdown API
The search shards API
The Indices administration API
The index existence API
The Type existence API
The indices stats API
Index status
Segments information API
Creating an index API
Deleting an index
Closing an index
Opening an index
The Refresh API
The Flush API
The Optimize API
The put mapping API
The delete mapping API
The gateway snapshot API
The aliases API
The get aliases API
The aliases exists API
The clear cache API
The update settings API
The analyze API
The put template API
The delete template API
The validate query API
The put warmer API
The delete warmer API
Summary
9. Developing ElasticSearch Plugins
Creating the Apache Maven project structure
Understanding the basics
Structure of the Maven Java project
The idea of POM
Running the build process
Introducing the assembly Maven plugin
Creating a custom river plugin
Implementation details
Implementing the URLChecker class
Implementing the JSONRiver class
Implementing the JSONRiverModule class
Implementing the JSONRiverPlugin class
Informing ElasticSearch about the JSONRiver plugin class
Testing our river
Building our river
Installing our river
Initializing our river
Checking if our JSON river works
Creating custom analysis plugin
Implementation details
Implementing TokenFilter
Implementing the TokenFilter factory
Implementing custom analyzer
Implementing analyzer provider
Implementing analysis binder
Implementing analyzer indices component
Implementing analyzer module
Implementing analyzer plugin
Informing ElasticSearch about our custom analyzer
Testing our custom analysis plugin
Building our custom analysis plugin
Installing the custom analysis plugin
Checking if our analysis plugin works
Summary
Index