Mastering ElasticSearch
Table of Contents
Mastering ElasticSearch
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to ElasticSearch
Introducing Apache Lucene
Getting familiar with Lucene
Overall architecture
Analyzing your data
Indexing and querying
Lucene query language
Understanding the basics
Querying fields
Term modifiers
Handling special characters
Introducing ElasticSearch
Basic concepts
Index
Document
Mapping
Type
Node
Cluster
Shard
Replica
Gateway
Key concepts behind ElasticSearch architecture
Working of ElasticSearch
The bootstrap process
Failure detection
Communicating with ElasticSearch
Indexing data
Querying data
Index configuration
Administration and monitoring
Summary
2. Power User Query DSL
Default Apache Lucene scoring explained
When a document is matched
The TF/IDF scoring formula
The Lucene conceptual formula
The Lucene practical formula
The ElasticSearch point of view
Query rewrite explained
Prefix query as an example
Getting back to Apache Lucene
Query rewrite properties
Rescore
Understanding rescore
Example Data
Query
Structure of the rescore query
Rescore parameters
To sum up
Bulk Operations
MultiGet
MultiSearch
Sorting data
Sorting with multivalued fields
Sorting with multivalued geo fields
Sorting with nested objects
Update API
Simple field update
Conditional modifications using scripting
Creating and deleting documents using the Update API
Using filters to optimize your queries
Filters and caching
Not all filters are cached by default
Changing ElasticSearch caching behavior
Why bother naming the key for the cache?
When to change the ElasticSearch filter caching behavior
The terms lookup filter
How does it work?
Performance considerations
Loading terms from inner objects
Terms lookup filter cache settings
Filters and scopes in the ElasticSearch faceting mechanism
Example data
Faceting and filtering
Filter as a part of the query
The Facet filter
Global scope
Summary
3. Low-level Index Control
Altering Apache Lucene scoring
Available similarity models
Setting per-field similarity
Similarity model configuration
Choosing the default similarity model
Configuring the chosen similarity models
Configuring TF/IDF similarity
Configuring Okapi BM25 similarity
Configuring DFR similarity
Configuring IB similarity
Using codecs
Simple use cases
Let's see how it works
Available posting formats
Configuring the codec behavior
Default codec properties
Direct codec properties
Memory codec properties
Pulsing codec properties
Bloom filter-based codec properties
NRT, flush, refresh, and transaction log
Updating index and committing changes
Changing the default refresh time
The transaction log
The transaction log configuration
Near Real Time GET
Looking deeper into data handling
Input is not always analyzed
Example usage
Changing the analyzer during indexing
Changing the analyzer during searching
The pitfall and default analysis
Segment merging under control
Choosing the right merge policy
The tiered merge policy
The log byte size merge policy
The log doc merge policy
Merge policies configuration
The tiered merge policy
The log byte size merge policy
The log doc merge policy
Scheduling
The concurrent merge scheduler
The serial merge scheduler
Setting the desired merge scheduler
Summary
4. Index Distribution Architecture
Choosing the right number of shards and replicas
Sharding and over-allocation
A positive example of over-allocation
Multiple shards versus multiple indices
Replicas
Routing explained
Shards and data
Let's test routing
Indexing with routing
Querying
Aliases
Multiple routing values
Altering the default shard allocation behavior
Introducing ShardAllocator
The even_shard ShardAllocator
The balanced ShardAllocator
The custom ShardAllocator
Deciders
SameShardAllocationDecider
ShardsLimitAllocationDecider
FilterAllocationDecider
ReplicaAfterPrimaryActiveAllocationDecider
ClusterRebalanceAllocationDecider
ConcurrentRebalanceAllocationDecider
DisableAllocationDecider
AwarenessAllocationDecider
ThrottlingAllocationDecider
RebalanceOnlyWhenActiveAllocationDecider
DiskThresholdDecider
Adjusting shard allocation
Allocation awareness
Forcing allocation awareness
Filtering
But what do those properties mean?
Runtime allocation updating
Index-level updates
Cluster-level updates
Defining total shards allowed per node
Inclusion
Requirements
Exclusion
Additional shard allocation properties
Query execution preference
Introducing the preference parameter
Using our knowledge
Assumptions
Data volume and queries specification
Configuration
Node-level configuration
Indices configuration
The directories layout
Gateway configuration
Recovery
Discovery
Logging slow queries
Logging garbage collector work
Memory setup
One more thing
Changes are coming
Reindexing
Routing
Multiple Indices
Summary
5. ElasticSearch Administration
Choosing the right directory implementation – the store module
Store type
The simple file system store
The new IO filesystem store
The MMap filesystem store
The memory store
Additional properties
The default store type
Discovery configuration
Zen discovery
Multicast
Unicast
Minimum master nodes
Zen discovery fault detection
Amazon EC2 discovery
EC2 plugin's installation
EC2 plugin's configuration
Optional EC2 discovery configuration options
EC2 nodes scanning configuration
Gateway and recovery configuration
Gateway recovery process
Configuration properties
Expectations on nodes
Local gateway
Backing up the local gateway
Recovery configuration
Cluster-level recovery configuration
Index-level recovery settings
Segments statistics
Introducing the segments API
The response
Visualizing segments information
Understanding ElasticSearch caching
The filter cache
Filter cache types
Index-level filter cache configuration
Node-level filter cache configuration
The field data cache
Index-level field data cache configuration
Node-level field data cache configuration
Filtering
Adding field data filtering information
Filtering by term frequency
Filtering by regex
Filtering by regex and term frequency
The filtering example
Clearing the caches
Index, indices, and all caches clearing
Clearing specific caches
Clearing fields-related caches
Summary
6. Fighting with Fire
Knowing the garbage collector
Java memory
The life cycle of Java objects and garbage collection
Dealing with garbage collection problems
Turning on logging of garbage collection work
Using JStat
Creating memory dumps
More information on garbage collector work
Adjusting garbage collector work in ElasticSearch
Using the standard startup script
Service wrapper
Avoiding swapping on Unix-like systems
When it is too much for I/O – throttling explained
Controlling I/O throttling
Configuration
Throttling type
Maximum throughput per second
Node throttling defaults
Configuration example
Speeding up queries using warmers
Reason for using warmers
Manipulating warmers
Using the PUT Warmer API
Adding warmers during index creation
Adding warmers to templates
Retrieving warmers
Deleting warmers
Disabling warmers
Testing the warmers
Querying without warmers present
Querying with warmers present
Very hot threads
Hot Threads API usage clarification
Hot Threads API response
Real-life scenarios
Slower and slower performance
Heterogeneous environment and load imbalance
My server is under fire
Summary
7. Improving the User Search Experience
Correcting user spelling mistakes
Test data
Getting into technical details
Suggesters
Using the _suggest REST endpoint
Understanding the REST endpoint suggester response
Including suggestions requests in a query
Suggester response
The term suggester
Configuration
Common term suggester options
Additional term suggester options
The phrase suggester
The usage example
Configuration
Basic configuration
Configuring smoothing models
Stupid backoff
Laplace
Linear interpolation
Configuring candidate generators
Direct generators
Configuring direct generators
Completion suggester
The logic behind the completion suggester
Using the completion suggester
Indexing data
Querying data
Custom weights
Additional parameters
Improving query relevance
The data
The quest for improving relevance
The standard query
The Multi match query
Phrases come into play
Let's throw the garbage away
And now we boost
Making a misspelling-proof search
Drill downs with faceting
Summary
8. ElasticSearch Java APIs
Introducing the ElasticSearch Java API
The code
Connecting to your cluster
Becoming the ElasticSearch node
Using the transport connection method
Choosing the right connection method
Anatomy of the API
CRUD operations
Fetching documents
Handling errors
Indexing documents
Updating documents
Deleting documents
Querying ElasticSearch
Preparing a query
Building queries
Using the match all documents query
The match query
Using the geo shape query
Paging
Sorting
Filtering
Faceting
Highlighting
Suggestions
Counting
Scrolling
Performing multiple actions
Bulk
The delete by query
Multi GET
Multi Search
Percolator
ElasticSearch 1.0 and higher
The explain API
Building JSON queries and documents
The administration API
The cluster administration API
The cluster and indices health API
The cluster state API
The update settings API
The reroute API
The nodes information API
The node statistics API
The nodes hot threads API
The nodes shutdown API
The search shards API
The Indices administration API
The index existence API
The Type existence API
The indices stats API
Index status
Segments information API
Creating an index API
Deleting an index
Closing an index
Opening an index
The Refresh API
The Flush API
The Optimize API
The put mapping API
The delete mapping API
The gateway snapshot API
The aliases API
The get aliases API
The aliases exists API
The clear cache API
The update settings API
The analyze API
The put template API
The delete template API
The validate query API
The put warmer API
The delete warmer API
Summary
9. Developing ElasticSearch Plugins
Creating the Apache Maven project structure
Understanding the basics
Structure of the Maven Java project
The idea of POM
Running the build process
Introducing the assembly Maven plugin
Creating a custom river plugin
Implementation details
Implementing the URLChecker class
Implementing the JSONRiver class
Implementing the JSONRiverModule class
Implementing the JSONRiverPlugin class
Informing ElasticSearch about the JSONRiver plugin class
Testing our river
Building our river
Installing our river
Initializing our river
Checking if our JSON river works
Creating a custom analysis plugin
Implementation details
Implementing TokenFilter
Implementing the TokenFilter factory
Implementing custom analyzer
Implementing analyzer provider
Implementing analysis binder
Implementing analyzer indices component
Implementing analyzer module
Implementing analyzer plugin
Informing ElasticSearch about our custom analyzer
Testing our custom analysis plugin
Building our custom analysis plugin
Installing the custom analysis plugin
Checking if our analysis plugin works
Summary
Index