Index
Mastering Elasticsearch Second Edition
Table of Contents Mastering Elasticsearch Second Edition Credits About the Author Acknowledgments About the Author Acknowledgments About the Reviewers www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe? Free access for Packt account holders
Preface
What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support
Downloading the example code Errata Piracy Questions
1. Introduction to Elasticsearch
Introducing Apache Lucene
Getting familiar with Lucene Overall architecture
Getting deeper into Lucene index
Norms Term vectors Posting formats Doc values
Analyzing your data
Indexing and querying
Lucene query language
Understanding the basics Querying fields Term modifiers Handling special characters
Introducing Elasticsearch
Basic concepts
Index Document Type Mapping Node Cluster Shard Replica
Key concepts behind Elasticsearch architecture Workings of Elasticsearch
The startup process Failure detection
Communicating with Elasticsearch
Indexing data Querying data
The story Summary
2. Power User Query DSL
Default Apache Lucene scoring explained
When a document is matched TF/IDF scoring formula
Lucene conceptual scoring formula Lucene practical scoring formula
Elasticsearch point of view An example
Query rewrite explained
Prefix query as an example Getting back to Apache Lucene Query rewrite properties
Query templates
Introducing query templates
Templates as strings
The Mustache template engine
Conditional expressions Loops Default values
Storing templates in files
Handling filters and why it matters
Filters and query relevance How filters work
Bool or and/or/not filters
Performance considerations Post filtering and filtered query Choosing the right filtering method
Choosing the right query for the job
Query categorization
Basic queries Compound queries Not analyzed queries Full text search queries Pattern queries Similarity supporting queries Score altering queries Position aware queries Structure aware queries
The use cases
Example data Basic queries use cases
Searching for values in range Simplified query for multiple terms
Compound queries use cases
Boosting some of the matched documents Ignoring lower scoring partial queries
Not analyzed queries use cases
Limiting results to given tags Efficient query time stopwords handling
Full text search queries use cases
Using Lucene query syntax in queries Handling user queries without errors
Pattern queries use cases
Autocomplete using prefixes Pattern matching
Similarity supporting queries use cases
Finding terms similar to a given one Finding documents with similar field values
Score altering queries use cases
Favoring newer books Decreasing importance of books with certain value
Position aware queries use cases
Matching phrases Spans, spans everywhere
Structure aware queries use cases
Returning parent documents having a certain nested document Affecting parent document score with the score of nested documents
Summary
3. Not Only Full Text Search
Query rescoring
What is query rescoring? An example query Structure of the rescore query Rescore parameters
Choosing the scoring mode
To sum up
Controlling multimatching
Multimatch types
Best fields matching Cross fields matching Most fields matching Phrase matching Phrase with prefixes matching
Significant terms aggregation
An example Choosing significant terms Multiple values analysis
Significant terms aggregation and full text search fields
Additional configuration options
Controlling the number of returned buckets Background set filtering Minimum document count Execution hint More options
There are limits
Memory consumption Shouldn't be used as top-level aggregation Counts are approximated Floating point fields are not allowed
Documents grouping
Top hits aggregation An example
Additional parameters
Relations between documents
The object type The nested documents Parent–child relationship
Parent–child relationship in the cluster
A few words about alternatives
Scripting changes between Elasticsearch versions
Scripting changes
Security issues Groovy – the new default scripting language Removal of MVEL language
Short Groovy introduction
Using Groovy as your scripting language Variable definition in scripts Conditionals Loops An example There is more
Scripting in full text context
Field-related information Shard level information Term level information
More advanced term information
Lucene expressions explained
The basics An example There is more
Summary
4. Improving the User Search Experience
Correcting user spelling mistakes
Testing data Getting into technical details Suggesters
Using the _suggest REST endpoint Understanding the REST endpoint suggester response Including suggestion requests in query The term suggester
Configuration Common term suggester options Additional term suggester options
The phrase suggester
Usage example Configuration Basic configuration Configuring smoothing models Configuring candidate generators Configuring direct generators
The completion suggester
The logic behind the completion suggester Using the completion suggester Indexing data Querying data Custom weights Additional parameters
Improving the query relevance
Data The quest for relevance improvement
The standard query The multi match query Phrases comes into play Let's throw the garbage away Now, we boost Performing a misspelling-proof search Drill downs with faceting
Summary
5. The Index Distribution Architecture
Choosing the right amount of shards and replicas
Sharding and overallocation A positive example of overallocation Multiple shards versus multiple indices Replicas
Routing explained
Shards and data Let's test routing
Indexing with routing
Routing in practice
Querying
Aliases Multiple routing values
Altering the default shard allocation behavior
Allocation awareness
Forcing allocation awareness
Filtering
What include, exclude, and require mean
Runtime allocation updating
Index level updates Cluster level updates
Defining total shards allowed per node Defining total shards allowed per physical server
Inclusion Requirement Exclusion Disk-based allocation
Query execution preference
Introducing the preference parameter
Summary
6. Low-level Index Control
Altering Apache Lucene scoring
Available similarity models Setting a per-field similarity Similarity model configuration Choosing the default similarity model
Configuring the chosen similarity model
Configuring the TF/IDF similarity Configuring the Okapi BM25 similarity Configuring the DFR similarity Configuring the IB similarity Configuring the LM Dirichlet similarity Configuring the LM Jelinek Mercer similarity
Choosing the right directory implementation – the store module
The store type
The simple filesystem store The new I/O filesystem store The MMap filesystem store The hybrid filesystem store The memory store
Additional properties
The default store type The default store type for Elasticsearch 1.3.0 and higher The default store type for Elasticsearch versions older than 1.3.0
NRT, flush, refresh, and transaction log
Updating the index and committing changes
Changing the default refresh time
The transaction log
The transaction log configuration
Near real-time GET
Segment merging under control
Choosing the right merge policy
The tiered merge policy The log byte size merge policy The log doc merge policy
Merge policies' configuration
The tiered merge policy The log byte size merge policy The log doc merge policy
Scheduling
The concurrent merge scheduler The serial merge scheduler Setting the desired merge scheduler
When it is too much for I/O – throttling explained
Controlling I/O throttling Configuration
The throttling type Maximum throughput per second Node throttling defaults Performance considerations The configuration example
Understanding Elasticsearch caching
The filter cache
Filter cache types Node-level filter cache configuration Index-level filter cache configuration
The field data cache
Field data or doc values Node-level field data cache configuration Index-level field data cache configuration The field data cache filtering
Adding field data filtering information Filtering by term frequency Filtering by regex Filtering by regex and term frequency The filtering example
Field data formats
String-based fields Numeric fields Geographical-based fields
Field data loading
The shard query cache
Setting up the shard query cache
Using circuit breakers
The field data circuit breaker The request circuit breaker The total circuit breaker
Clearing the caches Index, indices, and all caches clearing
Clearing specific caches
Summary
7. Elasticsearch Administration
Discovery and recovery modules
Discovery configuration
Zen discovery
Multicast Zen discovery configuration The unicast Zen discovery configuration
Master node
Configuring master and data nodes
Configuring data-only nodes Configuring master-only nodes Configuring the query processing-only nodes
The master election configuration
Zen discovery fault detection and configuration
The Amazon EC2 discovery
The EC2 plugin installation The EC2 plugin's generic configuration Optional EC2 discovery configuration options The EC2 nodes scanning configuration
Other discovery implementations
The gateway and recovery configuration
The gateway recovery process Configuration properties Expectations on nodes The local gateway Low-level recovery configuration
Cluster-level recovery configuration Index-level recovery settings
The indices recovery API
The human-friendly status API – using the Cat API
The basics Using the Cat API
Common arguments
The examples
Getting information about the master node Getting information about the nodes
Backing up
Saving backups in the cloud
The S3 repository The HDFS repository The Azure repository
Federated search
The test clusters Creating the tribe node
Using the unicast discovery for tribes
Reading data with the tribe node
Master-level read operations
Writing data with the tribe node
Master-level write operations
Handling indices conflicts Blocking write operations
Summary
8. Improving Performance
Using doc values to optimize your queries
The problem with field data cache The example of doc values usage
Knowing about garbage collector
Java memory
The life cycle of Java objects and garbage collections
Dealing with garbage collection problems
Turning on logging of garbage collection work Using JStat Creating memory dumps More information on the garbage collector work Adjusting the garbage collector work in Elasticsearch
Using a standard start up script Service wrapper
Avoid swapping on Unix-like systems
Benchmarking queries
Preparing your cluster configuration for benchmarking Running benchmarks Controlling currently run benchmarks
Very hot threads
Usage clarification for the Hot Threads API The Hot Threads API response
Scaling Elasticsearch
Vertical scaling Horizontal scaling
Automatically creating replicas Redundancy and high availability Cost and performance flexibility Continuous upgrades Multiple Elasticsearch instances on a single physical machine
Preventing the shard and its replicas from being on the same node
Designated nodes' roles for larger clusters
Query aggregator nodes Data nodes Master eligible nodes
Using Elasticsearch for high load scenarios
General Elasticsearch-tuning advices
Choosing the right store The index refresh rate Thread pools tuning Adjusting the merge process Data distribution
Advices for high query rate scenarios
Filter caches and shard query caches Think about the queries Using routing Parallelize your queries Field data cache and breaking the circuit Keeping size and shard_size under control
High indexing throughput scenarios and Elasticsearch
Bulk indexing Doc values versus indexing speed Keep your document fields under control The index architecture and replication Tuning write-ahead log Think about storage RAM buffer for indexing
Summary
9. Developing Elasticsearch Plugins
Creating the Apache Maven project structure Understanding the basics
The structure of the Maven Java project The idea of POM Running the build process Introducing the assembly Maven plugin
Creating custom REST action
The assumptions Implementation details
Using the REST action class
The constructor Handling requests Writing response
The plugin class Informing Elasticsearch about our REST action Time for testing Building the REST action plugin Installing the REST action plugin Checking whether the REST action plugin works
Creating the custom analysis plugin
Implementation details
Implementing TokenFilter Implementing the TokenFilter factory Implementing the class custom analyzer Implementing the analyzer provider Implementing the analysis binder Implementing the analyzer indices component Implementing the analyzer module Implementing the analyzer plugin Informing Elasticsearch about our custom analyzer
Testing our custom analysis plugin
Building our custom analysis plugin Installing the custom analysis plugin Checking whether our analysis plugin works
Summary
Index