Elasticsearch Server - Third Edition by Rafal Kuc

Index

Elasticsearch Server Third Edition Credits About the Authors About the Reviewer www.PacktPub.com eBooks, discount offers, and more Why subscribe? Preface What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support Downloading the example code Downloading the color images of this book Errata Piracy Questions

1. Getting Started with Elasticsearch Cluster
Full text searching The Lucene glossary and architecture Input data analysis Indexing and querying Scoring and query relevance The basics of Elasticsearch Key concepts of Elasticsearch Index Document Document type Mapping Key concepts of the Elasticsearch infrastructure Nodes and clusters Shards Replicas Gateway Indexing and searching Installing and configuring your cluster Installing Java Installing Elasticsearch Running Elasticsearch Shutting down Elasticsearch The directory layout Configuring Elasticsearch The system-specific installation and configuration Installing Elasticsearch on Linux Installing Elasticsearch using RPM packages Installing Elasticsearch using the DEB package Elasticsearch configuration file localization Configuring Elasticsearch as a system service on Linux Elasticsearch as a system service on Windows Manipulating data with the REST API Understanding the REST API Storing data in Elasticsearch Creating a new document Automatic identifier creation Retrieving documents Updating documents Dealing with non-existing documents Adding partial documents Deleting documents Versioning Usage example Versioning from external systems Searching with the URI request query Sample data URI search Elasticsearch query response Query analysis URI query string parameters The query The default search field Analyzer The default operator property Query explanation The fields returned Sorting the results The search timeout The results window Limiting per-shard results Ignoring unavailable indices The search type Lowercasing term expansion Wildcard and prefix analysis Lucene query syntax Summary

2. Indexing Your Data
Elasticsearch indexing Shards and replicas Write consistency Creating indices Altering automatic index creation Settings for a newly created index Index deletion Mappings configuration Type determining mechanism Disabling the type determining mechanism Tuning the type determining mechanism for numeric types Tuning the type determining mechanism for dates Index structure mapping Type and types definition Fields Core types Common attributes String Number Boolean Binary Date Multi fields The IP address type Token count type Using analyzers Out-of-the-box analyzers Defining your own analyzers Default analyzers Different similarity models Setting per-field similarity Available similarity models Configuring default similarity Configuring BM25 similarity Configuring DFR similarity Configuring IB similarity Batch indexing to speed up your indexing process Preparing data for bulk indexing Indexing the data The _all field The _source field Additional internal fields Introduction to segment merging Segment merging The need for segment merging The merge policy The merge scheduler Throttling Introduction to routing Default indexing Default searching Routing The routing parameters Routing fields Summary

3. Searching Your Data
Querying Elasticsearch The example data A simple query Paging and result size Returning the version value Limiting the score Choosing the fields that we want to return Source filtering Using the script fields Passing parameters to the script fields Understanding the querying process Query logic Search type Search execution preference Search shards API Basic queries The term query The terms query The match all query The type query The exists query The missing query The common terms query The match query The Boolean match query The phrase match query The match phrase prefix query The multi match query The query string query Running the query string query against multiple fields The simple query string query The identifiers query The prefix query The fuzzy query The wildcard query The range query Regular expression query The more like this query Compound queries The bool query The dis_max query The boosting query The constant_score query The indices query Using span queries A span Span term query Span first query Span near query Span or query Span not query Span within query Span containing query Span multi query Performance considerations Choosing the right query The use cases Limiting results to given tags Searching for values in a range Boosting some of the matched documents Ignoring lower scoring partial queries Using Lucene query syntax in queries Handling user queries without errors Autocomplete using prefixes Finding terms similar to a given one Matching phrases Spans, spans everywhere Summary

4. Extending Your Querying Knowledge
Filtering your results The context is the key Explicit filtering with bool query Highlighting Getting started with highlighting Field configuration Under the hood Forcing highlighter type Configuring HTML tags Controlling highlighted fragments Global and local settings Require matching Custom highlighting query The Postings highlighter Validating your queries Using the Validate API Sorting data Default sorting Selecting fields used for sorting Sorting mode Specifying behavior for missing fields Dynamic criteria Calculate scoring when sorting Query rewrite Prefix query as an example Getting back to Apache Lucene Query rewrite properties Summary

5. Extending Your Index Structure
Indexing tree-like structures Data structure Analysis Indexing data that is not flat Data Objects Arrays Mappings Final mappings Sending the mappings to Elasticsearch To be or not to be dynamic Disabling object indexing Using nested objects Scoring and nested queries Using the parent-child relationship Index structure and data indexing Child mappings Parent mappings The parent document Child documents Querying Querying data in the child documents Querying data in the parent documents Performance considerations Modifying your index structure with the update API The mappings Adding a new field to the existing index Modifying fields of an existing index Summary

6. Make Your Search Better
Introduction to Apache Lucene scoring When a document is matched Default scoring formula Relevancy matters Scripting capabilities of Elasticsearch Objects available during script execution Script types In file scripts Inline scripts Indexed scripts Querying with scripts Scripting with parameters Script languages Using other than embedded languages Using native code The factory implementation Implementing the native script The plugin definition Installing the plugin Running the script Searching content in different languages Handling languages differently Handling multiple languages Detecting the language of the document Sample document The mappings Querying Queries with an identified language Queries with an unknown language Combining queries Influencing scores with query boosts The boost Adding the boost to queries Modifying the score Constant score query Boosting query The function score query Structure of the function query The weight factor function Field value factor function The script score function The random score function Decay functions When does index-time boosting make sense? Defining boosting in the mappings Words with the same meaning Synonym filter Synonyms in the mappings Synonyms stored on the file system Defining synonym rules Using Apache Solr synonyms Explicit synonyms Equivalent synonyms Expanding synonyms Using WordNet synonyms Query or index-time synonym expansion Understanding the explain information Understanding field analysis Explaining the query Summary

7. Aggregations for Data Analysis
Aggregations General query structure Inside the aggregations engine Aggregation types Metrics aggregations Minimum, maximum, average, and sum Missing values Using scripts Field value statistics and extended statistics Value count Field cardinality Percentiles Percentile ranks Top hits aggregation Additional parameters Geo bounds aggregation Scripted metrics aggregation Buckets aggregations Filter aggregation Filters aggregation Terms aggregation Counts are approximate Minimum document count Range aggregation Keyed buckets Date range aggregation IPv4 range aggregation Missing aggregation Histogram aggregation Date histogram aggregation Time zones Geo distance aggregations Geohash grid aggregation Global aggregation Significant terms aggregation Choosing significant terms Multiple value analysis Sampler aggregation Children aggregation Nested aggregation Reverse nested aggregation Nesting aggregations and ordering buckets Buckets ordering Pipeline aggregations Available types Referencing other aggregations Gaps in the data Pipeline aggregation types Min, max, sum, and average bucket aggregations Cumulative sum aggregation Bucket selector aggregation Bucket script aggregation Serial differencing aggregation Derivative aggregation Moving avg aggregation Predicting future buckets The models Summary

8. Beyond Full-text Searching
Percolator The index Percolator preparation Getting deeper Controlling the size of returned results Percolator and score calculation Combining percolators with other functionalities Getting the number of matching queries Indexed document percolation Elasticsearch spatial capabilities Mapping preparation for spatial searches Example data Additional geo_field properties Sample queries Distance-based sorting Bounding box filtering Limiting the distance Arbitrary geo shapes Point Envelope Polygon Multipolygon An example usage Storing shapes in the index Using suggesters Available suggester types Including suggestions Suggester response Term suggester Term suggester configuration options Additional term suggester options Phrase suggester Configuration Completion suggester Indexing data Querying indexed completion suggester data Custom weights Context suggester Context types Using context Using the geo location context The Scroll API Problem definition Scrolling to the rescue Summary

9. Elasticsearch Cluster in Detail
Understanding node discovery Discovery types Node roles Master node Data node Client node Configuring node roles Setting the cluster's name Zen discovery Master election configuration Configuring unicast Fault detection ping settings Cluster state updates control Dealing with master unavailability Adjusting HTTP transport settings Disabling HTTP HTTP port HTTP host The gateway and recovery modules The gateway Recovery control Additional gateway recovery options Indices recovery API Delayed allocation Index recovery prioritization Templates and dynamic templates Templates An example of a template Dynamic templates The matching pattern Field definitions Elasticsearch plugins The basics Installing plugins Removing plugins Elasticsearch caches Fielddata cache Fielddata size Circuit breakers Fielddata and doc values Shard request cache Enabling and configuring the shard request cache Per request shard request cache disabling Shard request cache usage monitoring Node query cache Indexing buffers When caches should be avoided The update settings API The cluster settings API The indices settings API Summary

10. Administrating Your Cluster
Elasticsearch time machine Creating a snapshot repository Creating snapshots Additional parameters Restoring a snapshot Cleaning up – deleting old snapshots Monitoring your cluster's state and health Cluster health API Controlling information details Additional parameters Indices stats API Docs Store Indexing, get, and search Additional information Nodes info API Returned information Nodes stats API Cluster state API Cluster stats API Pending tasks API Indices recovery API Indices shard stores API Indices segments API Controlling the shard and replica allocation Explicitly controlling allocation Specifying node parameters Configuration Index creation Excluding nodes from allocation Requiring node attributes Using the IP address for shard allocation Disk-based shard allocation Configuring disk based shard allocation Disabling disk based shard allocation The number of shards and replicas per node Allocation throttling Cluster-wide allocation Allocation awareness Forcing allocation awareness Filtering What do include, exclude, and require mean Manually moving shards and replicas Moving shards Canceling shard allocation Forcing shard allocation Multiple commands per HTTP request Allowing operations on primary shards Handling rolling restarts Controlling cluster rebalancing Understanding rebalance Cluster being ready The cluster rebalance settings Controlling when rebalancing will be allowed Controlling the number of shards being moved between nodes concurrently Controlling which shards may be rebalanced The Cat API The basics Using Cat API Common arguments The examples Getting information about the master node Getting information about the nodes Retrieving recovery information for an index Warming up Defining a new warming query Retrieving the defined warming queries Deleting a warming query Disabling the warming up functionality Choosing queries for warming Index aliasing and using it to simplify your everyday work An alias Creating an alias Modifying aliases Combining commands Retrieving aliases Removing aliases Filtering aliases Aliases and routing Zero downtime reindexing and aliases Summary

11. Scaling by Example
Hardware Physical servers or a cloud CPU RAM memory Mass storage The network How many servers Cost cutting Preparing a single Elasticsearch node The general preparations Avoiding swapping File descriptors Virtual memory The memory Field data cache and breaking the circuit Use doc values RAM buffer for indexing Index refresh rate Thread pools Horizontal expansion Automatically creating the replicas Redundancy and high availability Cost and performance flexibility Continuous upgrades Multiple Elasticsearch instances on a single physical machine Preventing a shard and its replicas from being on the same node Designated node roles for larger clusters Query aggregator nodes Data nodes Master eligible nodes Preparing the cluster for high indexing and querying throughput Indexing related advice Index refresh rate Thread pools tuning Automatic store throttling Handling time-based data Multiple data paths Data distribution Bulk indexing RAM buffer for indexing Advice for high query rate scenarios Shard request cache Think about the queries Parallelize your queries Field data cache and breaking the circuit Keep size and shard size under control Monitoring Elasticsearch HQ Marvel SPM for Elasticsearch Summary

Index