Log In
Or create an account -> 
Imperial Library
  • Home
  • About
  • News
  • Upload
  • Forum
  • Help
  • Login/SignUp

Index
Foreword Preface
Who Should Read This Book Why We Wrote This Book Elasticsearch Version How to Read This Book Navigating This Book Online Resources Conventions Used in This Book Using Code Examples Safari® Books Online How to Contact Us Acknowledgments
I. Getting Started
1. You Know, for Search…
Installing Elasticsearch
Installing Marvel
Running Elasticsearch
Viewing Marvel and Sense
Talking to Elasticsearch
Java API RESTful API with JSON over HTTP
Document Oriented
JSON
Finding Your Feet
Let’s Build an Employee Directory
Indexing Employee Documents Retrieving a Document Search Lite Search with Query DSL More-Complicated Searches Full-Text Search Phrase Search Highlighting Our Searches Analytics Tutorial Conclusion Distributed Nature Next Steps
2. Life Inside a Cluster
An Empty Cluster Cluster Health Add an Index Add Failover Scale Horizontally
Then Scale Some More
Coping with Failure
3. Data In, Data Out
What Is a Document? Document Metadata
_index _type _id Other Metadata
Indexing a Document
Using Our Own ID Autogenerating IDs
Retrieving a Document
Retrieving Part of a Document
Checking Whether a Document Exists Updating a Whole Document Creating a New Document Deleting a Document Dealing with Conflicts Optimistic Concurrency Control
Using Versions from an External System
Partial Updates to Documents
Using Scripts to Make Partial Updates Updating a Document That May Not Yet Exist Updates and Conflicts
Retrieving Multiple Documents Cheaper in Bulk
Don’t Repeat Yourself How Big Is Too Big?
4. Distributed Document Store
Routing a Document to a Shard How Primary and Replica Shards Interact Creating, Indexing, and Deleting a Document Retrieving a Document Partial Updates to a Document Multidocument Patterns
Why the Funny Format?
5. Searching—The Basic Tools
The Empty Search
hits took shards timeout
Multi-index, Multitype Pagination Search Lite
The _all Field More Complicated Queries
6. Mapping and Analysis
Exact Values Versus Full Text Inverted Index Analysis and Analyzers
Built-in Analyzers When Analyzers Are Used Testing Analyzers Specifying Analyzers
Mapping
Core Simple Field Types Viewing the Mapping Customizing Field Mappings
index analyzer
Updating a Mapping Testing the Mapping
Complex Core Field Types
Multivalue Fields Empty Fields Multilevel Objects Mapping for Inner Objects How Inner Objects are Indexed Arrays of Inner Objects
7. Full-Body Search
Empty Search Query DSL
Structure of a Query Clause Combining Multiple Clauses
Queries and Filters
Performance Differences When to Use Which
Most Important Queries and Filters
term Filter terms Filter range Filter exists and missing Filters bool Filter match_all Query match Query multi_match Query bool Query
Combining Queries with Filters
Filtering a Query Just a Filter A Query as a Filter
Validating Queries
Understanding Errors Understanding Queries
8. Sorting and Relevance
Sorting
Sorting by Field Values Multilevel Sorting Sorting on Multivalue Fields
String Sorting and Multifields What Is Relevance?
Understanding the Score Understanding Why a Document Matched
Fielddata
9. Distributed Search Execution
Query Phase Fetch Phase Search Options
preference timeout routing search_type
scan and scroll
10. Index Management
Creating an Index Deleting an Index Index Settings Configuring Analyzers Custom Analyzers
Creating a Custom Analyzer
Types and Mappings
How Lucene Sees Documents How Types Are Implemented Avoiding Type Gotchas
The Root Object
Properties Metadata: _source Field Metadata: _all Field Metadata: Document Identity
Dynamic Mapping Customizing Dynamic Mapping
date_detection dynamic_templates
Default Mapping Reindexing Your Data Index Aliases and Zero Downtime
11. Inside a Shard
Making Text Searchable
Immutability
Dynamically Updatable Indices
Deletes and Updates
Near Real-Time Search
refresh API
Making Changes Persistent
flush API
Segment Merging
optimize API
II. Search in Depth
12. Structured Search
Finding Exact Values
term Filter with Numbers term Filter with Text Internal Filter Operation
Combining Filters
Bool Filter Nesting Boolean Filters
Finding Multiple Exact Values
Contains, but Does Not Equal Equals Exactly
Ranges
Ranges on Dates Ranges on Strings
Dealing with Null Values
exists Filter missing Filter exists/missing on Objects
All About Caching
Independent Filter Caching Controlling Caching
Filter Order
13. Full-Text Search
Term-Based Versus Full-Text The match Query
Index Some Data A Single-Word Query
Multiword Queries
Improving Precision Controlling Precision
Combining Queries
Score Calculation Controlling Precision
How match Uses bool Boosting Query Clauses Controlling Analysis
Default Analyzers Configuring Analyzers in Practice
Use index settings, not config files Keep it simple
Relevance Is Broken!
14. Multifield Search
Multiple Query Strings
Prioritizing Clauses
Single Query String
Know Your Data
Best Fields
dis_max Query
Tuning Best Fields Queries
tie_breaker
multi_match Query
Using Wildcards in Field Names Boosting Individual Fields
Most Fields
Multifield Mapping
Cross-fields Entity Search
A Naive Approach Problems with the most_fields Approach
Field-Centric Queries
Problem 1: Matching the Same Word in Multiple Fields Problem 2: Trimming the Long Tail Problem 3: Term Frequencies Solution
Custom _all Fields cross-fields Queries
Per-Field Boosting
Exact-Value Fields
15. Proximity Matching
Phrase Matching
Term Positions What Is a Phrase
Mixing It Up Multivalue Fields Closer Is Better Proximity for Relevance Improving Performance
Rescoring Results
Finding Associated Words
Producing Shingles Multifields Searching for Shingles Performance
16. Partial Matching
Postcodes and Structured Data prefix Query wildcard and regexp Queries Query-Time Search-as-You-Type Index-Time Optimizations Ngrams for Partial Matching Index-Time Search-as-You-Type
Preparing the Index Querying the Field Edge n-grams and Postcodes
Ngrams for Compound Words
17. Controlling Relevance
Theory Behind Relevance Scoring
Boolean Model Term Frequency/Inverse Document Frequency (TF/IDF)
Term frequency Inverse document frequency Field-length norm Putting it together
Vector Space Model
Lucene’s Practical Scoring Function
Query Normalization Factor Query Coordination Index-Time Field-Level Boosting
Query-Time Boosting
Boosting an Index t.getBoost()
Manipulating Relevance with Query Structure Not Quite Not
boosting Query
Ignoring TF/IDF
constant_score Query
function_score Query Boosting by Popularity
modifier factor boost_mode max_boost
Boosting Filtered Subsets
filter Versus query functions score_mode
Random Scoring The Closer, The Better Understanding the price Clause Scoring with Scripts Pluggable Similarity Algorithms
Okapi BM25
Term-frequency saturation Field-length normalization Tuning BM25
Changing Similarities
Configuring BM25
Relevance Tuning Is the Last 10%
III. Dealing with Human Language
18. Getting Started with Languages
Using Language Analyzers Configuring Language Analyzers Pitfalls of Mixing Languages
At Index Time
Incorrect stemming Incorrect inverse document frequencies
At Query Time Identifying Language
One Language per Document
Foreign Words
One Language per Field Mixed-Language Fields
Split into Separate Fields Analyze Multiple Times Use n-grams
19. Identifying Words
standard Analyzer standard Tokenizer Installing the ICU Plug-in icu_tokenizer Tidying Up Input Text
Tokenizing HTML Tidying Up Punctuation
20. Normalizing Tokens
In That Case You Have an Accent
Retaining Meaning
Living in a Unicode World Unicode Case Folding Unicode Character Folding Sorting and Collations
Case-Insensitive Sorting Differences Between Languages Unicode Collation Algorithm Unicode Sorting Specifying a Language
Multiple sort orders
Customizing Collations
21. Reducing Words to Their Root Form
Algorithmic Stemmers
Using an Algorithmic Stemmer
Dictionary Stemmers Hunspell Stemmer
Installing a Dictionary Per-Language Settings Creating a Hunspell Token Filter Hunspell Dictionary Format
Choosing a Stemmer
Stemmer Performance Stemmer Quality Stemmer Degree Making a Choice
Controlling Stemming
Preventing Stemming Customizing Stemming
Stemming in situ
Is Stemming in situ a Good Idea
22. Stopwords: Performance Versus Precision
Pros and Cons of Stopwords Using Stopwords
Stopwords and the Standard Analyzer Maintaining Positions Specifying Stopwords Using the stop Token Filter Updating Stopwords
Stopwords and Performance
and Operator minimum_should_match
Divide and Conquer
Controlling Precision Only High-Frequency Terms More Control with Common Terms
Stopwords and Phrase Queries
Positions Data Index Options Stopwords
common_grams Token Filter
At Index Time Unigram Queries Bigram Phrase Queries Two-Word Phrases
Stopwords and Relevance
23. Synonyms
Using Synonyms Formatting Synonyms Expand or contract
Simple Expansion Simple Contraction Genre Expansion
Synonyms and The Analysis Chain
Case-Sensitive Synonyms
Multiword Synonyms and Phrase Queries
Use Simple Contraction for Phrase Queries Synonyms and the query_string Query
Symbol Synonyms
24. Typoes and Mispelings
Fuzziness Fuzzy Query
Improving Performance
Fuzzy match Query Scoring Fuzziness Phonetic Matching
IV. Aggregations
25. High-Level Concepts
Buckets Metrics Combining the Two
26. Aggregation Test-Drive
Adding a Metric to the Mix Buckets Inside Buckets One Final Modification
27. Building Bar Charts 28. Looking at Time
Returning Empty Buckets Extended Example The Sky’s the Limit
29. Scoping Aggregations
Global Bucket
30. Filtering Queries and Aggregations
Filtered Query Filter Bucket Post Filter Recap
31. Sorting Multivalue Buckets
Intrinsic Sorts Sorting by a Metric Sorting Based on “Deep” Metrics
32. Approximate Aggregations
Finding Distinct Counts
Understanding the Trade-offs Optimizing for Speed
Calculating Percentiles
Percentile Metric Percentile Ranks Understanding the Trade-offs
33. Significant Terms
significant_terms Demo
Recommending Based on Popularity Recommending Based on Statistics
34. Controlling Memory Use and Latency
Fielddata Aggregations and Analysis
High-Cardinality Memory Implications
Limiting Memory Usage
Fielddata Size Monitoring fielddata Circuit Breaker
Fielddata Filtering Doc Values
Enabling Doc Values
Preloading Fielddata
Eagerly Loading Fielddata Global Ordinals
Building global ordinals Eager global ordinals
Index Warmers
Preventing Combinatorial Explosions
Depth-First Versus Breadth-First
35. Closing Thoughts
V. Geolocation
36. Geo-Points
Lat/Lon Formats Filtering by Geo-Point geo_bounding_box Filter
Optimizing Bounding Boxes
geo_distance Filter
Faster Geo-Distance Calculations geo_distance_range Filter
Caching geo-filters Reducing Memory Usage Sorting by Distance
Scoring by Distance
37. Geohashes
Mapping Geohashes geohash_cell Filter
38. Geo-aggregations
geo_distance Aggregation geohash_grid Aggregation geo_bounds Aggregation
39. Geo-shapes
Mapping geo-shapes
precision distance_error_pct
Indexing geo-shapes Querying geo-shapes Querying with Indexed Shapes Geo-shape Filters and Caching
VI. Modeling Your Data
40. Handling Relationships
Application-side Joins Denormalizing Your Data Field Collapsing Denormalization and Concurrency
Renaming Files and Directories
Solving Concurrency Issues
Global Locking Document Locking Tree Locking
41. Nested Objects
Nested Object Mapping Querying a Nested Object Sorting by Nested Fields Nested Aggregations
reverse_nested Aggregation When to Use Nested Objects
42. Parent-Child Relationship
Parent-Child Mapping Indexing Parents and Children Finding Parents by Their Children
min_children and max_children
Finding Children by Their Parents Children Aggregation Grandparents and Grandchildren Practical Considerations
Memory Use Global Ordinals and Latency Multigenerations and Concluding Thoughts
43. Designing for Scale
The Unit of Scale Shard Overallocation Kagillion Shards Capacity Planning Replica Shards
Balancing Load with Replicas
Multiple Indices Time-Based Data
Index per Time Frame
Index Templates Retiring Data
Migrate Old Indices Optimize Indices Closing Old Indices Archiving Old Indices
User-Based Data Shared Index Faking Index per User with Aliases One Big User Scale Is Not Infinite
VII. Administration, Monitoring, and Deployment
44. Monitoring
Marvel for Monitoring Cluster Health
Drilling Deeper: Finding Problematic Indices Blocking for Status Changes
Monitoring Individual Nodes
indices Section OS and Process Sections JVM Section Threadpool Section FS and Network Sections Circuit Breaker
Cluster Stats Index Stats Pending Tasks cat API
45. Production Deployment
Hardware
Memory CPUs Disks Network General Considerations
Java Virtual Machine Transport Client Versus Node Client Configuration Management Important Configuration Changes
Assign Names Paths Minimum Master Nodes Recovery Settings Prefer Unicast over Multicast
Don’t Touch These Settings!
Garbage Collector Threadpools
Heap: Sizing and Swapping
Give Half Your Memory to Lucene Don’t Cross 32 GB! Swapping Is the Death of Performance
File Descriptors and MMap Revisit This List Before Production
46. Post-Deployment
Changing Settings Dynamically Logging
Slowlog
Indexing Performance Tips
Test Performance Scientifically Using and Sizing Bulk Requests Storage Segments and Merging Other
Rolling Restarts Backing Up Your Cluster
Creating the Repository Snapshotting All Open Indices Snapshotting Particular Indices Listing Information About Snapshots Deleting Snapshots Monitoring Snapshot Progress Canceling a Snapshot
Restoring from a Snapshot
Monitoring Restore Operations Canceling a Restore
Clusters Are Living, Breathing Creatures
Index
  • ← Prev
  • Back
  • Next →
  • ← Prev
  • Back
  • Next →

Chief Librarian: Las Zenow <zenow@riseup.net>
Fork the source code from gitlab
.

This is a mirror of the Tor onion service:
http://kx5thpx2olielkihfyo4jgjqfb7zx7wxr3sd4xzt26ochei4m6f7tayd.onion