In one of my previous posts on Elasticsearch, I shared my understanding of Elasticsearch configurations and best practices. That was mostly from an indexing perspective. There are several tweaks one can use to optimise query performance as well. Improving query times can be even more challenging than improving indexing times. Let's see why querying is more of a challenge:
- Queries can go on while the index is getting updated
- Different queries need different optimisation strategies
- There are far more configurations that impact query performance:
  - Query syntax/clauses used
  - Index schema
  - Elasticsearch configurations
  - RAM, CPU, network, IO
And there are times when you need to fire two or more queries in succession to get certain results back from ES. I recently had one such scenario, where I needed to fire 3 queries at ES and make sure that the response times were always less than a second. The 3 queries were related to each other in the sense that query 2 uses the output of query 1 and query 3 uses the output of query 2. For my use case, one of the queries was simple, while the other two were more complex, as they had aggregations, stats, filters etc.
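To make the dependency between the queries concrete, here is a minimal sketch of such a chain. It is not the code from my use case: the index names (`index_a`, `index_b`, `index_c`), the field names (`group_id`, `item_id`, `owner_id`, `score`) and the `search` callable are all hypothetical, and the sketch times the whole chain, which is only one way to read the one-second target. The `terms` query and the `terms`/`stats` aggregations are standard query DSL, but the exact clauses you need will differ.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical signature: search(index_name, request_body) -> parsed JSON response.
# In practice this would wrap whatever ES client or HTTP call you already use.
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_chain(search: SearchFn, seed_ids: List[int]) -> Dict[str, Any]:
    """Run three dependent queries and report the total wall-clock time."""
    start = time.perf_counter()

    # Query 1: the simple one. Look up documents in the first index.
    r1 = search("index_a", {
        "size": 100,
        "query": {"terms": {"group_id": seed_ids}},
    })
    item_ids = [hit["_source"]["item_id"] for hit in r1["hits"]["hits"]]

    # Query 2: restricted to the ids returned by query 1, with a terms
    # aggregation and a nested stats aggregation.
    r2 = search("index_b", {
        "size": 0,  # only the aggregation result is needed
        "query": {"terms": {"item_id": item_ids}},
        "aggs": {
            "by_owner": {
                "terms": {"field": "owner_id", "size": 10},
                "aggs": {"score_stats": {"stats": {"field": "score"}}},
            }
        },
    })
    owner_ids = [b["key"] for b in r2["aggregations"]["by_owner"]["buckets"]]

    # Query 3: uses the bucket keys produced by query 2.
    r3 = search("index_c", {
        "size": 10,
        "query": {"terms": {"owner_id": owner_ids}},
    })

    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"hits": r3["hits"]["hits"], "elapsed_ms": elapsed_ms}
```

Because the queries run strictly one after another, the time for the chain is the sum of the three round trips, so a slow query 1 or 2 eats directly into the budget of the others.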
As outlined above, there are several things that can prevent an optimal response time. Also, to safely say that a desired response time has been achieved, one needs to test, and test right. A poor testing method leads to misleading performance statistics. Below are details of my testing methodology and the tweaks that led to sub-second response times for the 3 queries.
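As an illustration of what "testing right" can look like, here is a small timing harness. It is a sketch under my own assumptions, not the exact tooling behind the numbers in this post: it discards a few warm-up runs (so cold caches do not dominate), repeats the measurement many times, and reports percentiles and the maximum instead of a single number. The function names, the nearest-rank percentile and the default run counts are all illustrative choices.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def benchmark(action: Callable[[], object],
              warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time `action` repeatedly and summarise the latencies in milliseconds."""
    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        action()

    timings_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    return {
        "min_ms": timings_ms[0],
        "p50_ms": percentile(timings_ms, 50),
        "p95_ms": percentile(timings_ms, 95),
        "max_ms": timings_ms[-1],
    }
```

Here `action` would be a function that fires the queries under test. Since the goal is a response time that is always under a second, `max_ms` and `p95_ms` matter more than the average; one lucky fast run proves very little.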
ElasticSearch Cluster and Indexes

- 5 machines in the cluster
- 5 shards per index
- 250 GB EBS volume on each machine to hold indexes
- Indexes are stored as compressed
- No indexing takes place while testing (my use case asks for indexing in batch once a day)
- 3 indexes
  - Index A: with 24+ million records (used in 1st query)
    - All integer fields
    - 4 fields
  - Index B: with 90+ million records (used in 2nd query)
    - All integers
    - 3 fields
  - Index C: with 340K records (used in 3rd query)
    - String, Integer and Date fields
    - Only few fields used in querying
- Different machine types:
  - to hold ES indexes: m3.large to c3.4xlarge
- RAM
  - Different sizes for tests, starting from 4GB to 15GB given to ES instance
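To get a feel for what this setup means per shard, here is a quick back-of-the-envelope sketch. The document counts are the lower bounds listed above, and the calculation assumes documents are routed evenly across the 5 primary shards, which is an assumption rather than something I measured. The settings body only carries the shard count from the list; everything else is left at its default.

```python
from typing import Dict

SHARDS_PER_INDEX = 5  # from the cluster setup listed above

# Settings body one could send when creating each index. Only the shard
# count comes from the post; no other setting is implied here.
INDEX_SETTINGS = {"settings": {"index": {"number_of_shards": SHARDS_PER_INDEX}}}

# Lower bounds taken from the list above ("24+ million", "90+ million", "340K").
APPROX_DOCS = {"A": 24_000_000, "B": 90_000_000, "C": 340_000}


def docs_per_shard(doc_counts: Dict[str, int], shards: int) -> Dict[str, int]:
    """Approximate documents per primary shard, assuming even routing."""
    return {name: count // shards for name, count in doc_counts.items()}


print(docs_per_shard(APPROX_DOCS, SHARDS_PER_INDEX))
```

That works out to roughly 4.8 million documents per shard for index A, 18 million for index B and 68 thousand for index C, so the second query touches by far the largest shards.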
Via
Alex Kantone