When the question you want to answer requires looking at all or most events for a given source type, the number of events inspected can become huge very quickly. This is generally referred to as a dense search.
For example, if you want to know how many page views happened on your website, the query to answer this question must inspect every event. Since each query uses a single processor, we are essentially timing how fast our disk can retrieve the raw data and how fast one processor can decompress that data. Doing a little math, we get the following:
1,000,000 hits per day /
10,000 events processed per second =
100 seconds
If we use multiple indexers, or possibly buy much faster disks, we can cut this time, but only linearly. For instance, if the data is evenly split across four indexers, without changing disks, this query will take roughly 25 seconds.
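As a quick sanity check, the same back-of-the-envelope arithmetic can be written as a few lines of Python. The event volume, per-processor rate, and indexer count are just the illustrative figures from the example above, not measured values:

# Rough dense-search estimate: every raw event must be retrieved and
# decompressed, so search time scales with the total event count.
hits_per_day = 1_000_000          # example event volume
events_per_second = 10_000        # assumed single-processor rate

single_indexer = hits_per_day / events_per_second
print(single_indexer)             # 100.0 seconds on one indexer

# Splitting the data evenly across indexers only helps linearly.
indexers = 4
print(single_indexer / indexers)  # 25.0 seconds across four indexers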
If we use summary indexing, we should be able to improve our time dramatically.
Let's assume we have calculated hit counts per five-minute slice. Now, doing the math:
24 hours * 60 minutes per hour / 5-minute slices =
288 summary events
If we then use those summary events in a query, the math looks like the following:
288 summary events /
10,000 events processed per second =
0.0288 seconds
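For illustration, here is the same arithmetic in Python, reusing the five-minute slice size and the assumed processing rate from above:

# One summary event per five-minute slice for a full day.
slices_per_day = 24 * 60 // 5                # 288 summary events
events_per_second = 10_000                   # assumed processing rate

print(slices_per_day)                        # 288
print(slices_per_day / events_per_second)    # 0.0288 seconds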
This is a significant increase in performance. In reality, we would probably store more than 288 events. For instance, let's say we want to count the events by HTTP response code. Assuming we regularly see 10 different status codes, we have:
24 hours * 60 minutes per hour / 5-minute slices * 10 codes = 2,880 events
The math then looks as follows:
2,880 summary events /
10,000 events processed per second =
0.288 seconds
That's still a significant improvement over 100 seconds.
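To see how the summary size and query time grow with the number of distinct values we split by, a small sketch helps; the function name and parameters below are purely illustrative, and the defaults are the assumed figures used throughout this example:

# Summary event count and rough query time as a function of the
# slice size and the cardinality of the split-by field.
def summary_query_seconds(days=1, slice_minutes=5, distinct_values=1,
                          events_per_second=10_000):
    summary_events = days * 24 * 60 // slice_minutes * distinct_values
    return summary_events, summary_events / events_per_second

print(summary_query_seconds())                    # (288, 0.0288)
print(summary_query_seconds(distinct_values=10))  # (2880, 0.288)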