Sizing indexers

There are a number of factors that affect how many Splunk indexers you will need, but for a model system with typical usage levels, the short answer is about 100 gigabytes of raw logs per day per indexer. In the vast majority of cases, disk is the performance bottleneck, except when processors are very slow.

The measurements mentioned next assume that you will spread events across your indexers evenly, using the autoLB feature of the Splunk forwarder. We will talk more about this in the Indexer load balancing section.

The model system looks like this:

- Eight fast processors
- 8 gigabytes of RAM
- Disks capable of roughly 800 input/output operations per second (IOPS)
- No more than 100 gigabytes of raw logs indexed per day
- An average of four concurrent queries

To test your concurrency on an existing installation, try this query:

index=_audit search_id action=search
| transaction maxpause=1h search_id
| concurrency duration=duration
| timechart span="1h" avg(concurrency) max(concurrency)

A formula for a rough estimate (assuming eight fast processors and 8 gigabytes of RAM per indexer) might look like this:

indexers needed =
[your IOPS] / 800 *
[gigs of raw logs produced per day] / 100 *
[average concurrent queries] / 4

The behavior of your systems, network, and users makes it impossible to reliably predict performance without testing. These numbers are a rough estimate at best.
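
To make the formula easy to rerun with your own numbers, here is a minimal Python sketch; the function name estimate_indexers and the constant names are ours, purely for illustration:

# Baselines from the model system: 800 IOPS, 100 gigabytes of raw
# logs per day, and four concurrent queries per indexer.
BASE_IOPS = 800
BASE_GIGS_PER_DAY = 100
BASE_CONCURRENT_QUERIES = 4

def estimate_indexers(iops, gigs_per_day, concurrent_queries):
    """Rough estimate of indexers needed, per the formula above."""
    return ((iops / BASE_IOPS)
            * (gigs_per_day / BASE_GIGS_PER_DAY)
            * (concurrent_queries / BASE_CONCURRENT_QUERIES))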

Let's say you work for a mid-sized company producing about 80 gigabytes of logs per day. You have some very active users, so you might expect four concurrent user queries on average. You have good disks, which bonnie++ has shown to pull a sustained 950 IOPS. You are also running some fairly heavy summary indexing queries against your web logs, and you expect at least one to be running pretty much all the time. Plugging these numbers into the formula gives us the following:

950/800 IOPS * 
80/100 gigs * 
(1 concurrent summary query + 4 concurrent user queries) / 4
= 1.1875 indexers
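
Running the same numbers through the sketch above reproduces this result, and math.ceil rounds up to whole indexers:

import math

needed = estimate_indexers(950, 80, 1 + 4)  # summary + user queries
print(needed)             # 1.1875
print(math.ceil(needed))  # 2 whole indexers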

You cannot really deploy 1.1875 indexers, so your choices are either to start with one indexer and see how it performs, or to go ahead and start with two.

My advice would be to start with two indexers, if possible. This gives you some fault tolerance, and installations tend to grow quickly as more data sources are discovered throughout the company. When crossing the 100-gigabyte mark, it may make sense to start with three indexers and spread the disks across them. The extra capacity gives you the ability to take one indexer down and still have enough capacity to cover the normal load. See the discussion in the Planning redundancy section.

If we increase the number of average concurrent queries, increase the amount of data indexed per day, or decrease our IOPS, the number of indexers needed should scale more or less linearly.

If we scale up a bit more, say 120 gigabytes a day, five concurrent user queries, and two summary queries running on average, we grow as follows:

950/800 IOPS *
120/100 gigs *
(2 concurrent summary queries + 5 concurrent user queries) / 4
≈ 2.5 indexers
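
Again, the sketch confirms the arithmetic:

import math

needed = estimate_indexers(950, 120, 2 + 5)
print(round(needed, 2))   # 2.49, or roughly 2.5
print(math.ceil(needed))  # 3 whole indexers to cover the load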

Three indexers would cover this load, but if one indexer goes down, the remaining two will struggle to keep up with the data from the forwarders. Ideally, in this case, we should have four or more indexers.
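
One way to capture this rule of thumb in the sketch is to round up and add one spare indexer, a simple N+1 assumption on our part rather than anything Splunk prescribes:

import math

def indexers_with_headroom(needed):
    # Enough whole indexers that one can fail and the rest still cover the load.
    return math.ceil(needed) + 1

print(indexers_with_headroom(2.49375))  # 4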