Put simply, indexes.conf determines where data is stored on disk, how much is kept, and for how long. An index is a named directory with a specific structure. Inside this directory structure are a few metadata files and a set of subdirectories; the subdirectories are called buckets, and they contain the actual indexed data.
A simple stanza looks like this:
[implSplunk]
homePath = $SPLUNK_DB/implSplunk/db
coldPath = $SPLUNK_DB/implSplunk/colddb
thawedPath = $SPLUNK_DB/implSplunk/thaweddb
Let's walk through these attributes:
- homePath: This is the location for recent data.
- coldPath: This is the location for older data.
- thawedPath: This is a directory where buckets can be restored. It is an unmanaged location. This attribute must be defined, but I, for one, have never actually used it.
An aside about bucket terminology is probably in order:
- hot: This is a bucket that is currently open for writing. It lives in homePath.
- warm: This is a bucket that was created recently but is no longer open for writing. It also lives in homePath.
- cold: This is an older bucket that has been moved to coldPath. It is moved when maxWarmDBCount has been exceeded.
- frozen: For most installations, this simply means deleted. For customers who want to archive buckets, coldToFrozenScript or coldToFrozenDir can be specified to save buckets.
- thawed: A thawed bucket is a frozen bucket that has been brought back. It is special in that it is not managed, and it is not included in all time queries. When using coldToFrozenDir, only the raw data is typically kept, so the splunk rebuild command will need to be run to make the bucket searchable again.
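To make the lifecycle concrete, here is a minimal sketch of a stanza that controls when warm buckets roll to cold and that archives frozen buckets instead of deleting them. The warm bucket count and the archive path are illustrative assumptions, not recommendations; substitute values that fit your own storage.

```ini
[implSplunk]
homePath = $SPLUNK_DB/implSplunk/db
coldPath = $SPLUNK_DB/implSplunk/colddb
thawedPath = $SPLUNK_DB/implSplunk/thaweddb
# Assumed value: once more than this many warm buckets exist,
# the oldest warm bucket is moved from homePath to coldPath.
maxWarmDBCount = 300
# Hypothetical archive location: frozen buckets are moved here
# instead of being deleted. Splunk does not manage this directory.
coldToFrozenDir = /mnt/archive/implSplunk/frozen
```

Note that comments sit on their own lines; a comment placed after a value on the same line would be read as part of the value.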
How long data stays in an index is controlled by these attributes:
- frozenTimePeriodInSecs: This setting dictates the oldest data to keep in an index. A bucket will be removed when its newest event is older than this value. The default value is approximately 6 years.
- maxTotalDataSizeMB: This setting dictates how large an index can be. The total space used across all hot, warm, and cold buckets will not exceed this value. The oldest bucket is always frozen first. The default value is 500 gigabytes.
It is generally a good idea to set both of these attributes. frozenTimePeriodInSecs should match what users expect. maxTotalDataSizeMB should protect your system from running out of disk space.
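As a sketch of setting both attributes together, the following stanza keeps roughly 90 days of data while capping the index at 100 GB. Both figures are assumptions chosen only to show the arithmetic; whichever limit is reached first causes the oldest bucket to be frozen.

```ini
[implSplunk]
homePath = $SPLUNK_DB/implSplunk/db
coldPath = $SPLUNK_DB/implSplunk/colddb
thawedPath = $SPLUNK_DB/implSplunk/thaweddb
# Assumed retention of 90 days: 90 * 86400 seconds = 7776000
frozenTimePeriodInSecs = 7776000
# Assumed size cap of 100 GB: 100 * 1024 MB = 102400
maxTotalDataSizeMB = 102400
```

Because a bucket is only removed once its newest event is older than frozenTimePeriodInSecs, some events older than 90 days will usually still be searchable until their whole bucket ages out.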
Less commonly used attributes include:
- coldToFrozenDir: If specified, buckets will be moved to this directory instead of being deleted. This directory is not managed by Splunk, so it is up to the administrator to make sure that the disk does not fill up.
- maxHotBuckets: This is the maximum number of hot buckets that can be open for writing in an index at one time. A bucket represents a slice of time and will ideally span as small a slice of time as is practical; allowing several hot buckets lets events from different time ranges be written to separate buckets. I would never set this value to less than 3, but ideally, it should be set to 10.
- maxDataSize: This is the maximum size for an individual bucket. The larger the bucket, the fewer buckets need to be opened to complete a search, but the more disk space is consumed before a bucket can be frozen. The default is auto, which will never exceed 750 MB and is generally acceptable. The setting auto_high_volume, whose value depends on the processor type (1 GB on 32-bit systems and 10 GB on 64-bit systems), should be used for indexes that receive more than 10 GB a day.
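Pulling these less common attributes together, a stanza for a busy index might look something like the sketch below. It assumes an index that receives more than 10 GB a day; the values simply follow the guidance given above and should be tested against your own data volumes.

```ini
[implSplunk]
homePath = $SPLUNK_DB/implSplunk/db
coldPath = $SPLUNK_DB/implSplunk/colddb
thawedPath = $SPLUNK_DB/implSplunk/thaweddb
# Allow up to 10 hot buckets to be open at once, so that events
# from different time ranges can be written to separate buckets.
maxHotBuckets = 10
# Larger buckets for a high-volume index
# (1 GB on 32-bit systems, 10 GB on 64-bit systems).
maxDataSize = auto_high_volume
```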
We will discuss sizing multiple indexes in Chapter 12, Advanced Deployments.