Buckets

You may have noticed a pattern in this configuration file, in which the folders for an index are split across three locations: homePath, coldPath, and thawedPath. This is a very important concept in Splunk. An index contains compressed raw data and associated index files, which are spread out across age-designated directories. Each age-designated directory is called a bucket.

A bucket moves through several stages as it ages. In general, as your data gets older (think colder) in the system, it is rolled to the next stage. The thawed stage is the exception: as you can see in the following list, a thawed bucket contains data that has been restored from an archive. Here is a breakdown of the buckets in relation to each other:

hot: This is newly indexed data and open for writing (stored under homePath)
warm: This is data rolled from the hot bucket with no active writing (also stored under homePath)
cold: This is data rolled from the warm bucket (stored under coldPath)
frozen: This is data rolled from the cold bucket and either deleted, which is the default, or archived (to the directory set in coldToFrozenDir)
thawed: This is data restored from the archive (stored under thawedPath)

Tip from the Fez: By default, Splunk will delete data as opposed to archiving it in a frozen bucket. Ensure that you are aware of the data retention requirements for your application and configure a path for frozen buckets to land if required.
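
As a minimal sketch of this tip, the following indexes.conf fragment sets an archive location with the coldToFrozenDir setting, so that buckets are copied there when they freeze instead of simply being deleted. The index name example_index and the archive directory are hypothetical placeholders, not values taken from our configuration:

```ini
# Hypothetical index name; substitute the stanza for your own index
[example_index]
# Copy buckets here when they roll to frozen, before they are removed
# from the index (assumed path, choose a location with enough space)
coldToFrozenDir = /opt/splunk_archive/example_index/frozen
```

Splunk does not manage the archive directory after the copy, so keeping an eye on its disk usage is up to you.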

Now going back to the indexes.conf file, you should realize that homePath will contain the hot and warm buckets, coldPath will contain the cold bucket, and thawedPath will contain any restored data from the archive. This means you can put buckets in different locations to efficiently manage storage resources.
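
To make the mapping concrete, here is a sketch of what a single index stanza might look like. The index name example_index is hypothetical, and the paths follow the common db, colddb, and thaweddb layout under $SPLUNK_DB; your own file may point each setting at a different volume:

```ini
# Hypothetical index name, used for illustration only
[example_index]
# Hot and warm buckets live here (typically your fastest storage)
homePath = $SPLUNK_DB/example_index/db
# Cold buckets live here (can be slower, cheaper storage)
coldPath = $SPLUNK_DB/example_index/colddb
# Buckets restored from an archive are placed here
thawedPath = $SPLUNK_DB/example_index/thaweddb
```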

For our purposes, we have chosen to let Splunk handle when data rolls from one bucket to the next using default settings. In high-volume environments, you may need to control more precisely when data rolls through the bucket process.
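
If you do need that control, the following sketch shows a few of the indexes.conf settings that influence rolling. The index name example_index is hypothetical, and the values are illustrative assumptions rather than recommendations, so check them against your own volume and retention requirements before using them:

```ini
# Hypothetical index name, used for illustration only
[example_index]
# Size a hot bucket may reach before it rolls to warm;
# auto_high_volume is intended for high-volume indexes
maxDataSize = auto_high_volume
# Number of warm buckets to keep before the oldest rolls to cold
maxWarmDBCount = 300
# Age in seconds after which data rolls to frozen (here, 90 days)
frozenTimePeriodInSecs = 7776000
```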