Another handy feature introduced with BlueStore is compression of data at the sub-object level, that is, on the blobs inside BlueStore. This means that any data written into Ceph, no matter the client access model, can benefit from it. Compression is enabled on a per-pool basis and is disabled by default.
As well as the ability to enable compression per pool, there are a number of extra options that control how compression behaves, as shown in the following list:
- compression_algorithm: This controls which compression library is used to compress data. The default is snappy, a compression library written by Google. Although its compression ratio isn't the best, it has very high performance, and unless you have specific capacity requirements, you should probably stick with snappy. Other options are zlib and zstd.
- compression_mode: This controls whether and when compression is applied on a per-pool basis. It can be set to none, passive, aggressive, or force. The passive setting enables compression but only compresses data that higher levels have hinted is compressible. The aggressive setting tries to compress all data unless a hint marks it as incompressible. The force setting always tries to compress data, regardless of hints.
- compression_required_ratio: By default, this is set to 0.875 (87.5%). Compressed data must be no larger than this fraction of its original size for the compression to be considered worthwhile; otherwise, the data is stored uncompressed.
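As a rough illustration of how these options are applied to a pool, the following Python sketch builds the corresponding `ceph osd pool set` commands without running them. The pool name `rbd_pool` and the helper `compression_commands` are hypothetical and exist only for this example; the option keys are written as the pool settings are named in Ceph (`compression_algorithm`, `compression_mode`, `compression_required_ratio`). Check the commands against the documentation for your Ceph release before using them.

```python
VALID_MODES = {"none", "passive", "aggressive", "force"}


def compression_commands(pool, algorithm="snappy", mode="aggressive",
                         required_ratio=0.875):
    """Return the `ceph osd pool set` commands for one pool as argument lists.

    Nothing is executed here; the caller decides whether to print the
    commands or hand them to subprocess.run().
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown compression mode: {mode!r}")
    settings = {
        "compression_algorithm": algorithm,
        "compression_mode": mode,
        "compression_required_ratio": str(required_ratio),
    }
    return [
        ["ceph", "osd", "pool", "set", pool, key, value]
        for key, value in settings.items()
    ]


# "rbd_pool" is a placeholder pool name used only for illustration.
for command in compression_commands("rbd_pool"):
    print(" ".join(command))
```

Building the commands as argument lists keeps the sketch safe to run on a machine without a Ceph cluster, and makes it easy to review the settings before applying them.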
Although compression does require additional CPU, snappy is very efficient, and the distributed nature of Ceph lends itself well to this task, as the compression work is spread over a large number of CPUs across the cluster. In comparison, a legacy storage array would have to use more of the finite CPU resources of its dual controllers.
Besides reducing the space consumed, compression can also improve I/O performance when reading or writing large blocks of data. Because the data is compressed, the disks or flash devices have less data to read or write, which means faster response times. Additionally, flash devices may see less write wear because of the reduced amount of total data written.
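To get a feel for how much less data might reach the disks, the following minimal sketch compresses two sample buffers and applies the required-ratio rule (87.5% by default) to decide which size would be stored. It is only an approximation under stated assumptions: Python's `zlib` module stands in for the compressor, the function `bytes_to_store` and the sample buffers are invented for this example, and BlueStore details such as blob sizes and allocation-unit rounding are ignored.

```python
import os
import zlib

REQUIRED_RATIO = 0.875  # default required ratio, i.e. 87.5%


def bytes_to_store(raw, required_ratio=REQUIRED_RATIO):
    """Return how many bytes would be written for `raw` in this simplified model.

    The compressed form is kept only if it is no larger than
    required_ratio * len(raw); otherwise the data is kept uncompressed.
    """
    compressed = zlib.compress(raw)
    if len(compressed) <= len(raw) * required_ratio:
        return len(compressed)
    return len(raw)


# Highly repetitive data compresses well; random data does not.
repetitive = b"ceph bluestore compression " * 4096
random_like = os.urandom(len(repetitive))

for label, data in (("repetitive", repetitive), ("random", random_like)):
    stored = bytes_to_store(data)
    print(f"{label}: {len(data)} bytes in, {stored} bytes stored")
```

The repetitive buffer is stored at a fraction of its original size, while the random buffer fails the ratio check and is stored as is, so no benefit is claimed for data that does not compress.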