Summary

In this chapter, we went through the core concepts of what a cluster is, defining it as a set of computers, called nodes, that work together on the same type of workload. A compute cluster's primary function is to run CPU-intensive workloads, with the goal of reducing processing time. A storage cluster's function is to aggregate the available storage resources into a single storage space, which simplifies management and allows you to efficiently reach the petascale, that is, 1 PB of available space and beyond. Then, we explored how SDS is changing the way data is stored and how GlusterFS is one of the projects leading this change. SDS simplifies the management of storage resources while adding features that were impossible with traditional monolithic storage arrays.

To further understand how applications interact with storage, we defined the core differences between block, file, and object storage. In short, block storage deals with logical blocks of data on a storage device, file storage works by reading and writing actual files in a storage space, and object storage attaches metadata to each object for further interaction. With these different ways of interacting with storage in mind, we went on to point out the characteristics of GlusterFS that make it attractive to enterprise customers and how these features tie into what SDS stands for.
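The three access models can be sketched in a few lines of code. The following is a minimal, hypothetical illustration, not a real storage API: the block size, the temporary "device" file, and the dictionary standing in for an object bucket are all assumptions made for the example.

```python
import os
import tempfile

BLOCK_SIZE = 512  # block storage addresses fixed-size blocks by number

# Create a tiny 4-block "device" backed by a temporary file (illustrative only).
with tempfile.NamedTemporaryFile(delete=False) as dev:
    dev.write(b"\x00" * BLOCK_SIZE * 4)
    path = dev.name

# --- Block-style access: read/write raw offsets, addressed by block number ---
with open(path, "r+b") as device:
    device.seek(2 * BLOCK_SIZE)  # jump straight to block #2
    device.write(b"raw block data".ljust(BLOCK_SIZE, b"\x00"))

# --- File-style access: the filesystem resolves a name to the data for you ---
with open(path, "rb") as f:
    f.seek(2 * BLOCK_SIZE)
    data = f.read(BLOCK_SIZE).rstrip(b"\x00")

# --- Object-style access: each object carries its data plus its own metadata ---
object_store = {}  # stands in for an object bucket
object_store["report-2023"] = {
    "data": data,
    "metadata": {"content-type": "text/plain", "owner": "alice"},
}

print(data)  # b'raw block data'
os.unlink(path)
```

The point of the sketch is the addressing scheme: block storage speaks in offsets, file storage in names and paths, and object storage in keys whose values bundle data with descriptive metadata.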

Finally, we delved into the main reasons why high availability and high performance are a must for every storage design, and how performing parallel or serial I/O can affect application performance.
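The difference between the two I/O patterns can be shown with a short sketch. This is an illustrative example, not GlusterFS-specific: the temporary directory, file names, and worker count are assumptions, and real gains depend on the backend being able to serve overlapping requests (for example, a scale-out cluster answering from different nodes).

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Prepare a handful of small data files to read back (illustrative workload).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(8):
    p = os.path.join(tmpdir, f"chunk-{i}.dat")
    with open(p, "wb") as f:
        f.write(bytes([i]) * 1024)
    paths.append(p)

def read_chunk(path):
    with open(path, "rb") as f:
        return f.read()

# Serial I/O: one request at a time; total latency is the sum of every read.
serial = [read_chunk(p) for p in paths]

# Parallel I/O: overlapping requests; a distributed backend can serve each
# in-flight read from a different node, hiding per-request latency.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(read_chunk, paths))

print(serial == parallel)  # same data either way; only the access pattern differs
```

The application receives identical data in both cases; what changes is how much of the storage system's aggregate throughput it can actually use, which is why the access pattern matters so much in storage design.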

In the next chapter, we will dive into the actual process of architecting a GlusterFS storage cluster.