Summary

Architecting a storage solution requires knowing many variables. In this chapter, we established that the amount of space needed depends on the GlusterFS volume type, the application requirements, and the estimated growth in data utilization.

The volume type determines how much of the raw space is usable. A distributed volume aggregates all of the available space, making it the most space-efficient type, while a replicated volume stores a full copy of the data on every replica, so a two-way (replica 2) volume uses half of the raw space for mirroring.
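To make the space arithmetic concrete, here is a minimal Python sketch. The function name and parameters are hypothetical, and it assumes equal-sized bricks, a brick count that is a multiple of the replica count, and no filesystem or metadata overhead, so real usable space will be somewhat lower.

```python
def usable_capacity_gb(volume_type: str, brick_count: int,
                       brick_size_gb: float, replica: int = 2) -> float:
    """Estimate usable space for a volume built from equal-sized bricks.

    Hypothetical helper: ignores filesystem and metadata overhead.
    """
    raw = brick_count * brick_size_gb
    if volume_type == "distributed":
        # Every brick contributes its full capacity; there is no redundancy.
        return raw
    if volume_type == "replicated":
        if replica < 2:
            raise ValueError("a replicated volume needs a replica count of at least 2")
        if brick_count % replica != 0:
            raise ValueError("brick count must be a multiple of the replica count")
        # Each file is stored once per replica, so only 1/replica of raw space is usable.
        return raw / replica
    raise ValueError(f"unsupported volume type: {volume_type}")
```

For example, six 1,000 GB bricks give 6,000 GB as a distributed volume, 3,000 GB with `replica=2`, and 2,000 GB with `replica=3`.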

The application and its user base dictate how much space is required, because storage requirements change depending on the type of data being served. Planning ahead for storage growth avoids running out of resources, and adding a buffer of at least 10% when sizing should fit most situations.
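As an illustration of sizing for growth, the sketch below assumes a compound annual growth rate, which is only one possible model and not necessarily the one used in this chapter, and then applies the 10% buffer. The function name and parameters are hypothetical.

```python
def required_capacity_gb(current_gb: float, annual_growth: float,
                         years: int, buffer: float = 0.10) -> float:
    """Project usable capacity after compound growth, then add a safety buffer.

    Assumption: data grows by a fixed fraction (annual_growth) every year.
    """
    if current_gb < 0 or annual_growth < 0 or years < 0 or buffer < 0:
        raise ValueError("inputs must be non-negative")
    # Compound growth: current * (1 + rate) ^ years.
    projected = current_gb * (1 + annual_growth) ** years
    # Headroom on top of the projection, 10% by default.
    return projected * (1 + buffer)
```

With 1,000 GB today, 20% yearly growth, and a three-year horizon, this suggests about 1,901 GB of usable space; on a replica 2 volume, that means roughly twice as much raw capacity.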

For performance requirements, we defined the concepts of throughput, latency, IOPS, and I/O size, and how they interact with each other. We covered which variables come into play when configuring GlusterFS for optimal performance, how each volume type has its own performance characteristics, and how the brick layout plays an important role when optimizing a GlusterFS volume.
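The interaction between these metrics boils down to simple arithmetic: throughput is IOPS multiplied by I/O size, and, by Little's law, the IOPS a device sustains equals the number of outstanding I/Os (queue depth) divided by the average latency. The function names and units below are assumptions for illustration; real devices and networks add overheads that these formulas ignore.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s: operations per second times the size of each operation."""
    return iops * io_size_kib / 1024


def iops_from_latency(latency_ms: float, queue_depth: int = 1) -> float:
    """Steady-state IOPS from Little's law: outstanding I/Os divided by latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return queue_depth / (latency_ms / 1000)
```

For instance, 10,000 IOPS of 4 KiB requests amount to only about 39 MiB/s, while 500 IOPS of 1 MiB requests reach 500 MiB/s. Likewise, a 2 ms average latency at a queue depth of 1 caps a client at about 500 IOPS, which is why small-file workloads tend to be bound by latency and IOPS rather than by bandwidth.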

We also defined how high availability (HA) requirements affect sizing and how each volume type provides a different level of HA. When disaster recovery is needed, GlusterFS geo-replication adds the required level of availability by replicating data to another physical region, allowing services to recover smoothly in case of a disaster.

Finally, we went through how the workload shapes the design of the solution, and how using tools to observe the way the application interacts with the storage leads to the correct configuration of the storage cluster. We also saw how file types and sizes determine performance behavior and space utilization, and how asking the right questions leads to a better understanding of the workload, resulting in a more efficient and optimized solution.
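One way to answer "which file sizes and types does this workload use?" is to profile an existing copy of the data set. The sketch below is a hypothetical helper, not one of the tools used in this chapter; the size bucket boundaries are arbitrary assumptions that you would tune to the workload.

```python
import os
from collections import Counter

# Hypothetical size buckets; adjust them to match the workload being analyzed.
SIZE_BUCKETS = [
    (64 * 1024, "<64 KiB"),
    (1024 * 1024, "64 KiB-1 MiB"),
    (100 * 1024 * 1024, "1-100 MiB"),
]
LARGEST_BUCKET = ">=100 MiB"


def bucket_for(size: int) -> str:
    """Return the label of the first bucket whose upper limit exceeds size."""
    for limit, label in SIZE_BUCKETS:
        if size < limit:
            return label
    return LARGEST_BUCKET


def profile_files(root: str) -> dict:
    """Walk root and summarize file counts by size bucket and by extension."""
    by_size = Counter()
    by_ext = Counter()
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted twice
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            by_size[bucket_for(size)] += 1
            by_ext[os.path.splitext(name)[1].lower() or "(none)"] += 1
            total_bytes += size
            file_count += 1
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "average_bytes": total_bytes / file_count if file_count else 0,
        "by_size": by_size,
        "by_extension": by_ext,
    }


if __name__ == "__main__":
    import sys

    summary = profile_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{summary['file_count']} files, {summary['total_bytes']} bytes")
    for label, count in summary["by_size"].most_common():
        print(f"  {label}: {count}")
    for ext, count in summary["by_extension"].most_common(10):
        print(f"  {ext}: {count}")
```

A data set dominated by files under 64 KiB points toward tuning for metadata operations and latency, while a few very large files point toward sequential throughput.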

The main takeaway is to always ask how the application and its workload interact with their resources. This allows for the most efficient design possible.

In the next chapter, we'll go through the actual configuration needed for GlusterFS.