CPU

Ceph's official recommendation is 1 GHz of CPU power per OSD. Unfortunately, in real life, it's not quite that simple. What the official recommendation doesn't point out is that a certain amount of CPU power is required per I/O; it's not just a static figure. This makes sense when you think about it: the CPU is only used when there is something to be done. If there's no I/O, then no CPU is needed. It scales the other way, too: the more I/O, the more CPU is required. The official recommendation is a good safe bet for spinning-disk-based OSDs, but an OSD node equipped with fast SSDs can often find itself consuming several times that figure. To complicate things further, CPU requirements also vary with I/O size, with larger I/Os requiring more CPU.

If an OSD node starts to struggle for CPU resources, its OSDs can start timing out and being marked out of the cluster, often rejoining several seconds later. This continuous loss and recovery tends to place even more strain on the already limited CPU resources, causing cascading failures.

A good figure to aim for would be around 1-10 MHz per I/O, corresponding to 4 KB-4 MB I/Os respectively. As always, testing should be carried out before going live to confirm that CPU requirements are met under both normal and stressed I/O loads. Additionally, enabling compression and checksums in BlueStore consumes additional CPU per I/O, which should be factored into any calculations when upgrading from a Ceph cluster that had previously been running with FileStore. Erasure-coded pools will also consume more CPU than replicated pools. CPU usage varies with the erasure coding type and profile, so testing must be done to gain a better understanding of the requirements.
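As a rough illustration of the sizing logic above, the expected CPU demand of an OSD node can be sketched in a few lines. The log-linear interpolation between the two anchor points and the 1.5x headroom multiplier are assumptions for illustration only, not a Ceph formula; real figures should come from your own testing.

```python
import math

# Rule-of-thumb anchor points from the text: ~1 MHz per 4 KB I/O,
# ~10 MHz per 4 MB I/O. Interpolating log-linearly between them
# is an assumption for illustration, not an official Ceph figure.
ANCHORS = [(4 * 1024, 1.0), (4 * 1024 * 1024, 10.0)]

def mhz_per_io(io_size_bytes):
    """Estimate MHz of CPU needed per I/O per second at a given I/O size."""
    (s0, m0), (s1, m1) = ANCHORS
    io_size_bytes = min(max(io_size_bytes, s0), s1)  # clamp to known range
    frac = (math.log(io_size_bytes) - math.log(s0)) / (math.log(s1) - math.log(s0))
    return m0 + frac * (m1 - m0)

def node_cpu_estimate_ghz(iops, io_size_bytes, headroom=1.5):
    """Total GHz a node might need to serve `iops` at `io_size_bytes`,
    with a safety multiplier for recovery and stressed conditions."""
    return iops * mhz_per_io(io_size_bytes) / 1000.0 * headroom

# Example: 20,000 IOPS of 4 KB writes with 1.5x headroom
print(round(node_cpu_estimate_ghz(20000, 4096), 1))  # → 30.0
```

A node expected to serve 20,000 4 KB write IOPS would, on this sketch, want around 30 GHz of aggregate clock speed, for example eight cores at around 3.8 GHz.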

Another aspect of CPU selection that is key to determining performance in Ceph is the clock speed of the cores. A large proportion of the I/O path in Ceph is single-threaded, so a faster-clocked core will run through this code path more quickly, leading to lower latency. Because of the limited thermal envelope of most CPUs, clock speed often has to drop as the number of cores increases. High-core-count CPUs with high clock speeds also tend to sit at the top of the pricing structure, so it is beneficial to understand your I/O and latency requirements when choosing the best CPU.

A small experiment was done to find the effect of CPU clock speed on write latency. A Linux workstation running Ceph had its CPU clock manually adjusted using the userspace governor. The following results clearly show the benefit of higher-clocked CPUs:

CPU MHz    4 KB write IOPS    Avg latency (µs)
1600       797                1250
2000       815                1222
2400       1161               857
2800       1227               812
3300       1320               755
4300       1548               644
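Dividing each clock speed by its measured IOPS gives a rough MHz-per-I/O figure, which lands inside the 1-10 MHz guideline mentioned earlier. Note that this was a single latency-bound test on one workstation, so the calculation below is illustrative rather than a sizing formula:

```python
# (clock MHz, 4 KB write IOPS) pairs taken from the table above
results = [(1600, 797), (2000, 815), (2400, 1161),
           (2800, 1227), (3300, 1320), (4300, 1548)]

for mhz, iops in results:
    # MHz / IOPS approximates the MHz of CPU consumed per I/O per second
    print(f"{mhz} MHz: {mhz / iops:.1f} MHz per I/O")
```

Across the whole range of clock speeds, the result stays between roughly 2 and 2.8 MHz per I/O, consistent with the lower end of the 1-10 MHz rule of thumb for small 4 KB writes.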

If low latency, and especially low write latency, is important, then go for the highest-clocked CPUs you can get, ideally above 3 GHz. This may require a compromise in SSD-only nodes on how many cores are available, and thus how many SSDs each node can support. For nodes with 12 spinning disks and SSD journals, single-socket quad-core processors make an excellent choice, as they are often available with very high clock speeds and are very aggressively priced.

Where extreme low latency is not as important—for example, in object workloads—look at entry-level processors with well-balanced core counts and clock speeds.

Another consideration concerning CPU and motherboard choice is the number of sockets. In dual-socket designs, the memory, disk controllers, and NICs are shared between the sockets. When one CPU needs data from a resource attached to the other socket, the request must cross the interconnect bus between the two CPUs. Modern CPUs have high-speed interconnects, but they still introduce a performance penalty, and thought should be given to whether a single-socket design is achievable. The section on tuning gives some options for working around some of these possible performance penalties.