Horizontal scaling (scaling out)

If you do not want to scale vertically, you can scale your system horizontally instead; this is often referred to as scaling out. Google made this approach popular: its search engine runs on clusters of inexpensive commodity hardware. Scaling out allowed Google, in its early days, to expand its search capacity quickly and at low cost. Typically, common tasks are distributed across many worker machines performing the same job, and their outputs are collected and combined. This arrangement also tolerates the failure of individual machines. Load balancing is the key technique for scaling out: a load balancer is placed in front of the cluster of nodes, so from a consumer's perspective it does not matter which machine handles the request. This makes it easy to add capacity simply by adding more servers, and adding servers to a cluster can improve scalability close to linearly.
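The idea of a load balancer sitting in front of a pool of identical nodes can be sketched in a few lines. This is an illustrative round-robin balancer, not a production implementation; the class and server names are made up for the example:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes incoming requests across a pool of identical servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._pool = cycle(self.servers)

    def route(self, request):
        # Pick the next server in rotation; the consumer never knows
        # (or cares) which box actually handles the request.
        server = next(self._pool)
        return f"{server} handled {request}"

    def add_server(self, server):
        # Adding capacity is just adding another box to the pool.
        # (Resetting the rotation is fine for a sketch like this.)
        self.servers.append(server)
        self._pool = cycle(self.servers)

balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
results = [balancer.route(f"req-{i}") for i in range(4)]
# Requests cycle through node-1, node-2, node-3, node-1, ...
```

Real load balancers (NGINX, HAProxy, cloud load balancers) add health checks, weighting, and connection draining on top of this basic rotation, but the consumer-facing contract is the same: one address in front, any number of interchangeable servers behind it.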

Scaling out is a successful strategy when the application code does not depend on the server it runs on. If a request must be executed on a specific server, that is, if the application code has server affinity, it will be difficult to scale out. Stateless code, on the other hand, can be executed on any server. Hence, scalability improves when stateless code runs on horizontally scaled machines or clusters.
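The difference between server affinity and statelessness can be made concrete with a small sketch. The function names and the shopping-cart scenario are invented for illustration; the point is only where the state lives:

```python
# Stateful style: session data lives in this server's memory, so every
# subsequent request from the same user must reach the same server
# (server affinity) -- a load balancer cannot freely reroute it.
session_store = {}

def stateful_add_to_cart(user_id, item):
    cart = session_store.setdefault(user_id, [])
    cart.append(item)
    return cart

# Stateless style: all required state travels with the request (or sits
# in a shared external store), so ANY server in the cluster can handle
# it and the load balancer may pick whichever node is least busy.
def stateless_add_to_cart(cart, item):
    return cart + [item]

cart = []
cart = stateless_add_to_cart(cart, "book")
cart = stateless_add_to_cart(cart, "pen")
```

In practice, stateless services push session data into a shared store such as a database or a distributed cache, which is what allows requests to land on any node without losing context.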

Because of these properties, horizontal scaling is a commonly used approach across the industry. Many large, scalable systems are managed this way, for example, at Google, Amazon, and Microsoft. We recommend that you scale microservices horizontally as well.