With the advent of virtualization and the move to cloud-based infrastructure, applications can run on elastic infrastructure designed to grow and shrink based on anticipated or measured traffic patterns. If your application experiences peak periods, you shouldn't have to provision full capacity during non-peak periods, wasting compute resources and money. From virtualization to containers and container schedulers, it's increasingly common to have dynamic infrastructure that changes to accommodate the needs of your system.
Microservices are a natural fit for auto-scaling. Because we can scale each part of a system independently, it's easier to measure the scaling needs of a specific service and its dependencies.
There are many ways to create auto-scaling clusters. In the next chapter, we'll talk about container orchestration tools, but auto-scaling clusters can also be created directly on a cloud provider. In this recipe, we'll cover creating auto-scaling compute clusters using Amazon Web Services, specifically Amazon EC2 Auto Scaling. We'll create a cluster with multiple EC2 instances running our message-service behind an Application Load Balancer (ALB). We'll configure our cluster to automatically add instances based on CPU utilization.
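To make the CPU-based scaling rule concrete, here is a minimal Python sketch of one way it could be expressed: as a target tracking policy for the EC2 Auto Scaling `PutScalingPolicy` API. Several things here are assumptions for illustration and not part of the recipe itself: the group name `message-service-asg`, the 50 percent CPU target, the generated policy name, and the choice of target tracking over other policy types. Note that a target tracking policy both adds and removes instances to hold the metric near the target.

```python
from typing import Any, Dict


def cpu_target_tracking_policy(
    group_name: str, target_cpu_percent: float = 50.0
) -> Dict[str, Any]:
    """Build keyword arguments for an EC2 Auto Scaling PutScalingPolicy call.

    The group name and target value are hypothetical; adjust them to match
    your own Auto Scaling group and traffic profile.
    """
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in the range (0, 100]")
    return {
        "AutoScalingGroupName": group_name,
        # Hypothetical naming convention for the policy.
        "PolicyName": f"{group_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Predefined metric: average CPU utilization across the group.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def apply_policy(policy: Dict[str, Any]) -> None:
    """Send the policy to AWS. Requires boto3 and configured credentials."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("autoscaling").put_scaling_policy(**policy)


if __name__ == "__main__":
    import json

    # Print the request body only; nothing is sent to AWS here.
    print(json.dumps(cpu_target_tracking_policy("message-service-asg"), indent=2))
```

Keeping the policy as plain data, separate from the call that applies it, lets us review or version the scaling rule before anything touches a real account. Calling `apply_policy` assumes the Auto Scaling group already exists.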