Autoscaling applications using a Horizontal Pod Autoscaler

In this recipe, you will learn how to create a Horizontal Pod Autoscaler (HPA) to automate the process of scaling the application we created in the previous recipe. We will also test the HPA with a load generator to simulate a scenario of increased traffic hitting our services. Follow these steps:

  1. First, make sure you have the sample To-Do application deployed from the Manually scaling an application recipe. When you run the following command, you should get both MongoDB and Node pods listed:
$ kubectl get pods | grep my-ch7-app
my-ch7-app-mongodb-5499c954b8-lcw27   1/1   Running   0   4h41m
my-ch7-app-node-d8b94964f-94dsb       1/1   Running   0   4h16m
my-ch7-app-node-d8b94964f-h9w4l       1/1   Running   3   4h41m
  2. Create an HPA declaratively using the following command. This will automate the process of scaling the application between 1 and 5 replicas when the targetCPUUtilizationPercentage threshold is reached. In our example, the target mean CPU utilization across the pods is set to 50 percent. When utilization goes over this threshold, your replicas will be increased:
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-ch7-app-autoscaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-ch7-app-node
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
EOF

Although the results may be the same most of the time, a declarative configuration requires an understanding of the Kubernetes object configuration specs and file format. As an alternative, kubectl can be used for the imperative management of Kubernetes objects.

Note that you must have a CPU resource request set in your Deployment to use autoscaling. If your Deployment does not request CPU, the HPA will be created but will not work correctly.
You can also create the same HorizontalPodAutoscaler imperatively by running the following command:
$ kubectl autoscale deployment my-ch7-app-node --cpu-percent=50 --min=1 --max=5
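If your Deployment does not define a CPU request yet, you can add one under the pod template's container spec. The following is a minimal sketch of the relevant fragment; the container name and the 100m value are assumptions, so adjust them for your workload:

```yaml
# Fragment of the Deployment's pod template; the HPA needs
# resources.requests.cpu to compute a utilization percentage.
spec:
  template:
    spec:
      containers:
      - name: my-ch7-app-node   # assumed container name
        resources:
          requests:
            cpu: 100m           # illustrative value
```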
  3. Confirm the number of current replicas and the status of the HPA. When you run the following command, the number of replicas should be 1:
$ kubectl get hpa
NAME                    REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-ch7-app-autoscaler   Deployment/my-ch7-app-node   0%/50%    1         5         1          40s
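For reference, the autoscaler computes its desired replica count as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The following shell sketch walks through that arithmetic with illustrative numbers matching this recipe's later output:

```shell
# Illustrative calculation of the HPA scaling formula using integer math;
# (a*b + c - 1) / c performs a ceiling division in shell arithmetic.
current_replicas=1
current_cpu=210   # observed mean CPU utilization (%)
target_cpu=50     # targetCPUUtilizationPercentage
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # ceil(1 * 210 / 50) = 5
```

With utilization at 210% against a 50% target, the HPA asks for 5 replicas, which is also the configured maxReplicas.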
  4. Get the service IP of my-ch7-app-node so that you can use it in the next step:
$ export SERVICE_IP=$(kubectl get svc --namespace default my-ch7-app-node --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")
$ echo http://$SERVICE_IP/
http://mytodoapp.us-east-1.elb.amazonaws.com/

  5. Open a new Terminal window and create a load generator to test the HPA. Make sure that you replace YOUR_SERVICE_IP in the following code with the actual service IP from the output of Step 4. This command will generate traffic to your To-Do application:
$ kubectl run -i --tty load-generator --image=busybox /bin/sh

Once the shell prompt appears inside the container, run the following loop:

while true; do wget -q -O- YOUR_SERVICE_IP; done
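Alternatively, the same load can be started non-interactively in a single command; this is a sketch, and YOUR_SERVICE_IP must again be replaced with your real service address:

```shell
# Run the traffic loop directly as the container's command instead of
# attaching an interactive shell; delete the pod later to stop the load:
#   kubectl delete pod load-generator
kubectl run load-generator --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://YOUR_SERVICE_IP; done"
```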
  6. Wait a few minutes for the Autoscaler to respond to the increasing traffic. While the load generator is running in one Terminal, run the following command in a separate Terminal window to monitor the increased CPU utilization. In our example, utilization has climbed to 210%:
$ kubectl get hpa
NAME                    REFERENCE                     TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-ch7-app-autoscaler   Deployment/my-ch7-app-node   210%/50%   1         5         1          23m
  7. Now, check the deployment size and confirm that the deployment has been resized to 5 replicas as a result of the increased workload:
$ kubectl get deployment my-ch7-app-node
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
my-ch7-app-node   5/5     5            5           5h23m
  8. On the Terminal screen where you ran the load generator, press Ctrl + C to terminate the load generator. This will stop the traffic coming to your application.
  9. Wait a few minutes for the Autoscaler to adjust and then verify the HPA status by running the following command. The current CPU utilization should be lower. In our example, it shows that it went down to 0%:
$ kubectl get hpa
NAME                    REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-ch7-app-autoscaler   Deployment/my-ch7-app-node   0%/50%    1         5         1          34m
  10. Check the deployment size and confirm that the deployment has been scaled down to 1 replica as a result of stopping the traffic generator:
$ kubectl get deployment my-ch7-app-node
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
my-ch7-app-node   1/1     1            1           5h35m
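The same formula explains the scale-down: with utilization at 0%, the computed desired count is 0, but the HPA clamps the result to minReplicas. A sketch with illustrative values:

```shell
# Scale-down with the same ceiling-division formula, clamped to minReplicas.
current_replicas=5
current_cpu=0     # load generator stopped
target_cpu=50
min_replicas=1
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
[ "$desired" -lt "$min_replicas" ] && desired=$min_replicas
echo "$desired"   # clamped to 1
```

This is why the deployment settles at 1 replica rather than 0 once traffic stops.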

In this recipe, you learned how to automatically scale an application based on changing metrics. When applications are scaled up, the new pods are dynamically scheduled on existing worker nodes.