Load balancing with Nginx

In practice, multiple servers are deployed instead of one to handle large volumes of incoming API requests. But which server instance should an incoming client request be forwarded to? A load balancer does that job. Load balancing is a process in which a central server distributes the load across multiple servers based on certain criteria.

A load balancer employs a few strategies, such as Round Robin or Least Connection, for routing requests to instances. Let's take a look at what each does in a simple table:

Load-balancing method | Description
Round Robin | Incoming requests are distributed across the servers in turn, in proportion to the configured server weights.
Least Connection | Requests are sent to the server that is currently serving the smallest number of clients.
IP Hash | Requests from a given client IP are always sent to the same server, based on a hash of that IP. Only when that server is unavailable are they routed to another server.
Least Time | A request is sent to the server with the lowest average latency (the time to serve a client) and the smallest number of active connections. This method is available only in the commercial NGINX Plus.

We can set which strategy to apply for load balancing in the Nginx configuration.

Let's explore how load balancing is practically achieved in Nginx for our Go API servers. The first step in this process is to define an upstream group named cluster in the http section of the Nginx configuration file:

http {
    upstream cluster {
        server site1.mysite.com weight=5;
        server site2.mysite.com weight=2;
        server backup.mysite.com backup;
    }
}

Here, the server entries are the IP addresses or domain names of the machines running the same application code. We are defining an upstream group called cluster; it is a server group that we can refer to from our location directive. Weights should be given in proportion to the resources available; in the preceding code, site1 is given a higher weight because it may be a bigger instance (more memory and CPU). The server marked backup receives requests only when the primary servers are unavailable. Now, in the location directive, we can point Nginx at the server group with the proxy_pass directive:

server {
    location / {
        proxy_pass http://cluster;
    }
}
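
As a side note, Nginx also lets us control when an upstream server counts as unavailable, which is what triggers the backup server. Here is a minimal sketch using the standard max_fails and fail_timeout server parameters; the thresholds are arbitrary example values:

http {
    upstream cluster {
        # Take site1 out of rotation for 30s after 3 failed
        # attempts within a 30s window (fail_timeout sets both).
        server site1.mysite.com weight=5 max_fails=3 fail_timeout=30s;
        server site2.mysite.com weight=2 max_fails=3 fail_timeout=30s;
        # Receives traffic only while the servers above are down.
        server backup.mysite.com backup;
    }
}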

Now, the running proxy server will pass requests to the machines in the cluster for all API requests hitting the / endpoint. The default routing algorithm is Round Robin, which means the servers take turns receiving requests, cycling through the group repeatedly. If we need to change this, we can say so in the upstream definition. Take a look at the following code snippet:

http {
    upstream cluster {
        least_conn;
        server site1.mysite.com weight=5;
        server site2.mysite.com;
        server backup.mysite.com backup;
    }
}

server {
    location / {
        proxy_pass http://cluster;
    }
}

The preceding configuration creates a cluster of three machines and sets the load-balancing method to least connections; least_conn is the directive we use to select that method. The other possible values are ip_hash and least_time (the latter is available only in NGINX Plus, the commercial edition). You can try this out with a set of machines on a Local Area Network (LAN). Alternatively, we can install Docker and run multiple containers that act as different machines to test out load balancing.
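
If we wanted sticky routing based on the client's IP instead, the same upstream block would use ip_hash. A minimal sketch with the host names from the earlier examples:

http {
    upstream cluster {
        # Hash the client's IP so the same client is always
        # routed to the same server while it is available.
        ip_hash;
        server site1.mysite.com;
        server site2.mysite.com;
    }
}

Note that Nginx does not allow the backup parameter together with ip_hash, which is why the backup server is omitted here.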

We need to add the http block to the /etc/nginx/nginx.conf file, whereas the server block goes into /etc/nginx/sites-enabled/default. It is better to keep these two settings separate.
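
Concretely, with the default Debian/Ubuntu layout (where nginx.conf includes everything under sites-enabled inside its http block), the split would look something like this, reusing the snippets from above:

# /etc/nginx/nginx.conf
http {
    upstream cluster {
        least_conn;
        server site1.mysite.com weight=5;
        server site2.mysite.com;
        server backup.mysite.com backup;
    }

    include /etc/nginx/sites-enabled/*;
}

# /etc/nginx/sites-enabled/default
server {
    location / {
        proxy_pass http://cluster;
    }
}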

Here's a small exercise: try to run three bookServer instances on different ports and enable load balancing in Nginx.
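
As a hint, one possible upstream for this exercise, assuming the three bookServer instances listen on ports 8000, 8001, and 8002 of the local machine (both the group name bookservers and the ports are arbitrary choices):

http {
    upstream bookservers {
        # Three local bookServer instances.
        server 127.0.0.1:8000;
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
    }
}

server {
    location / {
        proxy_pass http://bookservers;
    }
}

In the next section, we'll examine how to rate limit an API in Nginx for certain clients.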