The soft delay is a good way to handle rate limiting because it is the least disruptive to most users. However, if you're running an API-based service and want the requesting application to be notified immediately of any request that exceeds the limit, you can add the nodelay parameter to the limit_req directive. Consider this example:
limit_req zone=basiclimit burst=5 nodelay;
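For context, here is a minimal sketch of how that directive might sit in a full configuration, assuming the basiclimit zone was defined earlier with limit_req_zone (the zone size, rate, listen port, and upstream address below are placeholders for illustration, not values from this article):

http {
    # Shared memory zone keyed on client IP (name matches the article; size and rate are assumptions)
    limit_req_zone $binary_remote_addr zone=basiclimit:10m rate=1r/s;

    server {
        listen 80;

        location / {
            # Allow a burst of 5 requests, then reject excess requests immediately
            limit_req zone=basiclimit burst=5 nodelay;

            proxy_pass http://127.0.0.1:8080;
        }
    }
}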
Instead of being queued, requests that exceed the limit are immediately rejected with a 503 (Service Unavailable) HTTP error. If we rerun the same initial Apache Benchmark test (even with a single connection), we now see this:
Concurrency Level:      1
Time taken for tests:   4.306 seconds
Complete requests:      200
Failed requests:        152
   (Connect: 0, Receive: 0, Length: 152, Exceptions: 0)
Non-2xx responses:      152
Total transferred:      1387016 bytes
HTML transferred:       1343736 bytes
Requests per second:    46.45 [#/sec] (mean)
Time per request:       21.529 [ms] (mean)
Time per request:       21.529 [ms] (mean, across all concurrent requests)
Transfer rate:          314.58 [Kbytes/sec] received
Not all of our requests returned a 200 status; any request over the limit immediately received a 503 instead. The benchmark still completed roughly 46 requests per second overall, but 152 of the 200 requests were 503 errors, leaving only 48 successful responses.
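For reference, the benchmark output above can be reproduced with an ab invocation along these lines; the hostname and path are placeholders, and the request count of 200 simply matches the output shown:

ab -c 1 -n 200 http://example.com/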