The server block contains an if statement, which should generally be avoided but is necessary here. Performance-wise, the user agent has to be inspected on every request anyway, so the check causes no degradation. In the following configuration, we reject the bots we don't want accessing our site:
server {
    listen 80;
    server_name badbots.nginxcookbook.com;

    if ($http_user_agent ~ (Baiduspider|Yandex|DirBuster|libwww|"")) {
        return 403;
    }

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
    }
}
This is, of course, only a small number of user agents to block, but they are some of the more common ones. Ideally, you should keep the list small; the larger the list, the greater the chance of blocking the wrong thing.
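To get a feel for how such a pattern behaves before deploying it, the match can be approximated outside NGINX. The following is a minimal Python sketch and not part of the recipe itself. It assumes that Python's `re.search` behaves like NGINX's case-sensitive, unanchored `~` operator for this simple alternation. The `is_blocked` helper and all sample user agent strings are hypothetical, and the `""` alternative from the configuration is left out.

```python
import re

# Same alternation as in the NGINX configuration, minus the "" alternative.
# Like the ~ operator, this is case-sensitive and matches anywhere in the string.
BLOCKED = re.compile(r"(Baiduspider|Yandex|DirBuster|libwww)")


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the user agent."""
    return BLOCKED.search(user_agent) is not None


# Hypothetical user agent strings, for illustration only.
samples = [
    "libwww",
    "Mozilla/5.0 (compatible; Baiduspider/2.0)",
    "libwww-perl/6.15",
    "baiduspider/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/48.0",
    "ExampleReader/1.0 (about Yandex search)",
]

for ua in samples:
    print(f"{is_blocked(ua)!s:5} {ua}")
```

In this sketch, the lowercase `baiduspider/2.0` is not matched because the comparison is case-sensitive, while the made-up `ExampleReader` string is matched only because it happens to contain `Yandex`. Every extra entry in the list adds more of these accidental substring matches, which is one reason to keep it short.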
If we view the logs, we can see that access has been blocked:
106.74.67.24 - - [04/Sep/2016:22:24:03 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"
106.74.67.24 - - [04/Sep/2016:22:24:03 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"
106.74.67.24 - - [04/Sep/2016:22:24:03 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"
106.74.67.24 - - [04/Sep/2016:22:24:04 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"
106.74.67.24 - - [04/Sep/2016:22:24:04 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"
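On a busy site, it can be easier to summarize the blocked requests than to read them line by line. The following Python sketch counts 403 responses per user agent. It assumes the log layout shown above (address, two placeholder fields, timestamp, request, status, bytes, referer, user agent, and a trailing quoted field); the `LINE` pattern and the `count_blocked` helper are hypothetical names, and the sample lines are copied from the log entries above.

```python
import re
from collections import Counter

# Assumed layout, based on the log entries above:
# addr - - [time] "request" status bytes "referer" "agent" "..."
LINE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_blocked(lines):
    """Count requests answered with 403, grouped by user agent."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("status") == "403":
            counts[match.group("agent")] += 1
    return counts


# Two of the log entries shown above.
sample = [
    '106.74.67.24 - - [04/Sep/2016:22:24:03 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"',
    '106.74.67.24 - - [04/Sep/2016:22:24:04 +1000] "GET / HTTP/1.1" 403 571 "-" "libwww" "-"',
]

for agent, hits in count_blocked(sample).most_common():
    print(f"{hits:6d}  {agent}")
```

To run it against a real log, pass an open file object to `count_blocked` instead of the `sample` list; the path to the access log depends on your installation.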