400–499 range

When you encounter status codes in this range, you should pay attention. The 400 range indicates that something was wrong with your request, which is why these codes are commonly called client errors. Many different issues can trigger these responses, such as a malformed request, missing or invalid authentication, or a request for something the server cannot or will not provide. Servers send these codes back to their clients to tell them that the request will not be fulfilled as it was sent.
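As a minimal sketch of how a scraper might recognize this range, the snippet below checks whether a numeric status code falls between 400 and 499 and looks up its standard name using Python's built-in `http.HTTPStatus`. The function name `client_error_phrase` is hypothetical and chosen only for illustration.

```python
from http import HTTPStatus


def client_error_phrase(status_code):
    """Return the standard phrase for a 4xx code, or None for other ranges.

    `client_error_phrase` is a hypothetical helper, not part of any library.
    """
    if not 400 <= status_code <= 499:
        return None  # not a client error
    try:
        return HTTPStatus(status_code).phrase
    except ValueError:
        # The code is in the 400 range but has no registered name in Python.
        return "Unknown client error"


print(client_error_phrase(404))
print(client_error_phrase(429))
```

Checking the range first, rather than matching individual codes, lets a scraper treat unfamiliar 4xx responses as client errors too.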

One status code you may already be familiar with is 404 Not Found. This occurs when you request a resource that the server cannot find. This could be due to a misspelling in the URL or because the page does not exist at all. Sometimes, websites move files on their servers and forget to update the links in their web pages to point to the new locations. This causes broken links, and it is especially common when a page links to an external website.
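Broken links of this kind can be detected programmatically. The following is a rough sketch, assuming Python's standard `urllib` module, that collects the URLs answering with 404. The function `find_broken_links` and its `opener` parameter are hypothetical names introduced here. The `opener` parameter exists only so that the function can be tried out without touching a real website.

```python
import urllib.error
import urllib.request


def find_broken_links(urls, opener=urllib.request.urlopen):
    """Return the URLs from `urls` that respond with 404 Not Found.

    Hypothetical helper for illustration. Other failures, such as
    connection errors, are deliberately left to propagate.
    """
    broken = []
    for url in urls:
        try:
            with opener(url, timeout=10):
                pass  # any successful response means the link works
        except urllib.error.HTTPError as err:
            if err.code == 404:
                broken.append(url)
            # Other HTTP errors are ignored in this sketch.
    return broken
```

With the default `opener`, each URL is actually requested, so the list passed in should be kept short and the target site's limits respected.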

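Other common status codes in this range that you may encounter are 401 Unauthorized and 403 Forbidden. Both mean that you are trying to access pages that are protected, but they are not identical. A 401 response means that your request lacks valid authentication credentials, whereas a 403 response means that the server understood your request but refuses to grant access, typically because you do not have permission for that resource. There are many different forms of authentication for the web, and this book will cover only the basics in future chapters.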

The last status code that I would like to highlight in this range is 429 Too Many Requests. Some web servers are configured with rate limits, meaning that you can make only a certain number of requests within a certain period of time. If you exceed this rate, you are not only putting unreasonable stress on the web server, but you are also exposing your web scraper, which puts it at risk of being blacklisted. Following proper web scraping etiquette is beneficial for both you and your target website.
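One way a scraper might respond to a 429 is to wait before trying again. A server may include a `Retry-After` header stating how long to wait. The sketch below honors that header when it contains a number of seconds and otherwise falls back to exponential backoff, which is an assumption of this example rather than something every server expects. The names `retry_delay` and `fetch_politely`, and the default values for `base`, `cap`, and `max_attempts`, are hypothetical.

```python
import time
import urllib.error
import urllib.request


def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Return the number of seconds to wait before the next attempt.

    Uses the Retry-After value when it is a plain number of seconds.
    Otherwise (header missing, or given as a date) it doubles the wait
    with each attempt. The result never exceeds `cap`.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    return min(base * 2 ** attempt, cap)


def fetch_politely(url, max_attempts=4):
    """Fetch `url`, pausing and retrying when the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on any other error, or once the attempts run out.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```

Capping the delay keeps a single misbehaving response from stalling the scraper indefinitely, while the growing wait reduces the load on a server that is already asking you to slow down.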