HTTP request format

If you open your web browser and navigate to http://www.example.com/page1.htm, your browser will need to send an HTTP request to the web server at www.example.com. That HTTP request may look like this:

GET /page1.htm HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept-Language: en-US
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Host: www.example.com
Connection: Keep-Alive

As you can see, the browser sends a GET request by default. This GET request is asking the server for the document /page1.htm. A GET request consists of HTTP headers only. There is no HTTP body because the client isn't sending data to the server. The client is only requesting data from the server. In contrast, a POST request would contain an HTTP body.

The first line of an HTTP request is called the request line. The request line consists of three parts: the request type (also called the method), the document path, and the protocol version. The parts are separated from one another by a single space. In the preceding example, the request line is GET /page1.htm HTTP/1.1. We can see that the request type is GET, the document path is /page1.htm, and the protocol version is HTTP/1.1.
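To make the three-part structure concrete, here is a minimal sketch of how a program might split a request line in C. The function name parse_request_line and the buffer sizes are assumptions chosen for illustration; a production server would need more careful validation.

```c
#include <stdio.h>

/* Hypothetical helper: splits a request line such as
 * "GET /page1.htm HTTP/1.1" into its three space-separated parts.
 * Returns 0 on success, or -1 if the line doesn't have exactly three parts. */
int parse_request_line(const char *line,
                       char method[16], char path[256], char version[16])
{
    char extra;

    /* The field widths keep sscanf from overflowing the output buffers.
     * The trailing %c only matches if a fourth token is present. */
    int n = sscanf(line, "%15s %255s %15s %c", method, path, version, &extra);

    return (n == 3) ? 0 : -1;
}
```

Calling parse_request_line() with the request line from our example should fill method with GET, path with /page1.htm, and version with HTTP/1.1. Because sscanf() treats \r and \n as whitespace, the line can be passed in with or without its line ending.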

When dealing with text-based network protocols, it is always important to be explicit about line endings. This is because different operating systems have standardized on different line-ending conventions. Each line of an HTTP message ends with a carriage return, followed by a newline character. In C, this looks like \r\n. In practice, some web servers may tolerate other line endings. You should ensure that your clients always send a proper \r\n line ending for maximum compatibility.

After the request line, there are various HTTP header fields. Each header field consists of its name followed by a colon, and then its value. Consider the User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 line. This User-Agent line is telling the web server what software is contacting it. Some web servers will serve different documents to different user agents. For example, it is common for some websites to serve full documents to search engine spiders while serving paywalls to actual visitors. The server generally uses the user-agent HTTP header field to determine which is which. At the same time, there is a long history of web clients lying in the user-agent field. I suggest you take the high road in your applications and clearly identify your application with a unique user-agent value.
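The name, colon, value layout is simple enough to split by hand. The following is only a sketch: split_header is a hypothetical helper, and it assumes the line is stored in a writable buffer with its \r\n already removed.

```c
#include <string.h>

/* Hypothetical helper: splits one header line ("Name: value") in place.
 * On success, *name and *value point into the original buffer and the
 * function returns 0. It returns -1 if there is no colon or no name. */
int split_header(char *line, char **name, char **value)
{
    /* The first colon ends the name; the value may contain more colons. */
    char *colon = strchr(line, ':');
    if (!colon || colon == line) return -1;

    *colon = '\0';          /* terminate the name */
    *name = line;

    /* Skip optional whitespace between the colon and the value. */
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t') ++v;
    *value = v;

    return 0;
}
```

Given the User-Agent line from our example, this yields the name User-Agent and the value Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36. Keep in mind that header field names are case-insensitive, so a real program should compare them without regard to case.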

In HTTP/1.1, the only header field that is actually required is Host. The Host field tells the web server which web host the client is requesting the resource from. This is important because one web server may be hosting many different websites. The request line tells the web server that the /page1.htm document is wanted, but it doesn't specify which website that page belongs to. The Host field fills this role.

The Connection: Keep-Alive line tells the web server that the HTTP client would like to issue additional requests over the same TCP connection after the current request finishes. If the client had sent Connection: Close instead, that would indicate that the client intended to close the TCP connection after the HTTP response was received.

The web client must send a blank line after the HTTP request header. This blank line is how the web server knows that the HTTP request is finished. Without this blank line, the web server wouldn't know whether any additional header fields were still going to be sent. The blank line itself is just \r\n. Because the last header line also ends with \r\n, in C the end of the request header looks like this: \r\n\r\n.
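Putting these rules together, a client could assemble a complete GET request as one C string. This is a minimal sketch, not a full client: build_get_request is a hypothetical helper, the example-client/1.0 user-agent value is a placeholder, and the code assumes the caller supplies a sufficiently large buffer.

```c
#include <stdio.h>

/* Hypothetical helper: writes a minimal HTTP/1.1 GET request into buf.
 * Returns the length of the request, or -1 if buf is too small. */
int build_get_request(char *buf, size_t size,
                      const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"                /* request line */
                     "Host: %s\r\n"                       /* required field */
                     "User-Agent: example-client/1.0\r\n" /* placeholder value */
                     "Connection: Close\r\n"
                     "\r\n",                              /* blank line ends the header */
                     path, host);

    /* snprintf reports the length it wanted to write, so a value
     * of size or more means the output was truncated. */
    if (n < 0 || (size_t)n >= size) return -1;

    return n;
}
```

Every line is terminated with an explicit \r\n, and the final lone \r\n is the blank line, so the finished string always ends in \r\n\r\n. The returned length is what you would then pass to send().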

Let's now consider what the web server would send in reply to an HTTP request.