We relied on the C standard library functions fgets() and scanf() for HTTP parsing in Chapter 5, Writing a Server the Old-Fashioned Way, and in Chapter 6, Doing I/O Right, with Event Loops, we still relied on scanf. However, these began to feel increasingly clunky when driven by libuv’s asynchronous, buffer-based I/O design. Although the code we wrote performed reasonably well, it did so at the expense of handling certain real-world edge cases.
We’ll solve these problems by introducing another external C library: the node.js http-parser library,[56] which is a low-level C library that is tightly integrated with libuv in the node.js core.
http-parser and libuv really do go together like peanut butter and jelly; the parser API has just a handful of methods we’ll need to call, and two structs. Unlike with our libuv and libcurl bindings, we won’t be able to model these as opaque pointers. Instead, we’ll need to correctly model all of the external-facing fields of the parser object so we can pull out its state when needed.
The parser struct itself looks like this:
| type Parser = CStruct8[ |
| Long, // private data |
| Long, // private data |
| UShort, // major version |
| UShort, // minor version |
| UShort, // status code (responses only) |
| CChar, // method (requests only) |
| CChar, // Error (last bit upgrade) |
| Ptr[Byte] // user data |
| ] |
This is a little unwieldy, but we’ll only need to access two fields: the sixth (the HTTP method) and the eighth (the custom user data field). The actual library functions we’ll use are as follows:
| def http_parser_init(p:Ptr[Parser],parser_type:Int):Unit = extern |
| def http_parser_settings_init(s:Ptr[ParserSettings]):Unit = extern |
| def http_parser_execute(p:Ptr[Parser],s:Ptr[ParserSettings], |
| data:Ptr[Byte],len:Long):Long = extern |
| def http_method_str(method:CChar):CString = extern |
These functions set up the parser and execute it whenever we have data available; the trick is all in the ParserSettings struct, which holds two kinds of callbacks:
| type HttpCB = CFuncPtr1[Ptr[Parser],Int] |
| type HttpDataCB = CFuncPtr3[Ptr[Parser],CString,Long,Int] |
| |
| type ParserSettings = CStruct8[ |
| HttpCB, // on_message_begin |
| HttpDataCB, // on_url |
| HttpDataCB, // on_status |
| HttpDataCB, // on_header_field |
| HttpDataCB, // on_header_value |
| HttpCB, // on_headers_complete |
| HttpDataCB, // on_body |
| HttpCB // on_message_complete |
| ] |
In short, we’ll supply an HttpCB for a notification, where something has occurred but no new data is present, and an HttpDataCB for a data callback, where there’s a new buffer of data for us. We won’t necessarily need to handle all of these; if we leave any null, they simply won’t get called. The rest are called by http_parser_execute when we pass it a buffer of data, and each data callback receives a pointer and a length into that same buffer, without intermediate copies.
In other words, this parser maintains state but doesn’t accumulate data, leaving us free to do so ourselves.
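To make the two callback shapes concrete, here is a minimal model in plain Scala rather than Scala Native. The names NotifyCB, DataCB, Settings, and execute are hypothetical stand-ins for HttpCB, HttpDataCB, ParserSettings, and http_parser_execute; the real parser does far more than this stub, which simply treats every chunk as body data.

```scala
object CallbackModel {
  // Notification callback: something happened, no data attached.
  type NotifyCB = () => Int
  // Data callback: a slice (offset, length) of the caller's own buffer.
  type DataCB = (Array[Byte], Int, Int) => Int

  // Callbacks left null are skipped, mirroring the behavior described above.
  final class Settings {
    var onBody: DataCB = null
    var onMessageComplete: NotifyCB = null
  }

  // Hypothetical stand-in for http_parser_execute: it hands the chunk to the
  // data callback as a slice and never copies or stores it.
  def execute(settings: Settings, chunk: Array[Byte], last: Boolean): Int = {
    if (settings.onBody != null) settings.onBody(chunk, 0, chunk.length)
    if (last && settings.onMessageComplete != null) settings.onMessageComplete()
    chunk.length
  }

  // Accumulation is the caller's job, not the parser's.
  def demo(): (String, Int) = {
    val collected = new StringBuilder
    var completed = 0
    val settings = new Settings
    settings.onBody = (buf, off, len) => {
      collected.append(new String(buf, off, len, "UTF-8"))
      0
    }
    settings.onMessageComplete = () => { completed += 1; 0 }
    execute(settings, "hello ".getBytes("UTF-8"), last = false)
    execute(settings, "world".getBytes("UTF-8"), last = true)
    (collected.toString, completed)
  }
}
```

The point of the model is the division of labor: the stub only reports slices and events, while the closure passed in as onBody owns the accumulated text.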
We can design a simple mutable RequestState case class that we can use to accumulate values as they become available:
| case class RequestState( |
| url:String, |
| method:String, |
| var lastHeader:String = "None", |
| headerMap:mutable.Map[String,String] = mutable.Map[String,String](), |
| var body:String = "") |
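As a quick illustration of how this class is meant to be used, the following plain-Scala sketch fills in a RequestState piece by piece, the way the parser callbacks will. The wrapper object RequestStateDemo and the sample values are made up for the example.

```scala
import scala.collection.mutable

object RequestStateDemo {
  // Same shape as the RequestState shown above.
  case class RequestState(
    url: String,
    method: String,
    var lastHeader: String = "None",
    headerMap: mutable.Map[String, String] = mutable.Map[String, String](),
    var body: String = "")

  def demo(): RequestState = {
    // The URL and method are known first, so they are constructor arguments.
    val req = RequestState("/upload", "POST")
    // A header key arrives, then its value.
    req.lastHeader = "Content-Type"
    req.headerMap(req.lastHeader) = "text/plain"
    // Body data may arrive in several pieces.
    req.body += "first chunk, "
    req.body += "second chunk"
    req
  }
}
```

Only lastHeader and body are declared as var; headerMap is a val that points at a mutable map, so it can be updated in place without reassigning the field.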
We’ll also need to design some kind of struct that we can store in the eighth field of Parser to identify requests and responses. For this design to work, we’ll need to closely coordinate the functionality of libuv’s TCP I/O callbacks and handles with a Parser instance per connection, as well as a RequestState per request, although we’ll only have one active RequestState at a time per connection.
If we design a simple, three-field struct containing the connection ID, the TCPHandle, and the Parser struct, we can share it between the libuv and parser aspects of our codebase. In fact, we can structure all the parsing components into a trait that we can mix into our server implementation, allowing for a cleaner design. We just need to require that the server lets us look up RequestStates by ID and provides a handleRequest function for us to call (from the onMessageComplete parser callback) when a request is fully parsed.
The basic interface looks like this:
| trait Parsing { |
| import LibUV._,HttpParser._ |
| val requests:mutable.Map[Long,RequestState] |
| |
| def handleRequest(id:Long,handle:TCPHandle,request:RequestState):Unit |
| |
| type ConnectionState = CStruct3[Long,TCPHandle,Parser] |
| |
| val HTTP_REQUEST = 0 |
| val HTTP_RESPONSE = 1 |
| val HTTP_BOTH = 2 |
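The contract between the trait and the server can be tried out without libuv at all. The sketch below is a simplified plain-Scala model, not the Parsing trait itself: ParsingContract drops the TCPHandle argument, and the complete method and the EchoServer class are hypothetical names used only to show who supplies what.

```scala
import scala.collection.mutable

object ParsingContractSketch {
  final case class RequestState(
    url: String,
    method: String,
    var lastHeader: String = "None",
    headerMap: mutable.Map[String, String] = mutable.Map[String, String](),
    var body: String = "")

  // Simplified model of the trait: the server supplies the lookup table and
  // the handler, and the parsing side decides when to call it.
  trait ParsingContract {
    val requests: mutable.Map[Long, RequestState]
    def handleRequest(id: Long, request: RequestState): Unit

    // Roughly what a completion callback does once it has recovered the id.
    def complete(id: Long): Unit = handleRequest(id, requests(id))
  }

  // A toy server that records what it handled and forgets finished requests.
  final class EchoServer extends ParsingContract {
    val requests = mutable.Map.empty[Long, RequestState]
    val handled = mutable.ArrayBuffer.empty[String]

    def handleRequest(id: Long, request: RequestState): Unit = {
      handled += s"$id ${request.method} ${request.url}"
      requests -= id
    }
  }
}
```

Removing the entry in handleRequest is a design choice of this toy server, made so the map does not grow without bound; whether the real server should do the same depends on how connection IDs are reused.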
Now, we just need to implement the actual callbacks. The first of our callbacks to be executed for any request is onURL, so we’ll use it to initialize the RequestState with its method and URL. Since the parser callbacks are called from a libuv on_read callback, we can rely on the ConnectionState already being populated.
| def onURL(p:Ptr[Parser],data:CString,len:Long):Int = { |
| val state = (p._8).asInstanceOf[Ptr[ConnectionState]] |
| val message_id = state._1 |
| val url = bytesToString(data,len) |
| println(s"got url: $url") |
| val m = p._6 |
| val method = fromCString(http_method_str(m)) |
| println(s"method: $method ($m)") |
| requests(message_id) = RequestState(url,method) |
| 0 |
| } |
The tricky part is header parsing. We can receive any number of headers, and their keys and values normally alternate. However, http-parser can call the same header callback twice in a row in some circumstances, such as when a buffer boundary falls in the middle of a header line. As a result, we need to remember the last header key we saw so we know how to update our Map of headers as each component arrives. The callbacks below take the simplest approach: they record the most recent key in lastHeader and store each value under it, so a value that arrives in two pieces keeps only the second piece:
| def onHeaderKey(p:Ptr[Parser],data:CString,len:Long):Int = { |
| val state = (p._8).asInstanceOf[Ptr[ConnectionState]] |
| val message_id = state._1 |
| val request = requests(message_id) |
| |
| val k = bytesToString(data,len) |
| request.lastHeader = k |
| requests(message_id) = request |
| 0 |
| } |
| |
| def onHeaderValue(p:Ptr[Parser],data:CString,len:Long):Int = { |
| val state = (p._8).asInstanceOf[Ptr[ConnectionState]] |
| val message_id = state._1 |
| val request = requests(message_id) |
| |
| val v = bytesToString(data,len) |
| request.headerMap(request.lastHeader) = v |
| requests(message_id) = request |
| 0 |
| } |
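If split header lines matter for your workload, one possible refinement is to track whether the previous callback was a key or a value and append instead of overwrite. The following plain-Scala sketch shows that bookkeeping in isolation; the Headers class and its onField and onValue methods are hypothetical and work on already-decoded strings rather than on raw parser buffers.

```scala
import scala.collection.mutable

object HeaderAccumulator {
  final class Headers {
    val map = mutable.Map.empty[String, String]
    private var key = ""
    private var lastWasValue = false

    // Called for each piece of a header name.
    def onField(piece: String): Unit = {
      if (lastWasValue || key.isEmpty) key = piece // a new header starts
      else key += piece                            // the name was split
      lastWasValue = false
    }

    // Called for each piece of a header value.
    def onValue(piece: String): Unit = {
      if (lastWasValue) map(key) = map(key) + piece // the value was split
      else map(key) = piece                         // first piece of the value
      lastWasValue = true
    }
  }

  def demo(): Map[String, String] = {
    val h = new Headers
    // Both the name and the value of the first header arrive in two pieces.
    h.onField("Content-"); h.onField("Type")
    h.onValue("text/"); h.onValue("plain")
    h.onField("Host"); h.onValue("example.com")
    h.map.toMap
  }
}
```

This version still replaces the stored value if the same header name appears twice in one request; handling repeated headers would need a separate policy, such as joining the values with commas.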
And likewise, if we have a POST, PUT, or other HTTP request with a body, we can append that content to our state like so:
| def onBody(p:Ptr[Parser],data:CString,len:Long):Int = { |
| val state = (p._8).asInstanceOf[Ptr[ConnectionState]] |
| val message_id = state._1 |
| val request = requests(message_id) |
| |
| val b = bytesToString(data,len) |
| request.body += b |
| requests(message_id) = request |
| 0 |
| } |
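One caveat with appending decoded strings: if a buffer boundary falls inside a multi-byte UTF-8 character, decoding each chunk on its own corrupts that character. The sketch below, in plain Scala, assumes that a helper like bytesToString decodes every chunk independently and shows an alternative that collects raw bytes and decodes once at the end. The Body class is a hypothetical helper, not part of the server code.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

object BodyBuffer {
  // Collects raw bytes and decodes them only when the caller asks.
  final class Body {
    private val bytes = new ByteArrayOutputStream()
    def append(chunk: Array[Byte], offset: Int, len: Int): Unit =
      bytes.write(chunk, offset, len)
    def asString: String = new String(bytes.toByteArray, UTF_8)
  }

  // Returns (bytes-first result, decode-each-chunk result).
  def demo(): (String, String) = {
    val raw = "caf\u00e9".getBytes(UTF_8) // the last character takes two bytes
    val (a, b) = raw.splitAt(4)           // the split lands inside that character
    val body = new Body
    body.append(a, 0, a.length)
    body.append(b, 0, b.length)
    val naive = new String(a, UTF_8) + new String(b, UTF_8)
    (body.asString, naive)
  }
}
```

For ASCII-only bodies the two approaches give the same result, so the simpler string append is fine there; the byte buffer only earns its keep once non-ASCII payloads are possible.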
Finally, once the message is complete, we can pass the finished request on to the actual server implementation via the handleRequest interface (which we’ll implement shortly):
| def onMessageComplete(p:Ptr[Parser]):Int = { |
| val state = (p._8).asInstanceOf[Ptr[ConnectionState]] |
| val message_id = state._1 |
| val tcpHandle = state._2 |
| val request = requests(message_id) |
| handleRequest(message_id,tcpHandle,request) |
| 0 |
| } |
| } |
| |
| @link("http_parser") |
| @extern |
| object HttpParser { |
| type Parser = CStruct8[ |
| Long, // private data |
| Long, // private data |
| UShort, // major version |
| UShort, // minor version |
| UShort, // status code (responses only) |
| CChar, // method (requests only) |
| CChar, // Error (last bit upgrade) |
| Ptr[Byte] // user data |
| ] |
| |
| |
| type HttpCB = CFuncPtr1[Ptr[Parser],Int] |
| type HttpDataCB = CFuncPtr3[Ptr[Parser],CString,Long,Int] |
| |
| type ParserSettings = CStruct8[ |
| HttpCB, // on_message_begin |
| HttpDataCB, // on_url |
| HttpDataCB, // on_status |
| HttpDataCB, // on_header_field |
| HttpDataCB, // on_header_value |
| HttpCB, // on_headers_complete |
| HttpDataCB, // on_body |
| HttpCB // on_message_complete |
| ] |
| |
| def http_parser_init(p:Ptr[Parser],parser_type:Int):Unit = extern |
| def http_parser_settings_init(s:Ptr[ParserSettings]):Unit = extern |
| def http_parser_execute(p:Ptr[Parser],s:Ptr[ParserSettings], |
| data:Ptr[Byte],len:Long):Long = extern |
| def http_method_str(method:CChar):CString = extern |
| } |
That’s it! With the skills we’ve developed over the course of this book, we can now integrate an external HTTP parser in less code than it took us to build one ourselves.