Parsing, Revisited

In Chapter 5, Writing a Server the Old-Fashioned Way, we relied on the C standard library functions fgets() and scanf() for HTTP parsing, and in Chapter 6, Doing I/O Right, with Event Loops, we still relied on scanf(). However, these functions began to feel increasingly clunky when driven by libuv’s asynchronous, buffer-based I/O design. Although the code we wrote performed reasonably well, it did so at the expense of handling certain real-world edge cases.

We’ll solve these problems by introducing another external dependency: the node.js http-parser library,[56] a low-level C library that is tightly integrated with libuv in the node.js core.

The http-parser API

http-parser and libuv really do go together like peanut butter and jelly; the parser API consists of just a handful of functions we’ll need to call and two structs. Unlike with our libuv and libcurl bindings, we won’t be able to model these as opaque pointers. Instead, we’ll need to correctly model all of the external-facing fields of the parser object so we can pull out its state when needed.

The parser struct itself looks like this:

LibUVService/parsing.scala
  type Parser = CStruct8[
    Long,      // private data
    Long,      // private data
    UShort,    // major version
    UShort,    // minor version
    UShort,    // status (response only)
    CChar,     // method
    CChar,     // Error (last bit upgrade)
    Ptr[Byte]  // user data
  ]

This is a little unwieldy, but we’ll only need to access the sixth field (the HTTP method) and the eighth (the custom user data field). The actual library functions we’ll use are as follows:

LibUVService/parsing.scala
  def http_parser_init(p: Ptr[Parser], parser_type: Int): Unit = extern
  def http_parser_settings_init(s: Ptr[ParserSettings]): Unit = extern
  def http_parser_execute(p: Ptr[Parser], s: Ptr[ParserSettings],
    data: Ptr[Byte], len: Long): Long = extern
  def http_method_str(method: CChar): CString = extern

These functions set up and execute parsing whenever we have data available; the trick is all in the ParserSettings struct, which holds two kinds of callbacks:

LibUVService/parsing.scala
  type HttpCB = CFuncPtr1[Ptr[Parser], Int]
  type HttpDataCB = CFuncPtr3[Ptr[Parser], CString, Long, Int]

  type ParserSettings = CStruct8[
    HttpCB,      // on_message_begin
    HttpDataCB,  // on_url
    HttpDataCB,  // on_status
    HttpDataCB,  // on_header_field
    HttpDataCB,  // on_header_value
    HttpCB,      // on_headers_complete
    HttpDataCB,  // on_body
    HttpCB       // on_message_complete
  ]

In short, we’ll supply an HttpCB for a notification, where something has occurred but no new data is present, and an HttpDataCB for a data callback, where there’s a new span of data for us. We won’t necessarily need to handle all of these; if we leave any of them null, they simply won’t get called. Otherwise, all of these are called by http_parser_execute when we pass it a buffer of data, and each data callback receives a pointer into that same buffer along with a length, without intermediate copies.

In other words, this parser maintains state but doesn’t accumulate data, leaving us free to do so ourselves.
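To make the callback model concrete, here is a toy sketch in plain Scala. It is not http-parser and it does not use the Scala Native bindings; the names CallbackSketch, Settings, onToken, onComplete, execute, and tokens are all hypothetical stand-ins. It only illustrates the two ideas above: a callback left as null is skipped, and a data callback receives an offset and length into the caller's buffer rather than a copy.

```scala
object CallbackSketch {
  // Hypothetical stand-ins for HttpCB and HttpDataCB. A data callback gets
  // the shared buffer plus an offset and a length, never a copy.
  type NotifyCB = () => Int
  type DataCB = (Array[Byte], Int, Int) => Int

  // Hypothetical stand-in for ParserSettings; null means "not registered".
  final class Settings(
    var onToken: DataCB = null,
    var onComplete: NotifyCB = null)

  // Toy "parser": reports each space-separated token as an (offset, length)
  // slice of buf, then fires the completion notification if one is set.
  def execute(settings: Settings, buf: Array[Byte]): Int = {
    var start = 0
    var i = 0
    while (i <= buf.length) {
      if (i == buf.length || buf(i) == ' '.toByte) {
        if (i > start && settings.onToken != null)
          settings.onToken(buf, start, i - start)
        start = i + 1
      }
      i += 1
    }
    if (settings.onComplete != null) settings.onComplete()
    buf.length
  }

  // Usage: the caller decides what to keep by decoding slices in the callback.
  def tokens(input: String): List[String] = {
    val seen = scala.collection.mutable.ListBuffer.empty[String]
    val settings = new Settings(onToken = (buf, off, len) => {
      seen += new String(buf, off, len, "UTF-8"); 0
    })
    execute(settings, input.getBytes("UTF-8"))
    seen.toList
  }
}
```

Because the toy parser never stores the slices it reports, any accumulation happens in the caller's callback, which is the same division of labor we'll rely on with the real library.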

Parsing Requests

We can design a simple mutable RequestState case class that we can use to accumulate values as they become available:

LibUVService/parsing.scala
case class RequestState(
  url: String,
  method: String,
  var lastHeader: String = "None",
  headerMap: mutable.Map[String, String] = mutable.Map[String, String](),
  var body: String = "")

We’ll also need to design some kind of struct that we can store in the eighth field of Parser to identify requests and responses. For this design to work, we’ll need to closely coordinate the functionality of libuv’s TCP I/O callbacks and handles with a Parser instance per connection, as well as a RequestState per request, although we’ll only have one active RequestState at a time per connection.

If we design a simple, three-field struct containing the connection ID, the TCPHandle, and the Parser struct, we can share it between the libuv and parser aspects of our codebase. In fact, we can structure all the parsing components into a trait that we can mix into our server implementation, allowing for a cleaner design. We just need to require that the server lets us look up a RequestState by ID and provides a handleRequest function for us to call (from the onMessageComplete parser callback) when a request is fully parsed.

The basic interface looks like this:

LibUVService/parsing.scala
trait Parsing {
  import LibUV._, HttpParser._
  val requests: mutable.Map[Long, RequestState]

  def handleRequest(id: Long, handle: TCPHandle, request: RequestState): Unit

  type ConnectionState = CStruct3[Long, TCPHandle, Parser]

  val HTTP_REQUEST = 0
  val HTTP_RESPONSE = 1
  val HTTP_BOTH = 2

Now, we just need to implement the actual callbacks. The first callback to be executed for any request is the onURL callback. We’ll use that to initialize the RequestState with its method and URL. Since the parser callbacks are invoked from a libuv on_read callback, we can rely on the ConnectionState already being populated.

LibUVService/parsing.scala
  def onURL(p: Ptr[Parser], data: CString, len: Long): Int = {
    val state = (p._8).asInstanceOf[Ptr[ConnectionState]]
    val message_id = state._1
    val url = bytesToString(data, len)
    println(s"got url: $url")
    val m = p._6
    val method = fromCString(http_method_str(m))
    println(s"method: $method ($m)")
    requests(message_id) = RequestState(url, method)
    0
  }

The tricky part is header parsing. We can receive any number of headers, and the callbacks normally alternate between keys and values. However, http-parser can call onHeaderValue twice in a row in some circumstances, such as when a buffer boundary falls in the middle of a header line. As a result, we’ll need to keep track of whether we last saw a header key or value, and what it was, so we know how to update our Map of headers as we receive each component, like so:

LibUVService/parsing.scala
  def onHeaderKey(p: Ptr[Parser], data: CString, len: Long): Int = {
    val state = (p._8).asInstanceOf[Ptr[ConnectionState]]
    val message_id = state._1
    val request = requests(message_id)

    val k = bytesToString(data, len)
    request.lastHeader = k
    requests(message_id) = request
    0
  }

  def onHeaderValue(p: Ptr[Parser], data: CString, len: Long): Int = {
    val state = (p._8).asInstanceOf[Ptr[ConnectionState]]
    val message_id = state._1
    val request = requests(message_id)

    val v = bytesToString(data, len)
    request.headerMap(request.lastHeader) = v
    requests(message_id) = request
    0
  }
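Note that the callbacks above overwrite the stored key or value on every call, so a header that arrives in two fragments would keep only its last fragment. One way to handle split fields and values, following the approach described in the http-parser README, is to append while the same kind of callback repeats and to store the pair when a new key begins. The sketch below is plain Scala and is not part of the Parsing trait; HeaderAccumulator and its method names are hypothetical, and it assumes every header produces at least one value callback.

```scala
import scala.collection.mutable

// Hypothetical helper: rebuilds headers from fragments that may be split
// across several consecutive field or value callbacks.
final class HeaderAccumulator {
  val headers: mutable.Map[String, String] =
    mutable.LinkedHashMap.empty[String, String]
  private val field = new StringBuilder
  private val value = new StringBuilder
  private var lastWasValue = false

  // Call from the header-field callback with the decoded fragment.
  def onField(fragment: String): Unit = {
    if (lastWasValue) flush() // a value just ended, so a new header starts
    field.append(fragment)    // otherwise the previous name continues
    lastWasValue = false
  }

  // Call from the header-value callback; repeated calls extend the value.
  def onValue(fragment: String): Unit = {
    value.append(fragment)
    lastWasValue = true
  }

  // Call from the headers-complete callback to store the final pair.
  def onHeadersComplete(): Unit = {
    if (lastWasValue) flush()
    lastWasValue = false
  }

  private def flush(): Unit = {
    headers(field.toString) = value.toString
    field.clear()
    value.clear()
  }
}
```

Feeding it onField("Content-"), onField("Type"), onValue("text/"), onValue("plain"), and then onHeadersComplete() leaves a single entry mapping Content-Type to text/plain. In our design, one such accumulator could live alongside each RequestState in place of the lastHeader field.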

And likewise, if we have a POST, PUT, or other HTTP request with a body, we can append that content to our state like so:

LibUVService/parsing.scala
  def onBody(p: Ptr[Parser], data: CString, len: Long): Int = {
    val state = (p._8).asInstanceOf[Ptr[ConnectionState]]
    val message_id = state._1
    val request = requests(message_id)

    val b = bytesToString(data, len)
    request.body += b
    requests(message_id) = request
    0
  }

Finally, once the message is complete, we can finalize the request and pass it on to the actual server implementation via the handleRequest interface (which we’ll implement shortly):
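One caveat with appending decoded strings: if bytesToString decodes each chunk independently (an assumption, since its definition isn't shown here), a multi-byte UTF-8 character that straddles a chunk boundary would be decoded incorrectly. A possible alternative, sketched below in plain Scala with a hypothetical BodyBuffer helper, is to collect the raw bytes and decode once when the message is complete.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper: collects raw body bytes and decodes them only once.
final class BodyBuffer {
  private val bytes = new ByteArrayOutputStream()

  // Append one chunk, given as a slice of the buffer handed to the callback.
  def append(chunk: Array[Byte], offset: Int, len: Int): Unit =
    bytes.write(chunk, offset, len)

  // Decode everything received so far as UTF-8.
  def result(): String = new String(bytes.toByteArray, UTF_8)
}
```

For example, the two bytes that encode the character é can arrive in different chunks; appending both chunks to a BodyBuffer and calling result() at the end still yields the original text, whereas decoding each chunk separately would not.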

LibUVService/parsing.scala
  def onMessageComplete(p: Ptr[Parser]): Int = {
    val state = (p._8).asInstanceOf[Ptr[ConnectionState]]
    val message_id = state._1
    val tcpHandle = state._2
    val request = requests(message_id)
    handleRequest(message_id, tcpHandle, request)
    0
  }
}

@link("http_parser")
@extern
object HttpParser {
  type Parser = CStruct8[
    Long,      // private data
    Long,      // private data
    UShort,    // major version
    UShort,    // minor version
    UShort,    // status (response only)
    CChar,     // method
    CChar,     // Error (last bit upgrade)
    Ptr[Byte]  // user data
  ]

  type HttpCB = CFuncPtr1[Ptr[Parser], Int]
  type HttpDataCB = CFuncPtr3[Ptr[Parser], CString, Long, Int]

  type ParserSettings = CStruct8[
    HttpCB,      // on_message_begin
    HttpDataCB,  // on_url
    HttpDataCB,  // on_status
    HttpDataCB,  // on_header_field
    HttpDataCB,  // on_header_value
    HttpCB,      // on_headers_complete
    HttpDataCB,  // on_body
    HttpCB       // on_message_complete
  ]

  def http_parser_init(p: Ptr[Parser], parser_type: Int): Unit = extern
  def http_parser_settings_init(s: Ptr[ParserSettings]): Unit = extern
  def http_parser_execute(p: Ptr[Parser], s: Ptr[ParserSettings],
    data: Ptr[Byte], len: Long): Long = extern
  def http_method_str(method: CChar): CString = extern
}

That’s it! With the skills we’ve developed over the course of this book, we can now integrate an external HTTP parser in less code than it took us to build one ourselves.