Introducing libcurl

libcurl is a widely used and full-featured C library for file transfer. It’s been in active development for over twenty years, and it supports not just plain HTTP, but also HTTPS, FTP, SCP, IMAP, SMTP, Gopher, LDAP, and many other protocols. Experienced UNIX console hackers probably know the main command-line utility, curl, but you may have just as likely used it via a binding to its underlying C library from one of dozens of other languages.

Much like libuv, libcurl is designed for convenient use from other programming languages, and as you’ll see shortly, it even has hooks for integration with external event loops like libuv. In other words, it’s perfect for our use case. Even more than libuv, libcurl relies on a small number of functions to do all its work; one consequence, however, is that many of those functions have highly generic signatures and hundreds of possible options. Thankfully, libcurl is exceedingly well documented, and I’ll provide definitions for the constants as we go.

The libcurl Easy API

libcurl’s API has an elegant, layered design. Functions with easy in their names, like curl_easy_init, provide everything you need to make individual requests in a synchronous fashion, whereas the multi API calls build on top of that to coordinate multiple requests. This structure will allow us to focus on the basics before moving on to the more challenging concurrent use cases.

First, we have to initialize and configure an easy request handle, using the following functions and constants:

LibUVFutures/curl_sync/curl.scala
 type Curl = Ptr[Byte]
 type CurlOption = Int
 type CurlInfo = CInt
 
 @name("curl_global_init")
 def global_init(flags: Long): Unit = extern
 
 // Called by main at the end of this section
 @name("curl_global_cleanup")
 def global_cleanup(): Unit = extern
 
 @name("curl_easy_init")
 def easy_init(): Curl = extern
 
 // Called by getSync once a request has finished
 @name("curl_easy_cleanup")
 def easy_cleanup(handle: Curl): Unit = extern
 
 @name("curl_easy_setopt")
 def curl_easy_setopt(handle: Curl, option: CInt,
                      parameter: Ptr[Byte]): CInt = extern
 
 @name("curl_easy_getinfo")
 def easy_getinfo(handle: Curl, info: CInt,
                  parameter: Ptr[Byte]): CInt = extern
 
 @name("curl_easy_perform")
 def easy_perform(easy_handle: Curl): CInt = extern

libcurl itself needs to be initialized before making any requests, and its global_init() function can take several flags; however, we’ll only be using CURL_GLOBAL_ALL, which enables SSL and a few other useful options. Once libcurl is initialized, easy_init() creates and returns what libcurl calls an easy handle, which we’ll model as an opaque Ptr[Byte] aliased as Curl. Unlike our libuv handles, we aren’t permitted to write custom data directly into the Curl handle; instead, we’ll use easy_setopt (bound above as curl_easy_setopt) to attach custom data as well as all kinds of important request metadata.

easy_setopt takes a lot of different options. Here are the ones we’ll be using:

LibUVFutures/curl_sync/curl.scala
 val URL: CurlOption = 10002
 val PORT: CurlOption = 3 // a long-typed option, so no 10000 offset
 val USERPASSWORD: CurlOption = 10005
 
 val READDATA: CurlOption = 10009
 val HEADERDATA: CurlOption = 10029
 val WRITEDATA: CurlOption = 10001
 
 val READCALLBACK: CurlOption = 20012
 val HEADERCALLBACK: CurlOption = 20079
 val WRITECALLBACK: CurlOption = 20011
 
 val TIMEOUT: CurlOption = 13
 val GET: CurlOption = 80
 val POST: CurlOption = 47
 val PUT: CurlOption = 54
 // 0x300000 marks a double result: this is CURLINFO_CONTENT_LENGTH_DOWNLOAD
 val CONTENTLENGTHDOWNLOADT: CurlInfo = 0x300000 + 15
 val HTTPHEADER: CurlOption = 10023
 
 val PRIVATEDATA: CurlOption = 10103
 val GET_PRIVATEDATA: CurlInfo = 0x100000 + 21
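The numbers in that listing are not arbitrary. In curl.h, each option is a small id added to a base that says what kind of argument it takes (0 for a long, 10000 for a pointer to data, 20000 for a function pointer), and each info constant is built the same way from a type base. The following plain-Scala sketch rebuilds a few of the constants from that rule; the object and helper names are made up for illustration and are not part of the book's code. It also derives POSTFIELDS and CUSTOMREQUEST, two options used later in this section that the listing doesn't define.

```scala
object CurlOptionNumbers {
  // Type bases from curl.h: the base tells curl_easy_setopt
  // what kind of argument to expect for the option.
  val LongBase = 0              // CURLOPTTYPE_LONG
  val ObjectPointBase = 10000   // CURLOPTTYPE_OBJECTPOINT
  val FunctionPointBase = 20000 // CURLOPTTYPE_FUNCTIONPOINT

  // Info type bases used by curl_easy_getinfo.
  val InfoString = 0x100000     // CURLINFO_STRING
  val InfoDouble = 0x300000     // CURLINFO_DOUBLE

  // Hypothetical helpers: option number = type base + id.
  def longOpt(id: Int): Int = LongBase + id
  def objectOpt(id: Int): Int = ObjectPointBase + id
  def functionOpt(id: Int): Int = FunctionPointBase + id

  // Constants from the listing, rebuilt from base + id.
  val URL = objectOpt(2)
  val TIMEOUT = longOpt(13)
  val WRITECALLBACK = functionOpt(11)
  val GET_PRIVATEDATA = InfoString + 21

  // Options mentioned in the text but missing from the listing.
  val POSTFIELDS = objectOpt(15)    // CURLOPT_POSTFIELDS
  val CUSTOMREQUEST = objectOpt(36) // CURLOPT_CUSTOMREQUEST

  def main(args: Array[String]): Unit = {
    println(s"URL=$URL TIMEOUT=$TIMEOUT WRITECALLBACK=$WRITECALLBACK")
    println(s"POSTFIELDS=$POSTFIELDS CUSTOMREQUEST=$CUSTOMREQUEST")
  }
}
```

Knowing the rule makes it easier to spot a mistyped constant: an option that takes a plain number should be below 10000, and one that takes a callback should start at 20000.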

This can be a little overwhelming, but we can organize it if we work backward from the API we want to provide. First we need to set options for all the key fields on our HTTP Request object:

URL is the most straightforward—we just convert our URL to a CString and pass it in to easy_setopt. The HTTP method is relatively easy as well—GET is the default, and anything else we can set with CUSTOMREQUEST. Likewise, we can set our request body with POSTFIELDS and a CString. We’ll have to pay a little more attention to headers, though. Because libcurl has a special linked list struct for setting headers, we’ll need to convert our Scala Seq using two utility functions provided by libcurl:

LibUVFutures/curl_sync/curl.scala
 // Mirrors struct curl_slist { char *data; struct curl_slist *next; }
 type CurlSList = CStruct2[CString, Ptr[Byte]]
 
 @name("curl_slist_append")
 def slist_append(slist: Ptr[CurlSList], string: CString): Ptr[CurlSList] = extern
 
 @name("curl_slist_free_all")
 def slist_free_all(slist: Ptr[CurlSList]): Unit = extern

Fortunately, we don’t have to concern ourselves with the layout of the list, but we will have to work out a way to keep a pointer to the linked list around for us to free after the request completes.
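One possible way to handle that lifetime is to build the list, hand it to a block of code, and free it when the block returns. The sketch below assumes Scala Native 0.4.x and a linked libcurl; HeaderList, fromSeq, and withHeaders are hypothetical names, and the list is treated as an opaque Ptr[Byte] here because its fields are never read. The bindings are repeated only so the sketch stands alone.

```scala
import scala.scalanative.unsafe._

// Minimal bindings for the two list functions.
@link("curl")
@extern
object SListBinding {
  @name("curl_slist_append")
  def slist_append(slist: Ptr[Byte], string: CString): Ptr[Byte] = extern

  @name("curl_slist_free_all")
  def slist_free_all(slist: Ptr[Byte]): Unit = extern
}

object HeaderList {
  import SListBinding._

  // Hypothetical helper: builds a curl_slist from "Name: value" strings.
  // An empty Seq yields null, which libcurl reads as "no list".
  def fromSeq(headers: Seq[String]): Ptr[Byte] = Zone { implicit z =>
    headers.foldLeft(null: Ptr[Byte]) { (list, header) =>
      // curl_slist_append copies the string, so zone memory is safe here
      slist_append(list, toCString(header))
    }
  }

  // Hypothetical helper: runs body with the list, then always frees it.
  def withHeaders[A](headers: Seq[String])(body: Ptr[Byte] => A): A = {
    val list = fromSeq(headers)
    try body(list)
    finally if (list != null) slist_free_all(list)
  }
}
```

Inside body, the pointer could be passed to curl_easy_setopt with the HTTPHEADER option. libcurl does not copy the list, so the request has to finish before withHeaders returns. The sketch ignores the case where curl_slist_append fails and returns null partway through.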

We’re almost ready to tie together all of the request configuration, but if we want to actually use the data, we need to figure out the callbacks. Even if we’re running libcurl in its easy, blocking mode, the only way we can get access to the results of our request is if we provide callbacks, as well as design and provide data structures for storage. If we were to model this as a C-level struct, we would need all of the following:

For now, however, we can instead define a simpler Scala-style case class like so:

LibUVFutures/curl_sync/curl.scala
 case class ResponseState(
   var code: Int = 200,
   var headers: mutable.Map[String, String] = mutable.Map(),
   var body: String = ""
 )

We can retrieve it by request serial number as needed. With that, we’re ready to implement two callbacks—one for when we receive a line of header data and one for when we receive some chunk of body data:

LibUVFutures/curl_sync/main.scala
 // Anchored on "HTTP/" so a header such as Date isn't read as a status line;
 // the lazy groups and \r? keep the trailing carriage return out of the values
 val statusLine = raw"HTTP/\S+ (\d+) ?(.*?)\r?\n".r
 val headerLine = raw"([^:]+): (.*?)\r?\n".r
 val headerCB = new CurlDataCallback {
   def apply(ptr: Ptr[Byte], size: CSize, nmemb: CSize,
             data: Ptr[Byte]): CSize = {
     // data is the HEADERDATA pointer, which holds our request serial
     val serial = !(data.asInstanceOf[Ptr[Long]])
     val byteSize = size * nmemb
     val headerString = bufferToString(ptr, size, nmemb)
     headerString match {
       case statusLine(code, description) =>
         println(s"status code: $code $description")
         responses(serial).code = code.toInt
       case headerLine(k, v) =>
         responses(serial).headers(k) = v
       case _ => // the blank line that ends the headers, or anything unparseable
     }
     // echo the raw header line to stdout
     fwrite(ptr, size, nmemb, stdout)
     // libcurl treats any other return value as an error
     byteSize
   }
 }

LibUVFutures/curl_sync/main.scala
 val writeCB = new CurlDataCallback {
   def apply(ptr: Ptr[Byte], size: CSize, nmemb: CSize,
             data: Ptr[Byte]): CSize = {
     // data is the WRITEDATA pointer, which holds our request serial
     val serial = !(data.asInstanceOf[Ptr[Long]])
     val strData = bufferToString(ptr, size, nmemb)
     // append this chunk to the body accumulated so far
     val resp = responses(serial)
     resp.body = resp.body + strData
     size * nmemb
   }
 }
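The parsing inside the header callback can be tried out without libcurl at all. Here is a minimal plain-Scala sketch; applyHeaderLine is a hypothetical helper, and the regular expressions are a variant that is anchored on HTTP/ (so a header such as Date is not mistaken for a status line) and tolerant of both \r\n and \n line endings. ResponseState is copied from the text so the sketch stands alone.

```scala
import scala.collection.mutable

object HeaderParsing {
  // Same shape as the ResponseState case class in the text.
  case class ResponseState(
    var code: Int = 200,
    var headers: mutable.Map[String, String] = mutable.Map(),
    var body: String = ""
  )

  // Lazy groups plus \r? keep the carriage return out of the captures.
  val statusLine = raw"HTTP/\S+ (\d+) ?(.*?)\r?\n".r
  val headerLine = raw"([^:]+): (.*?)\r?\n".r

  // Hypothetical helper: applies one raw header line to the state.
  def applyHeaderLine(resp: ResponseState, line: String): Unit =
    line match {
      case statusLine(code, _) => resp.code = code.toInt
      case headerLine(k, v)    => resp.headers(k) = v
      case _                   => () // blank separator line, ignore
    }

  def main(args: Array[String]): Unit = {
    val resp = ResponseState()
    Seq(
      "HTTP/1.1 404 Not Found\r\n",
      "Content-Type: text/html\r\n",
      "\r\n"
    ).foreach(applyHeaderLine(resp, _))
    println(resp)
  }
}
```

Keeping this logic in an ordinary function also means it can be unit tested separately from the pointer handling in the callback.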

A Synchronous curl API

We’re now ready to start putting the pieces together. We can write a getSync function that will initialize a request and set up all of its options, headers, and callbacks. We’ll want to take extra care to create a unique serial number for our request and register it with libcurl so that our callbacks receive it:

LibUVFutures/curl_sync/main.scala
 var request_serial = 0L
 val responses = HashMap[Long, ResponseState]()
 
 // Note: the headers argument is accepted but not yet applied to the request
 def getSync(url: String, headers: Seq[String] = Seq.empty): ResponseState = {
   // store the serial off-heap so libcurl can hand it back to our callbacks
   request_serial += 1
   val req_id_ptr = malloc(sizeof[Long]).asInstanceOf[Ptr[Long]]
   !req_id_ptr = request_serial
   responses(request_serial) = ResponseState()
   val curl = easy_init()
 
   Zone { implicit z =>
     val url_str = toCString(url)
     // prints libcurl's result code; 0 means the option was accepted
     println(curl_easy_setopt(curl, URL, url_str))
   }
   curl_easy_setopt(curl, WRITECALLBACK, Curl.func_to_ptr(writeCB))
   curl_easy_setopt(curl, WRITEDATA, req_id_ptr.asInstanceOf[Ptr[Byte]])
   curl_easy_setopt(curl, HEADERCALLBACK, Curl.func_to_ptr(headerCB))
   curl_easy_setopt(curl, HEADERDATA, req_id_ptr.asInstanceOf[Ptr[Byte]])
   val res = easy_perform(curl) // 0 means the transfer succeeded
   easy_cleanup(curl)
   // the callbacks are done, so release the serial (free is libc's, like malloc)
   free(req_id_ptr.asInstanceOf[Ptr[Byte]])
   responses(request_serial)
 }
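The value returned by easy_perform is a libcurl result code, where 0 means success, and getSync above doesn't inspect it. As a minimal sketch of one way to surface failures, the plain-Scala helper below turns a result code into an Either. CurlResult and check are hypothetical names, and only a handful of the codes from curl.h are listed.

```scala
object CurlResult {
  // A few CURLcode values from curl.h; anything else is reported by number.
  private val known = Map(
    1  -> "unsupported protocol",   // CURLE_UNSUPPORTED_PROTOCOL
    3  -> "malformed URL",          // CURLE_URL_MALFORMAT
    6  -> "could not resolve host", // CURLE_COULDNT_RESOLVE_HOST
    7  -> "could not connect",      // CURLE_COULDNT_CONNECT
    28 -> "operation timed out"     // CURLE_OPERATION_TIMEDOUT
  )

  // Hypothetical helper: 0 (CURLE_OK) becomes Right, everything else Left.
  def check(code: Int): Either[String, Unit] =
    if (code == 0) Right(())
    else Left(known.getOrElse(code, s"libcurl error $code"))

  def main(args: Array[String]): Unit = {
    println(check(0))
    println(check(6))
  }
}
```

libcurl also provides curl_easy_strerror, which returns its own message for a result code; binding that function would avoid maintaining a table like the one above.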

This functionality is, by itself, enough to write many useful programs. For example, we can write a utility that takes a URL from the command line and fetches it, like so:

LibUVFutures/curl_sync/main.scala
 def main(args: Array[String]): Unit = {
   println("initializing")
   // 3 is CURL_GLOBAL_ALL: CURL_GLOBAL_SSL (1) | CURL_GLOBAL_WIN32 (2)
   global_init(3)
   // the URL to fetch is the first command-line argument
   val resp = getSync(args(0))
   println(s"done. got response: $resp")
   println("global cleanup...")
   global_cleanup()
   println("done")
 }

And we can run it, like so:

 $ ./target/scala-2.11/curl_sync-out
 initializing
 0
 status code: 200 OK
 HTTP/1.1 200 OK
 Accept-Ranges: bytes
 ...
 <html>
 <head>
  <title>Example Domain</title>
 
  <meta charset="utf-8" />
  <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <style type="text/css">
  body {
  background-color: #f0f0f2;
 ...
 </body>
 </html>

That sure looks like a web page! Now that we’ve looked at the fundamentals of libcurl, we’re ready to adapt it to our libuv-backed ExecutionContext and see how much faster it can go.