libcurl is a widely used and full-featured C library for file transfer. It’s been in active development for over twenty years, and it supports not just plain HTTP, but also HTTPS, FTP, SCP, IMAP, SMTP, Gopher, LDAP, and many other protocols. Experienced UNIX console hackers probably know the main command-line utility, curl, but you are just as likely to have used the underlying C library through a binding from one of dozens of other languages.
Much like libuv, libcurl is designed for convenient use from other programming languages, and as you’ll see shortly, it even has hooks for integration with external event loops like libuv. In other words, it’s perfect for our use case. Even more than libuv, libcurl relies on a small number of functions to do all its work; one consequence, however, is that many of those functions have highly generic signatures and hundreds of possible options. Thankfully, libcurl is exceedingly well documented, and I’ll provide definitions for the constants as we go.
libcurl’s API has an elegant, layered design. Functions with easy in their names, like curl_easy_init, provide everything you need to make individual requests in a synchronous fashion, whereas the multi API calls build on top of that to coordinate multiple requests. This structure will allow us to focus on the basics before moving on to the more challenging concurrent use cases.
First, we have to initialize and configure an easy request handle, using the following functions and constants:
| type Curl = Ptr[Byte] |
| type CurlOption = Int |
| type CurlInfo = CInt |
| |
| @name("curl_global_init") |
| def global_init(flags:Long):Unit = extern |
| |
| @name("curl_easy_init") |
| def easy_init():Curl = extern |
| |
| @name("curl_easy_setopt") |
| def curl_easy_setopt(handle: Curl, option: CInt, |
| parameter: Ptr[Byte]): CInt = extern |
| |
| @name("curl_easy_getinfo") |
| def easy_getinfo(handle: Curl, info: CInt, |
| parameter: Ptr[Byte]): CInt = extern |
| |
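| @name("curl_easy_perform") |
| def easy_perform(easy_handle: Curl): CInt = extern |
| |
| @name("curl_easy_cleanup") |
| def easy_cleanup(handle: Curl): Unit = extern |
| |
| @name("curl_global_cleanup") |
| def global_cleanup(): Unit = extern |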
libcurl itself needs to be initialized before making any requests, and its global_init() function can take several options; however, we’ll only be using CURL_GLOBAL_ALL, which enables SSL and a few other useful options. Once libcurl is initialized, easy_init() creates and returns what libcurl calls an easy handle, which we’ll model as an opaque Ptr[Byte] aliased as Curl. Unlike our libuv handles, we aren’t permitted to stash custom data inside of the Curl handle; instead, we’ll use easy_setopt (bound above as curl_easy_setopt) to attach custom data as well as all kinds of important request metadata.
easy_setopt takes a lot of different options. Here are the ones we’ll be using:
| val URL:CurlOption = 10002 |
| val PORT:CurlOption = 3 |
| val USERPASSWORD:CurlOption = 10005 |
| |
| val READDATA:CurlOption = 10009 |
| val HEADERDATA:CurlOption = 10029 |
| val WRITEDATA:CurlOption = 10001 |
| |
| val READCALLBACK:CurlOption = 20012 |
| val HEADERCALLBACK:CurlOption = 20079 |
| val WRITECALLBACK:CurlOption = 20011 |
| |
| val TIMEOUT:CurlOption = 13 |
| val GET:CurlOption = 80 |
| val POST:CurlOption = 47 |
| val PUT:CurlOption = 54 |
| val CONTENTLENGTHDOWNLOADT:CurlInfo = 0x300000 + 15 |
| val HTTPHEADER:CurlOption = 10023 |
| |
| val PRIVATEDATA:CurlOption = 10103 |
| val GET_PRIVATEDATA:CurlInfo = 0x100000 + 21 |
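The numeric values above are not arbitrary: in curl.h, each option is a small ID added to a base that encodes the type of parameter it expects, and each info key is built the same way from a different set of bases. The following is a minimal sketch of that scheme in plain Scala. The object name CurlNumbering and the helper compose are hypothetical names used only for illustration; the base values and IDs are the ones curl.h uses.

```scala
object CurlNumbering {
  // Type bases for easy_setopt options (CURLOPTTYPE_* in curl.h).
  val OptLong          = 0      // parameter is a long
  val OptObjectPoint   = 10000  // parameter is a data pointer (strings, lists, user data)
  val OptFunctionPoint = 20000  // parameter is a function pointer
  val OptOffT          = 30000  // parameter is a curl_off_t

  // Type bases for easy_getinfo keys (CURLINFO_* in curl.h).
  val InfoString = 0x100000
  val InfoLong   = 0x200000
  val InfoDouble = 0x300000

  // Hypothetical helper, not part of libcurl: combine a type base with an ID.
  def compose(base: Int, id: Int): Int = base + id

  // A few of the constants from the listing, rebuilt from their parts.
  val URL             = compose(OptObjectPoint, 2)
  val WRITECALLBACK   = compose(OptFunctionPoint, 11)
  val TIMEOUT         = compose(OptLong, 13)
  val GET_PRIVATEDATA = compose(InfoString, 21)
}
```

Read this way, a constant also hints at what to pass: values in the 10000 range expect a data pointer, values in the 20000 range expect a function pointer, and small values expect a long.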
This can be a little overwhelming, but we can organize it if we work backward from the API we want to provide. First we need to set options for all the key fields on our HTTP Request object:
URL is the most straightforward—we just convert our URL to a CString and pass it in to easy_setopt. The HTTP method is relatively easy as well—GET is the default, and anything else we can set with CUSTOMREQUEST. Likewise, we can set our request body with POSTFIELDS and a CString. We’ll have to pay a little more attention to headers, though. Because libcurl has a special linked list struct for setting headers, we’ll need to convert our Scala Seq using two utility functions provided by libcurl:
| type CurlSList = CStruct2[Ptr[Byte],CString] |
| |
| @name("curl_slist_append") |
| def slist_append(slist:Ptr[CurlSList], string:CString):Ptr[CurlSList] = extern |
| |
| @name("curl_slist_free_all") |
| def slist_free_all(slist:Ptr[CurlSList]):Unit = extern |
Fortunately, we don’t have to concern ourselves with the layout of the list, but we will have to work out a way to keep a pointer to the linked list around for us to free after the request completes.
We’re almost ready to tie together all of the request configuration, but if we want to actually use the data, we need to figure out the callbacks. Even if we’re running libcurl in its easy, blocking mode, the only way we can get access to the results of our request is if we provide callbacks, as well as design and provide data structures for storage. If we were to model this as a C-level struct, we would need all of the following:
For now, however, we can instead define a simpler Scala-style case class like so:
| case class ResponseState( |
| var code:Int = 200, |
| var headers:mutable.Map[String,String] = mutable.Map(), |
| var body:String = "" |
| ) |
We can retrieve it by request serial number as needed. With that, we’re ready to implement two callbacks—one for when we receive a line of header data and one for when we receive some chunk of body data:
| val statusLine = raw".+? (\d+) (.+)\n".r |
| val headerLine = raw"([^:]+): (.*)\n".r |
| val headerCB = new CurlDataCallback { |
| def apply(ptr: Ptr[Byte], size: CSize, nmemb: CSize, |
| data: Ptr[Byte]): CSize = { |
| val serial = !(data.asInstanceOf[Ptr[Long]]) |
| val byteSize = size * nmemb |
| val headerString = bufferToString(ptr,size,nmemb) |
| headerString match { |
| case statusLine(code, description) => |
| println(s"status code: $code $description") |
| responses(serial).code = code.toInt |
| case headerLine(k, v) => |
| val resp = responses(serial) |
| resp.headers(k) = v |
| responses(serial) = resp |
| case l => |
| } |
| fwrite(ptr, size, nmemb, stdout) |
| return byteSize |
| } |
| } |
| val writeCB = new CurlDataCallback { |
| def apply(ptr: Ptr[Byte], size: CSize, nmemb: CSize, |
| data: Ptr[Byte]): CSize = { |
| val serial = !(data.asInstanceOf[Ptr[Long]]) |
| val strData = bufferToString(ptr,size,nmemb) |
| |
| val resp = responses(serial) |
| resp.body = resp.body + strData |
| responses(serial) = resp |
| |
| return size * nmemb |
| } |
| } |
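The header callback mixes pointer handling with line parsing, and the parsing part can be tried out on its own. The sketch below is plain Scala with no libcurl dependency and is not the callback itself. The names HeaderParsing, ParsedResponse, and record are hypothetical, and it makes two assumptions that go slightly beyond the listing above: header lines arrive terminated by a carriage return and newline, and the status line always starts with HTTP/.

```scala
import scala.collection.mutable

object HeaderParsing {
  // Hypothetical stand-in for the parts of ResponseState that headers touch.
  final case class ParsedResponse(
    var code: Int = 200,
    headers: mutable.Map[String, String] = mutable.Map.empty
  )

  // Anchoring on "HTTP/" keeps a header such as "Content-Length: 200"
  // from being mistaken for a status line.
  private val StatusLine = raw"HTTP/(\S+) (\d{3})(?: (.*))?".r
  private val HeaderLine = raw"([^:]+):\s*(.*)".r

  // Record one raw header line, as a header callback would receive it.
  def record(line: String, into: ParsedResponse): Unit =
    line.trim match { // trim drops the trailing line terminator
      case StatusLine(_, code, _)  => into.code = code.toInt
      case HeaderLine(name, value) => into.headers(name) = value.trim
      case _                       => () // blank line that ends the header block
    }
}
```

Keeping the parsing in a pure function like this means the native callback only has to turn its buffer into a String and look up the right response.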
We’re now ready to start putting the pieces together. We can write a getSync function that will initialize a request and set up all of its options, headers, and callbacks. We’ll want to take extra care to create a unique serial number for our request and register it with libcurl so that our callbacks receive it:
| var request_serial = 0L |
| val responses = HashMap[Long,ResponseState]() |
| |
| def getSync(url:String, headers:Seq[String] = Seq.empty):ResponseState = { |
| val req_id_ptr = malloc(sizeof[Long]).asInstanceOf[Ptr[Long]] |
| !req_id_ptr = 1 + request_serial |
| request_serial += 1 |
| responses(request_serial) = ResponseState() |
| val curl = easy_init() |
| |
| Zone { implicit z => |
| val url_str = toCString(url) |
| println(curl_easy_setopt(curl, URL, url_str)) |
| } |
| curl_easy_setopt(curl, WRITECALLBACK, Curl.func_to_ptr(writeCB)) |
| curl_easy_setopt(curl, WRITEDATA, req_id_ptr.asInstanceOf[Ptr[Byte]]) |
| curl_easy_setopt(curl, HEADERCALLBACK, Curl.func_to_ptr(headerCB)) |
| curl_easy_setopt(curl, HEADERDATA, req_id_ptr.asInstanceOf[Ptr[Byte]]) |
| val res = easy_perform(curl) |
| easy_cleanup(curl) |
| free(req_id_ptr.asInstanceOf[Ptr[Byte]]) |
| return responses(request_serial) |
| } |
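One thing getSync does not do is remove the finished entry from responses, so the map grows with every request. A minimal sketch of the serial-number bookkeeping, with an explicit completion step, might look like the following. It is plain Scala, the class name RequestRegistry and its methods are hypothetical, and it covers only the bookkeeping, not the libcurl calls.

```scala
import scala.collection.mutable

// Hypothetical helper: hands out serial numbers and tracks in-flight state.
final class RequestRegistry[A] {
  private var lastSerial = 0L
  private val pending = mutable.HashMap.empty[Long, A]

  // Store the initial state and return the serial number to give to libcurl.
  def register(initial: A): Long = {
    lastSerial += 1
    pending(lastSerial) = initial
    lastSerial
  }

  // Look up the state for a serial number, as a callback would.
  def get(serial: Long): Option[A] = pending.get(serial)

  // Remove and return the state once the request has finished.
  def complete(serial: Long): Option[A] = pending.remove(serial)

  def size: Int = pending.size
}
```

With something like this in place, getSync could call register before easy_perform and complete afterward, returning the removed state instead of leaving it in the map.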
This functionality is, by itself, enough to write many useful programs. For example, we can write a utility that takes a URL from the command line and fetches it, like so:
| def main(args:Array[String]):Unit = { |
| println("initializing") |
| global_init(1) |
| val resp = getSync(args(0)) |
| println(s"done. got response: $resp") |
| println("global cleanup...") |
| global_cleanup() |
| println("done") |
| } |
And we can run it, like so:
| $ ./target/scala-2.11/curl_sync-out https://www.example.com |
| initializing |
| 0 |
| status code: 200 OK |
| HTTP/1.1 200 OK |
| Accept-Ranges: bytes |
| ... |
| <html> |
| <head> |
| <title>Example Domain</title> |
| |
| <meta charset="utf-8" /> |
| <meta http-equiv="Content-type" content="text/html; charset=utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <style type="text/css"> |
| body { |
| background-color: #f0f0f2; |
| ... |
| </body> |
| </html> |
That sure looks like a web page! Now that we’ve looked at the fundamentals of libcurl, we’re ready to adapt it to our libuv-backed ExecutionContext and see how much faster it can go.