Chapter 9. Web Workers

When your web application requires heavy lifting or background processing on the JavaScript side, the Web Workers API is your answer.

The Web Workers interface spawns real OS-level threads and lets you pass data back and forth between any given threads (or workers). Because the communication points between threads are carefully controlled, concurrency problems are rare. You cannot access the DOM or other components that are not thread safe, and you have to pass specific data in and out of a thread through serialized objects, so you have to work extremely hard to cause problems in your code. Regardless of how you plan to use Web Workers in your application, the main idea behind processing data behind the scenes is to create multiple workers (or threads) in the browser.

As of this writing, Safari, Safari for iOS5, Chrome, Opera, and Mozilla Firefox support the Web Workers API, but Internet Explorer does not. (Internet Explorer 10 did add support for Web Workers in Platform Preview 2.) Android versions 2.0 and 2.1 support Web Workers as well, but later versions of Android do not. The only shim currently available for Web Workers makes use of Google Gears. If the core Web Workers API is not supported on a device or browser, you can detect whether Google Gears is installed. For more details, see http://html5-shims.googlecode.com/svn/trunk/demo/workers.html.
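Before reaching for a shim, it helps to test for native support. The following is a minimal sketch, assuming the common feature-detection idiom of checking for the Worker constructor; supportsWebWorkers is a hypothetical helper name, not part of any API:

```javascript
// Hypothetical helper: reports whether the given global object exposes Worker.
function supportsWebWorkers(globalScope) {
  return !!globalScope && typeof globalScope.Worker !== 'undefined';
}

// In a page, pass the window object. Outside a browser this block is skipped.
if (typeof window !== 'undefined') {
  if (supportsWebWorkers(window)) {
    // Safe to call new Worker('worker.js') here.
  } else {
    // Fall back to a shim, or run the work on the main thread.
  }
}
```

Passing the global object in as a parameter keeps the check easy to exercise with a stand-in object.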

With Web Workers and their multithreaded approach, you do not have access to the DOM (which is not thread safe) or to the window, document, or parent objects. You do, however, have access to quite a few other features and objects, starting with the navigator object:

appCodeName    //the code name of the browser
appName    //the name of the browser
appVersion    //the version information of the browser
cookieEnabled    //whether cookies are enabled in the browser
platform    //the platform for which the browser is compiled
userAgent    //the user-agent header sent by the browser to the server

Although you can access the location object, it is read only:

hash    //the anchor portion of a URL
host    //the hostname and port of a URL
hostname    //the hostname of a URL
href    //the entire URL
pathname    //the path name of a URL
port    //the port number the server uses for a URL
protocol    //the protocol of a URL
search    //the query portion of a URL
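As a rough sketch of how a worker might read these values, the following collects a few navigator and location properties and posts them back to the page. The helper name describeScope and the shape of the returned object are assumptions made for illustration:

```javascript
// Hypothetical helper: collects a few read-only values from a worker's scope.
function describeScope(scope) {
  return {
    userAgent: scope.navigator.userAgent,
    platform: scope.navigator.platform,
    href: scope.location.href,
    hostname: scope.location.hostname
  };
}

// Inside a worker, `self` is the worker's global scope. importScripts exists
// only in workers, so this block is skipped anywhere else.
if (typeof self !== 'undefined' && typeof self.importScripts === 'function') {
  self.postMessage(describeScope(self));
}
```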

You can use XMLHttpRequest to make AJAX calls within a worker, and you can import external scripts using the importScripts() method, as long as they’re in the same domain. You can also schedule and cancel work with setTimeout(), clearTimeout(), setInterval(), and clearInterval(). Finally, you can access the Application cache and spawn other workers.

Creating a worker is quite easy; you need only a JavaScript file’s URL. The Worker() constructor is invoked with the URL to that file as its only argument:

var worker = new Worker('worker.js');

Note

Worker scripts must be external files with the same scheme as their calling page. Thus, you cannot load a script from a data URL and an HTTPS page cannot start worker scripts that begin with HTTP URLs.

The worker script is fetched and run as soon as the Worker object is constructed, but its message handler does nothing until you call postMessage(), for example by sending some object data to the worker:

worker.postMessage({'haz':'foo'}); // Send data to the worker.

Next, add an event listener to receive the data the worker returns:

worker.addEventListener('message', function(e) {
  console.log('returned data from worker', e.data);
}, false);

In the actual worker.js file, you could have something simple like:

self.addEventListener('message', function(e) {
  var data = e.data;
  //Manipulate data and send back to parent
  self.postMessage(data.haz); //posts 'foo' to parent DOM
}, false);

The previous example simply relays serialized data from the parent page to the spawned worker instance, and back again.
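The timer functions mentioned earlier combine naturally with postMessage(). As a rough sketch, a worker could report progress on an interval and stop after a fixed number of ticks. The function name startTicker and the { tick: n } message shape are assumptions made for illustration:

```javascript
// Hypothetical helper: posts { tick: n } every intervalMs milliseconds and
// cancels its own interval after maxTicks messages.
function startTicker(scope, intervalMs, maxTicks) {
  var ticks = 0;
  var id = scope.setInterval(function () {
    ticks += 1;
    scope.postMessage({ tick: ticks });
    if (ticks >= maxTicks) {
      scope.clearInterval(id);
    }
  }, intervalMs);
  return id;
}

// In worker.js you might call: startTicker(self, 1000, 5);
```

The parent page would receive each tick through the same message listener shown earlier.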

In newer browsers (like Chrome), you can take your data types a step further and pass binary data between workers. With transferable objects, data is transferred from one context to another. It is zero-copy, which vastly improves the performance of sending data to a worker.

When you transfer an ArrayBuffer from your main app to a worker, the original ArrayBuffer is cleared and is no longer usable in the sending context. Its contents are transferred to the worker context.

Chrome version 8 and above also includes a new version of postMessage() that supports transferable objects:

var uInt8Array = new Uint8Array(new ArrayBuffer(10));
for (var i = 0; i < uInt8Array.length; ++i) {
  uInt8Array[i] = i * 2; // [0, 2, 4, 6, 8,...]
}

worker.webkitPostMessage(uInt8Array.buffer, [uInt8Array.buffer]);
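To see the "cleared" behavior for yourself, you can inspect the buffer's length before and after a transfer. The following is a minimal sketch, assuming an environment that provides MessageChannel (modern browsers and recent Node.js releases); a message port stands in for the worker here so the example runs without a separate script file, and its postMessage() takes the same message and transfer-list arguments. The helper name transferAndMeasure is hypothetical:

```javascript
// Hypothetical helper: transfers a buffer through a message port and reports
// the sender-side byte length before and after the transfer.
function transferAndMeasure(byteLength) {
  var channel = new MessageChannel();
  var view = new Uint8Array(new ArrayBuffer(byteLength));
  var before = view.buffer.byteLength;

  // The second argument lists the objects to transfer instead of copy.
  channel.port1.postMessage(view.buffer, [view.buffer]);

  // The sender's buffer is now detached, so its length reads as 0.
  var after = view.buffer.byteLength;

  channel.port1.close();
  channel.port2.close();
  return { before: before, after: after };
}
```

Because ownership moves to the receiver, any code that still needs the data on the sending side should copy it before transferring.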

Figure 9-1 shows how much faster data can travel between threads using transferable objects. For example, 32MB of data makes a round trip from the worker back to the parent in 2ms. Using previous methods, such as structured cloning, took upward of 300ms to copy the data between threads. To try this test for yourself, visit http://html5-demos.appspot.com/static/workers/transferables/index.html.

Figure 9-1. Using Web Workers with transferable objects

A Practical Use Case: Pooling and Parallelizing Jobs

The following example, originally inspired by Jos Dirksen’s thread pool example, gives you a way to specify the number of concurrent workers (or threads). With this method, browsers like Chrome can use multiple CPU cores when processing data concurrently, and you can speed up rendering by up to 300%. You can view the full demo at http://html5e.org/example/workers, but the basic worker1.js file contains:

self.onmessage = function(event) {

    var myobj = event.data;

    search: while (myobj.foo < 200) {
        myobj.foo += 1;
        for (var i = 2; i <= Math.sqrt(myobj.foo); i += 1)
            if (myobj.foo % i == 0)
                continue search;
        // found a prime!
        self.postMessage(myobj);
    }

    // close this worker
    self.close();
};

The above code simply spits out prime numbers and ends at 200. You could set the while loop to while(true) for endless output of prime numbers, but this is a simple example to demonstrate how you can process data in chunks and parallelize the code to reach a common goal with multiple worker threads.
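To make the idea of processing in chunks more concrete, here is a rough sketch that uses the same trial-division test as a plain function and splits a range into pieces that could each be handed to a separate worker. The helper names primesBetween and splitRange are assumptions for illustration and are not part of the demo code:

```javascript
// Hypothetical helper: returns the primes n with lo < n <= hi, using the same
// trial-division test as the worker listing.
function primesBetween(lo, hi) {
  var found = [];
  // Start no lower than 2, the smallest prime.
  search: for (var n = Math.max(lo + 1, 2); n <= hi; n += 1) {
    for (var i = 2; i <= Math.sqrt(n); i += 1) {
      if (n % i === 0) {
        continue search; // n has a divisor, so it is not prime
      }
    }
    found.push(n);
  }
  return found;
}

// Hypothetical helper: splits the range (lo, hi] into contiguous chunks,
// one per worker.
function splitRange(lo, hi, parts) {
  var chunks = [];
  var step = Math.ceil((hi - lo) / parts);
  for (var start = lo; start < hi; start += step) {
    chunks.push({ lo: start, hi: Math.min(start + step, hi) });
  }
  return chunks;
}
```

For example, splitRange(0, 200, 4) yields four chunks of 50 numbers each, and running primesBetween over every chunk finds the same primes as a single pass over the whole range.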

From your main index.html (the place you want all the data to be displayed), initialize your thread pool and give the workers a callback:

slidfast({
     workers: {script:'worker1.js', threads:9, mycallback:workerCallback}
});

Note

To view the source for this technique, visit https://github.com/html5e/slidfast/blob/master/example/workers/index.html.

When the workers parameter initializes, the following code creates the thread pool and begins each task concurrently:

function Pool(size) {
  var _this = this;

  // set some defaults
  this.taskQueue = [];
  this.workerQueue = [];
  this.poolSize = size;

  this.addWorkerTask = function (workerTask) {
      if (_this.workerQueue.length > 0) {
          // get the worker from the front of the queue
          var workerThread = _this.workerQueue.shift();
          //get an index for tracking
          slidfast.worker.obj().index = _this.workerQueue.length;
          workerThread.run(workerTask);
      } else {
          // no free workers,
          _this.taskQueue.push(workerTask);
      }
  };

  this.init = function () {
      // create 'size' number of worker threads
      for (var i = 0; i < size; i++) {
          _this.workerQueue.push(new WorkerThread(_this));
      }
  };

  this.freeWorkerThread = function (workerThread) {
      if (_this.taskQueue.length > 0) {
          // don't put back in queue, but execute next task
          var workerTask = _this.taskQueue.shift();
          workerThread.run(workerTask);
      } else {
          // no queued tasks, so return the worker to the idle queue
          _this.workerQueue.push(workerThread);
      }
  };
}

// runs work tasks in the pool
function WorkerThread(parentPool) {

  var _this = this;

  this.parentPool = parentPool;
  this.workerTask = {};

  this.run = function (workerTask) {
      this.workerTask = workerTask;
      // create a new web worker
      if (this.workerTask.script !== null) {
          var worker = new Worker(workerTask.script);
          worker.addEventListener('message', function (event) {
              // invoke the callback supplied with the task
              _this.workerTask.callback(event);
              _this.parentPool.freeWorkerThread(_this);
          }, false);
          worker.postMessage(slidfast.worker.obj());
      }
  };

}

function WorkerTask(script, callback, msg) {
  this.script = script;
  this.callback = callback;
  console.log(msg);
  this.obj = msg;
}

var pool = new Pool(workers.threads);
pool.init();
var workerTask = new WorkerTask(workers.script,
                                  mycallback,
                                  slidfast.worker.obj());

After initializing the worker threads, add the actual workerTasks to process the data:

 pool.addWorkerTask(workerTask);
 slidfast.worker.obj().foo = 10;
 pool.addWorkerTask(workerTask);
 slidfast.worker.obj().foo = 20;
 pool.addWorkerTask(workerTask);
 slidfast.worker.obj().foo = 30;
 pool.addWorkerTask(workerTask);
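The queueing behavior of the pool can be easier to follow without real workers involved. The following is a simplified model, not the slidfast implementation: createTaskPool and startTask are hypothetical names, and startTask stands in for creating a Worker and waiting for it to finish:

```javascript
// Simplified model of the pool's queueing. `startTask` receives a task and a
// `done` callback that it must call when the task has finished.
function createTaskPool(size, startTask) {
  var idle = size;   // number of free worker slots
  var waiting = [];  // tasks queued because no slot was free

  function launch(task) {
    idle -= 1;
    startTask(task, function done() {
      idle += 1;
      if (waiting.length > 0) {
        launch(waiting.shift()); // reuse the freed slot immediately
      }
    });
  }

  return {
    add: function (task) {
      if (idle > 0) {
        launch(task);
      } else {
        waiting.push(task);
      }
    },
    idleCount: function () { return idle; },
    waitingCount: function () { return waiting.length; }
  };
}
```

With a pool of two slots and three tasks, the first two tasks start right away and the third waits until one of them reports that it is done.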

As you can see in Figure 9-2, each thread brings data back to the main page and renders it with the supplied callback. The thread order varies on each refresh and there is no guarantee on how the browser will process the data. To see a demo, visit http://html5e.org/example/workers. Use the latest version of Chrome or another browser that supports actual CPU core usage per web worker.

Figure 9-2. Data being returned by multiple Web Worker threads in parallel

Other Uses

Crunching prime numbers may not be the best real-world example of using thread pooling, but you can use the same technique for processing image data. For more information, see http://www.smartjava.org/examples/webworkers2 and Figure 9-3.

Figure 9-3. Example of Web Worker threads processing image data

Web Workers could be put into action within your app for additional scenarios as well. For example, you could parse wiki text as the user types, and then generate the HTML. You can find an example of this at http://www.cach.me/blog/2011/01/javascript-web-workers-tutorial-parse-wiki-text-in-real-time. Or, you could use them for visualizations and business graphs. For a visualization framework, see https://github.com/samizdatco/arbor.