Generators

The ES6 specification introduces another mechanism that, among other things, can be used to simplify the asynchronous control flow of our Node.js applications. We are talking about generators, also known as semi-coroutines. They are a generalization of subroutines that can have multiple entry points. In a normal function, in fact, we can have only one entry point, which corresponds to the invocation of the function itself. A generator is similar to a function, but in addition, it can be suspended (using the yield statement) and then resumed at a later time. Generators are particularly useful when implementing iterators, and this should ring a bell, as we have already seen how iterators can be used to implement important asynchronous control flow patterns, such as sequential and limited parallel execution.

Before we explore the use of generators for asynchronous control flow, it's important we learn some basic concepts. Let's start from the syntax; a generator function can be declared by appending the * (asterisk) operator after the function keyword:
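
A minimal sketch of such a declaration might look like this (the name makeGenerator matches the one used in the discussion that follows):

function* makeGenerator() {
  // body of the generator
}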

Inside the makeGenerator() function, we can pause the execution using the keyword yield and return to the caller the value passed to it:
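
A possible sketch, consistent with the behavior described next, might be:

function* makeGenerator() {
  yield 'Hello World';
  console.log('Re-entered');
}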

In the preceding code, the generator yields the string Hello World and puts the execution of the function on pause. When the generator is resumed, the execution will restart from console.log('Re-entered').

The makeGenerator() function is essentially a factory that, when invoked, returns a new generator object:
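
For example (the variable name is arbitrary):

const gen = makeGenerator();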

The most important method of the generator object is next(), which is used to start/resume the execution of the generator and returns an object in the following form:
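
The returned object has roughly this shape:

{
  value: <yielded value>,
  done: <true if the execution is completed>
}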

This object contains the value yielded by the generator (value) and a flag to indicate if the generator has completed its execution (done).

To conclude our exploration of the basic functionality of generators, we will now learn how to pass values back to a generator. This is actually very simple; all we need to do is provide an argument to the next() method, and that value will be provided as the return value of the yield statement inside the generator.

To show this, let's create a new simple module:
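
One possible version of such a module, consistent with the twoWayGenerator() function mentioned later, might be:

function* twoWayGenerator() {
  const what = yield null;
  console.log('Hello ' + what);
}

const twoWay = twoWayGenerator();
twoWay.next();            // starts the generator, which pauses at the yield
twoWay.next('world');     // resumes it, passing 'world' back to the yield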

When executed, the preceding code will print Hello world. This means that the following has happened:

1. The first call to next() started the generator, which ran until the yield statement and then paused, handing control back to the caller.
2. When next('world') was invoked, the generator resumed from where it stopped, and the value 'world' became the return value of the yield statement.
3. The generator then executed the console.log() instruction, printing Hello world, and terminated.

In a similar way, we can force a generator to throw an exception. This is made possible by using the throw method of the generator, as shown in the following example:
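
A minimal sketch of this might be:

const twoWay = twoWayGenerator();
twoWay.next();
twoWay.throw(new Error());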

Using this last code snippet, the twoWayGenerator() function will throw an exception the moment the yield statement returns. This works exactly as if the exception was thrown from inside the generator, which means that it can be caught and handled like any other exception using a try-catch block.

You must be wondering how generators can help us with handling asynchronous operations. We will demonstrate that right away by creating a function that allows us to use asynchronous functions inside a generator and then resume the execution of the generator when the asynchronous operation completes. We will call this function asyncFlow():
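
A possible implementation of this helper, matching the description in the next two paragraphs, might be:

function asyncFlow(generatorFunction) {
  function callback(err) {
    if (err) {
      // propagate the error inside the generator
      return generator.throw(err);
    }
    const results = [].slice.call(arguments, 1);
    // resume the generator, passing back the results of the async operation
    generator.next(results.length > 1 ? results : results[0]);
  }
  const generator = generatorFunction(callback);
  generator.next();
}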

The preceding function takes a generator as an input, instantiates it, and then immediately starts its execution:
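
In the sketch above, these are the last two lines:

const generator = generatorFunction(callback);
generator.next();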

The generatorFunction() receives as input a special callback function that invokes generator.throw() if an error is received; otherwise, it resumes the execution of the generator by passing back the results received in the callback function:
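
Again referring to the sketch above, this is the special callback:

function callback(err) {
  if (err) {
    return generator.throw(err);
  }
  const results = [].slice.call(arguments, 1);
  generator.next(results.length > 1 ? results : results[0]);
}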

To demonstrate the power of this simple function, let's create a new module called clone.js, which (stupidly) creates a clone of itself. Paste the asyncFlow() function we just created, followed by the core of the program:
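
A possible version of the core of clone.js, reusing the asyncFlow() sketch above (the name of the generated file is just illustrative), might be:

const fs = require('fs');
const path = require('path');

// ... paste the asyncFlow() function here ...

asyncFlow(function* (callback) {
  const fileName = path.basename(__filename);
  const myself = yield fs.readFile(fileName, 'utf8', callback);
  yield fs.writeFile('clone_of_' + fileName, myself, callback);
  console.log('Clone created');
});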

Remarkably, with the help of the asyncFlow() function, we were able to write asynchronous code using a linear approach, as if we were using blocking functions! The magic behind this result should be clear by now. The callback passed to each asynchronous function will in turn resume the generator as soon as the asynchronous operation is complete. Nothing complicated, but the outcome is surely impressive.

There are two other variations of this technique, one involving the use of promises and the other using thunks.
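
In this context, a thunk is simply a function that wraps the invocation of an asynchronous operation, leaving out its callback, and returns another function that accepts only that callback as an argument. For example, a thunkified version of fs.readFile() might look like the following (readFileThunk is just an illustrative name):

const fs = require('fs');

function readFileThunk(filename, options) {
  return function (callback) {
    fs.readFile(filename, options, callback);
  };
}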

Both thunks and promises allow us to create generators that do not need a callback to be passed as an argument; for example, a version of asyncFlow() using thunks might be the following:
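
A sketch of what this asyncFlowWithThunks() function might look like:

function asyncFlowWithThunks(generatorFunction) {
  function callback(err) {
    if (err) {
      return generator.throw(err);
    }
    const results = [].slice.call(arguments, 1);
    // the value yielded by the generator is a thunk...
    const thunk = generator.next(results.length > 1 ? results : results[0]).value;
    // ...which we invoke, injecting our special callback
    thunk && thunk(callback);
  }
  const generator = generatorFunction();
  const thunk = generator.next().value;
  thunk && thunk(callback);
}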

The trick is to read the return value of generator.next(), which contains the thunk. The next step is to invoke the thunk itself, by injecting our special callback. Simple! This allows us to write the following code:
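
Assuming a writeFileThunk() helper built in the same way as the readFileThunk() shown earlier, the clone program could be rewritten like this (fs and path are required at the top of the module, as before):

function writeFileThunk(filename, data) {
  return function (callback) {
    fs.writeFile(filename, data, callback);
  };
}

asyncFlowWithThunks(function* () {
  const fileName = path.basename(__filename);
  const myself = yield readFileThunk(fileName, 'utf8');
  yield writeFileThunk('clone_of_' + fileName, myself);
  console.log('Clone created');
});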

Similarly, we could implement a version of asyncFlow() that accepts a promise as a yieldable. We leave this as an exercise, as its implementation requires only a minimal change to the asyncFlowWithThunks() function. We may also implement an asyncFlow() function that accepts both promises and thunks as yieldables, using the same principles.

Let's start our practical exploration of generators and the co library by modifying version 2 of the web spider application. The very first thing we want to do is load our dependencies and generate a thunkified version of the functions we are going to use. These will go at the top of the spider.js module:
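
Assuming the same dependencies used by the earlier versions of the spider (the request and mkdirp packages, plus a local utilities module providing the urlToFilename() and getPageLinks() helpers), the top of the module might look like this, with the thunkify package generating the thunk-returning wrappers:

const path = require('path');
const utilities = require('./utilities');
const thunkify = require('thunkify');
const co = require('co');

const request = thunkify(require('request'));
const fs = require('fs');
const mkdirp = thunkify(require('mkdirp'));
const readFile = thunkify(fs.readFile);
const writeFile = thunkify(fs.writeFile);
const nextTick = thunkify(process.nextTick);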

Looking at the preceding code, we can certainly notice some similarities with the code we used earlier in the chapter to promisify some APIs. In this regard, it is interesting to point out that if we decided to use the promisified version of our functions instead of their thunkified alternative, the code that follows would remain exactly the same, thanks to the fact that co supports both thunks and promises as yieldable objects. In fact, if we wanted, we could even use both thunks and promises in the same application, even in the same generator. This is a tremendous advantage in terms of flexibility, as it allows us to use generator-based control flow with whatever solution we already have at our disposal.

Okay, now let's start transforming the download() function into a generator:
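
Continuing the sketch, a thunk-based download() generator might look like this (the thunkified request() yields an array containing the response and the body):

function* download(url, filename) {
  console.log('Downloading ' + url);
  const results = yield request(url);      // [response, body]
  const body = results[1];
  yield mkdirp(path.dirname(filename));
  yield writeFile(filename, body);
  console.log('Downloaded and saved ' + url);
  return body;
}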

By using generators and co, our download() function suddenly becomes trivial. All we had to do was convert it into a generator function and use yield wherever we had an asynchronous function (as a thunk) to invoke.

Next, it's the turn of the spider() function:
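
A possible shape for it, assuming the urlToFilename() helper from the earlier versions of the spider:

function* spider(url, nesting) {
  const filename = utilities.urlToFilename(url);
  let body;
  try {
    body = yield readFile(filename, 'utf8');
  } catch (err) {
    if (err.code !== 'ENOENT') {
      throw err;
    }
    // the page was not downloaded yet
    body = yield download(url, filename);
  }
  yield spiderLinks(url, body, nesting);
}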

The interesting detail to notice in this last fragment of code is how we were able to use a try-catch block to handle exceptions. Also, we can now use throw to propagate errors! Another remarkable line is where we yield the download() function, which is neither a thunk nor a promisified function, but just another generator. This is possible thanks to co, which also supports other generators as yieldables.

Lastly, we can also convert spiderLinks(), where we implemented an iteration to download the links of a web page in sequence. With generators, this becomes trivial as well:
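
A sketch of the sequential version, assuming the getPageLinks() helper:

function* spiderLinks(currentUrl, body, nesting) {
  if (nesting === 0) {
    yield nextTick();          // nothing left to do, just give back control asynchronously
    return;
  }
  const links = utilities.getPageLinks(currentUrl, body);
  for (const link of links) {
    yield spider(link, nesting - 1);
  }
}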

There is really little to explain about the previous code; there is no pattern to show for the sequential iteration. Generators and co do all the dirty work for us, so we were able to write the asynchronous iteration as if we were using blocking, direct-style APIs.

Now comes the most important part, the entry point of our program:
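
It might look like the following (the command-line argument and the nesting level of 1 mirror the earlier versions of the spider; the trailing pair of parentheses assumes a version of co in which co() returns a thunk to invoke, while with co 4.x and later co() starts the generator immediately and returns a promise):

co(function* () {
  try {
    yield spider(process.argv[2], 1);
    console.log('Download complete');
  } catch (err) {
    console.log(err);
  }
})();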

This is the only place where we have to invoke co(...) to wrap a generator. In fact, once we do that, co will automatically wrap any generator we pass to a yield statement, and this will happen recursively, so the rest of the program is totally agnostic to the fact that we are using co, even though it's working under the hood.

Now it should be possible to run our generator-based web spider application. Just remember to use the --harmony or --harmony-generators flag in the command line:
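
For example (the URL is just a placeholder; on modern versions of Node.js, generators are enabled by default and no flag is necessary):

node --harmony-generators spider.js http://www.example.com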

The bad news about generators is that they are great for writing sequential algorithms, but they can't be used to parallelize the execution of a set of tasks, at least not using just yield and generators. In fact, the pattern to use for these circumstances is to simply rely on a callback-based or promise-based function, which in turn can easily be yielded and used with generators.

Fortunately, for the specific case of unlimited parallel execution, co already allows us to obtain it natively: we simply have to yield an array of promises, thunks, generators, or generator functions.

With this in mind, version 3 of our web spider application can be implemented simply by rewriting the spiderLinks() function as follows:
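
A sketch of this new version might be:

function* spiderLinks(currentUrl, body, nesting) {
  if (nesting === 0) {
    yield nextTick();
    return;
  }
  const links = utilities.getPageLinks(currentUrl, body);
  // one spider() generator per link; co will run all of them in parallel
  const tasks = links.map(link => spider(link, nesting - 1));
  yield tasks;
}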

What we did was just collect all the download tasks, which are essentially generators, and then yield on the resulting array. All these tasks will be executed by co in parallel and then the execution of our generator (spiderLinks) will be resumed when all the tasks finish running.

If you think we cheated by exploiting the feature of co that allows us to yield on an array, we can demonstrate how the same parallel flow can be achieved using a callback-based solution similar to what we used earlier in the chapter. Let's use this technique to rewrite the spiderLinks() function once again:
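
One possible sketch follows; it is written so that spiderLinks() now returns a thunk (so that spider() can keep yielding it), and it assumes a version of co in which co() turns a generator into a callback-accepting thunk, as described below (with co 4.x and later, one would instead use the returned promise and call .then(() => done(), done)):

function spiderLinks(currentUrl, body, nesting) {
  return function (callback) {                   // the thunk that spider() will yield
    if (nesting === 0) {
      return process.nextTick(callback);
    }
    const links = utilities.getPageLinks(currentUrl, body);
    if (links.length === 0) {
      return process.nextTick(callback);
    }

    let completed = 0;
    let hasErrors = false;
    function done(err) {
      if (err && !hasErrors) {
        hasErrors = true;
        return callback(err);
      }
      if (++completed === links.length && !hasErrors) {
        callback();
      }
    }

    links.forEach(link => {
      // wrap the spider() generator with co() to obtain a thunk,
      // then execute it, setting done() as its callback
      co(function* () {
        yield spider(link, nesting - 1);
      })(done);
    });
  };
}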

To run the spider() function, which is a generator, in parallel, we had to convert it into a thunk and then execute it. This was possible by wrapping it with the co(...) function, which essentially creates a thunk out of a generator. This way, we were able to invoke it in parallel and set the done() function as the callback. Usually, all the libraries for generator-based control flow have a similar feature, so you can always transform a generator into a callback-based function if needed.

To start multiple download tasks in parallel, we just reused the callback-based pattern for parallel execution, which we defined earlier in the chapter. We should also notice that we transformed the spiderLinks() function into a thunk (it's not even a generator anymore). This enabled us to have a callback function to invoke when all the parallel tasks are completed.

Now that we know how to handle nonsequential execution flows, it should be easy to plan the implementation of version 4 of our web spider application, the one imposing a limit on the number of concurrent download tasks. We have several options we can use to do that; some of them are as follows:

- Reuse the callback-based limited parallel execution pattern we saw earlier in the chapter, wrapping each generator with co() as we just did for the unlimited parallel version.
- Use an existing library that offers a limited-concurrency primitive for generator-based control flow.
- Implement our own algorithm from scratch, based on a producer-consumer interaction between a queue of tasks and a set of workers.

For educational purposes, we are going to choose the last option, so we can dive into a pattern that is often associated with coroutines (but also threads and processes).

The goal is to leverage a queue to feed a fixed number of workers, as many as the concurrency level we want to set. To implement this algorithm, we are going to take as starting point the TaskQueue class we defined earlier in the chapter. Let's start gradually; the first thing we want to do is define the constructor:
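
A sketch of the constructor (in a taskQueue.js module that also requires co, which the workers will need):

const co = require('co');

class TaskQueue {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.taskQueue = [];          // tasks waiting to be processed
    this.consumerQueue = [];      // callbacks of the workers waiting for a task
    this.spawnWorkers(concurrency);
  }

  // the methods shown next are defined inside the class body
}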

Notice the invocation of this.spawnWorkers() as this is the method in charge of starting the workers. The next step is, of course, to define our workers; let's see how they look:
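
A possible implementation of the spawnWorkers() method (the trailing parentheses again assume a version of co in which co() returns a thunk to invoke; with co 4.x and later, co() starts the generator immediately):

spawnWorkers(concurrency) {
  const self = this;
  for (let i = 0; i < concurrency; i++) {
    co(function* () {
      while (true) {
        const task = yield self.nextTask();   // wait until a task is available
        yield task;                           // run the task (any valid yieldable)
      }
    })();
  }
}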

Our workers are very simple; they are just generators wrapped with co() and executed immediately, so that each one can run in parallel. Internally, each worker runs an infinite loop that blocks (yield), waiting for a new task to be available in the queue (yield self.nextTask()), and when this happens, it yields the task (which is any valid yieldable), waiting for its completion. You may be wondering how we can actually wait for the next task to be queued. The answer is in the nextTask() method, which we are now going to define:
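
One way to write it is to have it return a thunk:

nextTask() {
  return (callback) => {          // a thunk: co will invoke it with a callback
    if (this.taskQueue.length !== 0) {
      // a task is already available, hand it to the worker right away
      return callback(null, this.taskQueue.shift());
    }
    // no tasks available: park the callback until pushTask() provides one
    this.consumerQueue.push(callback);
  };
}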

Let's see what happens in this method, which is the core of the pattern:

- The method returns a thunk, which is a valid yieldable for co.
- If there is already a task in the taskQueue, the callback of the thunk is invoked immediately with that task, which unblocks the worker and gives it something to yield on.
- If the queue is empty, the callback itself is pushed into the consumerQueue. By doing this, we are basically putting the worker in an idle mode; the callbacks in the consumerQueue will be invoked as soon as we have a new task to process, which will resume the corresponding worker.

Now, to see how the idle workers waiting in the consumerQueue are resumed, we need to define the pushTask() method:
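
A minimal sketch of it:

pushTask(task) {
  if (this.consumerQueue.length !== 0) {
    // an idle worker is waiting: wake it up by handing it the new task
    this.consumerQueue.shift()(null, task);
  } else {
    // all the workers are busy: queue the task for later
    this.taskQueue.push(task);
  }
}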

Trivially, the method invokes the first callback in the consumerQueue, if available, which in turn will unblock a worker. If no callback is available, it means that all the workers are busy, so we simply add the new item to the taskQueue.

In the TaskQueue class we just defined, the workers have the role of consumers, while whoever uses pushTask() can be considered a producer. This pattern shows us how a generator can look very similar to a thread (or a process). In fact, the producer-consumer interaction is probably the most common problem presented when studying inter-process communication techniques, but as we already mentioned, it is also a common use case for coroutines.