Friday, June 8, 2012

Simple Intro to Synchronization / Serialization in NodeJS

One of the most common topics that comes up with people who are new to NodeJS is how perform synchronous tasks in an asynchronous programming environment. There are two general responses to this:
  1. Your Doing It Wrong. Just learn how to deal with the asynchronous programming style.
  2. Use a module. Step.js, asynch.js, or any number of other ones.
A lot of people have tackled this problem. You might even hear someone recommend to go write your own flow control library first, then go choose the module that you like best. To be honest, I've always wanted to understand the internals of control flow libraries like Step.JS, and have even pored over over Isaac Schlueter's Slide.JS slideshow.

Recently I wanted to create a little node script that would populate some tables in a database. I had a bunch of methods to do this, but they were all asynchronous functions and I needed to call them serially. For example, the first one would create a table. The second would populate some data in the table. And so on. Since my use case was fairly simple, and I didn't really want to download and start blindly using a flow control library to solve my problem, I decided to take it as an opportunity to see if I could create my own solution for serializing the function calls.

The resulting code is below. It's not especially pretty, or flexible. But after working it out, I felt that I did understand the general approach to serialization / synchronization in NodeJS a lot better. Hopefully looking at this simple example will shed some light on this for you as well.

To start with, I imported my init module. The functions I wanted to call synchronously are all exported by this module. I put these functions in an Array like this (note that I'm not actually invoking the functions here, just putting references to them in the array):

The next thing I did was create an array called "callbacks":

The idea is to create a bunch of special callbacks that will be invoked after each of the "steps" above. A callback function normally has parameters like (err, result), and a body that handles these. In addition to handling the err and result, however, when my callbacks were invoked I needed my callbacks to call the next step in the list. You can picture it something like this:

+--------+          +------+
| func_1 |--------->| cb_1 |
+--------+          +------+
+--------+             |
| func_2 |<------------+
    |               +------+
    +-------------->| cb_2 |

Ok, bad ascii art, but hopefully you get the idea. Function 1 gets called, finishes, and callback 1 is invoked. Callback 1 does the usual err / result handling and then invokes Function 2. Function 2 finishes and callback 2 is invoked. It does the usual err / result handling and then invokes Function 3, and so on.

For each of my "steps", I need to create one of these smart callbacks that knows the next function to call. I do that in a for loop, creating a custom callback for each function in the list of "steps".

Once the customized callbacks are all created, I start off the chain of events by calling the first function and passing in the first custom callback, like tipping the first domino in a row.

The part where we create the custom callbacks is the most interesting. Its like a little factory that creates callback functions. The createCallback function creates custom callbacks and returns them. The idea of a function returning another function is a little odd, if you're new to functional programming. But in JavaScript, a function can be passed around like any other variable. And here we are creating custom callback functions that know the next function in the list of steps.  They also know that - if there is a next function to call - they should invoke it by passing in the next callback.

Obviously this is a super simplified example, and doesn't even come close to the functionality provided by the flow control libraries that are readily available. But hopefully its simple enough to give you a basic idea of how some of these modules go about solving the problem of taking asynchronous functions, and executing them in a synchronous, or serialized way.

For reference, here's the whole file... good luck and have fun!