Wednesday, November 12, 2014

Three Simple Rules for Escaping Callback Hell

A lot of newcomers to Node.JS complain about "callback hell" and the "pyramid of doom" when they're getting started with the callback-driven continuation-passing style.  It's confusing, and a lot of people reach for an async / flow-control module right away.  Many people have settled on using Promises, a solution that brings some unfortunate problems along with it (performance overhead, error-hiding anti-patterns, and the illusion of synchronous execution, for example).

I prefer using some simple best practices for working with callbacks to keep my code clean and organized. These techniques don't require adding any extra modules to your code base, won't slow your program down, don't introduce error-hiding anti-patterns, and don't convey a false impression of synchronous execution. Best of all, they result in code that is actually more readable and concise, and once you see how simple they are, you might want to use them, too.

Here they are:
  1. use named functions for callbacks
  2. nest functions when you need to capture (enclose) variable scope
  3. use return when invoking the callback

The Pyramid of Doom

Here's a contrived example that uses typical node.js callbacks with (err, result) arguments. It's a mess of nested functions: the so-called Pyramid of Doom. It keeps indenting, layer upon smothering layer, until it unwinds in a great cascading spasm of parentheses, braces and semicolons.
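
Something along these lines will do as a minimal sketch; fetchGreeting, sendMessage and logResult are hypothetical async helpers, stubbed out here so the example actually runs:

// Hypothetical async helpers, stubbed with setImmediate so the sketch runs.
function fetchGreeting(lang, callback) {
  setImmediate(callback, null, lang === 'fr' ? 'Bonjour' : 'Hello');
}

function sendMessage(message, callback) {
  setImmediate(callback, null, { message: message, delivered: true });
}

function logResult(entry, callback) {
  setImmediate(callback, null, 'logged: ' + entry);
}

// The Pyramid of Doom: every async call nests another anonymous callback.
function greet(name, lang, callback) {
  fetchGreeting(lang, function (err, greeting) {
    if (err) {
      callback(err);
    } else {
      var message = greeting + ', ' + name + '!';
      sendMessage(message, function (err, result) {
        if (err) {
          callback(err);
        } else {
          logResult(message + ' delivered: ' + result.delivered, function (err, logged) {
            if (err) {
              callback(err);
            } else {
              callback(null, logged);
            }
          });
        }
      });
    }
  });
}

greet('World', 'en', function (err, logged) {
  if (err) {
    console.error(err);
  } else {
    console.log(logged); // logged: Hello, World! delivered: true
  }
});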


Named Callbacks

The Pyramid of Doom is often shown as a reason to use Promises, but most async libraries -- including and especially Promises -- don't really solve this nesting problem.  We don't end up with deeply nested code like this because something is wrong with JavaScript. We get it because people write bad, messy code.  Named callbacks solve this problem, very simply. Andrew Kelley wrote about this on his blog a while ago ("JavaScript Callbacks are Pretty Okay"). It's a great post with some simple ways of taming "callback hell" that get skipped over by a lot of node newcomers.

Here's the above example re-written using named callback functions. Instead of a Russian doll of anonymous functions, every function that takes a callback is passed the name of the callback function to use. The callback function is defined immediately afterwards, greatly improving readability.
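
Something like this, reusing the stub helpers from the sketch above:

// Same behavior, but every async call is handed the NAME of its callback,
// and the callbacks are defined immediately afterwards.
function greet(name, lang, callback) {
  fetchGreeting(lang, getGreeting);

  function getGreeting(err, greeting) {
    if (err) {
      callback(err);
    } else {
      var message = greeting + ', ' + name + '!';
      sendMessage(message, sendGreeting);
    }

    // sendGreeting and showResult are nested inside getGreeting, so they can
    // still see the `message` variable created above.
    function sendGreeting(err, result) {
      if (err) {
        callback(err);
      } else {
        logResult(message + ' delivered: ' + result.delivered, showResult);
      }
    }

    function showResult(err, logged) {
      if (err) {
        callback(err);
      } else {
        callback(null, { message: message, log: logged });
      }
    }
  }
}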


Nest Only for Scope

We can do even better. Notice that two functions, sendGreeting and showResult, are still nested inside of the getGreeting function. Nested "inner" functions create a closure that encloses the callback function's own local variable scope, plus the variable scope of the function it's nested inside of. These nested callbacks can access variables from the enclosing function's scope. In our example, both sendGreeting and showResult use variables that were created earlier in the getGreeting function. They can access these variables from getGreeting because they're nested inside getGreeting and thus enclose its variable scope.

A lot of times this is totally unnecessary. You only need to nest functions if you need to refer to variables in the scope of the caller from within the callback function. Otherwise, simply put named functions on the same level as the caller. In our example, variables can be shared by moving them to the top-level scope of the greet function. Then, we can put all our named functions on the same level. No more nesting and indentation!
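
Continuing the sketch (same stub helpers as before):

// All the named callbacks now sit at the same level inside greet(); the
// shared `message` variable has moved up into greet's own scope.
function greet(name, lang, callback) {
  var message;

  fetchGreeting(lang, getGreeting);

  function getGreeting(err, greeting) {
    if (err) {
      callback(err);
    } else {
      message = greeting + ', ' + name + '!';
      sendMessage(message, sendGreeting);
    }
  }

  function sendGreeting(err, result) {
    if (err) {
      callback(err);
    } else {
      logResult(message + ' delivered: ' + result.delivered, showResult);
    }
  }

  function showResult(err, logged) {
    if (err) {
      callback(err);
    } else {
      callback(null, { message: message, log: logged });
    }
  }
}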


Return when invoking a Callback

The last point is more of a stylistic preference, but if you make a habit of always returning when you invoke the callback, you can trim your code down further. In direct-style programming, where function calls are meant to return a value, common wisdom says that returning from inside an if clause is bad practice that can lead to errors. With continuation-passing style, however, explicitly returning when you invoke the callback ensures that you don't accidentally execute additional code in the calling function after the callback has been invoked. For that reason, many node developers consider it best practice. In trivial functions it improves readability by eliminating the else clause, and it's used by a number of popular JavaScript modules. I find a pragmatic approach is to return from error-handling clauses and other conditional if/else clauses, but sometimes leave off the explicit return on the last line of the function, in the interest of less code and better readability. Here's the updated example:
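
(Again just a sketch, reusing the stub helpers from before.)

// Returning when invoking the callback lets us drop the else clauses.
// The last line of each function skips the explicit return.
function greet(name, lang, callback) {
  var message;

  fetchGreeting(lang, getGreeting);

  function getGreeting(err, greeting) {
    if (err) return callback(err);
    message = greeting + ', ' + name + '!';
    sendMessage(message, sendGreeting);
  }

  function sendGreeting(err, result) {
    if (err) return callback(err);
    logResult(message + ' delivered: ' + result.delivered, showResult);
  }

  function showResult(err, logged) {
    if (err) return callback(err);
    callback(null, { message: message, log: logged });
  }
}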

Compare this example with the Pyramid of Doom at the beginning of the post. I think you'll agree that these simple rules result in cleaner, more readable code and provide a great escape from the Callback Hell we started out with.

Good luck and have fun!

Monday, October 13, 2014

How Wolves Change Rivers

A beautifully filmed short video from Yellowstone National Park that reminds us of the importance of wildlife for the health of the whole planet:


How Wolves Change Rivers from Sustainable Man on Vimeo.

Wednesday, September 17, 2014

doxli - a help utility for node modules on command line

Quite often I fire up the node REPL and pull in some modules I've written to use on the command line. Unfortunately I often forget the exact way to call the various functions in those modules (there are a lot) and end up doing something like foo.dosomething.toString() to see the source code and recall the function signature.

In the interest of making code as "self-documenting" as possible,  I wrote a small utility that uses dox to provide help for modules on the command line. It adds a help() function to a module's exported methods so you can get the dox / jsdoc comments for the function on the command line.

So now foo.dosomething.help() will return the description, parameters, examples and so on for the method based on the documentation in the comments.

It's still a bit of a work in progress, but it works nicely - provided you actually document your modules with jsdoc-style comments.

All the info is here: https://www.npmjs.org/package/doxli

Sunday, September 7, 2014

REST API Best Practices 4: Collections, Resources and Identifiers

Other articles in this series:
  1. REST API Best Practices: A REST Cheat Sheet
  2. REST API Best Practices: HTTP and CRUD
  3. REST API Best Practices: Partial Updates - PATCH vs. PUT
RESTful APIs center around resources that are grouped into collections. A classic example is browsing through the directory listings and files on a website like http://vault.centos.org/. When you browse the directory listing, you can click through a series of folders to download files.  The folders are collections of CentOS resource files.



In REST, collections and resources are accessed via HTTP URIs in a similar way:

members/ -- a collection of members
members/1 -- a resource representing member #1
members/2 -- a resource representing member #2

It may help to think of a REST collection as a directory folder containing files, although it's highly unlikely that the member data is stored as literal JSON files on the server. The member data will most likely come from a database, but from the perspective of a REST API it looks similar to a directory called "members" that contains a bunch of files for download.

Naming collections


In case it's not obvious already, collection names should be nouns, and the plural form is preferred. There's been some debate over whether collection names should be plural (members/1) or singular (member/1), but the plural form seems to be the most widely used.

Getting a collection


Getting a collection, like "members", may return
  1. the entire list of resources as a list of links, 
  2. partial representations of each resource, or 
  3. full representations of all the resources in the collection. 
Our classic example of browsing online directories and files uses approach #1, returning a list of links to the files. The list is formatted in HTML, so you can click on the hyperlink to access a particular file.

Approach #2, returning a partial representation (e.g. first name and last name) of each resource in a collection, is a more pragmatic way of giving the end user enough information to select a resource and request further details, especially if the collection can contain a lot of resources. Actually, the directory listings on a website like http://vault.centos.org/ display more than just the hyperlink: they include additional metadata like the last-modified timestamp and file size as well. This is helpful for the end user, who is looking for an up-to-date file and wants to know how long it will take to download. It's a good example of returning just enough information about the resources for the end user to be able to make a selection.

With approach #3, if a collection is small, you may want to return the full representation of all the resources in the collection as a big array. For large collections, however, it isn't practical. Github is the only RESTful API example I've seen that actually returns a full representation of all resources when you fetch the collection. I wouldn't consider #3 a "best practice" or recommend it for most use cases, but if you know the collection and its resources will be small, it might be more effective to fetch the whole collection all at once like this.

The best practice for fetching a collection of resources, in my opinion, is #2: return a partial representation of the resources in a collection with just enough information to facilitate the selection process, and be sure to include the URL (href) of each resource where it can be downloaded from.

Only when a collection is guaranteed to be small, and you need to reduce the performance impact of making multiple queries, should you consider bending the rules with approach #3 and returning all the resources in one fell swoop.

Here's a practical example of fetching the collection of members using approach #2.

Request

GET /members
Host: localhost:8080

Response

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

[
  {
    "id": 1,
    "href": "/members/1",
    "firstname": "john",
    "lastname": "doe"
  },
  {
    "id": 2,
    "href": "/members/2",
    "firstname": "jane",
    "lastname": "doe"
  }
]

In this example, some minimal information is returned about each of the members: first and last name, id, and the "href" URL where the full representation of the member resource can be downloaded.
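
For what it's worth, here's a minimal sketch of how a server might produce that kind of listing. It uses Express with a hypothetical in-memory members array; a real implementation would query a database instead.

// Minimal Express sketch: GET /members returns a partial representation of
// each member, with an href pointing at the full resource.
var express = require('express');
var app = express();

// Hypothetical in-memory data standing in for a database query.
var members = [
  { id: 1, firstname: 'john', lastname: 'doe', active: true },
  { id: 2, firstname: 'jane', lastname: 'doe', active: true }
];

app.get('/members', function (req, res) {
  var listing = members.map(function (m) {
    return {
      id: m.id,
      href: '/members/' + m.id,
      firstname: m.firstname,
      lastname: m.lastname
    };
  });
  res.json(listing);
});

app.listen(8080);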


Getting a resource


Getting a specific resource should return the full representation of that resource. The URL contains the collection name and the ID of the specific resource you want.

Resource IDs


RESTful resources have one or more identifiers: a numerical ID, a title, and so on. Common practice is for every resource to have a numeric ID that is used to reference the resource, although there are some notable exceptions to the rule.

Resources themselves should contain their numerical ID; the current best practice is for this to exist within the resource simply as an attribute labelled "id". Every resource should contain an "id"; avoid using more complicated names for resource identifiers like "memberID" or "accountNumber" and just stick with "id". If you need additional identifiers on a resource, go ahead and add them, but always have an "id" that acts as the primary way to retrieve the resource. So, if a member has "id" : 1, it should be fairly obvious that you can fetch his details at the URL "members/1".

An example of fetching a member resource would be:

Request

GET /members/1
Host: localhost:8080

Response

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{
  "id": 1,
  "href": "/members/1",
  "firstname": "john",
  "lastname": "doe",
  "active": true,
  "lastLoggedIn": "Tue Sep 16 2014 08:37:42 GMT-400 (EDT)",
  "foo": "bar",
  "fizz": "buzz",
  "qux": "doo"
}

Beyond simple collections


Most of the examples you see online are fairly simple, but practical data models are often much more complex.  Resources frequently contain sub-collections and relationships with other resources. API design in this area seems to be done in a mostly ad-hoc manner, but there are some practical considerations and trade-offs when designing APIs for more complex data models, which I'll cover in the next post.

Thursday, August 21, 2014

Defensive Shift - Turning the Tables on Surveillance

Like many people lately, I've been pondering the implications of pervasive surveillance, "big data" analysis, state-sponsored security exploits, and the role of technology in government. For one thing, my work involves a lot of the same technology: deep packet inspection, data analysis, machine learning and even writing experimental malware. However, instead of building tools that enable pervasive government surveillance, I've built a product that tells mobile smartphone users if their device, or a laptop connected to it, has been infected with malware, been commandeered into a botnet, or come under attack from a malicious website, and so on.  I'm happy to be working on applying some of this technology in a way that actually benefits regular people. It feels much more on the "good side" of technology than on the bad side we've been hearing so much about lately.

Surveillance of course has been in the news a lot lately, so we're all familiar with the massive betrayal of democratic principles by governments, under the guise of hunting the bogeyman. It's good that people are having conversations about reforming it, but don't expect the Titanic to turn around suddenly. There's far too much money and too many careers on the line to just shut down the leviathan of pervasive surveillance overnight. It will take time, and a new generation of more secure networking technologies.

Big data has also been in the news in some interesting ways: big data analysis has been changing the way baseball is played! CBC's David Common presents the story [1]:

http://www.cbc.ca/news/world/how-the-defensive-shift-and-big-data-are-changing-baseball-1.2739619

Not everyone is happy with the "defensive shift" - the process of repositioning outfield players based on batting stats that tell coaches how likely a batter is to hit left or right, short or long.  Longtime fans feel it takes away from the human element of the game and turns it into more of a science experiment.

I tend to agree.  And to be honest, until now deep traffic inspection, big data analysis, surveillance, and definitely state-sponsored hacking, have quite justifiably earned a reputation as, well, repugnant to any freedom-loving, democracy-living, brain-having person. Nevertheless, as powerful as big data analytics, machine learning, and network traffic analysis are, and as much as they have been woefully abused by our own governments, I don't think we've yet begun to see the potential for good that these technologies could have, particularly if they are applied in reverse to the way they're being used now.

Right now we're in a position where a few privileged, state-sponsored bad actors are abusing their position of trust and authority to turn the lens of surveillance and data analysis upon ordinary people, foreign business competitors[2], jilted lovers [3], etc.  The sea change that will, I think, eventually come is when the lens of technology slowly turns with relentless inevitability onto the government itself, and we have the people observing and monitoring and analyzing the effectiveness of our elected officials and public servants and their organizations.

How do we begin to turn the tables on surveillance?

Secure Protocols

As I see it, this "defensive shift" will happen due to several factors. First, because the best and brightest engineers - the ones who design the inner workings of the Internet and write the open-source software used for secure computing - are on the whole smart enough to know that pervasive surveillance is an attack and a design flaw [4], are calling for it to be fixed in future versions of Internet protocols [5], and are already working on fixing some of the known exploits [6].

One of the simplest remedial actions available right now for pervasive surveillance attacks is HTTPS, with initiatives like HTTPS Now[9] showing which web sites follow good security practices, and tools like HTTPS Everywhere[10], a plugin for your web browser that helps you connect to websites securely. There is still work to be done in this area, as man-in-the-middle attacks and compromised cryptographic keys are widespread at this point - a problem for which perfect forward secrecy[11] needs to become ubiquitous. We should expect future generations of networking protocols to be based on these security best practices.

Some people say that creating a system that is totally secure against all kinds of surveillance, including lawful intercept, will only give bad people more opportunity to plan and carry out their dirty deeds.  But this turns out not to be true when you look at the actual data of how much information has been collected, how much it all costs, and how effective it's actually been.  It yields practically nothing useful and is almost always a "close the barn door, the horse is out!" scenario. This, coming from an engineer who actually works in the area of network-based threat analysis, by the way.

Open Data

Second, the open data movement. It's not just you and I who are producing data-trails as we mobe and surf and twit around the Interwebs.  There's a lot of data locked up in government systems, too.  If you live in a democracy, who owns that data? We do. It's ours. More and more of it is being made available online, in formats that can be used for computerized data analysis.  Sites like the Center for Responsive Politics' Open Secrets Database [8], for example, shed a light on money in politics, showing who's lobbying for what, how much money they're giving, and who's accepting the bribes, er, donations.

One nascent experiment in the area of government open data analysis is AnalyzeThe.US, a site that lets you play with a variety of public data sources to see correlations. Warning - it's possible for anyone to "prove" just about anything with enough graphs and hand-waving. For real meaningful analysis, having some background in mathematics and statistics is a definite plus, but the tool is still super fun and provides a glimpse of where things could be going in the future with open government.

Automation

Third, automation. There's still a long way to go in this area, but even the slowness and inefficiency of government will eventually give way to the relentless march of technology as more and more systems that have traditionally been mired in bureaucratic red tape become networked and automated, all producing data for analytics. Filling in paper forms for hours on end will eventually be as absurd for the government to require as it would be for buying a book from Amazon.

With further automation and data access, the ability to monitor, analyze and even take remedial action on bureaucratic inefficiencies should be in the hands of ordinary people, turning the current model of Big Brother surveillance on its head. Algorithms will be able to measure the effectiveness of our public services and national infrastructures, do statistical analysis, provide deep insight and make recommendations. The business of running a government, which today seems to be a mix of guesswork, political ideology and public relations management, will start to become less of a religion and more of a science, backed up with real data. It won't be a technocracy - but it will be leveraging technology to effectively crowd-source government.  Which is what democracy is all about, after all.


[1] http://www.cbc.ca/news/world/how-the-defensive-shift-and-big-data-are-changing-baseball-1.2739619
[2] http://www.cbc.ca/news/politics/why-would-canada-spy-on-brazil-mining-and-energy-officials-1.1931465
[3] http://www.cnn.com/2013/09/27/politics/nsa-snooping/
[4] http://tools.ietf.org/html/rfc7258
[5] http://techcrunch.com/2013/10/11/icann-w3c-call-for-end-of-us-internet-ascendancy-following-nsa-revelations/
[6] https://www.fsf.org/blogs/community/gnu-hackers-discover-hacienda-government-surveillance-and-give-us-a-way-to-fight-back
[7] AnalyzeThe.US
[8] https://www.opensecrets.org/
[9] https://www.httpsnow.org/
[10] https://www.eff.org/https-everywhere
[11] http://en.wikipedia.org/wiki/Forward_secrecy#Perfect_forward_secrecy

Thursday, August 14, 2014

Repackaging node modules for local install with npm


If you need to install an npm package for nodejs from local files, because you can't or prefer not to download everything from the npmjs.org repo, or you don't even have a network connection, then you can't just grab an npm package tarball and do `npm install <tarball>`, because it will immediately try to download all its dependencies from the repo.

There are some existing tools and resources you can try:

  • npmbox - https://github.com/arei/npmbox
  • https://github.com/mikefrey/node-pac
  • bundle.js gist -  https://gist.github.com/jackgill/7687308
  • relevant npm issue - https://github.com/npm/npm/issues/4210

I found all of these a bit overwrought for my taste. If you prefer a simple DIY approach, you can just edit the module's package.json file, copy all of its dependencies over to the "bundledDependencies" array, and then run `npm pack` to build a new tarball that includes all the dependencies bundled inside (a small script that automates this is sketched after the steps below).

Using `forever` as an example:
  1. make a directory and run `npm init; npm install forever` inside of it
  2. cd into the node_modules/forever directory
  3. edit the package.json file
  4. look for the dependencies property
  5. add a bundledDependencies property that's an array
  6. copy the names of all the dependency modules into the bundledDependencies array
  7. save the package.json file
  8. now run `npm pack`. It will produce a forever-<version>.tgz file that has all its dependencies bundled in.
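
If you'd rather not edit the file by hand, a tiny Node script like this (run from inside the module directory) can do steps 3 through 7 for you; it's just a sketch:

// One-off script: run from inside the module directory (e.g.
// node_modules/forever) to copy dependencies into bundledDependencies,
// then run `npm pack` as usual.
var fs = require('fs');

var pkg = JSON.parse(fs.readFileSync('package.json', 'utf8'));

// Copy every runtime dependency name into bundledDependencies.
pkg.bundledDependencies = Object.keys(pkg.dependencies || {});

fs.writeFileSync('package.json', JSON.stringify(pkg, null, 2));

console.log('bundling:', pkg.bundledDependencies.join(', '));
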
Update: another proposal from the github thread (I haven't verified this yet):
  1. In an online environment, run `npm install --no-bin-link`. You will have an entire flattened node_modules
  2. Then, bundle this flattened node_modules with tar / zip / rar / 7z etc.
  3. In the offline environment, extract the bundle, and that's it


Thursday, May 29, 2014

JavaScript's Final Frontier - MIDI

JavaScript has had an amazing last few years. Node.JS has taken server-side development by storm. First person shooter games are being built using HTML and JavaScript in the browser. Natural language processing and machine learning are being implemented in minimalist JavaScript libraries. It would seem there's no area in which JavaScript isn't set to blow away preconceptions about what it can't do and become a major player.

There is, however, one area in which JavaScript - or more accurately the web stack and the engines that implement it - has only made a few tentative forays.  For me this represents a final frontier; the one area where JavaScript has yet to show that it can compete with native applications. That frontier is MIDI.

I know what you're probably thinking. Cheesy video game soundtracks on your SoundBlaster sound card. Web pages with blink tags and bad music tracks on autoplay. They represent one use case where MIDI was applied outside of its original intent. MIDI was made for connecting electronic musical instruments, and it is still very much alive and well. From lighting control systems to professional recording studios to GarageBand, MIDI is a key component of arts performance and production. MIDI connects sequencers, hardware, software synthesizers and drum machines to create the music many people listen to everyday. The specification, though aging, shows no signs of going away anytime soon. It's simple and effective and well crafted.

It had to be. Of all applications, music could be the most demanding. That's because in most applications, even realtime ones, the exact timing of event processing is flexible within certain limits. Interactive web applications can tolerate latency on their network connections. 3D video games can scale down their frames per second and still provide a decent user experience. At 30 frames per second, the illusion of continuous motion is approximated. The human ear, on the other hand, is capable of detecting delays as small as 6 milliseconds. For a musician, a latency of 20ms between striking a key and hearing a sound would be a show-stopper. Accurate timing is essential for music performance and production.

There's been a lot of interest and some amazing demos of Web Audio API functionality.  The Web MIDI API, on the other hand, hasn't gotten much support.  Support for Web MIDI has landed in Chrome Canary, but that's it for now.  A few people have begun to look at the possibility of adding support for it in Firefox.  Until the Web MIDI API is widely supported, interested people will have to make do with the JazzSoft midi plugin and Chris Wilson's Web MIDI API shim.
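
For the curious, using it looks roughly like this. It's only a sketch based on the current shape of the spec, which is still in flux, so the details may differ:

// Sketch: request MIDI access, grab the first available output, and play
// middle C for half a second.
navigator.requestMIDIAccess().then(function (midi) {
  var output = midi.outputs.values().next().value;
  if (!output) {
    return console.log('no MIDI outputs found');
  }
  output.send([0x90, 60, 0x7f]);                                  // note on, middle C, full velocity
  output.send([0x80, 60, 0x40], window.performance.now() + 500);  // note off, 500ms later
}, function (err) {
  console.log('MIDI access denied:', err);
});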

I remain hopeful that support for this API will grow, because it will open up doors for some truly great new creative and artistic initiatives.

Wednesday, May 7, 2014

REST API Best Practices 3: Partial Updates - PATCH vs PUT

This post is a continuation of REST API Best Practices 2: HTTP and CRUD, and deals with the question of partial updates.

REST purists insist that PATCH is the only "correct" way to perform partial updates [1], but it hasn't reached "best-practice" status just yet, for a number of reasons.

Pragmatists, on the other hand, are concerned with building mobile back-ends and APIs that simply work and are easy to use, even if that means using PUT to perform partial updates [2].

The problems with using PATCH for partial updates are manifold:
  1. Support for PATCH in browsers, servers and web application frameworks is not universal. IE8, PHP, Tomcat, Django, and lots of other software have missing or flaky support for it. So depending on your technology stack and users, it might not even be a valid option for you.
  2. Using the PATCH method correctly requires clients to submit a document describing the differences between the new and original documents, like a diff file, rather than a straightforward list of modified properties. This means the client has to do a lot of extra work - keep a copy of the original resource, compare it to the modified resource, create a "diff" between the two, compose some type of document showing the differences, and send it to the server. The server also has more work to apply the diff file. 
  3. There's no specification that says how the changes in the diff file should be formatted or what it should contain, exactly. The RFC simply says:
    "With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version."
    One early recommendation for using PATCH is the JSON Patch RFC [3]. Unfortunately, the spec overly complicates updating. I describe a much simpler alternative below, which works with either PATCH or PUT.

Pragmatic partial updates with PUT

Using PUT for partial updates is pretty simple, even if it doesn't conform strictly to the concept of Representational State Transfer.  So a fair number of programmers happily use it to implement partial updates on back-end mobile API servers. It's fair to say that when developing an API, a pragmatic approach that focuses on the needs of mobile client applications is completely reasonable.

Current "best practices" when using PUT for partial updates, as I see it, is this: When you PUT the update:
  1. Include the properties to be updated, with their new values
  2. Don't include properties that are not to be updated
  3. Set properties to be 'deleted' to null
The reality is that most data is going to be stored in a database that has an implicit or explicit schema describing what sort of data your application is expecting. If you're using a relational database, this will end up being columns in your database tables, some of whose values may be null. In this scenario it makes perfect sense to "delete" properties by setting them to null, since the database columns are not going to disappear in any case. And for those who use a NoSQL database, it's not a stretch to actually delete nullified properties, as in the sketch below.
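
Here's a rough sketch of what this looks like on the server side, using Express and body-parser with a hypothetical in-memory member object standing in for a real data store:

// Sketch: pragmatic partial update with PUT. Properties in the request body
// overwrite the stored values; properties sent as null are "deleted".
var express = require('express');
var bodyParser = require('body-parser');
var app = express();
app.use(bodyParser.json());

// Hypothetical in-memory resource standing in for a database row.
var member = { id: 1, firstname: 'john', lastname: 'doe', active: true };

app.put('/members/:id', function (req, res) {
  Object.keys(req.body).forEach(function (key) {
    if (req.body[key] === null) {
      member[key] = null;          // or delete member[key] in a NoSQL store
    } else {
      member[key] = req.body[key];
    }
  });
  res.json(member);
});

app.listen(8080);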

Update: This pragmatic approach to updates is used by a number of exemplary SaaS companies, including Github. It can also be used with the HTTP PATCH method, and it has now been formalized in RFC 7386 JSON Merge Patch [4].

Further reading

1. http://williamdurand.fr/2014/02/14/please-do-not-patch-like-an-idiot/
2. http://techblog.appnexus.com/2012/on-restful-api-standards-just-be-cool-11-rules-for-practical-api-development-part-1-of-2/
3. http://tools.ietf.org/html/draft-ietf-appsawg-json-patch-07
4. https://tools.ietf.org/html/rfc7386 

Monday, April 7, 2014

REST API Best Practices 2: HTTP and CRUD

This post expands a bit further on the REST API Cheat Sheet regarding HTTP operations for Create / Read / Update / Delete functionality in REST APIs.

APIs for data access and management are typically concerned with four actions (the so-called CRUD operations):
  • Create - the ability to create a resource
  • Read - the ability to retrieve a resource
  • Update - the ability to modify a resource
  • Delete - the ability to remove a resource

CRUD operations don't have a perfect, 1-to-1 mapping to HTTP methods, which has led to different opinions and implementations, but the following list represents best practice as I see it in the industry today, and follows the HTTP specification:

CRUD Operation    HTTP Method
Create            POST
Read              GET
Update            PUT and/or PATCH
Delete            DELETE

To reiterate, HTTP methods can be used to implement CRUD operations as follows (a minimal route sketch follows the list):
  • POST - create a resource
  • GET - retrieve a resource
  • PUT - update a resource (by replacing it with a new version)*
  • PATCH - update part of a resource (if available and appropriate)*
  • DELETE - remove a resource
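
As a rough sketch, the mapping looks something like this in an Express app (stub handlers only; a real implementation would read and write a data store):

// The CRUD-to-HTTP mapping as Express routes.
var express = require('express');
var app = express();

// Stub handler so the sketch runs.
function notImplemented(req, res) {
  res.status(501).json({ error: 'Not Implemented' });
}

app.post('/members', notImplemented);        // Create
app.get('/members/:id', notImplemented);     // Read
app.put('/members/:id', notImplemented);     // Update (replace)
app.patch('/members/:id', notImplemented);   // Update (partial)
app.delete('/members/:id', notImplemented);  // Delete

app.listen(8080);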

Although PATCH is considered the officially correct and "RESTful" way to do partial updates, it has yet to gain wide adoption. Many popular web application frameworks don't support the PATCH method yet, so in practice, it is not uncommon to use PUT for partial updates even though it's not strictly "RESTful". The decision to use PUT vs. PATCH for partial updates is driven by the capabilities of your framework of choice (Rails only recently introduced PATCH, for example) and by the practical requirements of building web/mobile back-end services that actually work and are easy to use, even if they don't satisfy REST purists. More on this in the next post.

 

Safe and Idempotent Methods

 

The HTTP 1.1 specification defines "safe" and "idempotent" methods [1].  Safe methods don't modify data on the server no matter how many times you call them. Idempotent methods can modify data on the server the first time you call them, but repeating the same call over and over again won't make any difference. Here's a partial list:

Method    Safe    Idempotent
GET        ✓         ✓
HEAD       ✓         ✓
PUT        ×         ✓
PATCH      ×         ✓
DELETE     ×         ✓
POST       ×         ×

The safe and/or idempotent nature of these HTTP methods provides some further insight into how they ought to be used. Notice that POST is neither safe, nor idempotent. A successful POST should create new data on the server, and repeating the same call should create even more copies on the server. GET, on the other hand, is safe and idempotent, so no matter how many times you call it, the data on the server shouldn't be affected.

GET - use it to fetch resources, but don't "tunnel" request parameters through to the server as a way to alter the state of data on the server - as a "safe" method, calling GET shouldn't have side effects.

PUT - use it to update an existing resource by replacing it with a new representation. The data you PUT to the server should be a complete replacement for the specified resource. Although PUT can in theory be used to insert new resources, in practice it's not advisable. Note that after the first PUT request, repeatedly calling the same PUT method with the same data won't change the data on the server more than it already has been (a condition of idempotent methods).

PATCH - if this method is available and well supported in both your client and server side technology stack (i.e. Rails 4), consider using it to update part of an existing resource by changing some of its properties, following the recommendations of the framework for how to submit the change descriptions. The PATCH method isn't supported everywhere and isn't common enough to be considered a current best practice, but the industry seems to be moving this way and technically it's the correct way to provide partial updates according to the HTTP spec [2].

If your server, framework or client user base (IE8, etc.) doesn't support PATCH, rest assured that many developers take the pragmatic approach and simply bend the rules to use PUT for partial updates [3]. I'll cover this in the next post in more detail. Note that, no matter how you do your partial update, it should be atomic; that is, once the update has started, it should not be possible to retrieve a copy of the resource until the update has been fully applied.

POST - use it to create new resources. The server should create a unique identifier for each newly created resource. Return a 201 Created response if the request was successful. The unique ID should be returned in the response; it has been suggested to use the Location header of the response for this, but for most client applications it will be more practical to return the ID in the body of the response. For this reason, current best practice appears to be to populate the Location header with the URL of the newly created resource, and also return a representation of the resource in the response body that includes its id and/or URL. POST is also frequently used to trigger actions on the server which technically aren't part of a RESTful API, but provide useful functionality for web applications.
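
A sketch of such a POST handler, assuming the same kind of Express setup as the earlier sketches (body parsing, and a hypothetical in-memory members array with a nextId counter):

app.post('/members', function (req, res) {
  var member = {
    id: nextId++,                       // hypothetical counter for new IDs
    firstname: req.body.firstname,
    lastname: req.body.lastname
  };
  member.href = '/members/' + member.id;
  members.push(member);

  // 201 Created, Location header pointing at the new resource, and a
  // representation (including its id and href) in the response body.
  res.status(201).location(member.href).json(member);
});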

DELETE - use it to delete resources; it's pretty self-explanatory.

More posts in this series

REST API Best Practices 1: A REST Cheat Sheet
REST API Best Practices 3: Partial Updates - PATCH vs. PUT
REST API Best Practices 4: Collections, Resources and Identifiers


[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
[2] http://stackoverflow.com/questions/19732423/why-isnt-http-put-allowed-to-do-partial-updates-in-a-rest-api
[3] http://techblog.appnexus.com/2012/on-restful-api-standards-just-be-cool-11-rules-for-practical-api-development-part-1-of-2/  

Friday, March 21, 2014

REST API Best Practices: a REST Cheat Sheet

I'm interested in REST API design and identifying the best practices for it. Surprisingly, a lot of APIs that claim to be RESTful, aren't. And the others all do things differently. This is a popular area, though, and some best practices are starting to emerge.  If you're interested in REST, I'd like to hear your thoughts about best practices.

REST is not simply JSON over HTTP,  but most RESTful APIs are based on HTTP. Request methods like POST, GET, PUT and DELETE are used to implement Create, Read, Update and Delete (CRUD) operations. The first question is how to map HTTP methods to CRUD operations.

To start, here's a "REST API Design Cheat Sheet" that I typed up and pinned to my wall. It's based on the book "REST API Design Rulebook" and the HTTP RFC. I think it reflects standard practice. There are newer and better books on the subject now, but this list covers the basics of HTTP requests and response codes used in REST APIs.

Request Methods

  • GET and POST should not be used in place of other request methods
  • GET is used to retrieve a representation of a resource
  • HEAD is used to retrieve response headers
  • PUT is used to insert or update a stored resource
  • POST is used to create a new resource in a collection
  • DELETE is used to remove a resource

Response Status Codes

  • 200 "OK" indicates general success
  • 200 "OK" shouldn't be used to return error messages
  • 201 "Created" indicates a resource was successfully created
  • 202 "Accepted" indicates that an asynchronous operation was started
  • 204 "No Content" indicates success but with an intentionally empty response body
  • 301 "Moved Permanently" is used for relocated resources
  • 303 "See Other" tells the client to query a different URI
  • 304 "Not Modified" is used to save bandwidth
  • 307 "Temporary Redirect" means resubmit the query to a different URI
  • 400 "Bad Request" indicates a general failure
  • 401 "Unauthorized" indicates bad credentials
  • 403 "Forbidden" denies access regardless of authentication
  • 404 "Not Found" means the URI doesn't map to a resource
  • 405 "Method Not Allowed" means the HTTP method isn't supported
  • 406 "Not Acceptable" indicates the requested format isn't available
  • 409 "Conflict" indicates a problem with the state of the resource
  • 412 "Precondition Failed" is used for conditional operations
  • 415 "Unsupported Media Type" means the type of payload can't be processed
  • 500 "Internal Server Error" indicates an API malfunction
A note about the PATCH method. There are good reasons to consider using the HTTP PATCH method for partial updates of resources, but because it's not supported everywhere, and because there are workarounds, I haven't added it to my cheat sheet yet.

Other Posts in this series

REST API Best Practices 2: HTTP and CRUD
REST API Best Practices 3: Partial Updates - PATCH vs. PUT
REST API Best Practices 4: Collections, Resources and Identifiers

Tuesday, March 11, 2014

When Agile Went Off the Rails


Whenever I hear a company say "We follow an agile development process", I can't help but wince a little. The core ideas of agile development are excellent, but somewhere along the way it accumulated quite a lot of codified process, and became its own formal methodology - almost the same thing the Agile Manifesto was trying to counteract. It's not too surprising, since the agile manifesto didn't prescribe any particular project management methodology for implementing its guidelines. So naturally it wasn't long before management professionals began to formalize agile philosophy into a methodology of their own.

Now one of the original authors of the Agile Manifesto has come out with a piece, originally titled "Time to Kill Agile", in which he makes the point that a formal methodology runs counter to the original goals of the agile development concept. Dave Thomas has been hugely influential in the software development field. Aside from being one of the authors of the agile manifesto, he's written a lot of other stuff, and he's the guy who coined the phrases "code kata" and "DRY" (Don't Repeat Yourself - the maxim developers follow to effectively organize their code).  He later renamed the piece "Agile Is Dead (Long Live Agility)!", which is a better reflection of his current thinking on Agile processes vs agile development's underlying goals.

Being a critic of Agile is risky; in many cases it seems to have improved the effectiveness of teams a lot. It's working for a lot of people, and they like it.

But having one of the original authors of the Agile Manifesto come out with this kind of criticism of agile methodology makes a certain amount of healthy skepticism seem appropriate.

Agile software development, according to Dave Thomas, can't be implemented as a set of methodologies, and the managers, consultants and companies that have sprung up around Agile have shown a certain level of disregard for what the authors of the Agile Manifesto intended in the first place.

Dave Thomas has some good advice for teams that want to develop software with agility. He advocates an iterative approach to development, and choosing options that enable future change. He recommends thinking of "agile" in the form of an adverb (agilely, or "with agility"). Programming with agility. Teams that execute with agility.

I've found that when it comes to managing a project, simple is usually better. What's worked best in my experience is, in a nutshell, to simply encourage communication. Make sure everyone understands the overall objective, how they can contribute to it, what progress has been made and what challenges remain, and importantly, give everyone the opportunity to have their work fully recognized and appreciated on a regular basis. Given the opportunity to work on a challenging project and the chance to have their contributions seen and appreciated by colleagues, most developers will bend over backwards to do their best.

One technique I found effective was a brief (timed with a hard stop) Monday morning meeting where we laid out the objectives for the week ahead, a quick information-gathering hike around the office at the end of the week, and an email re-cap on Friday afternoon highlighting the team's progress. Showing the percentage-towards-completion of major tasks was also a big motivator, as developers began to take pride in seeing their areas of responsibility make steady, visible progress towards completion.  We didn't formalize or get locked into one way of doing it, so when our company got acquired and our management structure changed, we adapted pretty easily.

Perhaps this isn't too far away from the way agile methodology is practiced "by the book". Regardless of the methodology, it's worth noting that the Agile Manifesto wasn't really a call to implement any particular process. It had broader goals in mind:
  • People over processes.
  • Working software over documentation.
  • Collaboration over contracts.
  • Adaptability over planning.

Saturday, February 22, 2014

Itsukushima Jinja

UNESCO World Heritage Site, Itsukushima Shrine, Hatsukaichi, Hiroshima Prefecture, Japan.

Itsukushima Jinja
Itsukushima Jinja
© 2014 Darren DeRidder. Originally uploaded by 73rhodes
