Faster JavaScript Memoization For Improved Application Performance

Whilst not new by any means, memoization is a useful optimization technique for caching the results of function calls such that lengthy lookups or expensive recursive computations can be minimized where possible.

The basic idea is that if you can detect that an operation has already been completed for a specific set of input values, the stored result can be returned instead of the operation being repeated.

Some of the problems memoization can help optimize include recursive mathematics, caching calculations for canvas animation and, more generally, any problem that can be expressed as a set of calls to the same function with combinations of argument values that repeat.

 

A Brief History

 

If you're a history buff, you may be interested to know that the term 'memoization' was coined in 1968 by Donald Michie, a scientist researching artificial intelligence – it's derived from the Latin word memorandum ('to be remembered'). Michie proposed using memoized functions to improve the performance of machine learning, and the technique has become more popular in recent years with the rise of functional languages such as Haskell and Erlang.

There's a plethora of excellent material that's been previously written about memoization and for further reading I recommend these pieces by John Hann and Nicholas Zakas. Douglas Crockford also briefly touched upon memoization in JavaScript: The Good Parts as does Stoyan Stefanov in JavaScript patterns.

 

Implementing Memoization – The Basics

 

Memoization is a concept that pre-dates the JavaScript programming language, but we can simply think of it as remembering the results for inputs of a specific value to a function.

When we say 'remember', we mean utilizing a form of cache to store a record of the computed output of such specific inputs so that if a particular function is subsequently re-queried with the same input, the remembered result is returned rather than a recalculation being necessary. A lot of the time, a simple array is employed for the cache table, but a hash or map could be just as easily used.
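As a minimal sketch of this remember-and-return idea (the function and cache names here are purely illustrative), a plain object works well as the cache table:

```javascript
// A plain object serves as the cache table; keys are the inputs,
// values are the previously computed results.
var squareCache = {};

function memoSquare(n) {
    if (n in squareCache) {
        return squareCache[n]; // remembered result: no recalculation needed
    }
    return squareCache[n] = n * n; // compute once, store for next time
}

memoSquare(4); // computed and stored
memoSquare(4); // returned from the cache
```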

In most of the JavaScript implementations for memoization that I've reviewed, the memoization of a function (with a set of arguments) is conducted within an inner returned function whilst a signature or hash is employed for indexing the stored result.

One of the more easily readable memoization implementations can be seen below. This one is by zepto.js author, Thomas Fuchs. Whilst I prefer for memoization to be kept as a utility outside of the function prototype, this implementation does do a great job of clearly demonstrating the core concepts.

Function.prototype.memoize = function(){
    var self = this, cache = {};
    return function( arg ){
        if (arg in cache) {
            console.log('Cache hit for ' + arg);
            return cache[arg];
        } else {
            console.log('Cache miss for ' + arg);
            return cache[arg] = self(arg);
        }
    };
};
// a hypothetical usage example

// imagine having a function, fooBar, as follows
function fooBar(a){ ... }
// once a memoizer is available, usage is as simple as
// applying it to our function as follows:
var memoFooBar = fooBar.memoize();
// note in other implementations, this would probably be
// memoize(fooBar);
// we now use memoFooBar for all calls to the original
// function, which will now go through the memoizer instead.
memoFooBar(1);
// the above call isn't cached, so any computations necessary
// are conducted. This result is then cached for next time.
memoFooBar(1); // will now return the cached result
memoFooBar(2); // whilst this will need to be calculated

The approach taken to actually caching results varies more widely. The majority of developers prefer to maintain the cache outside of the returned function, whilst others (Stoyan does this too) add custom properties directly to the function being memoized.

With respect to the latter, developers such as Hann consider it best practice to avoid interfering with a user's code where possible. If a cache property is assigned to the function instance under a particular name, the code calling your memoizer is prevented from using a similarly named property on the same instance. Whilst the chance of this overlap actually occurring is slim, it's something to be aware of.
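To illustrate the overlap Hann warns about, here's a contrived sketch (all names here are hypothetical) in which a memoizer that claims a `cache` property clobbers one the calling code already relies on:

```javascript
// The user's own code keeps state under a .cache property.
function getConfig(key) {
    return getConfig.cache[key];
}
getConfig.cache = { theme: 'dark' };

// A careless memoizer that also claims .cache silently
// overwrites the user's existing property.
function naiveMemoize(fn) {
    fn.cache = {}; // clobbers any pre-existing fn.cache
    return function (arg) {
        return (arg in fn.cache) ? fn.cache[arg] : (fn.cache[arg] = fn(arg));
    };
}

getConfig('theme');      // 'dark'
naiveMemoize(getConfig);
getConfig('theme');      // now undefined - the property was overwritten
```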

That said, I spent some time this week comparing a number of different memoization implementations and you may be interested to find out how a version that does hang its cache on a user's function performs compared to those that keep it separate.

 

A Faster Implementation?

 

Just for fun, I put together a derivative implementation of memoization based on some prior work by @philogb, which should work fine for most trivial problems.

Some of the specific improvements I made include adding object-argument testing and switching out the typeof check suggested in his post comments (for determining whether or not to serialize arguments passed as objects into JSON text) for the more reliable Object.prototype.toString.call(). This update also covers JSON-stringifying arguments that are objects. For more information on this, feel free to read my post from a few weeks ago covering it.

Update: Mathias Bynens and Dmitry Baranovsky have provided some further optimizations to the implementation, which I've updated most tests to now also use.

Beyond this, much of the code is self-explanatory. args is a version of the function arguments that have been coerced into a JavaScript array, hash is handling serialization as covered above, fn.memoize is our cache and the final return is performing the cache lookup, storing the result of a particular set of input values if not already present.

/*
* memoize.js
* by @philogb and @addyosmani
* with further optimizations by @mathias
* and @DmitryBaranovsk
* perf tests: http://bit.ly/q3zpG3
* Released under an MIT license.
*/
function memoize( fn ) {
    return function () {
        var args = Array.prototype.slice.call(arguments),
            hash = "",
            i = args.length,
            currentArg = null;
        fn.memoize || (fn.memoize = {});
        while (i--) {
            currentArg = args[i];
            hash += (currentArg === Object(currentArg)) ?
                JSON.stringify(currentArg) : currentArg;
        }
        return (hash in fn.memoize) ? fn.memoize[hash] :
            (fn.memoize[hash] = fn.apply(this, args));
    };
}
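To see this in action with an object argument, here's a hypothetical usage example (the implementation is repeated so the snippet stands alone, and sumProps is an illustrative stand-in for any expensive function):

```javascript
// The memoizer from above, repeated so this example is self-contained.
function memoize(fn) {
    return function () {
        var args = Array.prototype.slice.call(arguments),
            hash = "",
            i = args.length,
            currentArg = null;
        fn.memoize || (fn.memoize = {});
        while (i--) {
            currentArg = args[i];
            hash += (currentArg === Object(currentArg)) ?
                JSON.stringify(currentArg) : currentArg;
        }
        return (hash in fn.memoize) ? fn.memoize[hash] :
            (fn.memoize[hash] = fn.apply(this, args));
    };
}

// A hypothetical function to memoize; callCount lets us
// observe how often the real computation actually runs.
var callCount = 0;
function sumProps(obj) {
    callCount++;
    var total = 0, key;
    for (key in obj) { total += obj[key]; }
    return total;
}

var memoSum = memoize(sumProps);
memoSum({ a: 1, b: 2 }); // computed; keyed on the JSON-serialized argument
memoSum({ a: 1, b: 2 }); // cache hit: identical serialized hash
```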

Now you'll notice that this implementation is making use of custom function properties, which I've previously said should be avoided if possible. So, why have I opted to show you this implementation instead of re-writing it to follow best practices?

Well, I collected together what I considered were some of the more robust memoization implementations I could find (including two from Fuchs and Hann) and when I benchmarked them on jsPerf using a common memoization problem, the above code (interestingly) outperformed the others by a respectable factor.

I was very surprised by this myself and double-checked the tests, had others review them and tested the output, but it would at the moment appear that this is one of the better memoization implementations available.

Now, I say *appear* as although the jsPerf test results favor this solution, my own offline tests conclude that at least 2 of the other implementations almost match it in performance.

Let us now review how memoization can be applied to optimizing a problem.

 

Benchmarking

 

The Fibonacci Series

 

Conventionally, many of the implementations of memoization you'll find have used the well-known Fibonacci series to demonstrate how it can help with recursive solutions to problems.

In case you're a little rusty on your high-school math, the Fibonacci series looks a little like this: 1,1,2,3,5,8,13,21,34.. etc.

Here, each number after the second is the sum of the two numbers before it; that is, fib(x) = fib(x-1) + fib(x-2). A classic Fibonacci problem might therefore be to discover what the 20th number in the series is.

Below you can see an implementation of a base function for calculating Fibonacci numbers that relies on recursion in order to generate the correct result.
 

function fib( x ) {
    if (x < 2) return 1; else return fib(x - 1) + fib(x - 2);
}

/* console profiling test */
var fibTest = memoize(fib);
// first call (not yet cached)
console.time("uncached call");
console.log(fibTest(20)); // 10946
console.timeEnd("uncached call");

// let's see how much better the cached lookup fares..
console.time("memoized call");
console.log(fibTest(20));
// ~0ms in most cases (already cached)
console.timeEnd("memoized call");

 

jsPerf Testing

 

In the jsPerf test, a similar setup was used for comparing the performance of the implementation presented earlier with five others (including the memoization implementation found in Underscore.js). As you can see by the BrowserScope results listed, the performance difference (for this particular test) was favorable towards our implementation, particularly in Chrome and Firefox.

Note 1: If you're interested in comparing the performance of the actual cached value being accessed and used rather than just returned, tests for this can be found here (and also favor the newer implementation in many cases).

Note 2: Whilst not compared in this test, I previously benchmarked the effectiveness of all implementations vs. just calling fib(N) directly and the average improvement in lookup time was approximately 70%. Later in the post we'll take a look at this more closely.

If we continue working under the assumption that this is (for now) one of the more optimal approaches to memoization in the tested set of implementations, the next obvious questions are: why is this, and is there anything better?

 

Why is this faster?

 

I was genuinely curious as to why this particular implementation was being reported as performing better than the alternatives, so let's try exploring why.

 

Custom props on function instance vs. separate variables

 

My initial theory was that maintaining the cache as a custom property on the function instance (as opposed to within a separate variable) may be resulting in more optimal property lookup times. If so, this would be interesting as I would have guessed that given both the instance and the external variable are objects, the difference in lookup times would be negligible. We can of course benchmark this theory just to see whether there is indeed a noticeable difference (jsPerf test here).

Curiously, in most browsers it would appear that the converse is in fact true. Using a separate variable performs more favorably in a simple property lookup test in Chrome, Firefox and Opera.
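Reduced to their essentials, the two lookup styles being compared look something like the following sketch (the names are hypothetical; the real benchmark is the jsPerf test linked above):

```javascript
// Style 1: cache held in an outer variable, reached via the scope chain.
var externalCache = { 10: 89 };
function lookupExternal(n) {
    return externalCache[n];
}

// Style 2: cache hung off the function instance, reached via a
// property lookup on the function object itself.
function lookupProperty(n) {
    return lookupProperty.cache[n];
}
lookupProperty.cache = { 10: 89 };
```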

 

Let's review the code once more

 

Reverting to @philogb's variation without the additional type checks, we can see that the bulk of the memoization logic lies in the final ternary expression being evaluated near the end of the returned function:

function memoize( f ) {
    return function () {
        var args = Array.prototype.slice.call(arguments);

        // we've confirmed this isn't really influencing
        // speed positively
        f.memoize = f.memoize || {};

        // this is the section we're interested in
        return (args in f.memoize) ? f.memoize[args] :
            (f.memoize[args] = f.apply(this, args));
    };
}

Whilst I won't be reproducing all of the implementations this is benchmarked against in this post, you're free to review them in the JavaScript prep section of this jsPerf test. One thing you'll notice is that the logic used for memoization itself is significantly more complex in many of the alternative solutions, with quite a few opting for additional conditional checks, in some cases unnecessarily.

To the best of my knowledge, this is the reason why this version performs better. It's simple and gets the job done.

 

 

Is there something that performs more optimally?

 

In my opinion, maintaining type-checking to determine the need for argument stringification is quite important. If, however, you feel this is overkill, a simpler solution that:

  1. Does away with type-checking
  2. Declares the function to be cached within the memoization implementation itself (as opposed to passing it in)
  3. Only accepts one argument as opposed to many (which may work fine for some situations and should for our Fibonacci test)

is this version from Stoyan's 2009 JSConf talk:

  function memoize( param ){
      if (!memoize.cache) {
          memoize.cache = {};
      }
      if (!memoize.cache[param]) {
          var result = fib(param); //custom function
          memoize.cache[param] = result;
      }
      return memoize.cache[param];
  }

The jsPerf test comparing my version to the above can be found here. As you can see from the below results, Stoyan's implementation performs better in all browsers being tested as a result of these differences:

You may also be interested in taking a look at the tests comparing directly calling fib(N) in succession (blue) as opposed to using either our (Memoize2 – red) or Stoyan's (Memoize7 – orange) solutions:

If a simple, compact implementation is what you're after, you may be interested in two variations of memoize.js that I've included below.

The first includes the minimal code required for memoization (cache + memoizer expression) using custom function properties. The second makes use of an external cache that is passed through to the memoizer, in case inline property definition isn't quite your thing.

In both cases, as you can see in the jsPerf comparison chart, a roughly equivalent level of performance is possible regardless of implementation. This leaves you free to select the implementation style you feel most comfortable with, but remember that these compact versions are only recommended for simple single-argument scenarios. For anything more complex, I recommend the complete implementation presented earlier.

// compact memoizer
// note: relies on non-strict code, where `this` in a plain
// function call refers to the global object
function memoize1(param) {
   this.memoize = this.memoize || {};
   return (param in this.memoize) ?
       this.memoize[param] : (this.memoize[param] = fib(param));
}

// usage: memoize1(25);


// using an external cache
var memCache = {};
function memoize2(param, cache) {
   cache = cache || {};
   return (param in cache) ? cache[param] : 
   cache[param] = fib(param);
}

// usage: memoize2(25, memCache)


// stoyan's memoizer
function memoize3(param) {
   if (!memoize3.cache) {
       memoize3.cache = {};
   }
   if (!memoize3.cache[param]) {
       var result = fib(param); //custom function
       memoize3.cache[param] = result;
   }
   return memoize3.cache[param];
}

// usage: memoize3(25)

The jsPerf test for the above can be found here.

 

Practical Applications: Canvas

 

You'll often read about memoization being used to optimize complex algorithms, but it can also be used in visual applications with simpler routines, such as those you might find in a lot of the Chrome canvas experiments.

Here's an example: Andrew Hoyer created an interactive canvas-based cloth simulation a while back that calculates reactionary point-data related to particular points in the cloth a user has clicked.

Every line in the cloth simulation is called a constraint and every point is a point mass (an object with no dimension, just location and mass). All the constraints do is control the distance between each point mass.

Rather than having to recalculate point-data if a user clicks on the same location twice, I've modified his demo to make use of memoization to cache calculations that have been previously made.

Clicking on a single point once caches the calculations for it, whilst clicking on it a second time (try double-clicking) will load from the memoized cache instead. A simple logger will let you know whether the cache has been hit or not lower down in the demo.
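The approach can be sketched roughly as follows (the calculation and names below are hypothetical stand-ins for the simulation's real point math):

```javascript
// Cache point calculations keyed on the serialized click coordinates.
var pointCache = {};

function getPointData(x, y) {
    var key = x + ',' + y;
    if (key in pointCache) {
        console.log('Cache hit for ' + key);
        return pointCache[key];
    }
    console.log('Cache miss for ' + key);
    // an expensive reactionary point calculation would go here
    return pointCache[key] = { x: x, y: y, dist: Math.sqrt(x * x + y * y) };
}

getPointData(3, 4); // first click: calculated and cached
getPointData(3, 4); // second click on the same point: served from cache
```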

 

Memoization Pros

 

The benefits memoization has to offer are reduced execution time and potential improvements to overall performance, depending on what you're trying to achieve. It's particularly useful where you've detected repetition in your application (eg. you're repeating the same computations as part of a larger operation, or you're regularly making some of the same XHR calls).

In general it works best when a function is deterministic: passing a known set of parameter values always produces the same result (eg. f(3,4) always returns the same value for those inputs). It can't, however, be applied to cases where it doesn't. I've also personally found it quite helpful when working with recursive functions for drawing complex shapes (such as trees and fractals) to canvas, so its utility extends beyond applications that aren't especially visual or universally exciting.

 

Cons

 

With this in mind, memoization isn't appropriate if the method's output changes over time, if it's unlikely to be called again with the exact same set of parameters, or if the function is simple enough that there isn't much to be gained, since memoization comes with an overhead. A trivial (if silly) example of where memoization isn't appropriate is fetching stock market data, as it regularly gets updated. Fetching a group of friends' basic profile data, however, would make more sense with this technique as it's unlikely to change during a user's session.
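As a contrived sketch of the stock-data pitfall (all names here are hypothetical), memoizing a function whose output changes over time simply serves stale results:

```javascript
var quoteCache = {};
var fakePrice = 100; // stands in for a live market feed

// Memoizing a time-varying lookup freezes its first answer.
function getQuote(symbol) {
    return (symbol in quoteCache) ? quoteCache[symbol] :
        (quoteCache[symbol] = fakePrice);
}

getQuote('ACME'); // 100 is fetched and cached
fakePrice = 105;  // the "market" moves...
getQuote('ACME'); // ...but the memoized result is still 100
```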

 

Conclusions

 

Use of memoization generally needs to factor in what you're trying to do, the memory currently being consumed by recursive/iterative functions, and so on. You'll find that the more likely a task is to be repeated in your application, the greater the potential benefit memoization has to offer. In closing, remember that memoization is powerful, but needs to be used carefully.

It can help cache the results of heavy queries quite well, but this comes at the cost of possibly storing large datasets (ie. you're trading memory for speed).

23 Comments

  1. Addy, there’s one serious flaw with your “Comparison of memoization” test.
    Look closely at this code:

    function fib(x) {
    if(x < 2) return 1; else return fib(x-1) + fib(x-2);
    }
    var fibTest1 = memoize1(fib);
    console.log(fibTest1(20),'test1-norm');

    This code will run fib function through memoizer only once.. all the following recursive calls call fib directly.

    I was dragged to this article because just few days ago I worked on memoize function on my side. However I wanted to have generic one that works for any type of arguments and for various functions that can take different number of arguments. Result is: https://github.com/medikoo/es5-ext/blob/master/li

    I’ve fixed your test and added mine version to it: http://jsperf.com/comparison-of-memoization-imple… on Chrome and FF it shows as 3/4th in performance. Considering fact that unlike others it’s completely generic makes me quite happy :)

• Hey @medikoo. Thanks for reporting this! I updated the post to reflect these changes yesterday and I've recalibrated all tests to ensure that everything being presented *should* now be accurate.

      Thankfully in most cases the original hypothesis of this newer implementation performing a little better still hold, but I liked your take on a more generic memoizer! Great work :)

      I'm currently working on expanding the tests to cover a few more access scenarios as those have been requested on twitter.

  2. hi addy, great overview
    memoisation is also an excellent way to avoid repeated requests to the dom. GetElementById has a performance costs, and walking through the dom like sizzle does in IE6/7/8 is extremely expensive performance wise. (yes IE is crappy, but it is the worst browsers who require more performance efficient js)
    Of course, query results can be cached "manually", but for large web apps, with many devs, one might not know if an element/element set has already been requested in another part of the app. A memoiser is then a great generic way to build a cache.
    The danger however, is to have this cache growing with every cycle of actions in the app. Then the memoiser ends up being a huge memory leak. Therefore, specific attention must be given to emptying the cache from unnecessary data

    • Thanks for your thoughts on this, @olivvv. I think the last point you make is particularly important to remember. Unmemoization (removing unnecessary data from the cache) is equally as important and it's something I hope to factor in as an extension to this post at some point.

  3. Pingback: Elegant D » Faster JavaScript Memoization For Improved Application Performance

  4. Great article dude! Going to go right through the app I'm working on at the dayjob and see where we can make performance improvements with this.

    Btw – your tests indicated that underscore's memoize implementation is slow by comparison to the implementation you tested – do you still recommend using `underscore.memoize` anyway, given that the app already uses underscore quite heavily?

    Cheers!

    • My two cents are that if you're already using underscore, it may not make sense to drop in yet another piece of code, however if it's not already a part of your stack and you're solely looking for a memoization implementation, you may want to opt for either my implementation or Thomas Fuchs'. Hope that helps!

    • Nice! I actually wanted to factor in time for accessing the result into the tests (which could have probably just been done via a basic variable assignment), but I went for console.log instead. Thanks for sharing this. If I have time, I'll update the post to use this version of the perf test instead :)

  5. A little bit more optimisation can be made by getting rid of toString and friends with the Object constructor. I mean replacing this:

    callEm == toString.call(currentArg)

    with this

    currentArg == Object(currentArg)

    Great read anyways. Thank you.

  6. Pingback: Link-urile săptămânii 12-25 Septembrie | Staicu Ionuţ-Bogdan

  7. Hi,

    Nice article, but i think you should put code "fn.memoize || (fn.memoize = {});" at line 19 outside the loop.

    There are two reason for doing that.

    First, with more than one argument it's going to do this check for every argument.
    Second, if first call going to be without arguments, this function will break, since fn.memoize will be undefined.

    Cheers,
    Maks

  8. A very quick update: For anyone wishing to use the memoizer at a production level, we now have a whole set of new improvements and optimizations which have been added to it. We also now have tests included and documentation (outside of this post) is on the way. To follow our progress, checkout the project repo: https://github.com/addyosmani/memoize.js

  9. It seems to me that all the tests you have on jsperf share a simple flaw. You are calling `memoize` in the benchmarking loop.

    While it might be interesting to know how quickly `memoize` runs, presumably the important thing is how well the memoized function runs. Your tests conflate these two notions. Because running fib(20) is recursive, this might not seem like a major issue, but I think it is. There should be twenty calls to `fib` and one to `memoize` in each run of your tests. Since a call to `fib` involves one comparison, a return, and sometimes one or two lookups (now that it's properly memoized) and an addition, there is not that much overhead in the actual call. But some of the `memoize` functions might have more overhead to them.

    When I try to rearrange this so that only the memoized function is being called in the loop, I get entirely different performance characteristics:
    http://jsperf.com/another-memoization-comparison

    Am I missing something important here?

    • Hey Scott,

      Reviewing your notes, I maintain that these tests aren't flawed. We're purposefully benchmarking the combined memoization and lookup times as a part of this experiment. You have highlighted however a very interesting (and important) point – from the perspective of the latter, it's just as critical that one analyze the lookup times independently to gauge just how well this part of the process performs. Your jsPerf test indicates the Fuchs implementation to be the fastest here and we have no problem admitting that when it comes to the look-up phase, it is. I guess it boils down to where someone is hitting pain-points in their application, the actual memoization phase or (arguably, depending on what you're doing – the more important lookup phase). I'd advise others to checkout your test just to be aware of all the gotchas.

      Cheers!

      Addy

  10. Hi,

    In the “Faster Implementation” currentArg is implicitely declared, there is a “;” at the end of “i = args.length;” which make currentArg global

    I guess it should be in var block, correct ?

    JB

  11. I think another downside to using JSON.stringify as your caching function is circular references. You won’t be able to cache a ‘window’ object, for example.
