Memoization - is that even a word?


Photo of my chubby face smiling to you

Piotr Staniów • Posted on 28 Apr 20208 mins

#JavaScript

This article is a part of Learning by implementing series.

I've learnt it the hard way that, as it happens with many tools, memoization can be both a blessing and a curse. Using adequate memoization techniques can vastly boost the performance of your applications but when used improperly, it may also become a source of slowdowns.

We are often caught unaware by common pitfalls of memoization and some popular libraries will actually lure you into them. With that in mind, I've decided to write a few words — quite a few actually — about what memoization is and how to implement it. We'll review a full spectrum of options ranging from the most basic ones, frequently implemented in libraries, to more advanced ones, which all of course have their trade-offs.

What is function memoization?

Memoization is a performance optimisation technique, reducing time required to get a result of (computationally) expensive function. The technique allows to evade performing a computation, when a function is called with the same set of arguments as previously.

There are two important takeaways here. Firstly, the memoized function should be rather costly to run (either on CPU or on memory), otherwise the memoizing function could actually worsen it. Secondly, to actually benefit from it, the same arguments should be likely to be provided to the function more than once, yet not necessarily in a row.

The basic idea here is that we will be striving to implement memoize function that takes another function as an argument and returns a memoized variant of it.

That's a good moment to take a pause, and ask yourself — have you ever heard of thunks? They have recently gained some popularity, especially amongst React developers and that idea will be also useful here, so let's first answer…

What is a thunk?

Simply speaking, a thunk is a function returned by another function. Thunks are usually created to delay call to a function, optionally with inserting some code before or after the function. It may be used as well when not all arguments are available immediately.

Let's get our hands dirty with writing a few thunks:

function thunkFactory() {
    return function thunk() {
        console.log('I\'ve called a thunk and I liked it!');
    };
}

thunkFactory()(); // One way of calling a thunk
const aThunk = thunkFactory();
aThunk(); // And another way of calling a thunk

Notice that we have written a thunk and a thunk factory, which are both functions. To invoke the thunk, we need to first create it hence the doubled parentheses.

This pattern may also be leveraged when not all arguments are available at hand. Instead of invoking the function with all of them at once, we can gradually pass them as they arrive. Let's imagine, that we have an API that responds with a number and we want to increase the number by some factor that is only known in a completely different part of the project:

function sum(a) {
    return function thunk(b) {
        return a + b;
    };
}
const sumWith5 = sum(5);

// Let's print whatever API returns increased by 5!
fetch('/api')
    .then(sumWith5)
    .then(console.log);

How to implement simple function memoization?

In this part, we will combine our knowledge about memoization and thunks. Our goal here is to write a memoize function that fits into this pattern:

function memoize(functionToMemoize) {
    return function memoizedFunction() { // A thunk
        functionToMemoize();
    }
}

The goal here, however, is to ensure that functionToMemoize will be called as rarely as it is possible because it's a computationally expensive function.

A somewhat naive implementation, that we will start with, may assume that the function takes only a single argument (we name such functions unary). For the sake of simplicity, we will also only store the previous argument it was called with, and the previous result of a call.

function memoize(functionToMemoize) {
    let previousArg;
    let cachedValue;

    return function memoizedFunction(arg0) {
        // If functionToMemoize was called previously
        if (previousArg === arg0) {
            // Return previously returned value without the actual call
            return cachedValue;
        }
        // Otherwise do the expensive call
        const nextResult = functionToMemoize(arg0);
        // And store both argument and the result for further usage
        previousArg = arg0;
        cachedValue = nextResult;
        // Finally, return the retrieved value
        return nextResult;
    };
}

There are three main features of the memoize function we have written above:

  • The function to memoize is unary (receiving only a single argument).
  • Only a single call is memoized — when the argument changes, the previous result is lost (what a waste!).
  • Shallow comparison is performed, so calling the function with two objects of identical shape but created separately (of different references) results in calling the expensive function twice.

As we progress through this article, the memoize function will be tweaked and tuned, so that we actually get to a far more potent variant. Our goal is to get a version of it that:

  • Memoizes functions of any arity (those taking any number of arguments)
  • Remembers any number of calls or a fixed number of calls
  • Performs deep equality check so that we recognize correctly two objects are actually equivalent

With these goals in mind lets reflect on…

Building blocks of memoization

As we iterate over all those implementations, we will preserve a few following concepts behind that are going to prevail in all of them:

  • Cache — a place to store previously used arguments and corresponding values, that enables storing more than one call
  • Serializer — a function that takes fingerprints off objects, so that we correctly recognize two equivalent objects,
  • Strategy — that is the actual implementation of memoize, that combines a cache and a serializer in a certain way

Comparing two objects

In the previous implementation we haven't used any serializer. Therefore, we're only able to cache a single previous response, when exactly the same object is passed — with the same reference as previously. Nonetheless, this is not always the case and in fact, that's a rather rare scenario. To address that issue, we may modify our memoize function by adding a serializer, like the well-known JSON.stringify that serves this purpose just great:

const serializer = JSON.stringify;

function memoize(functionToMemoize) {
    let previousArg;
    let cachedValue;

    return function memoizedFunction(arg0) {
        // We will cache and compare a serialized argument
        const serializedArg0 = serializer(arg0);
        if (previousArg === serializedArg0) {
            return cachedValue;
        }
        const nextResult = functionToMemoize(arg0);
        previousArg = serializedArg0;
        cachedValue = nextResult;
        return nextResult;
    };
}

This way allows us to peek into the internal structure of an object. It is also relatively fast method, since it's implemented in the browser's native code. The trade-off here though is that JSON.stringify brings significant overhead for primitive values that don't require serialization.

"

A key thing to note here is that JSON.stringify is not sensitive to keys order. It means that { a: 1, b: 2 } and { b: 2, a: 1 } are considered two completely different objects, because their respective serialized strings are different. This can be worked around by sorting keys in object, or using an alternative library — for instance the JSON stable stringify.

When to use memoization?

Memoization comes at a cost. We're building here some wrappers certainly adding some complexity to the function. We're using a cache which, provided that we cache all calls, may grow uncontrollably. It all depends on the use case — even a trivial function may be memoized with benefits, if that's a Redux selector in React and it triggers a costly rerender of a complex component.

Nonetheless, a rule of thumb here is to memoize functions that are expensive to run, and carry out heavy computations. This also applies to a function implementing an algorithm that is running few operations but is heavy on memory.

Another requirement is that the function has to be a pure function. It means, that it should be deterministic with respect to its input, returning always the same values for the same arguments. It also means, that it has no side effects, like setting a variable outside of it — you can think of pure functions like if they were mathematical functions.

Memoization may be also useful in case of recursive computations, when the same input is frequently reused, for instance when you compute values of Fibonacci sequence.

How to memoize multiple calls of a function?

The idea here is to add a cache, which is responsible for keeping information about what was the response for given argument passed to a function. We can actually use a simple JavaScript object for that purpose, storing the serialized argument as a key to it, and the previously returned value as a value in this object.

Please bear in mind, that the cache has to be recreated with every memoize call, otherwise we would have a shared cache for all memoized functions regardless of what they do, which is probably not we aim for here. 😉

const serializer = JSON.stringify;

function memoize(functionToMemoize) {
    const cache = {}; // Re-created for every function memoized
    return function memoizedFunction(arg0) {
        const serializedArg0 = serializer(arg0);
        if (cache.hasOwnProperty(serializedArg0)) {
            return cache[serializedArg0];
        }
        const nextResult = functionToMemoize(arg0);
        cache[serializedArg0] = nextResult;
        return nextResult;
    };
}

How to memoize multiple arguments of a function?

The way of memoizing multiple arguments of a function may depend on requirements that we have for the memoize function. If we expect a somewhat slow arguments comparison, but performing in-depth lookup into objects, we can actually use our serializer on the array of arguments passed to the function:

const serializer = JSON.stringify;

function memoize(functionToMemoize) {
    const cache = {};
    return function memoizedFunction(...args) {
        const serializedArgs = serializer(args);
        if (cache.hasOwnProperty(serializedArgs)) {
            return cache[serializedArgs];
        }
        const nextResult = functionToMemoize(...args);
        cache[serializedArgs] = nextResult;
        return nextResult;
    };
}

This may however turn out to be a bit sluggish, in particular if multiple large objects are passed to the function, or if it always gets called only with primitive values. A more effective way would be to perform a shallow comparison for all arguments, to check if they match previously used arguments. Unfortunately, our cache (a plain object) is not adapted to storing any other key than a string, so we need to adapt the cache too.

One idea would be to store a list of tuples containing both a list of arguments and corresponding value. It is a naive approach, that takes long time to search but it certainly works:

const serializer = JSON.stringify;
const areArraysEqual = (arr1, arr2) => arr1.every((el, idx) => el === arr2[idx]);
// `notFound` is just some fixed value
const notFound = Symbol.for('@memoize/notFound');

function memoize(functionToMemoize) {
    const cache = {
        _calls: [],
        set(args, value) {
            this._calls.push([args, value]);
        },
        get(args) {
            const entry = this._calls.find(
                ([prevArgs]) => areArraysEqual(prevArgs, args)
            );
            return entry ? entry[1] : notFound;
        },
    };
    return function memoizedFunction(...args) {
        const previousValue = cache.get(args);
        if (previousValue !== notFound) {
            return previousValue;
        }
        const nextResult = functionToMemoize(...args);
        cache.set(args, nextResult);
        return nextResult;
    };
}

One thing that I probably need to explain myself from is use of Symbol.for — this function simply returns an unique value like no other, that is unlikely to be returned by the memoized function. We could instead use undefined or -1, provided that the function to memoize never returns undefined — which would be indistinguishable from value being not found.

A more performant solution

The cache presented above has one very serious problem — it requires searching through all previous calls to find the previous result of a call. However, the main reason we write memoize is to improve our performance, it's a performance optimisation technique after all, so we can't afford this kind of inefficiencies.

Another idea for the cache would be to create a recursively nested object, where key corresponds to one of the arguments it was called with. The top-level object would keep first arguments of all calls as keys. The value would either be the return value of the function or another cache object which, in turn, would keep second arguments of calls as keys, etc.

For a function call heavy('arg1', 'arg2', 'arg3') that returns 17, the cache structure after a first call would look like this one:

{
    "arg1": {
        "arg2": {
            "arg3": 17
        }
    }
}

Determining whether the cache structure has already encountered given arguments would be straightforward and relatively fast:

const cache = {
    _calls: {},
    get(args) {
        let lookup = this._calls;
        for (const arg of args) {
            if (!lookup.hasOwnProperty(arg)) {
                return notFound;
            }
            lookup = lookup[arg];
        }
        return lookup;
    }
}

I'm not using reduce here on purpose because it's slower 🚀

Unfortunately, this approach won't work for arguments that are objects, since the cache is an object too, and objects can't be used as keys. If you called a function twice with two different objects, they would be both converted to [Object object]. To work around this problem, we may use Map that allows storing objects and primitive values as keys.

The full implementation of the cache may look like this one:


function createCache() {
    return {
        _calls: new Map(),
        set(args, value) {
            let lookup = this._calls;
            for (let i = 0; i < args.length; i++) {
                const arg = args[i];
                const previousValue = lookup.get(arg);
                if (previousValue) { // Argument already cached
                    lookup = previousValue;
                } else if (i === args.length - 1) { // Last argument
                    lookup.set(arg, value);
                } else { // First encounter of argument at that position
                    const newLevel = new Map();
                    lookup.set(arg, newLevel);
                    lookup = newLevel;
                }
            }
        },
        get(args) {
            let lookup = this._calls;
            for (const arg of args) {
                if (!lookup.has(arg)) {
                    return notFound;
                }
                lookup = lookup.get(arg);
            }
            return lookup;
        },
    };
}

The memoize function remains the same apart from the fact that you now create a cache using createCache helper function from the above snippet.

Memory-wise optimisation

Until now, we're only focusing on avoiding as many calls of the original function as it's possible, by memoizing with respect to multiple arguments, objects and many calls. The latest cache implementation is still not ideal — it will grow out of control in size as we are calling the function more and more with new arguments. It will also prevent JavaScript's garbage collector for freeing unused resources, because cache is forever keeping a reference to objects that once were arguments to memoized function.

There are two possible improvements in that area: keeping track of how often or how recently given arguments were passed to the function, and removing those infrequent or stale. A hint here would be to experiment with LRU cache and the underlying algorithm.

Another improvement would be to use WeakMap that doesn't store a reference to an object, so it would not stop garbage collector from releasing unused objects.

However, WeakMap is kind of the opposite of an object — its keys cannot be primitive values. So using that as a data structure powering our cache would result in not being able to call the function with anything that is not an object. We could however, use a trick and have a combined nested cache structure that would use objects for primitive keys and WeakMaps for object keys. Just to give you a hint of how it may look like:

_calls = [
    {
        10: [
            { 15: null }, // fn(10, 15) -> null
            WeakMap {}
        ]
    },
    WeakMap {
        { a: 1 } => [
            { 15: 16 }, // fn({ a: 1 }, 15) -> 16
            WeakMap {}
        ],
    }
]

We will not implement those in this article however, for the sake of brevity. Likely though, with the number of checks needed and complexity of the data structure, you need a solid justification to go this far.

Other implementations of memoization

Memoization is a difficult subject — there are many implementations available in various libraries that have various limitations.

The popular lodash.memoize function is rather slow and memoizes only with respect to the first argument (however other arguments are passed to the original function). You can check the implementation here.

The createSelector function from reselect library performs only a shallow arguments comparison (which is faster than serializing) but it only keeps track of a single call. Implementation is available here.

Another variant is memoizeWith from ramda that is considered one of the slowest implementations (probably due to its functional paradigm style) and is only capable of caching by string — it requires passing a serializer to it. See implementation

There's also fast-memoize library, that claims to be the world's fastest JavaScript memoization library.

Wrap up

There are some key takeaways from this article I would like you to remember. Firstly, there is no "one suits all" solution for memoization and you should properly identify what is the memoize that you need.

It is also worth noting that memoization comes at a cost and depending on its implementation the cost may be huge. The memoization technique however, is extremely useful and no wonder it's that popular.

The ideal solution here remains, as always, to be curious, not accept things as they are, and dig deeper: you can always find the sweet spot for your functions by benchmarking them. And you can use versatile libraries that are dedicated to the sole purpose of memoization, and there are plenty of them. 😎

Stay connected

Stay up to date with recent articles — subscribe to the newsletter.
Unsubscribe at any time with a link in the email.

© Piotr Staniów's Blog 2021

Opinions are my own and do not represent the views of my current or past employers.