Varnishing JavaScript


Optimising delivery to the client is an important step in improving user experience on a website. Unfortunately, it generally falls to technical people tasked with fixing either their own problems or server problems, not user problems in any direct way. As a result, despite tools such as YSlow, it’s still all too common for websites to be delivered to users in a sub-optimal fashion.

I’m going to be exploring some cacheability optimisations for JavaScript files using a reverse proxy called Varnish, though you could extend these to other resources as well. As a primer for what I’m talking about here, it may be a good idea to bone up on web caches and what they do.

Anyway… to the point at hand. I’m using Varnish to handle versioned URLs for JavaScript files, so that I only ever need a single copy of each file on the server but can still get a far-future Expires version out on the web purely by changing the path I use to refer to the resource. For example, let’s say I have a JavaScript resource that normally lives at http://sand.boundvariable.com.au/js/core.js. Using this Varnish setup I can get long-lived versions of this file onto the web by referencing it at a URL like http://sand.boundvariable.com.au/js/core.v.1.js. Note that I change the file path instead of adding a query parameter, as query parameters tend to prevent caches from keeping a copy.

The point here is to maximise the time for which particular resources are kept in client-side caches. Instead of trying to get URLs to expire as often as I expect them to change, I start referencing the same resource at a new URL whenever it changes.
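One way to decide when to start referencing a new URL is to derive the version from the file’s contents, so that the path changes exactly when the file does. The following is a minimal Python sketch of that idea and is not part of the setup described in this post. The `versioned_path` helper and the use of a CRC32 checksum as the version number are assumptions made purely for illustration; any numeric value that changes with the file (a build number, a timestamp) would do.

```python
import posixpath
import zlib


def versioned_path(path: str, content: bytes) -> str:
    """Insert a content-derived version before the file extension.

    "/js/core.js" becomes "/js/core.v.<digits>.js".
    """
    root, ext = posixpath.splitext(path)
    # CRC32 yields a purely numeric version, so it fits a [0-9]+ pattern.
    version = zlib.crc32(content)
    return f"{root}.v.{version}{ext}"


if __name__ == "__main__":
    print(versioned_path("/js/core.js", b"var core = {};"))
```

A template or build step could call something like this when emitting script tags, so pages pick up a new URL only when the file actually changes.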

I’ve got an example setup on a Gentoo box running Varnish in front of [Cherokee](http://www.cherokee-project.com/). Apart from the Varnish config subs listed below, both Varnish and Cherokee are in their default setup, with Varnish on port 80 proxying to Cherokee on port 8080.

So, the Varnish config…

Here’s the receive:

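sub vcl_recv {
    set req.backend = default;
    # Match a version segment such as ".v.106022009" in the path.
    if (req.url ~ "\.v\.[0-9]+") {
        # Strip the version so the backend sees the real file path.
        set req.url = regsub(req.url, "\.v\.[0-9]+", "");
        # Flag the request so vcl_fetch knows to rewrite the headers.
        set req.http.magicmarker = "1";
        # Drop cookies so they do not get in the way of caching.
        remove req.http.cookie;
    }
}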

This tests for the presence of a version in the file path. If it finds one, it sets a marker header on the request, drops any cookie and strips the version, so that the backend only ever sees the real path.
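To see which URLs the rewrite touches, here is a rough Python sketch of the same logic with the dots escaped in the pattern. Python’s `re` module stands in for Varnish’s regex engine, which is an approximation, and the `rewrite` function name is hypothetical.

```python
import re
from typing import Tuple

VERSION = re.compile(r"\.v\.[0-9]+")


def rewrite(url: str) -> Tuple[str, bool]:
    """Return (backend_url, marked), mirroring what vcl_recv does."""
    if VERSION.search(url):
        # count=1 mirrors regsub, which replaces only the first match.
        return VERSION.sub("", url, count=1), True
    return url, False


if __name__ == "__main__":
    print(rewrite("/js/core.v.106022009.js"))  # ('/js/core.js', True)
    print(rewrite("/js/core.js"))              # ('/js/core.js', False)
```

Escaping the dots matters. An unescaped pattern such as `.v.[0-9]*` treats each dot as "any character", so in a path like `/js/nav/core.v.1.js` it would match `av/` first and mangle the URL instead of stripping the version.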

Here’s the fetch:

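sub vcl_fetch {
    # Only touch responses for requests that vcl_recv flagged as versioned.
    if (req.http.magicmarker == "1") {
        # Make sure the marker does not end up on the stored response.
        unset obj.http.magicmarker;
        # Far-future expiry; HTTP dates use a two-digit day of the month.
        set obj.http.expires = "Mon, 01 Jan 2018 20:00:00 GMT";
        # Cache-Control would take precedence over Expires, so remove it.
        unset obj.http.cache-control;
        unset obj.http.etag;
    }
}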

This checks for the marker and, on finding it, sets a far-future Expires header and removes the Cache-Control and ETag headers from the response.
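The header changes can be sketched outside Varnish as well. The Python below is an illustration only: the function names are hypothetical, and the one-year default is an assumption (the config above uses a fixed date instead). It shows the HTTP date format expected in an Expires header and the effect of the rewrite on a set of response headers.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_expires(now: datetime, days: int = 365) -> str:
    """Format an Expires value `days` ahead of `now` as an HTTP date in GMT."""
    target = now.astimezone(timezone.utc) + timedelta(days=days)
    return format_datetime(target, usegmt=True)


def apply_far_future(headers: dict, expires: str) -> dict:
    """Return a copy of headers with the same changes vcl_fetch makes."""
    dropped = ("cache-control", "etag", "expires")
    out = {k: v for k, v in headers.items() if k.lower() not in dropped}
    out["Expires"] = expires
    return out


if __name__ == "__main__":
    now = datetime(2017, 1, 1, 20, 0, 0, tzinfo=timezone.utc)
    expires = far_future_expires(now)
    print(expires)  # Mon, 01 Jan 2018 20:00:00 GMT
    print(apply_far_future({"ETag": '"abc"', "Cache-Control": "max-age=60"}, expires))
```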

You can see an example at http://sand.boundvariable.com/js/core.v.106022009.js. Open it up and check out the headers. For comparison, have a look at the headers on http://sand.boundvariable.com/js/core.js which is doing a straight pass to Cherokee.

You can also check out the cacheability of the file via a report on ircache.

The danger in this approach as it stands is that any version number maps to the same file, so somebody could fill the cache and downstream proxies with stale copies of files simply by requesting version numbers that haven’t been published yet. Also, the setup in its current state does not guarantee that every user requesting a resource at a particular URL gets the same result.

Determining whether those two points are problems in particular cases, and what the solutions to them are, is left as an exercise for the reader.
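As one possible starting point for that exercise, the Python sketch below only accepts a versioned URL when the version matches the one currently published for that file, and refuses anything else. It is an assumption about how the check might look, not something this setup does: `CURRENT_VERSIONS` and `resolve` are hypothetical names, and in practice the check would have to live in the VCL or in the backend.

```python
import re
from typing import Optional

VERSIONED = re.compile(r"\.v\.([0-9]+)")

# Hypothetical map of backend path to the one version currently published.
CURRENT_VERSIONS = {"/js/core.js": "106022009"}


def resolve(url: str) -> Optional[str]:
    """Return the backend path, or None if the version is not the published one."""
    match = VERSIONED.search(url)
    if match is None:
        return url  # unversioned request: pass straight through
    path = url[:match.start()] + url[match.end():]
    if CURRENT_VERSIONS.get(path) != match.group(1):
        return None  # unknown or outdated version: refuse, e.g. with a 404
    return path


if __name__ == "__main__":
    print(resolve("/js/core.v.106022009.js"))  # /js/core.js
    print(resolve("/js/core.v.999.js"))        # None
```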


