Compression of JavaScript programs

In an earlier post I promised that I would come back to the topic of compressing JavaScript programs efficiently, so here we go…

In this post, I’ll go through some techniques for creating self extracting, compressed JavaScript programs.

Since this is quite a lengthy post, here’s an outline as an overview and for quick navigation:

  • Why compress JavaScript?
  • The Google Closure Compiler
  • Selecting a compression method
  • Handling binary data
  • Doing even better – PNG
  • Wrapping it up – CrunchMe

Why compress JavaScript?

There may be several legitimate reasons for compressing your JavaScript files. The most obvious is probably to reduce network bandwidth, which can be achieved with HTTP compression (i.e. by using gzip compression between the server and the client at the protocol level).

There are, however, some less common cases where you cannot, or are not allowed to, rely on HTTP compression. The use case that originally triggered my interest is the somewhat artificial JavaScript file size limitation of certain demo contests. For instance, the WebGL 64k contest has the following rules:

  • Submission file size MUST be ≤ 64k (65,536 bytes)
  • No external requests, everything must be inlined in the JavaScript

Pay close attention to the second point: all assets, such as images and music, must be inlined in the JavaScript source code, and according to the first point, the source code must not exceed 65536 bytes. Those are quite challenging limits!

In order to make an interesting program, you will certainly want to make good use of those bytes, and that calls for compression.

The Google Closure Compiler

First things first: there is a great tool from Google called the Google Closure Compiler. The Closure Compiler will basically trim your JavaScript code down to a minimum by:

  • Removing all comments and unnecessary white-spaces.
  • Renaming variables and properties to shorter names.
  • Refactoring expressions and constructs into a more compact form.

Needless to say, any attempt at compressing JavaScript source code should always start with the Closure Compiler.
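
As an illustration, here is the kind of transformation it performs (a hand-made example, not actual compiler output, which varies with version and flags):

// Before: readable source with comments and descriptive names.
function computeDistance(pointA, pointB) {
  // Euclidean distance between two 2D points.
  var deltaX = pointB.x - pointA.x;
  var deltaY = pointB.y - pointA.y;
  return Math.sqrt(deltaX * deltaX + deltaY * deltaY);
}

// After: comments stripped, names shortened, statements folded.
function d(a, b) {
  var x = b.x - a.x, y = b.y - a.y;
  return Math.sqrt(x * x + y * y);
}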

With that said, this article will focus mainly on taking compression even further by applying binary compression methods to the already compact code produced by the Closure Compiler.

Selecting a compression method

One of the main problems that we face when we want to compress a JavaScript program is that it must be able to decompress itself. This means that the decompression code must not take up too much space, or the gain from compressing the original program would be lost!

Another problem is that the binary compressed data stream must be stored in the JavaScript source somehow (more on that later).

Existing libraries

Since the decompression routine must be small, existing (efficient but large) decompression libraries such as JSXGraph and js-deflate are disqualified: the decompression routine from the latter, which seems quite minimal, still weighs in at 4924 bytes when run through the Google Closure Compiler, which is a bit on the heavy side.

Using liblzg

My first solution to the problem was to use the liblzg compression library, which was specifically designed to enable a lightweight decompression routine. As it turns out, the decompression routine fits into about 450 bytes of JavaScript code (with the potential of becoming even smaller with some tweaking), which is clearly acceptable for our needs.
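
To see why such a routine can be small, note that an LZ77-style decoder is essentially one loop that either emits a literal byte or copies a run of bytes from earlier in the output. Here is a generic sketch of the principle (my illustration, not liblzg’s actual byte format, which encodes markers, offsets and lengths much more compactly):

function lzDecode(tokens) {
  // Each token is either { literal: byte } or { offset, length }.
  var out = [];
  for (var i = 0; i < tokens.length; i++) {
    var t = tokens[i];
    if ('literal' in t) {
      out.push(t.literal); // literal byte, emitted as-is
    } else {
      // Copy t.length bytes starting t.offset positions back in the
      // already-decoded output. Overlapping copies work because we
      // copy one byte at a time.
      var src = out.length - t.offset;
      for (var j = 0; j < t.length; j++) {
        out.push(out[src + j]);
      }
    }
  }
  return out;
}

// "aaaa" as one literal plus a length-3 overlapping back-reference:
// lzDecode([{ literal: 97 }, { offset: 1, length: 3 }]) -> [97, 97, 97, 97]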

Handling binary data

This brings us to another problem: how to store binary data in a JavaScript source file.

Base64

The tried and true method for storing binary data in text files is of course base64. However, since that scheme is only able to store six data bits per text character, we get a data growth of 33% (assuming an 8-bit character encoding)! That data growth will severely damage the compression ratio, so we ought to find a better method.
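
The 33% figure follows directly from the encoding: every 3 input bytes (24 bits) become 4 output characters of 6 bits each. A quick sanity check in the browser console, using the standard btoa() function:

var bin = 'abc';        // 3 bytes of input
console.log(btoa(bin)); // "YWJj" - 4 characters, i.e. 4/3 growth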

Latin 1

The first method that I tried was to use plain Latin 1 encoding – i.e. just put the compressed byte data into a JavaScript string in a Latin 1 encoded JavaScript source file. Obviously, that will not work without some tweaks, since several byte values are forbidden in such a string (0-31, 39 (the single quote), 92 (the backslash), and 127-159, to be more precise).

By substituting invalid byte codes with two valid characters, and some “clever” shifting of codes to minimize the occurrences of invalid codes (based on the statistical content of liblzg compressed data), the data growth factor could be reduced to about 5-10%.
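
A minimal sketch of the escaping part (the code shifting is not shown, and both the escape marker and the remapping are hypothetical choices of mine, not CrunchMe’s actual scheme):

// The 67 byte values that must not appear in a Latin 1 string.
var INVALID = [];
for (var k = 0; k < 256; k++) {
  if (k < 32 || k == 39 || k == 92 || (k > 126 && k < 160)) INVALID.push(k);
}
var ESC = 126; // '~' used as escape marker (an arbitrary choice)

function escapeLatin1(bytes) {
  var s = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i], idx = INVALID.indexOf(b);
    if (b == ESC || idx >= 0) {
      // Two valid characters encode one problematic byte: the marker,
      // then the byte's index remapped into the safe 160-227 range
      // (the marker itself maps to index 67).
      s += String.fromCharCode(ESC, 160 + (b == ESC ? 67 : idx));
    } else {
      s += String.fromCharCode(b); // valid bytes pass through as-is
    }
  }
  return s;
}

The decoder that ships in the packed file simply reverses the mapping whenever it sees the marker character.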

Surely, we can do better than that?

UTF-16

Yes we can! Note that while Latin 1 has about 26% invalid code points (67 of the 256 values in the 0-255 range), UTF-16 only has about 3% invalid code points (in the entire 0-65535 range, only 2151 code points must be avoided, the bulk of them being the surrogate range 0xD800-0xDFFF).

So, let us use UTF-16 coding for our packed JavaScript program! A 2-byte BOM (0xFEFF) at the beginning of the file will ensure that the browser understands that we use UTF-16 (it’s actually even safer than Latin 1 encoding, which can be misinterpreted as UTF-8 by some clients or text editors).

Now we can pack two bytes from the compressed data stream into a single UTF-16 character, and again use a fallback for invalid code points (two valid UTF-16 characters encode the two bytes that would otherwise form an invalid code). This gives a worst-case data growth of less than 4%; for liblzg compressed data, the figure is usually less than 1%.
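
A sketch of the packer side (run offline by the tool, never shipped; isValidCodeUnit() stands for a test against the 2151 problematic code points, and both the escape code unit and the XOR remapping are illustrative assumptions of mine, not CrunchMe’s actual scheme):

var ESC = 0x1234; // some valid code unit, reserved as escape marker

function packUTF16(bytes) {
  var s = '';
  for (var i = 0; i < bytes.length; i += 2) {
    // Combine two bytes into one code unit (odd-length input is
    // implicitly zero-padded; a real packer must record the length).
    var c = (bytes[i] << 8) | (bytes[i + 1] | 0);
    if (c == ESC || !isValidCodeUnit(c)) {
      // Fallback: two valid code units for the two bytes. XOR:ing with
      // 0x8000 moves the value out of the invalid ranges (assumed to
      // hold for all of the problematic code points).
      s += String.fromCharCode(ESC, c ^ 0x8000);
    } else {
      s += String.fromCharCode(c); // two bytes in one code unit
    }
  }
  return s;
}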

The downside to UTF-16 encoding

There is one severe downside to using UTF-16 encoding: the decompression routine must be encoded in UTF-16 too, which means that it will take twice the space (i.e. over 900 bytes)! For small files, that may even make the final file larger than with Latin 1 encoding.

But fear not, the solution is near: we can pack the decompression routine!

How? Well, let’s use the same trick as for the binary data: pack two characters into one UTF-16 character. The decompression routine has a nice property: it’s pure ASCII (only using codes in the range 32-126). This means that combining two consecutive bytes into a single Unicode character will always result in a valid code point (all codes in the range 0x2020-0x7e7e are valid UTF-16 code points).

Since packing the decompression routine in a UTF-16 string will never produce invalid codes, we get 0% space loss, and the routine for unpacking the packed string is very simple. Yes, we need an additional unpacking routine, and it has to be in plain (unpacked) UTF-16 form, but it does not add much to the code since it’s very basic. It looks something like this (before mangling it to a more compact form):

var i, c, p = '...packed-string...', s = '';
for (i = 0; i < p.length; i++)
{
  // Split each UTF-16 code unit back into its two ASCII characters.
  c = p.charCodeAt(i);
  s += String.fromCharCode(c >> 8, c & 255);
}
eval(s);
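
The corresponding packer (run offline, never shipped in the file) is just as simple. Since every code unit in the 0x2020-0x7e7e range is valid, no escaping is needed (a sketch of mine; assumes pure ASCII input):

function packASCII(src) {
  if (src.length & 1) src += ' '; // pad to an even length
  var s = '';
  for (var i = 0; i < src.length; i += 2) {
    // Two ASCII characters (0x20-0x7e each) -> one valid code unit.
    s += String.fromCharCode((src.charCodeAt(i) << 8) | src.charCodeAt(i + 1));
  }
  return s;
}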

Conclusions for the liblzg approach

So, at this point, we’ve taken the liblzg compression pretty much as far as we can for making a self extracting JavaScript program. In summary:

  • A self-extracting JavaScript module (about 600 bytes).
  • Fairly well-packed binary data (only about 3% larger than the original binary size).
  • Pure ECMAScript implementation (very high cross-browser compatibility).
  • Decent compression ratio (though not quite as good as DEFLATE, Bzip2, LZMA, etc.).

While we have achieved great results, we can do even better!

Doing even better – PNG

Wouldn’t it be great if we could access some browser API that can decompress DEFLATE encoded data from JavaScript? Most browsers use zlib internally for many different things, but still there is no direct support for zlib from JavaScript…

However, if we are willing to sacrifice compatibility with legacy browsers (e.g. IE 6, 7 and 8), there is a cool trick that we can use, relying heavily on the <canvas> element: store our JavaScript program in a PNG image file, letting each character in the source code be one pixel in the image, with a grayscale intensity in the range 0-255.

So what good would that do – and how can we use it?

The PNG idea

OK, so a little background on the PNG image file format:

  • PNG uses the DEFLATE algorithm for compression (the same algorithm that is used in ZIP and gzip, for instance).
  • The file format is lossless (no pixel values are distorted).
  • All modern browsers can decode PNG images when loaded into an Image object.
  • We can draw the PNG image to a canvas element and read back the pixels using the getImageData() method!

In other words, with the aid of the canvas element, we can decompress a PNG image and extract its contents into a JavaScript String object, which we can then execute (e.g. with eval).

Again, we can use the UTF-16 “trick” for storing the raw PNG image data in a JavaScript string without experiencing too much data growth.

The decompression logic

As with the liblzg solution, we need some decompression code. Only this time it’s not the actual decompression algorithm, but a small program that does roughly the following (a sketch follows the list):

  1. Decode the UTF-16 string to a binary “string” (containing the raw PNG data).
  2. Encode the binary string as Base64, using the btoa() method.
  3. Generate a data URI by prepending a small header: “data:image/png;base64,”
  4. Load the data URI into an Image object.
  5. In the onload handler of the image:
    1. Draw the image to a canvas element.
    2. Read back the pixel values and put them into a new string.
    3. Execute the content of the string.
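
Putting these steps together, the unpacker could look something like this (an illustrative sketch, not CrunchMe’s actual code; it assumes the program is stored one character per pixel in a grayscale PNG):

var p = '...packed-PNG-string...', b = '', i, c;
// Step 1: decode the UTF-16 string into a binary string.
for (i = 0; i < p.length; i++) {
  c = p.charCodeAt(i);
  b += String.fromCharCode(c >> 8, c & 255);
}
// Steps 2-4: Base64 encode the binary string, build a data URI,
// and load it into an Image object.
var img = new Image();
img.onload = function () {
  // Step 5.1: draw the image to a canvas element.
  var canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  var ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  // Step 5.2: read back the pixels into a new string (every 4th value
  // of the RGBA data is the red channel, which holds the grayscale
  // intensity).
  var data = ctx.getImageData(0, 0, img.width, img.height).data, s = '';
  for (var j = 0; j < data.length; j += 4) {
    s += String.fromCharCode(data[j]);
  }
  // Step 5.3: execute the content of the string (trailing zero padding
  // pixels are stripped here; a real tool might store the exact length
  // instead).
  eval(s.replace(/\0+$/, ''));
};
img.src = 'data:image/png;base64,' + btoa(b);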

The size of this decompression program is roughly the same as the liblzg decoder, but on the other hand the DEFLATE compression method gives us better compression ratios than liblzg (the main reason being that while liblzg only uses LZ77 compression, DEFLATE uses a combination of LZ77 and Huffman coding).

Conclusions for the PNG approach

At this point, and with the given goal (making a fairly small JavaScript source file even smaller), the PNG method described above seems to be as far as you can get with current browser technology when it comes to compression ratio (when including the decompression routine as part of the compressed deliverable).

The key benefits with this method are:

  • Reuse the DEFLATE decompression routine in the browser:
    • Commonly available (every browser that supports PNG and canvas).
    • Quite good compression ratios (JavaScript code compresses well).
  • Encode the raw PNG data in a valid UTF-16 string:
    • Very low overhead for encoding binary data in a string (only 3-4%, compared to 33% for Base64).
    • Very little chance for the data to be misinterpreted by the browser.

Wrapping it up – CrunchMe

I hope that you found the compression methods described here useful and/or interesting.

If you want to compress your own JavaScript programs, there is an open source command line tool available called CrunchMe.

Note: At the time of writing this, the tool is still under development (beta), but it should be stable enough for normal use (you will have to compile it yourself, though).

25 thoughts on “Compression of JavaScript programs”

  1. Jan

    What’s wrong with serving JS which is gzipped by the web server?

    1. m

      Doing server-side gzipping is all well and good for limiting bandwidth (increasing page load speed, etc.), but it does not solve the slightly different problem of making your JavaScript code smaller than a certain size limit (silly? quite likely…)

  2. “What’s wrong with serving JS which is gzipped by the web server?”

    Pffft, gzip. It’s time you go and learn about the demoscene 😉

  3. Jan: on mobile devices cache size is very limited, and they normally cache uncompressed content, so keeping JS files small helps.

    1. Is there any reason that the mobile device keeps its cache in raw format? Even the cheapest compression saves I/O time and power.

      1. m

        Please keep in mind that it may not always be true that compression saves power or even “transfer” time, since executing the unpacker takes both time and battery power (regardless of whether it’s a JS unpacker or a “native” gzip unpacker).

  4. egon

    For smaller competitions (like js1k) the PNG and LZ unpackers are still too large.

    There are jscrush and packify, which have much smaller unpackers.

    Maybe you could include these (or similar) packers in the tool as well and let the program choose the best packing method.

    1. m

      Yes, the current approach is not optimal for really small programs (the unpacker overhead is over 0.5 KB). The target is more like 4 KB – 64 KB.

      Actually, for really small programs, the LZG version could be made much smaller. The code is quite generic, and a fair bit of it deals with match offsets > 2 KB and match lengths > 32 bytes, for instance.

  5. Calroen Hsu

    Why not hide the JS inside another JS? That way the first JS is not visible to the compression software. You can hide JS inside JS, and each time the compression software works again.

    1. m

      I’m confused. Could you please explain a bit more?

      1. They mean that you can avoid the UTF-16 size increase on the unpacker by keeping the unpacker in a UTF-8 JS file that loads the packed data from a second JS file using a different encoding.

        This is, of course, a bad idea, because you lose the space savings to HTTP headers, incur new network traffic, lag, etc.

        1. m

          Oh, I see now. Maybe I wasn’t clear in the post, but one of the constraints of this whole exercise is to keep both the unpacker and the packed data within the same file. If you are allowed to split the two, an even better approach is to load the packed JS from a separate PNG image (and never bother with UTF-16 encoding).

  6. Google Closure compiler is indeed always a good start, but please make sure that beast gets a chance on your string constants.

    Simply replacing often-recurring string constants (like “function” or “object”) with a simple var strFunction = "function"; allows Closure to save a few bytes.

    Usually, when applying gzip compression, that doesn’t save a byte, but in some cases (e.g. bookmarklets) gzip compression is not feasible.

  7. Jared Jacobs

    Great post, clever tricks. One thing I looked for in this post and the comments, but didn’t find, was a comparison of the running times of these different methods. I’d imagine these methods are not recommended for any real website unless most of its visitors have fast computers and slow connections (AT&T mobile customers, perhaps?). A significant disadvantage of these methods is that the browser has to do several layers of unpacking/decoding of JS code on every page navigation, even when the script is just coming from the browser cache.

    1. m

      Good comment. Actually, the decompression is quite fast in modern browsers, so for most applications it should not be noticeable, but you are right that if you (re)load pages a lot it could give you a performance hit.

      For comparison, you can try another demo of the LZG decompression here (I usually get over 10 MB/s in that test).

  8. It would be cool if somebody added a service like that to Google Apps. I use Google’s optimizer that is hosted there. Pretty convenient.

  9. witek

    Precompress using gzip -9, put it up as a static file, and configure Apache properly. Instant win and big compatibility. (Removing comments, redundant whitespace and semicolons will also help.) As for using the Closure Compiler and other “pseudo-compression” methods, I would be very careful (Base64 and others can really mess up the output). But automatic renaming of variables, arguments and function names, as well as inlining functions used only once, should be safe (if you are not writing a library, are shipping and not debugging code, and there is no “eval”).

    The PNG trick is not a trick. It is an ugly hack.

    1. m

      Again, this is not about lowering network bandwidth nor faster page loads. It’s about smaller JavaScript files, even if they are loaded locally (not from a server).

  10. Alex Scott

    Will you continue to develop on CrunchMe?

  11. Jennis Klabot

    Thanks for this article, it was very helpful; especially the binary data part was something I didn’t really know much about before. I have a question though: isn’t it much easier to just use one of the many online JavaScript compressors available today (like http://www.giftofspeed.com/javascript-compressor/ or http://javascriptcompressor.com/ ) and compress the individual JavaScript files, instead of going through all the hassle of setting it all up yourself? Thanks

    1. m

      The JavaScript compressors that you mention have slightly different goals than I had when developing these compression techniques. Today, a better performing alternative is JsExe.

  12. McSodbrenner

    Browsers and OSes have evolved and now use color management, which changes the values you get back from getImageData(). And because there is an unlimited number of color profiles in use, we unfortunately cannot rely on the read values. I got three different results for Chrome on desktop (no color management), Chrome on a Nexus 5X and Chrome on a Lenovo Yoga tablet. Only the values read in desktop Chrome were the original ones.
