Thursday, June 20, 2013

What asm.js is and what asm.js isn't

This is going to be a bit long, so tl;dr asm.js is a formalization of a pattern of JavaScript that has been developing in a natural way over years. It isn't a new VM or a new JIT or anything like that.

asm.js is a subset of JavaScript, defined with the goal of being easily optimizable and used primarily as a compiler target from languages like C and C++. I've seen some recent online discussions where people appear to misunderstand what those things mean, which motivated me to write this post, where I'll give my perspective on asm.js together with some context and history.

The first thing worth mentioning here is that compiling into JavaScript is nothing new. It's been done since at least 2006 with Google Web Toolkit (GWT), which can compile Java into JavaScript (GWT also does a lot more, it's a complete toolkit for writing complex clientside apps, but I'll focus on the compiler part of it). Many other compilers from various languages to JavaScript have shown up since then, both for existing languages like C++ and C# and for new languages like CoffeeScript, TypeScript and Dart.

Compiled code (that is, JavaScript that is the output of a compiler) can look odd. It's often not in a form that we would be likely to write by hand. And each compiler has a particular pattern or style of code that it emits: For example, a compiler can translate classes in one language into JavaScript classes using prototypal inheritance (which might look a little more like "typical" JS), or it can implement those classes with function calls without inheritance (passing "this" manually; this might look a little less "typical"), etc. Each compiler has a way in which it works, and that gives it a particular pattern of output.

Different patterns of JavaScript can run at different speeds in different engines. This is obvious of course. On one specific benchmark a JS engine can be faster than another, but in general no JS engine is "the fastest", because different benchmarks can test different things - use of classes, garbage collection, integer math, etc. When comparing JS engines, one can be better on one of those aspects and slower on another, because optimizing for them can be quite separate. Look at the individual parts of benchmarks on AWFY for example (click "breakdown"), and you'll see that.

The same is true for compiled code: different JS engines can be faster or slower on particular patterns of compiled code. For example, it was recently noticed that Chrome is sometimes much faster than Firefox on GWT-generated code; as you can see in the bug there, recent work has narrowed the gap. There is nothing odd about that: one JS engine was faster than another on a particular pattern, and work was done to catch up. This is how things work.

Aside from Java, which GWT compiles, another important language is C++. There have been at least two compilers from C++ to JavaScript in existence over the last few years, Emscripten and Mandreel. They are separate projects and quite different in many ways - for example, Emscripten is open source and targets just JavaScript, while Mandreel is closed source and can target other things as well like Flash - but over a few years they converged on basically the same pattern of JavaScript for compiled C++ code. That pattern involves using a singleton typed array to represent memory, and bitwise operations (|0, etc.) to get values to behave like C++ integers (the JavaScript language only has doubles).
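To make that pattern concrete, here is a minimal hand-written sketch of the kind of code it leads to. It is not actual Emscripten or Mandreel output; the names buffer, HEAP32, add and sum are made up for illustration, and the heap size is arbitrary.

```javascript
// One typed array stands in for all of the program's "memory".
// (buffer, HEAP32, add and sum are hypothetical names for this sketch.)
var buffer = new ArrayBuffer(64 * 1024);
var HEAP32 = new Int32Array(buffer);

// Roughly what `int add(int a, int b) { return a + b; }` might turn into.
// The |0 coercions keep every value in the 32-bit integer range.
function add(a, b) {
  a = a | 0;
  b = b | 0;
  return (a + b) | 0;
}

// Roughly `int sum(int* p, int n)`: p is a byte offset into the heap,
// so the index into the Int32Array view is the byte address shifted by 2.
function sum(p, n) {
  p = p | 0;
  n = n | 0;
  var total = 0, i = 0;
  for (; (i | 0) < (n | 0); i = (i + 1) | 0) {
    total = (total + HEAP32[(p + (i << 2)) >> 2]) | 0;
  }
  return total | 0;
}

HEAP32[0] = 1;
HEAP32[1] = 2;
HEAP32[2] = 3;
console.log(sum(0, 3));            // 6
console.log(add(2147483647, 1));   // -2147483648, wraps like a C++ int
```

Note that every variable here keeps a single type for its whole lifetime, which is the property the rest of this post keeps coming back to.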

We might say that Emscripten and Mandreel "discovered" a useful pattern of JavaScript, where "discovered" means the same as when Crockford discovered JSON. Of course typed arrays, bitwise ops and JavaScript Object Notation syntax all existed earlier, but noticing particular ways in which they can be especially useful is significant. Today it feels very natural to use JSON as a data interchange format - it almost feels like it was originally designed to be used in that manner, even though of course it was not - and likewise, if you are writing a compiler from C++ to JavaScript, it is natural to use |0 and a singleton typed array to do so.

Luckily for the Emscripten/Mandreel pattern, that type of code benefits from many years of work that went into JavaScript engines. For example, they are all very good at optimizing code whose types do not change, and the Emscripten/Mandreel pattern generates code that is implicitly statically typed, since it originated as statically typed C++, so in fact types should not change. Likewise, the bitwise operators that are important in the Emscripten/Mandreel pattern appeared in crypto tests in major benchmarks like SunSpider, so they are already well-optimized for.

In other words, the Emscripten/Mandreel pattern of code could be quite fast due to it focusing on things that JS engines already did well. This isn't a coincidence of course, as both Emscripten and Mandreel tested in browsers and decided to generate code that ran as quickly as possible. Then, as the output of these compilers was increasingly used on the web, browsers started to optimize for them in more specific ways. Google added a benchmark of Mandreel code to the Octane benchmark (the successor to the popular V8 benchmark), and both Chrome and Firefox have optimized for both Mandreel and Emscripten for some time now (as a quick search in their bug trackers can show). So the Emscripten/Mandreel pattern became yet another pattern of JavaScript that JavaScript engines optimize for, alongside all the others which are represented in the familiar benchmarks: SunSpider, Octane, Kraken, etc. This was all a very natural process.

Here is where we get to asm.js. While Emscripten/Mandreel code can run quickly, there was still some gap between it and native code. The gap was not huge - in many cases it ran just 3x slower than native code, which is pretty close to, say, Java - but still problematic in some cases (for example, high-performance games). So Mozilla began a research project to investigate ways to narrow that gap. Developed in the open, asm.js started to formally define the Emscripten/Mandreel pattern as a type system. Formalizing it made us think about all the corner cases, and we found various places in Emscripten/Mandreel code where types could change at runtime, which is contrary to the goal of the pattern, and can make things slower. asm.js's goal was to get rid of all those pitfalls.

The first set of benchmark results was promising - execution close to 2x slower than native, even on larger benchmarks. Achieving that speedup in Firefox took one engineer just 3 months, since it didn't require a new VM or a new JIT; it only required adding some additional optimizations to Firefox's existing JS engine. Soon after, Google showed very large speedups on asm.js code as well, as noted in the IO keynote and as you can see on AWFY. (In fact, over the last few days there have been several speedups noticeable there on both browsers - these improvements are happening as we speak!) Again, these large speedups in two browsers were achieved quickly because no new VM or JIT was written, just additional optimizations to existing JS VMs. So it would be incorrect to say that asm.js is a "new VM".

I've also seen people say that asm.js is a new "web technology". Loosely speaking, I suppose we could use that term, but only if we understand that if we say "asm.js is a web technology" then that is totally different from when we say "WebGL is a web technology". Something like WebGL is in fact what we normally call a "web technology" - it was standardized, and browsers need to decide to support it and then do work to implement it. asm.js, on the other hand, is a certain pattern of compiled code in JavaScript, which is already standardized and supported. asm.js, just like GWT's output pattern, does not need to be standardized, nor do browsers need to "support" it, nor to standardize whatever optimizations they perform for it. Browsers already support JavaScript and compete on JavaScript speed, so asm.js, GWT output, CoffeeScript output, etc., all work in them and are optimized to varying degrees.

Regarding the term "support": As before, I suppose we can loosely talk about browsers "supporting" asm.js, but we need to be sure what we mean. On the one hand, when we talk about WebGL then it is clear what we mean when we say something like "IE10 does not support WebGL". But if we say "a browser supports asm.js", we intend something very different - I guess people using that phrase mean something like "a browser that spent some time to optimize for asm.js". But I don't see people saying "a browser supports GWT output", so I suspect that using the term "support" comes from the mistaken notion that asm.js is something more than a pattern of JavaScript, which is not the case.

Why, then, do some people apparently still think asm.js is anything more than a pattern of JavaScript? I can't be sure, but here are some possibilities and clarifications to them:

1. asm.js code basically defines a low-level VM, in a sense: There is a singleton array for "memory" and all the operations are on that. While true, it is also true for Emscripten and Mandreel output, and other compilers as well. So in some sense this is valid to say, but just in the general sense that we can implement VMs in JavaScript (or any Turing-complete language), which of course we can regardless of asm.js.

2. Some of the initial speedups from asm.js were surprisingly large, and it felt like a big jump from the current speed of JavaScript, not an incremental step like things normally progress. And big jumps often require something "new", something large and standalone. But as I mentioned above, this was actually a very gradual process. Relevant optimizations for Emscripten/Mandreel code have taken years, writing SSA-optimizing JITs like CrankShaft and IonMonkey has likewise taken years (there is overlap between these two statements, of course), and the speedups on asm.js code are due to those long-term projects being able to really shine on a code pattern that is easy for them to optimize. In fact, many of the recent changes that speed up asm.js code do not actually add new optimizations directly; instead they change whether the browser decides to fully optimize it - that is, they get the code to actually reach the optimizing JIT (CrankShaft, IonMonkey, etc.). The power of those optimizing JITs has been there for a while now, but sometimes heuristics prevented it from being used.

3. A related thing is that I sometimes see people say that asm.js code will be "unusably slow" on browsers that do not optimize for it in a special way (just today I saw such a comment on Hacker News, for example). First of all this is just false: look at these benchmarks, it is clear that on many of them Firefox and Chrome have essentially the same performance, and where there is a difference, it is noticeably decreasing. But aside from the fact that it is false, it is also disrespectful to the JavaScript VM engineers at Google, Microsoft and Apple: To say that asm.js code will be "unusably slow" on browsers other than Firefox implies that those other VM devs can't match a level of performance that was shown to be possible by their peers. That is a ridiculous thing to say given how talented those devs are, which has been proven countless times.

If someone says it will be "unusably slow" not because they can't reach the same speed but because they won't, then that looks obviously false. Why would any browser decide to not optimize for a type of JavaScript that is being used - including in high-profile things like Epic Citadel - and has relevant benchmarks? All browsers want to be fast on everything, this is a very competitive field, and we have already seen Google optimize for asm.js code as mentioned earlier in the IO keynote; there are also some positive signs from Microsoft as well.

4. Another possible reason is that the asm.js optimization module in Firefox is called OdinMonkey (mentioned for example in this blogpost). SpiderMonkey, Firefox's JavaScript engine, has given its JITs Monkey-related names: TraceMonkey, JaegerMonkey and IonMonkey (although baseline is a new JIT that received no Monkey name, so there is definitely no consistency here). So perhaps OdinMonkey sounds like it could be a new JIT, which seems to imply that asm.js optimizations require a new JIT. But as mentioned in that blogpost, OdinMonkey is not a new JIT; instead it is a module that sits alongside the rest of the parts of the JavaScript engine. What OdinMonkey does is detect the presence of asm.js code, take the parse tree that the normal parser emitted, type check the parse tree, and then send that information into the existing IonMonkey optimizing compiler. All the code optimization and code generation of IonMonkey are still being used; no new JIT for asm.js was written. That's why, as mentioned before, writing OdinMonkey only took one engineer 3 months' work (while also working on the spec). Writing a new JIT would take much more time and effort!

5. Also possibly related is the fact that OdinMonkey uses the "use asm" hint to decide to type check the code. This is indeed a bit odd, and feels wrong to some people. I certainly understand that feeling; in fact, when working on the design of asm.js I argued against it. The alternative would be to do some heuristic check: Does this block of code not contain anything impossible in asm.js (no throw statements, for example), and does it contain a singleton typed array, etc.? If so, then start to type check in OdinMonkey. This could achieve practically the same result in my opinion. It does, however, have some downsides: heuristics are sometimes wrong and often add overhead. Moreover, this optimization won't "luckily work" on random code; it will be expected and intended by a person using a C++ to JS compiler, so why not have them note that intention? Those are strong arguments for using an explicit hint.

The important thing is that "use asm" does not affect JS semantics (if it does in some case - and new optimizations often do cause bugs in odd corner cases - then that must be fixed just like any other correctness bug). That means that JavaScript engines can ignore it, and if it is ultimately determined to be useless by JS engines then we can just stop emitting it.
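As a rough illustration of that point, here is a tiny hand-written module in the asm.js style. The module name MiniModule, its store/load functions and the 64KB heap size are all invented for this sketch, and I have not run it through a validator, so treat it as approximating the spec rather than as guaranteed-valid asm.js.

```javascript
// Hypothetical module for illustration; (stdlib, foreign, heap) is the
// parameter convention described in the asm.js spec.
function MiniModule(stdlib, foreign, heap) {
  "use asm";
  var HEAP32 = new stdlib.Int32Array(heap);

  // Store a 32-bit int at word index i.
  function store(i, v) {
    i = i | 0;
    v = v | 0;
    HEAP32[(i << 2) >> 2] = v;
  }

  // Load the 32-bit int at word index i.
  function load(i) {
    i = i | 0;
    return HEAP32[(i << 2) >> 2] | 0;
  }

  return { store: store, load: load };
}

// An engine that knows nothing about "use asm" just sees an unused string
// expression statement and runs this as ordinary JavaScript.
var mini = MiniModule({ Int32Array: Int32Array }, {}, new ArrayBuffer(0x10000));
mini.store(3, 42);
console.log(mini.load(3)); // 42
```

Deleting the "use asm" line should give exactly the same results; only how the engine chooses to compile the code may differ.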

6. asm.js has a spec, and defines a type system. That seems much more "formal" than, say, the output pattern of GWT, and furthermore things that do have specs are generally things like IndexedDB and WebGL, that is, web technologies that need to be standardized and so forth. I think this implies to some people that asm.js is more like WebGL than just a pattern of JavaScript. But not everything with a spec has it for standardization purposes. As mentioned before, one reason for writing a spec and type system for asm.js was to really force us to think about every detail, in order to get rid of all the places where types can change, and other stuff that prevents optimizations. There are 3 other reasons that IMO justified the effort to write a spec:

* Having a spec makes it easy to communicate things to other people. As I mentioned before, GWT output often used to run (and maybe still does) faster in Chrome than Firefox, and I am not really sure why (probably multiple reasons). GWT and Chrome are two projects from the same company, so it would not be surprising if their devs talk to each other privately, and there would be nothing wrong with it if they did. Emscripten and Firefox are, like GWT and Chromium, two open source projects primarily developed by a single browser vendor, so there is sort of a parallel situation here. To avoid the downsides of private discussions, we felt that writing a spec for asm.js would give us the benefit of being able to say "this (with all the technical details) is why this code runs fast in Firefox." That means that if other browser vendors want to run that code quickly, then they have all the docs they need in order to do so, and on the other side of things, if other compilers like Mandreel want to benefit from those speedups, then once more they have all the information they need as well. Writing a spec (and doing so in the open) makes things public and transparent.

* Having a type system opens up the possibility to do ahead of time (AOT) compilation in a reasonable way. Note that AOT should have already been possible in the output patterns for Emscripten and Mandreel (remember, they are generated from C++, that is AOTed), but actually wasn't because of the few places where types could in fact change. Assuming we did things properly, asm.js should have no such possible type changes anymore, so AOT is possible. And a type system makes it not just possible but quite reasonable to actually do so in practice.

AOT can be very useful in reducing the overhead and risk of optimization heuristics. As mentioned before, in some cases code never even reaches the optimizing JIT due to heuristics not making the optimal decision, and furthermore, collecting the data for those heuristics (how long a function or loop executes, type info for variables, etc.) adds overhead (during a warmup phase before full optimization, and possibly later during deoptimization and recompilation). If code can be compiled in an AOT manner, we simply avoid those two problems: we optimize everything, and we do so immediately. (There is a potential downside, however, in that fully optimizing large amounts of code can take significant time; in Firefox this is partially mitigated by things like compilation using multiple cores, another possibility is to consider falling back to the baseline JIT for particularly slow-to-compile functions.)

Because of these benefits of AOT compilation, it can avoid the "stutter" problem - where a game pauses or slows down briefly when new code is executed (like when a new level in the game is begun that uses some new mechanics or effects), due to some new code being detected as hot and then optimized, at the precise time when it is needed to be fast. Stutter can be quite problematic, as it tends to happen at the worst possible times - when something interesting happens - and I've often heard game devs be concerned about it. AOT compilation fully optimizes all the code ahead of time, so the stutter problem is avoided.

A further benefit of AOT is that it gives predictable performance: we know that all the code will be fully optimized, so we can be reasonably confident of performance in a new benchmark we have never tested on, since we do not rely on heuristics to decide what to optimize and when. This has been shown repeatedly when AOT compilation in Firefox achieved good performance the very first time we ran it on a new codebase or benchmark, for example on Epic Citadel, a compiled Lua VM, and others. In my JavaScript benchmarking experiences in the past, that has been rare.

AOT therefore brings several benefits. But with all that said, it is an implementation detail, one possible approach among others. As JS engines continue to investigate ways to run this type of code even better, we will likely see more experimentation in this area.

* Last reason for writing the spec and type system: asm.js began as a research project, and we might want to publish a paper about it at some point. (This one is much less important than the other three, but since I wrote out the others, might as well be complete.)

That sums up point 6, why asm.js has a spec and type system. Finally, two last possible reasons why people might mistakenly think asm.js is a new VM or something like that:

7. Math.imul. Math.imul is something that came up during the design of asm.js, that did actually turn into a proposal for standardization (for ES6), and has been implemented in at least Firefox and Chrome so far. Some people seem to think that asm.js relies on Math.imul, which would imply that asm.js relies on new language features in JavaScript. While Math.imul is helpful, it is entirely optional. It makes only a tiny impact on benchmarks, and is super-easy to polyfill (which Emscripten does). Perhaps it would have been simpler to not propose it and just use the polyfill code, to avoid any possible confusion. But Math.imul is so simple to define (multiply two 32-bit numbers properly, return the lower 32 bits), and is basically the only integer math operation that cannot be implemented in a natural way in JS (the remarkable thing is that all the others do have a natural way to be expressed, even though JavaScript does not have integer types!), so it just felt like a shame not to.
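For the curious, a polyfill can look something like the following. This is the commonly used 16-bit-halves approach, not necessarily the exact code Emscripten ships; the function name imulPolyfill is just for this sketch.

```javascript
// Multiply two 32-bit integers and keep the low 32 bits of the result.
// Splitting into 16-bit halves keeps every intermediate product small
// enough to be represented exactly in a double.
function imulPolyfill(a, b) {
  var ah = (a >>> 16) & 0xffff;
  var al = a & 0xffff;
  var bh = (b >>> 16) & 0xffff;
  var bl = b & 0xffff;
  // The ah * bh term only affects bits above the low 32, so it is dropped.
  return (al * bl + (((ah * bl + al * bh) << 16) >>> 0)) | 0;
}

// The "obvious" (a * b) | 0 goes wrong once the product no longer fits
// exactly in a double.
var big = 0x7fffffff;
console.log((big * big) | 0);         // 0, low bits were lost to rounding
console.log(imulPolyfill(big, big));  // 1, the correct low 32 bits
```

This is also why multiplication is the odd one out: addition and subtraction of two 32-bit values never lose precision in a double, so a plain |0 afterwards is enough for them.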

It's important to stress that if Math.imul were opposed by the JavaScript community, it would of course not have been used by asm.js - asm.js is a subset of JavaScript, it can't use anything nonstandard. It goes without saying though that new things are being discussed for standardization in JavaScript all the time, and if there are things that could be useful for compiled C++ code, then such things are worth discussing in the JavaScript community and standards bodies.

8. Finally, asm.js and Google's Portable Native Client (PNaCl) share the goal of enabling C++ code to run safely on the web at near-native speeds. So it is perhaps not surprising that some people have written blogposts comparing them. And aside from that shared goal, there are some other similarities, like both utilizing LLVM in some manner. But to compare them as if they were two competing products - that is, as if they directly compete with each other, and one's success may be at the other's detriment - is, I think, irrelevant. PNaCl (and PPAPI, the Pepper Plugin API on which it depends) is a proposed web technology, something that needs to be standardized with support from multiple vendors if it is to be part of the web, and asm.js on the other hand is just a pattern of JavaScript, something that is already supported in all modern browsers. asm.js code is being optimized for right now, just like many patterns of JavaScript are, driven by the browser speed race. So asm.js will continue to run faster over time, regardless of what happens with other things like PNaCl: I don't see a realistic scenario where PNaCl somehow makes browser vendors decide to stop competing on JavaScript speed (other plugin technologies like Flash, Java, Silverlight, Unity, etc. did not).

All of this has nothing to do with how good PNaCl is. (Worth mentioning here that I consider it to be an impressive feat of engineering, and I have a huge amount of respect for the engineers working on it.) It is simply operating in a different area. The JavaScript speed race is very important right now in the browser market: When your browser runs something slower than another browser, that's something that you are naturally motivated to improve. This happens with benchmarks (we already saw big speedups on two browsers on asm.js code), and it happens with things like the Epic Citadel demo (it initially did not run at all in Chrome, but Google quickly fixed things). This kind of stuff will continue to happen on the web and drive asm.js performance, and as already mentioned, this is regardless of what happens with PNaCl.

Are things really that simple?

I've been saying that asm.js is just a pattern of JavaScript and nothing more, and the speedups on it are part of the normal JavaScript speed competition that leads browser vendors to optimize various patterns. But actually things are more complicated than that. I would not argue that any pattern should be optimized for, nor that it would be ok for a vendor to specifically optimize based on hints for an arbitrary subset of JavaScript.

To make my point, consider the following extreme hypothetical example: A browser vendor decides to optimize a little subset of JS that has a JS array for memory, and stores either integers or strings into it. For example,

function WTF(print) {
  'use WTF';
  var mem = [];
  function run(arg) {
    mem[20] = arg;
    var a = 100;
    mem[5] = a;
    a = 'hello';
    mem[11] = a;
    print(mem[5]);
    mem[5] = mem[11];
    a = arg;
    return a;
  }
  return run;
}


Call this WTF.js. So when 'use WTF' is present, the browser checks that there is a singleton array called mem and so forth, and can then optimize the WTF.js code very well: mem is in a closure and does not escape, so we know its identity statically; we also know it should be implemented as a hash table because it likely has holes in its indexes; we know it can contain only ints or strings so some custom super-specialized data structure might be useful here; etc. etc. Again, this is an example meant to be extreme and ridiculous, but you can imagine that such a pattern might be useful as a compilation target for some odd language.

What is wrong with WTF.js is that while it is JavaScript, it is a terrible kind of JavaScript. In particular, it violates basic principles of how current JavaScript engines optimize: for starters, variables are given more than one type (both locals and elements in mem). This is exactly what JS engine devs have been telling us all not to do for years now. Also, this subset comes out of nowhere - I can't think of anything like it, and when I tried to give it a hypothetical justification in the previous paragraph, I had to really reach. And there are obvious ways to make this more reasonable, for example to use consecutive indexes from 0 in the "mem" array, so there are no holes - this is something else that JS engine devs have been telling us to do for a very long time - and so forth. So WTF.js is in fact WTF-worthy.
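For contrast, here is one way the same toy could be written so that it follows that long-standing advice: each variable and each array holds a single type, and arrays are filled densely from index 0. The names Tamer, ints and strs are made up, and this is only a sketch of the general idea, not a proposal for anything.

```javascript
// A hypothetical "less WTF" rewrite of the example above.
function Tamer(print) {
  var ints = [];  // only ever holds numbers, filled from index 0, no holes
  var strs = [];  // only ever holds strings, likewise dense

  function run(arg) {
    var a = 100;       // a stays a number for its whole lifetime
    var s = 'hello';   // the string gets its own variable
    ints[0] = a;
    strs[0] = s;
    print(ints[0]);
    return arg;
  }
  return run;
}

var printed = [];
var run = Tamer(function (x) { printed.push(x); });
console.log(run(7), printed); // 7 and [ 100 ]
```

Nothing here needs a special hint or a special optimization; it is simply the kind of code that engines already handle well.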

None of that is the case with asm.js. As I detailed before, asm.js is the latest development in a natural process that has been going on in the JavaScript world for several years: Multiple compilers from C++ to JS appeared spontaneously, later they converged on the subset of JS that they target, and later JS engines optimized for that subset as it became more common on the web. While |0 might look odd to you, it wasn't invented for asm.js. It was "discovered" independently multiple times way before asm.js, found to be useful, and JS engines optimized for it. This process was not directed by anyone; it happened organically.

These principles guided us while designing early versions of asm.js: There were code patterns that were even easier to optimize, but they were novel, sometimes bizarre, and most importantly, they ran poorly without special optimizations for them, that did not exist yet. When I felt that a proposed pattern was of that nature, I strongly opposed it. Emscripten and Mandreel code already ran quite well; to design a new pattern that could be much faster with special optimizations, but right now is significantly slower without them, is a bad idea - it would feel unfair and WTF-like.

Therefore as we designed asm.js we tested on JS engines without any special optimizations for it, primarily on Chrome and on Firefox with the WIP optimizations turned off. asm.js is one output mode in Emscripten, so we had a good basis for comparison: when we flip the switch and go from the old Emscripten output pattern to asm.js, do things get faster or slower? (Note btw that we lack something similar in the WTF.js example from before, which is another problem with WTF.js.) What we saw was that (when we avoided the more novel ideas I mentioned before, that were rejected), flipping the switch from the old pattern to asm.js generally had little effect, sometimes helping and sometimes hurting, but overall things stayed at much the same (quite good) level of performance JS engines already had. That made sense, because asm.js is very close to the old pattern. (My guess is that the cases where it helped were ones where asm.js's extra-careful avoidance of types changing at runtime made a positive difference, and cases where it hurt were ones where it happened to hit an unlucky case in the heuristics being used by the JS engine.)

A few months ago Emscripten's default code generation mode switched to asm.js. Like most significant changes it was discussed on the mailing list and irc, and I was happy to see that we did not get reports of slowdowns in other browsers as a result of the change, which is further evidence that we got that part right. In fact (aside from the usual miscellaneous bugs following any significant change in a large project), the main regression caused by that change was that we got reports of startup slowdowns in Firefox actually! (These were caused by AOT sometimes being sluggish, it was however very easy to get large speedups on it, following those bug reports.)

Developing and shipping optimizations for something bizarre like WTF.js would be selfish, since it benefits one vendor while arbitrarily, unexpectedly and needlessly harming perf in other vendors' browsers. But none of that is the case with asm.js, which builds on a naturally-occurring pattern of JavaScript (the Emscripten/Mandreel pattern, already optimized for by JS engines), was designed to not harm performance on other browsers compared to that already-existing pattern, and as benchmarks have shown on two browsers already, creates opportunities for significant speedups on the web.