Tuesday, February 21, 2012

box2d.js: Box2D on the Web is Getting Faster

Box2D is a popular open source 2D physics library, used for example in Angry Birds. It's been ported to various platforms, including JavaScript through a previous port to ActionScript. box2d.js is a new port, straight from C++ to JavaScript using Emscripten. Here is a demo.

Last December, Joel Webber benchmarked various versions of Box2D. Of the JavaScript versions, the best (Mandreel's build) was 12x slower than C. Emscripten did worse, which was not surprising since back then Emscripten could not yet support all LLVM optimizations. That support has since landed, so I reran the numbers: on the trunk version of SpiderMonkey (Firefox's JavaScript engine), Emscripten's version is now around 6x slower than C. That's twice as fast as the previous best result from December, and three times as fast as Emscripten's own result at that time.

That should get even faster as JavaScript engines and the compilers to JavaScript continue to improve. In fact, the rate of improvement is fast enough that you will likely see a big difference between stable and development versions of browsers when running processing-intensive code like Box2D.

Aside from speed, it's important that the compiled code be easily usable. box2d.js uses the Emscripten bindings generator to wrap compiled C++ classes in friendly JavaScript classes (see the demo code for an example). Basically, you can write natural JavaScript like new Box2D.b2Vec2(0.0, -10.0) and it will call the compiled code for you.

(And of course, box2d.js is zlib licensed, like Box2D - usable for free in any way.)

Monday, January 23, 2012

Emscripten Standard Library Support, Now With More C++

I just landed much more comprehensive support for the C++ standard library in Emscripten, which now allows you to compile pretty much any C++ code using the standard C++ library to JavaScript. So I figured it was a good time to write up an overview of how Emscripten handles standard libraries.

As background, one of the initial design decisions in Emscripten was to focus on generating good code, even when that has some potential downsides elsewhere. Good code means both fast code and small code, and both are particularly important on the web. Fast code is important everywhere, but JavaScript is not yet as fast as native code, so to counter that we need to really focus on generating efficient code. As for code size, you might not care much about linking to a 5MB shared library on your desktop, but downloading and parsing a 5MB script is significant.

For that reason, it didn't seem like a good idea to build C and C++ standard libraries, ship them with your code, and link them at runtime: The standard libraries are quite large. Furthermore, Emscripten doesn't have a single ABI. It has several code generation modes (for example, there are two typed array modes and one mode without typed arrays), and code compiled with one is not interface-compatible with code compiled with another. Not having a stable ABI lets us generate more specialized and efficient code, but it is another reason for not shipping separate linkable standard libraries.

Instead, when compiling code with Emscripten we build the standard library along with your project's code. Everything is then shipped as a single file. This gives the advantages mentioned before: Smaller code size since we know which parts of the standard library you actually need, and faster code since we can specialize both the standard library and your own code, not just your own.

However, there are two disadvantages of this approach. The first is that combining the standard libraries with your own code means they form a single "unit". When normally you build your code and then link to the LGPL-licensed GNU standard libraries, it's clear that your code does not need to comply with the LGPL (you just need to comply with the LGPL regarding the LGPL'd library itself). But if you build your project together with the standard library, intertwining them in an optimized way, it is less clear how the LGPL applies here. I actually don't think there is a problem - it seems equivalent to the former "normal" case to me, despite the differences between them - however, I am not a lawyer, and also it is better to avoid any possible confusion and concern. In addition, even if the LGPL still applies just to the library, you would be shipping the library yourself, meaning you need to comply with the LGPL for it (which I don't think is a problem myself, but it is a concern for other people). For those reasons, Emscripten doesn't include any LGPL code. That ruled out using the GNU standard C and C++ libraries, which would otherwise be the first choice because of their familiarity and compatibility with existing code.

Given that decision, I looked at the other options and decided to use the Newlib C library. There were then two options: use just the Newlib headers, or use the existing Newlib implementation code and port it to the new platform. I decided to use just the headers, for four reasons: (1) Newlib is already not 100% compatible with the GNU C library, so there would be inconsistencies and missing parts to work around anyhow, and that is easier to do in our own new code; (2) by implementing the C standard library in JavaScript, we can optimize it using the existing capabilities of the web platform (for example, we use JavaScript's sort inside qsort); (3) porting Newlib to the web would mean writing new C code inside Newlib and interfacing that with JavaScript code that hooks into the web platform APIs themselves, which is a little less convenient than writing just JavaScript; and (4) porting a C standard library means working with the internals of that library, as opposed to implementing the familiar, higher-level C standard library interface. So, Emscripten has an implementation of the C standard library written in JavaScript, primarily using the Newlib headers. (The only exception is malloc and free, which we compile from dlmalloc, because writing an effective malloc implementation is not trivial. However, we should implement malloc and free in JavaScript eventually, since we could optimize them quite a bit.)
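To make point (2) concrete, here is a minimal sketch of how a C-style qsort could be backed by JavaScript's built-in sort. It is not Emscripten's actual library code: the HEAP32 array, the restriction to 4-byte elements, and the helper names are all assumptions made for illustration. "Pointers" here are simply byte offsets into a typed array that stands in for compiled memory.

```javascript
// Hypothetical stand-in for the compiled program's memory.
const buffer = new ArrayBuffer(64);
const HEAP32 = new Int32Array(buffer);

// C-style qsort(base, num, size, cmp) implemented on top of Array.prototype.sort.
// base: byte offset of the first element; num: element count;
// size: element size in bytes; cmp: comparator that receives two pointers.
function qsort(base, num, size, cmp) {
  if (size !== 4) throw new Error('this sketch only handles 4-byte elements');
  const start = base >> 2;
  // Snapshot the values so they can be permuted after sorting.
  const original = Array.from(HEAP32.subarray(start, start + num));
  // Sort the element pointers with the engine's native sort. Memory is not
  // modified while sorting, so cmp always sees the original values.
  const pointers = original.map((_, i) => base + i * size);
  pointers.sort(cmp);
  // Write the values back in sorted order.
  pointers.forEach((ptr, i) => {
    HEAP32[start + i] = original[(ptr - base) / size];
  });
}

// A comparator in the C style: it dereferences the two pointers it is given.
function compareInts(a, b) {
  return HEAP32[a >> 2] - HEAP32[b >> 2];
}

HEAP32.set([5, -3, 9, 1], 2); // place four ints at byte offset 8
qsort(8, 4, 4, compareInts);
console.log(Array.from(HEAP32.subarray(2, 6))); // [ -3, 1, 5, 9 ]
```

The appeal of this approach is that the sorting algorithm itself runs as native engine code rather than as compiled C.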

For C++, again we couldn't use the GNU C++ standard library. Instead we started out with the libc++ headers. Just using the headers was enough to get a lot of code to run, because a lot of the functionality is in the headers themselves. We did need to introduce some ugly hacks in the headers though, as well as implement some bits in JavaScript. This was enough for a lot of projects to work, but was still missing a lot of stuff, almost everything that wasn't implemented in a header.

That problem is what I worked on fixing last week: As of Saturday, we build the libc++ sources, if they are needed by your project, and include them. To make this efficient, I also enabled LLVM's global dead code elimination, so that while we link in the entire C++ standard library, we immediately eliminate all the parts you don't actually need before proceeding to compile to JavaScript. This helps quite a lot with the size of the generated code (and also is nice for compilation times). Aside from that, the main challenge here was getting libc++ to build using the Newlib C standard library headers. The end result is that all of our hacks in the libc++ headers are now removed. We now build stock libc++ and include it if necessary, which means pretty much any C++ program that uses the C++ standard library should work. (With the obvious caveats of no multithreading with shared state and so forth, which are general limitations of compiling to JavaScript.)

I mentioned before that there were two downsides to building the standard libraries with your project, the first of which was licensing, which led us to avoid LGPL code. The other disadvantage is build time: You compile the standard library into JavaScript when you build your project, as opposed to just building your own project and linking it with prebuilt standard libraries. While I believe the advantages of the approach (faster and smaller generated code) are definitely worth this cost, it is a concern. We get around a lot of the problem by (1) having the C standard library implemented in JavaScript, so there is no compilation time for it, (2) using LLVM's dead code elimination to quickly get rid of unneeded parts of the C++ standard library (and of dlmalloc, which as mentioned before we also build) early in the compilation process, (3) only linking in the C++ standard library (and dlmalloc) if they are actually used, and (4) caching the bitcode result of compiling the C++ standard library (and dlmalloc), so that we only compile them to bitcode once. With these in place, emcc is still slower than gcc, but the difference isn't very significant, except for the very first time you compile libc++ from source into bitcode (which, as mentioned before, is done once and then cached).

Tuesday, December 20, 2011

New Emscripten tutorial: C/C++ to JavaScript now easier than ever with "emcc"

A new compiler frontend for Emscripten, emcc, has landed recently. emcc can be used basically as a drop-in replacement for gcc, making it much easier to compile C and C++ into JavaScript. For example,

  emcc src.cpp

will generate a.out.js, and

  emcc src.cpp -o src.html

will generate a complete HTML file with the compiled code as embedded JavaScript, including SDL support so the code can render to a Canvas element. Optimizing code is now easy as well,

  emcc -O2 src.cpp

will generate optimized code (optimizing in LLVM, the Emscripten compiler itself, the Closure Compiler, and the Emscripten JavaScript optimizer). (Note that there is an even faster setting, -O3. See the docs for more.)

emcc is presented in more detail in the new Emscripten Tutorial. Check it out! Feedback is welcome :)

Saturday, December 10, 2011

Typed Arrays by Default in Emscripten

Emscripten has several ways of compiling code into JavaScript, for example, it can use typed arrays or not (for more, see Code Generation Modes). I merged the 'ta2 by default' branch into master in Emscripten just now, which makes one of the typed array modes the default. I'll explain here the reason for that, and the results of it.

Originally Emscripten did not use typed arrays. When I began to write it, typed arrays were supported only in Firefox and Chrome, and even there they were of limited benefit due to lack of optimization and incomplete implementation. Perhaps more importantly, it was not clear whether they would ever be universally supported in all browsers. So to generate code that truly runs everywhere, Emscripten did not use typed arrays and instead generated "plain vanilla" JavaScript.

However, that has changed. Firefox and Chrome now have mature and well-performing implementations of typed arrays, and Opera and Safari are very close to the same. Importantly, Microsoft has said that IE10 will support typed arrays. So typed arrays are becoming ubiquitous, and have a bright future.

The main benefits of using typed arrays are speed and code compatibility. Speed is simply a result of JS engines being able to optimize typed arrays better than normal ones, both in how they are laid out in memory and how they are accessed. Compatibility stems from the fact that by using typed arrays with a shared buffer, you can get the same memory behavior as C has. For example, you can read an 8-bit byte from the middle of a 32-bit int and get the same result C would get. It's possible to do that without typed arrays, but it would be much, much slower. (There is however a downside to such C-like memory access: Your code, if it was not 100% portable in the first place, may depend on the CPU endianness.)
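As a small illustration of that C-like behavior, the sketch below creates two typed array views over one shared buffer, writes a 32-bit int through one view, and reads a single byte of it through the other. The view names are only illustrative. The byte you get back depends on the CPU's endianness, which is exactly the portability caveat mentioned above.

```javascript
// Two views over the same underlying memory.
const buffer = new ArrayBuffer(8);
const HEAPU8 = new Uint8Array(buffer);
const HEAP32 = new Int32Array(buffer);

// Write one 32-bit value, then look at its individual bytes.
HEAP32[0] = 0x11223344;

// On a little-endian CPU the lowest byte is stored first.
const littleEndian = HEAPU8[0] === 0x44;

// Read a byte from the middle of the int, as C code could with a char pointer.
const secondByte = HEAPU8[1];

console.log(littleEndian ? 'little-endian' : 'big-endian');
console.log(secondByte.toString(16)); // "33" on little-endian, "22" on big-endian
```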

Because of those benefits, I worked towards using typed arrays by default. To get there, I had to fix various problems with accessing 64-bit values, which are only a problem when doing C-like memory access, because unaligned 64-bit reads and writes do not work (due to how the typed arrays API is structured). The settings I64_MODE and DOUBLE_MODE control reading those 64-bit values: If set to 1, reads and writes will be in two 32-bit parts, in a safe way.
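The following is a minimal sketch of the "two 32-bit parts" idea for doubles, not the exact code Emscripten generates. A Float64Array view can only address memory at multiples of 8 bytes, so a double stored at an address that is only 4-byte aligned cannot be read through it directly. Instead, the value is moved through a small scratch buffer one 32-bit half at a time. The helper and variable names are assumptions for illustration.

```javascript
// Main memory, viewed as 32-bit ints.
const buffer = new ArrayBuffer(32);
const HEAP32 = new Int32Array(buffer);

// An 8-byte scratch buffer with both a 32-bit and a 64-bit view.
const tempBuffer = new ArrayBuffer(8);
const tempI32 = new Int32Array(tempBuffer);
const tempF64 = new Float64Array(tempBuffer);

// Store a double at ptr (4-byte aligned) as two 32-bit writes.
function writeDouble(ptr, value) {
  tempF64[0] = value;
  HEAP32[ptr >> 2] = tempI32[0];
  HEAP32[(ptr + 4) >> 2] = tempI32[1];
}

// Load a double from ptr (4-byte aligned) as two 32-bit reads.
function readDouble(ptr) {
  tempI32[0] = HEAP32[ptr >> 2];
  tempI32[1] = HEAP32[(ptr + 4) >> 2];
  return tempF64[0];
}

// Address 4 is 4-byte aligned but not 8-byte aligned.
writeDouble(4, 3.25);
console.log(readDouble(4)); // 3.25
```

Because the two halves are written and read in the same order, the round trip works regardless of endianness.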

Another complication is that typed arrays cannot be resized. So when sbrk() is called with a value that is larger than the max size, we can't easily enlarge the typed arrays we are using. The current implementation will create new typed arrays and copy the old values into them, which will work but is potentially slow.
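Here is a rough sketch of that create-and-copy approach. The function name and the size-doubling policy are assumptions for illustration, not necessarily what Emscripten does internally.

```javascript
// Current memory; typed arrays have a fixed length once created.
let HEAPU8 = new Uint8Array(new ArrayBuffer(16));

// Replace the heap with a larger one that holds at least requiredBytes.
function enlargeMemory(requiredBytes) {
  let newSize = HEAPU8.length;
  while (newSize < requiredBytes) newSize *= 2; // assumed growth policy
  const grown = new Uint8Array(new ArrayBuffer(newSize));
  grown.set(HEAPU8); // copy every old byte: this is the potentially slow part
  HEAPU8 = grown;
}

HEAPU8[3] = 42;
enlargeMemory(40);
console.log(HEAPU8.length, HEAPU8[3]); // 64 42
```

Note that every other view of the old buffer would also need to be recreated over the new one, since views cannot be repointed at a different buffer.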

Typed arrays have already worked in Emscripten for a long time (in two modes, even, shared and non-shared buffers), but the issues mentioned in the previous two paragraphs limited their use in some areas. So the recent work has been to smooth over all the missing pieces, to make typed arrays ready as the default mode.

The current default in Emscripten, after the merge, is to use typed arrays (in mode 2, with a shared buffer, that is, C-like memory access), with all the other settings set to safe values (for example, I64_MODE and DOUBLE_MODE are both 1). This means that all the code that worked out of the box before will continue to work, and additional code will now work out of the box as well. Note that these are just the defaults: If your makefile sets all the Emscripten settings itself (like defining whether to use typed arrays or not), then nothing will change.

The only thing to keep in mind with this change is that by default, you will need typed arrays to run the generated code. If you want your code, right now, to run in as many places as possible, you should set USE_TYPED_ARRAYS to 0 to disable typed arrays. Another possible issue is that not all JS console environments support typed arrays: Recent versions of SpiderMonkey and Node.js do, but the V8 shell has some issues (note that this is just a problem in the commandline shell, not in Chrome), so if you test your generated code using d8 then it will not work. Instead, you can test it in a browser, or by using Node.js or the SpiderMonkey shell for now.

Monday, December 5, 2011

Emscripten in node.js and on the web

Until now, to use Emscripten to compile LLVM to JavaScript you had to install a JavaScript engine shell (like SpiderMonkey's or V8's), both to run Emscripten itself and to run the generated code. This meant you had to get the latest source code of one of those shells and build it, which isn't hard but isn't super convenient either. So over the weekend I landed support for running Emscripten itself in node.js and in web browsers, as well as support for running the generated code in node.js (it always ran in browsers).

What this means is that if you have node.js, Python and Clang, you have everything you need to use Emscripten. For more, see the updated Getting Started page. (Regarding running Emscripten itself in a web browser, see src/compiler.html. This isn't really intended as a serious way to use it, but there are some interesting use cases for it, or will be.)

It is still strongly recommended to install the JavaScript engine shells themselves, though. One reason is that the trunk engine shells are the very latest code, so you should use them to see the maximum speed at which generated code can run. Also, some tests require the SpiderMonkey shell because the others do not yet fully support the latest typed arrays spec. But, if you already have node.js installed anyhow, it is now easier to use Emscripten because you can just use that.

Tuesday, November 15, 2011

Code Size When Compiling to JavaScript

When compiling code to JavaScript from some other language, one of the questions is how big the code will be. This is interesting because code must be downloaded on the web, and large downloads are obviously bad. So I wanted to investigate this, to see where we stand and what we need to do (either in current compilers, or in future versions of the JavaScript language - being a better compiler target is one of the goals there).

The following is some preliminary data from two real-world codebases, the Bullet physics library (compiled to JavaScript in the ammo.js project) and Android's H264 decoder (compiled to JavaScript in the Broadway project):

Bullet

.js        19.2  MB
.js.cc      3.0  MB
.js.cc.gz   0.48 MB

.o          1.9  MB
.o.gz       0.56 MB

Android H264

.js       2,493 KB
.js.cc      265 KB
.js.cc.gz    61 KB

.o          110 KB
.o.gz        53 KB

Terms used: 

.js         Raw JS file compiled by Emscripten from LLVM bitcode
.js.cc      JS file with Closure Compiler simple opts
.js.cc.gz   JS file with Closure, gzipped

.o          Native code object file
.o.gz       Native code object file, gzipped

Notes on methodology:
  • Native code was generated with -O2. This leads to smaller code than without optimizations in both cases.
  • Closure Compiler advanced optimizations generate smaller JS code in these two cases, but not by much. While it optimizes better for size, it also does inlining which increases code size. In any case it is potentially misleading since its dead code elimination rationale is different from the one used for LLVM and native code, so I used simple opts instead.
  • gzip makes sense here because you can compress your scripts on the web using it (and probably should). You can even do gzip compression in JS itself (by compiling the decompressor).
  • Debug info was not left in any of the files compared here.
  • This calculation overstates the size of the JS files, because they have the relevant parts of Emscripten's libc implementation statically linked in. But, it isn't that much.
  • LLVM and clang 3.0-pre are used (rev 141881), Emscripten and Closure Compiler are latest trunk as of today.

Analysis

At least in these two cases it looks like compiled, optimized and gzipped JavaScript is very close to (also gzipped) native object files. In other words, the effective size of the compiled code is pretty much the same as you would get when compiling natively. This was a little surprising: I was expecting to see the size be bigger, and to then proceed to investigate what could be improved.

Now, the raw compiled JS is in fact very large. But that is mostly because the original variable names appear there, which is basically fixed by running Closure. After Closure, the main reason the code is large is because it's in string format, not an efficient binary format, so there are things like JavaScript keywords ('while', for example) that take a lot of space. That is basically fixed by running gzip since the same keywords repeat a lot. At that point, the size is comparable to a native binary.
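To see the effect of repeated keywords on gzip for yourself, a small Node.js sketch like the following can help. It uses Node's built-in zlib module. The sample source string is made up for illustration and is far more repetitive than real compiled code, so the ratio it shows is much better than you should expect in practice.

```javascript
const zlib = require('zlib');

// Size in bytes of a string after gzip compression.
function gzippedSize(text) {
  return zlib.gzipSync(Buffer.from(text, 'utf8')).length;
}

// Made-up, keyword-heavy "compiled" code: the same loop repeated 200 times.
const source = 'while (i < n) { total = total + i; i = i + 1; }\n'.repeat(200);

const rawBytes = Buffer.byteLength(source, 'utf8');
const gzBytes = gzippedSize(source);

console.log('raw bytes:', rawBytes);
console.log('gzipped bytes:', gzBytes);
```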

Another comparison we can make is to LLVM bitcode. This isn't an apples-to-apples comparison of course, since LLVM bitcode is a compiler IR: It isn't designed as a way to actually store code in a compact way, instead it's a form that is useful for code analysis. But, it is another representation of the same code, so here are those numbers:

Bullet

.bc         3.9  MB
.bc.gz      2.2  MB

Android H264

.bc         365 KB
.bc.gz      258 KB

LLVM bitcode is fairly large, even with gzip: gzipped bitcode is roughly 4x to 5x larger than either gzipped native code or JS (between 3.9x and 4.9x in the numbers above). I am not sure, but I believe the main reason why LLVM bitcode is so large here is because it is strongly and explicitly typed. Because of that, each instruction has explicit types for the expressions it operates on, and elements of different types must be explicitly converted. For example, in both native code and compiled JS, taking a pointer of one type and converting it to another is a simple assignment (which can even be eliminated depending on where it is later used), but in LLVM bitcode the pointer must be explicitly cast to the new type, which takes an instruction.
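For reference, the ratios between the gzipped sizes can be worked out directly from the tables above. The sketch below only does that arithmetic; the object layout and function name are illustrative.

```javascript
// Gzipped sizes copied from the tables above (Bullet in MB, H264 in KB).
const sizes = {
  bullet: { jsGz: 0.48, nativeGz: 0.56, bitcodeGz: 2.2 },
  h264: { jsGz: 61, nativeGz: 53, bitcodeGz: 258 },
};

// Ratios are unitless, so the differing units do not matter.
function ratios(s) {
  return {
    jsVsNative: s.jsGz / s.nativeGz,
    bitcodeVsNative: s.bitcodeGz / s.nativeGz,
    bitcodeVsJs: s.bitcodeGz / s.jsGz,
  };
}

for (const [name, s] of Object.entries(sizes)) {
  const r = ratios(s);
  console.log(
    name,
    'js/native:', r.jsVsNative.toFixed(2),
    'bitcode/native:', r.bitcodeVsNative.toFixed(2),
    'bitcode/js:', r.bitcodeVsJs.toFixed(2)
  );
}
// bullet js/native: 0.86 bitcode/native: 3.93 bitcode/js: 4.58
// h264 js/native: 1.15 bitcode/native: 4.87 bitcode/js: 4.23
```

So gzipped JS lands within about 15% of gzipped native code in both projects, while gzipped bitcode is around four to five times larger than either.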

So, JS and native code are similar in their lack of explicit types, and in their gzipped sizes. This is a little ironic since JS is a high level language and native code is the exact opposite. But both JS and native code are pretty space-efficient it turns out, while something that seems to be in between them - LLVM bitcode, which is higher than native code but lower than JS - ends up being much larger. But again, this actually makes sense since native code and JS are designed to simply execute, while LLVM bitcode is designed for analysis, so it really isn't in between those two.

(Note that this is in no way a criticism of LLVM bitcode! LLVM bitcode is an awesome compiler IR, which is why Emscripten and many other projects use it. It is not optimized for size, because that isn't what it is meant for. As mentioned above, it's a form that is useful for analysis, not compression. The reason I included those numbers here is that I think it's interesting seeing the size of another representation of the same compiled code.)

In summary, it looks like JavaScript is a good compilation target in terms of size, at least in these two projects. But as mentioned before, this is just a preliminary analysis (for example, it would be interesting to investigate specific compression techniques for each type of code, and not just generic gzip). If anyone has additional information about this topic, it would be much appreciated :)

Friday, October 7, 2011

JSConf.eu, Slides, SQLite on the Web

I got back from JSConf.eu a few days ago. I had never been to JSConf before, and it was very interesting! Lots of talks about important and cool stuff. The location, Berlin, was also very interesting (the mixture of new and old architecture in particular). Overall it was a very intensive two days, and the organizers deserve a ton of credit for running everything smoothly and successfully.

I was invited to give a talk about Emscripten, the LLVM to JavaScript compiler I've been working on as a side project over the last year. Here are my slides from the talk, which include links to the demos. There was also a fourth unplanned demo which isn't in the slides, here is a link to it.

If you've seen the previous Emscripten demos, then some of what I showed had new elements, like the Bullet/ammo.js demo which shows the new bindings generator which lets you use the compiled C++ objects directly from JS in a natural way. One demo was entirely new though, SQLite ported to JS. I haven't had time to do any rigorous testing of the port or to optimize it for speed. However it appears to work properly in all the basic tests I tried: creating tables, doing selects, joins, etc. With WebSQL not moving forward as a web standard, compiling SQLite to JS directly might be the best way to get SQL on the web. The demo is just a proof of concept, but I think it shows the approach is feasible.