Thursday, November 14, 2013

C++ to JavaScript: Emscripten, Mandreel, and now Duetto

There are currently at least 3 compilers from C/C++ to JavaScript: Emscripten, Mandreel and the just-launched Duetto. Now that Duetto is out (congrats!), it's possible to do a comparison between the 3, which ends up being interesting as there are some big differences but also big similarities between them.

Disclaimer: I founded the Emscripten project and work on it, so unsurprisingly I have more knowledge about that one than the other two. It is likely that I got some stuff wrong here, please correct me if so! :)

A quick overview is in the following table:

                      Emscripten                    Mandreel                     Duetto
--------------------  ----------------------------  ---------------------------  -------------------
License               open source (permissive)      proprietary                  mixed (see below)
Based on              LLVM                          LLVM                         LLVM
Architecture          LLVM IR backend (external)    LLVM tblgen backend          LLVM IR backend
Memory model          singleton typed array         singleton typed array        JS objects
C/C++ compatibility   full                          full                         partial
LLVM IR opts          full                          full                         partial
Backend opts          low-level JS                  LLVM                         ?
Call model            JS native                     C stack in typed array       JS native
Loop recreation       Emscripten relooper           Custom (relooper-inspired)   Emscripten relooper
APIs                  standard C (SDL, etc.)        custom                       HTML5-based
Other                 -                             non-JS targets too           client-server

More detail on each of those factors:
  • License: Licenses range from Emscripten which is fully open source with a permissive license (MIT/LLVM), to Mandreel which is fully proprietary, to Duetto which is somewhere in the middle - the Duetto core compiler has a permissive open source license (LLVM), the Duetto libraries+headers are dual licensed GPL/proprietary, and Duetto premium features are proprietary.

    Duetto includes core system libraries that must be linked together with your code, like libc and libcxx. Those have been relicensed to the GPL (from original licenses that were permissive, like MIT for libcxx), and this implies that the Duetto output for a project will be a mix of that project's code plus GPL'd library code, so it looks like the entire thing must be GPL'd, or you must buy a proprietary license.
     
  • Based on: All of these compilers use LLVM. It has become in many ways the default choice in this space.
     
  • Architecture: While they all use LLVM, these compilers use it in quite different ways. Mandreel has an LLVM backend using the shared backend codegen infrastructure, using tblgen, the selection DAG, etc. - this is the way most LLVM backends are written, for example the x86 and ARM backends, so Mandreel's architecture is the closest to a "typical" LLVM-based compiler. On the other hand, Duetto implements a backend inside LLVM that does not use that shared backend code, and instead basically calls out into Duetto code that processes the llvm::Module itself into JavaScript, which means it processes LLVM IR (which is entirely separate from the backend selection DAG: LLVM IR is lowered into the selection DAG). Emscripten also processes LLVM IR, like Duetto, but does so outside of LLVM itself - it parses LLVM IR that is exported from LLVM.
     
  • Memory model: Mandreel and Emscripten share a similar memory model, using a singleton typed array with aliasing views to emulate something extremely similar to how LLVM IR (and C programs) view memory. That includes aliasing (you can load a 32-bit value and split it into two 16-bit values, or you can read two 16-bit values and get the same result), as well as the fact that pointers are all interchangeable (aside from casting rules, they are just numeric values that refer to a place in a single contiguous memory space) and unsafe accesses (if you have a pointer to an int, even if there is no valid data after it, you can still read x[1] or x[17]; hopefully the program made sure there is in fact valid data there!). The odd one out is Duetto, which uses JS objects, so each C++ class you instantiate has its own object. This generates more "normal"-looking JS, at least at first glance - there are objects, there are properties accessed on objects - but it is complex to make it match up to the LLVM memory model (for example, pointers to objects in Duetto need to be able to represent both the JS object holding the data and a numeric offset so that pointer math works, etc.), which adds overhead (see more discussion in the Performance Analysis section below).

    Another result of this memory model is that Duetto allocates and frees JS objects all the time. This has both advantages and disadvantages: On the plus side it means that objects no longer referred to can be reclaimed by the JS engine's garbage collector (GC), so at any point in time the total memory used can be just what is currently alive - I say "can be" and not "will be" because that assumes that the JS engine has a moving GC (many do not), and even so this will only be true right after a GC, and not all the time. And on the minus side for Duetto it means that the overhead of object allocation and collection is present, whereas for Emscripten and Mandreel it is not - in those compilers, objects are just ranges in the singleton typed array, the JS GC is not even aware of them.

    Furthermore, each object will take more memory in the Duetto model: JS objects do not just store the raw data in them, but also have various overhead related to the VM (for example, most VMs have a pointer to "shape" info for the object, to optimize accesses to it, and other stuff), whereas in Emscripten and Mandreel literally the same amount of memory is used as C would use. On the other hand, Emscripten and Mandreel have a typed array that may not be fully used at all times, which can be wasteful. That limitation can be partially removed: for example, Emscripten has an option to grow the heap only when required (through the Unix sbrk call), and similarly it could be shrunk as well, which would leave only fragmentation as a concern - but fragmentation is definitely a real concern for long-running code. So which is better will really depend on the application and use case. For short-running code it seems highly likely that Emscripten and Mandreel will use less memory, whereas for long-running code it might go either way.

    An advantage of the Duetto model is security: if you read beyond the limit of one array, you will not read data from another. Basically, Duetto is compiling C/C++ into something more like Java or C#, where you do bounds checks on each array and so forth. This can prevent lots of bugs, obviously. However, it is foreign to how C/C++ normally work - this is an advantage not over Emscripten and Mandreel, but over all C++ compilers, it is an inherent difference compared to the normal way C++ compilers are implemented. It does come with some security benefits, however it also has downsides in terms of performance (those bounds checks just mentioned) and compatibility (see next section). Personally, I would argue that if you want that type of security, you would be better off writing code in C# or Java in the first place (which have compilers to JS, for example JSIL and GWT), but I do think that what Duetto is doing is interesting - it's almost like making a new language.
      
  • Compatibility: Emscripten and Mandreel should be able to run practically any C/C++ codebase, because as mentioned in the previous point, their memory model is essentially identical to LLVM's, assuming the code is reasonably portable. (As a rule of thumb, if C code is portable enough to run on both x86 and ARM, it will work in Emscripten and Mandreel.) Duetto however has a model which is different from LLVM's, which can require that codebases be ported to it. How much of a problem this is for Duetto will depend on the codebases people try to port. There are very portable codebases like Bullet, but much real-world code does depend on memory aliasing and so forth, for example Cube 2. Emscripten experimented with various memory models in the past, and we encountered significant challenges with all of them except the C-like model that Emscripten and Mandreel currently use, which is most compatible with normal C/C++. As a quick test out of curiosity, I tried to build just the script module in Cube 2 using Duetto. I received multiple errors (for example invalid field types in unions, etc.) that, due to the complexity of Cube 2, did not look trivial to fix.

    Note that for a new codebase this wouldn't matter - people writing new Duetto apps will just limit themselves to what Duetto can do.
     
  • LLVM IR optimizations: The previous point spoke about compatibility with C++ code. A related matter is compatibility with LLVM IR: While all these compilers rely on a frontend to transform source code into LLVM IR, so that only the subset of LLVM IR that the frontend generates is relevant, the LLVM optimizer can perform sophisticated transformations from LLVM IR to other LLVM IR. Therefore to be able to utilize the LLVM IR optimizer to its full extent, it is best to support as wide a range of LLVM IR as possible, otherwise you need to tell the optimizer to limit what it does.

    Compatibility with LLVM IR depends significantly on the memory model, just like compatibility with C++: the optimizer assumes memory aliases just like C assumes it does, and so forth. So Emscripten and Mandreel, which have a memory model that is practically identical to LLVM's, can benefit from the full range of LLVM optimizations, including the ones that assume aliasing and other non-safe aspects of the C memory model. Duetto on the other hand has a different memory model, one not compatible with all C code, and similarly Duetto is not compatible with all LLVM IR. That is, Duetto runs into problems either when source code would need something in LLVM IR that it cannot handle, or when the LLVM optimizer generates IR it can't handle. For source code, you can port it so it is in the proper subset; for optimizations, you need to either disable or limit them.

    It's possible that Duetto does not lose much here - most LLVM IR optimizations do not generate odd things like 512-bit integers nor do they rely heavily on C's memory model - so limiting LLVM's optimizer to what Duetto can handle might work out fine. However, a concern here is that Duetto needs to make sure that all existing and future IR optimizations do not depend on things it can't handle. Only if you have the exact same assumptions as LLVM IR has about memory and so forth, do you have a guarantee that all optimizations are valid for you.
     
  • Backend optimizations: Only Mandreel benefits from LLVM's backend optimizations (optimizations done after the LLVM IR level), because only it is implemented as an LLVM backend using the shared backend infrastructure, as discussed earlier. That gives Mandreel access to things like register allocation and various other optimizations. These can be quite significant in native (x86, ARM, etc.) builds, but it is less clear how crucial they are on the web, since JS engines will do their own backend-type optimizations (register allocation, licm, etc.) anyhow. In place of those, both Emscripten and Duetto perform their own "backend" optimizations. Emscripten's optimizations focus on low-level code (which makes sense given its memory model, as mentioned before), and include JS-specific passes to remove unneeded variables, shrink overly-large functions, etc. (see my talk from last week at the LLVM developer's conference, also this). I am not familiar with the details of Duetto's backend optimizations, but from their blogposts it sounds like they focus on optimizing things for their memory model (avoiding unnecessary object creation, etc.). If any Duetto people are reading this, I would love to learn more.
     
  • Call model: Both Emscripten and Duetto use native-looking function calls:   func(param1, param2)   etc. This uses the JS stack and tends to be well-optimized in JS engines. Mandreel on the other hand uses the C stack, that is, it writes to global locations before the call and reads them back in the call. Mandreel's approach is similar to how CPUs work, but I believe no JS engine is able to optimize parameters in that call model into registers, so it will mean memory accesses which are slower. Inlining (by LLVM or by the JS engine) will reduce that overhead for Mandreel, but it will still be noticeable in some workloads, and also is bad for code size.
     
  • Loop recreation: LLVM IR contains basic blocks and branch instructions, which are easy to model with gotos, but JS lacks goto statements. So all these compilers must either use large switch statements in loops (the closest thing to a goto we can get), or recreate loops and ifs. They all do the latter. Emscripten has what it calls its Relooper for that purpose, and I see that Emscripten's code is reused in Duetto, which is nice. Mandreel also recreates loops, in a way inspired by the Relooper algorithm but not using the same code.
     
  • APIs: Emscripten implements typical C APIs like SDL, glut, etc., so often entire apps can just recompile and run. Mandreel on the other hand has its own API which apps need to be written against. It is custom and designed to be portable not just across native and JS builds, but also various other platforms that Mandreel can target, like Flash. Finally, Duetto supports accessing HTML APIs from C++, so they are standard APIs in the sense of the web, but not ones that existing code might already be using. For new apps, Duetto's approach is interesting, but for existing ones I think Emscripten's is preferable. Of course there is no reason to not do both in a single compiler. (In fact it could be possible for Emscripten and Duetto to collaborate on that, but it looks like Duetto's WebGL headers are GPL licensed, which would be a problem for Emscripten, which only includes permissively licensed open source code like MIT.)

    Another factor to consider regarding APIs is native builds. It is often very useful to compile the same app to both native and HTML5 builds, for performance comparisons, benefiting from native debugging and profiling tools, etc. If I understand Duetto's approach properly, it can't compile apps natively because Duetto apps use things like WebGL which are not present as native libraries. Emscripten and Mandreel on the other hand do allow such builds.
     
  • Other: A few other points of interest are that while Emscripten is purely an LLVM to JS compiler, Mandreel and Duetto have other aspects as well. As already mentioned, Mandreel has a custom API that apps can be developed against, and it is portable across a very large set of target platforms, not just JS. And Duetto is meant not just for client JS but also to be able to run C++ on both sides of a client-server app, sort of like how node.js lets you run the same code on both sides as well, but using C++.
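
To make the loop recreation point above more concrete, here is a small hand-written sketch of the two options for the same control flow graph. It is not actual output from Emscripten, Mandreel or Duetto, and the function names are made up for illustration; it only shows the difference between emulating gotos with a switch in a loop and emitting the structured loop that a relooper-style pass tries to recover.

```javascript
// A tiny control flow graph that sums the integers 0..n-1:
//   block 0: i = 0; sum = 0;          branch to 1
//   block 1: if (i < n) branch to 2, else branch to 3
//   block 2: sum += i; i++;           branch to 1
//   block 3: return sum

// Option 1: emulate gotos with a switch inside a loop.
// "label" holds the number of the next basic block to run.
function sumWithSwitch(n) {
  var i = 0, sum = 0, label = 0;
  while (true) {
    switch (label) {
      case 0: i = 0; sum = 0; label = 1; break;
      case 1: label = (i < n) ? 2 : 3; break;
      case 2: sum = sum + i; i = i + 1; label = 1; break;
      case 3: return sum;
    }
  }
}

// Option 2: the structured form, with the loop recreated.
function sumRelooped(n) {
  var i = 0, sum = 0;
  while (i < n) {
    sum = sum + i;
    i = i + 1;
  }
  return sum;
}
```

Both functions compute the same result, but the second is smaller and gives the JS engine a normal loop to optimize, which is why all three compilers recreate loops instead of relying on the switch form.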

Interim Summary

As mentioned in the introduction, there are big differences but also big similarities here. Often two out of the three compilers are very similar or identical on a particular aspect, but which two changes from aspect to aspect. Overall I think it's interesting how big the differences are, probably larger than the differences between modern JS engines or modern C compilers! I suspect that is because compiling C++ to JS is a newer problem and the industry is still figuring out the best way to do it.
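
The call model is one aspect where Emscripten and Duetto look alike and Mandreel differs. The following is a rough hand-written sketch of the two styles, not real output from any of these compilers. HEAP32, sp and the function names are assumptions made up for the example, and the stack layout is simplified.

```javascript
// Hypothetical stand-ins: a typed array as "memory" and a stack pointer
// (an index into HEAP32) that grows downwards.
var HEAP32 = new Int32Array(1024);
var sp = 1024;

// JS-native call model: parameters are ordinary JS arguments,
// which the engine can keep in registers.
function addNative(a, b) {
  return (a + b) | 0;
}

// C-stack call model: the callee reads its parameters back from memory.
function addViaStack() {
  var a = HEAP32[sp];
  var b = HEAP32[sp + 1];
  return (a + b) | 0;
}

// The caller writes the parameters to memory before the call.
function callAddViaStack(a, b) {
  sp -= 2;              // reserve two slots on the emulated stack
  HEAP32[sp] = a;
  HEAP32[sp + 1] = b;
  var result = addViaStack();
  sp += 2;              // release the slots
  return result;
}
```

The second style performs two typed array writes and two reads per call that the first style does not need, which is the overhead discussed in the Call model item above.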

Performance Analysis

What you don't see in this blogpost are benchmarks. That's intentional, because it's very hard to fairly benchmark code across compilers as different as these. It's practically a given that benchmarks can be found that show each compiler to be best, so I won't bother to do that. Instead, this section contains an analysis of how the different compilers' architectures will affect performance, to the best of my understanding.

Overall the biggest factor responsible for performance is likely the memory model. Over the last 3 years Emscripten tried various approaches including JS objects, arrays, typed arrays in aliased and non-aliased modes, arrays with "compressed indexes" (aka QUANTUM_SIZE=1), and finally ended up on aliased typed arrays - the model described above, and shared with Mandreel - because they were greatly superior in performance. They are better both because they model memory essentially the same way LLVM does, so the full set of LLVM optimizations can be run, and also because being low-level they are very easy for JS engines to optimize - reads and writes in LLVM IR become reads and writes of typed arrays in JS, which can become simple reads and writes from memory in modern JS engines, and so forth. When emitting the asm.js subset of JS, this is even more precise (as it avoids cases where reads can be out of bounds and return undefined, for example, which would have meant it isn't a simple memory read any more), and for those reasons this approach of compiling LLVM to JS can get pretty close to native speed.
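
As a minimal sketch of what "aliased typed arrays" means in practice: one ArrayBuffer acts as the whole memory, and several typed array views over it alias the same bytes. The view names below follow a common convention but should be read as illustrative, the pointer value is arbitrary, and the 16-bit halves come out in this order only on a little-endian host (which is assumed here).

```javascript
// One buffer is the entire "memory"; every view aliases the same bytes.
var buffer = new ArrayBuffer(64 * 1024);
var HEAPU16 = new Uint16Array(buffer);
var HEAP32 = new Int32Array(buffer);

// A "pointer" is just a number: a byte offset into the buffer.
var ptr = 16;

// Roughly *(int32_t*)ptr = 0x12345678;
HEAP32[ptr >> 2] = 0x12345678;

// Read the same four bytes as two 16-bit values.
// Assumes a little-endian host, so the low half comes first in memory.
var low = HEAPU16[ptr >> 1];
var high = HEAPU16[(ptr + 2) >> 1];
var recombined = ((high << 16) | low) | 0;

// Unsafe access, like x[17] in C: nothing stops us from reading past
// the value we wrote. We simply get whatever bytes happen to be there.
var beyond = HEAP32[(ptr >> 2) + 17];
```

Each load and store here is an indexed access into a flat array of a known element type, which is the kind of operation JS engines can turn into a plain memory read or write.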

I would be surprised if Duetto gets to that level of performance. It uses property accesses on JS objects, which is far more complex than typed array reads and writes - instead of a simple read or write from memory, JS object property accesses are optimized using hidden class optimizations, PICs, and so forth. These can be fast in many cases, but as heuristic-based optimizations they are unpredictable. They also require overhead to be performed, both in terms of time and memory. Furthermore, as we are not just reading and writing numbers (as in the Emscripten/Mandreel model) but also references to JS objects, things like write barriers may be executed in the VM (or if not, then more time would be spent GCing). And again, all of this is competing with reads and writes from typed arrays, which are among the simplest things to optimize in JS since their type is known and they are just flat. It is true that in the best case a JS VM can turn a property access on a JS object into a direct read from memory, but even a statically typed VM like the JVM generally doesn't manage to run at C-like speeds, and here we are talking about dynamically-typed VMs which have a much more complex optimization problem before them.
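
For contrast, here is a sketch of what a pointer might need to look like in an object-based memory model, where a pointer has to carry both the JS object holding the data and a numeric offset. This is purely hypothetical code written for this comparison; it is not Duetto's actual representation or output, and all names are invented.

```javascript
// Hypothetical "fat pointer": the JS array holding the data plus an index.
function makePointer(base, offset) {
  return { base: base, offset: offset };
}

// p + n in C becomes a new pointer object with an adjusted offset.
function pointerAdd(p, n) {
  return makePointer(p.base, p.offset + n);
}

// *p, with an explicit bounds check standing in for the safety
// that this kind of model provides.
function pointerLoad(p) {
  if (p.offset < 0 || p.offset >= p.base.length) {
    throw new RangeError("out-of-bounds read at offset " + p.offset);
  }
  return p.base[p.offset];
}

var a = [10, 20, 30];
var b = [40, 50, 60];

var p = makePointer(a, 0);
var second = pointerLoad(pointerAdd(p, 1)); // reads a[1]
```

Reading past the end of a can never reach the contents of b in this sketch, which is the safety benefit, but every pointer is now an object that must be allocated, and every load involves property accesses and a check rather than a single typed array read.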

Perhaps the crux of the matter is that Duetto basically transforms a C++ codebase so it runs in a VM using a relatively safe memory model and GC. There have been various such attempts - Managed C++, Emscripten's and other JS compilers' attempts in this space as mentioned before, etc. - and overall we know this is possible, if you limit yourself to a subset of C++. We also know it has benefits, such as somewhat higher security as mentioned earlier on. But that increased security stems from the compiled C++ being more like compiled Java and C#, because just like them it runs in a VM, is JITed, uses a GC, benefits from PICs, does bounds checks for safety, etc. In other words, I believe that the Duetto model will in the best case approach the performance of Java and C#, while the Emscripten/Mandreel model will in the best case approach the performance of C and C++. Furthermore, their ability to reach the best-case performance is not identical: Approaching C and C++ speed is far simpler because the model is simpler, and we have in fact already mentioned that the Emscripten/Mandreel approach can indeed approach that limit in many cases (see link above). The Duetto model may be able to reach the speed of Java and C#, but that is both more complex and yet unproven.

Perhaps I am missing something, and there is a reason Duetto-compiled C++ running on a JavaScript VM can outperform the JVM and .NET? That seems unlikely to me, because if it were possible to do so, then surely the JVM and .NET would have done whatever allows such speedups already - assuming C++ in Duetto's VM-like model is generally analogous to the JVM/.NET model, which is my best guess for the reasons mentioned before. Certainly it is possible that I overlooked something here and there is in fact a fundamental difference, please correct me if so.

I do believe that Duetto can outperform typical handwritten JS. It has two advantages over that: First, it can run the LLVM optimizer on the code, and second, it generates JS that is implicitly statically typed. For example, if the code contains a.x then x should always be of the same type, since it originated from C++ code that had that property (C unions are a complication for this but that is solvable as well, with some overhead). In hand-written JS, on the other hand, it is easy to do things that break that assumption. Again, the model Duetto uses is closer to C# or Java where there are objects and a garbage collector and so forth, like JS, but at least types are static, unlike JS. And modern JS engines are quite good at looking for hidden static types, so Duetto should benefit from those optimizations.

One last note on performance: In small benchmarks the difference between C/C++, Java/C#, and JS can be quite small. JS VMs as well as the JVM and CLR have JITs that can often optimize small loops and such into extremely efficient code, competitive with C/C++ (and LuaJIT can even outperform it in some cases). Things change, however, with large codebases. The larger the codebase, the harder dynamic, heuristic-based optimizations are to perfect; this is true for Java and C# and far more true for JS. For example, a small inner loop containing a temporary object allocation in JS might be analyzed to not escape, and turn into just some constant space on the stack. However, in a large codebase with many function calls, including functions too large and/or too complex to inline, etc., such analyses are often less productive, whereas in C/C++ the object will still be on the stack in this example. As another example, JS engines need to analyze types dynamically, and in small functions often do so very well, but the larger the relevant code, the higher the risk for any imprecision in the analysis to "leak" everywhere else (for example, one careless function returning a double instead of an int will send a double into all the callers), whereas in C/C++ there is no such risk. (asm.js tries to get as close as possible to C/C++ by ensuring by a type system that there are type coercions in all necessary places, which would avoid the problem just described - that double would be immediately turned into an int in all callers.)
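
The coercion point can be sketched in a few lines. The code below is written in the style of asm.js (using |0 to coerce to a 32-bit integer), but it is not a validated asm.js module, and the function names are invented for the example.

```javascript
// A "careless" function: for odd inputs it returns a double, not an int.
function half(x) {
  return x / 2;
}

// Without coercions, the double leaks into the caller's result,
// and from there into the caller's callers.
function sumHalvesLoose(a, b) {
  return half(a) + half(b);
}

// asm.js style: parameters, call results and the return value are all
// coerced with |0, so every value in this function is an int.
function sumHalvesCoerced(a, b) {
  a = a | 0;
  b = b | 0;
  return ((half(a) | 0) + (half(b) | 0)) | 0;
}
```

With the inputs 3 and 4, the loose version produces a fractional number while the coerced version truncates each half to an integer first, much as integer division would in C. The point is not the arithmetic itself but that the coercions sit at every call boundary, so an engine never has to guess whether a value is still an int.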

For all these reasons, small benchmarks often do not show differences, but large ones can. So how fast a compiler is depends a lot on what kind of code is compiled with it. When Duetto is used on a small benchmark, I would not be surprised if it performs very well. My concerns are mainly about large codebases, which I don't think I've seen Duetto report numbers on yet. (By "large" I mean something like Cube 2 or Epic Citadel, hundreds of thousands of lines of code, consisting of many different components, etc.)

General thoughts on Duetto

If Duetto can in fact get close to the speed of Java and C#, that's good enough for many applications, even if it is slower than C and C++ (and Emscripten and Mandreel). And in some cases Java and C# are fairly comparable to C and C++ anyhow. Finally, of course performance is not the only thing that matters when developing applications.

Putting aside performance, Duetto's approach has some advantages. One nice thing about Duetto is that it uses Web APIs in C++, so they might avoid a little overhead that Emscripten and Mandreel have. For example, Emscripten and Mandreel translate OpenGL calls into WebGL calls, while Duetto will have what amounts to translating WebGL into WebGL. That's definitely simpler, and it might be faster in some cases.

Another interesting aspect of Duetto is that since C++ objects become JS objects, this might make it easier to interoperate with normal JS outside of the compiled codebase. It seems like this could work in some cases, however if the codebase is minified then I don't see how this is possible without either duplicating properties or using wrapper code, which puts Duetto in basically the same position as Emscripten in this regard.

Finally, the client-server thing that Duetto does looks quite unique, and it will be interesting to see if people flock to the idea of doing C++ for webservers, and to integrate that code with compiled C++ on the client in JS. That's an original idea AFAIK, and it's cool.

In summary, congratulations to Duetto on launching! Nice to see new and interesting ideas being tried out in the C++/JS space.





Tuesday, November 12, 2013

Two Recent Emscripten Demos

Whenever I do a presentation I try to have at least one new demo. Here are two that I made recently:
  • 3 screens demo, for the Mozilla 2013 Summit. The demo is of 3 screens inside the BananaBread 3D first person shooter, a port of Sauerbraten. One screen shows a list of recent tweets, another shows a video, and the third shows a playable Doom-based game (see details here). So you can play Doom while you play Sauerbraten ;) Controls are a little tricky, but see the demo page itself for how to play each game.
     
  • Clang in the browser, for the LLVM Developer's Conference. The name says it all, this is basically the entire clang C++ compiler running in JS. The port isn't very polished (I didn't try to optimize it, you need to reload the page to recompile, etc.), I basically just got it running.