New EliminatorThe eliminator optimization phase was originally written by Max Shawabkeh. It basically removes unneeded variables, so
var x = f(a);
var y = x + g(b);
can be optimized into
var y = f(a) + g(b);
This can greatly reduce the size of the code as well as improve performance, and was fundamental for our approach of relying on a combination of the eliminator + the closure compiler to go from LLVM's SSA representation into a register-like format: The eliminator removes large amounts of unneeded variables, and the closure compiler then reduces the number of variables further by reusing the ones that remain.
The eliminator could be slow on large functions, however, because it calculated the transitive closure of dependencies between all the variables, an expensive calculation. It also missed out on some opportunities to optimize because of some simplifying assumptions it made in its design. A final downside was it integrated poorly with the rest of our optimizations (in part due to being written in a different language, CoffeeScript).
I rewrote the eliminator entirely from scratch, in order to do a more precise analysis of which variables can be removed. I also simplified the problem slightly by only eliminating variables that have a single use - this makes it far faster, and I don't see any downside in the quality of the generated code (in fact it avoids some possible bad cases, although it took a long time to figure out what was going on in them). The new version is faster in general and far faster on the rare bad cases (100x even), and generates better-performing code to boot.
Overall, pretty much all the speed-intensive parts of code generation and optimization in Emscripten are now parallel, and will utilize all your CPU cores automatically. That means that if you experience slow compilation speeds, you can just throw more hardware at it. I hear they are selling 16-core CPUs now ;)
New RelooperThe relooper is an optimization performed (unlike most optimizations) during initial code generation. It takes basic blocks of code and branching information between them and generates high-level JS control flow structures like loops and ifs, which makes the code run far faster. The original relooper algorithm was developed together with the implementation I wrote in the compiler. Eventually some aspects of how it works were found to be suboptimal, so specific optimizations were added to the JS optimizer ('hoistMultiples', 'loopOptimizer'), overall giving us pretty good generated code.
Meanwhile I wrote a new version of the relooper in C++. There were 2 reasons for that choice of language: First, because other projects needed something like it, and C++ was a better language for them, and second, because we had plans to evaluate writing an LLVM backend for emscripten that would also need to reloop in C++ (note: we decided against the LLVM backend in the end). The new version avoids the limitations of the first, and generates better code. In particular it has no need for additional optimizations done after the fact. It also implements some additional tweaks that are missing in the first one, like node splitting in some cases and more precise removal of loop labels when they are not needed, etc. It's also a much cleaner codebase.
I brought that new version of the relooper into Emscripten by compiling it to JS and using it in the JS compiler. This makes compilation faster both because the new relooper is faster than the previous one (not surprising as often compiled code is faster than handwritten code), and because the additional later optimizations are no longer needed, for overall about a 20% speedup on compiling BananaBread. It also generates better code, for example it can avoid a lot of unneeded nesting that the previous relooper had (which caused problems for projects like jsmess).
Note that this update makes Emscripten a 'self-hosting compiler' in a sense: one of the major optimization passes must be compiled to JS from C++, using Emscripten itself. Since this is an optimization pass, there is no chicken-and-egg problem: We bootstrap the relooper by first compiling it without optimizations, which works because we don't need to reloop there. We then use that unoptimized build of the relooper (which reloops properly, but slowly since it itself is unoptimized) in Emscripten to compile the relooper once more, generating the final fully-optimized version of the relooper, or "relooped relooper" if you will.