Tuesday, May 29, 2012

Reloop All The Blocks

One of the main results from the development of Emscripten was the Relooper algorithm. The Relooper takes basic blocks of code - chunks of simple code, at the end of which are branches to other blocks of code - and generates high-level structure from that using loops and ifs. This is important because LLVM gives you basic blocks, and JavaScript requires loops and ifs to be fast, so when compiling C++ to JavaScript you need to bridge the two. So if you have any sort of compiler from a representation with basic blocks into JavaScript - or for that manner any high-level language that does not have gotos, but does have labelled loops - then the Relooper might be useful for you. The Relooper is known to be used in the two main C++ to JS compilers, Emscripten and Mandreel.

Emscripten's Relooper implementation is in JavaScript, which was very useful for experimenting with different approaches and developing the algorithm. However, there are two downsides to that implementation: First, that it was built for experimentation, not speed, and second, that being in JavaScript it is not easily reusable by non-JavaScript projects. So I have been working on a C++ Relooper, which is intended to implement a more optimized version of the Relooper algorithm, in a fast way, and to make embedding in other projects as easy as possible.

That implementation is not fully optimized yet, but it has gotten to the point where it is usable by other projects. It got to that point after last week I wrote a fuzzer for it, which generates random basic blocks, then implements that in JavaScript in the trivial switch-in-a-loop manner, and then uses the Relooper to compile it into fast JavaScript. The fuzzer then runs both programs and checks for identical output. This found a few bugs, and after fixing them the fuzzer can be run for a very very long time without finding anything, so hopefully there are no remaining bugs or at least very very few.

The C++ Relooper code linked to before comes with some testcases, which are good examples for how to use it. As you can see there, using the Relooper is very simple: There are both C++ and C APIs, and what you do in them is basically
  • Define the output buffer
  • Create the basic blocks, specifying the text they contain and which other blocks they branch to
  • Create a relooper instance and add blocks to it
  • Tell the relooper to perform its calculation on those blocks, and finally to render to the output buffer
There is also a debugging mode, in which a lot of debug info will be printed out, including (from the C API) a C program using the C API, which is useful for generating testcases.

3 comments:

  1. Is there an example of taking a 'soup' of labeled LLVM Blocks and using the Relooper algorithm?

    It is unclear to me the relationship between the Relooper Block and the LLVM BasicBlocks and LLVM BranchInst.

    Thanks,
    Richard Catlin

    ReplyDelete
  2. The emscripten paper has an example. Otherwise you can run emscripten on data and follow code through the stages of LLVM and so forth.

    ReplyDelete