Thursday, September 9, 2010

Tools For Debugging Wakeups

Removing unnecessary CPU wakeups in Fennec is very important, in order to reduce the amount of power used. The main tool used for that is powertop, which basically tells you how many wakeups each process has. But how can you tell what specifically in your app is causing those wakeups?

There doesn't seem to be a simple solution for that. One idea is to use the debugger, manually. Perhaps that can be automated using debugger scripts, but I'm not sure, and in any case it would be very specific to the debugger and OS.

So I've tried a different route - automatic rewriting of the source code. I've got it sort of working so far. The idea is this:
  • First you apply this patch. It modifies some critical files, in particular in the condition variable handling code and in the timer thread code. It adds a parameter to both, so that when you call them, you must identify (by a string) who you are.
  • Then, you run python timeout_rewriter.py (that script is included in the patch). This goes through the entire codebase, rewriting all the calls to condition variables (PR_WaitCondVar etc.) and timer creation (->initWithCallback etc.) to add that parameter. It modifies both C++ and JS code. Over 100 files are modified. How it does the rewriting is sort of hackish: It doesn't use anything fancy, just some heuristics derived from trial and error ;)
  • Then you build Firefox normally, and run it. All timeouts will then be printed with explanations, for example:
-------------------------------------------------

WaitCondVar: ./js/src/xpconnect/src/xpcjsruntime.cpp:820 (0.818000 seconds since previous; timeout: 1000)
WaitCondVar: ./js/src/xpconnect/src/xpcjsruntime.cpp:820 (1.000000 seconds since previous; timeout: 1000)
TimerThread: ./dom/base/nsJSEnvironment.cpp:3733
WaitCondVar: ./xpcom/threads/TimerThread.cpp:347 (0.000000 seconds since previous; timeout: 997)
WaitCondVar: ./js/src/jstask.cpp:95 (0.058000 seconds since previous; timeout: -1)
WaitCondVar: ./js/src/jstask.cpp:95 (0.045000 seconds since previous; timeout: -1)
WaitCondVar: ./js/src/xpconnect/src/xpcjsruntime.cpp:820 (0.825000 seconds since previous; timeout: 1000)
TimerThread: ./content/events/src/nsEventStateManager.cpp:793
WaitCondVar: ./xpcom/threads/TimerThread.cpp:347 (0.000000 seconds since previous; timeout: 329)
TimerThread: ./mobile/components/SessionStore.js:345
WaitCondVar: ./xpcom/threads/TimerThread.cpp:347 (0.329000 seconds since previous; timeout: 532)
-------------------------------------------------


Explanations:
  • The 1st line shows that the JS watchdog thread woke up - you can just look at the source line mentioned to see what it is. Additional info is that the previous CPU wakeup was ~0.8 seconds before it, and that this particular call to WaitCondVar was with a timeout of 1 second (so, it would like to be woken up again in 1 second).
  • The 2nd line shows the same place waking up, after 1 second.
  • The 3rd line shows the DOM GC timer being fired. The next line is the condition variable that the timer thread uses.
  • The 5th line is the JS background thread.
  • etc.
I might write some tools to parse the output (like profilers do), but so far it already seems useful. Some conclusions: When the browser is idle, we have two 1/5 second timers, the IdleService (should be fixed soon), and nsEventStateManager (need to look into that).

The reasons for rewriting the source code automatically are that, as mentioned, it's over 100 files - so a lot to do by hand. But also, we don't want those changes to go into our actual tree (even with ifdefs, it would be horrible), and maintaining such a big patch as the actual tree changes would be very problematic. So automatic source code rewriting seemed the best approach. Just remember to throw away all those changes, and not commit them by mistake ;)

2 comments:

  1. I was going to suggest you look a spatch, but then I remembered it only handles C, not C++... still it's looks awsome, but I haven't had a chance to learn it yet

    ReplyDelete