azakai's blog: power

Showing posts with label power. Show all posts

Thursday, September 9, 2010

Tools For Debugging Wakeups

Removing unnecessary CPU wakeups in Fennec is very important, in order to reduce the amount of power used. The main tool used for that is powertop, which basically tells you how many wakeups each process has. But how can you tell what specifically in your app is causing those wakeups?

There doesn't seem to be a simple solution for that. One idea is to use the debugger, manually. Perhaps that can be automated using debugger scripts, but I'm not sure, and in any case it would be very specific to the debugger and OS.

So I've tried a different route - automatic rewriting of the source code. I've got it sort of working so far. The idea is this:

First you apply this patch. It modifies some critical files, in particular in the condition variable handling code and in the timer thread code. It adds a parameter to both, so that when you call them, you must identify (by a string) who you are.
Then, you run python timeout_rewriter.py (that script is included in the patch). This goes through the entire codebase, rewriting all the calls to condition variables (PR_WaitCondVar etc.) and timer creation (->initWithCallback etc.) to add that parameter. It modifies both C++ and JS code. Over 100 files are modified. How it does the rewriting is sort of hackish: It doesn't use anything fancy, just some heuristics derived from trial and error ;)
Then you build Firefox normally, and run it. All timeouts will then be printed with explanations, for example:

-------------------------------------------------

WaitCondVar: ./js/src/xpconnect/src/xpcjsruntime.cpp:820 (0.818000 seconds since previous; timeout: 1000)
WaitCondVar: ./js/src/xpconnect/src/xpcjsruntime.cpp:820 (1.000000 seconds since previous; timeout: 1000)
TimerThread: ./dom/base/nsJSEnvironment.cpp:3733
WaitCondVar: ./xpcom/threads/TimerThread.cpp:347 (0.000000 seconds since previous; timeout: 997)
WaitCondVar: ./js/src/jstask.cpp:95 (0.058000 seconds since previous; timeout: -1)
WaitCondVar: ./js/src/jstask.cpp:95 (0.045000 seconds since previous; timeout: -1)
WaitCondVar: ./js/src/xpconnect/src/xpcjsruntime.cpp:820 (0.825000 seconds since previous; timeout: 1000)
TimerThread: ./content/events/src/nsEventStateManager.cpp:793
WaitCondVar: ./xpcom/threads/TimerThread.cpp:347 (0.000000 seconds since previous; timeout: 329)
TimerThread: ./mobile/components/SessionStore.js:345
WaitCondVar: ./xpcom/threads/TimerThread.cpp:347 (0.329000 seconds since previous; timeout: 532)
-------------------------------------------------

Explanations:

The 1st line shows that the JS watchdog thread woke up - you can just look at the source line mentioned to see what it is. Additional info is that the previous CPU wakeup was ~0.8 seconds before it, and that this particular call to WaitCondVar was with a timeout of 1 second (so, it would like to be woken up again in 1 second).
The 2nd line shows the same place waking up, after 1 second.
The 3rd line shows the DOM GC timer being fired. The next line is the condition variable that the timer thread uses.
The 5th line is the JS background thread.
etc.

I might write some tools to parse the output (like profilers do), but so far it already seems useful. Some conclusions: When the browser is idle, we have two 1/5 second timers, the IdleService (should be fixed soon), and nsEventStateManager (need to look into that).

The reasons for rewriting the source code automatically are that, as mentioned, it's over 100 files - so a lot to do by hand. But also, we don't want those changes to go into our actual tree (even with ifdefs, it would be horrible), and maintaining such a big patch as the actual tree changes would be very problematic. So automatic source code rewriting seemed the best approach. Just remember to throw away all those changes, and not commit them by mistake ;)

Thursday, June 17, 2010

Power Measuring & Saving

With help from some friendly people in the Maemo IRC (thanks SpeedEvil and DocScrutinizer51!), I was able to get realtime power usage measurements from Fennec on the N900. The goal was to see how much clumping together CPU wakeups would help save power, the idea being that even if the same amount of CPU calculation is done, doing it together in 'clumps' is better than spread out, as each wakeup means the CPU will be in a non-sleep mode for longer. Here are the figures:

(click for fullsize version)

The settings are:

Baseline: Fennec is not run at all. This is for comparison, to see how much power is drawn by the N900 when idle
Fennec 200 clumped wakeups: 200 wakeups/second are scheduled, and all of them are scheduled at the same time. In other words, 200 tiny operations are done, then the CPU can sleep for almost 1 second.
Fennec 200 separate wakeups: As before 200 wakeups/second are scheduled, but at equal intervals over the 1-second period. So the longest the CPU has to sleep is about 1/200th of a second.
All settings were tested both with the screen off (left chart) or screen on (right chart).

Why were 200 wakeups used here? No special reason. As an absolute # of wakeups it's a large number - few websites would have that many animated images/setIntervals()/etc. - but these wakeups were very very short (just incrementing a counter). So this could be similar to having fewer but longer wakeups. But in any case, the results should be treated as just a general indication. The goal is to see whether reducing timeouts is worth doing or not, and not to calculate exactly how much it will improve battery life (that would depend on the specific device, websites visited, usage style, how much clumping we do, etc. etc.). Note also that clumping has a price - it adds latency, as timers don't fire at their normal times. When it is worthwhile, and how much, is a hard question.

For now, though, the results show that clumping wakeups can be noticeable: 38% savings when the screen is off, and 8% when on. As expected much less is saved when the screen is on, because the screen itself drains a lot of power (so our savings become less significant). Which of the 38% or 8% figures is more relevant is a good question - it depends on how often the user takes a 'break' from browsing (so the screen goes off) but leaves the browser on. But anyhow, again, these figures can't be taken literally.

FWIW, to put this in battery life terms, if you save 38% of your power drain then your battery life would be 60% longer, and if you save 8% of your power drain then your battery life would be 8% longer. Again, this is just a general indication, but I think it does show that implementing timeout clumping (properly) in Fennec can be useful.

Other Recent Work

I've also submitted patches for some other bugs related to saving power:

567339 / 359608 - A longstanding issue where image animation timers continue to fire even after browsing to another page. The patch uses the refresh driver to figure out when to suspend timers (but it doesn't use the refresh driver for the actual animations, which might cause more wakeups as it uses a fixed 50Hz timer).
568730 - Stop the JavaScript engine's GC (& other stuff) timer when no JS is running, by suspending it if enough time has passed since the last JS call context was exited.

EDIT: Improved charts

Friday, May 28, 2010

Powersaving Bughunting

I spent most of yesterday working on Bug 567339, which is about removing unnecessary CPU wakeups in order to save power (which is very important on mobile devices - the fewer time the CPU is woken up from sleep, the longer your battery will last, since each wakeup disturbs sleep mode for a short while after the actual activity).

So, long story short, it seem the bug is that timers for image animations are not canceled. In other words, if you view an image with animations, then a timer is created for each image, and it does the rendering at the proper rate, but the timer keeps running after the image is no longer needed. So you keep getting CPU wakeups at the appropriate rate for the animations, wasting power.

(This is actually an issue on normal Firefox as well, not just mobile - some CPU cycles are being spent on these timers. It is probably negligible, though, but still wasteful.)

The longer story of hunting the cause of the bug is as follows:

A helpful person reports the bug, and mentions a suspicion of https being the cause (since enabling/disabling https affects the symptoms). At issue here are 4 wakeups/second which should not be happening.
Trying to reproduce the bug, it turns out that it happens in nightly builds, but not when building from source in trunk. Hmm.
Trying to figure out the difference, one noticeable change between the nightly builds and building from source is that the start page shows some suggestions of addons in the nightly, but building from source does not. Turns out this is because /trunk reports itself as 2.0, and no addons exist for that yet. Changing the reported version to 1.1 leads to addons being suggested, and the bug in fact occurs.
Since fetching addon suggestions is done from AMO through https, and the original bug report strongly suspected https as the culprit, it would seem we are almost done here. Or are we?
Actually inspecting all the wakeups (this was where most of the time I spend on this bug was; I put up a wiki page with some details of the methodology I used) eventually reveals that the vast majority of wakeups are in fact of imgContainer::notify()... and not https or anything network related.
To confirm this, removing the animated image that was shown when loading addon suggestions removes the wakeups... and indeed, the image had 4 frames/second, exactly the number of wakeups we were hunting!

Still working on figuring out why this happens and how it can be resolved.