96.42% (22,771,482B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->39.75% (9,388,032B) 0x8058AC1: arena_malloc_large (jemalloc.c:3831)
| ->39.75% (9,388,032B) 0x8058DB9: arena_malloc (jemalloc.c:3856)
| ->34.74% (8,204,288B) 0x8058E98: imalloc (jemalloc.c:3866)
| | ->34.27% (8,093,696B) 0x805D833: malloc (jemalloc.c:5882)
| | | ->12.23% (2,887,680B) 0x6BBB63F: sqlite3MemMalloc (sqlite3.c:14221)
  ...->39.75% (9,388,032B) 0x8058AC1: arena_malloc_large (jemalloc.c:3831)
| ->39.75% (9,388,032B) 0x8058DB9: arena_malloc (jemalloc.c:3856)
| ->34.74% (8,204,288B) 0x8058E98: imalloc (jemalloc.c:3866)
| | ->34.27% (8,093,696B) 0x805D833: malloc (jemalloc.c:5882)
| | | ->12.23% (2,887,680B) 0x6BBB63F: sqlite3MemMalloc (sqlite3.c:14221)
This shows a snapshot at a particular point in time, during which the whole program allocated ~22MB of memory, of which almost 3MB was due to SQLite.
One limitation though is you can't easily see what changes from one snapshot to another, which is important in order to see if there are any gradual memory increases (these may or may not be actual leaks). So I wrote a small python script, massdiff, which diffs two Massif snapshots. Here is an example of the output:
Diffing snapshots 30 50 
- (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,286,738
[ diff: +502,088 ]
+ (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,788,826
- 0x8058927: arena_malloc_small (jemalloc.c:3794) - 7,494,594
[ diff: +499,552 ]
+ 0x8058927: arena_malloc_small (jemalloc.c:3794) - 7,994,146
- 0x8058D9D: arena_malloc (jemalloc.c:3854) - 7,494,594
[ diff: +499,552 ]
+ 0x8058D9D: arena_malloc (jemalloc.c:3854) - 7,994,146
- 0x8058E98: imalloc (jemalloc.c:3866) - 6,055,702
[ diff: +499,232 ]
+ 0x8058E98: imalloc (jemalloc.c:3866) - 6,554,934
- 0x805D833: malloc (jemalloc.c:5882) - 5,552,542
[ diff: +499,232 ]
+ 0x805D833: malloc (jemalloc.c:5882) - 6,051,774
- 0x52D2FA3: js_malloc (jsutil.h:213) - 1,635,252
[ diff: +432,064 ]
+ 0x52D2FA3: js_malloc (jsutil.h:213) - 2,067,316
- 0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) - 1,635,252
[ diff: +432,064 ]
+ 0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) - 2,067,316
- 0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) - 1,600,532
[ diff: +431,904 ]
+ 0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) - 2,032,436
- 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) - 161,456
[ diff: +344,832 ]
+ 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) - 506,288
- 0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) - 161,456
[ diff: +344,832 ]
+ 0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) - 506,288
- (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,286,738
[ diff: +502,088 ]
+ (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,788,826
- 0x8058927: arena_malloc_small (jemalloc.c:3794) - 7,494,594
[ diff: +499,552 ]
+ 0x8058927: arena_malloc_small (jemalloc.c:3794) - 7,994,146
- 0x8058D9D: arena_malloc (jemalloc.c:3854) - 7,494,594
[ diff: +499,552 ]
+ 0x8058D9D: arena_malloc (jemalloc.c:3854) - 7,994,146
- 0x8058E98: imalloc (jemalloc.c:3866) - 6,055,702
[ diff: +499,232 ]
+ 0x8058E98: imalloc (jemalloc.c:3866) - 6,554,934
- 0x805D833: malloc (jemalloc.c:5882) - 5,552,542
[ diff: +499,232 ]
+ 0x805D833: malloc (jemalloc.c:5882) - 6,051,774
- 0x52D2FA3: js_malloc (jsutil.h:213) - 1,635,252
[ diff: +432,064 ]
+ 0x52D2FA3: js_malloc (jsutil.h:213) - 2,067,316
- 0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) - 1,635,252
[ diff: +432,064 ]
+ 0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) - 2,067,316
- 0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) - 1,600,532
[ diff: +431,904 ]
+ 0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) - 2,032,436
- 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) - 161,456
[ diff: +344,832 ]
+ 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) - 506,288
- 0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) - 161,456
[ diff: +344,832 ]
+ 0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) - 506,288
The diff only shows what changed between the two revisions, and shows that in in tree format, just like Massif's ms_print does. Each group of three lines shows a line from the first snapshot on top, the corresponding line from the later snapshot on bottom, and between them the difference.
The output here is from loading about:blank 500 times in Fennec. There is overall ~500K of additional allocation (so, ~1K per page load), of which JSObject::growSlots is responsible for ~340K (this is later all deallocated at once, presumably due to GC being run).
So far this has been useful in helping discover one specific case, bug 641663, and I'm still investigating some additional issues. Hopefully it can be a useful tool for other people too. To use it,
The script doesn't work when you take 100 or more snapshot as it matches one or more spaces at the beginning of the snapshot title - instead it should match zero or more spaces.
ReplyDeletei.e.
# Snapshot title
- m = re.match(' +([..regex elided...]')
+ m = re.match(' *([..regex elided...]')
if m:
snapshot = Snapshot()
for i in ['n', 'time', 'total', 'useful_heap', 'extra_heap', 'stacks']:
Your script is interesting, but doesn't work when valgrind collects more than 100 snapshots. The problem is on line 42 - when you attempt to match the snapshot title it should have zero or more spaces at the beginning, not one or more.
ReplyDeleteThanks Oliver!
ReplyDeleteVery useful script.
ReplyDeleteMy use case is slightly different: my program loops 100 times over a memory consuming api and I want the details. For that, I use the macro VALGRIND_MONITOR_COMMAND("detailed_snapshot") at the beginning of the api, and it generates 100 snapshot files.
In the same idea, I want to compare 2 snapshot files. Inspired from your script, here is mine. Thanks for the idea.
'''
ReplyDeleteParse 2 massif snapshot files
============================
Usage: this_script.py SNAPSHOT_1 SNAPSHOT 2
where
SNAPSHOT_1 and 2 are massif snapshots (not a ms_print dump)
typically generated by the macro in the code VALGRIND_MONITOR_COMMAND("detailed_snapshot")
Here is a sample of such a file:
>>>>>>>>>>>>>>
desc: --time-unit=ms --threshold=0.0
cmd: /my_prog.sh
time_unit: ms
#-----------
snapshot=0
#-----------
time=4078176
mem_heap_B=765203759
mem_heap_extra_B=58898945
mem_stacks_B=0
heap_tree=detailed
n29988: 765203759 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
n187: 230733840 0x1A303B27: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.390.77)
n3: 121525532 0x1B7CCA75: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
n2: 123451053 0x1B7B7B6B: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
n1: 123451050 0x1B89A702: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
n1: 123451050 0x1B89C079: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
n2: 123451050 0x1B4FFE67: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
n2: 116592620 0x1B503A2B: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
n2: 109734190 0x1B4D387A: ??? (in /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.390.77)
>>>>>>>>>>>>>>
As you can see, there is only one snapshot for the whole file, some header
lines, followed by n-lines describing the allocations in a tree-like structure
where indentation is relevant to the call stacks. The first n-line is the
allocated memory for the whole snapshot.
'''
import os, sys, re
ReplyDelete# define custom parameters
# depth of the tree, 0 means only global memory
max_depth = 1
print 'maximum depth =', max_depth
# tolerance in octets for comparison, 1000=1ko
diff_threshold = 1000000
print 'tolerance =', diff_threshold, 'bytes'
# Read files
class Snapshot: pass
class SnapshotLine:
def __str__(self):
return ('%s%s - %d - %10s' % (' '*self.indent, self.addr, self.mem, self.text))[:130]
snapshots = []
def read_file(file):
print 'reading file', file, '...'
snapshot = Snapshot()
snapshot.file = file
snapshot.lines = []
snapshots.append(snapshot)
started = False
title_read = False
above_max_depth_ws='^' + ' '*(max_depth+1)
for line in open(file, 'r').readlines():
#print line
line = line.replace('\n', '')
if 'snapshot=0' in line:
started = True
continue
if not started: continue
if '#----' in line: continue
# Snapshot title lines
if not title_read:
found = False
for i in ['time', 'mem_heap_B', 'mem_heap_extra_B', 'mem_stacks_B']:
expr = '^' + i + '=(?P[\d]+)'
m = re.match(expr, line)
if m:
setattr(snapshot, i, int(m.group('value')))
found = True
break
if found: continue;
title_read = True
# Snapshot detail line
m = re.match(above_max_depth_ws, line)
if m: continue;
m = re.match('(?P[ ]*)n(?P[\d]+): (?P[\d]+) (?P0x[0-9A-F]*): (?P.*)', line)
if not m:
m = re.match('(?P[ ]*)n(?P[\d]+): (?P[\d]+) (?P.*)', line)
if m:
snapshot_line = SnapshotLine()
snapshot_line.indent = len(m.group('indent'))
snapshot_line.n = int(m.group('n'))
snapshot_line.mem = int(m.group('mem'))
try:
snapshot_line.addr = m.group('addr')
except:
snapshot_line.addr = 0
snapshot_line.text = m.group('text')
#print snapshot_line.__dict__
if snapshot_line.indent <= max_depth:
snapshot.lines.append(snapshot_line)
read_file(sys.argv[1])
read_file(sys.argv[2])
# Generate tree structure
print 'generating tree structure...'
for snapshot in snapshots:
snapshot.roots = {}
for i in range(snapshot.lines.__len__()):
line = snapshot.lines[i]
line.children = {}
indent = line.indent
if indent == 0: snapshot.root = line
elif indent == 1: snapshot.roots[line.addr] = line
else:
# Find parent
j = i-1
while snapshot.lines[j].indent != indent-1: j -= 1
snapshot.lines[j].children[line.text] = line
#print snapshot.file, snapshot.roots
# Dump tree
def dump_lines(lines):
def dump_line(line):
print line
for child in line.children.values():
dump_line(child)
for line in lines:
dump_line(line)
def dump_tree():
print 'Tree:'
for snapshot in snapshots:
print snapshot.file, snapshot.root, snapshot.roots
dump_lines(snapshot.roots.values())
#dump_tree()
# Diff two snapshots
ReplyDeleteprint 'diffing...'
def commify(x):
sign = '+' if x >= 0 else '-'
ret = list(str(abs(x))[::-1])
for i in range(len(ret)):
if i % 3 == 2 and i != len(ret)-1:
ret[i] += ','
return sign+(''.join(ret))[::-1]
def diff_dicts(d1, d2):
keys = list(set(d1.keys() + d2.keys()))
data = [[key, 0] for key in keys]
for datum in data:
key = datum[0]
if key not in d2:
datum[1] = -d1[key].mem
elif key not in d1:
datum[1] = d2[key].mem
else:
datum[1] = d2[key].mem - d1[key].mem
data.sort(lambda x, y: y[1]-x[1])
for datum in data:
key = datum[0]
diff = datum[1]
if key not in d2:
print "-", d1[key]
elif key not in d1:
print "+", d2[key]
else:
if abs(diff) > diff_threshold:
print "-/+", d1[key]
print '[diff: %s]' % commify(diff)
diff_dicts(d1[key].children, d2[key].children)
print '-', snapshots[0].file, snapshots[0].root
print '[diff: %s]' % commify(snapshots[1].root.mem - snapshots[0].root.mem)
print '+', snapshots[1].file, snapshots[1].root
print '-------------'
diff_dicts(snapshots[0].roots, snapshots[1].roots)
Sorry I had to make several posts because the full script is too large.
ReplyDelete