Dtrace Flame graphs

For code related discussions and questions
Post Reply
User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Dtrace Flame graphs

Post by Tzeentch » 06 Feb 2015, 22:39

To begin constructing flame graphs for a visual representation of stack traces, as this may prove a useful investment. View - http://www.brendangregg.com/flamegraphs.html

After getting from github, first test was done in tutorial doing very basic tasks. Followed by the 3 step process to generate a graph:

First Dtrace to collect user-level stacks with frequency count while on-CPU at 99 Hertz, arg1 is the user-land program counter while arg0 is kernel.

Note many other tools can also generate these graphs such as Perf, SystemTap etc

Code: Select all

# dtrace -x ustackframes=100 -n 'profile-99 /execname == "warzone2100" && arg1/ {@[ustack()] = count(); } tick-60s { exit(0); }' -o warzone.stacks

# ./stackcollapse.pl warzone.stacks > warzone.fold

# ./flamegraph.pl warzone.fold > warzone2100-tutorial.svg
Can't directly attach to be browsed but I can take these and send to people if they'd like, I have been able to browse these with firefox interactively and most browsers should natively support .svg files.
More or less grey

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Flame graphs

Post by Tzeentch » 11 Feb 2015, 10:34

Did one of tutorial, renamed as follows and taken in same way as prior to collect stats

Code: Select all

# dtrace -x ustackframes=100 -n 'profile-99 /execname == "warzone2100" && arg1/ {@[ustack()] = count(); } tick-60s { exit(0); }' -o warzone.stacks
# ./stackcollapse.pl warzone.stacks > warzone.folded
/usr/ports/devel/git/FlameGraph # ./flamegraph.pl --title Warzone2100-tutorial warzone.folded > warzone2100-tutorial.svg
More or less grey

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Flame graphs

Post by Tzeentch » 11 Feb 2015, 17:32

Similar way to gather memset operations. Same sample rate of 60 seconds. Done during tutorial (after level loaded) with barely any actions being taken.

Code: Select all

# dtrace -n '::memset:entry /execname == "warzone2100"/ { @[ustack()] = sum(arg2); } tick-60s { exit (0) ; }' -o warzone.memset
# ./stackcollapse.pl warzone.memset > warzone.memset-fold
./flamegraph.pl --colors mem --title Warzone2100-memset warzone.memset-fold > warzone2100-memset.svg
More or less grey

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Flame graphs

Post by Tzeentch » 19 Mar 2015, 04:14

Made a flame graph for malloc operations in a 4 player ffa, me and 3 nexus AI's.
Command used after game launch. Get all malloc operations and stack trace for each along with the sum of bytes requested, then fold into single line and generate flame graph with bytes per operation.

Code: Select all

dtrace -x ustackframes=100 -n 'pid$target::malloc:entry {@[ustack()] = sum(arg0); } tick-60s { exit(0); }' -p 77498 -o warzone.malloc
./stackcollapse.pl warzone.malloc > warzone.malloc.folded
./flamegraph.pl warzone.malloc.folded --title="malloc() bytes" \--countname="bytes" --colors=mem > warzone.malloc.svg
I have found some interesting points on this. (At least to me)

function : warzone2100`updateScripts takes 33.14% of all malloc operations causing all stack traces above such as
function : libQtScript.so.4.8.6`QScriptValue::call 32.96% (likely to change as we go to QT5)

function : displayConsoleMessages takes 24.27% (depends on amount of messages) kind of already known this is a big problem.

function : warzone2100`audio_Update takes 10.02 so not so bad.

By comparison calloc requests are quite different. Same method.

Code: Select all

dtrace -x ustackframes=100 -n 'pid$target::calloc:entry {@[ustack()] = sum(arg0); } tick-60s { exit(0); }' -p 77498 -o warzone.calloc
dtrace: description 'pid$target::calloc:entry ' matched 2 probes
./stackcollapse.pl warzone.calloc > warzone.calloc.folded
./flamegraph.pl warzone.calloc.folded --title="calloc() bytes" \--countname="bytes" --colors=mem > warzone.calloc.svg
Not sure what this one is but
function : libGL.so taking 64.64%
function : warzone2100`audio_Update taking 33.65%

same on memsets as above and apart from main game loop :...:

droidUpdate 32.34%
updatePlayerPower 16.14%
structureUpdate 5.68%
More or less grey

User avatar
NoQ
Special
Special
Posts: 6194
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Dtrace Flame graphs

Post by NoQ » 19 Mar 2015, 05:33

function : libQtScript.so.4.8.6`QScriptValue::call 32.96% (likely to change as we go to QT5)
Emm, that's quite weird, because Nexus AI doesn't use QtScript.
All of this is just from rules.js, which is supposed to be very lightweight.

I'd really like to see how this stuff scales when you switch to NullBot AI.
Also, no, it probably won't change when we switch to Qt5 (also, we already have in master), but it probably will change if we switch to QJSEngine.

Still, i'm not sure if mallocs have an actually large absolute impact.

With probe-based analysis we cannot compute it (comparable to overhead). With samplers - not sure, probably even depends; but if we can, then it's not a problem. We can also not rely on the allocated size, because mallocs, unlike memsets, aren't straightforwardly O(n).

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Flame graphs

Post by Tzeentch » 19 Mar 2015, 16:48

Made an on-CPU flame graph and it is lightweight although it takes many mallocs it's on my systems CPU less that 1% (0.62%).

all my CPU is hogged by display world and draw3Dscene so if we upgrade the game engine to QJSEngine I'm assuming this will allow for some new cool things?
Getting curious on display world now.

from what I see we don't have any major issues probably only minor ones. My CPU is not used that much by warzone unless I change Vsync option or have many messages displayed. (by selecting many things in single context or in tutorial mode.) Which has already been covered.

and yeah checking activity inside the kernel (ignoring my graphics driver and filtering out idle threads) kernel activity is mostly spent on displaymessages and the display world even if I set all graphic options to the lowest.

Code: Select all

dtrace -x stackframes=100 -n 'profile-199 /arg0/ {@[stack()] = count(); } tick-60s { exit(0); }' -o out.stacks
dtrace: description 'profile-199 ' matched 2 probes
./stackcollapse.pl out.stacks | grep -v cpu_idle > out.fold
./flamegraph.pl out.fold > kernel.svg
More or less grey

stiv
Warzone 2100 Team Member
Warzone 2100 Team Member
Posts: 876
Joined: 18 Jul 2008, 04:41
Location: 45N 86W

Re: Dtrace Flame graphs

Post by stiv » 20 Mar 2015, 04:56

function : warzone2100`updateScripts takes 33.14% of all malloc operations
Just my Scientific Wild-Ass Guess, but that is likely coming from the javascript interpreter. No idea about javascript internals, but I do know the python interpreter mallocs & frees like a house on fire, and would expect the same here.

Post Reply