Future plans for GUI

wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Post by wuz21m »

I have read on the forums that plans were underway to port WZ2100 to Qt 5. I know we already use Qt 5 extensively to build Warzone 2100, and that a map editor already uses it too.

But what about the game frontend and in-game widgets? My searches didn't turn up any conclusive results. The current solution performs many re-draws and is probably a major performance bottleneck. Are there any plans in place to do something about this?
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

Here's the latest flood on the subject, as far as i remember: [1]

The task is large, and so far nobody has had the guts to take it up. I don't think the current GUI code causes many performance issues (it worked well enough even in the 90s), but it's clear that many good features are blocked by it, and a scalable GUI for higher resolutions is very much needed.
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

From what I understand, everything is redrawn from scratch for every frame. For example, the iV_DrawTextRotated function (lib/ivis_opengl/textdraw.cpp) is called every frame for everything containing text, and it performs a whole series of matrix operations.

I was just profiling Warzone 2100, and I chose the normal tutorial (the snowy level which gives you two trucks and asks you to build an oil derrick). I stayed there and only panned the camera a bit. Of course, due to profiling (via valgrind) it is agonizingly slow (3~5 FPS!). I waited for ~1800 frames in total.
Here are the findings:
displayConsoleMessages (~10% of total processing time, called ~1800 times) -> the time is split between iV_GetTextWidth (4%) and iV_DrawText (5.3%).
pie_DrawShadows (8.5%, called 20,000 times?!) -> 4% is taken by two std::sort operations inside pie_DrawShadow.
pie_Draw3DShape2 (8%, called 195,000 times) -> DrawElements (6.8%, called 195,000 times?!)
locateMouse (4%) -> it even has a comment calling it slow.

Other things to note:
mapTile (3.8%, called 39,000,000 times!!) -> 6 million directly from drawTiles, 7.6 million from drawTerrain(), 6.2 million from setTileColor() and 16.5 million from mapUpdate()
atmosDrawParticles and atmosDrawParticles both contain a check if(astmosParts.. == APS_Active). That check alone takes 1.5% of total CPU time apiece, for a total of 3%. But maxAtmos particles is only 256x256 particles...

P.S.: If I am not mistaken, no form of culling is performed before a block is sent for rendering. So even if you zoom in on 30 of the total 400 units, we still pay some of the performance penalty for the rest on every cycle! Is there any work being done on bounding boxes and frustum culling?
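
To make the culling idea concrete, here is a minimal sketch of a conservative AABB-versus-frustum test, assuming the view volume is described by six inward-facing planes. All names here (Vec3, AABB, Plane, isOutside, isVisible, cullBoxes) are hypothetical and for illustration only; none of them come from the Warzone 2100 source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration types, not taken from the game's code.
struct Vec3 { float x, y, z; };

// Axis-aligned bounding box given by its minimum and maximum corners.
struct AABB { Vec3 min, max; };

// Plane in the form dot(n, p) + d >= 0 for points on the visible side.
struct Plane { Vec3 n; float d; };

// True if the whole box lies on the invisible side of the plane.
inline bool isOutside(const AABB &box, const Plane &p)
{
    assert(box.min.x <= box.max.x && box.min.y <= box.max.y && box.min.z <= box.max.z);
    // Pick the corner that lies furthest along the plane normal.
    const Vec3 v{
        p.n.x >= 0.f ? box.max.x : box.min.x,
        p.n.y >= 0.f ? box.max.y : box.min.y,
        p.n.z >= 0.f ? box.max.z : box.min.z,
    };
    // If even that corner is behind the plane, every corner is.
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.f;
}

// Conservative test: it may keep a box that is not actually visible,
// but it never rejects a box that intersects the frustum.
inline bool isVisible(const AABB &box, const std::array<Plane, 6> &frustum)
{
    for (const Plane &p : frustum) {
        if (isOutside(box, p)) return false;
    }
    return true;
}

// Returns the indices of the boxes that should be sent to the renderer.
inline std::vector<std::size_t> cullBoxes(const std::vector<AABB> &boxes,
                                          const std::array<Plane, 6> &frustum)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (isVisible(boxes[i], frustum)) visible.push_back(i);
    }
    return visible;
}
```

The test costs a handful of multiplications per plane, so running it once per unit per frame should be much cheaper than submitting an off-screen unit for drawing. How the six planes would be derived from the game's camera is left open here.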

P.S.2.: Even that little bit of text that is shown in the console is taking a toll on performance. I think we must cache text renders.
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

displayConsoleMessages is a curious finding; optimizing it may gain 1-2 fps when a message is displayed (in the tutorial, some messages are always displayed). The other functions do not paint the GUI; they render the game scene.

In any case, i doubt that the current engine is really worth optimizing instead of, first of all, refactoring, or even rewriting from scratch. Unless the fixes are very simple.

I'm curious what profiling tool you are using and whether you have an estimate for instrumentation overhead; because if it counts number of calls, then it *must* have a non-linear instrumentation overhead.

There is an easy way to estimate the CPU-heaviness of drawing the in-game GUI while avoiding any overhead: start the game (with few units and/or AIs) and look at the total CPU usage of the process. The GUI can't be taking more than that.
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

This is what I used: valgrind --tool=callgrind src/warzone2100

As for fixes:
The bounding box implementation will not take much time. An AABB can be generated for each unit, and when deciding which units to draw, a simple isometric culling test is performed before the unit is sent for rendering. I think the whole thing can be done in 200-500 lines of code.
About the textdraw overhead: I think this is what makes the tilde menu so slow in WZ. To fix this, we need to generate a texture and render the text into a buffer. I don't know if quesoglc is going to make that easy. Otherwise, we need to replace quesoglc with a library that generates textures rather than needing immediate mode calls. This can take as little as 100 lines of code (just using a VBO with quesoglc and caching the texture id); a bit more might be necessary to ensure that the state is valid while using the VBO.
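
The caching part of that idea can be sketched independently of the font library. Below is a minimal sketch, assuming the expensive step (measuring and rasterizing a string into a texture) is wrapped in a callback. TextCache, CachedText and the Rasterizer callback are hypothetical names, and the texture field merely stands in for a GL texture id; nothing here is quesoglc's or Warzone 2100's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical result of rasterizing one string once.
struct CachedText {
    std::uint32_t texture = 0;        // stand-in for a GL texture id
    int width = 0;                    // measured width in pixels
    std::uint64_t lastUsedFrame = 0;  // used for eviction
};

class TextCache {
public:
    // The callback does the expensive measuring and rasterizing work.
    using Rasterizer = std::function<CachedText(const std::string &)>;

    explicit TextCache(Rasterizer rasterize) : rasterize_(std::move(rasterize))
    {
        assert(rasterize_ != nullptr);
    }

    // Returns the cached entry, rasterizing only the first time a string is seen.
    const CachedText &get(const std::string &text, std::uint64_t frame)
    {
        auto it = entries_.find(text);
        if (it == entries_.end()) {
            it = entries_.emplace(text, rasterize_(text)).first;
        }
        it->second.lastUsedFrame = frame;
        return it->second;
    }

    // Drops entries unused for more than maxAge frames and returns how many
    // were dropped. A real renderer would also release the texture here.
    std::size_t evictOlderThan(std::uint64_t frame, std::uint64_t maxAge)
    {
        std::size_t evicted = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (frame - it->second.lastUsedFrame > maxAge) {
                it = entries_.erase(it);
                ++evicted;
            } else {
                ++it;
            }
        }
        return evicted;
    }

    std::size_t size() const { return entries_.size(); }

private:
    Rasterizer rasterize_;
    std::unordered_map<std::string, CachedText> entries_;
};
```

With something like this, a console line that stays on screen for hundreds of frames would be measured and rasterized once, and each later frame would only draw a textured quad. A real key would probably also need the font and size, which this sketch leaves out.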

As for the 2D/3D code, I agree that replacing it could go a long way, but I think it is prohibitively time-consuming.
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

  • For graphics engine optimizations, you'd better look for another thread; a lot of such discussions were happening in ArtRevolution threads, and in fact a lot of new optimizations were already introduced in -master. Also, you may want to synchronize your performance analysis with [2].
  • Ah, valgrind. It introduces a huge overhead, but then works around it; the numbers it displays are not exactly time but rather the number of emulated processor instructions, though that is still pretty accurate. Things to be aware of:
    • It doesn't take sleep times into account, which means that you really don't know the absolute CPU usage values for the functions you found. It may be "10% of almost-nothing", and after adding 10-20 units on the board it may drop to a negligible 1%. In any case, it is much more relevant to profile performance-critical situations, like 1500 units on the board (the limit we claim to support in a 10-player game), especially when it comes to tools that discard sleep time, like valgrind or perf (in its out-of-the-box variant).
    • Time-based application logic may be skewed by the large overhead. Performance statistics for the game running @60fps and @3fps are completely different, even if the same things are happening. It might be that console redraws (or anything else) happen per game frame rather than per render frame.
    • As far as I remember, there was a way to arbitrarily enable and disable collecting statistics (not instrumentation, of course) in valgrind, to avoid mixing in statistics from the start menu; not sure if you used it.
MaNGusT
Art contributor
Posts: 1152
Joined: 22 Sep 2006, 10:31
Location: Russia

Re: Future plans for GUI

Post by MaNGusT »

AABB optimizations are included in the wzm loader patch, but the patch itself requires some work before it can be merged with the master version.
stiv
Warzone 2100 Team Member
Posts: 876
Joined: 18 Jul 2008, 04:41
Location: 45N 86W

Re: Future plans for GUI

Post by stiv »

"But what about the game frontend and in-game widgets?"
We looked at using Qt with an eye to getting a GUI toolkit and some modern drawing funcs. Unfortunately, Qt does not do OpenGL well. Not the results we were looking for, but still an experiment worth doing. We are still using some parts of the Qt toolkit, like its JavaScript, INI and JSON readers and maybe some of its container classes. Allegedly, Qt Version N+1 will have useful OpenGL. I'm not sure waiting around is a good idea.

The performance data is good to have (thanks for doing that, wuz21m!), but let's not spend a lot of time wanking over the exactness of the numbers. Valgrind can give you a good idea of the relative time spent in various functions and how often they are called. I don't see anything particularly scary although the time spent on console messages is interesting.

Working to optimize the drawing engine might give you some small speedups (the general rule is algorithms over hand-optimized code). But a better way forward would be to look at using a modern scene graph engine like Open Scene Graph. This gives you modern shaders, culling and a bunch of other goodies for free.
Per
Warzone 2100 Team Member
Posts: 3780
Joined: 03 Aug 2006, 19:39

Re: Future plans for GUI

Post by Per »

No big plans for the GUI, but performance should be improved by https://github.com/perim/warzone2100/tree/gfxqueue which I've just recently rebased on git master. A bit short on time right now, but I hope to revisit it soon.
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

Thanks for all the feedback, everyone :)

@NoQ: Thanks for the insight about Valgrind. I did notice that it reported instruction count. But either way, WZ2100 is very CPU intensive. I ran WZ2100 in windowed mode side-by-side with htop. The CPU consumption was around 50~60% on the default fast play mode. We should find a way for dealing with this.

@Stiv: You are very welcome.

@Per: Thanks for the heads up

Have there been any efforts to make WZ2100 multi-threaded? I think at least the audio code can be put into its own thread (if it has not been already). What about the AI? I know it is much more difficult for OpenGL though...

EDIT: I just double-checked: ~30% usage on the main menu. Tutorial mode, not doing anything: ~50-60%
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

"let's not spend a lot of time wanking over the exactness of the numbers"
Since my head is full of this stuff nowadays, I can't help talking about it. As an example, I'd like to point out how different the self-time function tops collected by callgrind and perf are in the same tutorial scenario on my machine.
2015-01-11-003329_3200x1080_scrot.png
The second column in valgrind should be roughly the same as the first column in perf-top, both representing the relative time spent inside a function but outside sub-functions ("self" time). I already provided some explanation for the reasons behind such a dramatic difference in the previous post.

Unrelated interesting stuff:
  • memset() disappears almost completely and CPU usage halves (35%->15%) when i disable vsync.
  • locateMouse() has a comment in it: "todo This is slow - speed it up"
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

Are you sure the results for perf are accurate?

Does this mean 15%/40% of the CPU time is spent doing memset?

EDIT: Wait a second, I tried tracking the memsets based on the valgrind results... And they are all caused by glPushDebugGroup. Check it out for yourself.

EDIT2: Another root cause is __glcConvertToVisualUcs4 !

The rest are caused by the engine from what I checked.
Attachments
malloc.dot.pdf
The hex stuff is again caused by glPushDebugGroup, the other from __glcConvertToVisualUcs4
callgraph.dot.pdf
The initial call is glPushDebugGroup
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

This might be a peculiarity of my Intel video driver, but in general I think I explained most of it. It is clear that the memset bottleneck backtraces through the video driver to the render loop. It is clear that under valgrind, due to instrumentation overhead, the render loop frequency is reduced (~7 times fewer fps on my system), and hence you'd see anything related to rendering reduced at least that much in valgrind. It is also confirmed by disabling vsync, which not only removes the memset calls, but also actually halves CPU usage in top; it makes sense that memset calls are necessary to clear the buffers.

As a speculation, there might be another reason for valgrind to underestimate memset, namely this function is heavily assembler-optimized, and uses "heavier" CPU instructions than normal code, but valgrind might be having a wrong idea on how to take that into account (perf doesn't need to consider this factor at all).

So yeah, I believe that perf, in its out-of-the-box approach, is much closer to the truth than valgrind, simply because it doesn't alter program behavior in any way or cause any overhead (or even do any instrumentation), and its technology (a sampler) is fairly straightforward and simple. The primary downside of perf is that it provides much less information (e.g., it cannot count function calls with this approach, or discriminate between few-long and many-short function calls, and once you ask it to consider that, it's no better than any other tool).
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

Well, I just tested the effect of disabling vSync on my computer. The results are completely different from yours.

With vSync on, the frame rate is 60 and CPU usage is ~50%.
With vSync off, the frame rate rises to ~144 (at the tutorial screen, doing nothing) and CPU usage rises to ~70%.

Now the funny thing comes in:
When I leave the tutorial running and do nothing, CPU usage increases in both cases!
It reaches 100% with vSync off, and FPS drops to 70-80.
It reaches ~90% with vSync on, and FPS is maintained.

Does the tutorial mode do anything funny if it is ignored for some time?
Maybe the GPU is being silly!

BTW, how do you run perf? Is it simply perf top?
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

I've no idea why it doesn't increase fps when i disable vsync on my machine (in fact, it doesn't reach 60fps in any case). In any case, it's clearly GPU-specific, and it's probably not the thing we want to fix. In fact, reducing CPU usage is not our aim (it's not instantly a problem when the game always uses 100% CPU, as we don't really mind reducing fps from 140 to 30 if necessary).

So i keep insisting that we need to focus on profiling CPU-intensive stuff that happens when there are many things on board, which would make more sense and be more consistent across different systems.
"Is it simply perf top?"
perf top -p `pidof warzone2100`
(to capture statistics for a single pid rather than for the whole system, as in the screenshot above)
perf top -g -p `pidof warzone2100`
(a useful feature: to count total time and classify backtraces, not just self time)