Future plans for GUI

For code related discussions and questions
MaNGusT
Art contributor
Posts: 1152
Joined: 22 Sep 2006, 10:31
Location: Russia

Re: Future plans for GUI

Post by MaNGusT »

Will it be easy, or at least not that hard, to integrate ready-to-use libraries the way the Ogre3D engine does? :ninja:
Terminator
Regular
Posts: 1077
Joined: 05 Aug 2006, 13:46
Location: Ukraine

Re: Future plans for GUI

Post by Terminator »

Isn't the first thing the devs should do to un-hardcode the current GUI code? That comes before anything else, whether someone decides to implement a new GUI, use a ready-made library, extend the current one, or switch to an HTML+CSS layer.
Am I right?
In any case, the first thing anyone who starts messing with the GUI will do is move all the current GUI "calls" into a separate module.
Death is the only way out... sh*t Happens !

Russian-speaking Social network Group http://vk.com/warzone2100
Per
Warzone 2100 Team Member
Posts: 3780
Joined: 03 Aug 2006, 19:39

Re: Future plans for GUI

Post by Per »

MaNGusT wrote:Will it be easy, or at least not that hard, to integrate ready-to-use libraries the way the Ogre3D engine does? :ninja:
I think we first need to restructure the graphics code so that it works like modern graphics code should. Then it will be possible to use a third-party library. But I do not have much experience using these graphics engines, so I might be wrong.

Also I have little idea which of them would be a good fit for us. It would need to be easily multiplatform, and work well with SDL and Qt.
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

OK, I installed the Google profiler tools.
The tool has limited overhead yet it can generate call graphs.
The raw results look somewhat similar to perf output (attached: tutorial.txt).
Top entries:
    2871  22.0%  22.0%     7599  58.3% _init@3e6b8
    2186  16.8%  38.8%     5503  42.2% __driDriverGetExtensions_i965
     900   6.9%  45.7%      900   6.9% ioctl
     728   5.6%  51.3%     1120   8.6% drm_intel_bufmgr_fake_init
     214   1.6%  52.9%      690   5.3% atmosDrawParticles
     197   1.5%  54.4%     1846  14.2% __driDriverGetExtensions_i915
     197   1.5%  55.9%      311   2.4% atmosUpdateSystem
     160   1.2%  57.1%      364   2.8% locateMouse
     159   1.2%  58.4%      159   1.2% edgeLessThan

The graphs (attached: tutorial.gif) look very similar to the valgrind stats I extracted:
pie_drawShadow -> 8.4%
Two atmos calls -> 7% in total
locateMouse -> 2.8%
display Widgets -> 2%
displayConsoleMessages -> 11%
pie_Draw3DShape2 -> 42%

So I think the CPU usage stats we get from Valgrind are on track.

The reason I insist on focusing on the bottom line is that just staring at the Tutorial level and seeing 50% CPU usage on a rather modern PC is pretty poor IMHO. And since WZ2100 is single-threaded, it can only get worse from there :(

@MaNGusT:
As for migrating the code to Ogre, let me give an example:
You see these rotating models "Angry Python with Guns" on the Manufacture Menu? The model matrix is transformed and the model is actually redrawn every frame. The font rendering code? It converts the text into UCS4, picks the glyphs (and measures the length), transforms the view matrix and renders the damn thing every frame (thus I get ~16% CPU usage just staring at the title screen). A better approach would be to render it into a texture once and then re-use that texture every frame.

You get the idea...
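
The caching idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not code from the Warzone 2100 source: `rasterize_text` is a hypothetical stand-in for the expensive per-frame work (converting the string, picking glyphs, measuring), and the fixed 8-pixel glyph width is an assumption made only so the example runs.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stand-in for the expensive work: converting the string,
# picking glyphs and measuring it. A real renderer would produce a texture.
def rasterize_text(text: str) -> Tuple[int, bytes]:
    width = 8 * len(text)              # assumed fixed 8-pixel glyph width
    pixels = text.encode("utf-32-le")  # placeholder for rendered pixel data
    return width, pixels


class TextCache:
    """Rasterizes each distinct string once and reuses it on later frames."""

    def __init__(self, rasterize: Callable[[str], Tuple[int, bytes]]) -> None:
        self._rasterize = rasterize
        self._cache: Dict[str, Tuple[int, bytes]] = {}
        self.misses = 0  # how often the expensive path actually ran

    def get(self, text: str) -> Tuple[int, bytes]:
        entry = self._cache.get(text)
        if entry is None:
            self.misses += 1
            entry = self._rasterize(text)
            self._cache[text] = entry
        return entry


cache = TextCache(rasterize_text)
for _frame in range(600):  # roughly 10 seconds at 60 fps
    cache.get("Single Player")
    cache.get("Multi Player")
print(cache.misses)  # 2
```

Over 600 simulated frames the expensive path runs only twice, once per distinct string. A real cache would also need eviction, plus invalidation when the font or size changes.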

P.S.: I am not trying to take anything away from the efforts that the original and current coders put into WZ2100. Back then WZ2100 had to work with very limited memory and CPU; right now we have plenty of RAM and VRAM capacity even on low-end hardware.
cybersphinx
Inactive
Posts: 1695
Joined: 01 Sep 2006, 19:17

Re: Future plans for GUI

Post by cybersphinx »

Per wrote:Also I have little idea which of them would be a good fit for us. It would need to be easily multiplatform, and work well with SDL and Qt.
CEGUI is in Debian and MXE (though both versions are somewhat outdated). I think it's the only option if we don't want to make our 3rdparty directory larger. It has also been around for quite a while (so it is unlikely to be abandoned soon), and its features and tool support look good (on paper/screen).
We want information... information... information.
Per
Warzone 2100 Team Member
Posts: 3780
Joined: 03 Aug 2006, 19:39

Re: Future plans for GUI

Post by Per »

wuz21m wrote:You see these rotating models "Angry Python with Guns" on the Manufacture Menu? The model matrix is transformed and the model is actually redrawn every frame. The font rendering code? It converts the text into UCS4, picks the glyphs (and measures the length), transforms the view matrix and renders the damn thing every frame (thus I get ~16% CPU usage just staring at the title screen). A better approach would be to render it into a texture once and then re-use that texture every frame.
I agree with your post for the most part. I just want to add that it is a bit worse than this. The original graphics code would involve the CPU in the drawing of every vertex in every frame. For models, that is now fixed, storing meshes on the GPU and drawing them as a whole with a few calls to the GPU. In my gfxqueue branch, I am extending this approach to everything, including fonts, lines, etc. which means that whenever they are unchanged, they are reused from attributes stored on the GPU. I don't think that writing things into textures gains a lot over this. The big jump up in performance should be getting the CPU out of every vertex.
NoQ
Special
Posts: 6226
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Future plans for GUI

Post by NoQ »

wuz21m wrote:google profiler tools
wuz21m wrote:I think the CPU usage stats we get from Valgrind are on track

Code:

$ pprof --text src/warzone2100 /tmp/mybin.prof                                   
Using local file src/warzone2100.
Using local file /tmp/mybin.prof.
Removing __funlockfile from all stack traces.
Total: 2231 samples
     893  40.0%  40.0%      893  40.0% __memset_sse2
     155   6.9%  47.0%      155   6.9% _init@750
     131   5.9%  52.8%      131   5.9% _init@38f98
     117   5.2%  58.1%      117   5.2% __ioctl
     101   4.5%  62.6%      101   4.5% __driDriverGetExtensions_i965
      33   1.5%  64.1%       33   1.5% drm_intel_bufmgr_fake_init
      31   1.4%  65.5%       31   1.4% inflateBackEnd
      26   1.2%  66.7%       26   1.2% tx_compress_dxtn
      21   0.9%  67.6%       21   0.9% png_set_read_user_transform_fn
      18   0.8%  68.4%       19   0.9% atmosUpdateSystem
      17   0.8%  69.2%       17   0.8% edgeLessThan
      16   0.7%  69.9%       16   0.7% __driDriverGetExtensions_i915
      16   0.7%  70.6%       16   0.7% __fsync_nocancel
      16   0.7%  71.3%       27   1.2% drawTerrain
      15   0.7%  72.0%       15   0.7% __GI___poll
      15   0.7%  72.7%       15   0.7% __GI___pthread_mutex_lock
      14   0.6%  73.3%       14   0.6% __memcpy_avx_unaligned
      13   0.6%  73.9%       15   0.7% atmosDrawParticles
      10   0.4%  74.3%       27   1.2% locateMouse
It pretty much coincides with perf to me, and it's also based on a similar approach (and straightforwardly contradicts valgrind stats).

(profiled from start because game source code changes are necessary to turn it on and off in run-time)
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

I am referring to the call graph stats.
The high level graph stats (attached gif) are very similar to valgrind's (Compare to 3rd post in this thread)
Yes, the raw ones are very similar to perf.
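
The difference between the "raw" numbers and the call-graph numbers comes down to which pprof column is read. In pprof's text output the first two columns are flat samples (taken inside the function itself), the third is a running total, and the fourth and fifth are cumulative samples (the function plus everything it calls). A small Python sketch that separates the two, using three rows from the listing earlier in this thread; the `PprofRow` and `parse_row` names are made up for this example:

```python
from typing import NamedTuple


class PprofRow(NamedTuple):
    flat: int         # samples taken inside the function itself
    flat_pct: float
    sum_pct: float    # running total of flat_pct down the listing
    cum: int          # samples in the function plus everything it calls
    cum_pct: float
    name: str


def parse_row(line: str) -> PprofRow:
    """Parse one data row of `pprof --text` output."""
    flat, flat_pct, sum_pct, cum, cum_pct, name = line.split(None, 5)
    return PprofRow(
        int(flat),
        float(flat_pct.rstrip("%")),
        float(sum_pct.rstrip("%")),
        int(cum),
        float(cum_pct.rstrip("%")),
        name.strip(),
    )


rows = [
    parse_row(line)
    for line in (
        "214 1.6% 52.9% 690 5.3% atmosDrawParticles",
        "197 1.5% 55.9% 311 2.4% atmosUpdateSystem",
        "160 1.2% 57.1% 364 2.8% locateMouse",
    )
]
for row in rows:
    in_callees = row.cum - row.flat
    print(f"{row.name}: flat {row.flat_pct}%, cumulative {row.cum_pct}%, "
          f"{in_callees} samples spent in callees")
```

Read this way, the cumulative column gives 5.3% + 2.4% for the two atmos functions and 2.8% for locateMouse, which is where the call-graph figures come from, while the flat column is what resembles plain perf output.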
vexed
Inactive
Posts: 2538
Joined: 27 Jul 2010, 02:07

Re: Future plans for GUI

Post by vexed »

wuz21m wrote:I have read on the forums that plans were underway to port WZ2100 to QT 5. I know we are already using QT 5 to build Warzone2100.
Kinda. Yes, we have moved to Qt 5, but moving everything to Qt isn't possible yet because of performance issues.
At one point we did have font support rendered using Qt, but we quickly switched back once we saw the performance hit.
wuz21m wrote:I know that we are using QT 5 extensively and a map editor is already using QT 5.
There are no working editors that I know of that use Qt.
There was an attempt at one, but, that hasn't been worked on in years.
The newest map editor is called SharpFlame, and that is being worked on by Cowboy. That is using C#.
wuz21m wrote:But what about the game frontend and in-game widgets? My searches didn't turn up any conclusive results. The current solution performs many re-draws and is probably a major performance bottleneck. Are there any plans in place to do something about this?
Everything done in WZ is the brute force approach, and that does have some advantages.
Lots of games redraw everything every frame. That isn't the problem per se; it is how they go about drawing things that matters the most.
At one time, we had sleep() calls in the menu code to be "nicer" to laptops, and while that isn't the best approach, that is what lots of games do these days.

To answer some of the other questions: librocket would be the HTML/CSS choice, or CEGUI for direct OpenGL, but it will still be difficult to integrate no matter which one is picked (if we go that route).
/facepalm ...Grinch stole Warzone🙈🙉🙊 contra principia negantem non est disputandum
Super busy, don't expect a timely reply back.
Tzeentch
Trained
Posts: 300
Joined: 14 Oct 2012, 14:24

Re: Future plans for GUI

Post by Tzeentch »

NoQ wrote:
  • For graphics engine optimizations, you'd better look for another thread; a lot of such discussions were happening in ArtRevolution threads, and in fact a lot of new optimizations were already introduced in -master. Also, you may want to synchronize your performance analysis with [2].
  • Ah, valgrind. It introduces a huge overhead, but then works around it; numbers it displays are not exactly time, but rather number of emulated processor instructions, though it's still pretty accurate. Things to be aware of:
    • It doesn't take sleep times into account, which means that you really don't know the absolute CPU usage values for the functions you found. It may be "10% of almost-nothing", and after adding 10-20 units on the board it may reduce to a negligible 1%. In any case, it is much more relevant to profile performance-critical situations, like 1500 units on the board (the limit we claim to support in a 10-player game), especially when it comes to tools that discard sleep time, like valgrind or perf (in its out-of-the-box variant).
    • Time-based application logic may be skewed by the large overhead. Performance statistics for the game running @60fps and @3fps are completely different, even if the same things are happening. It might be that console redraws (or anything else) happen at game frame rather than at render frame.
    • As far as I remember, there was a way to arbitrarily enable and disable collecting statistics (not instrumentation, of course) in valgrind to avoid mixing in statistics from the start menu; not sure if you used it.
The procsystime script captures and prints the system call time usage for a given process name.

In an RTS, framerate is not as important as in other games.

I agree with this. Warzone spends a lot of time in sleep() calls, as per the output below (nanosecond timestamps):

Code:

# /usr/share/dtrace/toolkit/procsystime -n warzone2100 
Tracing... Hit Ctrl-C to end...
dtrace: 12797 dynamic variable drops with non-empty dirty list
dtrace: 13867 dynamic variable drops with non-empty dirty list
^C

Elapsed Times for processes warzone2100,

         SYSCALL          TIME (ns)
          writev             105663
           write            1600391
          select            3821487
          getpid            4151995
          sendto            4205746
            read            4522322
         recvmsg            5857182
     sched_yield           15857375
           ioctl           16124479
       nanosleep        25845524999
        _umtx_op        26613797190
            poll        41294596003
* ioctl() manipulates the underlying device parameters of special files. In particular, many operating characteristics of character special files (e.g., terminals) may be controlled with ioctl() requests.

* nanosleep() (high-resolution sleep) suspends the execution of the calling thread until either at least the time specified in *req has elapsed, or a signal is delivered that triggers the invocation of a handler in the calling thread or that terminates the process.

* poll() performs a similar task to select(): it waits for one of a set of file descriptors to become ready to perform I/O.

* select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read) without blocking.

* _umtx_op: uncertain at this time. Perhaps FreeBSD-specific.

It seems Warzone spends most of its time in these main operations.
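
To put a number on that, the table can be summed up with a few lines of Python. The figures are copied from the procsystime output above. Treating nanosleep, _umtx_op and poll as "waiting" is an assumption of this sketch, and because the elapsed times are summed over every call (possibly across several threads), the totals should be read as shares of syscall time, not of wall-clock time.

```python
# Elapsed time per syscall in nanoseconds, copied from the procsystime output.
elapsed_ns = {
    "writev": 105663,
    "write": 1600391,
    "select": 3821487,
    "getpid": 4151995,
    "sendto": 4205746,
    "read": 4522322,
    "recvmsg": 5857182,
    "sched_yield": 15857375,
    "ioctl": 16124479,
    "nanosleep": 25845524999,
    "_umtx_op": 26613797190,
    "poll": 41294596003,
}

# Assumption: these three calls are where the process sleeps or waits.
WAITING = {"nanosleep", "_umtx_op", "poll"}

total_ns = sum(elapsed_ns.values())
waiting_ns = sum(ns for name, ns in elapsed_ns.items() if name in WAITING)

for name, ns in sorted(elapsed_ns.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>12} {ns / 1e9:10.3f} s {100 * ns / total_ns:6.2f}%")

print(f"waiting calls: {100 * waiting_ns / total_ns:.2f}% of syscall time")
```

Under those assumptions the three waiting calls account for well over 99% of the measured syscall time, and everything else, including ioctl, adds up to less than 0.1%.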
Tzeentch
Trained
Posts: 300
Joined: 14 Oct 2012, 14:24

Re: Future plans for GUI

Post by Tzeentch »

Got a flame graph of Warzone from the tutorial while doing barely anything. http://i.imgur.com/yGZCm4o.png?1

(flame graph steps taken posted here viewtopic.php?f=32&t=12021)

Most of the stack traces on the right-hand side are just libnvidia-glcore.so. I also believe that having all the graphics settings at their highest is obscuring my results compared to the ones above. Should I use the bare minimum graphics settings, or something specific (and different hardware)?

Compared against the graph above, mine shows:

pie_Draw3DShape = 14.35%
inDisplayWidgets = 0.26%
displayConsoleMessages = 2.93%
atmosUpdateSystem = 1.97%
atmosDrawParticles = 19.47%
displayFeatures = 2.80%

pie_drawShadow - had trouble finding.
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

Oh wow.

22% spent on particles (my measurements were ~7.7%). Were you just viewing the particles? Did you pan or zoom or do anything?

You are using a powerful Nvidia GPU versus my Intel GPU, so you are probably wasting less CPU time on rendering. atmosDrawParticles is called. If you look at the warzone2100 source code, the function is rather simplistic: it iterates over the particles (65536 if I am not mistaken) and calls renderParticle for each one. This function, in turn, ...pushes and pops the matrix every time (it calls pie_MatBegin and pie_MatEnd).

Why are we not making cached calls?
Last edited by wuz21m on 11 Feb 2015, 19:19, edited 1 time in total.
wuz21m
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Future plans for GUI

Post by wuz21m »

I just tried changing the calls to cached ones. Tzeentch, can you do the following?
1- Go to renderParticle in atmos.cpp and change pie_MatBegin() to pie_MatBegin(true). The visual results are no different.
2- Monitor (a) the CPU usage percentage (e.g. 45% of one core) and (b) the share of atmosDrawParticles and renderParticle in the CPU usage, once with no code changes and once with the change above applied.
3- Make sure the system is not doing much else. Use a fixed duration (e.g. 3 minutes).
4- Run Warzone with these parameters: src/warzone2100 --game TUTORIAL3 (so you skip the main menu).

Thanks!
Tzeentch
Trained
Posts: 300
Joined: 14 Oct 2012, 14:24

Re: Future plans for GUI

Post by Tzeentch »

Intend to work on this ASAP, just having issues -> viewtopic.php?f=43&t=11974#p131029
Tzeentch
Trained
Posts: 300
Joined: 14 Oct 2012, 14:24

Re: Future plans for GUI

Post by Tzeentch »

Solaris 11.2 is taking time, so I am using Ubuntu with perf in the meantime (as I imagine most people will be using this). Note, though, that I didn't do this with the Nvidia driver configured.

Code:

The game parameter requires one of the following keywords:CAM_1A, CAM_2A, CAM_3A, TUTORIAL3, or FASTPLAY.
root@System-Product-Name:/home/test/war-test# src/warzone2100 --game TUTORIAL3
Unrecognized option: TUTORIAL3
For some reason this isn't working.

Warzone uses 17.0% - 47.3% CPU in the tutorial, depending on the messages on screen, before or after any changes. It is difficult to gauge on Ubuntu, as performance is very different compared to FreeBSD (most likely due to differences in packages). I used a 60-second perf measurement, folded the stacks into single lines, then converted them to a flame graph. (I zoomed out fully prior to execution in both tests.) Graphics settings were on maximum.

Code:

perf record -F 99 -p 16323 -g -- sleep 60
perf script | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl out.perf-folded > perf.svg
Before
pie_MatBegin - 0.71%, Samples taken 16

pie_Draw3DShape - 1.52%, Samples taken 34

renderParticle - 3.98%, Samples taken 89

atmosDrawParticles - 5.94%, Samples taken 133

After change
pie_MatBegin - 0.36%, Samples taken 5

pie_Draw3DShape - 0.66%, Samples taken 9

renderParticle - 2.26%, Samples taken 31

atmosDrawParticles - 3.93%, Samples taken 54


Excellent! It is lower to the point where atmosDrawParticles after the change (3.93%) is even less than both atmosDrawParticles (5.94%) and renderParticle (3.98%) before it. I am happy. Was this expected? I need to double-check and compare this on other setups. From this, all code paths above pie_Draw3DShape are using roughly half the CPU time.
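
One way to read the before/after figures is through the sample counts as well as the percentages. Each row implies a total of roughly count / (percent / 100) samples, and since perf by default only samples a process while it is running on a CPU, a smaller total over the same 60 seconds suggests less CPU time overall. The Python sketch below does that arithmetic with the numbers posted above. It assumes both runs are comparable (same duration, same 99 Hz rate, similar on-screen activity), which two runs cannot prove.

```python
from statistics import median

# function -> (percent of all samples, sample count), copied from the results above
before = {
    "pie_MatBegin": (0.71, 16),
    "pie_Draw3DShape": (1.52, 34),
    "renderParticle": (3.98, 89),
    "atmosDrawParticles": (5.94, 133),
}
after = {
    "pie_MatBegin": (0.36, 5),
    "pie_Draw3DShape": (0.66, 9),
    "renderParticle": (2.26, 31),
    "atmosDrawParticles": (3.93, 54),
}


def estimated_total(rows):
    """Each row implies total = count / (percent / 100); take the median."""
    return median(count / (pct / 100.0) for pct, count in rows.values())


total_before = estimated_total(before)
total_after = estimated_total(after)
print(f"estimated samples: {total_before:.0f} before, {total_after:.0f} after")

for name in before:
    ratio = after[name][1] / before[name][1]
    print(f"{name}: {before[name][1]} -> {after[name][1]} samples "
          f"({ratio:.0%} of before)")
```

By this estimate the total drops from about 2,240 samples to about 1,370, so the percentages alone understate the change: atmosDrawParticles goes from 133 samples to 54, about 41% of its earlier count.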

What shall we do to improve performance elsewhere? Should this be moved to a separate thread? It will probably go off topic unless we stick to GUI performance, I guess.