Future plans for GUI
Re: Future plans for GUI
Will it be easy, or at least not that hard, to integrate ready-to-use libraries such as the Ogre3D engine?
- Terminator
Re: Future plans for GUI
Isn't the first thing the devs should do to un-hardcode the current GUI code, before anything else? That applies whether someone decides to implement a new GUI, adopt a ready-made library, extend the current GUI, or switch to an HTML+CSS layer.
Am I right?
In any case, the first thing anyone who starts messing with the GUI will do is move all the current GUI "calls" into a separate module.
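A minimal sketch of what such a separate module could look like, written in Python for brevity (the game itself is C++). Game code talks to one small interface, and the backend behind it can be swapped. Every name here (`GuiBackend`, `RecordingBackend`, `show_build_menu`) is hypothetical and does not correspond to actual Warzone 2100 code.

```python
from abc import ABC, abstractmethod


class GuiBackend(ABC):
    """Hypothetical seam: the only GUI surface that game code may call."""

    @abstractmethod
    def add_label(self, form, text):
        ...

    @abstractmethod
    def add_button(self, form, label):
        ...


class RecordingBackend(GuiBackend):
    """Stand-in backend that only records calls; a real one would draw."""

    def __init__(self):
        self.calls = []

    def add_label(self, form, text):
        self.calls.append(("label", form, text))

    def add_button(self, form, label):
        self.calls.append(("button", form, label))


def show_build_menu(gui):
    # Game logic depends only on GuiBackend, never on a concrete toolkit.
    gui.add_label("build", "Manufacture")
    gui.add_button("build", "Start production")


backend = RecordingBackend()
show_build_menu(backend)
print(backend.calls)
```

With a seam like this in place, the current widget code, a ready-made library, or an HTML+CSS layer would each be one more `GuiBackend` implementation, and the callers would not need to change.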
Death is the only way out... sh*t Happens !
Russian-speaking Social network Group http://vk.com/warzone2100
Re: Future plans for GUI
MaNGusT wrote: Will it be easy or just not that hard to implement ready to use libs as Ogre3d engine does?
I think we first need to restructure the graphics code so that it works like modern graphics code should. Then it will be possible to use a third-party library. But I do not have much experience using these graphics engines, so I might be wrong.
Also, I have little idea which of them would be a good fit for us. It would need to be easily multiplatform, and work well with SDL and Qt.
Re: Future plans for GUI
OK, I installed the Google profiler tools.
The tool has limited overhead, yet it can generate call graphs.
The raw results look somewhat similar to perf output.
Top entries:
2871 22.0% 22.0% 7599 58.3% _init@3e6b8
2186 16.8% 38.8% 5503 42.2% __driDriverGetExtensions_i965
900 6.9% 45.7% 900 6.9% ioctl
728 5.6% 51.3% 1120 8.6% drm_intel_bufmgr_fake_init
214 1.6% 52.9% 690 5.3% atmosDrawParticles
197 1.5% 54.4% 1846 14.2% __driDriverGetExtensions_i915
197 1.5% 55.9% 311 2.4% atmosUpdateSystem
160 1.2% 57.1% 364 2.8% locateMouse
159 1.2% 58.4% 159 1.2% edgeLessThan
The graphs look very similar to the valgrind stats I extracted:
pie_drawShadow -> 8.4%
Two atmos calls -> 7% in total
locateMouse -> 2.8%
display Widgets -> 2%
displayConsoleMessages -> 11%
pie_Draw3DShape2 -> 42%
So I think the CPU usage stats we get from Valgrind are on track.
The reason I insist on focusing on the bottom line is that just staring at the Tutorial level and having 50% CPU usage on a rather modern PC is pretty poor IMHO. And since WZ2100 is single-threaded, it can only get worse from there.
@MaNGusT:
As for migrating the code to Ogre, let me give an example:
You see those rotating models ("Angry Python with Guns") in the Manufacture menu? The model matrix is transformed and the model is actually redrawn every frame. The font rendering code? It converts the text to UCS-4, picks the glyphs (and measures the length), transforms the view matrix and renders the damn thing every frame (thus I get ~16% CPU usage just staring at the title screen). A better approach would be to render into a texture once and then reuse it every frame.
You get the idea...
P.S.: I am not trying to take anything away from the effort that the original and current coders put into WZ2100. Back then WZ2100 had to work with very limited memory and CPU; now we have plenty of RAM and VRAM even on low-end hardware.
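To illustrate the "render once, reuse every frame" idea for text, here is a small sketch in Python (chosen for brevity; the game is C++). It assumes a hypothetical expensive `rasterize` step standing in for text conversion, glyph lookup and measuring. `TextCache` and `fake_rasterize` are invented names for illustration, not Warzone 2100 functions.

```python
class TextCache:
    """Caches the result of an expensive text render per (text, size)."""

    def __init__(self, rasterize):
        self._rasterize = rasterize  # expensive: conversion, glyphs, measuring
        self._cache = {}
        self.misses = 0

    def get(self, text, size):
        key = (text, size)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._rasterize(text, size)
        return self._cache[key]


def fake_rasterize(text, size):
    # Placeholder for the real work; returns a fake "texture" description.
    return {"width": len(text) * size // 2, "height": size}


cache = TextCache(fake_rasterize)
for _frame in range(600):  # ten seconds at 60 fps
    cache.get("Single Player", 24)
    cache.get("Multi Player", 24)

# Rasterized once per distinct string, not once per frame.
print(cache.misses)
```

A real cache would also need an eviction policy so that one-off strings (console messages, for example) do not accumulate forever.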
Re: Future plans for GUI
Per wrote: Also I have little idea which of them would be a good fit for us. It would need to be easily multiplatform, and work well with SDL and Qt.
CEGUI is in Debian and MXE (though both versions are somewhat outdated), so I think it's the only option if we don't want to make our 3rdparty directory larger. It has also been around for quite a while (so it is unlikely to be abandoned soon), and its features and tool support look good (on paper/screen).
We want information... information... information.
Re: Future plans for GUI
wuz21m wrote: You see these rotating models "Angry Python with Guns" on the Manufacture Menu? The matrix model is transformed, the model is actually redrawn every frame. The font rendering code? It transforms the code into UCS4, picks the glyphs (and measures the length), transforms the view matrix and renders the damn thing every frame (thus I get ~16% CPU usage just staring at the title screen). A correct approach should just use a texture and then re-use it every frame.
I agree with your post for the most part. I just want to add that it is a bit worse than this. The original graphics code would involve the CPU in the drawing of every vertex in every frame. For models, that is now fixed: meshes are stored on the GPU and drawn as a whole with a few calls to the GPU. In my gfxqueue branch, I am extending this approach to everything, including fonts, lines, etc., which means that whenever they are unchanged, they are reused from attributes stored on the GPU. I don't think that writing things into textures gains a lot over this. The big jump in performance should come from getting the CPU out of every vertex.
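The "reuse unless changed" pattern described above can be sketched with a dirty flag. This is a toy model in Python, not the gfxqueue branch's code: `FakeGpu` merely counts uploads and draw calls, and `CachedGeometry` is a hypothetical wrapper that re-uploads vertex data only when it actually differs from the previous frame.

```python
class FakeGpu:
    """Counts calls instead of talking to a real graphics API."""

    def __init__(self):
        self.uploads = 0
        self.draw_calls = 0

    def upload(self, vertices):
        self.uploads += 1
        return tuple(vertices)  # stands in for a buffer handle

    def draw(self, handle):
        self.draw_calls += 1


class CachedGeometry:
    def __init__(self, gpu):
        self._gpu = gpu
        self._vertices = []
        self._handle = None
        self._dirty = True

    def set_vertices(self, vertices):
        vertices = list(vertices)
        if vertices != self._vertices:  # only mark dirty on a real change
            self._vertices = vertices
            self._dirty = True

    def draw(self):
        if self._dirty:
            self._handle = self._gpu.upload(self._vertices)
            self._dirty = False
        self._gpu.draw(self._handle)  # one call, no per-vertex CPU work


gpu = FakeGpu()
box = CachedGeometry(gpu)
box_a = [(0, 0), (10, 0), (10, 5), (0, 5)]
box_b = [(0, 0), (20, 0), (20, 5), (0, 5)]
for frame in range(100):
    box.set_vertices(box_a if frame < 50 else box_b)
    box.draw()

print(gpu.uploads, gpu.draw_calls)
```

In this toy run the geometry changes once, so it is uploaded twice in total while still being drawn every frame.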
Re: Future plans for GUI
google profiler tools
I think the CPU usage stats we get from Valgrind are on track
Code: Select all
$ pprof --text src/warzone2100 /tmp/mybin.prof
Using local file src/warzone2100.
Using local file /tmp/mybin.prof.
Removing __funlockfile from all stack traces.
Total: 2231 samples
893 40.0% 40.0% 893 40.0% __memset_sse2
155 6.9% 47.0% 155 6.9% _init@750
131 5.9% 52.8% 131 5.9% _init@38f98
117 5.2% 58.1% 117 5.2% __ioctl
101 4.5% 62.6% 101 4.5% __driDriverGetExtensions_i965
33 1.5% 64.1% 33 1.5% drm_intel_bufmgr_fake_init
31 1.4% 65.5% 31 1.4% inflateBackEnd
26 1.2% 66.7% 26 1.2% tx_compress_dxtn
21 0.9% 67.6% 21 0.9% png_set_read_user_transform_fn
18 0.8% 68.4% 19 0.9% atmosUpdateSystem
17 0.8% 69.2% 17 0.8% edgeLessThan
16 0.7% 69.9% 16 0.7% __driDriverGetExtensions_i915
16 0.7% 70.6% 16 0.7% __fsync_nocancel
16 0.7% 71.3% 27 1.2% drawTerrain
15 0.7% 72.0% 15 0.7% __GI___poll
15 0.7% 72.7% 15 0.7% __GI___pthread_mutex_lock
14 0.6% 73.3% 14 0.6% __memcpy_avx_unaligned
13 0.6% 73.9% 15 0.7% atmosDrawParticles
10 0.4% 74.3% 27 1.2% locateMouse
(Profiled from startup, because turning profiling on and off at run time would require changes to the game source code.)
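For readers unfamiliar with the `pprof --text` layout: the columns are samples in the function itself, that count as a percentage, the running total of that percentage, samples in the function plus its callees, and that count as a percentage. The sketch below (Python; `Row` and `parse_row` are illustrative helper names, not part of pprof) parses three rows copied from the output above and shows how much of each function's time is spent in callees.

```python
from typing import NamedTuple


class Row(NamedTuple):
    flat: int         # samples in the function itself
    flat_pct: float   # flat as a percentage of all samples
    sum_pct: float    # running total of flat_pct for the rows printed so far
    cum: int          # samples in the function and its callees
    cum_pct: float    # cum as a percentage of all samples
    name: str


def parse_row(line):
    flat, flat_pct, sum_pct, cum, cum_pct, name = line.split(None, 5)
    return Row(int(flat), float(flat_pct.rstrip("%")), float(sum_pct.rstrip("%")),
               int(cum), float(cum_pct.rstrip("%")), name.strip())


TOTAL_SAMPLES = 2231  # from the "Total: 2231 samples" line
LINES = [
    "893 40.0% 40.0% 893 40.0% __memset_sse2",
    "16 0.7% 71.3% 27 1.2% drawTerrain",
    "10 0.4% 74.3% 27 1.2% locateMouse",
]

rows = [parse_row(line) for line in LINES]
for row in rows:
    in_callees = row.cum - row.flat
    print(f"{row.name}: {row.flat} self, {in_callees} in callees")
```

In this run most rows have identical flat and cumulative counts, which means the time was attributed to the leaf functions themselves rather than to anything they call.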
Re: Future plans for GUI
I am referring to the call graph stats.
The high-level graph stats (attached gif) are very similar to valgrind's (compare with the 3rd post in this thread).
Yes, the raw ones are very similar to perf.
Re: Future plans for GUI
wuz21m wrote: I have read on the forums that plans were underway to port WZ2100 to QT 5. I know we are already using QT 5 to build Warzone2100.
Kinda. Yes, we have moved to Qt 5, but moving everything to Qt isn't yet possible because of performance issues.
At one point we did have font rendering done through Qt, but we quickly switched back once we saw the performance hit.
Quote: I know that we are using QT 5 extensively and a map editor is already using QT 5.
There are no working editors that I know of that use Qt.
There was an attempt at one, but that hasn't been worked on in years.
The newest map editor is called SharpFlame, and it is being worked on by Cowboy. It uses C#.
Quote: But what about the game frontend and in-game widgets? My searches didn't turn up any conclusive results. The current solution performs many re-draws and is probably a major performance bottleneck. Are there any plans in place to do something about this?
Everything done in WZ is the brute-force approach, and that does have some advantages.
Lots of games redraw everything every frame; that isn't the problem per se. It is how they go about drawing things that matters the most.
At one time, we had sleep() calls in the menu code to be "nicer" to laptops, and while that isn't the best approach, that is what lots of games do these days.
To answer some of the other questions: libRocket would be the HTML/CSS choice, or CEGUI for direct OpenGL, but it will still be difficult to integrate no matter which one is picked (if we go that route).
/facepalm ...Grinch stole Warzone contra principia negantem non est disputandum
Super busy, don't expect a timely reply back.
Re: Future plans for GUI
NoQ wrote:
- For graphics engine optimizations, you'd better look for another thread; a lot of such discussions were happening in ArtRevolution threads, and in fact a lot of new optimizations were already introduced in -master. Also, you may want to synchronize your performance analysis with [2].
- Ah, valgrind. It introduces a huge overhead, but then works around it; the numbers it displays are not exactly time, but rather the number of emulated processor instructions, though it's still pretty accurate. Things to be aware of:
  - It doesn't take sleep times into account, which means that you really don't know the absolute CPU usage values for the functions you found. It may be "10% of almost-nothing", and after adding 10-20 units on the board it may reduce to a negligible 1%. In any case, it is much more relevant to profile performance-critical situations, like with 1500 units on board (the limit we say we support in a 10-player game), especially when it comes to tools that discard sleep time, like valgrind or perf (in its out-of-the-box variant).
  - Time-based application logic may be skewed by the large overhead. Performance statistics for the game running @60fps and @3fps are completely different, even if the same things are happening. It might be that console redraws (or anything else) happen at game frame rather than at render frame.
  - As far as I remember, there was a way to arbitrarily enable and disable collecting statistics (not instrumentation, of course) in valgrind to avoid mixing in statistics from the start menu; not sure if you used it.
The procsystime script captures and prints the system call time usage for a given process name.
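On the first caveat in the quoted list: a profiler that ignores sleep time reports each function's share of the process's on-CPU work, not of wall-clock time. A rough way to get an absolute figure is to scale that share by the process's measured CPU usage. The sketch below (Python; `absolute_core_share` is an illustrative helper) uses the 42% figure for pie_Draw3DShape2 and the 50% CPU usage reported earlier in the thread. Combining numbers from different tools like this is only an approximation.

```python
def absolute_core_share(profile_share_pct, process_cpu_pct):
    """Share of one core, given a function's share of the process's on-CPU work."""
    return profile_share_pct * process_cpu_pct / 100.0


# 42% of the profile while the process uses 50% of a core, versus the same
# profile share when the process is mostly sleeping (10% of a core).
for process_cpu in (50.0, 10.0):
    share = absolute_core_share(42.0, process_cpu)
    print(f"process at {process_cpu:.0f}% CPU -> function uses {share:.1f}% of a core")
```

The same 42% can therefore mean about a fifth of a core or only a few percent of one, which is the "10% of almost-nothing" point made in the quote.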
In an RTS, framerate is not as important as in other games.
I agree with this; Warzone spends a lot of time in sleep() calls, as per the following (times in nanoseconds):
Code: Select all
# /usr/share/dtrace/toolkit/procsystime -n warzone2100
Tracing... Hit Ctrl-C to end...
dtrace: 12797 dynamic variable drops with non-empty dirty list
dtrace: 13867 dynamic variable drops with non-empty dirty list
^C
Elapsed Times for processes warzone2100,
SYSCALL TIME (ns)
writev 105663
write 1600391
select 3821487
getpid 4151995
sendto 4205746
read 4522322
recvmsg 5857182
sched_yield 15857375
ioctl 16124479
nanosleep 25845524999
_umtx_op 26613797190
poll 41294596003
* ioctl(): in particular, many operating characteristics of character special files (e.g., terminals) may be controlled with ioctl() requests.
* nanosleep() (high-resolution sleep) suspends the execution of the calling thread until either at least the time specified in *req has elapsed, or a signal is delivered that triggers the invocation of a handler in the calling thread or that terminates the process.
* poll() performs a similar task to select(): it waits for one of a set of file descriptors to become ready to perform I/O.
* select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read) without blocking.
* _umtx_op: uncertain at this time; perhaps FreeBSD-specific.
It seems Warzone spends most of its time in these main operations.
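To put a number on "most of its time", the sketch below (Python) sums the procsystime figures from the output above and computes the share taken by the three blocking calls. One caveat, stated as an assumption: these are elapsed times inside system calls, presumably summed over all threads, so they mostly measure time spent waiting rather than CPU time consumed. `_umtx_op` is FreeBSD's system call for userland thread synchronization, so time in it is typically a thread blocked on a lock or condition.

```python
# Elapsed nanoseconds per system call, copied from the procsystime output.
SYSCALL_NS = {
    "writev": 105663,
    "write": 1600391,
    "select": 3821487,
    "getpid": 4151995,
    "sendto": 4205746,
    "read": 4522322,
    "recvmsg": 5857182,
    "sched_yield": 15857375,
    "ioctl": 16124479,
    "nanosleep": 25845524999,
    "_umtx_op": 26613797190,
    "poll": 41294596003,
}

WAITING = ("nanosleep", "_umtx_op", "poll")

total_ns = sum(SYSCALL_NS.values())
waiting_ns = sum(SYSCALL_NS[name] for name in WAITING)
waiting_share = waiting_ns / total_ns

print(f"total time in syscalls: {total_ns / 1e9:.1f} s")
print(f"nanosleep + _umtx_op + poll: {waiting_share:.2%}")
```

By this measure, the time spent in all other system calls combined is well under 0.1% of the total.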
Re: Future plans for GUI
Got a flame graph of Warzone in the tutorial, doing barely anything: http://i.imgur.com/yGZCm4o.png?1
(The steps taken to produce the flame graph are posted here: viewtopic.php?f=32&t=12021)
Most of the stack traces on the right-hand side are just libnvidia-glcore.so. I believe that having all graphics settings at their highest is obscuring my results compared to the above. Shall I use the bare minimum graphics settings, or something specific (and different hardware)?
Compared against the graph above, mine shows:
pie_Draw3DShape = 14.35%
inDisplayWidgets = 0.26%
displayConsoleMessages = 2.93%
atmosUpdateSystem = 1.97%
atmosDrawParticles = 19.47%
displayFeatures = 2.80%
pie_drawShadow - had trouble finding.
Re: Future plans for GUI
Oh wow.
22% spent on particles (my measurements were ~7.7%). Were you just viewing the particles? Did you pan or zoom or do anything?
You are using a powerful Nvidia GPU versus my Intel GPU, so you are probably wasting less CPU time on rendering. Consider how atmosDrawParticles works. If you look at the warzone2100 source code, the function is rather simplistic: it iterates over the particles (65536 if I am not mistaken) and calls renderParticle for each one. That function, in turn, pushes and pops the matrix every time (it calls pie_MatBegin and pie_MatEnd).
Why are we not making cached calls?
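One general way to avoid a matrix push/pop per particle is to apply each particle's offset on the CPU side and submit everything in a single batch. The toy sketch below (Python) only counts calls. `FakeRenderer`, `draw_per_particle` and `draw_batched` are hypothetical names, and this is not a description of what `pie_MatBegin(true)` does in the Warzone 2100 source.

```python
class FakeRenderer:
    """Counts matrix operations and draw calls instead of rendering."""

    def __init__(self):
        self.matrix_ops = 0
        self.draw_calls = 0

    def push_matrix(self):
        self.matrix_ops += 1

    def pop_matrix(self):
        self.matrix_ops += 1

    def draw(self, vertices):
        self.draw_calls += 1


QUAD = [(-1, -1), (1, -1), (1, 1), (-1, 1)]  # one particle's corners


def draw_per_particle(renderer, positions):
    # Pattern described in the post: push, draw, pop for every particle.
    for _position in positions:
        renderer.push_matrix()
        renderer.draw(QUAD)
        renderer.pop_matrix()


def draw_batched(renderer, positions):
    # Apply each particle's offset here and submit one combined buffer.
    batch = [(x + dx, y + dy) for x, y in positions for dx, dy in QUAD]
    renderer.draw(batch)
    return batch


positions = [(i % 256, i // 256) for i in range(1000)]

naive = FakeRenderer()
draw_per_particle(naive, positions)

batched = FakeRenderer()
batch = draw_batched(batched, positions)

print(naive.matrix_ops, naive.draw_calls)
print(batched.matrix_ops, batched.draw_calls, len(batch))
```

Whether batching like this pays off in practice depends on the driver and on how much per-particle state there really is, so it would need to be measured rather than assumed.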
Last edited by wuz21m on 11 Feb 2015, 19:19, edited 1 time in total.
Re: Future plans for GUI
I just tried changing the calls to cached ones. Tzeentch, can you do the following?
1. Go to renderParticle in atmos.cpp and change pie_MatBegin() to pie_MatBegin(true). The visual results are no different.
2. Monitor (a) the CPU usage percentage (e.g. 45% of one core) and (b) the share of atmosDrawParticles and renderParticle in CPU usage, once with no code changes and once with the change above applied.
3. Make sure the system is not doing much else. Use a fixed duration (e.g. 3 minutes).
4. Run Warzone with these parameters: src/warzone2100 --game TUTORIAL3 (so that you skip the main menu).
Thanks!
Re: Future plans for GUI
Intend to work on this ASAP, just having issues -> viewtopic.php?f=43&t=11974#p131029
Re: Future plans for GUI
Solaris 11.2 is taking time, so I am using Ubuntu with perf in the meantime (as I imagine most people will be using this). Note that I didn't do this with the Nvidia driver configured; for some reason it isn't working.
Code: Select all
The game parameter requires one of the following keywords:CAM_1A, CAM_2A, CAM_3A, TUTORIAL3, or FASTPLAY.
root@System-Product-Name:/home/test/war-test# src/warzone2100 --game TUTORIAL3
Unrecognized option: TUTORIAL3
Warzone used 17.0% to 47.3% CPU in the tutorial, depending on messages, before or after any changes. It is difficult to gauge on Ubuntu, as performance is very different compared to FreeBSD (most likely due to differences in packages). I used perf for a 60-second measurement, folded the stacks to single lines, then converted them to a flame graph. (I zoomed out fully prior to execution in both tests.) Graphics settings were on maximum.
Code: Select all
perf record -F 99 -p 16323 -g -- sleep 60
perf script | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl out.perf-folded > perf.svg
Before
pie_MatBegin - 0.71%, Samples taken 16
pie_Draw3DShape - 1.52%, Samples taken 34
renderParticle - 3.98%, Samples taken 89
atmosDrawParticles - 5.94%, Samples taken 133
After change
pie_MatBegin - 0.36%, Samples taken 5
pie_Draw3DShape - 0.66%, Samples taken 9
renderParticle - 2.26%, Samples taken 31
atmosDrawParticles - 3.93%, Samples taken 54
Excellent! It is lower, to the point where atmosDrawParticles after the change (3.93%) is below both atmosDrawParticles and renderParticle before it (5.94% and 3.98%). I am happy. Was this expected? I need to double-check and compare this on other setups. From this, all code paths above pie_Draw3DShape are using about half the CPU time.
What shall we do to improve performance elsewhere? Should this move to a separate thread? It will probably go off topic unless we stick to GUI performance, I guess.
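The sample counts and the percentages above tell slightly different stories, because the two runs did not collect the same number of samples. A small sketch (Python; `implied_total` and `relative_drop` are illustrative helpers) that works only from the figures posted above:

```python
# (samples, percent of all samples) as reported in the post
BEFORE = {
    "pie_MatBegin": (16, 0.71),
    "pie_Draw3DShape": (34, 1.52),
    "renderParticle": (89, 3.98),
    "atmosDrawParticles": (133, 5.94),
}
AFTER = {
    "pie_MatBegin": (5, 0.36),
    "pie_Draw3DShape": (9, 0.66),
    "renderParticle": (31, 2.26),
    "atmosDrawParticles": (54, 3.93),
}


def implied_total(rows):
    """Estimate a run's total sample count from its (samples, percent) pairs."""
    estimates = [samples / (pct / 100.0) for samples, pct in rows.values()]
    return sum(estimates) / len(estimates)


def relative_drop(before, after):
    return 1.0 - after / before


total_before = implied_total(BEFORE)
total_after = implied_total(AFTER)

for name in BEFORE:
    (s0, p0), (s1, p1) = BEFORE[name], AFTER[name]
    print(f"{name}: samples -{relative_drop(s0, s1):.0%}, share -{relative_drop(p0, p1):.0%}")
print(f"implied total samples: {total_before:.0f} -> {total_after:.0f}")
```

The implied totals are roughly 2240 samples before and 1370 after. If both recordings really ran for 60 seconds at the same sampling rate, that would suggest the process spent less time on the CPU overall in the second run, which is why the raw sample counts fall more steeply than the percentage shares. Since overall CPU usage varied between 17.0% and 47.3% across runs, that difference may not be due to the code change alone, so repeating the measurement on other setups seems worthwhile.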