Dtrace Vsync

For code related discussions and questions
Post Reply
User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Dtrace Vsync

Post by Tzeentch » 07 Feb 2015, 02:30

It has been said to me (along with evidence, specifically this point viewtopic.php?f=32&t=11978#p130616) that performance issues ( more CPU activity) occurs when Vsync is disabled, as memset() operations are dramatically higher.

Thus soon as I disable Vsync CPU usage climbs all the way up to 100% from the main menu!! O_o

Code: Select all

last pid:  2407;  load averages:  1.38,  0.98,  0.59                                                          up 0+04:36:32  23:37:41
100 processes: 1 running, 98 sleeping, 1 zombie
CPU 0:  8.6% user,  0.0% nice,  8.2% system,  0.0% interrupt, 83.1% idle
CPU 1:  5.9% user,  0.0% nice,  4.3% system,  3.1% interrupt, 86.7% idle
CPU 2: 35.7% user,  0.0% nice, 12.5% system,  0.0% interrupt, 51.8% idle
CPU 3: 24.3% user,  0.0% nice, 12.2% system,  0.4% interrupt, 63.1% idle
Mem: 2399M Active, 3537M Inact, 3075M Wired, 5064K Cache, 6873M Free
ARC: 2270M Total, 1441M MFU, 776M MRU, 18K Anon, 26M Header, 27M Other
Swap: 2048M Total, 2048M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2364 scott         5  20    0   390M 94336K usem    0   5:57 100.00% warzone2100
  794 root          1  23    0  3261M 69444K select  0   5:43   5.96% Xorg
Checking memset entry points with Vsync disabled, while in the main menu (CPU constantly at 100%)

Code: Select all

# dtrace -n 'pid$target::memset:entry { @ = quantize(arg2); }' -p 2364
dtrace: description 'pid$target::memset:entry ' matched 1 probe
^C


           value  ------------- Distribution ------------- count    
               2 |                                         0        
               4 |@@                                       285216   
               8 |@@@@@@@@@@@@@@                           1853903  
              16 |@@                                       228172   
              32 |@@                                       285216   
              64 |@@@@@@@@@                                1184353  
             128 |@@                                       222909   
             256 |@@                                       330111   
             512 |@@@@@                                    688332   
            1024 |@@@                                      350560   
            2048 |                                         0        
            4096 |                                         7118     
            8192 |                                         1188     
           16384 |                                         0 
           
8 being the most common operation, printing this out with user-space stack trace.

Code: Select all

# dtrace -n 'pid$target::memset:entry /arg2 == 8/ { ustack(); }' -p 2364


  2  59411                     memset:entry 
              libc.so.7`memset+0x1
              libfreetype.so.6.11.2`ft_mem_realloc+0x86
              libfreetype.so.6.11.2`FT_CMap_New+0x104
              libfreetype.so.6.11.2`0x80237ab68
              libfreetype.so.6.11.2`0x802376e20
              libfreetype.so.6.11.2`0x8023429b7
              libfreetype.so.6.11.2`0x8023328da
              libfreetype.so.6.11.2`FT_Open_Face+0x254
              libfreetype.so.6.11.2`FT_New_Face+0x2e
              warzone2100`__glcFaceDescOpen+0x6b
              warzone2100`glcFont+0x9a
              warzone2100`displayTextOption(WIDGET*, unsigned int, unsigned int, PIELIGHT*)+0x2a
              warzone2100`0x601718
              warzone2100`0x601703
              warzone2100`0x601703
              warzone2100`widgDisplayScreen(W_SCREEN*)+0x50
              warzone2100`runTitleMenu(void)+0x1b1
              warzone2100`titleLoop(void)+0x148
              warzone2100`mainLoop(void)+0x44
              warzone2100`wzMain3(void)+0x4d

  2  59411                     memset:entry 
              libc.so.7`memset+0x1
              libfreetype.so.6.11.2`ft_mem_realloc+0x86
              libfreetype.so.6.11.2`FT_CMap_New+0x104
              libfreetype.so.6.11.2`0x80237ab68
              libfreetype.so.6.11.2`0x802376e20
              libfreetype.so.6.11.2`0x8023429b7
              libfreetype.so.6.11.2`0x8023328da
              libfreetype.so.6.11.2`FT_Open_Face+0x254
              libfreetype.so.6.11.2`FT_New_Face+0x2e
              warzone2100`__glcFaceDescOpen+0x6b
              warzone2100`glcFont+0x9a
              warzone2100`displayTextOption(WIDGET*, unsigned int, unsigned int, PIELIGHT*)+0x2a
              warzone2100`0x601718
              warzone2100`0x601703
              warzone2100`0x601703
              warzone2100`widgDisplayScreen(W_SCREEN*)+0x50
              warzone2100`runTitleMenu(void)+0x1b1
              warzone2100`titleLoop(void)+0x148
              warzone2100`mainLoop(void)+0x44
              warzone2100`wzMain3(void)+0x4d

  2  59411                     memset:entry 
              libc.so.7`memset+0x1
              libfreetype.so.6.11.2`ft_mem_realloc+0x86
              libfreetype.so.6.11.2`FT_CMap_New+0x104
              libfreetype.so.6.11.2`0x80237ab68
              libfreetype.so.6.11.2`0x802376e20
              libfreetype.so.6.11.2`0x8023429b7
              libfreetype.so.6.11.2`0x8023328da
              libfreetype.so.6.11.2`FT_Open_Face+0x254
              libfreetype.so.6.11.2`FT_New_Face+0x2e
              warzone2100`__glcFaceDescOpen+0x6b
              warzone2100`glcFont+0x9a
              warzone2100`displayTextOption(WIDGET*, unsigned int, unsigned int, PIELIGHT*)+0x2a
              warzone2100`0x601718
              warzone2100`0x601703
              warzone2100`0x601703
              warzone2100`widgDisplayScreen(W_SCREEN*)+0x50
              warzone2100`runTitleMenu(void)+0x1b1
              warzone2100`titleLoop(void)+0x148
              warzone2100`mainLoop(void)+0x44
              warzone2100`wzMain3(void)+0x4d
              
And due to system % in top output then checked time spent in Syscalls from main menu

Code: Select all

# /usr/share/dtrace/toolkit/procsystime -n warzone2100        
              
  Elapsed Times for processes warzone2100,

         SYSCALL          TIME (ns)
           write             318103
          sendto             861108
            read            1178832
          getpid            9493368
          select           12269129
         recvmsg           13732521
           fcntl           20031746
     sched_yield           29221444
           fstat           66409512
           close           70381193
     sigprocmask           83982928
            mmap          204389463
            open          302234442
          munmap          376817672
            poll         5070627482
        _umtx_op         5241236355
       nanosleep         5658388868
Decided to try a skirmish game vs 3 Nexus AI's , Vsync still disabled and quite different memset operations but CPU remains very high, (100-110%):

Code: Select all

# dtrace -n 'pid$target::memset:entry { @ = quantize(arg2); }' -p 2471
dtrace: description 'pid$target::memset:entry ' matched 1 probe
^C


           value  ------------- Distribution ------------- count    
               0 |                                         0        
               1 |@                                        39955    
               2 |@                                        33772    
               4 |@@@                                      98280    
               8 |                                         5824     
              16 |@                                        34940    
              32 |                                         8        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@                878290   
             128 |@@@                                      121742   
             256 |@@@                                      100907   
             512 |                                         67       
            1024 |@@@                                      112253   
            2048 |                                         12       
            4096 |                                         5446     
            8192 |                                         1085     
           16384 |                                         40       
           32768 |                                         0        
           65536 |                                         1243     
          131072 |                                         0        

Again user-space stack trace for most common operations.

Code: Select all

# dtrace -n 'pid$target::memset:entry /arg2 == 64/ { ustack(); }' -p 2471
dtrace: description 'pid$target::memset:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  59411                     memset:entry 
              libc.so.7`memset+0x1
              libc.so.7`__calloc+0xf52
              libpulse.so.0.12.4`pa_xmalloc0+0x25
              libpulse.so.0.12.4`0x80f41ea26
              libpulsecommon-0.9.23.so`pa_pdispatch_register_reply+0xc7
              libpulse.so.0.12.4`pa_stream_update_timing_info+0x153
              libpulse.so.0.12.4`0x80f424b92
              libpulse.so.0.12.4`0x80f42b96b
              libpulse.so.0.12.4`pa_mainloop_dispatch+0x12b
              libpulse.so.0.12.4`pa_mainloop_iterate+0x3a
              libpulse.so.0.12.4`pa_mainloop_run+0x20
              libpulse.so.0.12.4`0x80f42c24c
              libpulsecommon-0.9.23.so`0x80f67a477
              libthr.so.3`0x8028034f5

  2  59411                     memset:entry 
              libc.so.7`memset+0x1
              libc.so.7`__calloc+0xf52
              libpulse.so.0.12.4`pa_xmalloc0+0x25
              libpulse.so.0.12.4`0x80f41ea26
              libpulsecommon-0.9.23.so`pa_pdispatch_register_reply+0xc7
              libpulse.so.0.12.4`pa_stream_update_timing_info+0x153
              libpulse.so.0.12.4`0x80f424b92
              libpulse.so.0.12.4`0x80f42b96b
              libpulse.so.0.12.4`pa_mainloop_dispatch+0x12b
              libpulse.so.0.12.4`pa_mainloop_iterate+0x3a
              libpulse.so.0.12.4`pa_mainloop_run+0x20
              libpulse.so.0.12.4`0x80f42c24c
              libpulsecommon-0.9.23.so`0x80f67a477
              libthr.so.3`0x8028034f5
            
Syscalls (nano timestamps) during vs 3 Nexus AI's.

Code: Select all

# /usr/share/dtrace/toolkit/procsystime -n warzone2100
Tracing... Hit Ctrl-C to end...
^C

Elapsed Times for processes warzone2100,

         SYSCALL          TIME (ns)
     sigprocmask               1064
          writev               7905
           write             639880
          sendto            1667371
            read            2461483
          getpid           10656228
          select           11675878
         recvmsg           13243572
     sched_yield           97018093
            poll        10328725562
       nanosleep        10635305240
        _umtx_op        33636475465
This to me shows that Noq's right as memsets are much more frequent, and CPU becomes the bottleneck as an effect. It would be useful to view the amount of frames my machine is attempting to display. Being as I've got a high end graphics card I wonder if this is displaying very high frames and causing the CPU to not be able to keep up with the amount of frequent calls being made. Their wouldn't be much point trying to produce more frames than 60fps as this is the most the human eye is able to view and many monitors only support 60hz ( refresh screen 60 times a second ) so maybe a limit to max of 60 either way :hmm:

Also obtained a Flame Graph but having issue attaching to tickets :(
More or less grey

User avatar
Powzone
Trained
Trained
Posts: 332
Joined: 17 Feb 2008, 17:25
Location: Germany

Re: Dtrace Vsync

Post by Powzone » 12 Feb 2015, 21:57

Tzeentch wrote:It would be useful to view the amount of frames my machine is attempting to display.
You can enable the FPS counter in the config file. Just change "showFPS=0" to "showFPS=1". Though me being on a GTX750 turning off VSync only gives me around 360-370fps on a quick test starting a new campaign and doing nothing. 100% CPU Usage on one core also happpens to me. Seems legit.
PC: Manjaro 17 XFCE / Intel Core i7 6700 / 32GB RAM / NVidia GTX1050Ti
NB: Manjaro 17 XFCE / Atom 1,6GHz / 2GB RAM / Intel 945 Express

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Vsync

Post by Tzeentch » 13 Feb 2015, 10:26

thanks, I just checked and I do almost 1,000 fps. (981-989fps). (with all graphics on max) So going back to my end comment why bother displaying more frames than 60? Makes this option seem pointless if we are able to retain 60fps without high CPU. Think a vote is needed regarding this option as I see no purpose for it at present
More or less grey

wuz21m
Trained
Trained
Posts: 36
Joined: 18 Dec 2014, 20:57

Re: Dtrace Vsync

Post by wuz21m » 13 Feb 2015, 11:26

The idea about vSync is that you don't need to re-render something faster than your monitor can draw (60Hz). Thus we preserve CPU and GPU. Otherwise, you will be drawing as fast as your computer can until you saturate CPU or GPU.

Disabling vSync only works for specific scenarios.

P.S.: About memset, I don't think it is really important for performance (unless DTrace reports a significant CPU share for that function). I think malloc/calloc are much more important suspects.

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Vsync

Post by Tzeentch » 21 Feb 2015, 10:10

Fair enough, I'm just wondering what specific scenario it would be useful for.

CPU is defo the first bottleneck, then again I have a high end GPU so perhaps a lower spec model would cause that to bottleneck first.

I was informed to trace memsets as opposed to mallocs, in future I can just check both from now on.
More or less grey

stiv
Warzone 2100 Team Member
Warzone 2100 Team Member
Posts: 876
Joined: 18 Jul 2008, 04:41
Location: 45N 86W

Re: Dtrace Vsync

Post by stiv » 18 Mar 2015, 04:51

Memset is not an expensive operation. Malloc, on the other hand, is. It comes in at #2 right after disk I/O.

Per
Warzone 2100 Team Member
Warzone 2100 Team Member
Posts: 3755
Joined: 03 Aug 2006, 19:39

Re: Dtrace Vsync

Post by Per » 18 Mar 2015, 10:24

stiv wrote:Memset is not an expensive operation. Malloc, on the other hand, is. It comes in at #2 right after disk I/O.
According to what now?

Malloc on linux is quite cheap. Memset on a large enough chunk of memory can be really slow (which is why you always want to use calloc rather than malloc + memset).

User avatar
NoQ
Special
Special
Posts: 6197
Joined: 24 Dec 2009, 11:35
Location: /var/zone

Re: Dtrace Vsync

Post by NoQ » 18 Mar 2015, 19:10

Cloning a dinosaur might be an expensive operation, but it doesn't mean that optimizing this operation is more important than optimizing the rendering engine, unless we plan to mass-produce them to fund our web server...

User avatar
Tzeentch
Trained
Trained
Posts: 242
Joined: 14 Oct 2012, 14:24

Re: Dtrace Vsync

Post by Tzeentch » 19 Mar 2015, 02:19

Can we account for malloc across all separate OS's as well? I imagine so.

What usually is expensive is not just how much you request but more so how often you make those calls. It is the bottleneck of application -> library -> system call interface and back up to acknowledge the request for memory and the subsequent allocations completion. (Warzone only uses small amounts of memory but does request fairly often) it's more efficient to do these in batch operations where possible. Perhaps I'm going off topic here, sounds like we have bigger issues so my main Q is what should I be checking for or investing time to look for potential improvements? Nullbots higher CPU use, slowest application functions or something totally different. Some of this is probably already known. Guess I just need some general guidance
More or less grey

User avatar
vexed
Inactive
Inactive
Posts: 2471
Joined: 27 Jul 2010, 02:07

Re: Dtrace Vsync

Post by vexed » 19 Mar 2015, 04:24

Per wrote:
stiv wrote:Memset is not an expensive operation. Malloc, on the other hand, is. It comes in at #2 right after disk I/O.
According to what now?

Malloc on linux is quite cheap. Memset on a large enough chunk of memory can be really slow (which is why you always want to use calloc rather than malloc + memset).
We are talking about WZ are we not, and we are not doing memsets on very large chunks of memory to make any impact on things vs calloc here. Most of WZ's allocations are small while the game is running.
This is getting OT though, since this thread is about Vsync, and the only thing that matters here is how fast your machine is between frames, since now you make the GPU wait for the refresh signals if it is on.

And for what it is worth, most of those above are from the drivers, and not WZ directly.
/facepalm ...Grinch stole Warzone🙈🙉🙊 contra principia negantem non est disputandum
Super busy, don't expect a timely reply back.

Post Reply