small and easy speed improvements

Discuss the future of Warzone 2100 with us.
User avatar
kage
Regular
Regular
Posts: 751
Joined: 05 Dec 2006, 21:45

small and easy speed improvements

Post by kage » 09 Dec 2006, 12:49

as i'm going through the code, i'm finding a whole lot of stuff that is written in the traditional "declare variables at top of function and do everything else afterwards" style -- while that has it's place and does help make functions somewhat more understandable, i've found a lot stuff like the following code

Code: Select all

BOOL audio_Update( void )
{
	//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	SDWVEC3D		vecPlayer;
	SDWORD			iA;
	AUDIO_SAMPLE	*psSample, *psSampleTemp;
	//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	// if audio not enabled return TRUE to carry on game without audio
	if ( g_bAudioEnabled == FALSE )
	{
		return TRUE;
	}

	audio_UpdateQueue();

        ////// rest of function follows ...
as you can see, variables are declared (making the processor allocate memory for them every time the function is run even though the function may immediately return), and after that, there is a conditional check as to whether the function should continue running... i could prove a point by showing you the assembly code those 3 lines generate (if i could get my source to compile -- i haven't really tried yet since i can't effectively play warzone on any available comp). granted, this particular function is usually not going to return right away, since most players would presumably be using sound, but it is one function among many that works like this.

for functions that have a 50/50 or greater chance of returning right away, we're looking at several wasted instructions every single tick, which is currently, iirc, 1000 times per second. while fixing all of these functions to do early checks before variable declaration probably wont give the majority of us even 1 extra frame per second, it takes about 3 seconds to fix each function if you come across it, and if none of you are inclined, i'd be happy to go and fix these as i see them, as they can only (marginally) help, and can't really cause any bugs (unless your compiler is buggy).

Kamaze
Regular
Regular
Posts: 1017
Joined: 30 Jul 2006, 15:23

Re: small and easy speed improvements

Post by Kamaze » 09 Dec 2006, 14:40

Dosn't the compiler optimize such things? (Yes, this is anyway bad code...)
We all have the same heaven, but not the same horizon.

User avatar
Watermelon
Code contributor
Code contributor
Posts: 551
Joined: 08 Oct 2006, 09:37

Re: small and easy speed improvements

Post by Watermelon » 09 Dec 2006, 15:26

kage wrote: as i'm going through the code, i'm finding a whole lot of stuff that is written in the traditional "declare variables at top of function and do everything else afterwards" style -- while that has it's place and does help make functions somewhat more understandable, i've found a lot stuff like the following code

Code: Select all

BOOL audio_Update( void )
{
	//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	SDWVEC3D		vecPlayer;
	SDWORD			iA;
	AUDIO_SAMPLE	*psSample, *psSampleTemp;
	//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	// if audio not enabled return TRUE to carry on game without audio
	if ( g_bAudioEnabled == FALSE )
	{
		return TRUE;
	}

	audio_UpdateQueue();

        ////// rest of function follows ...
as you can see, variables are declared (making the processor allocate memory for them every time the function is run even though the function may immediately return), and after that, there is a conditional check as to whether the function should continue running... i could prove a point by showing you the assembly code those 3 lines generate (if i could get my source to compile -- i haven't really tried yet since i can't effectively play warzone on any available comp). granted, this particular function is usually not going to return right away, since most players would presumably be using sound, but it is one function among many that works like this.

for functions that have a 50/50 or greater chance of returning right away, we're looking at several wasted instructions every single tick, which is currently, iirc, 1000 times per second. while fixing all of these functions to do early checks before variable declaration probably wont give the majority of us even 1 extra frame per second, it takes about 3 seconds to fix each function if you come across it, and if none of you are inclined, i'd be happy to go and fix these as i see them, as they can only (marginally) help, and can't really cause any bugs (unless your compiler is buggy).
yes almost all functions in the source files I touched use this kind of variable declaration style,probably it will hog some cpu time if the compiler cannot optimize them efficiently(tons of mov to register),esp the ones get executed every frame/tick.Also I think 1000 ticks per a second is probably too much,think modern games run at 100? ticks/server frame per second at max...

delaying variable declaration/reusing variable are always 'good' optimization techniques imo.
tasks postponed until the trunk is relatively stable again.

User avatar
DevUrandom
Regular
Regular
Posts: 1690
Joined: 31 Jul 2006, 23:14

Re: small and easy speed improvements

Post by DevUrandom » 09 Dec 2006, 17:22

Yes, and if you reuse your little "int i" often enough, you can be sure that no one understands your code anymore...

User avatar
kage
Regular
Regular
Posts: 751
Joined: 05 Dec 2006, 21:45

Re: small and easy speed improvements

Post by kage » 10 Dec 2006, 02:13

Kamaze wrote: Dosn't the compiler optimize such things? (Yes, this is anyway bad code...)
not everyone has a good compiler. but, let's say we only have to deal with gcc, which should do that, function inlining, and a whole lot of other fun stuff: it's still no excuse for bad code, since it is only a program -- it will do things for you if you know what you're doing, but you can't rely to be able to read your mind, or be able to fix bad but nonetheless valid code (that's why humans are still in the mix).
Watermelon wrote: yes almost all functions in the source files I touched use this kind of variable declaration style,probably it will hog some cpu time if the compiler cannot optimize them efficiently(tons of mov to register),esp the ones get executed every frame/tick.
it makes me cry too.
Watermelon wrote: Also I think 1000 ticks per a second is probably too much,think modern games run at 100? ticks/server frame per second at max...
i agree with you there: hopping it down to 100 would free up a whole lot more cpu cycles (an order of magnitude, to be exact), though i really hate to use this to support one of my other points: we can't get away with reducing the tick count to 100 if the collision detection system uses the "have you hit me now?" tick-by-tick approach -- that's probably the only reason there are 1000 ticks per second. if you make the time interval 10 times smaller, then projectiles have to move 10 times further each tick, and at that, with projectiles at their current speed, you'd find that heat shells may well have a 50/50 chance of appearing on the other side of an enemy tank the next tick, bypassing it entirely.  the only way to reclaim these ticks is by completely removing any and all forms of collision detection that rely on tick-by-tick checking to see if one moving object currently exists in the same location as another -- there is literally no other way to do it.

if we replace the current collision detection checks with smart trigonometric checks, such as raycasting (as devurandom mentioned in another post) and an event-queue system for almost all in-game actions, you could actually reduce the tick frequency to 1 tick per second, and everything would occur exactly the same, in regards to collision detection as if you were running at 10 million ticks per second (a collision detection system which never makes mistakes, such as space-warping past an object it should've hit, is pretty nice).

and with any "look ahead" collision detection system (such as ray-casting), in almost all cases, doing collision detection every tick (or every other tick, or at any repeating interval) is a complete waste of time, and really shouldn't be done, since look ahead systems can "look ahead" for an arbitrary number of ticks, and can be used to figure out the first possible moment that any given object could collide with anything else -- only at that "first possible moment" would you check the future collision possibilities for that moving object again.
Watermelon wrote: delaying variable declaration/reusing variable are always 'good' optimization techniques imo.
DevUrandom wrote: Yes, and if you reuse your little "int i" often enough, you can be sure that no one understands your code anymore...
i just thought of something, but i don't think i've ever seen it used before (either because no one's done it, or it's a really really bad idea and no one wants to do it):

take advantage of the c pre-processor... that way both of you two are right and a compromise is made (only one actual variable is used, but it is given distinct names for each use):

as an example, take the following code example (has no relation to anything):

Code: Select all

void dosomething( void ) {
  int num_objects;
  num_objects = count_objects();

  /* code here to use num_objects */
  /* as of now, we will have no further use for num_objects
     we could collect it from memory (i forget how to do this
     in c, but i think it's del in c++) or just let it sit until it is
     dereferenced when the function is returned... for emphasis
     on clean, efficient code, i'll del it now */

  del num_objects;
  /* deleting num_objects requires a few processor instructions,
      so while removing it frees up some memory, you don't get
      to magically del things for free. note if it wasn't done now,
      the same thing would've been done when the function
      returns */

  int last_array_index = -1;
  int arr[1000];

  while ( last_array_index < 23 ) {
    arr[ ++last_array_index ] = last_array_index * 2;
  }
  
  for ( int i = last_array_index; i >= 0; i-- )
    printf ( "position: %d, value: %d", i, arr[ i ] );
}
in the above traditional approach, all variables are clearly distinct, but you have 2 int variables that are used at seperate times (no overlap of use), so the variable num_objects could be used again instead of creating a new distinct variable, but this, as dev pointed out, would mislead the reader, and reclaiming resources from num_objects takes processor time no matter what (either if done explicitly when it is no longer needed, which is preferred, or automatically when the function returns). from the standpoint of code readability, there must be two seperate variables, but from the standpoint of code performance, there must be only one, which brings us to the next code piece:

Code: Select all

void dosomething( void ) {
  int num_objects;
  num_objects = count_objects();

  /* code here to use num_objects */
  /* as of now, we will have no further use for "num_objects" */

  #define last_array_index    num_objects
  /* we're making all instances of the "last_array_index" actually
      point to the "num_objects" variable, so that the now unused
      variable can be usefully reused under a different name */
  
  last_array_index = -1;
  int arr[1000];

  while ( last_array_index < 23 ) {
    arr[ ++last_array_index ] = last_array_index * 2;
  }

  #undef last_array_index
  /* virtual garbage collection: we're no longer using this alias to the
     num_objects variable, and we don't want it to conflict with other
     uses of it throughout the source code: this is absolutely mandatory
     and at least at the end of every function, there should be one of
     these undef statements for each one created. */
  
  for ( int i = last_array_index; i >= 0; i-- )
    printf ( "position: %d, value: %d", i, arr[ i ] );
}
as you can see above, as far as the code is concerned after the preprocessor runs through, all instances of last_array_index will be turned into num_object, and will use the exact same variable in memory, but is distinct and meaningful in the code, and except for the #define statement (the virtual variable declaration), it looks exactly like normal c code. the only downside, which is a big downside, is that you need to #undef each "variable alias" at the bottom of the code block in which it is used (usually a function), or you'll get a lot of "undeclared variable num_objects" style errors, but, if you can remember your undefs (just create an #undef at the bottom of the function as soon as you finish typing the corresponding #define, then this shouldn't be a problem, and you'll have very lean code that is no less readable than it was before.

so given the above, you can just reuse no-longer-used variables until you run out of old ones of that type, in which case you create a new one, and then link aliases to that when it is no longer used.

as an example of the differences between this approach and a modern approach:

if your code is organized so that any given variable is not used anywhere it's not needed, and is fairly isolated, you might be able to have a function using the same 3 ints throughout, when before it would've needed 12 or 14 to keep it readable, with all the extra memory use and dereferencing that goes with managing those.

also, on a side note, you really should only be using single-letter variable names, such as i or j for loop iterations, so reusing "i" in anything but the context of a loop iteration is about as confusing as you can get, but when you reuse it an unrelated loop, it's pretty well understood that there's no connection between this loop and the last one (assuming you reset i to 0, or whatever you need)

User avatar
Watermelon
Code contributor
Code contributor
Posts: 551
Joined: 08 Oct 2006, 09:37

Re: small and easy speed improvements

Post by Watermelon » 10 Dec 2006, 10:12

reusing 'i' wont make the source hard to read as long as the quantity variable to compare with 'i' is descriptive,and 'i' is a magic character which stands for both integer and iterator,'j' is only used when a function has nested loops...

raycast is probably an overkill in this case,we dont need that precise per pixel collision check in a rts game.world update is optimal for a server/client game,not very usefuly for wz 'sync land' sytle,raycast would probably become a havoc in wz sync multiplayer games.Also wz has its own raycast implementation,no need to rely on external engines...
tasks postponed until the trunk is relatively stable again.

User avatar
DevUrandom
Regular
Regular
Posts: 1690
Joined: 31 Jul 2006, 23:14

Re: small and easy speed improvements

Post by DevUrandom » 10 Dec 2006, 12:44

Raycasting doesn't necessarily mean to be pixel precise. Just a check if a ray hits an object and if yes when.

Also I can bet that there are exactly 60 or less ticks in each second. (And I will win this bet, unless you modify the sourcecode. ;) )
We limit the framerate to 60 fps (it would eat 100% CPU otherwise and doesn't bring much benefit graphics-wise). But as WZ is not threaded, this also means that all other parts of the game run at 60 fps, too.

I think writing something like the following would be optimized good by the compiler and wouldn't need any nasty macro definition.

Code: Select all

void function(void)
{
  {
    int var1;
    /* Do something */
  }
  {
    int var2;
    /* Do another thing */
  }
}
This way the compiler still can't read my mind, but I told it what I think.

User avatar
kage
Regular
Regular
Posts: 751
Joined: 05 Dec 2006, 21:45

Re: small and easy speed improvements

Post by kage » 10 Dec 2006, 13:19

DevUrandom wrote: Raycasting doesn't necessarily mean to be pixel precise. Just a check if a ray hits an object and if yes when.
yes indeed. in all games i know of that use pixel-perfect (well, poly-perfect) ray casting for final collision resolution, 95% of raycasting used is "test ray against bounding box", and then if, and only if, there is bounding box intersection will anything more accurate occur, and often enough engines might determine sub-bounding boxes for convex shapes, to save even more processing. using "time-stretched bounding box vs bounding box intersection" detection is a form of raycasting that is far more accurate and at the same time less resource intensive than "discrete 1-dimensional vector vs bounding box" detection. the math is quite sound, and any game nowadays that's designed from the ground up is done insanely if it doesn't do a form of raycasting collision (or something more advanced).
DevUrandom wrote: We limit the framerate to 60 fps (it would eat 100% CPU otherwise and doesn't bring much benefit graphics-wise). But as WZ is not threaded, this also means that all other parts of the game run at 60 fps, too.
i found, in /lib/gamelib/gtime.h this very interesting piece of code, which i got from svn 2 days ago, and it has not been modified since.

Code: Select all

/* The number of ticks per second for the game clock */
#ifdef WIN32
#define GAME_TICKS_PER_SEC              1000
#else
#define GAME_TICKS_PER_SEC              1000
DevUrandom wrote: Also I can bet that there are exactly 60 or less ticks in each second. (And I will win this bet, unless you modify the sourcecode. ;) )
i thank you for your contribution: how shall i collect my winnings?

of interest as well, in scripttabs.h, you'll find

Code: Select all

// How many game ticks for one event tick
#define SCR_TICKRATE    100
which seems to indicate that scripts run 10 times per second, based on the above.
DevUrandom wrote: I think writing something like the following would be optimized good by the compiler and wouldn't need any nasty macro definition.

Code: Select all

void function(void)
{
  {
    int var1;
    /* Do something */
  }
  {
    int var2;
    /* Do another thing */
  }
}
This way the compiler still can't read my mind, but I told it what I think.
i don't know if msvc does this, but yes, i do know remember that, unless i'm way off, gcc does some nice variable reuse checks, and should reorder that function so that var1 is declared above the first block, and var2 is really the same thing as var1, though as soon as the second code block ends, var2 (which is really to be var1) is implicitly dealloc'd (which is a good thing).

that's really much much better than what i was thinking.

Edit: Fix

Code: Select all

 block
Last edited by DevUrandom on 10 Dec 2006, 16:20, edited 1 time in total.

karmazilla
Trained
Trained
Posts: 84
Joined: 26 Aug 2006, 21:05

Re: small and easy speed improvements

Post by karmazilla » 10 Dec 2006, 13:44

#define last_array_index    num_objects
Using a meta-language to micro-optimize away sizeof(int) bits of memory in a stack frame. I'de call this an ugly hack; bad style unless you're doing embedded programming in a super-constrained invironment.

There are three levels of performance optimizations:
  • Algorithm:
    Measuring: Big-O notation. Possible gain: several orders of magnitude
  • Design:
    Measuring: Seconds or Milliseconds. Possible gain: up to an order of magnitude
  • Implementation:
    Measuring: Milli-, Micro- or Nanoseconds. Possible gain: only noticable in very performance critical code paths (libc, process schedulars and file systems comes to mind)
Many of the optimizations proposed here are of the implementation class, and while I agree that we should think about performance when writing code, it usually dosn't pay to priotize implementation optimizations over code readability and maintainability.
And personally, I'de spend very little time refactoring existing code to optimize the implementation - it could easily turn out to be a waste of time because
a) the compiler is specifically good at this class of optimizations, and
b) the implementation might be thrown in the bin when the design or the algorithm is refactored.

On the other hand, time used to refactor algorithms or design may very well turn out to be well spend, as the gain is so much higher, and the compiler don't know how to do this for you.

User avatar
kage
Regular
Regular
Posts: 751
Joined: 05 Dec 2006, 21:45

Re: small and easy speed improvements

Post by kage » 10 Dec 2006, 14:04

yeah, karma's right... no sense wasting 5 minutes of programmer time to save 2 nanoseconds each tick. in larger projects, i've often heard it's much more efficient to spend time finding out where the biggest problems are, prioritize them, and work on those specifically, instead of fixing every little thing you find along the way (though it doesn't hurt to fix the small things if you want to).

User avatar
DevUrandom
Regular
Regular
Posts: 1690
Joined: 31 Jul 2006, 23:14

Re: small and easy speed improvements

Post by DevUrandom » 10 Dec 2006, 16:29

kage wrote: i found, in /lib/gamelib/gtime.h this very interesting piece of code, which i got from svn 2 days ago, and it has not been modified since.

Code: Select all

/* The number of ticks per second for the game clock */
#ifdef WIN32
#define GAME_TICKS_PER_SEC              1000
#else
#define GAME_TICKS_PER_SEC              1000
i thank you for your contribution: how shall i collect my winnings?
You lost! :)
You didn't look at http://svn.gna.org/viewcvs/warzone/trun ... iew=markup
Especially you didn't look at frameInitialise():282

Code: Select all

SDL_setFramerate( &wzFPSmanager, 60 );
And frameUpdate():355

Code: Select all

SDL_framerateDelay( &wzFPSmanager );
And I have to agree with karma: Those optimizations are probably not worth the work...
Those implementation details can't be the reason that WZ runs only at ~60fps, even without limiting the framerate...

karmazilla
Trained
Trained
Posts: 84
Joined: 26 Aug 2006, 21:05

Re: small and easy speed improvements

Post by karmazilla » 10 Dec 2006, 17:21

DevUrandom wrote: Especially you didn't look at frameInitialise():282

Code: Select all

SDL_setFramerate( &wzFPSmanager, 60 );
Ohh....!

Can you make a command line option to set the framerate?

Why is there both an SDL_setFramerate and SDL_setFrameDelay? Which one's the boss if you want to set a max framerate?

Update:
Ok, I tried to

Code: Select all

SDL_setFramerate( &wzFPSmanager, 30 );
and it lowered my CPU usage from 99% to ca. 80%....  :-\ I had hoped for more; that's not enough for laptop users.
Last edited by karmazilla on 10 Dec 2006, 17:40, edited 1 time in total.

User avatar
DevUrandom
Regular
Regular
Posts: 1690
Joined: 31 Jul 2006, 23:14

Re: small and easy speed improvements

Post by DevUrandom » 10 Dec 2006, 22:09

karmazilla wrote: Can you make a command line option to set the framerate?
Sure, possible. Will have a look whenever I have time.
(If you want to make sure I look at it, you should add it to http://wz2100.net/wiki/dev:future_ideas, which is kind of tode/wishlist.)
Why is there both an SDL_setFramerate and SDL_setFrameDelay? Which one's the boss if you want to set a max framerate?
Read thoroughly please...
setFramerate sets the desired framerate and framerateDelay delays the current frame so that it matches the desired framerate.
For more docs look here: http://svn.gna.org/viewcvs/warzone/trun ... iew=markup
Update:
Ok, I tried to

Code: Select all

SDL_setFramerate( &wzFPSmanager, 30 );
and it lowered my CPU usage from 99% to ca. 80%....  :-\ I had hoped for more; that's not enough for laptop users.
Yes, that's what I experienced, too:
DevUrandom wrote: Those implementation details can't be the reason that WZ runs only at ~60fps, even without limiting the framerate...
So even if I remove the framerateDelay, it still runs at about 60fps here.
When I introduced that framerate control it was set to 30, but even that didn't help much, as you said yourself.
That's why I think there must be something seriously wrong. One of those parts is the graphics engine. (I forgot the exact functions, but I think I still have some callgrind log around, if someone needs it.)

User avatar
kage
Regular
Regular
Posts: 751
Joined: 05 Dec 2006, 21:45

Re: small and easy speed improvements

Post by kage » 11 Dec 2006, 05:02

DevUrandom wrote: Also I can bet that there are exactly 60 or less ticks in each second. (And I will win this bet, unless you modify the sourcecode. ;) )
you said "60 or less ticks in each second, but the code you posted specifically dealt with framerate. now, while i'm sure you've still won, since i'm pretty sure warzone doesn't have the graphics and main loop running in seperate threads, i think this emphasizes that the main game loop needs to have a seperate thread than the graphics stuff.

and even before any implementation of sdl, the there would've been 1000 ticks per second (unless this was a wzres change), since most videocard drivers, afaik, force it into a multithreaded approach, since they'd have just ignored new graphics updates until it was ready for a new one.

in either way, this game is synchronus, which sucks -- low framerate equals high ping, whereas in an asynchronus game (seperate threads for graphics, audio, and game logic), you could have a 4 mb video card trying to do 16x FSAA, but it'd still have fairly decent ping despite displaying at about 1 frame every 4 seconds. going asynchronus also is much better in making sure that network buffers don't overflow because they haven't been processed fast enough.

User avatar
Watermelon
Code contributor
Code contributor
Posts: 551
Joined: 08 Oct 2006, 09:37

Re: small and easy speed improvements

Post by Watermelon » 11 Dec 2006, 08:12

I am confused by this too,iirc someone told me that the SDL frame delay is only used when you open up a menu/pause the game to prevent the SDL from hogging the cpu time.

If the 60 frames/sec is true,there must be something horribly wrong within the code,the original 1.1x wz runs alot faster(cpu-wise,not graphics-wise,since the renderer has changed alot) with 1000 ticks/sec...
tasks postponed until the trunk is relatively stable again.

Post Reply