My options for improving code speed.

For code related discussions and questions
User avatar
moltengear
Trained
Trained
Posts: 170
Joined: 22 Jul 2017, 15:05

Re: My options for improving code speed.

Post by moltengear »

Cyp wrote: 06 Apr 2019, 20:02 Might be fairer to compare to sqrtf than to sqrt. I'd naïevely guess sqrtf to be faster and more accurate than Q_rsqrt.

I'm not sure square root calculations are the main bottleneck in the game.
Remade! There was a difference of 10 times. Now more than 13 times. So it works!
Cyp
Evitcani
Evitcani
Posts: 784
Joined: 17 Jan 2010, 23:35

Re: My options for improving code speed.

Post by Cyp »

faster ≠ works

Changing slightly, so it could actually compile:

Code: Select all

// g++ -O3 -march=native -fno-strict-aliasing -DQ -s -o sqrtf sqrtf.cpp -Wall -Wextra -fno-math-errno

#include <cmath>
#include <iostream>


float Q_rsqrt(float number)
{
	int32_t i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y = number;
	i = *(int32_t *)&y;
	i = 0x5f3759df - (i >> 1);
	y = *(float *)&i;
	y = y * (threehalfs - (x2 * y * y));

	return y;
}

int main() {
	float a = 0, t = 0;
#ifdef Q
	for (int x = 0; x < 1000000000; x++)
	{
		a = Q_rsqrt(t);
		t = t + 0.01;
		float z = Q_rsqrt(a + t);
		a = z;
	}
#else
	for (int x = 0; x < 1000000000; x++)
	{
		a = 1.f/sqrtf(t);
		t = t + 0.01f;
		float z = 1.f/sqrtf(a + t);
		a = z;
	}
#endif
	std::cout << a << std::endl;
}
I do actually measure the Q_rsqrt version to be about 13% faster, but the final result doesn't exactly agree (gives 0.00194982 instead of 0.00195312). Even if it was infinitely faster, I don't think it would affect the game performance significantly.

——————————————

Note that without -fno-math-errno, the sqrtf version is about 133% slower. The game only uses errno in one place, so should probably be compiled with -fno-math-errno if that can be removed.
User avatar
moltengear
Trained
Trained
Posts: 170
Joined: 22 Jul 2017, 15:05

Re: My options for improving code speed.

Post by moltengear »

As far as I know, sound support is implemented in the same thread as the entire game code.
I wonder how much CPU time percentage spent on providing sound in the game?
Per
Warzone 2100 Team Member
Warzone 2100 Team Member
Posts: 3780
Joined: 03 Aug 2006, 19:39

Re: My options for improving code speed.

Post by Per »

moltengear wrote: 31 Aug 2019, 15:09 As far as I know, sound support is implemented in the same thread as the entire game code.
I wonder how much CPU time percentage spent on providing sound in the game?
Very little. Graphics, pathfinding and AI are the big consumers.
User avatar
moltengear
Trained
Trained
Posts: 170
Joined: 22 Jul 2017, 15:05

Re: My options for improving code speed.

Post by moltengear »

Yes, shadows are calculated on the processor.
Jorzi
Regular
Regular
Posts: 2063
Joined: 11 Apr 2010, 00:14

Re: My options for improving code speed.

Post by Jorzi »

The fast square root approximation should only be used for graphics calculations where small shading errors don't make a difference. No one cares if the brightness of a pixel is 10% wrong, but calculating distance 10% wrong for pathfinding can mess up a lot of stuff.

The biggest graphical slowdown right now is shadows, since it loops through every visible transformed polygon (on the cpu) and uses it to generate a shadow volume mesh. This process is not easily multithreaded or transfered to the cpu and puts a tight limit on how many polygons can be on screen before the game brings even the most advanced CPU to its knees. The most obvious solution to this is to use something like a cascaded shadow map (or just a normal shadow buffer that somehow follows the camera). This is done on the gpu by simply adding another render pass, which is very cheap on modern gpus.

I think almost all of the slowdowns we have now have to do with unit count. Pathfinding with loads of units maneuvering around each other, AI control, targeting, decision making, projectiles. These all have to loop through all units in range and are tricky problems in any RTS. Some of these things could probably have a dedicated thread, splitting up calculations on multiple cores, but that is serious refactoring which requires serious skill. Another thing to optimise, which might already have been done, is making sure that you only loop through the most relevant units, by organising things in the correct data structures.
One possibility is, for example, to have the map tiles keep track of which units are currently on them, which allows you to quickly get a list of only the units that are in a specific relevant area.

Unfortunately I don't have the skills or time to try this myself, just getting some thoughts out there
ImageImage
-insert deep philosophical statement here-
Post Reply