Remade! There was a difference of 10 times. Now more than 13 times. So it works!
My options for improving code speed.
- moltengear
- Trained
- Posts: 170
- Joined: 22 Jul 2017, 15:05
Re: My options for improving code speed.
Re: My options for improving code speed.
faster ≠ works
Changing slightly, so it could actually compile:
I do actually measure the Q_rsqrt version to be about 13% faster, but the final result doesn't exactly agree (gives 0.00194982 instead of 0.00195312). Even if it was infinitely faster, I don't think it would affect the game performance significantly.
——————————————
Note that without -fno-math-errno, the sqrtf version is about 133% slower. The game only uses errno in one place, so should probably be compiled with -fno-math-errno if that can be removed.
Changing slightly, so it could actually compile:
Code: Select all
// g++ -O3 -march=native -fno-strict-aliasing -DQ -s -o sqrtf sqrtf.cpp -Wall -Wextra -fno-math-errno
#include <cmath>
#include <iostream>
float Q_rsqrt(float number)
{
int32_t i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = *(int32_t *)&y;
i = 0x5f3759df - (i >> 1);
y = *(float *)&i;
y = y * (threehalfs - (x2 * y * y));
return y;
}
int main() {
float a = 0, t = 0;
#ifdef Q
for (int x = 0; x < 1000000000; x++)
{
a = Q_rsqrt(t);
t = t + 0.01;
float z = Q_rsqrt(a + t);
a = z;
}
#else
for (int x = 0; x < 1000000000; x++)
{
a = 1.f/sqrtf(t);
t = t + 0.01f;
float z = 1.f/sqrtf(a + t);
a = z;
}
#endif
std::cout << a << std::endl;
}
——————————————
Note that without -fno-math-errno, the sqrtf version is about 133% slower. The game only uses errno in one place, so should probably be compiled with -fno-math-errno if that can be removed.
- moltengear
- Trained
- Posts: 170
- Joined: 22 Jul 2017, 15:05
Re: My options for improving code speed.
As far as I know, sound support is implemented in the same thread as the entire game code.
I wonder how much CPU time percentage spent on providing sound in the game?
I wonder how much CPU time percentage spent on providing sound in the game?
Re: My options for improving code speed.
Very little. Graphics, pathfinding and AI are the big consumers.moltengear wrote: ↑31 Aug 2019, 15:09 As far as I know, sound support is implemented in the same thread as the entire game code.
I wonder how much CPU time percentage spent on providing sound in the game?
- moltengear
- Trained
- Posts: 170
- Joined: 22 Jul 2017, 15:05
Re: My options for improving code speed.
Yes, shadows are calculated on the processor.
Re: My options for improving code speed.
The fast square root approximation should only be used for graphics calculations where small shading errors don't make a difference. No one cares if the brightness of a pixel is 10% wrong, but calculating distance 10% wrong for pathfinding can mess up a lot of stuff.
The biggest graphical slowdown right now is shadows, since it loops through every visible transformed polygon (on the cpu) and uses it to generate a shadow volume mesh. This process is not easily multithreaded or transfered to the cpu and puts a tight limit on how many polygons can be on screen before the game brings even the most advanced CPU to its knees. The most obvious solution to this is to use something like a cascaded shadow map (or just a normal shadow buffer that somehow follows the camera). This is done on the gpu by simply adding another render pass, which is very cheap on modern gpus.
I think almost all of the slowdowns we have now have to do with unit count. Pathfinding with loads of units maneuvering around each other, AI control, targeting, decision making, projectiles. These all have to loop through all units in range and are tricky problems in any RTS. Some of these things could probably have a dedicated thread, splitting up calculations on multiple cores, but that is serious refactoring which requires serious skill. Another thing to optimise, which might already have been done, is making sure that you only loop through the most relevant units, by organising things in the correct data structures.
One possibility is, for example, to have the map tiles keep track of which units are currently on them, which allows you to quickly get a list of only the units that are in a specific relevant area.
Unfortunately I don't have the skills or time to try this myself, just getting some thoughts out there
The biggest graphical slowdown right now is shadows, since it loops through every visible transformed polygon (on the cpu) and uses it to generate a shadow volume mesh. This process is not easily multithreaded or transfered to the cpu and puts a tight limit on how many polygons can be on screen before the game brings even the most advanced CPU to its knees. The most obvious solution to this is to use something like a cascaded shadow map (or just a normal shadow buffer that somehow follows the camera). This is done on the gpu by simply adding another render pass, which is very cheap on modern gpus.
I think almost all of the slowdowns we have now have to do with unit count. Pathfinding with loads of units maneuvering around each other, AI control, targeting, decision making, projectiles. These all have to loop through all units in range and are tricky problems in any RTS. Some of these things could probably have a dedicated thread, splitting up calculations on multiple cores, but that is serious refactoring which requires serious skill. Another thing to optimise, which might already have been done, is making sure that you only loop through the most relevant units, by organising things in the correct data structures.
One possibility is, for example, to have the map tiles keep track of which units are currently on them, which allows you to quickly get a list of only the units that are in a specific relevant area.
Unfortunately I don't have the skills or time to try this myself, just getting some thoughts out there
-insert deep philosophical statement here-