Spellcaster Studios

Make it happen…

Optimizing

Let me tell you a story…

As I said yesterday, I felt a real drop in performance with the pirate lair, and I couldn’t figure out what the issue was.

The normal course of action (if you don’t have a decent profiler or are too lazy to find a free one) is to shut down pieces of code and see what happens.

First I tried the expected culprits: AI and rendering… no big changes there.

Then I started turning off different types of enemies. First all of them (which boosted the frame rate significantly), then I turned them back on, one by one… I was expecting the culprit to be the pirates, but those didn’t affect the smoothness that much. The only difference I could discern was with the laser turrets!

I looked at the turret code and couldn’t find anything, so I started doubting the problem was in the turrets… “It’s my perception of smoothness”, I thought…

So I decided to bite the bullet, and following the old saying “you can only control what you can measure”, I built a simple profiler in my framework… It’s actually a pretty nice one for simple measurements, I’m pretty happy about it.

The way it works is by creating “profiling points”. At the start of each frame, I reset the counters, and then I add to them whatever I’m trying to measure. So I can create an “Enemies” measurement, and when I animate the enemies, it will add the time spent to that counter. Then I can go deeper and deeper, trying to figure out what is wrong, all of that just by enclosing what I want to profile in something like this:

start_profile_block(12,"Items");
auto it=_items.begin();
while (it!=_items.end())
{
    (*it)->anim(t);

    ++it;
}
end_profile_block();

This will create a profile point called “Items”, with ID=12. I can do this without an ID, just by name, but that will impact the performance more, since it requires a lookup. It’s quite a fast one, though: the names are const char*, and I do the no-no of comparing them with ==. That compares pointers rather than string contents, but it works as long as I use literals and not actual strings.
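
To make the idea concrete, here is a minimal sketch of how such a profiler could be put together. Only the names start_profile_block and end_profile_block come from the snippet above; everything else (reset_profile, find_or_create, the fixed table of 64 points, the stack of open blocks) is an assumption made for illustration and not the actual framework code. One caveat on the name lookup: C++ does not guarantee that two identical string literals share the same address, so the pointer comparison is only reliable when the very same pointer is reused.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical minimal profiler. Only start_profile_block/end_profile_block
// are named in the post; the internals below are assumptions.
namespace prof {

typedef std::chrono::steady_clock Clock;

struct Point {
    const char* name;  // expected to point at a string literal
    double      ms;    // time accumulated during the current frame
    int         hits;  // number of blocks closed during the current frame
};

const int kMaxPoints = 64;  // assumed fixed capacity
static std::array<Point, kMaxPoints> g_points = {};

struct Open { int id; Clock::time_point start; };
static std::vector<Open> g_stack;  // currently open blocks (allows nesting)

// Call once at the start of every frame. Names are kept, counters are cleared.
void reset_profile() {
    for (Point& p : g_points) { p.ms = 0.0; p.hits = 0; }
    g_stack.clear();
}

// Fast path: the caller picks the slot, so no lookup is needed.
void start_profile_block(int id, const char* name) {
    assert(id >= 0 && id < kMaxPoints);
    g_points[id].name = name;
    g_stack.push_back({id, Clock::now()});
}

// Name lookup by pointer comparison (not strcmp). New names take the highest
// free slot, to stay away from small hand-picked IDs.
int find_or_create(const char* name) {
    int free_slot = -1;
    for (int i = kMaxPoints - 1; i >= 0; --i) {
        if (g_points[i].name == name) return i;
        if (free_slot < 0 && g_points[i].name == nullptr) free_slot = i;
    }
    assert(free_slot >= 0);
    g_points[free_slot].name = name;
    return free_slot;
}

// Slow path: by name only.
void start_profile_block(const char* name) {
    start_profile_block(find_or_create(name), name);
}

// Closes the most recently opened block and adds its elapsed time.
void end_profile_block() {
    assert(!g_stack.empty());
    Open top = g_stack.back();
    g_stack.pop_back();
    std::chrono::duration<double, std::milli> dt = Clock::now() - top.start;
    g_points[top.id].ms += dt.count();
    g_points[top.id].hits += 1;
}

}  // namespace prof
```

In this sketch a nested block’s time is also counted in its parent, which is what lets you drill down from a point like “Anim” to a point like “Turret”.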

After I ran it, there it was: the turret was indeed responsible for about 24 ms of a 26 ms frame time! So, my perception wasn’t wrong, but it’s still weird… The turret does very little compared to most other stuff in the game. It almost doesn’t have AI, it doesn’t move…

So I added some more points to it, and found out the problem was in animating the turret weapon… The turret has two laser cannons, and animating them was taking almost 24 ms! Looking at the animation code for the laser cannon, the only thing it was doing was getting a pointer to a rendering mesh, writing all the data necessary to render the shots (which were zero at this point, since it wasn’t shooting), and then closing the mesh so it was ready for rendering (with zero vertices!). Profiling deeper, I narrowed it down to closing the mesh…

And there I found the real culprit, a couple of pieces of code that are necessary in some minor circumstances, but mostly should be kept off:

  • When I closed a mesh, I was computing its AABB (axis-aligned bounding box)… But, I hear you say, how long can it take to process zero vertices?! Well, it turns out it wasn’t processing zero vertices, but the number of vertices that were pre-allocated. On the laser cannon (and most of the other weapons), I was pre-allocating a mesh with 2000 vertices (so I don’t have to do resizes), but only sending to the video card the ones that actually had to be rendered (so it was alright from a rendering perspective, no extra weight except the memory spent)… But when computing the AABB, I was scanning all 2000 vertices! And I was not even using the AABB! Multiply these 2000 vertices by two laser cannons per turret, and multiply that by the 5 turrets present in this pirate lair… That’s 20000 useless vertices being traversed to compute a completely invalid AABB! This also happened in all of the other line-rendered weapons, with the difference that I wasn’t using so many pre-allocated vertices (that’s why I didn’t detect this sooner!). I added a flag on closing the mesh that allows me to specify whether the AABB should be computed or not (in a lot of cases I don’t need it: it’s more expensive to check the AABB of a model to see if I should draw it than to send it over to the video card and draw it even if it’s not visible!). That shaved off about 3/4 of the performance hit.
  • I was keeping a copy of the vertex data in RAM when the vertex buffer was dynamic, so I could update it and send it again. For that, when I closed the mesh, if it was dynamic I was allocating a memory buffer and copying the data from the vertex buffer to the RAM buffer. There’s a lot wrong with this. One problem is that I was keeping data that I’m not using: in most cases, I’m rebuilding the vertex buffer from scratch. The other is that I was copying all 2000 vertices, instead of just the ones that are actually “alive”. Removing this copy (or more precisely, having a flag in the Mesh constructor telling me if I should keep the data or not) solved all of my performance issues, not only the ones I noticed, but even in the rest of the game!
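
The two fixes above can be sketched roughly like this. It is only an illustration under assumptions: the Mesh class, its begin/push/close methods and the Vec3/AABB structs are hypothetical stand-ins, not the real engine code. What it shows is the AABB being computed only on request and only over the vertices actually written, and the RAM copy being made only when the constructor flag asks for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; bool valid; };

// Hypothetical stand-in for the engine's mesh class.
class Mesh {
public:
    // capacity: vertices pre-allocated up front (2000 in the post).
    // keep_cpu_copy: whether close() should keep a RAM copy of the data.
    Mesh(std::size_t capacity, bool keep_cpu_copy)
        : vertices_(capacity), used_(0), keep_cpu_copy_(keep_cpu_copy),
          aabb_{{0.0f, 0.0f, 0.0f}, {0.0f, 0.0f, 0.0f}, false} {}

    void begin() { used_ = 0; }

    void push(const Vec3& v) {
        assert(used_ < vertices_.size());
        vertices_[used_++] = v;
    }

    // compute_aabb: skip the bounding box entirely when the caller never reads it.
    void close(bool compute_aabb) {
        aabb_.valid = false;
        if (compute_aabb && used_ > 0) {
            Vec3 lo = vertices_[0], hi = vertices_[0];
            // Walk only the used_ vertices, never the whole pre-allocated buffer.
            for (std::size_t i = 1; i < used_; ++i) {
                const Vec3& v = vertices_[i];
                lo.x = std::min(lo.x, v.x); hi.x = std::max(hi.x, v.x);
                lo.y = std::min(lo.y, v.y); hi.y = std::max(hi.y, v.y);
                lo.z = std::min(lo.z, v.z); hi.z = std::max(hi.z, v.z);
            }
            aabb_ = {lo, hi, true};
        }
        if (keep_cpu_copy_) {
            // Copy only the live vertices, and only when asked to.
            cpu_copy_.assign(vertices_.data(), vertices_.data() + used_);
        }
        // A real engine would upload the first used_ vertices to the GPU here.
    }

    const AABB& aabb() const { return aabb_; }
    std::size_t cpu_copy_size() const { return cpu_copy_.size(); }

private:
    std::vector<Vec3> vertices_;
    std::size_t       used_;
    bool              keep_cpu_copy_;
    AABB              aabb_;
    std::vector<Vec3> cpu_copy_;
};
```

With a setup like this, an idle laser cannon that calls begin() and then close(false) on a mesh built with keep_cpu_copy set to false touches none of its 2000 pre-allocated vertices.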

So, it was an excellent use of 3 hours of work. Below you can see the output of the profiler after I did all the changes:

[Screenshot: screen146, profiler output]

Before, the “Anim” point was at about 34 ms, the “Turret” point was at about 24 ms, the “Turret::Weapon::Finish” point was also at 24 ms.

Fixing this not only dropped down the time for the Turret, it moved the total “Anim” time from 34 ms to only 1 ms… So I had about 33 ms per frame being wasted on unused updates on the AABB and the RAM buffer!

This was present in all of the game, it just became more noticeable on the pirate lair…

The profiler I’ve built was a very useful tool which I will without a doubt use again in the future. It allows me to do something that I normally couldn’t do with my “hack” profilers: I can treat stuff that happens at different times (and is not sequential in code) as if it were just one element. For example, it would be possible to put all the rendering and animation of the enemies in a single point “Enemies”, to have a better view on where I’m spending my time.
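
As a rough illustration of that last idea, the sketch below uses a small scope guard so that two separate pieces of code add their time to the same “Enemies” point. All of it is hypothetical: ScopedProfile, g_frame and the three functions are made-up names, and the table is keyed by std::string for brevity instead of the pointer comparison described earlier.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-frame totals, keyed by profile point name.
struct Totals { double ms; int hits; };
static std::map<std::string, Totals> g_frame;

// Adds the lifetime of the guard to the named point when it goes out of scope.
class ScopedProfile {
public:
    explicit ScopedProfile(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedProfile() {
        std::chrono::duration<double, std::milli> dt =
            std::chrono::steady_clock::now() - start_;
        Totals& t = g_frame[name_];  // created zeroed on first use
        t.ms += dt.count();
        t.hits += 1;
    }

    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Two unrelated places in the frame, both charged to "Enemies".
void animate_enemies() {
    ScopedProfile p("Enemies");
    // ... enemy animation would run here ...
}

void render_scenery() {
    ScopedProfile p("Scenery");
    // ... unrelated work that happens in between ...
}

void render_enemies() {
    ScopedProfile p("Enemies");
    // ... enemy rendering would run here ...
}
```

Calling animate_enemies(), render_scenery() and render_enemies() in one frame leaves two entries in g_frame, with the “Enemies” entry holding the combined time of both enemy functions.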

With some simple work, I can even make this hierarchical (mostly for presentation purposes).

All in all, excellent work day!

Link of the Day: This is an amazing homage to Wing Commander… It’s been in development for some time now, but it strikes a really neat balance between the old-time graphics (even the clunky rotation and low-res texture work, which were probably more work than making it state-of-the-art) and a modern look and feel in what matters (interfaces, for example, or additional special effects). Well worth a play: http://www.wingsofstnazaire.com/
