This weekend I worked on a major optimization; I was supposed to work on this after the first milestone was done, but I had serious enough performance issues to warrant the work now instead…
The problem is related to the number of objects. Most objects in Spellbook are composed of loads of triangles, but on today’s video cards (and even on not-so-modern ones), it takes more time to set up the geometry, textures, shaders, etc., than to actually draw the objects!
To stress test the system, I created a scene in the editor with a lot of trees (a kind of forest):
All of these are independent objects, each with its own position, orientation and scale. There are about 200 trees in the scene, and with just one directional light with PCF shadows, it ran at about 40 FPS, which is completely unacceptable for this number of objects…
Long story short, I fixed it, and the scene now runs at 60 FPS with 20 point lights or so (images after the break).
Below is a technical explanation of how we did this:
There are several problems at work here:
200 objects that get drawn at least 2 times each: once for the G-Buffer (for deferred rendering), and once for each light’s shadow map (twice for point lights);
Texture switching: although in the above screenshot the textures are the same, internally they are different textures with the same bitmap (so I could stress the system some more);
The rendering pipeline of Spellbook is quite complex and versatile, and that takes its toll on performance: there’s a hefty price to pay for each object up front (this will get better in the future, but for now I rather like the flexibility).
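To put rough numbers on the first point, here is a back-of-the-envelope sketch of how draw calls scale with objects and lights. It is not Spellbook code: the `draw_calls` helper and the `SHADOW_PASSES` table are made up for illustration, and it assumes no culling and that every light casts shadows on every object.

```python
# Shadow passes per object for each light type (hypothetical table, based on
# the list above: point lights render their shadow map twice).
SHADOW_PASSES = {"directional": 1, "point": 2}

def draw_calls(num_objects, lights):
    """Estimate draw calls per frame: one G-Buffer pass plus all shadow passes."""
    passes = 1 + sum(SHADOW_PASSES[kind] for kind in lights)
    return num_objects * passes

# The test scene: 200 separate trees, one directional light.
forest = draw_calls(200, ["directional"])
# The same scene submitted as a single object.
single = draw_calls(1, ["directional"])
print(forest, single)
```

Under these assumptions, the 200 separate trees cost 400 draw calls per frame, while the same geometry submitted as one object costs 2. Each of those calls also pays the per-object setup price mentioned above.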
All of these can be solved using the same “tool”: GeoClusters!
A GeoCluster is basically what I call a bunch of geometry that I gather and combine into a single object… In the case above, I grabbed all the trees and placed them in the same object… In the editor, I just build an aggregate object with all the objects I want to combine, right-click on it and select “Build GeoCluster”. This triggers an internal system that joins the objects together into the same “primitive”, so 200 objects basically become 1! Drawing becomes super-fast and GPU-limited, instead of CPU-limited as it is without the GeoCluster.
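For the curious, the core of this kind of merge can be sketched in a few lines. This is only an illustration of the general technique (bake each object's transform into its vertices, then offset its indices), not the actual Spellbook implementation. The names `transform` and `build_cluster`, the dictionary-based mesh layout, and the simplified position/yaw/uniform-scale transform are all assumptions.

```python
import math

def transform(vertex, position, yaw, scale):
    """Apply uniform scale, then rotation about the Y axis, then translation."""
    x, y, z = (c * scale for c in vertex)
    cos_a, sin_a = math.cos(yaw), math.sin(yaw)
    rx = x * cos_a + z * sin_a
    rz = -x * sin_a + z * cos_a
    px, py, pz = position
    return (rx + px, y + py, rz + pz)

def build_cluster(instances):
    """Merge (mesh, position, yaw, scale) instances into one vertex/index list."""
    vertices, indices = [], []
    for mesh, position, yaw, scale in instances:
        base = len(vertices)  # this instance's vertices start after all earlier ones
        vertices.extend(transform(v, position, yaw, scale) for v in mesh["vertices"])
        indices.extend(base + i for i in mesh["indices"])
    return {"vertices": vertices, "indices": indices}

# A single triangle stands in for a tree mesh (illustrative data only).
tree = {"vertices": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        "indices": [0, 1, 2]}
instances = [(tree, (float(i), 0.0, 0.0), 0.0, 1.0) for i in range(200)]
cluster = build_cluster(instances)
```

The result is one big vertex buffer and one big index buffer that can be drawn with a single call. A real version would also have to rotate normals and carry over texture coordinates, which this sketch leaves out.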
The GeoCluster isn’t limited to “similar” objects; they can be completely distinct. In that case, the GeoCluster will join geometry that has things in common (mostly textures). So imagine I had 2 types of trees with completely different textures. The GeoCluster would create two meshes internally and send each type of tree to the correct one.
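The grouping step can be pictured as simple bucketing by a shared key. The sketch below is an assumption-laden illustration, not the engine's code: the `group_by_texture` function, the object dictionaries and the texture names are all hypothetical, and the real system may compare more than just the texture.

```python
from collections import defaultdict

def group_by_texture(objects):
    """Bucket objects by texture name; each bucket would become one internal mesh."""
    buckets = defaultdict(list)
    for obj in objects:
        buckets[obj["texture"]].append(obj["name"])
    return dict(buckets)

# Two kinds of trees with different textures (made-up names).
objects = [
    {"name": "pine_01", "texture": "pine_bark"},
    {"name": "oak_01", "texture": "oak_bark"},
    {"name": "pine_02", "texture": "pine_bark"},
]
meshes = group_by_texture(objects)
```

Here the three objects end up in two buckets, so they would be drawn as two meshes instead of three separate objects.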
The system even has a more advanced mode, in which it can combine objects with completely different textures in the same mesh by using a texture atlas, which is a single large texture that packs a series of smaller textures together. The system can generate the atlas automatically, using the GPU to do so (no human intervention or storage required).
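The part of atlasing that touches the geometry is the texture coordinate remap: each object's UVs have to be squeezed into the region of the atlas where its texture landed. Here is a minimal CPU-side sketch of that idea. It assumes equally sized textures placed on a square grid, UVs inside [0, 1] (no tiling) and no padding between cells. The layout Spellbook actually generates on the GPU may be quite different, and `atlas_layout` and `remap_uv` are hypothetical names.

```python
import math

def atlas_layout(texture_names):
    """Assign each texture a (column, row) cell in a square grid."""
    grid = math.ceil(math.sqrt(len(texture_names)))
    cells = {name: (i % grid, i // grid) for i, name in enumerate(texture_names)}
    return grid, cells

def remap_uv(uv, cell, grid):
    """Map a UV in [0, 1] on the original texture into its cell of the atlas."""
    u, v = uv
    col, row = cell
    return ((col + u) / grid, (row + v) / grid)

# Three made-up textures fit in a 2x2 grid, with one cell left empty.
grid, cells = atlas_layout(["pine_bark", "oak_bark", "leaves"])
uv = remap_uv((0.5, 0.5), cells["oak_bark"], grid)
```

With this layout, the centre of the second texture ends up at (0.75, 0.25) in the atlas. Once every vertex has been remapped like this, all the objects can share one texture and therefore one mesh.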
This is meant to be used as a pre-process stage, since generating a big GeoCluster can take some time (the one above takes about 100 ms to generate, which doesn’t seem like much, but it adds up with the rest of the loading process).
I’m very happy with this optimization, since it makes everything run much more smoothly and allows me to increase the quality of the shadows (for example, using cube maps for point lights instead of dual-paraboloid maps, which have much less resolution).
In the process of testing this, I became disappointed that Grey won’t feature any forests in this episode, since the end result is very cool (for just one type of tree and no regard for placement, as I did above).
Here you can see a closeup of the tree. Note the anti-aliased self-shadowing with a low-res shadow map (256×256 for the whole area)…
A closer look into the forest… In this test, I had 20 lights with PCF anti-aliasing: a big directional one that adds a little directional lighting to the scene, and some short-range purple omnidirectional lights for “ambience”.
I’m just wrapping up some additional things in SurgeEd (I think most of the bugs found are solved now), and then I’ll jump into the game code itself (although next weekend I won’t have any time to work on Grey, since I’ll be participating in Ludum Dare’s 48-Hour Game Development compo)…