Spellcaster Studios

Make it happen…

More Compute Shaders

After getting compute shaders working properly yesterday, I decided to do some profiling…

After moving some code around, I got some rudimentary profiling working. In terms of CPU time, running the compute shader to sum 33 million elements took about 0.2 ms (against 37 ms on the CPU)… This looks great, in theory…

But the reality is way more complicated…


These kinds of commands are asynchronous, which means that when I run a compute shader, it doesn't run immediately… It just gets placed in a queue to be consumed later, so what I'm actually measuring is the time it takes me to set up the operations… So, if I want more accurate measurements, I have to actually fetch the results. And here we hit a huge bottleneck on PCs: reading data back from the GPU to the CPU…
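The same effect is easy to reproduce outside of any graphics API. Below is a minimal sketch, assuming a single-worker thread pool as a stand-in for the GPU command queue (`fake_gpu_sum` and `profile` are hypothetical names, not part of any graphics API). Timing the submission only measures the enqueue; the real cost only shows up when blocking on the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_sum(data):
    # Hypothetical stand-in for the compute shader: sums the data on a worker thread.
    return sum(data)


def profile(data):
    """Return (result, submit_ms, wait_ms) for one asynchronous 'dispatch'."""
    with ThreadPoolExecutor(max_workers=1) as queue:
        t0 = time.perf_counter()
        future = queue.submit(fake_gpu_sum, data)  # only places the work in the queue
        t1 = time.perf_counter()
        result = future.result()                   # blocks until the work has finished
        t2 = time.perf_counter()
    return result, (t1 - t0) * 1000.0, (t2 - t1) * 1000.0


if __name__ == "__main__":
    result, submit_ms, wait_ms = profile(list(range(1_000_000)))
    print(f"sum={result} submit={submit_ms:.3f} ms wait={wait_ms:.3f} ms")
```

The "submit" number is the equivalent of the 0.2 ms measured above: it says almost nothing about how long the work itself takes.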

My initial measurements pointed to 7 ms to fetch the result, which is way too much!

After some mucking around, I figured out that this is the time spent waiting for the result to be ready; but here is where things get tricky… For example, if I run a Present() command, that time goes down a bit (but that could just be because the Present itself takes some time)… But if I do 2 Presents, the time goes to zero, and there's no way 2 Presents take those 7 ms! Even weirder, it seems that even if I sleep for one second, the data still isn't ready…

My working theory is that the compute shader doesn't take that long to run, but it is only marked as "done" (so the results can be fetched) when I do the Present call… So after the first Present, the data isn't ready yet… But when I do the second Present, the data is there, and that's why the time goes down to zero…
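To make the theory concrete, here is a toy model of it, assuming a hypothetical `ToyCommandQueue` class (not a real graphics API, and not a claim about how any driver actually behaves): recorded work is only handed over on `present()`, and its completion is only noticed on the following `present()`.

```python
class ToyCommandQueue:
    """Hypothetical toy model of the working theory, not a real graphics API."""

    def __init__(self):
        self.pending = []    # recorded, not yet submitted
        self.in_flight = []  # submitted on the last present(), completion not yet observed
        self.done = set()    # results that can be fetched

    def dispatch(self, name):
        # Recording a command does not run it.
        self.pending.append(name)

    def present(self):
        # Completion of the previous batch is only noticed now...
        self.done.update(self.in_flight)
        # ...and the newly recorded commands are handed to the "GPU".
        self.in_flight = self.pending
        self.pending = []

    def is_ready(self, name):
        return name in self.done


if __name__ == "__main__":
    q = ToyCommandQueue()
    q.dispatch("sum")
    print(q.is_ready("sum"))  # False: nothing submitted yet
    q.present()
    print(q.is_ready("sum"))  # False: submitted, completion not observed yet
    q.present()
    print(q.is_ready("sum"))  # True: observed on the second present
```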

Although this theory could explain some of the practical results, I'm not happy with it… There's still some stuff that doesn't fit with it… For example, if I do a single Present preceded by a sleep, the data should be ready for me, but it isn't…

Anyway, I'll investigate this in more depth, and I'm going to see if I can find a tool that can measure the performance of the GPU calls…

Now listening to “Roots” by “Sepultura”
