GPU Profiling 101

Reedbeta 167 Oct 13, 2011 at 05:25

Hi all, I wrote a blog post about GPU profiling, or how to measure timings on the GPU with Direct3D 11. It includes some sample code that you can plug into your own apps if you so desire. Check it out here.

6 Replies

rouncer 103 Oct 13, 2011 at 05:44

D3D11 queries save the day again… Up till now I only used them to count how many objects my geometry shaders created. So queries do profiling too, awesome.

TheNut 179 Oct 13, 2011 at 21:34

Good article, although I don’t use D3D ;) Is there any way for you to monitor video memory as well? A problem I recall seeing on older hardware is texture swapping, which will blow away any performance.

I often come across gimmick performance editors and benchmark tools, but a simple editor to report the bare-bones GPU performance as well as custom shaders would be a real treat. Like you wrote, my main interest is knowing what the best possible rate is, and if I’m close enough then I’m content. Not just for my machine though, I have to know what it’s like on every machine, with every driver, with every resolution, with varying shader combinations, etc. Too much guesswork that can come back to bite you if you don’t plan for it.

_oisyn 101 Oct 13, 2011 at 22:52

Nice article!

Question: you mention you have to double buffer your queries. But is it guaranteed that you can read out all the queries of the previous frame while you’re rendering the new frame? Isn’t there a possibility that the GPU is even more frames behind?

Reedbeta 167 Oct 14, 2011 at 00:06

TheNut, I bet you can monitor video memory usage somehow, but I don’t know exactly how to do it (I haven’t seen anything about it in the D3D API, but maybe there’s something in the Windows API).

.oisyn, it is possible that the GPU isn’t done yet, especially if the GPU is more heavily loaded than the CPU (e.g. a frame takes 10ms on the GPU, but only 5ms on the CPU. CPU kicks a frame, spends 5ms doing the next frame, then has to wait 5ms more for the GPU to finish the first frame so it can read the queries). And it’s possible to triple buffer, quadruple buffer, etc. in this situation…but it’s a losing battle, since eventually you’ll run out of buffers and the CPU will have to wait! :) So I don’t really see a lot of point in going beyond double-buffering. Even if you weren’t doing queries, the CPU would still have to wait due to other resources like command buffers, dynamic vertex buffers, etc. that are buffered internally by the driver.
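
To put numbers on the 10ms / 5ms example, here is a minimal C++ sketch of a toy timing model (not D3D code). It assumes the CPU builds and submits a frame, then reads back the queries from `buffers - 1` frames earlier, blocking if the GPU has not finished that frame yet. `SimulateQueryReadback`, `StallReport`, and the model itself are made up for illustration and are not taken from the blog post's sample code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct StallReport
{
    double totalStallMs;      // CPU time spent blocked on readback, summed over all frames
    double lastFrameStallMs;  // stall in the final simulated frame (steady-state value)
};

// Toy model (assumption): each frame costs cpuMs on the CPU and gpuMs on the GPU,
// and the GPU executes frames strictly in submission order.
StallReport SimulateQueryReadback(int frames, int buffers, double cpuMs, double gpuMs)
{
    assert(frames >= 0 && buffers >= 1);

    std::vector<double> gpuDone(static_cast<std::size_t>(frames), 0.0);
    double cpuClock = 0.0;  // current time on the CPU timeline
    double gpuClock = 0.0;  // time at which the GPU finishes its current backlog
    StallReport report{0.0, 0.0};

    for (int i = 0; i < frames; ++i)
    {
        // CPU records frame i and submits it.
        cpuClock += cpuMs;

        // GPU starts frame i once it is submitted and the previous frame is done.
        gpuClock = std::max(gpuClock, cpuClock) + gpuMs;
        gpuDone[static_cast<std::size_t>(i)] = gpuClock;

        // CPU reads the queries issued (buffers - 1) frames ago, waiting if needed.
        const int readFrame = i - (buffers - 1);
        double stall = 0.0;
        if (readFrame >= 0)
        {
            stall = std::max(0.0, gpuDone[static_cast<std::size_t>(readFrame)] - cpuClock);
            cpuClock += stall;
        }

        report.totalStallMs += stall;
        report.lastFrameStallMs = stall;
    }
    return report;
}
```

With `cpuMs = 5` and `gpuMs = 10`, this model settles at a 5ms stall per frame whether `buffers` is 2, 3, or 4. The extra buffers only postpone the first stall by a few frames, which is the "losing battle" described above.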

rouncer 103 Oct 14, 2011 at 07:10

Monitoring memory usage can be done by keeping your own increments, but I suppose that’s a hacky way to do it.

One thing about queries, though: god, they are so slow, and WHY?
It’s definitely not data overloading the bus; all you are doing is fetching single DWORDs, for heaven’s sake.

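     HRESULT hr = S_FALSE;

     // The result struct depends on the query type (here: stream-output statistics).
     D3D11_QUERY_DATA_SO_STATISTICS queryData;

     // GetData returns S_FALSE while the result is not available yet, so this
     // loop spins on the CPU until the data arrives or the call fails.
     while (hr == S_FALSE)
     {
         hr = dc->GetData(pQuery[0], &queryData, sizeof(queryData), 0);
     }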

It’s a busy loop that does nothing but wait for the query to come in, on a single process.

_oisyn 101 Oct 14, 2011 at 09:39

@rouncer

monitoring memory usage can be done by keeping your own increments, but i spose thats a hacky way to do it.

Especially since you have no clue about alignment and padding restrictions. For example, textures on the Xbox are laid out in tiles of 32x32 pixels, or in the case of compressed textures 32x32 blocks (128x128 pixels), and have to be aligned on 4k boundaries. This applies to each individual mip level (although all the levels below 16xN or Nx16 can be packed in the tile that is used for the 16xN or Nx16 level itself). Especially for compressed textures, this can result in a huge waste of space. Now, the memory segments that are used for padding can of course be used for other relatively small resources, but it all gets very fragmented.

In other words, you can’t simply count the bytes of your texture resource and assume that’s how much video memory they will take up.
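
As a rough illustration of the gap, here is a C++ sketch that compares a tightly packed byte count with a padded one. It assumes only the rules described above: tiles of 32x32 blocks, 4k alignment per mip level, and every mip with a dimension below 16 packed into the tile of the last larger level. `EstimateTextureBytes` and `SizeEstimate` are hypothetical names, and the real allocation rules on any given platform will differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct SizeEstimate
{
    std::uint64_t naiveBytes;   // tightly packed full mip chain
    std::uint64_t paddedBytes;  // with tile padding and per-level alignment
};

// Round value up to the next multiple of step.
static std::uint64_t RoundUp(std::uint64_t value, std::uint64_t step)
{
    return (value + step - 1) / step * step;
}

// blockDim: texels per block edge (1 for uncompressed, 4 for 4x4 block compression).
// bytesPerBlock: storage for one block (or one texel when blockDim is 1).
SizeEstimate EstimateTextureBytes(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t blockDim, std::uint32_t bytesPerBlock)
{
    assert(width > 0 && height > 0 && blockDim > 0 && bytesPerBlock > 0);

    const std::uint64_t kTileBlocks = 32;    // assumed tile edge, in blocks
    const std::uint64_t kAlignBytes = 4096;  // assumed per-level alignment

    SizeEstimate out{0, 0};
    std::uint32_t w = width;
    std::uint32_t h = height;

    for (int level = 0;; ++level)
    {
        const std::uint64_t blocksWide = (w + blockDim - 1) / blockDim;
        const std::uint64_t blocksHigh = (h + blockDim - 1) / blockDim;
        out.naiveBytes += blocksWide * blocksHigh * bytesPerBlock;

        // Levels with a dimension below 16 are assumed to share the previous tile,
        // so they add nothing to the padded total (level 0 always counts).
        if (level == 0 || std::min(w, h) >= 16)
        {
            const std::uint64_t padded = RoundUp(blocksWide, kTileBlocks) *
                                         RoundUp(blocksHigh, kTileBlocks) * bytesPerBlock;
            out.paddedBytes += RoundUp(padded, kAlignBytes);
        }

        if (w == 1 && h == 1)
            break;
        w = std::max<std::uint32_t>(w / 2, 1);
        h = std::max<std::uint32_t>(h / 2, 1);
    }
    return out;
}
```

For a 256x256 texture with 4x4 blocks of 8 bytes each, this model gives 43704 bytes tightly packed versus 65536 bytes padded, so a naive byte count would come in about a third too low.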

One thing about queries is tho, god they are so slow -and WHY?

They are not slow. You have to keep in mind that the GPU and CPU are operating asynchronously. A drawprim call is nothing more than yet another command that is stored in a buffer, to be executed by the GPU later on when she’s done with all the previous commands stored in the buffer - the GPU is probably even busy rendering the previous frame while the CPU is issuing commands for the next.
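
To make that concrete, here is a toy C++ model of a command buffer. It is purely illustrative: `ToyCommandBuffer` and its methods are invented for this sketch and have nothing to do with how a real driver is implemented. The point is only that recording a command and executing it are separate steps, so a query result cannot exist until the consumer has worked through everything queued before it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class ToyCommandBuffer
{
public:
    // "CPU side": a draw call only records work; nothing is executed here.
    void Draw(const std::string& name)
    {
        pending_.push_back([this, name] { executed_.push_back(name); });
    }

    // "CPU side": the query result is written only when the consumer reaches it.
    void EndQuery(bool* resultReady)
    {
        assert(resultReady != nullptr);
        *resultReady = false;
        pending_.push_back([resultReady] { *resultReady = true; });
    }

    // "GPU side": consume the recorded commands in submission order.
    void ExecuteAll()
    {
        while (!pending_.empty())
        {
            std::function<void()> command = std::move(pending_.front());
            pending_.pop_front();
            command();
        }
    }

    const std::vector<std::string>& Executed() const { return executed_; }

private:
    std::deque<std::function<void()>> pending_;
    std::vector<std::string> executed_;
};
```

In this model, polling the flag passed to `EndQuery` right after recording always reports "not ready", no matter how small the result is. The wait comes from the queued work ahead of the query, not from the size of the data being fetched.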