

GPU Profiling 101


6 replies to this topic

#1 Reedbeta

    DevMaster Staff

  • Administrators
  • 5307 posts
  • Location: Bellevue, WA

Posted 13 October 2011 - 05:25 AM

Hi all, I wrote a blog post about GPU profiling, or how to measure timings on the GPU with Direct3D 11. It includes some sample code that you can plug into your own apps if you so desire. Check it out here. :yes:
reedbeta.com - developer blog, OpenGL demos, and other projects

#2 rouncer

    Senior Member

  • Members
  • 2722 posts

Posted 13 October 2011 - 05:44 AM

D3D11 queries save the day again... Up till now I've only used them to get how many objects were created by my geometry shaders, so queries do profiling too, awesome.
you used to be able to fit a game on a disk, then you used to be able to fit a game on a cd, then you used to be able to fit a game on a dvd, now you can barely fit one on your harddrive.

#3 TheNut

    Senior Member

  • Moderators
  • 1699 posts
  • Location: Thornhill, ON

Posted 13 October 2011 - 09:34 PM

Good article, although I don't use D3D ;) Is there any way for you to monitor video memory as well? A problem I recall seeing on older hardware is texture swapping, which will blow away any performance.

I often come across gimmick performance editors and benchmark tools, but a simple editor to report the bare-bones GPU performance as well as custom shaders would be a real treat. Like you wrote, my main interest is knowing what the best possible rate is, and if I'm close enough then I'm content. Not just for my machine though, I have to know what it's like on every machine, with every driver, with every resolution, with varying shader combinations, etc. Too much guesswork that can come back to bite you if you don't plan for it.
http://www.nutty.ca - Being a nut has its advantages.

#4 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 13 October 2011 - 10:52 PM

Nice article!

Question: you mention you have to double buffer your queries. But is it guaranteed that you can read out all the queries of the previous frame while you're rendering the new frame? Isn't there a possibility that the GPU is even more frames behind?
C++ addict
-
Currently working on: the 3D engine for Tomb Raider.

#5 Reedbeta

    DevMaster Staff

  • Administrators
  • 5307 posts
  • Location: Bellevue, WA

Posted 14 October 2011 - 12:06 AM

TheNut, I bet you can monitor video memory usage somehow, but I don't know exactly how to do it (I haven't seen anything about it in the D3D API, but maybe there's something in the Windows API).

.oisyn, it is possible that the GPU isn't done yet, especially if the GPU is more heavily loaded than the CPU (e.g. a frame takes 10ms on the GPU, but only 5ms on the CPU. CPU kicks a frame, spends 5ms doing the next frame, then has to wait 5ms more for the GPU to finish the first frame so it can read the queries). And it's possible to triple buffer, quadruple buffer, etc. in this situation...but it's a losing battle, since eventually you'll run out of buffers and the CPU will have to wait! :) So I don't really see a lot of point in going beyond double-buffering. Even if you weren't doing queries, the CPU would still have to wait due to other resources like command buffers, dynamic vertex buffers, etc. that are buffered internally by the driver.
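
To put numbers on the example above, here is a small toy simulation, not D3D code. It assumes a simplified model in which the CPU builds a frame, kicks it, and then reads the queries of the frame issued `bufferCount - 1` frames earlier, blocking if the GPU hasn't finished that frame yet. The function name `cpuStallsPerFrame` and all of its parameters are made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Toy model (an assumption, not real driver behaviour): returns how many
// milliseconds the CPU stalls in each frame while waiting for query results.
// bufferCount = 2 is double buffering, 3 is triple buffering, and so on.
std::vector<int> cpuStallsPerFrame(int cpuMs, int gpuMs, int bufferCount, int frameCount) {
    std::vector<int> gpuDone(frameCount, 0);  // time at which the GPU finishes each frame
    std::vector<int> stalls(frameCount, 0);
    int cpuTime = 0;  // current time on the CPU timeline
    int gpuFree = 0;  // time at which the GPU finishes its previous frame

    for (int i = 0; i < frameCount; ++i) {
        cpuTime += cpuMs;  // CPU builds frame i, then kicks it

        // The GPU starts frame i once it is submitted and the previous frame is done.
        int gpuStart = std::max(cpuTime, gpuFree);
        gpuDone[i] = gpuStart + gpuMs;
        gpuFree = gpuDone[i];

        // Read back the oldest outstanding set of queries; block if it isn't ready.
        int readFrame = i - (bufferCount - 1);
        if (readFrame >= 0 && gpuDone[readFrame] > cpuTime) {
            stalls[i] = gpuDone[readFrame] - cpuTime;
            cpuTime = gpuDone[readFrame];
        }
    }
    return stalls;
}
```

Calling `cpuStallsPerFrame(5, 10, 2, 10)` in this model gives a 5 ms stall from the second frame onward. Raising `bufferCount` to 3 or 4 only postpones the first stall; after that the CPU still waits 5 ms every frame, which is the "losing battle" described above.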

#6 rouncer

    Senior Member

  • Members
  • 2722 posts

Posted 14 October 2011 - 07:10 AM

Monitoring memory usage can be done by keeping your own increments, but I suppose that's a hacky way to do it.

One thing about queries though: god, they are so slow, and WHY?
It's definitely not data overloading the bus; all you are doing is fetching single DWORDs, for heaven's sake.


     HRESULT hr = S_FALSE;
     D3D11_QUERY_DATA_SO_STATISTICS queryData; // This data type is different depending on the query type

     // GetData returns S_FALSE while the result is still pending, so this spins until it is ready
     while (hr == S_FALSE)
     {
         hr = dc->GetData(pQuery[0], &queryData, sizeof(D3D11_QUERY_DATA_SO_STATISTICS), 0);
     }


That's an infinite loop just waiting for the query to come in, on a single process.

#7 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 14 October 2011 - 09:39 AM

rouncer said:

Monitoring memory usage can be done by keeping your own increments, but I suppose that's a hacky way to do it.
Especially since you have no clue about alignment and padding restrictions. For example, textures on the Xbox are laid out in tiles of 32x32 pixels, or in the case of compressed textures 32x32 blocks (128x128 pixels), and have to be aligned on 4K boundaries. This applies to each individual mip level (although all the levels below 16xN or Nx16 can be packed into the tile that is used for the 16xN or Nx16 level itself). Especially for compressed textures, this can result in a huge waste of space. Now, the memory segments that are used for padding can of course be used for other relatively small resources, but it all gets very fragmented.

In other words, you can't simply count the bytes of your texture resource and assume that's how much video memory they will take up.
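
As a rough illustration of how far the two numbers can drift apart, here is a sketch that applies only the two rules mentioned above (32x32-pixel tiles and 4K alignment) to a single uncompressed mip level. It is a deliberately simplified model: it ignores mip packing and compressed formats, and the helper names `roundUp`, `naiveBytes` and `paddedBytes` are hypothetical, not part of any SDK.

```cpp
#include <cstdint>

// Round v up to the next multiple of m (m must be non-zero).
std::uint64_t roundUp(std::uint64_t v, std::uint64_t m) {
    return (v + m - 1) / m * m;
}

// What you get by just counting bytes: width * height * bytes per pixel.
std::uint64_t naiveBytes(std::uint64_t width, std::uint64_t height, std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Simplified model of one uncompressed mip level: both dimensions are padded
// to whole 32x32-pixel tiles, then the size is rounded up to a 4K boundary.
std::uint64_t paddedBytes(std::uint64_t width, std::uint64_t height, std::uint64_t bytesPerPixel) {
    const std::uint64_t tile = 32;
    const std::uint64_t alignment = 4096;
    return roundUp(roundUp(width, tile) * roundUp(height, tile) * bytesPerPixel, alignment);
}
```

Under these assumptions a 40x40 texture at 4 bytes per pixel counts as 6400 bytes by hand but occupies 16384 bytes, and even a 1x1 texture takes a full 4096.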

Quote

One thing about queries though: god, they are so slow, and WHY?
They are not slow. You have to keep in mind that the GPU and CPU are operating asynchronously. A drawprim call is nothing more than yet another command that is stored in a buffer, to be executed by the GPU later on, once it is done with all the previous commands stored in the buffer. The GPU is probably even busy rendering the previous frame while the CPU is issuing commands for the next.
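
A toy model may make this more concrete. The sketch below is not Direct3D; `ToyGpu`, `Command` and their members are invented for illustration. Submitting a command only appends it to a queue, and a query result becomes visible only after the simulated GPU has worked through everything queued in front of it, so a loop that spins on the result is really waiting for all the earlier work.

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// Hypothetical command: some amount of GPU work, optionally completing a query.
struct Command {
    int costMs;   // simulated GPU time this command takes
    int queryId;  // id of the query this command completes, or -1 for none
};

class ToyGpu {
public:
    // Reserve a result slot; -1 means "not available yet".
    int createQuery() {
        results_.push_back(-1);
        return static_cast<int>(results_.size()) - 1;
    }

    // CPU side: recording a command is cheap and returns immediately.
    void submit(Command c) { pending_.push_back(c); }

    // GPU side: execute queued commands in order for up to budgetMs.
    void run(int budgetMs) {
        while (!pending_.empty() && (budgetMs > 0 || pending_.front().costMs == 0)) {
            Command& c = pending_.front();
            int step = std::min(budgetMs, c.costMs);
            c.costMs -= step;
            budgetMs -= step;
            clockMs_ += step;
            if (c.costMs == 0) {
                // A finished query command records the GPU clock as its result.
                if (c.queryId >= 0) results_[c.queryId] = clockMs_;
                pending_.pop_front();
            }
        }
    }

    // Non-blocking check: returns false until the GPU has reached the query.
    bool poll(int queryId, int& outMs) const {
        if (results_[queryId] < 0) return false;
        outMs = results_[queryId];
        return true;
    }

private:
    std::deque<Command> pending_;
    std::vector<int> results_;
    int clockMs_ = 0;
};
```

For example, after `submit(Command{10, -1})` followed by `submit(Command{0, q})`, `poll(q, ms)` keeps returning false until the simulated GPU has run for the full 10 ms of the draw that sits in front of the query.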




