1 D3D device per thread, or a single device with mutexes?


#1 Hawkwind

Posted 10 February 2008 - 12:10 AM

If I want to render two different scenes in different windows using DX, is it
best to create two D3D devices and use two threads, or just to create a
single device and render the scenes sequentially?

I am wondering if the two parallel renders would actually be done simultaneously on the card. If not, then there would be no point in doing this:
both renders could be done using a single device, and resources such as
any textures common to both scenes would be shared rather than held twice (once per device).

Of course I could use a single device and two threads, but I guess it would
probably need mutexes around the BeginScene/EndScene area, which would
effectively eliminate the parallelism.

Cheers.
Bring back the Z80 I say...

#2 starstutter

Posted 10 February 2008 - 01:16 AM

Well, maybe you're on the right track (I personally don't have a clue if that's possible or not), but I don't think making two devices will work.

If you'll notice, when you use the device, you are always using a pointer to it (->). To me, that would mean you could have 50 of them, but they're all pointing back to the same device. It's a good idea on their part because anything else would needlessly take up enormous amounts of memory.

I doubt that using multiple devices will work, but rendering multiple scenes with different GPU threads is a good idea (if it's possible).

#3 Reedbeta

Posted 10 February 2008 - 03:15 AM

Remember that the part of the code that sends your rendering commands to the driver (the BeginScene to EndScene area) doesn't actually cause the GPU to do the rendering immediately. Rather, the driver just queues up the commands and sends them all to the GPU at some later point. So, while the GPU does not render two scenes at the same time, it's entirely possible to have the GPU rendering one scene while your CPU does the setup for the other scene. In fact, this is exactly how games achieve CPU-GPU parallelism even when rendering a single scene: they do the setup for the next frame while the GPU renders the current frame.
reedbeta.com - developer blog, OpenGL demos, and other projects
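
The queuing behaviour described in this post can be illustrated with a small sketch. This is not how any real driver is implemented; `CommandQueue`, `recordScene` and the log strings are hypothetical names invented for the example. It only shows that recording a scene returns immediately and that the work happens at a later flush.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a driver-side queue: commands are recorded now
// and executed later, when flush() is called.
class CommandQueue {
public:
    // Record a command; nothing is executed yet.
    void record(std::function<void()> cmd) { pending_.push_back(std::move(cmd)); }

    // Execute everything recorded so far, in order, then empty the queue.
    void flush() {
        for (auto& cmd : pending_) cmd();
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};

// Records two draws for a scene and returns without "rendering" anything.
// gpuLog stands in for the work the GPU would eventually do.
inline void recordScene(CommandQueue& q, std::vector<std::string>& gpuLog,
                        const std::string& name) {
    q.record([&gpuLog, name] { gpuLog.push_back(name + ":draw0"); });
    q.record([&gpuLog, name] { gpuLog.push_back(name + ":draw1"); });
}
```

After `recordScene` returns, the log is still empty and two commands are pending; the log only fills once `flush()` runs, which is the window in which the CPU is free to set up something else.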

#4 Hawkwind

Posted 10 February 2008 - 11:04 AM

Reedbeta - OK, but if I have just one device and two threads, both of which
want to do a BeginScene..EndScene, then I would have to use a mutex to
prevent them from confusing each other?
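
To make the question concrete, here is a minimal sketch of two threads sharing one device behind a mutex. `FakeDevice`, `renderScene` and `renderBoth` are hypothetical names, and the "device" is only a log of strings, not a real D3D device. It assumes the lock is held for the whole begin-to-end span.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared device: just a log of the calls it received.
struct FakeDevice {
    std::mutex lock;
    std::vector<std::string> calls;
};

// Renders one scene while holding the device lock for the whole
// begin..end span, so calls from different scenes cannot interleave.
void renderScene(FakeDevice& dev, const std::string& scene, int draws) {
    std::lock_guard<std::mutex> guard(dev.lock);
    dev.calls.push_back(scene + ":begin");
    for (int i = 0; i < draws; ++i) dev.calls.push_back(scene + ":draw");
    dev.calls.push_back(scene + ":end");
}

// Runs two scenes on two threads against one device and returns the call log.
std::vector<std::string> renderBoth() {
    FakeDevice dev;
    std::thread a(renderScene, std::ref(dev), "A", 3);
    std::thread b(renderScene, std::ref(dev), "B", 3);
    a.join();
    b.join();
    return dev.calls;
}
```

Whichever thread wins the lock, the log always contains one complete scene followed by the other. The two threads never overlap inside the locked span, so in this sketch the threading buys nothing over calling `renderScene` twice from one thread.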

#5 Goz

Posted 10 February 2008 - 03:11 PM

Yeah ... which makes me think you need a redesign. If one thread is going to stop and wait while the other thread does its rendering, why not just put both sets of renders in the same thread? That way you even bypass any thread-switching delays as well :D

#6 Hawkwind

Posted 11 February 2008 - 10:09 AM

Goz - Exactly! This is the problem.

#7 CheshireCat

Posted 11 February 2008 - 07:02 PM

For multi-threaded game engines, it's common practice to issue all D3D calls from a single thread. Game consoles could be an exception to the rule, but they don't suffer from the crazy overhead of the HAL on PC, and you can use other tricks to lower the CPU overhead (e.g. precompiled command buffers). Issuing draw calls should have less significant overhead with D3D10 on PC, though, due to the new driver model, but I haven't tested that myself.

#8 Goz

Posted 12 February 2008 - 07:41 PM

Hawkwind said:

Goz - Exactly! This is the problem.

Well then, you have your answer :)

TBH, what I do to get good threading ability is build my own command buffer for a given frame in one thread while an alternate thread kicks out the previous frame's command buffer. If the rendering finishes in time, my thread manager can reassign those threads to logic and physics tasks until everything is complete. This has a few advantages:

1) I get slightly better concurrency even on a single-processor machine, because I find the D3DDevice stalls, and at those times the next frame's worth of data can be partially calculated (I assume, at least). The difference is of the order of 0.01% though, so it's not really noticeable (and hence it's not a problem on a single-processor machine).

2) Because everything goes through the command buffer, I can swap between rendering techniques on subsequent frames. Really good for checking differences between rendering techniques such as deferred and non-deferred rendering.

3) Performance on a multi-processor machine is sweet! :D
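
The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Goz's actual engine: `CommandBuffer`, `buildFrame` and `runFrames` are hypothetical names, commands are plain strings, and a fresh thread is spawned per frame for brevity where a real engine would presumably keep a persistent render thread.

```cpp
#include <string>
#include <thread>
#include <utility>
#include <vector>

using CommandBuffer = std::vector<std::string>;

// Hypothetical game-side work: fill a buffer with the commands for one frame.
CommandBuffer buildFrame(int frame) {
    const std::string tag = "frame" + std::to_string(frame);
    return {tag + ":set_state", tag + ":draw"};
}

// Runs `frames` frames. In each iteration a submit thread replays the
// previous frame's buffer while the calling thread builds the next one.
// `submitted` stands in for the device and is only written by the submit thread.
std::vector<std::string> runFrames(int frames) {
    std::vector<std::string> submitted;
    CommandBuffer front;                 // buffer being submitted
    CommandBuffer back = buildFrame(0);  // buffer being built
    for (int f = 1; f <= frames; ++f) {
        std::swap(front, back);
        std::thread submit([&submitted, &front] {
            for (const auto& cmd : front) submitted.push_back(cmd);
        });
        if (f < frames) back = buildFrame(f);  // overlaps with the submission
        submit.join();                         // sync point before the next swap
    }
    return submitted;
}
```

Only one thread ever talks to the "device", and the two threads touch different buffers between the swap and the join, so no mutex is needed around the individual commands.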

#9 Goz

Posted 12 February 2008 - 07:44 PM

CheshireCat said:

For multi-threaded game engines, it's common practice to issue all D3D calls from a single thread. Game consoles could be an exception to the rule, but they don't suffer from the crazy overhead of the HAL on PC, and you can use other tricks to lower the CPU overhead (e.g. precompiled command buffers). Issuing draw calls should have less significant overhead with D3D10 on PC, though, due to the new driver model, but I haven't tested that myself.

Even on consoles it's still something to avoid doing. Don't forget that if you ask something to render in the middle of something else, you can mess up all kinds of render states and suchlike. So either way you are going to be synchronising all the threads against each other, and you'd lose any potential performance gains. Not to mention the fact that it would screw up all kinds of sort-by-shader and similar optimisations.

#10 CheshireCat

Posted 13 February 2008 - 06:42 AM

Goz said:

So either way you are going to be synchronising all the threads against each other, and you'd lose any potential performance gains.

Not really, because you would have separate command buffers for each thread and glue them together at the end of the frame. I'm not saying that you should actually do this, because all you care about is that your render thread doesn't take longer than 16 ms, but on consoles it's much more doable than on PC.
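
A rough sketch of the per-thread buffer idea, assuming commands can be represented as plain strings and that a fixed worker order is an acceptable submission order. `recordChunk` and `buildFrameParallel` are hypothetical names, not part of any console SDK.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <vector>

using CommandBuffer = std::vector<std::string>;

// Each worker records into its own buffer, so no locking is needed while recording.
void recordChunk(CommandBuffer& out, int worker, int draws) {
    for (int i = 0; i < draws; ++i)
        out.push_back("w" + std::to_string(worker) + ":draw" + std::to_string(i));
}

// Records `workers` buffers in parallel, then concatenates them in worker
// order on the calling thread (the "glue them together" step).
CommandBuffer buildFrameParallel(int workers, int drawsPerWorker) {
    std::vector<CommandBuffer> buffers(workers);  // sized up front, never resized
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w)
        threads.emplace_back(recordChunk, std::ref(buffers[w]), w, drawsPerWorker);
    for (auto& t : threads) t.join();

    CommandBuffer frame;
    for (const auto& b : buffers) frame.insert(frame.end(), b.begin(), b.end());
    return frame;
}
```

The only synchronisation is the join at the end of the frame, and the resulting order is deterministic regardless of how the workers were scheduled.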

#11 Goz

Posted 15 February 2008 - 06:54 PM

CheshireCat said:

Not really, because you would have separate command buffers for each thread and glue them together at the end of the frame. I'm not saying that you should actually do this, because all you care about is that your render thread doesn't take longer than 16 ms, but on consoles it's much more doable than on PC.

Fair point ... I was thinking you were talking about some kind of immediate-mode renderer. With display lists, yeah, you can get a bonus, but you'd probably still get better performance by sorting all your shader and render state changes, and the like, in one big command buffer. That said, you can always sort the command buffers into each other :)
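
As an illustration of sorting by state, the sketch below orders draws so that those sharing a shader end up adjacent, and counts shader switches before and after. The `DrawCmd` layout and both function names are assumptions made for the example; a real engine would likely pack more state into its sort key.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; the field names are illustrative only.
struct DrawCmd {
    std::uint32_t shader;
    std::uint32_t texture;
    std::uint32_t mesh;
};

// Sort so draws sharing a shader are adjacent, then by texture within a shader.
// stable_sort keeps the original order of draws with identical state.
void sortByState(std::vector<DrawCmd>& cmds) {
    std::stable_sort(cmds.begin(), cmds.end(),
                     [](const DrawCmd& a, const DrawCmd& b) {
                         if (a.shader != b.shader) return a.shader < b.shader;
                         return a.texture < b.texture;
                     });
}

// Counts how many times the shader changes when the list is replayed in order.
int countShaderSwitches(const std::vector<DrawCmd>& cmds) {
    int switches = 0;
    for (std::size_t i = 1; i < cmds.size(); ++i)
        if (cmds[i].shader != cmds[i - 1].shader) ++switches;
    return switches;
}
```

Running `sortByState` over the concatenation of several per-thread buffers is one way to read "sort the command buffers into each other", at the cost of doing the sort on a single thread.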

#12 CheshireCat

Posted 15 February 2008 - 09:45 PM

On consoles you can also have precompiled command buffers for objects and patch them at run-time for each object instance, so there's no CPU overhead from state changes and thus no gain in sorting by states. IIRC, the GPU overhead of state changes is negligible.
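
The patching idea can be sketched generically. Real console command formats are SDK-specific and not shown in this thread, so everything here is an assumption: `PrecompiledBuffer` is a hypothetical template of opaque 32-bit words with one slot that differs per instance, and `instantiate` is an invented helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical precompiled buffer: opaque command words built once, plus the
// index of the single word that varies per object instance.
struct PrecompiledBuffer {
    std::vector<std::uint32_t> words;
    std::size_t patchOffset;
};

// Copy the template and overwrite only the patch slot. No state-setting
// logic runs per instance; the cost is a copy plus one store.
std::vector<std::uint32_t> instantiate(const PrecompiledBuffer& tmpl,
                                       std::uint32_t instanceValue) {
    std::vector<std::uint32_t> out = tmpl.words;
    out.at(tmpl.patchOffset) = instanceValue;  // at() throws if the offset is bad
    return out;
}
```

The template itself is never modified, so the same precompiled buffer can be instantiated for any number of objects.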

#13 Goz

Posted 16 February 2008 - 08:43 AM

CheshireCat said:

On consoles you can also have precompiled command buffers for objects and patch them at run-time for each object instance, so there's no CPU overhead from state changes and thus no gain in sorting by states. IIRC, the GPU overhead of state changes is negligible.

Well, that's not true ... there is still a cost. On Wii, for example, I got 3% of a frame back by removing redundant state changes (as a result I tried not to precompile state changes, only vertex data chunks). PS2 was the same too. I don't see why 360 or PS3 would be any different. It still costs time to change a vertex or pixel shader, for example. Sure ... the cost isn't quite as bad as on PC (no security ring changes), but that doesn't mean it's free ... a pipeline flush is still a pipeline flush.
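
Removing redundant state changes is commonly done with a small cache in front of the device. The sketch below is one possible shape for that, not the code used on Wii or PS2; `StateCache` and its integer state IDs are hypothetical.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical filter placed in front of the device: remembers the last value
// set for each state and reports whether a new call actually changes anything.
class StateCache {
public:
    // Returns true if the state must be sent to the device, false if the
    // requested value is already current and the call can be dropped.
    bool set(std::uint32_t state, std::uint32_t value) {
        auto it = current_.find(state);
        if (it != current_.end() && it->second == value) {
            ++skipped_;
            return false;
        }
        current_[state] = value;
        ++issued_;
        return true;
    }

    int issued() const { return issued_; }
    int skipped() const { return skipped_; }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> current_;
    int issued_ = 0;
    int skipped_ = 0;
};
```

The `skipped()` counter gives a quick estimate of how many state changes per frame were redundant, which is the kind of measurement needed before deciding whether the filtering is worth its own CPU cost.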

#14 CheshireCat

Posted 16 February 2008 - 09:15 AM

Okay, it's good to differentiate CPU and GPU overhead here. I said there is no CPU overhead in state changes if you use precompiled command buffers, because state changes are precompiled and thus there's no CPU work to be done. IIRC, on Xbox/PS3 you can even make a call to a precompiled command buffer, so there isn't even the overhead of a memcopy. The GPU overhead is of course a different story, but for example the Xbox 360 has a unified shader architecture and is capable of running multiple shaders simultaneously, thus changing shaders doesn't cause a pipeline flush.

#15 Goz

Posted 16 February 2008 - 02:03 PM

CheshireCat said:

Okay, it's good to differentiate CPU and GPU overhead here. I said there is no CPU overhead in state changes if you use precompiled command buffers, because state changes are precompiled and thus there's no CPU work to be done. IIRC, on Xbox/PS3 you can even make a call to a precompiled command buffer, so there isn't even the overhead of a memcopy. The GPU overhead is of course a different story, but for example the Xbox 360 has a unified shader architecture and is capable of running multiple shaders simultaneously, thus changing shaders doesn't cause a pipeline flush.

Hmmm ... I was told that an Xbox 360 shader switch was INCREDIBLY expensive. The unified shader architecture doesn't mean multiple different shaders can run simultaneously; it just means the system can be rebalanced when there is more pixel-shader-intensive work (or indeed vertex shader work) going on ... Maybe I'm wrong ... but I'm fairly sure I'm not ... ah well.

But yeah, I agree with you on the CPU usage part. It's the same on Wii (and thus GC), Xbox and PS2 as well ... Though I tend to think of the GPU as another processor, and one that should be encouraged to run without stalls as much as the CPU. It's all about performance in the end ...

#16 CheshireCat

Posted 16 February 2008 - 04:01 PM

Goz said:

Hmmm ... I was told that an Xbox 360 shader switch was INCREDIBLY expensive. The unified shader architecture doesn't mean multiple different shaders can run simultaneously; it just means the system can be rebalanced when there is more pixel-shader-intensive work (or indeed vertex shader work) going on ... Maybe I'm wrong ... but I'm fairly sure I'm not ... ah well.

Yeah, a unified architecture doesn't mean that changing shaders is inexpensive, but it implies this, because the same 48 units are used for both vertex and pixel processing. Agreed, I was cutting corners there, but I vaguely recall something like this. Anyway, I did some googling around and found this, which states (on page 10) that Xenos has "Up to 8 simultaneous contexts in-flight at once", thus "Changing shaders or render states is inexpensive, since a new context can be started up early".




