Reducing DirectX DrawPrimitive calls
Started by skusey, Dec 03 2006 12:04 PM
10 replies to this topic
#1
Posted 03 December 2006 - 12:04 PM
Hi,
I have read all over the place about the importance of reducing DrawPrimitive/DrawIndexedPrimitive calls, and how it can speed things up. But whereas in some cases I can see how this easily applies, in others, like the example of laser beams in a game, I can't. Let's say these are textured quads, with their movement being updated each frame. I originally thought I would have a single static vertex buffer that describes this type of laser beam's quad, then just calculate a new world matrix for each laser beam describing its orientation and position each frame and call DrawPrimitive. I could potentially have hundreds of laser beams at any one time, so that means hundreds of DrawPrimitive calls, each drawing a single quad. So I guess I'm asking: how do other people handle this kind of situation? Is a single vertex buffer used instead, updated each frame with all the laser beams' quads, with a single DrawPrimitive call then made? Would writing a new vertex buffer each frame and discarding the old one be slow? Is that a common practice? Sorry, I'm rambling a bit now, so I'll shut up! Any help or thoughts at all on this would be great :).
Cheers,
skusey
#2
Posted 03 December 2006 - 01:26 PM
skusey said:
Hi,
Is a single vertex buffer used instead and that is updated each frame with all the laser beams quads, then a single DrawPrimitive call made?
Quote
Would writing a new vertex buffer each frame and discarding the old one be slow? Is that a common practice?
#3
Posted 03 December 2006 - 04:07 PM
Thanks for the reply, juhnu. I will take that approach with what I'm doing.
So would the vertex buffer that's being used to render these multiple laser beams, etc., be created only once at a fixed size, something large enough to cover all situations, or would it be re-created each time a new object is added to it? Also, would it make sense to just keep an array of vertex data for the laser beams locally, updating and amending that as required, then memcpy'ing it over into the locked buffer each frame? If I did use that approach, would I still use the D3DLOCK_NOOVERWRITE flag, or just D3DLOCK_DISCARD on its own?
Many thanks for the help btw, it's great to be able to talk it over with some of you guys who really know what you're doing.
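On the D3DLOCK_DISCARD versus D3DLOCK_NOOVERWRITE question, one commonly described pattern is to create the dynamic buffer once at a fixed size, append to it with D3DLOCK_NOOVERWRITE while there is room, and switch to D3DLOCK_DISCARD (starting again at offset 0) once it is full. Below is a minimal sketch of just that bookkeeping, assuming the buffer was created with D3DUSAGE_DYNAMIC. The names LockMode, LockRequest, DynamicBufferCursor and reserve are hypothetical stand-ins so the code compiles without the DirectX headers; it does not call Direct3D itself.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the real D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE flags
// (hypothetical enum, used only so this sketch needs no DirectX headers).
enum class LockMode { Discard, NoOverwrite };

// What the caller would pass on to the real Lock() call.
struct LockRequest {
    std::size_t firstVertex;  // where this batch starts in the buffer
    LockMode mode;            // which lock flag to use
};

// Hypothetical helper: tracks the write position in a vertex buffer that
// was created once with room for `capacity` vertices.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity)
        : capacity_(capacity), next_(0) {}

    // Reserve room for `count` vertices and say how to lock for them.
    LockRequest reserve(std::size_t count) {
        assert(count <= capacity_);  // a single batch must fit in the buffer
        LockRequest req;
        if (next_ + count > capacity_) {
            // Buffer is full: discard the old contents and restart at 0.
            next_ = 0;
            req.mode = LockMode::Discard;
        } else {
            // Still room: append without touching vertices already in use.
            req.mode = LockMode::NoOverwrite;
        }
        req.firstVertex = next_;
        next_ += count;
        return req;
    }

private:
    std::size_t capacity_;  // total vertices the buffer can hold
    std::size_t next_;      // first unused vertex
};
```

With a capacity of 10 vertices and batches of 4, the first two reservations append at offsets 0 and 4, and the third wraps back to offset 0 with the discard mode. The returned offset would also be the start vertex for the matching draw call.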
#4
Posted 03 December 2006 - 06:49 PM
The main issue with dynamic geometry like your laser beams is timing. Sure, collecting vertex data and writing it to a vertex buffer does take time, but it usually pays off to do some extra work on the CPU.
If you do this after you've issued all draw commands for static geometry, the GPU is busy drawing and the CPU has enough time to do more expensive things.
Parallelism and pipelining usually help to overlap CPU and GPU work to some extent, but games sometimes require a fast mouse/gamepad response and force the GPU into lockstep (a rant about that is here: http://xyzw.de/c120.html). So timing such CPU-heavy jobs is still good practice.
#5
Posted 03 December 2006 - 07:54 PM
Ahhh... that is interesting, I had never considered that. I have a terrain in my game, rendered in large patches, maybe 5 - 15 DrawIndexedPrimitive calls for the whole terrain. Obviously it is a rather GPU-intensive aspect of the game, so rendering this first in the game cycle will ensure the GPU is busy while I get on with the other aspects on the CPU? Does that mean the GPU queues up DrawPrimitive commands received from the CPU? It doesn't complete them before returning control to the CPU? Thanks for the advice, Nils Pipenbrinck!
#6
Posted 03 December 2006 - 08:28 PM
skusey said:
Does that mean the GPU queues up DrawPrimitive commands received from the CPU? It doesn't complete them before returning control to the CPU?
Yep, pretty much all commands to the GPU go into a queue, allowing the GPU and CPU to execute concurrently. One normally synchronizes at the end of each frame or so as that article explains.
reedbeta.com - developer blog, OpenGL demos, and other projects
#7
Posted 03 December 2006 - 09:24 PM
If you draw your terrain with just a dozen calls, you will have plenty of CPU time to combine your laser-beam geometry.
Some drivers are even smart enough to batch draw calls together if you haven't changed textures or render state, so you might not see a performance improvement. But other drivers aren't as smart, so you'd better do it on your own, just to make sure it runs fast everywhere.
#8
Posted 04 December 2006 - 03:01 AM
Quote
So would the vertex buffer that's being used to render these multiple laser beams, etc., be created only once at a fixed size, something large enough to cover all situations, or would it be re-created each time a new object is added to it?
Quote
Also would it make sense to just keep an array of vertex data for the laser beams locally, updating and amending that as required, then memcpy'ing it over into the locked buffer each frame? If I did use that approach would I still use the D3DLOCK_NOOVERWRITE flag, or just the D3DLOCK_DISCARD on its own?
It might just be faster to recalculate the vertices on the fly when you are updating the data (if it's going to change every frame anyway), and remember to update the vertex buffer in linear order. Do not access it randomly, as that would lose you the benefits of the write cache.
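As a rough illustration of recalculating the vertices on the fly and writing them in linear order, here is a minimal sketch that expands a list of beams into quads, four vertices each, moving through the destination strictly front to back. In real code `out` would be the pointer returned by locking the vertex buffer; here it is any array large enough. The Vertex and Beam layouts, the 2D simplification (beams lying in the XY plane) and the function names are assumptions made for the example, not something from this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z, u, v; };          // hypothetical vertex layout
struct Beam { float x0, y0, x1, y1, width; };    // hypothetical 2D beam

// Writes 4 vertices per beam into `out`, strictly in increasing address
// order and without ever reading back from `out`.
// `out` must have room for 4 * beams.size() vertices.
// Returns the number of vertices written.
std::size_t writeBeamQuads(const std::vector<Beam>& beams, Vertex* out) {
    Vertex* p = out;
    for (const Beam& b : beams) {
        const float dx = b.x1 - b.x0;
        const float dy = b.y1 - b.y0;
        const float len = std::sqrt(dx * dx + dy * dy);
        if (len == 0.0f) continue;  // skip zero-length beams
        // Half-width offset perpendicular to the beam direction.
        const float nx = -dy / len * b.width * 0.5f;
        const float ny =  dx / len * b.width * 0.5f;
        *p++ = Vertex{ b.x0 + nx, b.y0 + ny, 0.0f, 0.0f, 0.0f };
        *p++ = Vertex{ b.x1 + nx, b.y1 + ny, 0.0f, 1.0f, 0.0f };
        *p++ = Vertex{ b.x1 - nx, b.y1 - ny, 0.0f, 1.0f, 1.0f };
        *p++ = Vertex{ b.x0 - nx, b.y0 - ny, 0.0f, 0.0f, 1.0f };
    }
    return static_cast<std::size_t>(p - out);
}

// Two triangles per quad. The pattern never changes, so these indices
// could live in a static index buffer that is filled once.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount * 4 <= 65536);  // vertex indices must fit in 16 bits
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t i = 0; i < quadCount; ++i) {
        const std::uint16_t base = static_cast<std::uint16_t>(i * 4);
        const std::uint16_t quad[6] = { 0, 1, 2, 0, 2, 3 };
        for (std::uint16_t offset : quad) {
            indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return indices;
}
```

All beams sharing a texture and render state could then be drawn with one indexed draw call covering two triangles per quad, instead of one call per beam.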
#9
Posted 04 December 2006 - 11:05 AM
Thanks everyone for your comments, that's just the sort of information I needed.
One last question: given what was discussed about the GPU working away and leaving time free for the CPU, would it not make more sense to have the main game cycle start with its Draw methods, then finish with its Update methods? Traditionally I have always done it the other way round, but that seems unproductive after this chat.
#10
Posted 04 December 2006 - 08:28 PM
If you synch at the end of the game cycle, then yes, do the drawing first and then the updating. Actually, you can have it be multithreaded and do the updates and drawing in separate threads, as long as you maintain two copies of the state data so that you're not updating something at the same time you're trying to render it.
reedbeta.com - developer blog, OpenGL demos, and other projects
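The "two copies of the state data" idea can be sketched roughly as below: the renderer only ever reads the front copy while the updater writes the back copy, and the two are exchanged at the sync point. This sketch shows only the bookkeeping and is single-threaded; in a threaded version, swap() would have to run at a point where neither thread is touching the state. DoubleBufferedState, BeamState and advance are hypothetical names invented for the example.

```cpp
#include <array>
#include <cstddef>

// Holds two copies of some state: one being rendered, one being updated.
template <typename State>
class DoubleBufferedState {
public:
    explicit DoubleBufferedState(const State& initial)
        : states_{initial, initial}, front_(0) {}

    // The copy the renderer reads; never modified during a frame.
    const State& front() const { return states_[front_]; }

    // The copy the updater writes for the next frame.
    State& back() { return states_[1 - front_]; }

    // Call only at the sync point, when drawing and updating have finished.
    void swap() { front_ = 1 - front_; }

private:
    std::array<State, 2> states_;
    std::size_t front_;  // index of the copy currently being rendered
};

// Hypothetical per-beam state and update step.
struct BeamState {
    float position;
    float velocity;
};

// Computes the next state from the previous one without modifying it.
BeamState advance(const BeamState& prev, float dt) {
    return BeamState{ prev.position + prev.velocity * dt, prev.velocity };
}
```

A frame would then look like: draw from front(), compute back() = advance(front(), dt), sync, swap(). The state being drawn never changes while the draw is in progress.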
#11
Posted 04 December 2006 - 11:03 PM
The multithreading approach is the best of both worlds today.