# Optimizing mesh rendering

Hi guys. I've been reading into mesh optimization lately. I won't lie, I'm fairly new to handling geometry and meshes effectively. The whole topic is pretty complex, so I've got a few questions:

1. How effective is the Optimize() function alone in DirectX 9? Are there better methods? I tried converting subsets to strips, but that actually seemed to lower performance. Maybe I did something wrong there.

2. Is the D3DXMESH object still acceptable? I know that .x files have gone out of style, but has there been some other class or structure which is the new standard for professionals? It doesn't seem like DX would accept any other type of object, but crazier feats in software have been done :P

3. I noticed that rendering high-poly objects is taking an enormous chunk of my CPU time and dropping my framerate to near nothing. Did I miss something? Or is it totally normal for the CPU to take the majority of the load when rendering geometry? I know that it must transfer the geometry to the GPU, but I thought that a low number of render calls was supposed to take care of this, or at least help more than it does.

EDIT: Just to clarify, I've always known that high polycounts were something to be avoided, because the transfer to the GPU is slow (thus we have parallax mapping to simulate polys). I just feel like it's sucking way more power out of the CPU than it should.

Haha, I sound like a total novice here. I've dealt so long with pixels and shaders, but never got much into geometry. Any help is much appreciated.
The D3DXMESH object doesn't need to be associated with .X files. If you use the D3DX X file loading functions, you will create a D3DXMESH but you can create one from any model file if you write a loader for it.
Also, it may be better to think in terms of models rather than single meshes. A model could have multiple meshes and in the case of very high poly models, having the overall model represented by different meshes can improve performance.
I'm not an expert in the field but just a few thoughts I had while reading your post.

Thanks for the quick reply. Yes, I was aware that multiple formats can be loaded into a D3DXMESH. Even if they're never converted to .x at all, you can lock the VB and IB manually and transfer the data if you need to, though granted, that seems like an awful lot of work when there are so many suitable converters.

So breaking the mesh up into smaller meshes helps? That seems very strange to me. I've always been told that batching, which is of course sending as many polys as possible at once, is the key to more efficient rendering. It seems like breaking the mesh into smaller components would hinder that. Of course it's more efficient to break up the batches to take advantage of frustum culling or portals, but in general I've heard the fewer render calls, the better.
It's because you need multiple materials; isn't that why meshes come in subsets? If you're only using one material, you wouldn't use subsets.

I don't even use the mesh library myself; I just use vertex and index buffers and do everything myself, even when I want to optimize.

For optimizing mesh rendering for GPU post-transform cache, I use nvTriStrip Library. You only pass it an array of mesh indices and it gives you back another optimized array. I don't know how this compares to D3DX.
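
To make "optimizing for the post-transform cache" a bit more concrete, here is a rough sketch of how you could measure the effect of an index reordering yourself. It estimates the average cache miss ratio (vertex shader runs per triangle) for a triangle list. The FIFO replacement policy, the cache size, and the name `estimateAcmr` are all assumptions for illustration; real hardware differs, and this is not how nvTriStrip or D3DX work internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Estimates vertex-shader runs per triangle for an indexed triangle list,
// using a simple FIFO model of the post-transform cache.
// The FIFO policy and cacheSize are modelling assumptions, not a
// description of any particular GPU.
double estimateAcmr(const std::vector<std::uint32_t>& indices,
                    std::size_t cacheSize)
{
    assert(cacheSize > 0);
    assert(indices.size() % 3 == 0);  // expects a triangle list
    if (indices.empty()) return 0.0;

    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end())
            continue;                 // hit: vertex was transformed recently
        ++misses;                     // miss: vertex shader has to run
        cache.push_back(idx);
        if (cache.size() > cacheSize)
            cache.pop_front();        // evict the oldest vertex
    }
    return static_cast<double>(misses) / (indices.size() / 3);
}
```

Running this on the index buffer before and after an optimizer pass gives a rough number to compare; lower is better, and an unoptimized ordering can approach 3.0 (every corner of every triangle misses).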

rouncer said:

It's because you need multiple materials; isn't that why meshes come in subsets? If you're only using one material, you wouldn't use subsets.
I knew about that whole process, but I'm speaking purely geometrically here. Like.... pretend I'm rendering a depth map I guess.

rouncer said:

I don't even use the mesh library myself; I just use vertex and index buffers and do everything myself, even when I want to optimize.
Noble, but are you sure that's the best way to go? I totally understand wanting to learn the math and inner workings of a system, but in a lot of cases that seems a bit extreme. Of course with things like particle systems you have to work with VBs anyway (at least that's the practical way).

JarkkoL said:

For optimizing mesh rendering for GPU post-transform cache, I use nvTriStrip Library. You only pass it an array of mesh indices and it gives you back another optimized array. I don't know how this compares to D3DX.
Hmm, interesting, I'll have to check it out. Thanks for the link.

Hey, out of pure curiosity, what does the D3DXMESHOPT_VERTEXCACHE flag do exactly? On a memory/technical/math level of course.
Off the top of my head:
1) Multires your models. Lower them to the smallest possible number of polygons and use a normal map to restore detail.

2) Run your geometry through a vertex welder to remove dups. Beware not to destroy your UV map though. Although I personally recommend all your objects be "wholes" and not "parts". It's up to you and how much texture resolution matters.

3) Batch as much as you can where you can. Not always possible, but you should be maximizing your VBOs. Reduce function and state change overhead. Render by shader / material / then geometry

4) Use tri strips whenever possible and make effective use of vertex caches. This should produce a noticeable gain if you're moving from tris or polys, although the maths to work with such a setup is not as straightforward.

5) Use the lowest common data storage format. If you will never have more than 65K indices (which you shouldn't as this is an upper limit to VBOs hardware acceleration), then store the indices in a short array. Don't waste memory and efficiency by using ints. I noticed a substantial gain from doing this, although it could have been a driver efficiency issue at the time. Same goes for colours. Use unsigned chars instead of floats. ~2-5 fps gain.
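
As a rough illustration of point 5, here is a minimal sketch of narrowing 32-bit indices to 16-bit and packing a float colour into four bytes. The helper names `narrowIndices` and `packColor` are made up for this example; the 0xAARRGGBB byte order is chosen to match the layout of D3DCOLOR. A 16-bit index can only reference vertices 0 through 65535, so the conversion has to be allowed to fail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Copies 32-bit indices into a 16-bit array.
// Returns false (leaving `out` empty) if any index does not fit in 16 bits.
bool narrowIndices(const std::vector<std::uint32_t>& indices,
                   std::vector<std::uint16_t>& out)
{
    out.clear();
    out.reserve(indices.size());
    for (std::uint32_t i : indices) {
        if (i > 0xFFFFu) {
            out.clear();
            return false;
        }
        out.push_back(static_cast<std::uint16_t>(i));
    }
    return true;
}

// Packs a float RGBA colour (each channel in 0..1) into 4 bytes,
// laid out as 0xAARRGGBB.
std::uint32_t packColor(float r, float g, float b, float a)
{
    auto toByte = [](float c) -> std::uint32_t {
        assert(c >= 0.0f && c <= 1.0f);
        return static_cast<std::uint32_t>(c * 255.0f + 0.5f);  // round to nearest
    };
    return (toByte(a) << 24) | (toByte(r) << 16) | (toByte(g) << 8) | toByte(b);
}
```

That takes each index from 4 bytes to 2 and each colour from 16 bytes to 4; whether it shows up as an fps gain will depend on where your bottleneck is.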
Judging by the wording in nr. 3, it sounds like you are sending the geometry data to the GPU every frame. You aren't, right?
Kenneth Gorking said:

Judging by the wording in nr. 3, it sounds like you are sending the geometry data to the GPU every frame. You aren't, right?
Huh? :huh: I'm talking about DrawIndexedPrimitive(), SetStreamSource(), DrawSubset(), etc. Are you asking if I load a mesh into video memory every time? If I did that I don't think I'd have a framerate at all :P
Nah, I don't think he's doing that. Have you tried DrawIndexedPrimitive() with D3DPT_TRIANGLESTRIP instead of D3DPT_TRIANGLELIST?

rouncer said:

Nah, I don't think he's doing that. Have you tried DrawIndexedPrimitive() with D3DPT_TRIANGLESTRIP instead of D3DPT_TRIANGLELIST?
Well I already tried the whole thing with converting a mesh to a strip and then using TRISTRIP. Believe it or not it actually made it much slower. Maybe I missed something in that, idk.
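
For anyone following along, this is roughly how a strip's index stream maps back to list triangles. It's only a sketch (the name `stripToList` is made up, and it assumes 16-bit indices and strips stitched together with degenerate triangles), but it shows the trade: a strip needs about one index per triangle instead of three, at the cost of a fixed vertex order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands an indexed triangle strip into an indexed triangle list.
// Every index after the first two adds one triangle; odd triangles have
// two indices swapped so the winding order stays consistent.
std::vector<std::uint16_t> stripToList(const std::vector<std::uint16_t>& strip)
{
    std::vector<std::uint16_t> list;
    if (strip.size() < 3) return list;
    list.reserve((strip.size() - 2) * 3);

    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        const std::uint16_t a = strip[i];
        const std::uint16_t b = strip[i + 1];
        const std::uint16_t c = strip[i + 2];
        if (a == b || b == c || a == c)
            continue;  // degenerate triangle, typically used to join strips
        list.push_back(a);
        if (i % 2 == 0) {
            list.push_back(b);
            list.push_back(c);
        } else {
            list.push_back(c);  // swapped to keep the winding order
            list.push_back(b);
        }
    }
    assert(list.size() % 3 == 0);  // output is always whole triangles
    return list;
}
```

A four-index strip {0, 1, 2, 3} becomes the six-index list {0, 1, 2, 1, 3, 2}.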

Thanks for all the responses so far. What is everyone's take on progressive meshes? Are they treated much the same as regular meshes? How much power does SetNumFaces() take to call?
I used progressive meshes once to do edge collapsing on my high-poly models to make them low-poly, restoring the detail with a normal map.
But these days I'd just do the edge collapsing myself instead of relying on other people's code.

But I never used them for mipping, I'm too lazy - it's not in my important books.

I'm as puzzled as you, starstutter; I don't know why this isn't working for you! :P

Triangle strips are actually worse than triangle lists from a post-transform cache point of view, so that might have something to do with the impaired performance. Can you give the FPS and number of triangles/vertices you got? And you are not using DrawPrimitiveUP() / DrawIndexedPrimitiveUP() functions I guess?

JarkkoL said:

Triangle strips are actually worse than triangle lists from a post-transform cache point of view, so that might have something to do with the impaired performance. Can you give the FPS and number of triangles/vertices you got? And you are not using DrawPrimitiveUP() / DrawIndexedPrimitiveUP() functions I guess?

Nope, no DrawPrimitiveUPs. I did away with those a while ago. Well, I'm not sure how much relevance it holds, but I was drawing 20 high-poly meshes with one other very high vertex mesh. The 20 meshes were each about 57,095 vertices with 113,904 faces, and the room they were in was around 128,288 vertices with 64,052 faces. So the total polycount (num faces) was around 2,342,132 polys (20 x 113,904 + 64,052). Yeah, a bit excessive I know, but that's the point.

The results in FPS though:
Using only the vertex cache optimization: 26 fps
Using triangle strip technique: 22 fps
Can you post up your ID3DXMesh creation code?

If changing the primitive has such a big influence (7ms), then I would guess you are just vertex shader bound. Try to optimize the vertex shader and see if your FPS goes up.
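
In case the 7ms figure looks like it came from nowhere: it is just the two FPS readings converted to frame times. A tiny sketch (the helper name `frameTimeMs` is made up for illustration):

```cpp
#include <cassert>

// Converts a frame rate into the time spent on one frame, in milliseconds.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when going from fpsFast down to fpsSlow.
double frameTimeCostMs(double fpsFast, double fpsSlow)
{
    return frameTimeMs(fpsSlow) - frameTimeMs(fpsFast);
}
```

With the numbers above, `frameTimeCostMs(26.0, 22.0)` is roughly 45.5 ms minus 38.5 ms, so about 7 ms per frame. Comparing frame times rather than FPS is usually more honest, since FPS differences are not linear.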

Goz said:

Can you post up your ID3DXMesh creation code?
will do in a bit

JarkkoL said:

If changing the primitive has such a big influence (7ms), then I would guess you are just vertex shader bound. Try to optimize the vertex shader and see if your FPS goes up.
Umm, sure thing but I highly doubt that. I'm using deferred shading, so there's very very little in the VS.
starstutter said:

Umm, sure thing but I highly doubt that. I'm using deferred shading, so there's very very little in the VS.
It could also be the structure of your vertex data. The GPU can waste a fair amount of time, if it is not aligned on a 32- or 64-byte basis. What is the size of your vertex structure?
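
A quick way to check this at compile time is sketched below. Both vertex layouts are hypothetical examples, not anyone's actual structs from this thread, and `paddedSize` is a made-up helper that shows how much padding a given layout would need to reach the next 32-byte multiple.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts for illustration only.
struct VertexFat {
    float position[3];
    float normal[3];
    float tangent[3];
    float binormal[3];
    float uv[2];
};  // 14 floats = 56 bytes, not a multiple of 32

struct VertexLean {
    float position[3];
    float normal[3];
    float uv[2];
};  // 8 floats = 32 bytes

// Rounds a vertex size up to the next multiple of `boundary` bytes.
inline std::size_t paddedSize(std::size_t size, std::size_t boundary)
{
    assert(boundary > 0);
    return (size + boundary - 1) / boundary * boundary;
}

// Fail the build if someone changes a layout without noticing the size.
static_assert(sizeof(VertexFat) == 56, "VertexFat is expected to be 56 bytes");
static_assert(sizeof(VertexLean) == 32, "VertexLean is expected to be 32 bytes");
```

Here `paddedSize(sizeof(VertexFat), 32)` comes out as 64, so that layout would need 8 bytes of padding (or 24 bytes trimmed) to land on a 32-byte multiple.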

Another thing may be state-switching. Changing textures, or committing changes to vertex constants, between alternating subsets could slow it down. Re-arranging the rendering loop might help this:

    // something like this would be bad
    for each model
        for each subset
            draw

    // better
    for each subset
        for each model
            draw
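
If the models don't all share the same subsets, the same idea can be done by sorting a flat list of draw items by material before drawing. This is only a sketch: `DrawItem`, `makeModelMajor`, `sortByMaterial`, and `countMaterialSwitches` are made-up names, and it assumes one material per subset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call: which model it belongs to and which material it needs.
struct DrawItem {
    int modelId;
    int materialId;
};

// Builds the "bad" ordering: every subset of model 0, then model 1, ...
std::vector<DrawItem> makeModelMajor(int models, int subsetsPerModel)
{
    assert(models >= 0 && subsetsPerModel >= 0);
    std::vector<DrawItem> items;
    for (int m = 0; m < models; ++m)
        for (int s = 0; s < subsetsPerModel; ++s)
            items.push_back(DrawItem{m, s});
    return items;
}

// Groups items that share a material; stable, so model order is kept
// within each material.
void sortByMaterial(std::vector<DrawItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.materialId < b.materialId;
                     });
}

// Counts how often the material changes when drawing the list in order.
std::size_t countMaterialSwitches(const std::vector<DrawItem>& items)
{
    std::size_t switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i)
        if (items[i].materialId != items[i - 1].materialId)
            ++switches;
    return switches;
}
```

For 3 models with 2 subsets each, the model-major order changes material 5 times per frame; after sorting it changes once.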



Kenneth Gorking said:

It could also be the structure of your vertex data. The GPU can waste a fair amount of time, if it is not aligned on a 32- or 64-byte basis. What is the size of your vertex structure?
Now that very well could be a problem. No, it isn't aligned at all. I'll try to align it and get back to you. I'll also try removing the binormal and instead computing that in the VS.
TBH the largest performance improvement for a tiny change I've ever seen was setting the "write only" flag (D3DXMESH_WRITEONLY) on an ID3DXMesh ...
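
For reference, the math for rebuilding the binormal is just a cross product of the normal and tangent, scaled by a handedness sign. The sketch below is plain C++ rather than shader code, and the `Vec3` type, the function names, and the idea of storing the sign alongside the tangent are assumptions for illustration; which sign convention applies depends on how the tangents were generated.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// Standard 3D cross product a x b.
Vec3 cross(const Vec3& a, const Vec3& b)
{
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

// Rebuilds the binormal from the normal and tangent.
// handedness is +1 or -1 and accounts for mirrored UVs.
Vec3 binormalFrom(const Vec3& normal, const Vec3& tangent, float handedness)
{
    assert(handedness == 1.0f || handedness == -1.0f);
    const Vec3 b = cross(normal, tangent);
    return Vec3{b.x * handedness, b.y * handedness, b.z * handedness};
}
```

Dropping the stored binormal saves 12 bytes per vertex at the cost of a few extra vertex shader instructions.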

