0
101 Mar 14, 2008 at 12:36

Hi all,

I need to render into a cubemap, and because my engine is written for DirectX 10, I’d like to understand whether using the geometry shader to render the cubemap in a single pass is really useful (or a waste of time!).

On the one hand, we have the advantage that we perform 1/6th of the rendering calls, but on the other hand, we cannot perform per-object frustum culling, so we need more drawing calls!
OK, we can perform per-triangle frustum culling in the geometry shader, but why stress the vertex and geometry shaders with meshes outside the frustum? I mean, is there a real advantage (that I’m missing) in using single-pass cubemap rendering? (The inefficiency grows if one uses an occlusion culling method, like portal rendering… but even with frustum culling alone, a single pass could probably be slower…) Any opinions and/or reports of experience are really appreciated.

Thanks,

• AGPX

#### 28 Replies

0
101 Mar 14, 2008 at 12:53

Ok,

1) If one uses only frustum culling, then because the six frusta cover the entire space surrounding the viewpoint, all the objects will be issued to the graphics card. So basically, the number of calls is the same (i.e. all the scene objects must be processed).

2) If one uses an occlusion culling algorithm (like portal rendering), then I think that single-pass cube rendering is a waste….

0
101 Mar 14, 2008 at 13:10

@AGPX

1) If one uses only frustum culling, then because the six frusta cover the entire space surrounding the viewpoint, all the objects will be issued to the graphics card. So basically, the number of calls is the same (i.e. all the scene objects must be processed). 2) If one uses an occlusion culling algorithm (like portal rendering), then I think that single-pass cube rendering is a waste….

He’s got a very good point. I would say it all depends on the scene you’re rendering. Like for something like (say) outside, the single pass could save you a lot of time. If you’re inside though, portal culling is an absolute must. Overall I’d say just implement both (I’m assuming you already know how to do the standard version) and choose the technique in realtime based on the object’s surroundings.

In either case though, always make sure you’re rendering front to back. I don’t know if that causes problems in the single pass, but I wouldn’t think so.

0
101 Mar 14, 2008 at 13:30

Hi,

“Like for something like (say) outside, the single pass could save you a lot of time.”

I haven’t understood why a single pass could save a lot of time in an outdoor scenario… can you clarify it for me, please?

Another scenario where drawing in a single pass could be a waste of time:

Suppose you have M objects and that every object has N vertices. OK? Suppose that, on average, we can see M/6 objects on every face of the cube map.
Well, with the classic cube map, you need to transform 6 * M/6 * N = M * N vertices in total.
With the single-pass cube map, you need to transform M * 6 * N vertices, i.e. six times the classic one!

    [maxvertexcount(18)]
    void GSArray( triangle GS_Input input[3], inout TriangleStream<PS_Input> TriStream )
    {
        PS_Input output;

        // Render six times - for each slice of a cubemap
        for( int rt = 0; rt < 6; rt ++ )
        {
            for( int v = 0; v < 3; v ++ )
            {
                output.Pos = mul( input[v].Pos, CubeViewMatrices[rt] );
                output.Pos = mul( output.Pos, CubeProjectionMatrix );
                output.RTIndex = rt;
                output.VPIndex = 0;

                TriStream.Append( output );
            }

            TriStream.RestartStrip();
        }
    }


This code is taken from the DX10FeatureDemo nVidia demo.

So even with frustum culling only, the classic mode should be faster…
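To make the arithmetic above concrete, here is a minimal C++ sketch that just counts the transforms in the post’s own terms. The names `TransformCounts` and `countVertexTransforms` are made up for illustration, and it assumes (as the post does) that each cube face sees M/6 objects on average. It counts work only; it says nothing about how expensive each transform actually is on a given GPU.

```cpp
#include <cstdint>

// Hypothetical helper, not part of any SDK: M = number of objects,
// N = vertices per object, as in the post above.
struct TransformCounts {
    std::uint64_t multipass;   // six passes, each drawing only what its face sees
    std::uint64_t singlePass;  // one pass, every vertex projected for all six faces
};

// Assumption: each face sees M/6 objects on average, so M is taken to be
// a multiple of 6 here to keep the integer division exact.
TransformCounts countVertexTransforms(std::uint64_t M, std::uint64_t N) {
    TransformCounts c;
    c.multipass = 6 * (M / 6) * N;  // = M * N
    c.singlePass = M * 6 * N;       // the six-iteration loop in the geometry shader
    return c;
}
```

For example, `countVertexTransforms(600, 1000)` gives 600,000 transforms for the classic approach against 3,600,000 for the single-pass one.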

• AGPX
0
101 Mar 14, 2008 at 14:50

@AGPX

we have the advantage that we perform 1/6th of the rendering calls, but on the other hand, we cannot perform per-object frustum culling, so we need more drawing calls!

I think you have misunderstood how this works. You don’t save any object rendering calls; the lot still needs to be sent to the hardware. The reason why using a geometry shader to render to a cubemap is fast is that you can do it in a single DrawIndexed call, instead of the six you need for the classical method (i.e. one for each cube face). You also don’t need to switch render-target seven times, which is an expensive operation, and you don’t need to do per-cube-face frustum culling, because everything around the reflected object gets sent to the hardware.

The performance gains far outweigh the fact that you need to transform every vertex an extra six times…

@AGPX

2) If one uses an occlusion culling algorithm (like portal rendering), then I think that single-pass cube rendering is a waste….

Culling stuff that cannot be seen will never be a waste :)

0
101 Mar 14, 2008 at 14:59

@AGPX

I haven’t understood why a single pass could save a lot of time in an outdoor scenario… can you clarify it for me, please?

Take for instance a game like Crysis. You’ve got a lot of geometry to pass (mostly in the form of billboards or plants) and the trees surrounding you don’t have a way of completely blocking other trees from your view. That pretty much eliminates the possibility of occlusion culling, and portal culling isn’t even applicable in that scenario. So, since you can’t cull anyway, you might as well just pass all the geometry into the GPU once rather than initializing it 6 times.

I don’t know about the raw performance of this technique on current hardware. It seems as though a lot of things the DirectX team comes up with that are supposed to provide speed improvements actually end up making things slower… somehow. ;)

0
101 Mar 14, 2008 at 15:30

Kenneth Gorking:

Culling stuff that cannot be seen will never be a waste

I mean that you can’t exploit occlusion culling algorithms (like portals) to accelerate single-pass cubemap rendering.

I think you have misunderstood how this works. You don’t save any object rendering calls; the lot still needs to be sent to the hardware. The reason why using a geometry shader to render to a cubemap is fast is that you can do it in a single DrawIndexed call, instead of the six you need for the classical method (i.e. one for each cube face).

By “rendering calls”, I mean “DrawIndexed” calls. You need M calls to DrawIndexed in the single-pass version (where M is the total number of objects in the scene). You need M1+M2+M3+M4+M5+M6 calls to DrawIndexed in the multipass version. Thanks to frustum culling, M1+M2+M3+M4+M5+M6 is roughly equal to M on average (it could be bigger than M because some objects could be visible in more than one face of the cube). Anyhow, the total number of DrawIndexed calls needed is far less than 6*M in the multipass version.
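The three quantities compared above (M, M1+…+M6 and 6*M) can be sketched in a few lines of C++. This is only an illustration: `DrawCallCounts` and `countDrawCalls` are hypothetical names, and the per-object face masks are assumed to come from whatever frustum culling the engine already does.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping struct, for illustration only.
struct DrawCallCounts {
    std::size_t singlePass;       // M: one DrawIndexed per object
    std::size_t multipassCulled;  // M1 + ... + M6: one per (object, visible face) pair
    std::size_t multipassNaive;   // 6 * M: every object drawn for every face
};

// faceMasks[i] has bit f set when object i survives frustum culling
// for cube face f (f = 0..5). How the masks are produced is up to the engine.
DrawCallCounts countDrawCalls(const std::vector<std::uint8_t>& faceMasks) {
    DrawCallCounts c{faceMasks.size(), 0, 6 * faceMasks.size()};
    for (std::uint8_t mask : faceMasks) {
        // Number of faces this object is visible in.
        c.multipassCulled += std::bitset<6>(mask).count();
    }
    return c;
}
```

With four objects where only one straddles two faces, this gives 4 calls for single pass, 5 for culled multipass and 24 for multipass without culling.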

starstutter:

So, since you can’t cull anyway, you might as well just pass all the geometry into the GPU once rather than initializing it 6 times.

As explained before, you don’t need to initialize it 6 times. Anyway, to understand this better, I will try to compute how much geometry (for an outdoor scenario) the six frusta see in total, to figure out how many draw calls are necessary on average with the multipass version. I expect a number close to M (the total number of objects).

• AGPX
0
101 Mar 14, 2008 at 15:36

Kenneth Gorking:

You also don’t need to switch render-target seven times, which is an expensive operation

Ok, this is a possible advantage of the single pass rendering…

0
101 Mar 14, 2008 at 15:41

@AGPX

As explained before, you don’t need to initialize it 6 times.

I actually just worded that wrong. What I meant to say was it becomes possible to just make 1 DrawIndexedPrimitive call, rather than 6, which would be the minimum with the 6 render method.

When I say initialized, I meant initialized as in passed to the GPU for drawing. I know that’s not the proper use for the word, but hey who says technology developers need to be technical :)

0
101 Mar 14, 2008 at 16:13

I actually just worded that wrong. What I meant to say was it becomes possible to just make 1 DrawIndexedPrimitive call, rather than 6, which would be the minimum with the 6 render method.

Yes, I understood that you meant the DrawIndexedPrimitive call. But what I am trying to say is that you DON’T need 6 times the DrawIndexedPrimitive calls for the 6 render method (because frustum culling reduces it). Let me clarify what I meant with the following drawing:

In this picture you have 4 frusta. You need 19 DrawIndexed calls (instead of 18 in the single pass), but the point is that you don’t need 72 (= 4*18) DrawIndexed calls. In general you have more DrawIndexed calls than with a single pass, but statistically this number is comparable with the total number of objects.

0
101 Mar 14, 2008 at 16:41

OH!

Now I see why I was so confused. Although now that you mention this scenario, I can tell you specifically what I was thinking. My mind was still on Crysis, where there is tons of billboarded vegetation, which can all be handled in a single draw pass (as opposed to using one call per object). You have to unload some work onto the CPU, but I actually like that because it typically improves framerates provided that you’re not already doing something CPU expensive (e.g. shadow volumes).

Now that I think about it though, you really just might want to stay with the 6 render method because of one optimization I failed to mention… reflection culling.

Ok, ok, so I kind of just made up that term, but here’s the idea behind it:
you cannot see a reflection on a mirror whose normal is aligned with your angle of view.

From this, you may be able to cut up to 2 drawing calls depending on the nature of the object. One consequence might be artifacts if the player happens to see part of a side that is not being drawn, but as a preventative, you could clear the render targets with partial transparency and fade out between the two. It’s doubtful that this would make a real visual difference.

0
101 Mar 14, 2008 at 17:24

you cannot see a reflection on a mirror whose normal is aligned with your angle of view. From this, you may be able to cut up to 2 drawing calls depending on the nature of the object.

Sounds interesting… but I haven’t caught the idea; can you explain this optimization further, please? Thanks in advance.

0
101 Mar 14, 2008 at 18:17

@AGPX

Sounds interesting… but I haven’t caught the idea; can you explain this optimization further, please? Thanks in advance.

The optimization comes from the old and simple concept… don’t draw what you can’t see. Now that gets a tad bit more complicated when 2 cameras are thrown in.

The other concept is that you cannot see a surface that is facing away from you.

Imagine for a moment that you were rendering a mirror (this is not a mathematically correct method, but…): you have a camera attached to the front of it, and everything it sees, it draws onto the surface of the mirror. Now imagine that the mirror is turned away from you and you can no longer see the shiny surface, and therefore nothing that is being reflected by the mirror.

At that point, why on earth would you draw the scene from the mirror’s perspective (the reflection) if you can’t see the reflection at all?

So, (and I’m not totally sure about the exact math functions for this, but I have a very dirty implementation of it) all you need to do is determine (per reflective object) which faces of the cube the camera cannot see.

Now for a visual representation without resorting to MSpaint… ;)

.

       N
       |
    S--·--E        <) = camera      O = object
       |
       W

    <)        O                    do not draw east side

              O      (>            do not draw west side


Unfortunately, I can’t give you any accurate math to use, simply because mine is based on 2D angles, i.e. only the side reflections can be occluded. It’s not the optimal way to do it, but assuming you have a character who stays on the ground, it usually works just as well.
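For what it’s worth, the “reflection culling” idea can be sketched in 3D with a plain dot product. This is only a rough heuristic under stated assumptions, not the poster’s actual implementation: `Vec3`, `facesToRender`, the face bit order and the `cosThreshold` parameter are all invented for illustration. A face is skipped when its axis points almost straight away from the camera; on a curved reflector that face can still show up at grazing angles near the silhouette, which is the artifact mentioned earlier.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical minimal vector type

// Returns a 6-bit mask of cube faces worth rendering, with the assumed bit
// order +X, -X, +Y, -Y, +Z, -Z. A face is dropped when the cosine between
// its axis and the camera-to-object direction exceeds cosThreshold.
unsigned facesToRender(const Vec3& camera, const Vec3& object, float cosThreshold) {
    Vec3 d{object.x - camera.x, object.y - camera.y, object.z - camera.z};
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (len == 0.0f) return 0x3Fu;  // camera sits on the object: keep every face
    d = {d.x / len, d.y / len, d.z / len};

    const std::array<Vec3, 6> axes = {{
        {1.0f, 0.0f, 0.0f}, {-1.0f, 0.0f, 0.0f},
        {0.0f, 1.0f, 0.0f}, {0.0f, -1.0f, 0.0f},
        {0.0f, 0.0f, 1.0f}, {0.0f, 0.0f, -1.0f}}};

    unsigned mask = 0;
    for (unsigned f = 0; f < 6; ++f) {
        const float alignment = axes[f].x * d.x + axes[f].y * d.y + axes[f].z * d.z;
        if (alignment <= cosThreshold) mask |= 1u << f;  // keep faces not facing away
    }
    return mask;
}
```

With a threshold of about 0.7, at most two faces can be dropped at once, which matches the “up to 2 drawing calls” estimate: a camera due west of the object drops only the east (+X) face, while a camera on the diagonal drops two.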

0
140 Mar 14, 2008 at 19:16

BTW starstutter: you can use the [code]…[/code] tags to do ASCII art if you want. HTML in general (not just the forum) ignores multiple consecutive spaces. The code tags keep formatting (except for removing initial and final whitespace) and they use a monospaced font.

0
101 Mar 14, 2008 at 20:17

awesome, thanks

0
101 Mar 15, 2008 at 09:19

Hi starstutter,

thanks for the nice explanation. And nice ASCII art too. ;)

Cheers

0
101 Mar 15, 2008 at 09:44

AGPX, I think your reference image is not really what generally happens in practice, unless you are rendering for a space simulator or such. My gut feeling is that there are many more objects spanning multiple cubemap faces in, let’s say, FPS-type games.

Anyway, what you have to do for each object is check which of the 6 FOVs it falls into, regardless of which approach you take. Of course you could skip the FOV check in the GS approach, but I believe it would be worth doing anyway to reduce the GPU load instead of always duplicating vertices to the 6 faces. Once you know which faces the object falls into, in the single-pass geometry shader approach you set up the GS to duplicate the triangles only to those faces. The performance advantage is that you only call DrawPrimitive once for each object in any case (CPU savings) and execute the vertex shader only as many times as you would when rendering a single face (GPU savings).
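The per-object FOV check described above could look roughly like the following C++ sketch. It assumes a bounding sphere per object, expressed relative to the cubemap centre, and tests it conservatively against the four side planes of each 90-degree face frustum (near and far planes are ignored). `Sphere`, `cubeFaceMask` and the bit order are hypothetical; how the resulting mask is handed to the geometry shader (for instance as a per-object constant) is likewise an assumption left open here.

```cpp
#include <cmath>

// Hypothetical bounding sphere, centre given relative to the cubemap origin.
struct Sphere { float x, y, z, radius; };

// Returns a 6-bit mask, assumed bit order +X, -X, +Y, -Y, +Z, -Z.
// The +X frustum is the region x >= |y| and x >= |z|; its side planes have
// normals like (1, -1, 0), of length sqrt(2), hence the scaled radius below.
// The test is conservative: it may report a face the sphere only nearly touches.
unsigned cubeFaceMask(const Sphere& s) {
    const float c[3] = {s.x, s.y, s.z};
    const float slack = s.radius * std::sqrt(2.0f);
    unsigned mask = 0;
    for (int axis = 0; axis < 3; ++axis) {
        const float a = c[axis];
        const float b = c[(axis + 1) % 3];
        const float d = c[(axis + 2) % 3];
        const float spread = std::fmax(std::fabs(b), std::fabs(d));
        if ( a - spread >= -slack) mask |= 1u << (2 * axis);      // positive face
        if (-a - spread >= -slack) mask |= 1u << (2 * axis + 1);  // negative face
    }
    return mask;
}
```

An object straight down the +X axis gets mask 1 (one face), one sitting on the edge between +X and +Y gets mask 5 (two faces), and an object enclosing the cubemap centre gets all six bits.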

In any case, with the GS approach you will always have less overhead, so I don’t see why you wouldn’t do it. I haven’t used the GS myself though, so there might be some things I’m overlooking.

0
101 Mar 15, 2008 at 16:59

You should check out the CubeMapGS sample in the DXSDK; you can toggle between the standard and GS approaches. The speed difference is quite noticeable.

There is also a sample in the ATI SDK that does global illumination in 8 passes using the geometry shader. If they had done it the “old” way, it would have taken 240 passes, so the savings are quite massive.

0
101 Mar 15, 2008 at 18:07

You should check out the CubeMapGS sample in the DXSDK; you can toggle between the standard and GS approaches. The speed difference is quite noticeable.

Yes, it is QUITE noticeable:

Recompiling it with:

g_bUseRenderTargetArray = true  // singlepass
g_bRenderWithInstancing = true


Recompiling it with:

g_bUseRenderTargetArray = false  // multipass
g_bRenderWithInstancing = false


I obtain 42fps … :happy:

My system:

video card: nVidia Geforce 8600m GT 512 Mb
CPU: Intel Core2 Duo 6600 2,4 Ghz
Memory: 2 GB

0
101 Mar 15, 2008 at 19:17

Nice try. Try it without disabling instancing ;)

On my system:
singlepass: 51 fps
multipass : 28 fps

Video card: nVidia Geforce 8600 GS 256 Mb
CPU: Intel Core2 Duo E4500 2.2 Ghz
Memory: 3 GB

I’m beginning to think something might be wrong with your system. Are you using the latest drivers?

0
101 Mar 15, 2008 at 20:30

But… rendering WITH instancing should also improve the frame rate… so I don’t see how turning off the render target array AND turning off the instancing would give a better frame rate.

0
101 Mar 16, 2008 at 02:23

@AGPX

video card: nVidia Geforce 8600m GT 512 Mb
CPU: Intel Core2 Duo 6600 2,4 Ghz
Memory: 2 GB

heh… just to humor myself… would you happen to own a Toshiba laptop?

0
101 Mar 16, 2008 at 02:26

@Kenneth Gorking

Nice try. Try it without disabling instancing ;)

ummm, goz has a point. I think you slightly missed what he was trying to tell you.

He tried single pass WITH instancing and it was about a third as fast as multipass WITHOUT instancing. Therefore multipass is way faster. And the speed boost would be even more with instancing on.

0
140 Mar 16, 2008 at 03:07

But when Kenneth tried it, he got a significant speed boost when using single-pass. Which is what’s *supposed* to happen. The fact that AGPX got the opposite result on the same sample code and a very similar video card makes us think there is a problem with his driver.

In any case trying to compare two variables (single vs multipass and instancing) at the same time just makes things complicated. What AGPX should do is leave instancing on and ONLY flip between single-pass and multipass.

0
101 Mar 16, 2008 at 11:24

Arrgghhh!

When instancing is on, my performance drops horribly!!! :blink:

Yes, I have a notebook. Not a Toshiba, but an ASUS… ASUS C90S. I think that my video drivers have serious problems, especially when it comes to performance. The latest video card driver ASUS released for the C90S is nVidia version 101.17, which is quite old…… :no:

0
101 Mar 16, 2008 at 12:23

Ok guys,

I have found and installed a newer driver: 167.58 (unofficial, but Microsoft signed). Performance is now far better for many examples (up to 400%!! :surprise:).

Anyhow, here are the results on the CubeMapGS sample (for the sphere):

Singlepass (g_bUseRenderTargetArray = true):

62 fps = instancing
18 fps = no instancing

Multipass (g_bUseRenderTargetArray = false):

24 fps = instancing
103 fps = no instancing (!)

So basically, multipass with no instancing gives me the best performance… I’m still confused… could somebody try the four cases and show me the framerates, please?

Thanks,

• AGPX
0
101 Mar 16, 2008 at 17:48

Yeah the instancing thing was a brain fart on my behalf, I realized that after I had left home and couldn’t edit it :)

If the regular nVidia drivers won’t install on your notebook, then try the omega drivers.

Edit: Hadn’t seen that last post, sorry. Those figures do not make sense… You apparently have a gfx card from the twilight zone :)

0
101 Mar 16, 2008 at 19:45

Twilight zone? What is that?

0
140 Mar 16, 2008 at 23:10

AGPX: it was a joke, see this.