

A few questions about shaders


7 replies to this topic

#1 starstutter

    Senior Member

  • Members
  • 1039 posts

Posted 10 April 2008 - 03:00 PM

I have quite a few questions about shaders concerning performance and quality. Feel free to answer whichever ones you know. Much appreciated in advance, thanks:

1. In shaders, does splitting things into separate functions degrade speed at all? It's in Effect files, if that makes any difference, i.e.:

shader render()
{
     do everything;
}

vs.-----

shader render()
{
     apply_cube_map();
     do_normal_mapping();
     etc.
}


2. Is it better to try to do everything in one pass rather than use multiple passes to get the same quality? E.g., if you do normal mapping and cube mapping in the same shader, you get bump-mapped environment mapping in one pass instead of having to do the normal mapping twice. I honestly don't know which is faster. What about in practical situations (e.g. a large-scale scene with many different materials)?

3. I can't seem to get dynamic branching to work correctly. It always executes both code paths no matter what I do (even with the earliest rejection possible). Could this be a driver problem, a card problem, or am I missing something? I'm using normal 'if' statements and conditions set by the application.
(\__/)
(='.'=)
This is Bunny. Copy and paste bunny into
(")_(") your signature to help him gain world domination.
bunny also wants to fight spam: Click Here Bots!

#2 starstutter

    Senior Member

  • Members
  • 1039 posts

Posted 10 April 2008 - 03:18 PM

Oh, and by the way, to clear up the single- vs. multi-pass issue: it's using SM 3, and I have tried implementing both with multiple lights, all using normal mapping.

Strangely enough, it actually seemed to go faster when I rendered all the normal mapping three times over rather than once with multiple lights in the same shader.

#3 Kenneth Gorking

    Senior Member

  • Members
  • 911 posts

Posted 10 April 2008 - 11:08 PM

1. Functions are always inlined. I think there has been some talk about allowing actual 'call' opcodes for GPUs, but I do not know if they exist yet.

2. It pretty much depends on what you are doing, but for the example you described, the first option will be faster because it only results in one texture lookup to get the normal. Just keep in mind that you want to keep your shaders as simple as possible while still getting the job done.

For large scenes, you can do a depth-first pass, which just lays down the depth. Then in the following passes you can execute whatever shader you want, and because of the depth-first pass, only visible pixels will be evaluated.
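To make the saving concrete, here is a rough CPU-side sketch in C++ that just counts how often the expensive pixel shader would run with and without a depth pass. Everything in it (`Fragment`, `shade_without_prepass`, `shade_with_prepass`) is a made-up name for illustration, and it assumes a plain less-than depth test followed by an equal test in the second pass. Real hardware rejects hidden pixels in its own way, so treat it as a toy model only.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One fragment: a covered pixel index plus its depth (smaller = closer).
// Hypothetical type for this sketch, not part of any graphics API.
struct Fragment {
    std::size_t pixel;
    float depth;
};

// Draws fragments in submission order with a less-than depth test and
// returns how many times the (expensive) pixel shader would run.
std::size_t shade_without_prepass(const std::vector<Fragment>& frags,
                                  std::size_t pixel_count) {
    std::vector<float> zbuf(pixel_count, std::numeric_limits<float>::infinity());
    std::size_t shaded = 0;
    for (const Fragment& f : frags) {
        assert(f.pixel < pixel_count);
        if (f.depth < zbuf[f.pixel]) {  // passes the depth test, so it is shaded
            zbuf[f.pixel] = f.depth;
            ++shaded;
        }
    }
    return shaded;
}

// Pass 1 writes depth only (no shading). Pass 2 redraws with an equal test,
// so only the front-most fragment of each pixel is shaded.
std::size_t shade_with_prepass(const std::vector<Fragment>& frags,
                               std::size_t pixel_count) {
    std::vector<float> zbuf(pixel_count, std::numeric_limits<float>::infinity());
    for (const Fragment& f : frags) {
        assert(f.pixel < pixel_count);
        if (f.depth < zbuf[f.pixel]) zbuf[f.pixel] = f.depth;
    }
    std::size_t shaded = 0;
    for (const Fragment& f : frags) {
        if (f.depth == zbuf[f.pixel]) ++shaded;
    }
    return shaded;
}
```

The worst case for the version without the depth pass is geometry submitted back to front, where every fragment passes the depth test and is shaded even though most of them end up hidden.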

Another option is to use one shader for many materials. I remember an old NVIDIA sample that did two different BRDFs on a statue in the same shader in one pass.

In the end, I would recommend experimenting with it.

3. This is most likely a compiler optimization to avoid excessive branching in small 'if'-blocks. Branching in shader code is costly, because the GPU processes multiple pixels at once. If just one of these takes the 'if' branch, all the other processors have to wait for it to finish before they can continue.

You should only use branching for large 'if'-blocks.
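A toy cost model in C++ may help show why this matters. It assumes pixels run in fixed-size lockstep groups and that a group pays for the 'if'-block whenever any of its pixels enters it. The function name `lockstep_cost`, the group size and the cost figures are all invented for illustration and do not describe any particular GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Cost, in abstract "instruction slots", of running a shader with one
// if-block over pixels that execute in lockstep groups.
// takes_branch[i] says whether pixel i enters the if-block.
std::size_t lockstep_cost(const std::vector<bool>& takes_branch,
                          std::size_t group_size,
                          std::size_t branch_cost,
                          std::size_t common_cost) {
    assert(group_size > 0);
    std::size_t total = 0;
    for (std::size_t start = 0; start < takes_branch.size(); start += group_size) {
        const std::size_t end = std::min(start + group_size, takes_branch.size());
        // If any pixel in the group takes the branch, the whole group
        // spends the time for it; the other pixels simply wait.
        const bool any_taken = std::any_of(takes_branch.begin() + start,
                                           takes_branch.begin() + end,
                                           [](bool b) { return b; });
        total += common_cost + (any_taken ? branch_cost : 0);
    }
    return total;
}
```

In this model, two stray pixels that land in different groups cost more than four pixels that sit together in one group, which is why a branch only tends to pay off when the block is large and the condition is coherent across neighbouring pixels.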
"Stupid bug! You go squish now!!" - Homer Simpson

#4 starstutter

    Senior Member

  • Members
  • 1039 posts

Posted 11 April 2008 - 12:53 AM

thanks for the response.

Kenneth Gorking said:

1. Functions are always inlined. I think there has been some talk about allowing actual 'call' opcodes for GPUs, but I do not know if they exist yet.
Yeah, I kind of worded that funny. What I meant was...



OUTPUT CameraPS( INPUT IN )
{
     OUTPUT OUT;

     OUT.color = find_color(input stuff goes here);

     return OUT;
}

float4 find_color(input stuff)
{
     do calculations;

     return calculations;
}



That's a more literal translation of what I meant. It never goes out of the shader, it just goes out of what would be considered Main().


Kenneth Gorking said:

2. It pretty much depends on what you are doing, but for the example you described, the first option will be faster because it only results in one texture lookup to get the normal. Just keep in mind that you want to keep your shaders as simple as possible while still getting the job done.
I was really more wondering about the multi vs single pass lighting. I experimented a while back with a fairly small scene. One method used 3 lights in the same pixel shader making no repeat calculations. The other method redrew the polygons 3 times. The visual results were the same, but the multipass version ran almost 3x faster. I'm just convinced I'm doing something wrong with that. It doesn't make a bit of sense.

Kenneth Gorking said:

For large scenes, you can do a depth-first pass, which just lays down the depth. Then in the following passes you can execute whatever shader you want, and because of the depth-first pass, only visible pixels will be evaluated.
wait... do you mean rendering the depth (z-write) to the main render target (or back buffer I guess), then just drawing over it with subsequent passes? There's no shader work involved right? That's pretty clever :D

Kenneth Gorking said:

3. Branching in shader code is costly, because the GPU processes multiple pixels at once. If just one of these takes the 'if' branch, all the other processors have to wait for it to finish before they can continue.
But that's the thing: none of them are supposed to execute the complicated path at all. The way I disable it is:



bool complex;

OUTPUT CameraPS( INPUT IN )
{
     OUTPUT OUT;

     if (complex == true)
     {
          do complex things;
     }

     do regular things;

     return OUT;
}




and I enable/disable it from the application via Effect->SetBool(), changing it per object. I thought that might have been the problem, but I left it off entirely and got the same results. :(

#5 Reedbeta

    DevMaster Staff

  • Administrators
  • 4979 posts
  • LocationBellevue, WA

Posted 11 April 2008 - 04:33 AM

starstutter said:

It never goes out of the shader, it just goes out of what would be considered Main().

I think Kenneth understood what you meant. What he's saying is that there is not really any such thing as a function call on a GPU. In your example, the shader compiler would just insert the code for find_color directly into CameraPS at the point where find_color is called. So, there's no performance penalty for splitting things into their own functions, but nor is there any benefit in compiled code size for doing so.

Quote

wait... do you mean rendering the depth (z-write) to the main render target (or back buffer I guess), then just drawing over it with subsequent passes? There's no shader work involved right? That's pretty clever :)

No shader work involved in the z-only pass. In fact, I've heard that many modern GPUs can write 2 pixels per clock in z-only mode, so it's really, really fast.

Quote

The way I disable it is: ... and I enable/disable it through the application via Effect->SetBool() and change it per object.

I think what you want in this case is for the shader compiler to actually create two different versions of the shader, one for complex = true and one for complex = false. You probably don't want this to be implemented as an actual branch in the pixel shader since there is a cost associated with that, even if it is highly coherent in screen space (due to being set on a per-object basis). I'm not sure exactly what shader language this is, but if it's Cg, I know you can create an effect file that has two different techniques, one of which compiles the shader with the boolean on and the other with the boolean off. Then you can switch between the techniques in the application. I'm not sure if it will do the same thing if you have a global boolean that you set like a parameter.
reedbeta.com - developer blog, OpenGL demos, and other projects

#6 Kenneth Gorking

    Senior Member

  • Members
  • 911 posts

Posted 11 April 2008 - 05:08 AM

Reedbeta said:

I think what you want in this case is for the shader compiler to actually create two different versions of the shader, one for complex = true and one for complex = false. You probably don't want this to be implemented as an actual branch in the pixel shader since there is a cost associated with that, even if it is highly coherent in screen space (due to being set on a per-object basis). I'm not sure exactly what shader language this is, but if it's Cg, I know you can create an effect file that has two different techniques, one of which compiles the shader with the boolean on and the other with the boolean off. Then you can switch between the techniques in the application. I'm not sure if it will do the same thing if you have a global boolean that you set like a parameter.
It is possible with a global parameter; you just need to mark it as 'literal' (i.e. a compile-time constant) using cgSetParameterVariability(paramComplex, CG_LITERAL). Just make sure you mark it after you have loaded the shader and before you compile it.
For example:

    load shader
    mark 'complex' literal
    set 'complex' = true
    compile shader 1
    set 'complex' = false
    compile shader 2

That should produce 2 different shaders.
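The steps above can be sketched in C++ as follows. This is only a model of the idea, not Cg code: `CompiledShader`, `compile_with_literal` and `ShaderPair` are hypothetical stand-ins, and the instruction counts are made up. It assumes the compiler, once 'complex' is a compile-time constant, either keeps the whole block or drops it and emits no branch in either case.

```cpp
// Hypothetical stand-in for a compiled shader; not a Cg or D3DX type.
struct CompiledShader {
    bool has_complex_path;   // whether the 'complex' block survived compilation
    int instruction_count;   // made-up cost figure for illustration
};

// Hypothetical stand-in for the compiler: with 'complex' treated as a
// compile-time constant, the if-block is either kept whole or dropped.
CompiledShader compile_with_literal(bool complex) {
    const int regular_instructions = 20;  // assumed size of "do regular things"
    const int complex_instructions = 60;  // assumed size of "do complex things"
    return CompiledShader{
        complex,
        regular_instructions + (complex ? complex_instructions : 0)};
}

// Holds both variants, compiled once at load time.
struct ShaderPair {
    CompiledShader complex_on = compile_with_literal(true);
    CompiledShader complex_off = compile_with_literal(false);

    // Per-object selection happens on the CPU, before the draw call.
    const CompiledShader& select(bool object_is_complex) const {
        return object_is_complex ? complex_on : complex_off;
    }
};
```

The application would then call something like `shaders.select(object_is_complex)` before each draw, so the choice is made once per object on the CPU rather than once per pixel on the GPU.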

#7 JarkkoL

    Senior Member

  • Members
  • 467 posts

Posted 12 April 2008 - 04:36 PM

starstutter, if you set the bool for branching per object, you are doing static branching, not dynamic branching. Dynamic branching means that the branch is evaluated per pixel/vertex. With static branching, the driver can generate an optimized version of the shader at run time in which the branches not taken are eliminated completely. This essentially gives the same performance as if you had written different versions of the shader yourself and set the proper one before each draw call. How static branching is implemented is up to the driver, though, as is whether you take any performance hit from using it versus writing specialized shaders.

#8 starstutter

    Senior Member

  • Members
  • 1039 posts

Posted 13 April 2008 - 03:55 PM

Kenneth Gorking said:

It is possible with a global parameter, you just need to mark it as 'literal' (ie. compile-time constant) using cgSetParameterVariability(paramComplex, CG_LITERAL).
    load shader
    mark 'complex' literal
    set 'complex' = true
    compile shader 1

Hmmm, do you know of an HLSL equivalent?

also, can it *really* be switched between each object, or is it between each material switch? Like:


complex = true;

shader->begin();

draw_objects();

shader->end();


or can it really be...


shader->begin();

complex = true;

draw_some_objects();

complex = false;

draw_other_objects();

shader->end();


and the second option would have the same speed benefits?




