Shader Permutations hell :D

SuperPixel 101 Mar 20, 2012 at 15:00

Hi all,

the topic is quite clear and nothing new I guess :P.

So I’ll keep it straight: my first idea was to code a predefined set of shaders and let the person making materials select from them (so no node-based approach like the Unreal material editor or Maya Hypershade).
I was hoping to avoid not only the explosion of shaders that results from node-based approaches, but also the fact that shaders generated that way tend to be poorly optimized, or at least less maintainable in the long run.
The idea is to let the engine programmers write the shaders and optimize them, so a typical technical artist is only allowed to create materials from a given set of shaders. I think that is the approach of the Crytek material editor, and Naughty Dog’s too: a few optimized shaders and many materials using them.
I’m realizing now that if I write by hand, for example, bump mapping with no specular map (e.g. for a rock material, or in general for a material that has no specular properties), then when I have to write the shader that has a specular map I have to duplicate all the code, since the two differ only by one texture: the specular map. It’s a waste of time, and I’m wondering whether I’m forced to build the usual uber shader or whether there is a better and faster way to generate shaders, given that most of the code is shared.
At the moment I have to write a set of functions and classes in the engine for each new shader … not fast at all if I want to add a new shader.

What do you guys suggest?

I was thinking of some kind of shader cache that is loaded upfront etc …

Thanks in advance for any reply

22 Replies

Reedbeta 167 Mar 20, 2012 at 17:11

For the engine stuff I would try to go with a more data-driven approach, so that you don’t have to add functions and classes for every shader, but have some more generic shader code that can read some metadata attached to the shader (e.g. in annotations or comments in the effect file) to tell your engine whatever it needs to know about the shader.

As for permutations in general, ubershaders are a valid approach. You can write a kind of preprocessor to generate all the technique definitions for the different combinations for you (ideally limiting it to only the combinations you actually use in your game), to save you the work of typing and maintaining those yourself. Again, having some metadata in the effect file can be helpful here.
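A rough sketch of the "only the combinations you actually use" part, in Python. The material table and feature names here are made up for illustration; a real tool would presumably read them from the material files.

```python
# Hypothetical material data: each material lists the shader features it enables.
MATERIALS = {
    "rock":  {"NORMAL_MAP"},
    "metal": {"NORMAL_MAP", "SPECULAR_MAP"},
    "brick": {"NORMAL_MAP"},
    "glass": {"SPECULAR_MAP", "ENV_MAP"},
}


def used_permutations(materials):
    """Return the distinct feature sets in use, as sorted tuples."""
    unique = {tuple(sorted(features)) for features in materials.values()}
    return sorted(unique)


def technique_name(features):
    """Build a readable technique name such as 'Tech_NORMAL_MAP_SPECULAR_MAP'."""
    return "Tech_" + "_".join(features) if features else "Tech_Base"


if __name__ == "__main__":
    for perm in used_permutations(MATERIALS):
        print(technique_name(perm))
```

Four materials collapse to three distinct permutations here, so only three techniques would need to be generated and compiled.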

Another approach is to write a library of shader functions to implement each feature, like ApplyBumpMap(), ApplySpecularMap(), etc., then write a generator that strings together a bunch of these functions to make a pixel shader. The functions would have to have a defined order and operate on some common data model for this to work. I haven’t tried this approach myself but it seems feasible.
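As a minimal sketch of such a generator, assuming a hypothetical function library (shader_lib.fxh, Surface, InitSurface, Shade are all invented names, and the Apply functions are assumed to modify the surface in place):

```python
# Fixed order in which features are applied. The function names follow the
# ApplyBumpMap()/ApplySpecularMap() idea, but the library itself is hypothetical.
FEATURE_ORDER = ["bump", "specular"]
FEATURE_CALLS = {
    "bump": "ApplyBumpMap(surf, uv);",
    "specular": "ApplySpecularMap(surf, uv);",
}


def generate_pixel_shader(features, entry="PSMain"):
    """Emit shader source that calls the selected feature functions in order."""
    unknown = set(features) - set(FEATURE_CALLS)
    if unknown:
        raise ValueError("unknown features: %s" % sorted(unknown))
    lines = [
        '#include "shader_lib.fxh"  // hypothetical function library',
        "float4 %s(float2 uv : TEXCOORD0) : COLOR0" % entry,
        "{",
        "    Surface surf = InitSurface(uv);  // common data model",
    ]
    for name in FEATURE_ORDER:
        if name in features:
            lines.append("    " + FEATURE_CALLS[name])
    lines += ["    return Shade(surf);", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_pixel_shader({"bump", "specular"}))
```

The fixed FEATURE_ORDER list is what enforces the "defined order" requirement, regardless of the order in which a material lists its features.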

In D3D11 there’s also shader dynamic linking, but I don’t know much about it. There are some docs on MSDN here.

SuperPixel 101 Mar 20, 2012 at 17:27

When you talk about an effect file, do you mean an actual D3D *.fx file, or is it just a general way to refer to a shader? I don’t use the fx framework at all …

Reedbeta 167 Mar 20, 2012 at 17:30

I meant an actual .fx file. I think it’s cleaner to have all the associated shaders (vertex, pixel, etc.) in one file, as that system does.

If you work with bare shaders alone in their own files then you even more badly need some kind of metadata-driven approach to keep them all organized. :)

SuperPixel 101 Mar 20, 2012 at 17:34

@Reedbeta

Is it still possible to have some metadata (like annotations) in a bare shader?

I currently have the pixel and vertex shader in one file, and since the engine is multiplatform I’d like to stay decoupled from the fx framework.

Could you provide an example of a use case for your approach? Metadata in the shader, etc.?

thanks

Reedbeta 167 Mar 20, 2012 at 17:55

I don’t know if annotations are supported in bare shaders. Probably not, though. You could still use comments with a specific format that your tools could parse, something like

// [Metadata]
// RenderPass = Opaque
// Textures = Diffuse, Normal, Specular
// ShadowShader = shared_shadow_shader.fx

Just making up some stuff as an example, but you get the idea of the kind of things you might store in there.
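A tool could pull that block out with something like the following Python sketch. The parsing rules (the block ends at the first non-comment line, and comma-separated values become lists) are my own assumptions, not an established format.

```python
def parse_shader_metadata(source):
    """Parse a '// [Metadata]' comment block into a dict.

    Values containing commas become lists of strings.
    """
    metadata = {}
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line.startswith("//"):
            if in_block:
                break  # the block ends at the first non-comment line
            continue
        text = line[2:].strip()
        if text == "[Metadata]":
            in_block = True
        elif in_block and "=" in text:
            key, value = (part.strip() for part in text.split("=", 1))
            if "," in value:
                value = [item.strip() for item in value.split(",")]
            metadata[key] = value
    return metadata


EXAMPLE = """\
// [Metadata]
// RenderPass = Opaque
// Textures = Diffuse, Normal, Specular
// ShadowShader = shared_shadow_shader.fx
float4 main() : COLOR0 { return 1; }
"""

if __name__ == "__main__":
    print(parse_shader_metadata(EXAMPLE))
```

The engine can then decide the render pass, texture slots and so on from the returned dict instead of from a hand-written class per shader.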

SuperPixel 101 Mar 22, 2012 at 08:41

@Reedbeta

What would the chain be, from production to the actual in-engine compilation of a given shader?

I mean:

1) An artist creates a material, and this material has different options set, etc.
2) Then these options become defines in the uber shader?…

Does static branch removal mean that if I have:

if( bHasSpecularMap ) I’ll replace it with #ifdef HAS_SPECULAR_MAP ?

I’ve read in the fxc compiler documentation that I can define macros to pass to the shader compiler, but they don’t look like #define WHATEVER (I mean, they don’t seem to be preprocessor macros).

Plus, one last question that is tormenting me: why can’t we get away with a fixed number of shaders? Say the shaders needed to represent all the known materials number 10, or some other constant that isn’t too big; a GUI tool would let you pick one of those 10 and change only its parameters.
Why uber shaders? Why do we need so many permutations in a big game? I ask this maybe silly question because I can’t see why we can’t just use 10 or 20 shaders (a constant number of them) and share them among many more materials.

The general trend seems to be loads of permutations …

Thanks

JarkkoL 102 Mar 22, 2012 at 17:06

You really should avoid a large number of shader permutations. I have seen how deep that rabbit hole goes and it ain’t pretty ;) For example, instead of having #ifdef HAS_SPECULAR_MAP, just pass a white 4x4 default texture to the shader if you don’t use a specular map. Now, nit-pickers will whine at this point that you will spend precious shader cycles on this. That’s BS. If that’s REALLY your performance problem, just replace that shader with an optimized version when you are in the optimization phase and close to shipping, when you actually know how much that specific version of the shader is used in your app. You can also use static branching on DX9 (if you are not working on PS3, which I presume you are not), or, like Reedbeta said, use D3D11 dynamic shader linkage. Note, however, that static branching will cause on-the-fly shader compilations by the driver, which may cause issues in the form of frame rate stutter. We had to address this issue for SC:CT by iterating through the different permutations at the beginning of the map, which we gathered by playing through the map.
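The default-texture trick can be sketched on the tool/engine side roughly like this (Python; the slot and texture names are placeholders, not anything from a real engine). Multiplying by a white texel is a no-op, so a single shader covers both the "with specular map" and "without" cases.

```python
def make_white_texture(size=4):
    """Return raw RGBA8 bytes for a size x size opaque white texture."""
    return bytes([255, 255, 255, 255]) * (size * size)


# Slots every material is expected to bind; names are illustrative only.
DEFAULT_TEXTURES = {"specular": "white_4x4"}


def resolve_textures(material_textures):
    """Fill in default textures for slots the material leaves empty."""
    resolved = dict(DEFAULT_TEXTURES)
    resolved.update(material_textures)
    return resolved


if __name__ == "__main__":
    print(len(make_white_texture()))                 # bytes in a 4x4 RGBA8 texture
    print(resolve_textures({"diffuse": "rock_d"}))   # specular falls back to white
```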

SuperPixel 101 Mar 22, 2012 at 17:21

@JarkkoL

I’m still on the #ifdef approach. I’ll actually be on PS3 very soon and I think I’ll go with #ifdefs there (I don’t know how much time texture fetches will cost there; the PS3 GPU is not really powerful, afaik). I guess static branching can be avoided by still using #ifdefs (I don’t know much about static branching, its performance and its problems, so I’m not going into it for now).
I’m going multiplatform and I’ll touch iOS too, so I think static branching is really just a D3D thing …
Btw, I’m thinking of having one uber shader per platform; I think it’s the best way to keep things separated.
Your idea of sending a white texture when the map is not used is interesting, but I don’t think it can be applied on all platforms.
How expensive could it be on PS3, iOS, Windows, etc.? Maybe not much on Windows, as it’ll be DX10/DX11 (maybe in that case I’ll send a 2x2 texture; I guess it doesn’t have to be 4x4 as long as it’s a power of 2)…

Reedbeta 167 Mar 22, 2012 at 17:28

I prefer to use uniform bools with if statements rather than #defines. The code looks nicer that way. You should be able to write (or have your code generate) wrapper functions with the appropriate combination of bools. Something like:

// myshader.ps
float4 PixelShader(float2 uv : TEXCOORD0, uniform bool bHasNormalMap, uniform bool bHasSpecularMap)
{
    float4 color = float4(0, 0, 0, 1);
    // do some stuff
    if (bHasNormalMap) {
        // etc.
    }
    if (bHasSpecularMap) {
        // etc.
    }
    return color;
}

// Entry points. COLOR0 is the D3D9/Cg output semantic (SV_Target on D3D10+).
float4 PixelShaderNormal(float2 uv : TEXCOORD0) : COLOR0
{
    return PixelShader(uv, true, false);
}

float4 PixelShaderSpecular(float2 uv : TEXCOORD0) : COLOR0
{
    return PixelShader(uv, false, true);
}

float4 PixelShaderNormalSpecular(float2 uv : TEXCOORD0) : COLOR0
{
    return PixelShader(uv, true, true);
}

Then you’d compile it with one of the PixelShaderBlahBlah functions specified as entry point. The compiler will optimize away the branches and eliminate any unused code. You could compile the same file multiple times with different entry points to get different versions of it. You could also write some code to automatically generate the PixelShaderBlahBlah functions; you could put in some metadata to tell the generator what it needs to know to write that code.
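Generating the wrapper functions could look roughly like this in Python. The (suffix, parameter) pairs are an assumed input format, the COLOR0 semantic assumes a D3D9/Cg-style target, and this sketch skips the all-false combination because it would need a name of its own.

```python
from itertools import product


def generate_entry_points(base, flags):
    """Emit one wrapper per combination of boolean flags.

    `flags` is a list of (suffix, parameter name) pairs in the same order as
    the uniform bool parameters of the base function.
    """
    wrappers = []
    for values in product([False, True], repeat=len(flags)):
        suffix = "".join(s for (s, _), on in zip(flags, values) if on)
        if not suffix:
            continue  # skip the all-false case in this sketch
        args = ", ".join("true" if on else "false" for on in values)
        wrappers.append(
            "float4 %s%s(float2 uv : TEXCOORD0) : COLOR0\n"
            "{\n"
            "    return %s(uv, %s);\n"
            "}" % (base, suffix, base, args)
        )
    return wrappers


if __name__ == "__main__":
    flags = [("Normal", "bHasNormalMap"), ("Specular", "bHasSpecularMap")]
    print("\n\n".join(generate_entry_points("PixelShader", flags)))
```

With the two flags above this produces PixelShaderSpecular, PixelShaderNormal and PixelShaderNormalSpecular, each of which can be passed to the compiler as an entry point.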

This is not the same thing as static branching; it’s really just the same as using #defines, but with nicer syntax.

You could also use #defines and #ifdefs; it conceptually works the same way. “#define HAS_SPECULAR_MAP” turns into “/D HAS_SPECULAR_MAP” on the fxc command line, or “#define HAS_SPECULAR_MAP 1” becomes “/D HAS_SPECULAR_MAP=1”. The PS3 shader compiler has similar options.
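For the #define route, a build script might assemble the fxc arguments per permutation along these lines (Python sketch). /T, /E, /D and /Fo are regular fxc switches; the ps_3_0 default and the output file naming are just assumptions for the example.

```python
def fxc_command(source, entry, defines, profile="ps_3_0"):
    """Build an fxc argument list for one permutation.

    /T, /E, /D and /Fo are standard fxc switches; the output naming scheme
    is an assumption of this sketch.
    """
    tag = "_".join(sorted(defines)) or "base"
    args = ["fxc", "/T", profile, "/E", entry]
    for name in sorted(defines):
        args += ["/D", "%s=%s" % (name, defines[name])]
    args += ["/Fo", "%s.%s.cso" % (source, tag), source]
    return args


if __name__ == "__main__":
    print(" ".join(fxc_command("myshader.ps", "PixelShader",
                               {"HAS_SPECULAR_MAP": 1})))
```

The returned list can be handed to subprocess.run() in a build step, one call per permutation that is actually used.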

However, as Jarkko said, having a huge number of permutations may not be necessary and if it is a performance issue, you should establish that it is before actually doing a ton of work on a system like this.

SuperPixel 101 Mar 22, 2012 at 17:45

@Reedbeta

Are the uniforms managed the same way as constants in a constant buffer? Shouldn’t constants be treated as uniforms too?

What does “the compiler will optimize away the branches” mean? I thought it was best to avoid branching altogether, I mean standard ifs …

Maybe on some other platforms #ifdefs would be more reliable? I mean, we can’t be sure that every compiler across different platforms will optimize the branches away …?!

Reedbeta 167 Mar 22, 2012 at 18:04

Any half-decent compiler should optimize away branches with a constant condition. I’m not aware of any shader compiler in which this wouldn’t work. It certainly works in D3D and on PS3.

SuperPixel 101 Mar 22, 2012 at 18:09

@Reedbeta

Sorry Reedbeta for my question about constant buffers; I didn’t notice that you were passing constant true or false, etc. BTW, everything makes more sense to me now. Thanks a lot.

JarkkoL 102 Mar 22, 2012 at 18:43

PS3 doesn’t support static branching, which is why I mentioned it. I’m just saying that if you take the ubershader #ifdef route, you can easily make your life miserable if you don’t keep the number of shader permutations under tight control. To support permutations, I would rather define the PixelShader*() functions like Reedbeta had, and just have different entry functions written manually for the different optimized permutations. One major problem with #ifdef stuff is that it becomes very difficult to validate that your code is actually correct, since you have to test every possible permutation. Consider that you can compile ~4 shader permutations per second, so if you have 8 #ifdefs (2^8 = 256 permutations), it takes about 1 min to compile all permutations for just one shader. So, in my opinion, write one shader so that you don’t need permutations, and then support manual permutations for special optimized cases. If you build your shaders to support arbitrary permutations, you will end up in situations like finding the permutations for a map and compiling only those (which causes issues such as shader code validation), implementing multi-threaded compilation to combat shader compilation times, consuming an insane amount of memory (yes, I said this is a problem) just for shader code, and shit like that. I just can’t emphasize enough how much I hate that shader permutation stuff ;)
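To put numbers on that: each independent on/off #ifdef doubles the variant count, so the estimate is 2^n variants divided by the compile rate. A tiny Python sketch, using the ~4 permutations per second figure from the post as the default (your compiler and hardware will differ):

```python
def permutation_count(num_switches):
    """Each independent on/off #ifdef doubles the number of variants."""
    return 2 ** num_switches


def full_compile_seconds(num_switches, permutations_per_second=4.0):
    """Estimated time to compile every variant of one shader."""
    return permutation_count(num_switches) / permutations_per_second


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, permutation_count(n), full_compile_seconds(n))
```

With 8 switches that is 256 variants and 64 seconds for a single shader; with 12 it is already 4096 variants.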

SuperPixel 101 Mar 23, 2012 at 07:27

@JarkkoL

With Reedbeta’s approach, how many permutations can we have on average? I have always wondered how one can so easily go past 1000 permutations! How is this possible, considering that there are generally not that many materials? Bump, reflection env map, parallax, displacement with tessellation, skin, and that’s it! How can there be loads of variants from so few materials (not considering the ones for particles and general-purpose effects for now)? …

TheNut 179 Mar 23, 2012 at 11:13

It can add up depending on how far you want to optimize your shaders. Reed was only giving an example based on your original post of bump mapping with vs without specular reflection. Imagine you have materials that require several kinds of diffuse or specular reflection models, or perhaps you have other reflection models to consider such as Fresnel reflection, or multi-textured scenarios where some models use specular maps vs those without. The list can drag on. In theory, a lot of shader code could be reused. In practice, because of the nature of how shaders were designed, IMO it’s not practical to try and maximize that. I use a special pre-compile stage to import extra functions into shaders, but that’s about the extent I go with code reuse. If necessary I will handcraft a shader for a specific instance and that’s the end of it.
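A pre-compile import stage of that kind might look something like the Python sketch below. The "// @import name" directive and the snippet table are invented for this example and are not the actual tool described above.

```python
import re

# Hypothetical snippet library; a real tool would load these from files.
SNIPPETS = {
    "fresnel": "float Fresnel(float cosTheta, float f0)\n"
               "{\n"
               "    return f0 + (1.0 - f0) * pow(1.0 - cosTheta, 5.0);\n"
               "}",
}

# Matches a whole line of the form: // @import name
IMPORT_RE = re.compile(r"^[ \t]*//[ \t]*@import[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def expand_imports(source, snippets=SNIPPETS):
    """Replace each '// @import name' line with the named snippet."""
    def replace(match):
        name = match.group(1)
        if name not in snippets:
            raise KeyError("unknown snippet: " + name)
        return snippets[name]
    return IMPORT_RE.sub(replace, source)


if __name__ == "__main__":
    shader = ("// @import fresnel\n"
              "float4 main() : COLOR0 { return Fresnel(1.0, 0.04); }")
    print(expand_imports(shader))
```

Because the directive lives in a comment, the unexpanded file still compiles as long as nothing calls the imported function.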

Stainless 151 Mar 23, 2012 at 13:35

I agree with TheNut, I don’t use ubershaders at all.

I would rather have one shader per material if it came to it. After all, the disk space required for 100 shaders won’t add up to the space required for a single hi-res texture.

If you are worried about the complexity of managing lots of shaders, I would move it back a stage in the build process.

Start thinking about maybe using an XML file to define the shader, then in the build phase create a text file containing a valid shader which gets compiled into the game.

It depends on your build environment, but every environment I have used allows custom build stages.
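As a rough idea of what such a build step could do, here is a Python sketch that turns an XML description into shader text. The XML schema (shader, texture, feature elements) and the included ubershader_body.fxh file are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical shader description; the element and attribute names are made up.
EXAMPLE_XML = """
<shader name="rock">
  <texture name="diffuseMap" register="s0"/>
  <texture name="normalMap" register="s1"/>
  <feature define="HAS_NORMAL_MAP"/>
</shader>
"""


def shader_from_xml(xml_text, body_include="ubershader_body.fxh"):
    """Turn the XML description into shader source text."""
    root = ET.fromstring(xml_text)
    lines = ["// Generated for material '%s'; do not edit." % root.get("name")]
    for feature in root.findall("feature"):
        lines.append("#define %s 1" % feature.get("define"))
    for tex in root.findall("texture"):
        lines.append("sampler2D %s : register(%s);"
                     % (tex.get("name"), tex.get("register")))
    lines.append('#include "%s"  // hypothetical shared shader body' % body_include)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(shader_from_xml(EXAMPLE_XML))
```

The generated text file is then what the custom build stage feeds to the shader compiler.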

SuperPixel 101 Mar 23, 2012 at 18:34

@Stainless

OK, I understand well now. Once you have your shader variant selected, how do you expose the constant params that can be set on that shader without creating a new function every time one is needed? Or do you actually manage that by simply adding a new function to a bunch of functions?

Stainless 151 Mar 23, 2012 at 20:36

This is the way I would do it.

1) Have the artist save the material in a format I can parse
2) Add them to the game project as resources with a custom build step
3) Parse the material file to create a shader
4) (Optional) Compile the shader into a binary form.
5) (Optional) Convert the binary shader to source code
6) Parse the material again to create a c++ class, (Optional) attach source code from stage 5
7) Compile the created class and add to link file list

Then in the game all I would do is map the material name stored in the mesh to a class and create a new instance of it.

Sounds complicated, but it really isn’t that bad, depending on your development environment.
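Step 5 (binary shader to source code) is probably the least obvious one, so here is a minimal Python sketch of it. The array naming and the use of static const are assumptions; the output is meant to be included by the generated class from step 6.

```python
def bytes_to_cpp_array(name, blob, per_line=12):
    """Render compiled shader bytes as a C++ array definition."""
    rows = []
    for i in range(0, len(blob), per_line):
        chunk = blob[i:i + per_line]
        rows.append("    " + ", ".join("0x%02x" % b for b in chunk) + ",")
    return (
        "// Generated file; do not edit.\n"
        "static const unsigned char %s[] = {\n%s\n};\n"
        "static const unsigned int %s_size = %d;\n"
        % (name, "\n".join(rows), name, len(blob))
    )


if __name__ == "__main__":
    # In a real build step the bytes would come from the compiled shader file.
    print(bytes_to_cpp_array("g_rockPS", bytes([0x00, 0x01, 0xFF])))
```

Embedding the bytecode this way means the game needs no shader files on disk, at the cost of a rebuild whenever a material changes.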

SuperPixel 101 Mar 24, 2012 at 19:01

@Stainless

Is part 2 the same as part 6? The custom build rule will generate something, which I think is a class?
Do you write a Python script or just plain C++ to parse and generate stuff?

Stainless 151 Mar 24, 2012 at 21:30

Part 2 and part 6 are completely different.

Part 2 generates source code for a shader

Part 6 generates a C++ class that loads the shader and handles all variables that need to be defined

How you do it depends on your build environment, c++, lua, perl, etc. are all possibilities
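For instance, the class generation in part 6 could be sketched in Python like this. ShaderProgram and SetConstant() are hypothetical engine names used only to show the shape of the output; the parameter list would come from the parsed material file.

```python
def generate_material_class(material, params):
    """Emit a C++ class with one setter per shader parameter.

    `params` maps parameter names to C++ types. ShaderProgram and its
    SetConstant() method are hypothetical engine types, not a real API.
    """
    lines = [
        "class %sMaterial {" % material,
        "public:",
        "    explicit %sMaterial(ShaderProgram& program) : m_program(program) {}"
        % material,
    ]
    for name, cpp_type in sorted(params.items()):
        setter = "Set" + name[0].upper() + name[1:]
        lines.append(
            '    void %s(const %s& value) { m_program.SetConstant("%s", value); }'
            % (setter, cpp_type, name)
        )
    lines += ["private:", "    ShaderProgram& m_program;", "};"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_material_class("Rock", {"specularPower": "float",
                                           "tint": "Float4"}))
```

Since the setters are generated from the parameter list, adding a shader parameter means editing the material description rather than hand-writing a new function.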

SuperPixel 101 Mar 24, 2012 at 21:38

@Stainless

When you say in part 2 “add them to your project as resources”, what precisely do you mean? Do you mean the material saved in part 1? If I have an uber shader, will part 2 compile the correct shader permutation?

Part 6 is clear.

SuperPixel 101 Mar 26, 2012 at 18:16

@Reedbeta

But in a multiplatform context, will I have equivalents of .fx files if I want to use them?

PS3 -> *.cgfx ?
DX11 -> *.fx
iOS -> ? Cgfx maybe?

I’m thinking of switching to fx files for the benefit of having annotations and techniques. Otherwise, if I go with bare shaders, I can see that I have to add the extra data myself anyway…
Are there any cons to using an fx-file-like approach in a multiplatform engine?
I find it useful to annotate passes, to be sure that a given pixel/vertex shader pair is going to be rendered in a well-defined pipeline stage. The same goes for using annotations to link the shader params to the material ones, etc.
I find it wasteful to write a parser myself to do exactly the same things in the end …
What do you think? Do I really have to go with bare shaders if I have to add all of that stuff anyway? Why would I ;)?