I've got an idea, is it feasible? Lots of input layouts and one vertex shader with a big input struct

SuperPixel 101 Mar 02, 2012 at 14:06

Hi all,

I was thinking about this for a while.

Is it possible to create just one vertex shader with a big input vertex struct and selectively activate a subset of semantics by varying just the input layout? And also, does that make sense, and is it convenient in general?

The D3D11 CreateInputLayout function doesn’t prevent you from doing this; it emits a warning, but the warning is non-blocking.

From the remarks:

If a data type in the input-layout declaration does not match the data type in a shader-input signature, CreateInputLayout will generate a warning during compilation. The warning is simply to call attention to the fact that the data may be reinterpreted when read from a register. You may either disregard this warning (if reinterpretation is intentional) or make the data types match in both declarations to eliminate the warning.

Now I have two questions:

1) If that is theoretically possible, will my vertex input data be interpreted correctly by the Input Assembler?
2) If yes, will this approach be efficient? I’ve always heard that people tend to keep the vertex shader input struct as small as possible to limit the vertex shader input size.

If the answer to both questions is yes, it would be possible to keep one shader and several input layouts that selectively activate the semantics in the vertex shader input struct!

What do you guys think about this?

Thanks in advance for any reply

18 Replies

Reedbeta 168 Mar 02, 2012 at 20:51

If I understand correctly, you want to have a vertex shader with more input parameters than the vertex buffer has? E.g. the shader might read TEXCOORD0 and TEXCOORD1, but the vertex buffer only has TEXCOORD0?

I don’t think this makes sense. The shader still uses TEXCOORD1, so where should that data come from?

SuperPixel 101 Mar 03, 2012 at 08:17

@Reedbeta

So, the basic idea is to find a way to minimize the number of shaders/permutations. What about the other way around: more input parameters in the vertex buffer and fewer in the input struct of the vertex shader? Even so, I can’t see how this would solve the problem, since in that case I’m still forced to declare a vertex shader for each permutation:

One with just position
One with position and texcoord0
One with position, texcoord0 and normal

Maybe that is useful only if I want to avoid checking that a mesh’s input parameters match the vertex shader input signature (e.g. if I have a vertex shader that takes just the position and I bind a vertex buffer with 3 input parameters, only the position will be used in the shader and the other two coming from the vertex buffer will be ignored). If this is possible, I won’t need to check that the input signature of the vertex shader referenced by a mesh’s material necessarily matches the input layout (the vertex buffer’s input parameters) …

Reedbeta 168 Mar 03, 2012 at 18:43

Yes, the other way is fine; you can have extra components in the vertex buffer that are not used by the vertex shader. There may under some circumstances be a performance penalty involved in this (relative to having a vertex buffer with only the components you actually need), but the rendering will work correctly.
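
To make the rule in this reply concrete, here is a minimal C++ sketch of the check as I understand it: a layout is acceptable when every input the shader declares has a matching element, and any extra elements are ignored. `Semantic` and `layoutCoversShader` are hypothetical names invented for illustration, not D3D11 types or calls, and the exact string comparison is a simplification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a (semantic name, semantic index) pair.
// Hypothetical type for illustration; NOT part of D3D11.
struct Semantic {
    std::string name;
    unsigned index;
};

bool sameSemantic(const Semantic& a, const Semantic& b) {
    return a.index == b.index && a.name == b.name;
}

// Returns true when every input the shader declares is provided by the
// layout. Layout elements the shader never asks for are simply skipped.
bool layoutCoversShader(const std::vector<Semantic>& layout,
                        const std::vector<Semantic>& shaderInputs) {
    for (const Semantic& wanted : shaderInputs) {
        bool found = false;
        for (const Semantic& provided : layout) {
            if (sameSemantic(wanted, provided)) {
                found = true;
                break;
            }
        }
        if (!found) return false;  // shader reads something the buffer lacks
    }
    return true;
}
```

With a layout of POSITION, TEXCOORD0 and NORMAL, a shader reading only POSITION and TEXCOORD0 passes this check; a shader that also reads TEXCOORD1 does not, which is the case questioned earlier in the thread.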

SuperPixel 101 Mar 03, 2012 at 18:59

@Reedbeta

How big/important is this performance penalty? I mean, should I be careful, or is this an approach that is widely used?
Or should I just check with the profiler every time?

Reedbeta 168 Mar 03, 2012 at 20:19

It varies depending on hardware, shaders, the scene, etc. Profiling is your best bet, as with any GPU performance question. It might not be an issue at all, as it’s only relevant if reading the vertices from memory is the bottleneck, i.e. you have to be both vertex-bound and memory-bandwidth-bound.
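
For a rough feel of how much data is at stake, a back-of-the-envelope helper like the one below can be used. It is only a sketch under a loud assumption: that the whole vertex is read from memory even when the shader uses part of it. Real GPUs fetch through caches in larger chunks, so treat the result as a crude upper bound; `unusedFraction` is a made-up name.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of each vertex's bytes that the shader never reads.
// Assumption (not a hardware fact): the full stride is fetched per vertex.
double unusedFraction(std::size_t strideBytes, std::size_t bytesReadByShader) {
    assert(strideBytes > 0 && bytesReadByShader <= strideBytes);
    return 1.0 - static_cast<double>(bytesReadByShader) /
                     static_cast<double>(strideBytes);
}
```

For example, a 32-byte vertex (float3 position, float3 normal, float2 uv) drawn with a shader that reads only 16 bytes of it gives `unusedFraction(32, 16) == 0.5`. Whether that matters still depends on being vertex-fetch bound, which only a profiler can tell you.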

Stainless 151 Mar 04, 2012 at 12:26

I think on some platforms the struct passed to the shader is expected to be packed.

  float4 position;
  float2 uv;
  float4 colour;

If you have a structure in memory like this ….

  float4 position;
  float  user_parameter;
  float2 uv;
  float  another_user_parameter;
  float4 colour;

Passing that to the shader would require you to shuffle stuff around in memory.

Not 100% sure about this, but I think I am correct.

Reedbeta 168 Mar 04, 2012 at 17:08

At least on PC/console GPUs you can specify the stride and offset for each component when you set up the vertex buffer - that information is what the “input layout” object in D3D represents. I don’t have any experience with mobile GPUs (for instance); maybe some of them aren’t able to do that.
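
As a sketch of what that looks like for the interleaved struct from the previous post, the byte offsets can be taken straight from the CPU-side struct with `offsetof`, so nothing has to be shuffled in memory. `ElementDesc` here is a hypothetical stand-in loosely modelled on the semantic name / byte offset fields of D3D11_INPUT_ELEMENT_DESC, not the real type; in D3D11 the stride itself is passed when the buffer is bound (IASetVertexBuffers) rather than stored in the layout.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// CPU-side vertex with engine-private fields between the shader attributes.
struct Vertex {
    float position[4];
    float user_parameter;          // not visible to the shader
    float uv[2];
    float another_user_parameter;  // not visible to the shader
    float colour[4];
};

// Hypothetical stand-in for one input-layout entry (illustration only).
struct ElementDesc {
    const char* semantic;
    std::size_t byteOffset;
};

// Only the three attributes the shader wants are described; the user
// parameters are skipped purely by the choice of offsets.
constexpr ElementDesc kLayout[] = {
    {"POSITION", offsetof(Vertex, position)},
    {"TEXCOORD", offsetof(Vertex, uv)},
    {"COLOR",    offsetof(Vertex, colour)},
};
constexpr std::size_t kLayoutCount = sizeof(kLayout) / sizeof(kLayout[0]);

// Distance in bytes from one vertex to the next.
constexpr std::size_t kStride = sizeof(Vertex);

// Bounds-checked accessor for the offset of the i-th described attribute.
std::size_t elementOffset(std::size_t i) {
    assert(i < kLayoutCount);
    return kLayout[i].byteOffset;
}
```

The point of the design is that the two user parameters never appear in the layout at all; they are stepped over because the stride covers the whole struct.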

SuperPixel 101 Mar 09, 2012 at 09:33

@Reedbeta

But if the other way is fine, the fact that I have to pass the bytecode of a compiled shader to the CreateInputLayout function makes me wonder … ;)

HRESULT CreateInputLayout(
  [in]   const D3D11_INPUT_ELEMENT_DESC *pInputElementDescs,
  [in]   UINT NumElements,
  [in]   const void *pShaderBytecodeWithInputSignature,
  [in]   SIZE_T BytecodeLength,
  [out]  ID3D11InputLayout **ppInputLayout
);

And the next logical conclusion is that, because I’m treating input layout creation as decoupled from the vertex shader input signature, when I create the input layout I might need to keep a dummy vertex shader around just to have some bytecode to pass in.

A vertex shader like this could be used to get valid bytecode:

float4 VS(in float4 pos : POSITION) : SV_Position
{
   return float4(0.f,0.f,0.f,0.f);
}

CreateInputLayout will generate a warning if the vertex input parameters don’t match that signature (just the POSITION); I’ll ignore the warning and I’ll have my input layout created.

Still, even though this approach could work, it looks so unnatural to me!!! :(

I feel that this might not be the right way to go …

Reedbeta 168 Mar 10, 2012 at 08:08

Does it really give a warning if the vertex format you pass in contains more than the shader uses? That would be annoying, but you can ignore it if it is what you expect.

The reason the input layout depends on the shader is that the shader needs to know how to fetch the components from the vertex buffer. On modern GPUs the hardware doesn’t decode vertices for you; the shader code has to contain instructions to load the vertex components from the buffer. But the vertex shader is compiled without knowing the final buffer layout, so the way this is accomplished is to split it into the main vertex shader and a “preshader”. The main vertex shader assumes that the vertex data it wants are already loaded into a set of registers. The preshader contains a bunch of load instructions to get the data from the vertex buffer with the right offsets and formats, and decode them to float and put them in those registers. The input layout object is really just this preshader. That’s why it needs both the exact vertex buffer layout and the vertex shader bytecode, since it’s generating the glue code that hooks up one to the other. The preshader gets prepended to the main vertex shader internally when you set the vertex shader and input layout in a device context.
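
A toy CPU-side model of that fetch step might look like the sketch below. It is purely conceptual: `Attribute`, `Register` and `fetchVertex` are invented names, the data is assumed to be plain 32-bit floats with no format decoding, and filling missing components with (0, 0, 0, 1) is an assumption of this sketch rather than something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one attribute the shader asked for.
struct Attribute {
    std::size_t byteOffset;  // where the attribute starts inside one vertex
    std::size_t floatCount;  // how many floats (1..4) the buffer stores
};

// One shader input register: always four floats wide.
struct Register {
    float v[4];
};

// Conceptual "fetch" step for a single vertex: load each wanted attribute
// from the raw buffer into its own register. Anything else stored in the
// vertex is never touched.
std::vector<Register> fetchVertex(const unsigned char* buffer,
                                  std::size_t strideBytes,
                                  std::size_t vertexIndex,
                                  const std::vector<Attribute>& wanted) {
    std::vector<Register> regs;
    const unsigned char* vertex = buffer + vertexIndex * strideBytes;
    for (const Attribute& a : wanted) {
        assert(a.floatCount >= 1 && a.floatCount <= 4);
        Register r = {{0.0f, 0.0f, 0.0f, 1.0f}};  // assumed defaults
        std::memcpy(r.v, vertex + a.byteOffset, a.floatCount * sizeof(float));
        regs.push_back(r);
    }
    return regs;
}
```

The `wanted` list plays the role of the glue between the two sides: the offsets come from the buffer layout, while which attributes appear in it comes from the shader's input signature.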

Anyway, I can’t tell you if this approach is “the right way to go” for you. You have to decide that after considering the possibilities.

SuperPixel 101 Mar 10, 2012 at 13:14

@Reedbeta

Cool,
I made a test where the input layout has more vertex attributes than the shader has input semantics, and it works!! If the shader has just 2 input semantics and the input layout has 3, it just picks the 2 it needs from the input layout and it works! Someone was saying that it might cause problems with some drivers … but it works on my NVIDIA 560!

And btw, where did you find all this detailed information?

Thanks for the info.

Reedbeta 168 Mar 10, 2012 at 16:48

Cool! Glad it’s working. FYI, I doubt very much this will have problems on any drivers. It’s not really that difficult to ignore some data. :)

I got the terminology a bit wrong, it turns out; the preshader I was talking about is called a “fetch shader” (because it fetches data from the vertex buffers); the word “preshader” means something else (a kind of hoisting optimization performed by the D3D frontend). Anyway, there’s no specific place I learned about it. There are various whitepapers about GPU internals floating around the Web. Here’s one that mentions fetch shaders, for example - it’s about a bit older generation of AMD GPUs, but it’s reasonable to assume that their newer GPUs and probably NVIDIA GPUs also do similar things.

Also, if you haven’t read it, the A Trip Through the Graphics Pipeline articles are extremely informative and well worth reading.

SuperPixel 101 Mar 10, 2012 at 17:27

@Reedbeta

Thanks very very much, I think those articles are gold :D !

Now I can use one shader for several meshes, regardless of the extra attributes in their vertex buffers ;)

SuperPixel 101 Mar 13, 2012 at 09:44

One last question:

When I create my input layout, how many TEXCOORD[n] can I declare? I bet the maximum number is hardware-specific, but isn’t it also true that no realistic vertex input element description would be big enough to justify, say, more than 4 TEXCOORDs as input?

I ask because I’m trying to work out how many elements of each type I should declare in my enum to get a realistic number of them! Also, the most complex input layout that I’ve seen was one used for instancing …

Btw, for now I’m considering a maximum of 6 for each type (i.e. TEXCOORD0 - TEXCOORD5, TANGENT0 - TANGENT5, etc.)

Reedbeta 168 Mar 13, 2012 at 17:28

TEXCOORD, TANGENT, etc. are just user-supplied labels at this point, so they can be arbitrary. It used to be that they mapped to specific hardware registers but nowadays all attributes are treated the same. You can actually make up your own names, like

void vs_main(
    float4 foo : MY_MADE_UP_THING5,
    out float4 pos : SV_POSITION)
{
    pos = foo;
}

This compiles in vs_4_0 or vs_5_0, not in lower profiles though (which have specific predefined semantic labels).
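
On the application side, the trailing digits of a semantic such as MY_MADE_UP_THING5 are the semantic index, and D3D11_INPUT_ELEMENT_DESC takes the name and the index as separate fields. A small helper along these lines could do the split, so an engine need not enumerate TEXCOORD0..TEXCOORDn by hand. `SemanticRef` and `splitSemantic` are hypothetical names for this sketch, and it assumes the index is written in plain decimal.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical (name, index) pair for illustration; not a D3D11 type.
struct SemanticRef {
    std::string name;
    unsigned index;
};

// Splits e.g. "TEXCOORD1" into {"TEXCOORD", 1}. A semantic without
// trailing digits, such as "POSITION", gets index 0.
SemanticRef splitSemantic(const std::string& s) {
    std::size_t end = s.size();
    while (end > 0 && std::isdigit(static_cast<unsigned char>(s[end - 1]))) {
        --end;  // walk back over the trailing decimal digits
    }
    assert(end > 0 && "a semantic needs a non-numeric name part");
    SemanticRef out{s.substr(0, end), 0};
    for (std::size_t i = end; i < s.size(); ++i) {
        out.index = out.index * 10 + static_cast<unsigned>(s[i] - '0');
    }
    return out;
}
```

For the shader above, `splitSemantic("MY_MADE_UP_THING5")` yields the name MY_MADE_UP_THING with index 5, which are the two values the matching input element would carry.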

SuperPixel 101 Mar 13, 2012 at 17:37

@Reedbeta

But it’s still better to predefine them for backward compatibility, and to allow users to define their own custom semantics beyond the ones that I have already “predefined”. I say this because I don’t know how it’s going to be on PS3 or iOS …

Side question: in lower profiles (I mean those below vs_4_0 and vs_5_0), what were the specific predefined semantic labels? (If I can pin that down precisely, I could predefine them in my engine too and allow custom semantics only on profile 4_0 and above, while still maintaining backward compatibility.)

Reedbeta 168 Mar 13, 2012 at 17:43

Ah, if you’re targeting non-D3D10-11 devices then yes, you’ll have to stay within the predefined set of semantics. :) In that case, TEXCOORD likely goes up to 7 or so (varies per device of course, no idea how many you’d get on iOS). COLOR, NORMAL, and TANGENT probably only have 0 and 1. But you can also use generic names like ATTR0, ATTR1, etc, which go up to however many attributes the hardware supports. Note that those are internally aliased to the position/color/texcoord/etc. attributes, e.g. ATTR0 is the same as POSITION, so best not to mix and match the ATTR ones with TEXCOORD and friends.

SuperPixel 101 Mar 13, 2012 at 18:01

@Reedbeta

Where can I find those specs? I was trying to look up shader model 3_0 to find out how many semantics were predefined, but I couldn’t find any info … plus I’m worried about other platforms like PS3, even though I guess PS3 should be similar, as it should support shader model 3_0.
Plus, the Microsoft documentation at http://msdn.microsoft.com/en-us/library/bb509647%28v=vs.85%29.aspx says that DX9 and DX10 support all of those vertex shader input semantics, which are more than the ones you told me about (concerning DX9 and therefore shader model 3_0).
Plus, it doesn’t say what the maximum allowed value of [n] is after a given semantic (e.g. TEXCOORD[n]).

Reedbeta 168 Mar 13, 2012 at 21:38

I found some information about it in the Cg docs. Here is the vs_3_0 list of semantics, but it doesn’t have the generic attribute mappings. I did find those on the vp40 page; vp40 is an OpenGL profile that I think is (roughly) equivalent to vs_3_0. I might have been mistaken: the generic “ATTR0” etc. might only be applicable to Cg/GLSL; I’m not sure whether they work in D3D9 HLSL.