DX10/DX11 Object States Caching: the Just Cause 2 way ...

SuperPixel 101 Nov 30, 2011 at 17:09

Hi All,

I’ve read the article in GPU Pro about state caching in Just Cause 2. They use a bit-packing mechanism to provide a transparent multiplatform interface for managing states. The idea is to create every state block beforehand and cache it (in DX10/DX11, validation has been moved to creation time). However, state blocks such as the rasterizer state group a lot of states, and I’ve found it very difficult to pack them all into a 64-bit integer bitmask.
Here is the union/struct layout that outlines the basic idea:

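#include <cstdint>

typedef float         Real32;
typedef std::uint64_t U64;
typedef std::int64_t  S64;

struct RasterizerState
{
    Real32 DepthBiasClamp;       // does not fit in the key below
    Real32 SlopeScaleDepthBias;  // does not fit in the key below
    union
    {
        U64 index;               // used to look up the cache; set index = 0 before filling the fields
        struct                   // anonymous struct: a compiler extension (MSVC, GCC, Clang)
        {
            S64 DepthBias             : 32; // signed, like D3D11's DepthBias
            U64 FillMode              : 2;  // unsigned: D3D11_FILL_SOLID is 3
            U64 CullMode              : 2;  // unsigned: D3D11_CULL_BACK is 3
            U64 FrontCounterClockwise : 1;  // vertex winding order (CW/CCW)
            U64 DepthClipEnable       : 1;
            U64 ScissorEnable         : 1;
            U64 MultisampleEnable     : 1;
            U64 AntialiasedLineEnable : 1;  // 41 of the 64 bits used
        };
    };
};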

I can’t fit DepthBiasClamp and SlopeScaleDepthBias into the bit fields. I thought about splitting them out, but then we lose the advantage of a single state block, and the index must be unique. Any idea how to fit everything in? Or alternative solutions? The important thing is that “index” is unique: if I choose not to put them in the bit field, the value of “index” is not guaranteed to be unique.
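To make the bit budget concrete, here is a sketch that packs the fields of the struct above with explicit shifts (so the layout does not depend on compiler bit-field rules). DepthBias plus the seven small fields take 41 bits, leaving 23, while the two floats alone need 64. `RasterDesc` and `PackKey` are illustrative names, not taken from the article.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the rasterizer fields in the struct above.
struct RasterDesc
{
    std::int32_t depthBias;
    float        depthBiasClamp;
    float        slopeScaleDepthBias;
    unsigned     fillMode, cullMode;   // D3D11 values fit in 2 bits (max 3)
    bool         frontCCW, depthClip, scissor, multisample, aaLines;
};

// Packs everything that fits: 32 + 2 + 2 + 5 = 41 bits. The two floats are
// left out because only 23 bits remain and they need 64.
std::uint64_t PackKey(const RasterDesc& d)
{
    assert(d.fillMode < 4 && d.cullMode < 4);
    std::uint64_t k = static_cast<std::uint32_t>(d.depthBias);   // bits 0-31
    k |= std::uint64_t(d.fillMode)    << 32;
    k |= std::uint64_t(d.cullMode)    << 34;
    k |= std::uint64_t(d.frontCCW)    << 36;
    k |= std::uint64_t(d.depthClip)   << 37;
    k |= std::uint64_t(d.scissor)     << 38;
    k |= std::uint64_t(d.multisample) << 39;
    k |= std::uint64_t(d.aaLines)     << 40;
    return k;
}
```

Two descriptions that differ only in SlopeScaleDepthBias therefore get the same key, which is exactly the uniqueness problem described above.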

thanks all in advance!

Reedbeta 168 Nov 30, 2011 at 17:42

I haven’t seen the article you’re talking about, but how about just hashing the whole state object into a 32-bit or 64-bit hash? Then you could have as much data in there as you wanted.
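A minimal sketch of the hashing idea, assuming 64-bit FNV-1a (one common choice; the thread does not name a hash). Each field is hashed separately so struct padding never leaks into the result. Unlike a bit-packed key, a hash can collide, so a cache built on it still has to compare the full description on a hit. `RasterDesc` and `HashRasterDesc` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::uint64_t kFnvOffsetBasis = 14695981039346656037ull;
const std::uint64_t kFnvPrime       = 1099511628211ull;

// 64-bit FNV-1a over a byte range; pass the previous result as 'h' to chain fields.
std::uint64_t Fnv1a64(const void* data, std::size_t size, std::uint64_t h = kFnvOffsetBasis)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i)
    {
        h ^= p[i];
        h *= kFnvPrime;
    }
    return h;
}

struct RasterDesc
{
    std::int32_t depthBias;
    float        depthBiasClamp;
    float        slopeScaleDepthBias;
    std::uint8_t fillMode, cullMode;
    std::uint8_t flags;               // the five booleans packed as bits
};

// Hashes field by field, so padding bytes never affect the result.
std::uint64_t HashRasterDesc(const RasterDesc& d)
{
    std::uint64_t h = Fnv1a64(&d.depthBias, sizeof d.depthBias);
    h = Fnv1a64(&d.depthBiasClamp, sizeof d.depthBiasClamp, h);
    h = Fnv1a64(&d.slopeScaleDepthBias, sizeof d.slopeScaleDepthBias, h);
    h = Fnv1a64(&d.fillMode, sizeof d.fillMode, h);
    h = Fnv1a64(&d.cullMode, sizeof d.cullMode, h);
    h = Fnv1a64(&d.flags, sizeof d.flags, h);
    return h;
}
```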

SuperPixel 101 Nov 30, 2011 at 17:59

@Reedbeta

I haven’t seen the article you’re talking about, but how about just hashing the whole state object into a 32-bit or 64-bit hash? Then you could have as much data in there as you wanted.

Could you go into a bit more depth? I’m currently using an STL map to cache them, not a hash table. With my scheme I get uniqueness straight from the bit fields, so there’s no need to rely on a hash function; a map should therefore do the job!

Reedbeta 168 Nov 30, 2011 at 18:35

Well, if you have more bits than you can fit in a 64-bit bitfield, then you either need a wider bitfield or you need to map the data into a narrower representation, which is what hashing does.

Actually, I take back my previous answer - hashing would probably work here, but it’s not necessarily the best solution. You could just use multiple 64-bit bitfields, and write a custom operator< that would compare and sort on all the bitfields; then you could store all these in an STL container and search for them regardless of how much state data you need to keep.
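A sketch of the multi-word key suggestion: two 64-bit words compared lexicographically and used as a `std::map` key. The word layout in the comment and the `int` stand-in for the API state object are assumptions for illustration; `StateKey128` and `GetOrCreate` are hypothetical names. The compare costs at most two integer comparisons per tree node.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical layout: w[0] = the 41-bit packed fields, w[1] = raw bits of the two floats.
struct StateKey128
{
    std::uint64_t w[2];
};

// Lexicographic order over the words: at most two integer compares.
inline bool operator<(const StateKey128& a, const StateKey128& b)
{
    if (a.w[0] != b.w[0])
        return a.w[0] < b.w[0];
    return a.w[1] < b.w[1];
}

// Stand-in for the cached API object (e.g. an ID3D11RasterizerState*).
typedef int StateObject;

std::map<StateKey128, StateObject> g_stateCache;

// Looks the key up; on a miss, "creates" a new object (here just a sequential id).
StateObject GetOrCreate(const StateKey128& key)
{
    std::map<StateKey128, StateObject>::const_iterator it = g_stateCache.find(key);
    if (it != g_stateCache.end())
        return it->second;
    const StateObject created = static_cast<StateObject>(g_stateCache.size());
    g_stateCache.insert(std::make_pair(key, created));
    return created;
}
```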

SuperPixel 101 Nov 30, 2011 at 20:08

@Reedbeta

Well, if you have more bits than you can fit in a 64-bit bitfield, then you either need a wider bitfield or you need to map the data into a narrower representation, which is what hashing does. Actually, I take back my previous answer - hashing would probably work here, but it’s not necessarily the best solution. You could just use multiple 64-bit bitfields, and write a custom operator< that would compare and sort on all the bitfields; then you could store all these in an STL container and search for them regardless of how much state data you need to keep.

Actually, I’ve found the type __int128 under MSVC, but I doubt it would be a good choice in a cross-platform engine, as I think it’s meant mostly for SSE work. I’d like to avoid multiple bit fields, as I don’t want to add unnecessary overhead on top of a simple integer comparison.

Reedbeta 168 Nov 30, 2011 at 20:20

Well, you must either add some overhead somewhere, or compress/cut the supported states to fit in 64 bits. These lookups are all being done at engine startup or while loading a level, right? So why the concern over a few extra compares?

SuperPixel 101 Nov 30, 2011 at 20:24

@Reedbeta

Well, you must either add some overhead somewhere, or compress/cut the supported states to fit in 64 bits. These lookups are all being done at engine startup or while loading a level, right? So why the concern over a few extra compares?

Well, actually the lookup is done any time we have to switch state: at engine startup I create the state and cache it, and then I access the cache whenever I have to switch the state object. I switch state only after checking the indices against each other, to avoid redundant state changes!

Reedbeta 168 Nov 30, 2011 at 21:13

Why not just have pointers to the state objects instead of indices? When you create them at startup you can ensure they’re unique so that identical state implies equal pointers.

EDIT: To be clear, I mean that each shader would have a pointer to the state block for that shader (assigned on startup / at load time). You could have a global variable keeping track of the current state block, so when you switch shaders you can avoid re-setting the state if it’s the same as the last shader.
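A sketch of the pointer-based scheme, with hypothetical names throughout: descriptions are uniquified once at load time in a `std::map` (so identical state yields the same pointer), each shader keeps a pointer, and binding at run time is a single pointer comparison against a global current state. The `operator<` compares floats with `<`, which assumes no NaN values in the descriptions.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical platform-neutral description.
struct RasterDesc
{
    int   cullMode;
    int   fillMode;
    float slopeScaleDepthBias;
};

// Strict weak ordering so std::map can uniquify descriptions (assumes no NaNs).
inline bool operator<(const RasterDesc& a, const RasterDesc& b)
{
    if (a.cullMode != b.cullMode) return a.cullMode < b.cullMode;
    if (a.fillMode != b.fillMode) return a.fillMode < b.fillMode;
    return a.slopeScaleDepthBias < b.slopeScaleDepthBias;
}

// Stand-in for the API object created once per unique description.
struct RasterStateObject { RasterDesc desc; };

typedef std::map<RasterDesc, RasterStateObject> RasterPool;

// Load time: identical descriptions return the same pointer. Inserting into a
// std::map never invalidates references to existing elements.
const RasterStateObject* Uniquify(RasterPool& pool, const RasterDesc& d)
{
    RasterStateObject obj = { d };
    return &pool.insert(std::make_pair(d, obj)).first->second;
}

struct Shader { const RasterStateObject* rasterState; };

const RasterStateObject* g_currentRasterState = 0;
int g_stateChanges = 0;   // counts the API calls actually issued

// Run time: no lookup at all, just a pointer comparison.
void BindShader(const Shader& s)
{
    if (s.rasterState == g_currentRasterState)
        return;
    // ... the real API call (e.g. RSSetState) would go here ...
    g_currentRasterState = s.rasterState;
    ++g_stateChanges;
}
```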

SuperPixel 101 Nov 30, 2011 at 21:31

@Reedbeta

Why not just have pointers to the state objects instead of indices? When you create them at startup you can ensure they’re unique so that identical state implies equal pointers.

Because the engine has to be multiplatform, and that approach has proven to be the best way to map states across platforms transparently, abstracting away the underlying API/platform implementation details. It is described in GPU Pro, in the post-mortem section of the book! I understand your point, but I also need logarithmic or constant lookup time when I have to search for a state to switch to. For this purpose I use an STL map, which is typically implemented as a red-black tree, so lookups are logarithmic in the worst case. It’s not a hash table, but it’s still good. You want to use pointers directly, compare them to see if they are different, and only in that case set the state. I’m not sure that’s general enough to accommodate a multiplatform solution in the future.

Reedbeta 168 Nov 30, 2011 at 21:52

I don’t see what the issue is with multiplatform. Sure, the contents of the state block will be different depending upon the API/platform in use, but why does that matter? Just uniquify the states at startup based on whatever data they happen to contain. This can be done in any STL container you like, using a custom operator< for the state struct. There’s no need to do any lookup at runtime when switching states; just have each shader keep a pointer to its desired state and check to see if that state is already set or not.

SuperPixel 101 Nov 30, 2011 at 22:00

@Reedbeta

I don’t see what the issue is with multiplatform. Sure, the contents of the state block will be different depending upon the API/platform in use, but why does that matter? Just uniquify the states at startup based on whatever data they happen to contain. This can be done in any STL container you like, using a custom operator< for the state struct. There’s no need to do any lookup at runtime when switching states; just have each shader keep a pointer to its desired state and check to see if that state is already set or not.

OK, I see what you mean, and it’s very clear to me! But then I don’t see their point in doing things that way… I need to understand the advantages…

_oisyn 101 Dec 01, 2011 at 11:50

@SuperPixel

Actually I’ve found the type __int128 under msvc

There is no such thing as an __int128 under MSVC++ (for the standard PC platforms at least)

SuperPixel 101 Dec 01, 2011 at 16:28

@.oisyn

There is no such thing as an __int128 under MSVC++ (for the standard PC platforms at least)

The syntax highlighter (at least in Visual Studio 2010) accepts __int128, so it looks like it is supported (maybe by spanning registers?). I think it is meant for use with SSE… Can you confirm that?

Kenneth_Gorking 101 Dec 02, 2011 at 11:43

There isn’t; it is probably just reserved for possible future use.

SuperPixel 101 Dec 02, 2011 at 12:30

@Kenneth Gorking

There isn’t; it is probably just reserved for possible future use.

I made a simple test application and was told that the type is not supported by the current platform hardware :) … ironically :P

_oisyn 101 Dec 02, 2011 at 13:08

Even if MSVC++ supported it, x86 does not have any 128-bit integer arithmetic units or instructions that operate on 128-bit integers, so they would need to be emulated in software. You can easily write a class that does just that.
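A minimal software 128-bit key along these lines, assuming the cache only needs to set fields and compare keys (no arithmetic). Fields may straddle the two 64-bit words. `Key128`, `SetField`, `GetField` and `FloatBits` are hypothetical names; the bit-by-bit loop favours clarity over speed, which seems acceptable if keys are built at load time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Minimal software 128-bit key: set/get bit fields and compare, no arithmetic.
class Key128
{
public:
    Key128() : lo_(0), hi_(0) {}

    // Writes the low 'width' bits of 'value' at bit 'offset'; may straddle both words.
    void SetField(unsigned offset, unsigned width, std::uint64_t value)
    {
        assert(width >= 1 && width <= 64 && offset + width <= 128);
        for (unsigned i = 0; i < width; ++i)
        {
            const unsigned bit = offset + i;
            std::uint64_t& word = (bit < 64) ? lo_ : hi_;
            const std::uint64_t mask = std::uint64_t(1) << (bit % 64);
            if ((value >> i) & 1u)
                word |= mask;
            else
                word &= ~mask;
        }
    }

    // Reads back 'width' bits starting at bit 'offset'.
    std::uint64_t GetField(unsigned offset, unsigned width) const
    {
        assert(width >= 1 && width <= 64 && offset + width <= 128);
        std::uint64_t v = 0;
        for (unsigned i = 0; i < width; ++i)
        {
            const unsigned bit = offset + i;
            const std::uint64_t word = (bit < 64) ? lo_ : hi_;
            v |= ((word >> (bit % 64)) & 1u) << i;
        }
        return v;
    }

    friend bool operator==(const Key128& a, const Key128& b) { return a.lo_ == b.lo_ && a.hi_ == b.hi_; }
    friend bool operator<(const Key128& a, const Key128& b)
    {
        return a.hi_ != b.hi_ ? a.hi_ < b.hi_ : a.lo_ < b.lo_;
    }

private:
    std::uint64_t lo_, hi_;
};

// Raw bit pattern of a float, so it can be stored in a key field.
inline std::uint32_t FloatBits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

With a layout such as DepthBias at bits 0-31, DepthBiasClamp at 32-63, SlopeScaleDepthBias at 64-95 and the seven small fields at 96-104, every field of the original struct fits in 105 bits, and `Key128` can serve directly as a `std::map` key.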

SuperPixel 101 Dec 02, 2011 at 15:47

@.oisyn

Even if MSVC++ supported it, x86 does not have any 128-bit integer arithmetic units or instructions that operate on 128-bit integers, so they would need to be emulated in software. You can easily write a class that does just that.

The problem is that some of the DX10/DX11 state blocks hold too much state, and I can’t fit them into a single 64-bit integer index (see my first post in this thread for what I mean). I’m wondering whether I can drop some fields or squeeze others so that they fit into the bit-packing scheme. The index value is used as the key to look up the concrete state object in the map.
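One way to "squeeze" the two floats, sketched under the assumption that the game only uses a small set of simple values: store each one as signed fixed point in a narrow field, and reject values that do not round-trip exactly, so the key stays unique for every state it accepts. The function names and the 12-bit width with 4 fractional bits used in the test are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Stores v as signed fixed point (fracBits fractional bits) in a width-bit
// two's-complement field. Returns false if v is not exactly representable,
// so distinct accepted values always get distinct bit patterns.
bool QuantizeFixed(float v, unsigned width, unsigned fracBits, std::uint64_t& out)
{
    assert(width >= 2 && width <= 32 && fracBits < width);
    const double scaled  = std::ldexp(static_cast<double>(v), static_cast<int>(fracBits));
    const double lowest  = -std::ldexp(1.0, static_cast<int>(width) - 1);
    const double highest =  std::ldexp(1.0, static_cast<int>(width) - 1) - 1.0;
    if (!(scaled == std::floor(scaled)) || scaled < lowest || scaled > highest)
        return false;   // fractional part would be lost, out of range, or NaN
    const std::int64_t q = static_cast<std::int64_t>(scaled);
    out = static_cast<std::uint64_t>(q) & ((std::uint64_t(1) << width) - 1);
    return true;
}

// Inverse of QuantizeFixed: sign-extends the field and rescales it.
float DequantizeFixed(std::uint64_t bits, unsigned width, unsigned fracBits)
{
    std::int64_t q = static_cast<std::int64_t>(bits);
    if (bits & (std::uint64_t(1) << (width - 1)))
        q -= static_cast<std::int64_t>(std::uint64_t(1) << width);
    return static_cast<float>(std::ldexp(static_cast<double>(q), -static_cast<int>(fracBits)));
}
```

At 12 bits with 4 fractional bits, each float covers -128 to 127.9375 in steps of 1/16. A state that uses an unrepresentable value (such as 0.1) is reported instead of being silently merged with a neighbouring state.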

AnthonyW 101 Dec 03, 2011 at 01:54

The trick is that you don’t need to encode the states themselves into your 64-bit ‘StateKey’. In fact… you definitely do NOT want to do that. Instead, you want the key to look up predefined/precreated states. The following should give you an idea, although it’s obviously not meant to be used directly:

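// RenderStateManager.h (shown in one file for brevity)
#include <d3d11.h>
#include <vector>

struct StateKey
{
    // Unsigned so the fields can index the vectors directly. Size them
    // based on the number of distinct states you expect to encounter.
    unsigned rasterizerStateIndex   : 8;
    unsigned depthStencilStateIndex : 4;
};

extern ID3D11DeviceContext* d3d11Context;
std::vector<ID3D11RasterizerState*>   rasterizerStates;
std::vector<ID3D11DepthStencilState*> depthStencilStates;

// Return the index of a matching, already-created state object (creating it on a miss).
unsigned FindDepthStencilStateInterfaceIndexByDesc(const D3D11_DEPTH_STENCIL_DESC& desc);
unsigned FindRasterizerStateInterfaceIndexByDesc(const D3D11_RASTERIZER_DESC& desc);

StateKey GetStateKey(const D3D11_DEPTH_STENCIL_DESC& stencil, const D3D11_RASTERIZER_DESC& rast)
{
    StateKey result;
    result.depthStencilStateIndex = FindDepthStencilStateInterfaceIndexByDesc(stencil);
    result.rasterizerStateIndex   = FindRasterizerStateInterfaceIndexByDesc(rast);
    return result;
}

void BindStates(StateKey key)
{
    // OMSetDepthStencilState also takes the stencil reference value.
    d3d11Context->OMSetDepthStencilState(depthStencilStates[key.depthStencilStateIndex], 0);
    d3d11Context->RSSetState(rasterizerStates[key.rasterizerStateIndex]);
}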

I hope this gives you the idea. You could obviously put whatever you want in your StateKey. Other things to add may be InputAssemblerIndex, RenderTargetsIndex, or even direct state such as a single bit for IsTransparent, or a few bits defining which render pass or full screen phase this entity should be rendered to.

The real point is to realize that the D3D state objects have a LOT of state in each one of them… but you’ll never need every possible combination of those states. I think you’ll find that you’ll likely only have a few to a few dozen different items for each of the actual state objects.
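A sketch of what `FindRasterizerStateInterfaceIndexByDesc` and friends might look like, written against a generic description type so it compiles without the D3D headers. `StateTable` is a hypothetical helper; it assumes the description is trivially copyable with no padding, so that `memcmp` is a valid equality test. That holds for `D3D11_RASTERIZER_DESC` (all members are 4 bytes wide), but not for `D3D11_DEPTH_STENCIL_DESC`, whose two UINT8 masks leave padding, so that one is better compared field by field. A linear search is fine for the few dozen entries expected here, and the assertion catches a table that outgrows the key's bit field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: hands out a small, stable index per unique description.
// Desc must be trivially copyable with no padding bytes (memcmp equality).
template <typename Desc, unsigned IndexBits>
class StateTable
{
public:
    unsigned FindOrAdd(const Desc& desc)
    {
        for (std::size_t i = 0; i < descs_.size(); ++i)
            if (std::memcmp(&descs_[i], &desc, sizeof(Desc)) == 0)
                return static_cast<unsigned>(i);

        // Miss: a real implementation would create the API object here
        // (e.g. via ID3D11Device::CreateRasterizerState) and store it alongside.
        assert(descs_.size() < (std::size_t(1) << IndexBits) && "StateKey field too narrow");
        descs_.push_back(desc);
        return static_cast<unsigned>(descs_.size() - 1);
    }

    const Desc& Get(unsigned index) const { return descs_[index]; }
    std::size_t Size() const { return descs_.size(); }

private:
    std::vector<Desc> descs_;
};
```

For the 8-bit rasterizerStateIndex in the key, the table would be a `StateTable<D3D11_RASTERIZER_DESC, 8>`.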

I’m too tired to proofread this; hope it makes sense and, more importantly, hope it helps.

SuperPixel 101 Dec 03, 2011 at 12:19

@AnthonyW

The idea behind the Just Cause 2 (JC2) method is that, to stay compatible with the way consoles manage states, it had to work the way I described in my first post (a multiplatform scenario; see the GPU Pro article). Its main advantage is that it makes mapping across platforms cleaner and lighter. Of course, I realize (after reading your explanation) that I could also drop some states because they might never be used in the end. Your approach is still interesting, though, even if it differs from the one used in JC2.
In fact, they use 32-bit and 64-bit integer state keys to encode that many states, giving each field in the union the number of bits it needs to encode its range of values.
I’ll think about your approach as well and see whether I can draw some parallels to come up with a good hybrid solution.
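A sketch of the "give each field its own sub-range of the key" idea, with the layout written down as a table of offsets and widths and a `static_assert` that fails the build if the layout outgrows 64 bits. The widths (a 16-bit DepthBias, and 12-bit fields for the two floats, assuming they are stored in some quantized form) are assumptions, not the article's layout; `Field`, `Insert` and `Extract` are hypothetical names. A signed DepthBias would have to be masked to its 16 bits before insertion.

```cpp
#include <cassert>
#include <cstdint>

// One key field: bit offset and width inside a 64-bit key.
struct Field { unsigned offset, width; };

// Hypothetical layout; fields are placed back to back.
constexpr Field kDepthBias  = { 0, 16};  // assumes DepthBias fits in 16 bits
constexpr Field kFillMode   = {16,  2};
constexpr Field kCullMode   = {18,  2};
constexpr Field kFlags      = {20,  5};  // the five BOOLs
constexpr Field kSlopeScale = {25, 12};  // quantized float
constexpr Field kBiasClamp  = {37, 12};  // quantized float

static_assert(kBiasClamp.offset + kBiasClamp.width <= 64, "layout does not fit in 64 bits");

constexpr std::uint64_t Mask(Field f)
{
    return ((std::uint64_t(1) << f.width) - 1) << f.offset;
}

// Replaces the bits of field f in key with v.
inline std::uint64_t Insert(std::uint64_t key, Field f, std::uint64_t v)
{
    assert(f.width < 64 && (v >> f.width) == 0);   // value must fit its field
    return (key & ~Mask(f)) | (v << f.offset);
}

inline std::uint64_t Extract(std::uint64_t key, Field f)
{
    return (key & Mask(f)) >> f.offset;
}
```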

Thanks for your help

AnthonyW 101 Dec 03, 2011 at 17:50

My system would definitely still work in a multiplatform way, I just chose to use D3D11 types to make it obvious. ‘StateKey’ is already multiplatform in the example above. The two state objects that are parameters to GetStateKey() simply need to be made in a platform independent way as well… and then you can hide the implementation behind an interface of some sort.

Here’s how it could look if written more properly.

struct RasterizerState   { /* ... for all intents and purposes, you could just copy D3D11_RASTERIZER_DESC ... */ };
struct DepthStencilState { /* ... again, copy D3D11 */ };

struct IRenderStateManager
{
    virtual ~IRenderStateManager() {}
    virtual StateKey GetStateKey(RasterizerState raster, DepthStencilState depth) = 0;
    virtual void BindState(StateKey key) = 0;
};

The D3D implementation would be almost exactly as the previous sample, except that it would have to map or convert from our platform independent state objects to the D3D11 equivalents before rendering.

The OpenGL implementation could instead look something like this:

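class OpenGLRenderStateManager : public IRenderStateManager
{
    std::vector<RasterizerState>   rasterizerStates;
    std::vector<DepthStencilState> depthStencilStates;
    StateKey currentStateKey;   // last bound key (for state caching)
    bool     hasCurrentState;   // false until the first BindState()

public:
    OpenGLRenderStateManager() : currentStateKey(), hasCurrentState(false) {}

    virtual StateKey GetStateKey(RasterizerState raster, DepthStencilState depth)
    {
        // Pretty much the same as for D3D, only we store our state structs directly
        // instead of ID3D11*State objects (OpenGL has no direct equivalent).
        StateKey key = StateKey();
        key.rasterizerStateIndex   = FindOrAdd(rasterizerStates, raster);
        key.depthStencilStateIndex = FindOrAdd(depthStencilStates, depth);
        return key;
    }

    virtual void BindState(StateKey key)
    {
        // StateKey has no operator==, so compare the fields directly.
        if (hasCurrentState &&
            key.rasterizerStateIndex   == currentStateKey.rasterizerStateIndex &&
            key.depthStencilStateIndex == currentStateKey.depthStencilStateIndex)
            return;

        BindDepthStencilState(key);
        BindRasterizerState(key);   // analogous to BindDepthStencilState, omitted

        currentStateKey = key;
        hasCurrentState = true;
    }

private:
    // Linear find-or-append; assumes the state structs define operator==.
    template <typename T>
    static unsigned FindOrAdd(std::vector<T>& states, const T& state)
    {
        for (size_t i = 0; i < states.size(); ++i)
            if (states[i] == state)
                return static_cast<unsigned>(i);
        states.push_back(state);
        return static_cast<unsigned>(states.size() - 1);
    }

    void BindDepthStencilState(StateKey key)
    {
        if (hasCurrentState && key.depthStencilStateIndex == currentStateKey.depthStencilStateIndex)
            return;

        const DepthStencilState& newState     = depthStencilStates[key.depthStencilStateIndex];
        const DepthStencilState& currentState = depthStencilStates[currentStateKey.depthStencilStateIndex];

        // On the first bind nothing is known about the GL state, so apply every field.
        if (!hasCurrentState || newState.DepthEnable != currentState.DepthEnable)
        {
            if (newState.DepthEnable)
                glEnable(GL_DEPTH_TEST);
            else
                glDisable(GL_DEPTH_TEST);
        }

        // ... other states
    }

    void BindRasterizerState(StateKey key);
};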
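The "... other states" part can be exercised without an OpenGL context by routing the calls through a recorder. In this sketch `RecordingBackend`, `DepthState` and `ApplyDepthState` are hypothetical names; the recorder logs the GL calls it would make, and the per-field diff, with a `force` flag for the first bind, is the part that would carry over to a real implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical platform-neutral depth state.
struct DepthState
{
    bool depthEnable;
    bool depthWrite;
    int  depthFunc;
};

// Stand-in for OpenGL: records the calls it would make instead of issuing them.
struct RecordingBackend
{
    std::vector<std::string> calls;
};

// Emits only the changes between 'current' and 'next'. 'force' applies every
// field, for the first bind when nothing is known about the GL state.
void ApplyDepthState(RecordingBackend& gl, const DepthState& next, const DepthState& current, bool force)
{
    if (force || next.depthEnable != current.depthEnable)
        gl.calls.push_back(next.depthEnable ? "glEnable(GL_DEPTH_TEST)" : "glDisable(GL_DEPTH_TEST)");
    if (force || next.depthWrite != current.depthWrite)
        gl.calls.push_back(next.depthWrite ? "glDepthMask(GL_TRUE)" : "glDepthMask(GL_FALSE)");
    if (force || next.depthFunc != current.depthFunc)
        gl.calls.push_back("glDepthFunc");
}
```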

Hope this helps. Anyway, I’ve purchased GPU Pro from Amazon and expect it to arrive Monday, so I’ll read it and get back to you. That said, the system I’ve described should work reasonably well for you.

Best of luck.

SuperPixel 101 Dec 05, 2011 at 15:22

@AnthonyW

Yeah, I know your approach, and it was my first idea. Then, during my research, because I wanted something flexible and simple enough to map states across different platforms (e.g. consoles too) as transparently as possible, I found the approach described in GPU Pro quite interesting to experiment with. It is actually part of a broader description of the approaches the developers of Just Cause took to implement many other things, not only state management. So, have a look at the post-mortem part of GPU Pro.

AnthonyW 101 Dec 06, 2011 at 02:48

Ah, sounds cool. Guess I’ll just have to wait until the book arrives. :)

SuperPixel 101 Dec 07, 2011 at 12:12

@AnthonyW

Ah, sounds cool. Guess I’ll just have to wait until the book arrives. :)

Once you’ve read the state management part, I’d like to discuss it with you :)