Using Bit OPs in vs3.0 to pack normals

spacerat 103 Sep 30, 2006 at 18:41

Hi,

I have to save 3 normal vectors per vertex for every mesh in my current project, but would like to reduce the memory consumption if possible.

Is it possible to use the bitwise operations of vs3.0 shaders (shift left/right, and, or) to unpack 3 byte-normal vectors from one float-normal vector? The vertex shader would then unpack this float normal back into the 3 original normal vectors.

Each float could be laid out like this:
bits 0-7: byte-normal 1
bits 8-15: byte-normal 2
bits 16-23: byte-normal 3

After unpacking, each byte would have to be converted back to a float somehow.
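
As a rough illustration of this layout, here is a minimal CPU-side sketch in C. It assumes the three bytes are combined into a 24-bit integer whose value is then stored in a float (integers below 2^24 are exactly representable in a 32-bit float). The names pack3, unpack3 and byte_to_signed are made up for this example, and the mapping of 0..255 to -1..1 is only one possible convention. Whether a given shader profile actually exposes shift and mask operators is a separate question.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three bytes into bits 0-7, 8-15 and 16-23 and return the result
   as a float VALUE (not a bit pattern). Hypothetical helper. */
static float pack3(uint8_t b1, uint8_t b2, uint8_t b3)
{
    uint32_t bits = (uint32_t)b1 | ((uint32_t)b2 << 8) | ((uint32_t)b3 << 16);
    return (float)bits; /* exact, because bits < 2^24 */
}

/* Recover the three bytes from a value produced by pack3. */
static void unpack3(float packed, uint8_t out[3])
{
    assert(packed >= 0.0f && packed < 16777216.0f); /* must fit in 24 bits */
    uint32_t bits = (uint32_t)packed;
    out[0] = (uint8_t)(bits & 255u);
    out[1] = (uint8_t)((bits >> 8) & 255u);
    out[2] = (uint8_t)((bits >> 16) & 255u);
}

/* Assumed convention: byte 0 maps to -1.0 and byte 255 maps to +1.0. */
static float byte_to_signed(uint8_t b)
{
    return (float)b / 127.5f - 1.0f;
}
```

With this layout, one float carries one component of each of the three normals, so a three-component float vector would hold all three byte normals.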

Is there any chance of achieving this on current hardware?
I also thought about using a float texture (I think byte textures are not yet supported by the vertex shader).

So far, I have written everything in GL.

cheers,:whistle:

22 Replies

_oisyn 101 Sep 30, 2006 at 22:21

Why don’t you just use byte inputs instead of a single float? You’re not limited to floats, you know :). Also, as a normal always has length 1, you need only 2 of the components and you can calculate the third.

Kenneth_Gorking 101 Oct 01, 2006 at 01:25

spacerat 103 Oct 01, 2006 at 06:59

The idea with packing normals by using the DX extension looks good :yes:
Is there something like this in OpenGL ?

Hm.. but I had hoped that packing would also let me reduce the number of texture coords used for the deformation. :unsure:

I just wrote an example in GL using bitwise shift operations, but it would not compile, even though I have an NV 6600 GT, the newest driver, and Cg 1.5 beta, which should have the ps3.0 profile :sad:

Is there any flaw in the code, or is it possible that only DX9.0c has ps3.0?

<CG Code>

C3E2v_Output C3E2v_varying(float2 position : POSITION,
                           int3 texCoord : COLOR)
{
    C3E2v_Output OUT;

    int r = texCoord.x & 255;
    int g = (texCoord.x >> 8) & 255;
    int b = (texCoord.x >> 16) & 255;
    OUT.position = float4(position, 0, 1.0);
    OUT.color = float3(float(r) / 256.0, float(g) / 256.0, float(b) / 256.0);
}

<C Code, OpenGL>

glBegin(GL_TRIANGLES);
glTexCoord2i(0xff0000, 0);
glVertex2f(-0.8, 0.8);

glTexCoord2i(0x00ff00, 0);
glVertex2f(0.8, 0.8);

glTexCoord2i(0x0000ff, 0);
glVertex2f(0.0, -0.8);
glEnd();

Kenneth_Gorking 101 Oct 01, 2006 at 15:05

Is there something like this in OpenGL?

I would be surprised if there wasn’t an OpenGL extension somewhere, if that much was even required.

Is there any flaw in the code, or is it possible that only DX9.c has ps3.0?

ps3.0 and vs3.0 are DX-only profiles, but I doubt you need them anyway.

As for your code, I spotted 2 errors:

1: ‘texCoord’ is declared as an int3; it should just be an int.
2: ‘texCoord’ is mapped to the COLOR semantic, but your C code passes the value as a texture coordinate (TEXCOORD0).

Another optimization you might try, like .oisyn suggested, is to derive the third component of the normal in the shader, and pack the xy part along with the position.

// CG Code
C3E2v_Output C3E2v_varying(float4 position : POSITION)
{
    C3E2v_Output OUT;

    // Extract position (xy); zw carry the x and y of the normal
    OUT.position = float4(position.xy, 0.0, 1.0);

    // Derive normal: z is rebuilt from x and y, so it is never negative
    float3 normal;
    normal.xy = position.zw;
    normal.z = sqrt( max( 0.0, 1 - normal.x*normal.x - normal.y*normal.y ) );

    OUT.color = normal;
    return OUT;
}

// C Code, OpenGL
glBegin(GL_TRIANGLES);
glVertex4f(-0.8,  0.8, 0.0, 0.0);
glVertex4f( 0.8,  0.8, 1.0, 0.0);
glVertex4f( 0.0, -0.8, 0.0, 1.0);
glEnd();

P.S. There is a new version of Cg out that you might want to try.

spacerat 103 Oct 22, 2006 at 03:30

Hm.. the idea of using sqrt to save one normal component seems good, but it doesn’t work: you won’t get the sign back, because sqrt only gives non-negative numbers.

Maybe I should do something like render to vertex-array, then I could use byte-textures..

Kenneth_Gorking 101 Oct 22, 2006 at 14:07

Simple packing and unpacking of the normals is easy, and it’s just a single mad in the shader.
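
A minimal sketch of what that single multiply-add could look like, written in C rather than shader code. It assumes each component is stored as an unsigned byte that reaches the shader normalized to 0..1 (byte / 255), so the unpack is c * 2 - 1. The names pack_component and unpack_component are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Map a component in [-1, 1] to a byte in [0, 255], rounding to nearest. */
static uint8_t pack_component(float n)
{
    assert(n >= -1.0f && n <= 1.0f);
    return (uint8_t)((n * 0.5f + 0.5f) * 255.0f + 0.5f);
}

/* Map the byte back to [-1, 1]. The division models the normalization
   assumed to happen before the shader sees the value; the shader's part
   is then the single multiply-add c * 2 - 1. */
static float unpack_component(uint8_t b)
{
    float c = (float)b / 255.0f;
    return c * 2.0f - 1.0f;
}
```

The round trip is lossy: with 8 bits per component the error is roughly up to 1/255 per component.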

Render to vertex-array is only supported by Direct3D, unless your hardware supports über-buffers.

Reedbeta 167 Oct 22, 2006 at 16:56

@Kenneth Gorking

Render to vertex-array is only supported by Direct3D, unless your hardware supports über-buffers.

I’m not sure what you mean by “über-buffers”, but OpenGL has an extension that provides render to vertex-array, and it’s supported on (I believe) GF 6600 and up (not sure what the equivalent is in the ATI world now…)

Kenneth_Gorking 101 Oct 23, 2006 at 17:02

@Reedbeta

I’m not sure what you mean by “über-buffers”, but OpenGL has an extension that provides render to vertex-array, and it’s supported on (I believe) GF 6600 and up (not sure what the equivalent is in the ATI world now…)

Über-buffers, or SuperBuffers as they are also called, are basically just a generalization of the whole GL object thing where you use a memory object instead, that can be used as the memory of textures, framebuffers and vertex arrays. This means that you can render to the memory of a texture, and just attach that memory object to a vertex buffer, giving you render to vertex buffer. Read more…

I had read about pixel buffers, but they are not supported on my hardware so I never looked into them much. The functionality of render to vertex buffer can easily be simulated though, by rendering to a texture and then copying it to vertex buffer. Might not be as fast, but it should be easy to do.

_oisyn 101 Oct 23, 2006 at 21:04

@spacerat

Hm.. the idea of using sqrt to save one normal component seems good, but it doesn’t work: you won’t get the sign back, because sqrt only gives non-negative numbers.

You know, I tend to forget these kinds of technicalities, but the funny thing is, although I actually knew about this issue, I just realized why my quaternion-compression code for network packets I wrote ages ago didn’t work like it was supposed to :huh:

spacerat 103 Oct 27, 2006 at 00:11

Ok, now I found a good way to store the normals: I simply use 4-byte colors with VBOs. I haven’t tried yet whether I can use multiple attribs in GLSL to use more than two colors (average color + 2-sided color), but it might be possible.

I’m now experimenting with FBOs and render to vertex-buffer for skinned animation.

Rendering a float texture to the FBO is quite fast, but when it comes to the glReadPixels( .. ) that copies the FBO buffer to my VBO, the framerate drops to almost half (RGBA) or collapses completely (RGB)!

If the VBO/Framebuffer format is RGBA (GL_FLOAT_RGBA32_NV) it drops from
138 fps (render VBO) to
108 fps (render float texture+render VBO) to
70 fps (render texture+copy to VBO+render VBO)

and in the case of RGB (GL_FLOAT_RGB32_NV) it’s even worse:

:lol: 161 fps (render VBO) to
:happy: 116 fps (render float texture+render VBO) to
:angry: 8 fps !!! (render float texture+copy to VBO+render VBO)
(Did I run into some software emulation ?)

Is there any reason why glReadPixels is so slow?
Shouldn’t it be as fast as (or faster than) rendering the float texture?

At the moment the only alternative I can think of is using the FBO texture in the vertex shader. Unfortunately I don’t have an ATI card, so I can’t use render to vertex buffer in DX9.

Here is the copy-to-VBO source:
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_vertices_handle);
glReadPixels(0, 0, tex_width, tex_height, GL_RGB, GL_FLOAT, 0);

Reedbeta 167 Oct 27, 2006 at 00:42

glReadPixels is awful because it forces data to be copied from the GPU’s memory back to system memory. The bus is designed to transfer data most efficiently in the opposite direction and so this kind of ‘readback’ is extremely slow. You should try to find a way to keep the data always on the GPU, if you can. Does your card support GL_ARB_pixel_buffer_object by any chance?

spacerat 103 Oct 27, 2006 at 00:56

But in my case it’s a copy from GPU memory to GPU memory: the VBO is on the GPU and so is the FBO. (The glReadPixels pointer is 0.)

I found that glReadPixels causes a glFinish, waiting for all pending operations to complete. Maybe this could be a reason; however, it still can’t explain the 8 fps for RGB.

Reedbeta 167 Oct 27, 2006 at 01:45

I’m pretty sure it really does copy it all the way back to the CPU. See, what’s happening is that the buffers are mapped into system memory, so they’re accessible to the CPU (using DMA or some such trick). Then the CPU grabs every 32-bit word of the one buffer and moves it into the other buffer. ReadPixels isn’t designed to instruct the GPU to perform a copy within itself, it really does force the CPU to touch every word. After all, as you said yourself, what else could explain the 8fps?

Kenneth_Gorking 101 Oct 27, 2006 at 02:16

Shouldn’t your glReadPixels(0, 0, tex_width, tex_height, GL_RGB, GL_FLOAT, 0) have GL_FLOAT_RGBA32_NV or GL_FLOAT_RGB32_NV instead of GL_RGB?

spacerat 103 Oct 27, 2006 at 02:29

No, GL_RGBA/GL_RGB is correct. The others give me an invalid operation error.

And the copy should be done by GPU.
At least, that is what is stated here: http://www.gpgpu.org/developer/

I wrote my code by looking at the GPU particle demo’s render to vertex array class.

I don’t have an answer for the 8 fps either.

But I found half a solution. If I set up my FBO/VBO as RGB and use glReadPixels with RGBA, then it’s quite fast; however, the data seems to get corrupted by doing this..
Is it not just a 1:1 copy?

Here is another link:

http://oss.sgi.com/projects/ogl-sample/registry/ARB/pixel_buffer_object.txt

Kenneth_Gorking 101 Oct 27, 2006 at 02:59

Hmm, that sucks… Can’t really tell what could be wrong without the code you’re using to set this up.

spacerat 103 Oct 27, 2006 at 03:25

I think I know what’s going on. I tried to create an FBO without alpha, but it seems the driver always creates an alpha channel as well. This means glReadPixels has to convert from RGBA to RGB.

spacerat 103 Oct 27, 2006 at 14:26

Oh, btw, there are a couple of shader instructions to pack/unpack 8-bit values etc. Here are a few: PK2H, PK2US, PK4B, PK4UB, PK4UBG, UP2H, UP2US, UP4B, UP4UB, UP4UBG

I’m not sure if they are exposed in GLSL or HLSL, but they definitely seem useful.

Nils_Pipenbrinck 101 Oct 27, 2006 at 16:49

@Reedbeta

glReadPixels is awful because it forces data to be copied from the GPU’s memory back to system memory. The bus is designed to transfer data most efficiently in the opposite direction and so this kind of ‘readback’ is extremely slow. You should try to find a way to keep the data always on the GPU, if you can. Does your card support GL_ARB_pixel_buffer_object by any chance?

Btw, while glReadPixels is still a bad idea (due to the CPU/GPU interlock) the copy-performance is not as bad anymore. With PCI-Express cards readback is rather fast.

Kenneth_Gorking 101 Oct 27, 2006 at 17:45

Hmm, maybe someone should put together a small tutorial on render-to-vertexbuffers and include the methods from this thread to show how it can alternatively be done, if it’s not supported by one’s hardware…

spacerat 103 Oct 28, 2006 at 06:42

The render to vertex array discussion is continued here:

http://www.devmaster.net/forums/showthread.php?t=7457

spacerat 103 Oct 28, 2006 at 23:51

Here is yet another way to pack/unpack normals inside the shader
(just discovered in the ATI R2VB documentation).

Packing:
Out.Pos_Normal.w = dot(floor(normal * 127.5 + 127.5), float3(1 / 256.0, 1, 256.0));

Unpacking:
float3 normal = frac(Pos_Normal.w * float3(1, 1 / 256.0, 1 / 65536.0)) * 2 - 1;
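
To see what these two lines do numerically, here is a C emulation of the same arithmetic. It is a sketch that mirrors the expressions above on the CPU and is not taken from the ATI documentation. The helper names (frac_nonneg, quantize, pack_normal, unpack_normal) are invented, and floor/frac are done with integer casts, which is only valid because every value involved is non-negative.

```c
#include <assert.h>

/* frac() for non-negative inputs: drop the integer part. */
static float frac_nonneg(float v)
{
    return v - (float)(int)v;
}

/* floor(n * 127.5 + 127.5): maps [-1, 1] to an integer in 0..255. */
static float quantize(float n)
{
    assert(n >= -1.0f && n <= 1.0f);
    return (float)(int)(n * 127.5f + 127.5f);
}

/* dot(quantized, float3(1/256, 1, 256)): x lands in the fractional bits,
   y in the low 8 integer bits, z in the next 8 integer bits. */
static float pack_normal(const float n[3])
{
    return quantize(n[0]) * (1.0f / 256.0f)
         + quantize(n[1])
         + quantize(n[2]) * 256.0f;
}

/* frac(w * float3(1, 1/256, 1/65536)) * 2 - 1 */
static void unpack_normal(float w, float out[3])
{
    out[0] = frac_nonneg(w) * 2.0f - 1.0f;
    out[1] = frac_nonneg(w * (1.0f / 256.0f)) * 2.0f - 1.0f;
    out[2] = frac_nonneg(w * (1.0f / 65536.0f)) * 2.0f - 1.0f;
}
```

The packed value uses 16 integer bits plus 8 fractional bits, so it just fits the 24-bit mantissa of a 32-bit float. The unpack is approximate: y and z pick up a small residue from the lower fields, and a component of exactly 1.0 comes back slightly below 1.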