

Using Bit OPs in vs3.0 to pack normals


22 replies to this topic

#1 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 30 September 2006 - 06:41 PM

Hi,

I have to save 3 normal vectors per vertex for every mesh in my current project, but would like to reduce the memory consumption if possible.

Is it possible to use the bitwise operations of vs3.0 shaders (shift left/right, and, or) to pack 3 byte normal vectors into one float normal vector? The vertex shader would then unpack this float normal into the original 3 normal vectors.

Each float number could be used like this:
bit 0-7: byte-normal 1
bit 8-15: byte-normal 2
bit 16-23: byte-normal 3

After unpacking, each byte would have to be converted back to a float somehow.
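A hedged sketch of that layout, written in C rather than a shader language so it can be checked on the CPU; `pack3_bytes`, `unpack3_bytes` and `floor_nonneg` are hypothetical names. It relies on the fact that a 32-bit float represents every integer below 2^24 exactly, so bits 0-23 survive. The unpack uses only division by powers of two, floor and subtraction, which a vertex shader without integer bit operations could also do. Here the floor is done with a cast, which is valid only because the values are non-negative.

```c
#include <stdint.h>

/* floor() for non-negative values; stands in for a shader's floor(). */
static float floor_nonneg(float v)
{
    return (float)(uint32_t)v;
}

/* Pack three 8-bit values into one float: bits 0-7, 8-15 and 16-23.
   Exact, because every integer below 2^24 is representable in a float. */
float pack3_bytes(uint8_t b0, uint8_t b1, uint8_t b2)
{
    uint32_t v = (uint32_t)b0 | ((uint32_t)b1 << 8) | ((uint32_t)b2 << 16);
    return (float)v;
}

/* Unpack using only float arithmetic; each byte is returned in [0, 1]. */
void unpack3_bytes(float f, float out[3])
{
    float hi  = floor_nonneg(f / 65536.0f);   /* bits 16-23 */
    float rem = f - hi * 65536.0f;
    float mid = floor_nonneg(rem / 256.0f);   /* bits 8-15 */
    float lo  = rem - mid * 256.0f;           /* bits 0-7 */

    out[0] = lo / 255.0f;
    out[1] = mid / 255.0f;
    out[2] = hi / 255.0f;
}
```

Each returned value can then be expanded from [0, 1] to [-1, 1] with `x * 2 - 1`.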

Is there any chance to achieve this on current hardware?
I also thought about using a float texture (I think byte textures are not yet supported in the vertex shader).

So far, I have written everything in OpenGL.

cheers,:whistle:

#2 .oisyn

    DevMaster Staff

  • Moderators
  • 1822 posts

Posted 30 September 2006 - 10:21 PM

Why don't you just use byte inputs instead of a single float? You're not limited to floats, you know :). Also, as a normal always has length 1, you need only 2 of the components and you can calculate the third.
C++ addict
-
Currently working on: the 3D engine for Tomb Raider.

#3 Kenneth Gorking

    Senior Member

  • Members
  • PipPipPipPip
  • 911 posts

Posted 01 October 2006 - 01:25 AM

http://www.ati.com/d...texNormals.html
This also works with vs1.1 shaders :w00t:
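The linked article's exact layout isn't reproduced here. As an illustration of the general idea only, one common compressed normal format stores three signed 10-bit components in 32 bits (Direct3D 9 exposes such a vertex type as D3DDECLTYPE_DEC3N, where the hardware supports it). The C sketch below packs and unpacks that kind of layout on the CPU; the bit assignment (x in bits 0-9) and the function names are assumptions of this sketch.

```c
#include <stdint.h>

/* Quantize v in [-1, 1] to a signed 10-bit value in [-511, 511],
   returned as its low 10 bits (two's complement). */
static uint32_t to_snorm10(float v)
{
    if (v > 1.0f) v = 1.0f;
    if (v < -1.0f) v = -1.0f;
    int32_t q = (int32_t)(v * 511.0f + (v >= 0.0f ? 0.5f : -0.5f)); /* round to nearest */
    return (uint32_t)q & 0x3FFu;
}

/* Sign-extend a 10-bit field and map it back to [-1, 1]. */
static float from_snorm10(uint32_t bits)
{
    int32_t q = (int32_t)(bits & 0x3FFu);
    if (q & 0x200) q -= 0x400;
    return (float)q / 511.0f;
}

/* Assumed layout: x in bits 0-9, y in 10-19, z in 20-29; bits 30-31 unused. */
uint32_t pack_normal_10_10_10(float x, float y, float z)
{
    return to_snorm10(x) | (to_snorm10(y) << 10) | (to_snorm10(z) << 20);
}

void unpack_normal_10_10_10(uint32_t p, float n[3])
{
    n[0] = from_snorm10(p);
    n[1] = from_snorm10(p >> 10);
    n[2] = from_snorm10(p >> 20);
}
```

Three floats (12 bytes) become one 32-bit word, at a resolution of about 1/511 per component.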
"Stupid bug! You go squish now!!" - Homer Simpson

#4 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 01 October 2006 - 06:59 AM

The idea with packing normals by using the DX extension looks good :yes:
Is there something like this in OpenGL?

Hm... but I had also hoped to reduce the number of texture coordinates used
for the deformation by packing. :unsure:

I just made an example in GL using bitwise shift operations,
but somehow it couldn't be compiled, even though I have an NV GT6600,
the newest driver, and Cg 1.5 beta, which should have the ps3.0 profile :sad:

Is there a flaw in the code, or is it possible that only DX9.0c has ps3.0?

<CG Code>

C3E2v_Output C3E2v_varying(float2 position : POSITION,
int3 texCoord : COLOR)
{
	C3E2v_Output OUT;

	int r = texCoord.x & 255;
	int g = (texCoord.x >> 8) & 255;
	int b = (texCoord.x >> 16) & 255;
	OUT.position = float4(position, 0, 1.0);
	// divide by 255 so that a byte value of 255 maps to exactly 1.0
	OUT.color = float3(float(r)/255.0, float(g)/255.0, float(b)/255.0);
	return OUT;
}

<C Code, OpenGL>

glBegin(GL_TRIANGLES);
glTexCoord2i(0xff0000, 0);
glVertex2f(-0.8, 0.8);

glTexCoord2i(0x00ff00, 0);
glVertex2f(0.8, 0.8);

glTexCoord2i(0x0000ff, 0);
glVertex2f(0.0, -0.8);
glEnd();
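For reference, the same byte extraction done on the CPU in C; `byte_at`, `byte_to_unit` and `unpack_rgb` are hypothetical names for this sketch. Dividing by 255 rather than 256 maps the byte 255 to exactly 1.0, so a full-intensity channel stays at full intensity (255/256 would give about 0.996).

```c
#include <stdint.h>

/* Extract byte k (0 = bits 0-7, 1 = bits 8-15, 2 = bits 16-23). */
static unsigned byte_at(uint32_t packed, unsigned k)
{
    return (packed >> (8u * k)) & 255u;
}

/* Map a byte to [0, 1]. Dividing by 255 (not 256) sends 255 to exactly 1.0. */
static float byte_to_unit(unsigned b)
{
    return (float)b / 255.0f;
}

void unpack_rgb(uint32_t packed, float rgb[3])
{
    for (unsigned k = 0; k < 3; ++k)
        rgb[k] = byte_to_unit(byte_at(packed, k));
}
```

With this layout, 0xff0000 ends up in the third channel (b) and 0x0000ff in the first (r).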

#5 Kenneth Gorking

    Senior Member

  • Members
  • PipPipPipPip
  • 911 posts

Posted 01 October 2006 - 03:05 PM

Quote

Is there something like this in OpenGL?
I would be surprised if there wasn't an OpenGL extension for it somewhere, if that much is even required.

Quote

Is there a flaw in the code, or is it possible that only DX9.0c has ps3.0?
ps3.0 and vs3.0 are Direct3D-only profiles (the OpenGL counterparts in Cg are fp40 and vp40), but I doubt you need them anyway.

As for your code, I spotted 2 errors:

1: 'texCoord' is declared as an int3, but it should just be an int.
2: 'texCoord' is bound to the COLOR semantic, but your C code sends the value as a texture coordinate (TEXCOORD0).

Another optimization you might try, like .oisyn suggested, is to derive the third component of the normal in the shader, and pack the xy part along with the position.


// Cg code

C3E2v_Output C3E2v_varying(float4 position : POSITION)
{
	C3E2v_Output OUT;

	// Extract position
	OUT.position = float4(position.xy, 0.0, 1.0);

	// Derive normal (clamped so rounding can't make sqrt's argument negative)
	float3 normal;
	normal.xy = position.zw;
	normal.z = sqrt(max(0.0, 1.0 - normal.x*normal.x - normal.y*normal.y));

	OUT.color = normal;
	return OUT;
}


// C code, OpenGL

glBegin(GL_TRIANGLES);
glVertex4f(-0.8,  0.8, 0.0, 0.0);
glVertex4f( 0.8,  0.8, 1.0, 0.0);
glVertex4f( 0.0, -0.8, 0.0, 1.0);
glEnd();


P.S. There is a new version of Cg out that you might want to try.
"Stupid bug! You go squish now!!" - Homer Simpson

#6 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 22 October 2006 - 03:30 AM

Hm... the idea of using sqrt to save one normal component seems good, but it doesn't work: you won't get the sign back, since sqrt only returns non-negative numbers.

Maybe I should do something like render to vertex array; then I could use byte textures...

#7 Kenneth Gorking

    Senior Member

  • Members
  • PipPipPipPip
  • 911 posts

Posted 22 October 2006 - 02:07 PM

Simple packing and unpacking of the normals is easy: expanding a component stored as an unsigned byte back to [-1, 1] is just a single mad (multiply-add) in the shader.
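A sketch of that scale-and-bias in C, with hypothetical names. It assumes each component is stored as an unsigned byte that the hardware hands to the shader normalized to [0, 1], as it does for GL_UNSIGNED_BYTE colors; the shader then needs only `c * 2 - 1`, one multiply-add.

```c
#include <stdint.h>

/* Pack: map a normal component from [-1, 1] to [0, 1] and quantize to a byte. */
uint8_t pack_unorm8(float n)
{
    if (n > 1.0f) n = 1.0f;
    if (n < -1.0f) n = -1.0f;
    return (uint8_t)((n * 0.5f + 0.5f) * 255.0f + 0.5f);
}

/* Unpack: c = b / 255 is what a normalized byte attribute delivers;
   expanding back to [-1, 1] is the single mad, c * 2 - 1. */
float unpack_unorm8(uint8_t b)
{
    float c = (float)b / 255.0f;
    return c * 2.0f - 1.0f;
}
```

Note that 0 is not exactly representable this way (it comes back as about 0.004); that is the price of the simple bias.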

Render to vertex array is only supported by Direct3D, unless your hardware supports über-buffers.
"Stupid bug! You go squish now!!" - Homer Simpson

#8 Reedbeta

    DevMaster Staff

  • Administrators
  • 4979 posts
  • LocationBellevue, WA

Posted 22 October 2006 - 04:56 PM

Kenneth Gorking said:

Render to vertex array is only supported by Direct3D, unless your hardware supports über-buffers.

I'm not sure what you mean by "über-buffers", but OpenGL has an extension that provides render to vertex-array, and it's supported on (I believe) GF 6600 and up (not sure what the equivalent is in the ATI world now...)
reedbeta.com - developer blog, OpenGL demos, and other projects

#9 Kenneth Gorking

    Senior Member

  • Members
  • PipPipPipPip
  • 911 posts

Posted 23 October 2006 - 05:02 PM

Reedbeta said:

I'm not sure what you mean by "über-buffers", but OpenGL has an extension that provides render to vertex-array, and it's supported on (I believe) GF 6600 and up (not sure what the equivalent is in the ATI world now...)

Über-buffers, or SuperBuffers as they are also called, are basically just a generalization of the whole GL object thing where you use a memory object instead, which can serve as the storage for textures, framebuffers and vertex arrays. This means that you can render into the memory of a texture and then attach that same memory object to a vertex buffer, giving you render to vertex buffer.

I had read about pixel buffers, but they are not supported on my hardware, so I never looked into them much. The functionality of render to vertex buffer can easily be simulated, though, by rendering to a texture and then copying it into a vertex buffer. It might not be as fast, but it should be easy to do.
"Stupid bug! You go squish now!!" - Homer Simpson

#10 .oisyn

    DevMaster Staff

  • Moderators
  • 1822 posts

Posted 23 October 2006 - 09:04 PM

spacerat said:

Hm.. the idea to use the sqrt for saving 1 normal-value seems good - but its not working. you wont get the sign back. sqrt only gives positive numbers.

You know, I tend to forget these kinds of technicalities, but the funny thing is, although I actually knew about this issue, I just now realized why the quaternion-compression code for network packets that I wrote ages ago didn't work the way it was supposed to :huh:
C++ addict
-
Currently working on: the 3D engine for Tomb Raider.

#11 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 27 October 2006 - 12:11 AM

OK, now I have found a good way to store the normals: I simply use 4-byte colors with VBOs. I haven't tried yet whether I can use multiple attributes in GLSL to pass more than two colors (average color + 2-sided color), but it might be possible.
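A sketch of what such a packed vertex might look like in C; the struct and constant names are hypothetical. The normal's x, y and z are assumed to be scale-and-biased into unsigned bytes, with the fourth byte spare. Compared with three floats, the normal shrinks from 12 bytes to 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex: float position plus a normal packed into
   four unsigned bytes (x, y, z biased to 0..255; the fourth byte is spare). */
struct PackedVertex {
    float   position[3];   /* 12 bytes */
    uint8_t normal[4];     /* 4 bytes instead of 12 for float x, y, z */
};

/* Stride and offset to hand to the array-pointer calls, e.g.
   glColorPointer(4, GL_UNSIGNED_BYTE, stride, (void *)normal_offset)
   with the VBO bound; the shader then expands the color with c * 2 - 1. */
static const size_t packed_vertex_stride = sizeof(struct PackedVertex);
static const size_t normal_offset = offsetof(struct PackedVertex, normal);
```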

I'm now experimenting with FBOs and render to vertex buffer for skinned animation.

Rendering a float texture to the FBO is quite fast, but when it comes to the glReadPixels(...) call that copies the FBO contents into my VBO, the frame rate almost halves (RGBA) or collapses completely (RGB)!

If the VBO/Framebuffer format is RGBA (GL_FLOAT_RGBA32_NV) it drops from
138 fps (render VBO) to
108 fps (render float texture+render VBO) to
70 fps (render texture+copy to VBO+render VBO)

and in the case of RGB (GL_FLOAT_RGB32_NV) it's even worse:

:lol: 161 fps (render VBO) to
:happy: 116 fps (render float texture+render VBO) to
:angry: 8 fps !!! (render float texture+copy to VBO+render VBO)
(Did I run into some software emulation?)

Is there any reason why glReadPixels is so slow?
Shouldn't it be as fast as (or faster than) rendering the float texture?

At the moment I can only think of using the FBO texture in the vertex shader as an alternative. Unfortunately I don't have an ATI card, so I can't use render to vertex buffer in DX9.

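Here is the copy-to-VBO source:
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);                          // read from the FBO's color attachment
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_vertices_handle);  // the VBO becomes the pack target
glReadPixels(0, 0, tex_width, tex_height, GL_RGB, GL_FLOAT, 0);  // 0 = byte offset into the bound buffer
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0);                    // unbind when done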

#12 Reedbeta

    DevMaster Staff

  • Administrators
  • 4979 posts
  • LocationBellevue, WA

Posted 27 October 2006 - 12:42 AM

glReadPixels is awful because it forces data to be copied from the GPU's memory back to system memory. The bus is designed to transfer data most efficiently in the opposite direction and so this kind of 'readback' is extremely slow. You should try to find a way to keep the data always on the GPU, if you can. Does your card support GL_ARB_pixel_buffer_object by any chance?
reedbeta.com - developer blog, OpenGL demos, and other projects

#13 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 27 October 2006 - 12:56 AM

But in my case it's a copy from GPU to GPU memory: the VBO is on the GPU and so is the FBO (the glReadPixels pointer is 0).

I found that glReadPixels causes an implicit glFinish, waiting for all operations to complete. Maybe this is one reason; however, it still can't explain the 8 fps for RGB.

#14 Reedbeta

    DevMaster Staff

  • Administrators
  • 4979 posts
  • LocationBellevue, WA

Posted 27 October 2006 - 01:45 AM

I'm pretty sure it really does copy it all the way back to the CPU. See, what's happening is that the buffers are mapped into system memory, so they're accessible to the CPU (using DMA or some such trick). Then the CPU grabs every 32-bit word of the one buffer and moves it into the other buffer. ReadPixels isn't designed to instruct the GPU to perform a copy within itself, it really does force the CPU to touch every word. After all, as you said yourself, what else could explain the 8fps?
reedbeta.com - developer blog, OpenGL demos, and other projects

#15 Kenneth Gorking

    Senior Member

  • Members
  • PipPipPipPip
  • 911 posts

Posted 27 October 2006 - 02:16 AM

Shouldn't your glReadPixels(0, 0, tex_width, tex_height, GL_RGB, GL_FLOAT, 0) have GL_FLOAT_RGBA32_NV or GL_FLOAT_RGB32_NV instead of GL_RGB?
"Stupid bug! You go squish now!!" - Homer Simpson

#16 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 27 October 2006 - 02:29 AM

No, GL_RGBA/GL_RGB is correct. The other one gives me an invalid operation error.

And the copy should be done by the GPU.
At least, that is what is stated here: http://www.gpgpu.org/developer/

I wrote my code by looking at the GPU Particle demo's Render to vertex array class.

I don't have an answer for the 8 fps either.

But I found half a solution. If I set up my FBO/VBO as RGB and use glReadPixels with RGBA, then it's quite fast; however, the data seems to get corrupted by doing this.
Isn't it just a 1:1 copy?

Here is another link:
http://oss.sgi.com/p...ffer_object.txt

#17 Kenneth Gorking

    Senior Member

  • Members
  • PipPipPipPip
  • 911 posts

Posted 27 October 2006 - 02:59 AM

Hmm, that sucks... I can't really tell what could be wrong without the code you're using to set this up.
"Stupid bug! You go squish now!!" - Homer Simpson

#18 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 27 October 2006 - 03:25 AM

I think I know what's going on. I tried to create an FBO without alpha, but it seems the driver always creates an alpha channel as well. This means glReadPixels has to convert from RGBA to RGB.
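If the driver really does allocate RGBA storage, an RGB read can no longer be a straight block copy. The C sketch below (`rgba_to_rgb` is a hypothetical name) shows the per-pixel repacking such a conversion requires; whether a given driver performs it on the CPU or the GPU is an assumption, not something measured here.

```c
#include <stddef.h>

/* Source has 4 floats per pixel, destination wants 3: the data can no longer
   be moved as one contiguous block, so every pixel is visited and alpha dropped. */
void rgba_to_rgb(const float *rgba, float *rgb, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        rgb[3 * i + 0] = rgba[4 * i + 0];
        rgb[3 * i + 1] = rgba[4 * i + 1];
        rgb[3 * i + 2] = rgba[4 * i + 2];
        /* rgba[4 * i + 3] (alpha) is discarded */
    }
}
```

Conversely, if RGBA data lands in a VBO that is then read with a 3-float stride, each vertex after the first starts one float further off than the last, which would fit the corrupted data described earlier.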

#19 spacerat

    New Member

  • Members
  • PipPip
  • 25 posts

Posted 27 October 2006 - 02:26 PM

Oh, by the way: there are a couple of shader instructions to pack/unpack 8-bit values etc. Here are a few: PK2H, PK2US, PK4B, PK4UB, PK4UBG, UP2H, UP2US, UP4B, UP4UB, UP4UBG

I'm not sure whether they are provided by GLSL or HLSL, but they definitely seem useful.
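These opcodes come from NVIDIA's NV_fragment_program assembly extension. As a rough CPU emulation of what the PK4UB/UP4UB pair does, the C sketch below clamps four values to [0, 1], quantizes them to bytes, and stores the raw 32 bits in a float; `pk4ub`, `up4ub`, the rounding and the byte order are assumptions of this sketch, not taken from the extension text. The packed float is only a container: doing arithmetic on it would scramble the bytes.

```c
#include <stdint.h>
#include <string.h>

/* Clamp four values to [0, 1], scale to 0..255 and store the bytes in the
   bit pattern of one float (x in the low byte is an assumed order). */
float pk4ub(const float v[4])
{
    uint32_t bits = 0;
    for (int i = 0; i < 4; ++i) {
        float c = v[i] < 0.0f ? 0.0f : (v[i] > 1.0f ? 1.0f : v[i]);
        bits |= (uint32_t)(c * 255.0f + 0.5f) << (8 * i);
    }
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits, no numeric conversion */
    return f;
}

/* Matching unpack: recover the four bytes and map them back to [0, 1]. */
void up4ub(float f, float v[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    for (int i = 0; i < 4; ++i)
        v[i] = (float)((bits >> (8 * i)) & 255u) / 255.0f;
}
```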

#20 Nils Pipenbrinck

    Senior Member

  • Members
  • PipPipPipPip
  • 597 posts

Posted 27 October 2006 - 04:49 PM

Reedbeta said:

glReadPixels is awful because it forces data to be copied from the GPU's memory back to system memory. The bus is designed to transfer data most efficiently in the opposite direction and so this kind of 'readback' is extremely slow. You should try to find a way to keep the data always on the GPU, if you can. Does your card support GL_ARB_pixel_buffer_object by any chance?

By the way, while glReadPixels is still a bad idea (due to the CPU/GPU interlock), the copy performance is not as bad anymore. With PCI Express cards, readback is rather fast.




