Render to Vertexbuffer in OpenGL (HowTo)


20 replies to this topic

#1 spacerat

    New Member

  • Members
  • 25 posts

Posted 28 October 2006 - 09:16 AM

Kenneth Gorking said:

Hmm, maybe someone should put together a small tutorial on render-to-vertexbuffers and include the methods from this thread to show how it can alternatively be done, if it's not supported by one's hardware...

I don't know about the new render to vertex array from ATI with DX9,
so I will describe the two current render-to-vertex-array methods for
NVidia cards that I know of, as a kind of tutorial.

Mihail121 said:

I didn't quite understand... are you showing us stuff or are you asking for help?

The example source below works (I just finished my own implementation),
and since there are many traps and things you should know before
implementing render to vertex array, I hope this little how-to helps others.

Comments on improvements and other issues are of course welcome.

Overview:

Method 1: Texture read in the Vertex Shader.

Advantages:
  • no copy to VBO necessary
  • flexible texture lookup
  • consumes less memory

Disadvantages:
  • only supported by NVidia cards
  • only very few special float texture formats are supported (GL_RGBA_FLOAT32_ATI)
  • doesn't run on older GFX hardware
  • slow (30M vertices/s on a GF6600GT just for reading the texture; if the
    texture also has to be rendered through a frame buffer object (FBO),
    it's even slower)

Method 2: Copy to Pixel Buffer Object (PBO)


Advantages:
  • Multiple formats should be available for the PBO (not just float RGBA)
  • Faster (about 58M vertices/s
    for render FBO + copy FBO to PBO + render VBO
    on a GF6600GT. If the copy weren't required,
    the speed would be 86M vertices/s)

Disadvantages:
  • Higher memory consumption
  • Extra copy necessary
  • Multiple FBO render targets must have the same format
    (for example, rendering to a float RGB buffer for positions and
    a byte RGB buffer for normals/binormals is not possible in
    one rendering pass)

Warning: Older GFX cards don't have multiple render targets!

Implementation:

Method 1. Texture read in the Vertex Shader.

Step 1: Create FBO

Create an FBO and attach the texture that will hold the vertex data.
(see further down for how to create one)

Step 2: Create VBO

Create a vertex buffer object (VBO) holding texture coordinates for
referencing the FBO texture (e.g. a position VBO with
2 float coordinates per vertex)

(see further down for how to create one)
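
The contents of that lookup VBO could be generated on the CPU roughly as follows. This is a sketch under assumptions the post does not state: vertex i maps to texel (i % width, i / width), and each coordinate addresses the texel center in the normalized 0..1 range of GL_TEXTURE_2D. `makeLookupCoords` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Builds one (s, t) pair per texel of a width x height texture, row by row.
// Coordinates point at texel centers, so nearest filtering hits exactly one texel.
std::vector<float> makeLookupCoords(int width, int height)
{
    std::vector<float> coords;
    coords.reserve(static_cast<std::size_t>(width) * height * 2);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            coords.push_back((x + 0.5f) / width);   // s
            coords.push_back((y + 0.5f) / height);  // t
        }
    }
    return coords;
}
```

The resulting array holds 2 floats per vertex and would be uploaded once as the position array of the VBO.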

Step 3: Render the FBO

( see further down for details )

Step 4: Render the VBO

Here the vertex coordinates from the VBO are
used to look up the FBO texture containing the real coordinates.

Here is an example of the vertex program code:

!!ARBvp1.0
OPTION NV_vertex_program3;
# vertex.position is our index (a texture coordinate) into the real vertex array
PARAM mvp[4] = { state.matrix.mvp };
TEMP real_position;
# fetch the real vertex position from the vertex texture
TEX real_position, vertex.position, texture[0], 2D;
# transform the fetched position by the modelview-projection matrix
DP4 result.position.x, mvp[0], real_position;
DP4 result.position.y, mvp[1], real_position;
DP4 result.position.z, mvp[2], real_position;
DP4 result.position.w, mvp[3], real_position;
END

Allowed texture formats for using a texture in the
vertex shader:

Nvidia Documentation said:

Vertex textures are bound using the standard texture calls, using the GL_TEXTURE_2D texture targets. Currently only the GL_LUMINANCE_FLOAT32_ATI and GL_RGBA_FLOAT32_ATI formats are supported for vertex textures. These formats contain a single or four channels of 32-bit floating point data, respectively. Be aware that using other texture formats or unsupported filtering modes may cause the driver to drop back to software vertex processing, with a commensurate drop in interactive performance.

Below is an example:

GLuint vertex_texture;
glGenTextures(1, &vertex_texture);
glBindTexture(GL_TEXTURE_2D, vertex_texture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST_MIPMAP_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE_FLOAT32_ATI, width, height, 0,GL_LUMINANCE, GL_FLOAT, data);

Method 2. Copy to Pixel Buffer Object (PBO).

Step 1: Create a VBO as pixel buffer object:

Here is a sample:
GLuint vbo_vertices_handle;

glGenBuffersARB(1, &vbo_vertices_handle);

glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_vertices_handle);

// 4 floats (RGBA) per vertex, no initial data
glBufferDataARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_vertices.size()*4*sizeof(float),NULL, GL_DYNAMIC_DRAW_ARB );

Step 2: Create an FBO.

Multiple render targets can be helpful for writing
vertex/normal/binormal at the same time.

Here is an example of creating an FBO:

GLuint fbo_handle;

glGenFramebuffersEXT(1,&fbo_handle); 

fbo_tex_vertices = NewFloatTex(tex_width,tex_height,0);

fbo_tex_normals = NewFloatTex(tex_width,tex_height,0);

Here is an example of creating a float texture:

GLuint NewFloatTex(int width,int height,char* buffer,bool alpha=true)
{
	GLuint type = GL_TEXTURE_2D; // alternative: GL_TEXTURE_RECTANGLE_ARB;
	GLuint tex_handle;
	glGenTextures (1, &tex_handle);
	glBindTexture(type,tex_handle);

	// set texture parameters
	glTexParameteri(type, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameteri(type, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
	glTexParameteri(type, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameteri(type, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

	// define texture with floating point format
	GLuint format0 = GL_RGB_FLOAT32_ATI;
	GLuint format1 = GL_FLOAT_RGB32_NV;
	GLuint format2 = GL_RGB32F_ARB;
	GLuint iformat = GL_RGB;
	if (alpha)
	{
		format0 = GL_RGBA_FLOAT32_ATI;
		format1 = GL_FLOAT_RGBA32_NV;
		format2 = GL_RGBA32F_ARB;
		iformat = GL_RGBA;
	}

	// clear stale errors so the checks below only see our own calls
	while (glGetError() != GL_NO_ERROR) {}

	glTexImage2D(type,0,format0,width,height,0,iformat,GL_FLOAT,buffer);
	// ATI format supported ?
	if (glGetError() != GL_NO_ERROR) 
	{
		glTexImage2D(type,0,format1,width,height,0,iformat,GL_FLOAT,buffer);
		// NVidia format supported ?
		if (glGetError() != GL_NO_ERROR) 
			glTexImage2D(type,0,format2,width,height,0,iformat,GL_FLOAT,buffer);
	}

	glBindTexture(type,0);	
	return tex_handle;
}

Warning: Rect textures can be used for GPGPU, but not
for copying FBO -> PBO! Only GL_TEXTURE_2D works for that.
Warning: Even if an RGB texture is created, it may be RGBA internally
=> glReadPixels is slow if RGB is the destination format, due to format conversion

Step 3: Render FBO

The input textures contain the necessary data
(vertex position etc) to compute the outputs.

Example of binding the framebuffer and attaching the textures:

glBindFramebufferEXT(GL_FRAMEBUFFER_EXT,fbo_handle);


glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, fbo_tex_vertices, 0); 

glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT1_EXT, GL_TEXTURE_2D, fbo_tex_normals, 0); 


GLenum dbuffers[] = 

{GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT};

glDrawBuffers(2, dbuffers);

Example of rendering into the buffer using a quad:

glDisable(GL_DEPTH_TEST);

glDepthMask(GL_FALSE);

glDisable(GL_CULL_FACE);


glActiveTextureARB( GL_TEXTURE0_ARB );

glBindTexture(GL_TEXTURE_2D,tex_vertices);


glActiveTextureARB( GL_TEXTURE1_ARB );

glBindTexture(GL_TEXTURE_2D,tex_normals);


glMatrixMode(GL_PROJECTION); 

glPushMatrix();

glLoadIdentity(); 

gluOrtho2D(0.0,tex_width,0.0,tex_height);

glMatrixMode(GL_MODELVIEW); 

glPushMatrix();

glLoadIdentity(); 


int viewport[4];

glGetIntegerv(GL_VIEWPORT, viewport);

glViewport(0,0,tex_width,tex_height); 


glClampColorARB(GL_CLAMP_VERTEX_COLOR_ARB, GL_FALSE);


glBegin(GL_QUADS); 

glColor3f(1,1,1);

glMultiTexCoord2f( GL_TEXTURE0, 0.0, 0.0);

glVertex2f(0, 0);

glMultiTexCoord2f( GL_TEXTURE0, 1.0, 0.0);

glVertex2f(tex_width,0);

glMultiTexCoord2f( GL_TEXTURE0, 1.0, 1.0);

glVertex2f(tex_width, tex_height);

glMultiTexCoord2f( GL_TEXTURE0, 0.0, 1.0);

glVertex2f(0, tex_height);

glEnd();


// ---> Insert Code to copy FBO to PBO here <--- //


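glViewport(viewport[0],viewport[1],viewport[2],viewport[3]); 

// switch back to the window framebuffer before selecting GL_BACK
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);

glDrawBuffer(GL_BACK);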


glEnable(GL_DEPTH_TEST);

glDepthMask(GL_TRUE);

glEnable(GL_CULL_FACE);


glPopMatrix();

glMatrixMode(GL_PROJECTION); 

glPopMatrix();

glMatrixMode(GL_MODELVIEW); 

Warning: gluOrtho2D is necessary! If it is missing, glReadPixels won't work.
Note: glClampColorARB is only necessary if the shader writes its output
by means other than gl_FragData[0..n]; it allows unclamped color values.

To write to multiple targets, here is an
example fragment shader (GLSL):

//uniform sampler2DRect texPoints; example for a rect-texture

//uniform sampler2D texColors; example for a normal texture

//varying vec2 textureCoord;


void main(void)

{

  // vec4 vertex = texture2DRect(texPoints,textureCoord) ;

  // vec4 color   = texture2D(texColors,textureCoord) ;

  gl_FragData[0] = vec4 (0.0 , 1.0 , 0.0 , 1.0) ;

  gl_FragData[1] = vec4 (1.0 , 1.0 , 0.0 , 1.0) ;

}

Step 4: Copy from FBO to PBO

Example:

glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);

glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_vertices_handle);

glReadPixels(0, 0, tex_width, tex_height, GL_RGBA, GL_FLOAT, 0);


glReadBuffer(GL_COLOR_ATTACHMENT1_EXT);

glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_normals_handle);

glReadPixels(0, 0, tex_width, tex_height, GL_RGBA, GL_FLOAT, 0);


glReadBuffer(GL_NONE);

glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0 );

( of course vbo_vertices.size() == tex_width * tex_height )
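
That size relation can be checked before issuing the read-back. This is a minimal sketch assuming an RGBA float read-back with default pixel-store settings (tightly packed rows); `readbackBytes` and `readbackFitsBuffer` are hypothetical helper names.

```cpp
#include <cstddef>

// Bytes written by a width x height read-back with GL_RGBA / GL_FLOAT:
// four floats per texel, assuming tightly packed rows.
std::size_t readbackBytes(std::size_t texWidth, std::size_t texHeight)
{
    return texWidth * texHeight * 4 * sizeof(float);
}

// True if a buffer allocated for vertexCount 4-float vertices
// is filled exactly by reading back the whole texture.
bool readbackFitsBuffer(std::size_t vertexCount,
                        std::size_t texWidth, std::size_t texHeight)
{
    return vertexCount * 4 * sizeof(float) == readbackBytes(texWidth, texHeight);
}
```

If the check fails, glReadPixels would either leave part of the buffer unwritten or write past the end of the allocated store.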

Warning:
If glReadPixels is too slow, mismatched formats
(like RGBA -> RGB) might be the problem, since
a simple copy isn't possible in that case!
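
To see why a format mismatch rules out a plain block copy, here is what an RGBA to RGB conversion amounts to when done on the CPU. It is only an illustration; where and how the driver actually performs the conversion is not known from this thread. `rgbaToRgb` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Drops the fourth component of every RGBA texel.
// Each texel is repacked individually, so one contiguous copy is not enough.
std::vector<float> rgbaToRgb(const std::vector<float>& rgba)
{
    std::vector<float> rgb;
    rgb.reserve(rgba.size() / 4 * 3);
    for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
        rgb.push_back(rgba[i]);      // R
        rgb.push_back(rgba[i + 1]);  // G
        rgb.push_back(rgba[i + 2]);  // B (alpha at i + 3 is skipped)
    }
    return rgb;
}
```

Reading back with the same layout as the render target (GL_RGBA, GL_FLOAT here) avoids this per-texel work.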

Step 5: Render VBO as usual:

glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo_vertices_handle);

glEnableClientState(GL_VERTEX_ARRAY);

glVertexPointer  ( 4, GL_FLOAT,4*sizeof(float), (char *) 0);


glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo_normals_handle);

glEnableClientState(GL_NORMAL_ARRAY);

glNormalPointer(GL_FLOAT, 4*sizeof(float), (char *) 0 );


glDrawArrays( GL_TRIANGLES, 0,vbo_vertices.size() );


glDisableClientState(GL_NORMAL_ARRAY);

glDisableClientState(GL_VERTEX_ARRAY);

References

http://oss.sgi.com/p...ffer_object.txt
http://developer.nvi...x_textures.html
http://wiki.delphigl...p/GLSL_Partikel (german)
http://www.mathemati...al.html#arrays3
http://download.deve...es/samples.html

#2 Mihail121

    Senior Member

  • Members
  • 1059 posts

Posted 28 October 2006 - 12:12 PM

I didn't quite understand... are you showing us stuff or are you asking for help?

#3 Kenneth Gorking

    Senior Member

  • Members
  • 939 posts

Posted 28 October 2006 - 04:32 PM

spacerat said:

I don't know about the new render to vertex array from ATI with DX9

It is described in detail in the ATI SDK, including 13 samples on: animation, cloth, IK, NPatches, footprints in snow, particle collision, particle system, particle sorting, shadow volume generation, ocean simulation, water physics and terrain morphing. They even run on my crummy Radeon 9700 :)
"Stupid bug! You go squish now!!" - Homer Simpson

#4 spacerat

    New Member

  • Members
  • 25 posts

Posted 28 October 2006 - 09:33 PM

Kenneth Gorking said:

They even run on my crummy radeon 9700

Yes, it seems that render to vertex buffer is not a special feature of the hardware; it's just the software that didn't support it directly so far, isn't it? :)

I hope there will also be some good GL extensions soon.
Hm... but after taking a look at the next GL version, it might still take a while.
http://www.khronos.o...006/OpenGL_BOF/

But I wonder why r2vb is not possible on NVidia cards at the moment.

I mean, isn't it just a matter of setting one pointer of the render buffer in the GPU?

Btw., does anybody have a benchmark of the ATI crowd simulation with 10,000 characters? How many vertices/s can be transformed on a newer card?

#5 spacerat

    New Member

  • Members
  • 25 posts

Posted 31 October 2006 - 07:48 AM

Update:

If the above implementation is to be used on
an older graphics card, such as a GF FX5200, then
the FBO texture format should be
GL_FLOAT_RGB32_NV, with
GL_TEXTURE_RECTANGLE_ARB instead of
GL_TEXTURE_2D.

The texture coordinate range then changes from
(0..1 , 0..1) to ( 0..tex_width , 0..tex_height ),
and the fragment program has to use
sampler2DRect and texture2DRect.
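
For the full-screen quad from the first post, the change of range only affects the four corner texture coordinates. This is a small sketch of the rectangle-texture variant; `TexCoord` and `rectQuadCoords` are names invented for illustration.

```cpp
#include <array>

struct TexCoord { float s, t; };

// Corner texture coordinates for a quad covering a rectangle texture.
// Same corner order as the GL_TEXTURE_2D quad: (0,0) (1,0) (1,1) (0,1),
// but scaled from the 0..1 range to 0..texWidth / 0..texHeight.
std::array<TexCoord, 4> rectQuadCoords(int texWidth, int texHeight)
{
    const float w = static_cast<float>(texWidth);
    const float h = static_cast<float>(texHeight);
    return {{ {0.0f, 0.0f}, {w, 0.0f}, {w, h}, {0.0f, h} }};
}
```

Each entry would be passed to glMultiTexCoord2f in place of the normalized value used with GL_TEXTURE_2D.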

If only one render target is available, sometimes
gl_FragColor must be used instead of gl_FragData[].
In this case, it might be necessary to switch off
color clamping (glClampColorARB(..)).

Warning:
The implementation at the top was done on
Windows XP with the 81.98 NVidia drivers. With the
current 91.47 drivers there seems to be a bug in
glReadPixels for PBOs. Even though there is no GL error, it
does not copy anything from the FBO to the
PBO/VBO.

#6 spacerat

    New Member

  • Members
  • 25 posts

Posted 02 November 2006 - 10:27 AM

Warning:

Another important thing is to call glFinish() at
the end of each frame!
Since glReadPixels works asynchronously, there
will be strong frame-rate fluctuations otherwise.

Here is an example for rendering about
1M vertices on a GF FX 5200:

Without glFinish() :

Average Fps:11, 85.833 ms
Time Range: 38 - 152 ms

Frame: 73 Time:59 ms = 16.949 fps
Frame: 74 Time:152 ms = 6.579 fps
Frame: 75 Time:38 ms = 26.316 fps
Frame: 76 Time:95 ms = 10.526 fps
Frame: 77 Time:130 ms = 7.692 fps
Frame: 78 Time:76 ms = 13.158 fps
Frame: 79 Time:152 ms = 6.579 fps
Frame: 80 Time:53 ms = 18.868 fps
Frame: 81 Time:98 ms = 10.204 fps
Frame: 82 Time:105 ms = 9.524 fps
Frame: 83 Time:84 ms = 11.905 fps

With glFinish() :

Average Fps:11, 85.167 ms
Time Range: 92 - 102 ms

Frame: 93 Time:96 ms = 10.417 fps
Frame: 94 Time:97 ms = 10.309 fps
Frame: 95 Time:90 ms = 11.111 fps
Frame: 96 Time:91 ms = 10.989 fps
Frame: 97 Time:91 ms = 10.989 fps
Frame: 98 Time:92 ms = 10.870 fps
Frame: 99 Time:91 ms = 10.989 fps
Frame: 100 Time:89 ms = 11.236 fps
Frame: 101 Time:102 ms = 9.804 fps
Frame: 102 Time:100 ms = 10.000 fps
Frame: 103 Time:94 ms = 10.638 fps
Frame: 104 Time:94 ms = 10.638 fps
Frame: 105 Time:94 ms = 10.638 fps
Frame: 106 Time:92 ms = 10.870 fps

#7 kinzeron

    New Member

  • Members
  • 1 posts

Posted 03 November 2006 - 11:20 AM

Are you sure it's a driver bug? I haven't tested the new driver yet; maybe you need to report the bug to NVidia? (Or did they do it on purpose? :))

#8 spacerat

    New Member

  • Members
  • 25 posts

Posted 03 November 2006 - 12:32 PM

I just tested everything with the new NVPerf-Kit, where the driver gives you additional debug info, but I couldn't figure out what was going wrong. I asked NVidia customer service a few days ago whether there have been important changes, but I don't have a definite answer yet. I will post an update when I know what happened or what needs to be fixed.

#9 spacerat

    New Member

  • Members
  • 25 posts

Posted 10 November 2006 - 08:44 AM

On the GF8, an extra copy operation from FBO to PBO/VBO won't be necessary anymore. EXT_texture_buffer_object seems to solve this issue.

http://developer.nvi...engl_specs.html

#10 someone13

    New Member

  • Members
  • 17 posts

Posted 17 November 2006 - 03:11 AM

Not to sound dick or anything, but is there a reason why this isn't in the DevMaster Wiki under OpenGL?

Great post by the way.

#11 vutit

    New Member

  • Members
  • 5 posts

Posted 06 May 2007 - 08:11 PM

It's great! I've tried it, and it works on my NVidia 5700.
By the way, I don't understand why my program continues to use the CPU when I render the scene.
In my main loop I copy the FBO to the PBO/VBO and then draw the VBO. Normally that should use only the GPU, shouldn't it?

#12 Reedbeta

    DevMaster Staff

  • Administrators
  • 5306 posts
  • LocationBellevue, WA

Posted 07 May 2007 - 02:50 AM

You'll still see 100% CPU usage if your program is busy waiting. Basically it's looping and continually asking the GPU if it's done before rendering the next frame.
reedbeta.com - developer blog, OpenGL demos, and other projects
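
If the busy waiting happens in the application's own main loop, one common way to avoid it is to sleep for the rest of the frame budget instead of polling. The sketch below assumes a fixed per-frame time budget, which nobody in this thread mentions; it does nothing about waiting that happens inside the driver. `remainingBudget` and `sleepOutFrame` are hypothetical names.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Time left in the frame budget; zero if the frame already overran.
std::chrono::milliseconds remainingBudget(std::chrono::milliseconds budget,
                                          std::chrono::milliseconds elapsed)
{
    return elapsed < budget ? budget - elapsed : std::chrono::milliseconds(0);
}

// Yields the CPU for the rest of the frame instead of spinning in a loop.
void sleepOutFrame(Clock::time_point frameStart, std::chrono::milliseconds budget)
{
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - frameStart);
    std::this_thread::sleep_for(remainingBudget(budget, elapsed));
}
```

A main loop would record `Clock::now()` at the start of each frame and call `sleepOutFrame` after the last GL call of that frame.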

#13 vutit

    New Member

  • Members
  • 5 posts

Posted 07 May 2007 - 07:24 AM

Thanks, that's right, but when I run tests with different texture resolutions, I find that a higher-resolution texture uses more CPU than usual.
The bottleneck must be in the code that copies the FBO to the PBO/VBO:

glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, vbo_vertices_handle);
glReadPixels(0, 0, imgWidth, imgHeight, GL_RGBA, GL_FLOAT, 0);

glReadBuffer(GL_NONE);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_EXT, 0 );


And in drawing the VBO:

glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo_vertices_handle);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(4, GL_FLOAT, 4 * sizeof(float), (char *) 0);
glDrawArrays(GL_POINTS, 0, imgWidth * imgHeight);
glDisableClientState(GL_VERTEX_ARRAY);


Because when I comment them out, the CPU stays idle. But I don't understand why.

#14 Reedbeta

    DevMaster Staff

  • Administrators
  • 5306 posts
  • LocationBellevue, WA

Posted 07 May 2007 - 03:02 PM

Oh, you're using glReadPixels...that might very well engage the CPU to copy all the data, depending on the driver. Are you sure you're using the latest updated drivers for your 5700?

#15 vutit

    New Member

  • Members
  • 5 posts

Posted 10 May 2007 - 01:24 PM

Yes, I've downloaded the latest NVidia driver (93.71), and it is the same with my Quadro FX 3400. I've run some tests with R2VB, VTF, and vertex mapping. I've found that R2VB is a little faster on my Quadro FX, but its FPS is not constant, while vertex mapping works very well (I don't know why).

#16 Reedbeta

    DevMaster Staff

  • Administrators
  • 5306 posts
  • LocationBellevue, WA

Posted 10 May 2007 - 02:08 PM

What do you mean by vertex mapping?

#17 vutit

    New Member

  • Members
  • 5 posts

Posted 11 May 2007 - 07:05 AM

I found it in a tutorial on vertex buffer objects: http://www.g-truc.ne...icle/vbo-en.pdf
With this technique, I send data from RAM to the OpenGL server, and each frame I manipulate this data directly from my program, as a client.

//In init
// Allocate vertex buffer
size = imgWidth * imgHeight * 4 * sizeof(GLfloat);
glGenBuffersARB(1, &vbo_vm);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo_vm);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, 0, GL_DYNAMIC_DRAW_ARB);


//In my main loop
//Update VBO
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo_vm);
// re-specify the data store before mapping it
glBufferDataARB(GL_ARRAY_BUFFER_ARB, 4 * imgWidth * imgHeight * sizeof(GLfloat), 0, GL_DYNAMIC_DRAW_ARB);
GLvoid* gpu_vertices = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if (gpu_vertices) // mapping can fail and return NULL
{
    memcpy(gpu_vertices, ram_vertices, 4 * imgWidth * imgHeight * sizeof(GLfloat));
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
}

//Draw VBO as usual
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(4, GL_FLOAT, 4 * sizeof(GLfloat), (char *) 0);
glDisable(GL_LIGHTING);
glColor3f(1, 1, 1);
glDrawArrays(GL_POINTS, 0, imgWidth * imgHeight);
glEnable(GL_LIGHTING);
glDisableClientState(GL_VERTEX_ARRAY);

#18 Reedbeta

    DevMaster Staff

  • Administrators
  • 5306 posts
  • LocationBellevue, WA

Posted 11 May 2007 - 07:24 AM

Oh, okay. You mean mapping the vertex buffer into system memory. It's not usually called 'vertex mapping'.

Anyways, for a vertex buffer object, the driver might keep a copy in system memory (lets you read from it quickly), and when writing to it, the driver just sends it to the video card via DMA. For render-to-vertex-buffer, it's possible the driver copies the data to system memory and then back to the card...it shouldn't really, but who knows?

#19 vutit

    New Member

  • Members
  • 5 posts

Posted 11 May 2007 - 03:13 PM

Thanks Reedbeta, I think you're right, because that matches what I suspected: I ran a test and found that my program uses more memory than what I allocated, but who knows?
Anyway, I'll use VTF; it's simple to use, consumes less memory, and is rather efficient on future graphics cards.

#20 spacerat

    New Member

  • Members
  • 25 posts

Posted 04 June 2007 - 04:07 PM

I just released the source of my project (Deformation Styles).
There, render to vertexbuffer is one of the main parts.
You can download everything here:
http://www.xinix.org...ublications.htm
