Instancing a good idea in this case?
Posted 26 April 2011 - 10:25 PM
I've run this through OpenGL Profiler and indeed, the calls to glDrawArrays (as I remember) take up the majority of the time.
Each pin is loaded into a VBO and then drawn. There are 60 x 49 pins in that image, and each one has 180 faces (MeshLab doesn't tell me the exact triangle count), but adding it up comes pretty close to the figure given by gDebugger.
In addition to drawing the colour step, there is also a step for linear depth (in order to set up some SSAO). At the moment, I'm getting around 12-15fps. I'd like to get it to 30.
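To put rough numbers on the figures above, here is a minimal sketch of the per-frame workload. It assumes one draw call per pin per pass and that the colour and depth passes both draw every pin; the names `FrameWorkload` and `frameWorkload` are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: totals for one frame.
struct FrameWorkload {
    std::size_t drawCalls;  // number of glDrawArrays-style calls issued
    std::size_t faces;      // faces submitted across all passes
};

// Assumption: every pin is drawn once per pass with its own draw call.
inline FrameWorkload frameWorkload(std::size_t pins,
                                   std::size_t facesPerPin,
                                   std::size_t passes) {
    FrameWorkload w;
    w.drawCalls = pins * passes;
    w.faces = pins * facesPerPin * passes;
    return w;
}
```

With 60 x 49 = 2940 pins, 180 faces each and two passes, that comes to 5880 draw calls and 1,058,400 faces per frame under these assumptions.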
I thought about trying a non-linear depth buffer and reading the depth buffer and colour buffer from the FBO in one go to save a pass, but commenting out the depth pass for now seemed to make little difference (oddly).
I tried 'pseudo instancing' (I think) by passing in the transformation matrix as a texture to my vertex shader. This didn't give much of a speedup.
As OS X has limited support (annoyingly), the only method I can see to get more speed is to use GL_ARB_instanced_arrays somehow, but I'm not exactly sure whether this will help. There may be something else I can do to get things a little faster, but I'm not sure what. I've gotten the triangles per pin down about as far as I can. Any thoughts, chaps? Cheers
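For reference, the CPU side of an instanced-arrays approach mostly comes down to building one small per-instance buffer. The sketch below is an illustration only: `PinOffset`, `buildPinOffsets` and the regular grid layout with a uniform `spacing` are assumptions, not details from the original scene.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-instance attribute: where one pin sits in the grid.
struct PinOffset {
    float x, y, z;
};

// Builds one offset per pin for a columns x rows grid, row by row.
// Assumption: pins lie on a flat, evenly spaced grid in the XZ plane.
inline std::vector<PinOffset> buildPinOffsets(std::size_t columns,
                                              std::size_t rows,
                                              float spacing) {
    std::vector<PinOffset> offsets;
    offsets.reserve(columns * rows);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < columns; ++c) {
            offsets.push_back({c * spacing, 0.0f, r * spacing});
        }
    }
    return offsets;
}
```

The resulting array would be uploaded once into its own VBO and bound as a vertex attribute with `glVertexAttribDivisorARB(attrib, 1)`, so the attribute advances once per instance rather than once per vertex. The whole grid could then go out in a single `glDrawArraysInstancedARB` call (from GL_ARB_draw_instanced), provided the driver exposes both extensions.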
Posted 26 April 2011 - 11:03 PM
That being said, I'm not sure why you need 180 faces for each pin. They look like they're just cylinders, right? How many divisions do you really need around the circumference of the cylinder? I'm sure you could get away with 16, or even fewer; that's 32 triangles around the circumference, plus 28 for the two end caps (14 each), 60 tris total. That's 1/3 your number.
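The arithmetic above generalises to any number of divisions. A minimal sketch, assuming a closed cylinder whose end caps are triangulated as fans with no centre vertex (the function name is made up for illustration):

```cpp
#include <cstddef>

// Triangle count for a closed cylinder with `segments` divisions around the
// circumference (segments >= 3). Each side quad splits into 2 triangles, and
// each end cap is an n-gon triangulated as a fan, giving n - 2 triangles.
constexpr std::size_t cylinderTriangles(std::size_t segments) {
    return 2 * segments + 2 * (segments - 2);
}
```

For 16 segments this gives 32 + 28 = 60 triangles, matching the figure in the post.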
I'd guess after draw calls, your next bottleneck is going to be vertices, so you'll need to focus on vert reduction and LOD.
Posted 28 April 2011 - 12:58 PM
This is with 100 faces as opposed to 180. You can begin to see the polygon outlines, which is not nice. Also, you can see I've reduced the overall number of pins. This double view runs at 30fps. With the original number of pins, 180 faces vs 100 faces makes almost no difference in speed. They both run at around 10fps. It's almost as if there is a cutoff point, beyond which you get no speedup or change at all.
Posted 28 April 2011 - 03:36 PM
Anyway, as you're getting to higher polycounts (1.8M), you're approaching the point where you might consider CPU raytracing (or GPU raytracing, or CPU+GPU raytracing), and as this is a static scene, a good (meaning SAH) KD-tree could be a win for you. There'd be no need for SSAO; you could use true ambient occlusion instead, of course.
Although I can't say I'm recommending it, because I don't know the purpose of your application or the target hardware.
Note that if you're not memory bound, then in ray tracing 1M tris is only around 2x slower to compute than 1K tris, while in rasterization it is 1000x slower. With a larger scene (some 250M triangles), raytracing could be orders of magnitude faster than rasterization.
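The 2x versus 1000x figures fall out of a simple idealised model, sketched below. It assumes that a ray traverses a balanced tree in time proportional to log2(N), while rasterization cost grows linearly with the triangle count N; real renderers will deviate from both. The function names are made up for illustration.

```cpp
#include <cmath>

// Cost ratio for tree traversal when going from `small` to `large` triangles.
// Assumption: per-ray cost is proportional to log2(N) for a balanced tree.
inline double treeTraversalRatio(double small, double large) {
    return std::log2(large) / std::log2(small);
}

// Cost ratio when every triangle has to be processed (linear in N).
inline double linearRatio(double small, double large) {
    return large / small;
}
```

Going from 1K to 1M triangles, the first ratio is about 2 (log2 of 1M is roughly 20, log2 of 1K roughly 10) and the second is 1000.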
If you don't know how to speed up application, go "roarrrrrr!", hit the compiler with the club and use -O3 :D
Posted 28 April 2011 - 06:51 PM