Jump to content


Optomizing Ray Tracing


22 replies to this topic

#1 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 12 September 2005 - 04:56 PM

I am working on optimizing my ray intersect tests for the ambient occlusion project I put up in the dev snapshot.

Do any ray tracing geeks here have some suggestions on how I can speed up this portion of the code? I am only interested in knowing if a ray intersects another triangle at all and I do not care if the hit is the closest one like a normal ray tracer would want.

Currently I am using a KD tree to store the triangles and I test against that. Perhaps someone knows of some good ways to choose the location of the splitting plane when constructing the tree so as to speed up traversal and minimize double tests?
Jesse Coyle

#2 anubis

    Senior Member

  • Members
  • PipPipPipPip
  • 2225 posts

Posted 13 September 2005 - 06:27 AM

One point of improvement is clipping the triangles against the nodes bounding boxes, instead of just using the triangles bounding box as a source for clipping poistions. Then it is often good to continue splitting nodes that seem to bring no improvement on the current level (for one or two more levels) because things might just seem stuck locally but will improve a lot later on.

It also really helps to indentify leaf nodes that share exactly the same triangles, which happens really a lot. This is a preprocessing step though, since it can take up quite a bit of time for larger triangle meshes.
If Prolog is the answer, what is the question ?

#3 bladder

    DevMaster Staff

  • Moderators
  • 1057 posts

Posted 13 September 2005 - 01:18 PM

I'm no raytracing buff, and I don't know *exactly* what you're doing, but it always seems that people forget to take advantage of using some form of temporal coherance when doing their intersection tests. I don't know how applicable it is to raytracing (or ambiant occlusion for that matter) but anyway, I'll explain it with frustum culling and hopefully it will spark a few ideas.

So basically what you do is instead of having a simple bounding box structure, you have some extra information in that structure. Like with frustum culling you have to check all six sides before an object can get accepted right. So if I'm using bounding spheres for instance, my structure is this:

struct {
  float radius;
  vector position;
  int frustumIndex;
};

Then whenever you test the sphere against the frustum, you record which plane it failed against. So the next time you do the test, you start testing from frustumIndex'th plane instead of from plane 0. So instead you doing 6 tests and having the sphere rejected on the sixth test, you'll probably have one test instead.

so:

int count = 0;
int idx = boundingBox.frustumIndex;
while( count++ < NumFrustumPlanes )
{
  if( ObjectIsInPositiveHalfSpace( boundingBox, FrustumPlane[idx] )
  {
    boundingBox.frustumIndex = idx;
    return OBJECT_OUTSIDE
  }
  idx++;
}

Anyway, I'm just shooting in the dark right now, had a very long day so if what I've said is jibberish then I appologize.

#4 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 13 September 2005 - 01:36 PM

Well bladder, I am using rays so the tests aren't quite that involved, and the rays I shoot are going in random directions so the coherency between rays wont be all that high. To top it off I only ever do one frame of testing so time doesn't exist in my world :)

One thing that did occur to me as I was reading your post though, is that it is often the case that a few triangles completely obscure one vertex. Perhaps if I keep a list for the current vertex of each triangle that a ray has hit, I can those few first before I go into the tree. This would only help me in the cases where triangles occlude large portions of the visible area from the vertex and also only if my ray test hits that particular triangle and not some smaller triangle further away that just happened to also be in the path of the ray.
Jesse Coyle

#5 Arjen

    New Member

  • Members
  • Pip
  • 2 posts

Posted 13 September 2005 - 01:50 PM

I'd suggest you read Vlastimil Havran's PhD thesis "On Improving KD-Trees for Ray Shooting". It suggests various strategies for choosing the splitting plane, compares traversal strategies, and so on. Coherence exploitation techniques are discussed in the work of Ingo Wald et al.

#6 anubis

    Senior Member

  • Members
  • PipPipPipPip
  • 2225 posts

Posted 13 September 2005 - 03:14 PM

May I ask how you choose your splitting plane right now ? In other words are you using SAH ?
If Prolog is the answer, what is the question ?

#7 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 13 September 2005 - 03:26 PM

NomadRock said:

Well bladder, I am using rays so the tests aren't quite that involved, and the rays I shoot are going in random directions so the coherency between rays wont be all that high.
You can introduce coherency to your algorithm by processing all parallel rays for multiple sampling points (vertices) at once. This maps nicely to GPU if you ever want to speed up your algorithm.

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#8 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 15 September 2005 - 08:43 PM

Sorry for the slow response. Work has slowed on the optomization side to get the code working for multiple processors.

Altair, do you mean modifying the collision routines to check colissions with several rays of different origins at once? and how would this map to the GPU? The only decent ways I have seen to handle it on the GPU is to handle just one vertex at a time and check it's accessibility in the vertex shader.
Jesse Coyle

#9 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 15 September 2005 - 10:07 PM

NomadRock said:

Altair, do you mean modifying the collision routines to check colissions with several rays of different origins at once? and how would this map to the GPU? The only decent ways I have seen to handle it on the GPU is to handle just one vertex at a time and check it's accessibility in the vertex shader.
Just render the depth of the model from given direction, then just compare depth of vertices from that direction with the final depth map to determine if the ray hits the model before it reaches the vertex (like shadow mapping). Repeat the process for multiple directions, cumulate the result to the vertex and you are done. You can probably use fairly low resolution depth map.

Btw, AO usually has some range for the ray and attenuation so you would need to modify the algorithm to support that.

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#10 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 18 September 2005 - 05:52 AM

I have gotten it working pretty well, but I test about 100 triangles for each hit. This is clearly unacceptable when working with millions of verticies and 20-100 rays per vertex. My KD tree currently splits down to 13 levels or less and has 5 or less triangles in each leaf node with a few exceptions. This means I am probably checking every triangle in 20 or so leaf nodes where the ray intersects the node's bounding box but doesnt actually intersect any triangle in that bounding box.

The scene consists of a single very large and very complex model like a tractor modeled down to every last detail. I've read some papers dealing with alternate shapes for the nodes in a tree, but they seem to be geared toward closely following the outside of a roughly convex object, and I only have one object that needs to be split a bunch of times...

Anyone have any experience with this or any suggestions? Oh, and altair, i've already considered using the rendering approach, but this approach does not require the use of a graphics card so it can be run (hopefully) efficiently on headless clusters. These boxes may not even have graphics cards much less 3d accelerated cards.
Jesse Coyle

#11 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 19 September 2005 - 12:55 PM

NomadRock said:

Oh, and altair, i've already considered using the rendering approach, but this approach does not require the use of a graphics card so it can be run (hopefully) efficiently on headless clusters. These boxes may not even have graphics cards much less 3d accelerated cards.
It would probably be much faster even if you run it in software.

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#12 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 19 September 2005 - 02:39 PM

I have far more triangles than I have rays, I seriously doubt that rasterizing a triangle (at any ammount of pixel draw) would be faster than a ray/triangle intersect and since I would be forced to rasterize every triangle, whereas the ray tests simply stop once one of them is a hit...

As for using the graphics card, i've found a vertex/fragment shader implementation of AO by nvidia that I am going to use before I work out these other algorithms.

At this point I am really just looking for a way to build the KD tree so as to minimize the tri/ray tests that miss.
Jesse Coyle

#13 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 19 September 2005 - 02:54 PM

NomadRock said:

I have far more triangles than I have rays, I seriously doubt that rasterizing a triangle (at any ammount of pixel draw) would be faster than a ray/triangle intersect and since I would be forced to rasterize every triangle, whereas the ray tests simply stop once one of them is a hit...

As for using the graphics card, i've found a vertex/fragment shader implementation of AO by nvidia that I am going to use before I work out these other algorithms.

At this point I am really just looking for a way to build the KD tree so as to minimize the tri/ray tests that miss.
It seems you didn't understood the algorithm I'm talking about. What you do currently is that for each vertex you spawn 100 rays and for each ray you test ~100 triangles in average according to your own words. Now, with rasterization approach you rasterize ALL triangles 100 times and that's it (i.e. for each ray direction). The rasterization approach is way more efficient because it's much faster to rasterize few pixels for a triangle than perform ray-mesh intersection test. Rasterization algorithm gives you only information if ray hit the mesh or not though, not the closest triangle like raytracing algo which you may need if you implement proper AO.

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#14 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 19 September 2005 - 06:32 PM

Perhaps I am still totally missing the workings of your algorithm, but let me clarify in my own words what I think you mean. You want to rasterize each triangle individually from 100 different angles and then you do a simple 2d lookup and check the zdepth is in front of the origin of the ray for each ray?

This is indeed possible but requires enormous amounts of buffer space. I am dealing with models that have millions of triangles, even if each buffer is only one byte that means we are working with a megabyte of information. if we work on a very smal resolution of 256x256 pixels optomized to store only 4 bytes for the depth and assume a certain depth implies a miss, then we are well over 200GB of memory needed...

If plan on using only one buffer for each direction, then you loose information about triangles that have been occluded and cannot know the depth of those for rays that start further in than the front triangle.
Jesse Coyle

#15 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 19 September 2005 - 06:56 PM

NomadRock said:

Perhaps I am still totally missing the workings of your algorithm
Yeah, seems so ;) The memory consumption of the algorithm isn't dependent on the number of triangles you have in the mesh or number of ray directions you have.

Okay, let me try to put it in different way: What I'm proposing is just the basic simple shadow mapping algorithm, but you run the algorithm 100 times (for lights from 100 directions), so you need a single shadow buffer (e.g. 256x256).

NomadRock said:

If plan on using only one buffer for each direction, then you loose information about triangles that have been occluded and cannot know the depth of those for rays that start further in than the front triangle
That's what I was saying. Anyway, trivial fix to that is to add depth peeling to the algorithm, but then you have to run multiple passes of the algorithm. With some creative thinking I'm sure you could figure out something more clever than that, particularly that you are doing it in SW ;)

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#16 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 19 September 2005 - 11:21 PM

Ok, I hadn't heard of depth peeling so I had to read a couple papers on it. This would be required for the rasterizing algorithm to work and would make the problem possible. Unfortunately though it would not be as bad as one triangle per buffer, it would still be exceedingly wasteful of memory. for dense models consisting of several million triangles in a small space the memory required would clearly still be way too much. This approach could yeild considerably faster ray test times, it would require streaming the data to and from disk, and with how durastically different this is, I dont want to implement it all for testing when our deadline is less than 2 weeks away.
Jesse Coyle

#17 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 20 September 2005 - 03:18 AM

NomadRock said:

Unfortunately though it would not be as bad as one triangle per buffer, it would still be exceedingly wasteful of memory. for dense models consisting of several million triangles in a small space the memory required would clearly still be way too much.
Uhm, what do you mean? It doesn't consume more memory than what's needed for a single depth buffer (e.g. 256x256x32bpp), which is way less than what your KD-tree is consuming for several million triangle mesh. In software rasterization implementation you should be able to easily implement the whole thing without depth peeling as well. Anyway, I wouldn't try to implement this algo if your deadline is in 2 weeks.

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#18 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 20 September 2005 - 03:59 AM

a single depth buffer is not nearly enough space to store the information required for a random ray origin
Keep in mind that if my model is the surface waves on an ocean, and the rays in question go parallel to the water surface, then we have many waves overlapping each other from that view. Rays with an origin above the highest peak or below the lowest trough will be trivially rejected. Rays between the peaks and troughs will be a trivial hit as long as the origin is within the accepted distance of the first wave, but rays with an origin past the first wave will not be able to tell how far it is to the next wave with just 32 bits at that pixel. It is possible to just keep track of the furthest points and this would work if we didn't care about distance to the intersection point and just that it eventually hits some wave.

We have already decided to care about rejecting rays that while do intersect the geometry at some point do not within a reasonable distance. This provides a bit better looking results while reducing the search space within the KD tree.

This is something to keep in mind if we decide to go back to not caring about the distance. Currently the KD tree uses about 4MB of memory on a million triangle mesh. In order to preserve decent detail I doubt 256x256 would be enough resolution for any ray on the model but assuming it is we are working with 100 directions * 4 bytes * 256x256 = 25MB, at 512 it is 100MB and 400MB at 1024 which while quite large is still managable on a workstation, and 100 frames would render in a second or so.

We loose the ability to test ray length, but this could be a useable option for quick tests.

Do I understand you now?
Jesse Coyle

#19 Altair

    Valued Member

  • Members
  • PipPipPip
  • 151 posts

Posted 20 September 2005 - 12:57 PM

NomadRock said:

Do I understand you now?
Oh dear mother of frustration ;) No you don't. Why would you ever want to have different depth buffer for each direction? You just keep reusing the same buffer for each direction and you end up using only 256KB of memory (256x256x4). And like I said, with SW implementation you CAN get the first hit of the ray by using rasterization algorithm WITHOUT depth peeling.

Cheers, Altair
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

#20 NomadRock

    Senior Member

  • Members
  • PipPipPipPip
  • 785 posts

Posted 20 September 2005 - 02:10 PM

you keep telling me I can do it, and apparently I dont understand how, perhaps you could go into detail how you are getting the first hit?

True, the multiple buffer thing was my oversight, as long as you took the time to iterate through all the objects for each direction you could rasterize once per iteration and reuse the buffer. Interestingly enough, this would also allow for arbitrarily precice calculations (assuming perfect renders) because you could just let the thing calculate for as many ray directions as you have time.
Jesse Coyle





1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users