ray tracing instances

rouncer 103 Jan 19, 2013 at 07:40

http://raytracey.blogspot.com.au/
At that URL there are 1000 cubes getting raytraced (actually, pathtraced!!!) in realtime. How does this work so fast? You'd think it would need a separate intersection test per cube! (And 1000 of those per ray would kill the computer for sure.)

11 Replies


Reedbeta 167 Jan 19, 2013 at 08:52

Really good spatial subdivision structures I guess. Either a BVH or a kd-tree that cuts way down on the number of cubes you actually have to test for an individual ray.

There were some interesting articles from NVIDIA recently about parallel BVH traversal and BVH construction. They’re written about CUDA but the same principles should apply to OpenCL or to Direct3D11 compute shaders.
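As a rough illustration of why a hierarchy helps, here is a minimal CPU sketch: a median-split BVH over 1000 axis-aligned cubes, traversed with a slab test. This is not how the linked demo is implemented (the thread does not know that); the names `slab_hit`, `Node` and `trace`, the cube layout, and the leaf size of 2 are all assumptions made for the example.

```python
def slab_hit(origin, direction, lo, hi):
    """Slab test: entry distance of the ray into box [lo, hi], or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    for a in range(3):
        if direction[a] == 0.0:
            # Ray is parallel to this slab: it must already lie inside it.
            if not (lo[a] <= origin[a] <= hi[a]):
                return None
            continue
        t0 = (lo[a] - origin[a]) / direction[a]
        t1 = (hi[a] - origin[a]) / direction[a]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near if t_near <= t_far else None


class Node:
    """BVH node built by a median split along the longest axis (hypothetical builder)."""

    def __init__(self, boxes):
        self.lo = tuple(min(b[0][a] for b in boxes) for a in range(3))
        self.hi = tuple(max(b[1][a] for b in boxes) for a in range(3))
        self.children, self.boxes = [], boxes
        if len(boxes) > 2:  # leaves hold at most 2 cubes
            axis = max(range(3), key=lambda a: self.hi[a] - self.lo[a])
            ordered = sorted(boxes, key=lambda b: b[0][axis] + b[1][axis])
            mid = len(ordered) // 2
            self.children = [Node(ordered[:mid]), Node(ordered[mid:])]
            self.boxes = []


def trace(root, origin, direction):
    """Return (nearest hit distance or None, number of box tests performed)."""
    best, tests, stack = None, 0, [root]
    while stack:
        node = stack.pop()
        tests += 1
        t = slab_hit(origin, direction, node.lo, node.hi)
        if t is None or (best is not None and t > best):
            continue  # missed this subtree, or it starts behind the best hit
        stack.extend(node.children)
        for lo, hi in node.boxes:
            tests += 1
            t = slab_hit(origin, direction, lo, hi)
            if t is not None and (best is None or t < best):
                best = t
    return best, tests


# 10 x 10 x 10 cubes of side 0.5, spaced 2 units apart.
cubes = [((2.0 * i, 2.0 * j, 2.0 * k), (2.0 * i + 0.5, 2.0 * j + 0.5, 2.0 * k + 0.5))
         for i in range(10) for j in range(10) for k in range(10)]
root = Node(cubes)
hit_t, box_tests = trace(root, (-5.0, 0.25, 0.25), (1.0, 0.0, 0.0))
print(hit_t, box_tests)
```

The test count printed at the end can be compared with the 1000 tests a brute-force loop over all cubes would need for the same ray.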

Stainless 151 Jan 19, 2013 at 10:05

Or distance fields.

http://www.iquilezles.org/www/articles/raymarchingdf/raymarchingdf.htm

__________Smile_ 101 Jan 19, 2013 at 16:28

I did some experiments with an instancing ray tracer. In reality, ray tracing can benefit from instancing much more than regular rendering: it doesn't matter which instance you intersect the ray with.

I couldn't achieve realtime, but the total triangle count is very high: each dragon has about 1M tris!

[Image: raytracer.png]

rouncer 103 Jan 20, 2013 at 00:04

}:+()___ (Smile), yes! That's what I figured: it wasn't spatial division, it's something else. So how did you make yours, even though it wasn't quite fast enough?

The closest I could get to it would be like what Stainless said, with distance fields, and just detect the closest primitive…

__________Smile_ 101 Jan 20, 2013 at 03:38

Well, I have nothing fancy: one global hierarchy + a local per-object hierarchy. Each hierarchy is a recursive list of AABBs with about 100 boxes per branch, so the total depth is very small. Such a hierarchy would be terribly slow if used on a per-ray basis, but I intersect groups of 32N rays with one AABB/triangle list, so I get reduced branching and good parallel performance.

rouncer 103 Jan 20, 2013 at 03:51

Oh.. so it is spatial division… You know, I had this crazy idea that I could get instancing perfect as long as objects didn't overlap, using the old % mod trick on position: as long as you don't share space you only need to intersect the model once, then modulate the position (it comes out like a grid at first). I was thinking you could work with that.

But often you do share space, so maybe it's pointless…
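For reference, the % trick is commonly used with distance fields, where it is usually called domain repetition. A minimal sketch, assuming a sphere as the repeated object and a sphere-tracing loop; `CELL`, `RADIUS`, `repeat`, `sphere_sdf` and `march` are names invented for this example, and the object is assumed to fit inside one cell (the no-overlap condition mentioned above).

```python
import math

CELL = 4.0    # size of one grid cell (assumed)
RADIUS = 1.0  # radius of the repeated sphere; must fit inside a cell


def repeat(p):
    """Fold a world position into the single cell centered on the origin."""
    return tuple((c + CELL / 2) % CELL - CELL / 2 for c in p)


def sphere_sdf(p):
    """Signed distance from p to a sphere of RADIUS centered on the origin."""
    return math.sqrt(sum(c * c for c in p)) - RADIUS


def march(origin, direction, max_steps=256, max_dist=200.0, eps=1e-4):
    """Sphere-trace a unit-length direction; return hit distance or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sphere_sdf(repeat(p))  # one model, evaluated in folded space
        if dist < eps:
            return t
        t += dist
        if t > max_dist:
            return None
    return None


print(march((0.0, 0.0, -10.0), (0.0, 0.0, 1.0)))  # ray aimed at a row of spheres
print(march((0.0, 2.0, 2.0), (1.0, 0.0, 0.0)))    # ray along a cell boundary
```

The second ray runs between the spheres and never hits, but it still has to step through every cell on the way out to `max_dist`.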

Vilem_Otte 117 Jan 20, 2013 at 04:04

Instancing here keeps memory lower, but you still have to traverse the whole hierarchy. Basically that's what instancing is about in ray tracing.

__________Smile_ 101 Jan 20, 2013 at 11:50

@rouncer

You know, I had this crazy idea that I could get instancing perfect as long as objects didn't overlap, using the old % mod trick on position: as long as you don't share space you only need to intersect the model once, then modulate the position (it comes out like a grid at first). I was thinking you could work with that.

You still have to move to the next cell if you don't find an intersection in the current one. I tried grid-based instancing for grass, but it's slow, especially near the horizon where rays can traverse many cells without hitting anything.
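The horizon problem can be seen by counting cells. Below is a small 2D sketch (side view: x is distance, y is height) of a standard incremental grid walk; `cells_visited`, the grid size and the two example rays are assumptions for illustration, not code from the grass experiment described above.

```python
import math


def cells_visited(origin, direction, grid_size, cell=1.0):
    """Walk a ray through a 2D uniform grid and return the cells it crosses."""
    if direction[0] == 0.0 and direction[1] == 0.0:
        raise ValueError("direction must be non-zero")
    idx = [int(math.floor(origin[0] / cell)), int(math.floor(origin[1] / cell))]
    step, t_max, t_delta = [], [], []
    for o, d, i in zip(origin, direction, idx):
        if d > 0:
            step.append(1)
            t_max.append(((i + 1) * cell - o) / d)  # distance to next boundary
            t_delta.append(cell / d)                # distance between boundaries
        elif d < 0:
            step.append(-1)
            t_max.append((i * cell - o) / d)
            t_delta.append(-cell / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    visited = []
    while 0 <= idx[0] < grid_size and 0 <= idx[1] < grid_size:
        visited.append(tuple(idx))
        axis = 0 if t_max[0] < t_max[1] else 1  # cross the nearer boundary
        idx[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited


down = cells_visited((0.5, 0.9), (0.0, -1.0), grid_size=100)     # looking straight down
grazing = cells_visited((0.5, 0.9), (1.0, -0.01), grid_size=100)  # looking toward the horizon
print(len(down), len(grazing))
```

Both rays start in the same ground-level cell; only the viewing angle differs, and the grazing ray has to visit far more cells before it reaches the ground.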

rouncer 103 Jan 20, 2013 at 14:27

Yeah, there's no free lunch…

Vilem_Otte 117 Jan 21, 2013 at 12:28

Finally I got some time to give a better answer.

Grid traversal gets highly inefficient the further your ray goes (this can be mostly solved by an SVO or similar, though).

Anyway, instanced ray tracing isn't that much faster than the standard uninstanced kind. I'd say it's even slower than uninstanced rendering. Imagine a scene where we have a 1M-triangle mesh. Let's place this mesh at 5 positions in the world.

How a simple standard ray tracer works:
1. (possible to precompute) Create an acceleration structure (e.g. a kd-tree) for the whole scene (built with e.g. SAH, to get realtime performance); leaves contain triangles
2. Traverse each ray through the kd-tree

How a simple instanced ray tracer works:
1. (possible to precompute) Create an acceleration structure for our mesh (kd-tree with SAH); leaves contain triangles
2. (possible to precompute) Create an acceleration structure for the scene, where each leaf contains another kd-tree (the object's one)
3. Traverse the ray through the scene hierarchy
4. If a leaf is hit, traverse the ray through the object hierarchy (with the ray transformed to local object space)
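The instanced steps above can be sketched as follows. To keep it short, flat lists stand in for both kd-trees, the shared "mesh" is two spheres instead of triangles, and instances are translation-only; `MESH`, `BOUND_RADIUS`, `INSTANCES` and the function names are all assumptions for this example.

```python
import math

# Shared object, stored once in local space: (center, radius) pairs.
MESH = [((0.0, 0.0, 0.0), 1.0), ((0.0, 1.5, 0.0), 0.5)]
BOUND_RADIUS = 2.0  # local bounding sphere that encloses MESH

# Each instance is just a translation; the geometry is never copied.
INSTANCES = [(0.0, 0.0, 10.0), (0.0, 0.0, 20.0), (5.0, 0.0, 10.0)]


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def sphere_roots(origin, direction, radius):
    """Near/far distances where a unit-direction ray meets an origin-centered sphere."""
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return -b - root, -b + root


def trace_object(local_origin, direction):
    """Step 4: intersect the shared object with a ray given in local space."""
    best = None
    for center, radius in MESH:
        roots = sphere_roots(sub(local_origin, center), direction, radius)
        if roots and roots[0] > 0 and (best is None or roots[0] < best):
            best = roots[0]
    return best


def trace_scene(origin, direction):
    """Step 3: visit instances, cull by bounds, then descend into the object."""
    best, best_instance = None, None
    for index, offset in enumerate(INSTANCES):
        local_origin = sub(origin, offset)  # world space -> object space
        bounds = sphere_roots(local_origin, direction, BOUND_RADIUS)
        if bounds is None or bounds[1] <= 0:
            continue  # ray misses this instance entirely
        t = trace_object(local_origin, direction)
        if t is not None and (best is None or t < best):
            best, best_instance = t, index
    return best, best_instance


print(trace_scene((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

Because the instance transform here is a pure translation, the hit distance is the same in local and world space; with rotation or scale the direction would need transforming too.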

Point 1 will be a lot slower in the standard ray tracer: for 5 meshes of 1M tris each and an ideal O(N log N) SAH kd-tree builder, building the kd-tree takes at least about 5.6 times longer.
In the important point 2 (traversal), however, the standard ray tracer will most likely beat the instanced one (assuming we have a good kd-tree layout in memory so we don't lose much to cache misses).

Note that in the first case our memory demands will be more than 5 times bigger! So this is what instancing saves. Performance, not so much!

We would need to benchmark it, though.
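The 5.6 figure follows from the N log N build cost: building over 5M triangles versus 1M costs 5 · log(5M) / log(1M) times as much. A quick check, where `build_cost` is a hypothetical idealized cost model, not a measurement:

```python
import math


def build_cost(n):
    """Idealized O(N log N) kd-tree build cost, in arbitrary units."""
    return n * math.log2(n)


tris, copies = 1_000_000, 5
ratio = build_cost(tris * copies) / build_cost(tris)
print(round(ratio, 1))  # flattened scene build cost relative to one shared mesh
```

The base of the logarithm cancels out in the ratio, so the choice of `log2` does not affect the result.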

__________Smile_ 101 Jan 21, 2013 at 15:23

Yes, the biggest advantage of an instancing raytracer is memory footprint: my test scenes easily exceeded a 1G triangle count. The second advantage is dynamic scenes: rebuilding the objects' acceleration structure is much faster than rebuilding the k-d tree of the whole scene.