Jump to content


KD Tree delima!!!


35 replies to this topic

#21 Vilem Otte

    Valued Member

  • Members
  • PipPipPipPip
  • 345 posts

Posted 15 July 2012 - 06:12 PM

So I can point you to Havran's PhD thesis for Kd-trees - http://dcgi.felk.cvu.../phdthesis.html - it's quite huge, but descriptive. But what might be even better is http://pbrt.org/pbrt-2ed-chap4.pdf - PBRT chapter 4 online for free (not just SAH Kd-trees, but also BVHs and grids are described there) ... I bought the book and it's quite awesome - if you're into ray tracing like me, definitely buy it, it's worth it.

Actually you can speed up SAH Kd-tree builder with few tricks, and it can even generate trees in realtime - like adding threading support and pre-sorting bounding edges, and so on... for me optimizing it further is mostly "try & fail" with profiler.

Though if you really need more speed, you can use idea of "Binned SAH building" that is used in BVHs or BIHs and adapt it to Kd-trees - I haven't tried it yet at Kd-trees (only at BVHs and QBVHs), but there is paper from Saarland http://graphics.cg.u...-GPU-kdTree.pdf describing a bit about Binning SAH Kd-tree building (on Gpu - but actually Gpu and Cpu doesn't differ that much and I think that getting idea from this is not that hard)

So there is actually quite huge research how to find best splitting plane. Anyway if you're not targeting ray tracing - but physics, consider median splitting instead of SAH - it's almost instant to build (especially with pre-sorting the primitives), because it actually splits geometry to two halves, which might be better suited for physics (and maybe even for frustum testing - but one would need to profile this).
My blog about game development (and not just game development) - http://gameprogramme...y.blogspot.com/

If you don't know how to speed up application, go "roarrrrrr!", hit the compiler with the club and use -O3 :D

#22 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 15 July 2012 - 08:12 PM

So in overall, is kd-tree faster (at render time, not build time) than bvh, bih, octree and the rest?

#23 Vilem Otte

    Valued Member

  • Members
  • PipPipPipPip
  • 345 posts

Posted 15 July 2012 - 09:45 PM

It actually depends...

Kd-trees are winning a lot when you're tracing random rays through hierarchy (e.g. path tracing) or generally incoherent rays... on the other hand they're losing a bit when testing against packets (because they're not well suited for packet tracing), although note that packet tracing is definitely not a way to go...

BVHs are quite good for coherent rays (but they're winning by just a few percents over decent Kd-trees), and overally their tree quality is worse (because doing full SAH on them is way slower than doing it on Kd-trees).

So actually Kd-trees produce better quality trees well suited for incoherent rays, random rays and good for packets (they're losing just a little).

BIHs are a mix between Kd-trees and BVH - you build them as BVH thus getting faster build times, but traversing them as packet-friendly Kd-trees - thus being very good for ray packets. On the other hand BIHs are outperformed by Kd-trees when your rays are a bit incoherent (they're losing even with reflective surfaces, not even mentioning path tracing).

What is quite a concurrent for Kd-trees is actually QBVH - tetrary BVH. It gives quite good results even for incoherent rays, they're a lot faster to build (although they're still just approximating SAH with binning - we can do the same for Kd-trees and they're not winning that much in building times).

So it really depends - even basic grid or nested grid is usable acceleration structure ... somewhere. Actually SAH Kd-trees have one huge advantage - they give very good quality for almost any kind of scene (where BVH or QBVH dies on long thin triangles - Kd-trees can handle them, even triangle fans can be handled when giving correct numbers to SAH computation).
My blog about game development (and not just game development) - http://gameprogramme...y.blogspot.com/

If you don't know how to speed up application, go "roarrrrrr!", hit the compiler with the club and use -O3 :D

#24 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 15 July 2012 - 10:21 PM

oh, I see, I'll stay with your kd-tree code! Thanks for explaining, I appreciate all the help :)

#25 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 23 July 2012 - 11:49 PM

Is it normal that the kd-tree space goes up to 5GB of RAM for only 400K triangles? It's not always the case, but on some scenes, it does that, and it takes much longer to build. Sometimes it goes realy fast on 900K triangles and takes only 700MB !!??

#26 Reedbeta

    DevMaster Staff

  • Administrators
  • 5305 posts
  • LocationBellevue, WA

Posted 24 July 2012 - 12:34 AM

That sounds extremely excessive. :) How big is your node structure? It should be quite small, I'm guessing around 16-32 bytes. Even assuming you keep building the tree all the way down to ~1 triangle per node, that would only give 27-55 MB for 900K triangles (~1.8M nodes). And you shouldn't keep subdividing that far anyway; stop when there's some small number of triangles in the node (~8-64 I'm guessing), and that will cut the memory by 90-98%. (The memory for a tree is dominated by its leaf nodes, so every extra level of subdivision roughly doubles the memory for the tree.)
reedbeta.com - developer blog, OpenGL demos, and other projects

#27 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 24 July 2012 - 01:30 AM

The node is a qword. For the depth, I use that of pbrt...

MaxDepth = int(8 + 1.3 * log2(PolyCount));

and for the min poly per leaf I use...

log2(PolyCount)

So for a 900k poly, it gives me a depth of about 33 and 19 or so poly min per leaf. Is that too much you think?

#28 Reedbeta

    DevMaster Staff

  • Administrators
  • 5305 posts
  • LocationBellevue, WA

Posted 24 July 2012 - 03:41 AM

By a "qword" you mean 8 bytes, I'm guessing? So then if you're getting 700 MB of allocation, that's 92 million nodes. Since that's far more nodes than triangles, I'd say something's very wrong! Can you make your code count the number of nodes alloc'd to see if this figure is correct?

For 900K triangles with 20 triangles per leaf node, that's 45K leaf nodes, so 90K nodes altogether. At 8 bytes per node, that's only ~700KB of memory. Hmm...did you by any chance confuse 700KB for 700MB?
reedbeta.com - developer blog, OpenGL demos, and other projects

#29 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 24 July 2012 - 04:04 AM

I meant 700MB Reedbeta, not 700KB. The nodes are 8 bytes, but each leaf holds a list of triangle indexes, so at start, the left node may have 200k tri, and the right node 500k. then the right node get split, say 250k on each site, all the way down. Unless I'm doing this all wrong as usual.

I just loaded a 310,000 poly model, and that created 211,881 nodes and that took about 300MB (not KB).

#30 Reedbeta

    DevMaster Staff

  • Administrators
  • 5305 posts
  • LocationBellevue, WA

Posted 24 July 2012 - 05:56 AM

I still think that number of nodes is quite a bit too high compared to the number of triangles, but at least it's more reasonable, not like 92 million or something. But at 8 bytes per node, 211,881 nodes is only 1.6 MB. So where is your 300 MB coming from?
reedbeta.com - developer blog, OpenGL demos, and other projects

#31 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 24 July 2012 - 12:08 PM

Each nodes is 8 bytes, but what about the list of triangles they hold if they are leafs?

#32 Reedbeta

    DevMaster Staff

  • Administrators
  • 5305 posts
  • LocationBellevue, WA

Posted 24 July 2012 - 05:01 PM

You have 310K triangles, so if you have 300MB of triangle data that's about a kilobyte per triangle. I assume your triangle struct is not actually that big. So either the memory is coming from somewhere else or you're generating a hell of a lot of split triangles.

I'm not asking you because I want to play a guessing game; you should instrument your code and find out where the memory is going! :)
reedbeta.com - developer blog, OpenGL demos, and other projects

#33 Vilem Otte

    Valued Member

  • Members
  • PipPipPipPip
  • 345 posts

Posted 24 July 2012 - 08:57 PM

Just for information - it might help a little:
Sample scene - Sponza Atrium
One of the worst scenes for either Kd-trees or BVHs, so you'll get really bad trees (plenty of useless nodes) for it! Although this is actually the point why we test on it. I strongly recommend trying Sponza...

Ray tracer - old GPU ray tracer, mono-tracing in OpenCL application, no actual optimization (just for general testing of tree quality) - e.g. simple ray tracing algorithm. So this one is just brain-dead simple implementation of ray tracer. Wald triangles used, I think...

Total Triangle Count - 67462 tris

Criteria: SAH Splitting, Primitive criteria = 64, Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5
Created 6081 nodes
Size of nodes = 48648 bytes (less than 50 KiB)
Size of all Kd-tree data = 512696 bytes (0.5 MiB)
Building done in 906 ms (1 core building algorithm posted here)
Average performance = 6.32 MRays/s

Criteria: SAH Splitting, Primitive criteria = 16 (e.g. if there is 16 or less triangles, create node), Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5
Created 114775 nodes
Size of nodes = 918200 bytes (e.g. less than 1 MiB)
Size of all Kd-tree data = 3606472 bytes (e.g. a bit more than 3 MiB)
Building done in 1401 ms (1 core building algorithm posted here)
Average performance = 7.46 MRays/s

Criteria: SAH Splitting, Primitive criteria = 4, Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5
Created 5626177 nodes
Size of all nodes = 45009416 bytes (e.g. 45 MiB)
Size of all Kd-tree data = 112243156 bytes (e.g.112 MiB)
Building done in 6848 ms (1 core building algorithm posted here)
Average performance = 3.72 MRays/s
My blog about game development (and not just game development) - http://gameprogramme...y.blogspot.com/

If you don't know how to speed up application, go "roarrrrrr!", hit the compiler with the club and use -O3 :D

#34 Reedbeta

    DevMaster Staff

  • Administrators
  • 5305 posts
  • LocationBellevue, WA

Posted 24 July 2012 - 09:22 PM

Nice numbers Vilem! This illustrates what I was saying about building the tree too far down...see how the size explodes exponentially when you decrease the maximum triangles per node. Quite dramatic really, cutting the triangles per node by 4x could make the tree as much as 40x bigger!
reedbeta.com - developer blog, OpenGL demos, and other projects

#35 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 24 July 2012 - 10:22 PM

Well, I don't think we can calculate the size of the tree just by using the 8 bytes node! I mean, what about the list of triangles each one holds? On each split, 2 sides get a chunk, So for the very first split of 310K triangles, the minimum (not including overlapping triangles) would be 155K each sides. At 4 bytes as an index, that's already 1.2MB just for the first split, as a minimum, for the first depth only. Or is it me that's not doing it right?

#36 Alienizer

    Member

  • Members
  • PipPipPipPip
  • 435 posts

Posted 24 July 2012 - 10:29 PM

For mine, the Sponza using the pbrt way...

227,500 triangles
depth = 18
min tri per node = 31
30MB total size for the tree

Seems to work fine for the Sponza, but on others, it just doesn't. at all.





1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users