KD Tree delima!!!

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 12, 2012 at 03:35

I have implemented the kd-tree found in pbrt v2, and it works, kind of. At the part where you build the tree, you sort the list of point from the triangle list for the current axis to get the split pos to test against. I’m getting a few triangles here and there that are missing from the render (invisible). if I change the sorting algo from quick sort to another, then I get different sets of missing triangles. I check over and over and over, and the code is identical to that of pbrt, even the kd ray intersect. The model I used is not corrupted in any ways, openGL displays it nicely, and it does this on all models. It always seems to come from the way the list is sorted. Anyone may have a clue as to what this problem maybe? Thanks!

35 Replies

Please log in or register to post a reply.

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Jul 12, 2012 at 03:53

Try to find a minimal case that triggers the bug, i.e. a small model with only a few triangles. Then step through the code in the debugger and see what’s going on. With a small number of triangles the amount of data should be small enough that you can figure out the problem. If the problem disappears with smaller models that’s information too.

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 12, 2012 at 04:37

I tied to do that Reedbeta, but I don’t know what to look for! I just did a cube with a cut in the center, on a plane, and the weird thing is, one triangle is fully missing, the other is half way missing!!! So I thought it’s my ray/intersect, but if I sort the list like I’ve explained, then I get other triangles instead, and sometimes everything is fine, it all depend on how the list is sorted! It’s driving me crazy! and it’s the same code as in pbrt.

Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 12, 2012 at 06:23

Well, i think its something related to the way you cut the planes, put a flag in each triangle to see if it has passed the plane test ,and print the result in a kind of list report, if the triangle hasn’t passed the test, and you know it would, you know where the problem is.
The sorting issue may be related to the order of intersection test, i tend to think that its not a problem related to the construction of the tree, i think its a kind of geometric bug.

B20d81438814b6ba7da7ff8eb502d039
0
Vilem_Otte 117 Jul 12, 2012 at 10:17

It’s kinda hard to search for a bug out of description (e.g. if you can’t find it - I can try to help you finding it) - could you please show Kd-tree building code plus Ray traversal code?

Anyway PBRTv2 Kd-tree building code is okay (I’ve implemented it also and it works), plus optimized it quite a lot (so actually it can be used for dynamic scenes). There can also be precsion issue in ray traversal.

The actuall issue will most-likely be precision issue (in my opinion), or branch that has been forgotten, or pointers as always. :)

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 13, 2012 at 02:30

I checked all what you all suggested and still have that problem. One question…

In the tree build, after going through the loop to get the best split pos, we need to go through the sorted list of triangle points (which is 2*tot triangles, for bmin and bmax) for that node and populate the left and right nodes. Do the total of them suppose to be >= to the total number of triangles for that node? I get less sometimes and when I don’t get less, the render is perfect!!!

Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 13, 2012 at 08:16

If i have understood correctly , this is not possible each node must contain a subset of the entire ‘triangle soup’ , if you take this concept recursiley, in each node you have to find a subset of the source triangle list, if it happens that you have an erroneous condition, this might means 3 things :
1) your plane intersection isn’t correctly working
2) pointer you pass in the recursive function are corrupted in some ways.
3) the split lists for triangle sublists pointers are corrupted / not passed correctly
Start with a simple triangle layout and proceed from there

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 13, 2012 at 14:55

That’s what I thought, the sun of both nodes should be at least the same as the parent node! But then again, if we are looking only for bmin of the left of the split, and bmax on the right of the split, that might explain it. But, if that’s a bug, why is it working for all of you? …

    // Classify primitives with respect to split
    int n0 = 0, n1 = 0;
    for (int i = 0; i < bestOffset; ++i)
        if (edges[bestAxis][i].type == BoundEdge::START)
            prims0[n0++] = edges[bestAxis][i].primNum;
    for (int i = bestOffset+1; i < 2*nPrimitives; ++i)
        if (edges[bestAxis][i].type == BoundEdge::END)
            prims1[n1++] = edges[bestAxis][i].primNum;

Is there a specific way to sort the list? I sort it by point on the current axis…

sort(&edges[axis][0], &edges[axis][2*nPrimitives]);
Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 13, 2012 at 15:57

I am not using this implementation, i have implemented a kd tree many years ago, and i never touched again that code.
a question, why do you sort ? is it necessary in my implementation i never sorted, i suppose this code is using an euristic ( sp?? ) for choosing the best split plane, it could be possibile that the code gives erroneous results and chooses a plane which is way off the current node and in some ways it consider the entire triangle list to be passed down as a sub node , just a thought..

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 13, 2012 at 17:02

It is the code from pbrt v2 and here is how it works…

for an axis, buld a list of min/max points from the triangles, this list is 2*ncount because it contain 2 points per triangle. Sort the list, then go through the list and compute the best split pos. After that, do “Classify primitives with respect to split” as the code I posted above.

How do you do yours? I just can’t find a good way, there is always some kind of error, and I know it’s not on my part, I think it’s the algo I use that’s not correct.

Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 13, 2012 at 22:55

Here is mine :
start with a bounding box containing all the triangles ;
choose a split plane , i don’t perform any heuristic , normally planes are cycled , x, y, z, and so forth
after i have choosen a split plane i perform an half space test, i build two lists , one for each sub node , the triangles which passed the test for upper subspace go in the first list, the other in the second if i have a crossing triange i split it and put one half in the dirst and the second half in the other.
I do this because i put all the triangles in each node in a vbo buffer so i don’t have to perform an horrible ‘if’ for each triangle.
Then i recurse.
hope i am clear

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 14, 2012 at 00:12

Very clear thank you. But what if I can’t split a triangle? Do I simply include it in both children?

Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 14, 2012 at 09:57

Yes, you have , the drawback is that at rendering time, you have to check if the triangle has been already rendered to avoid drawing two times, this adds another stage, scan trhough all the triangles, see if a flag has been set , if not draw and set to true if yes , skip it.
I found this very inefficient so, at the cost of putting more triangles in both node, i skip totally the conditions checking at runtime, because if you don’t you need to build vertex buffer object during the rendering time, for me its a no no.
I split triangles, create vbo in an offline phase and at rendering time i just call the vbos

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 14, 2012 at 16:06

ok thanks v71 :)

B20d81438814b6ba7da7ff8eb502d039
0
Vilem_Otte 117 Jul 14, 2012 at 16:51

Okay, so I’ll try to point out full answer…

#v71
You start correctly - with bounding box containing all the triangles, but some kind of heuristics is necessary (and even you must have some) - basically you do “In the middle of current axis” and loop axes… this will end up in very bad tree thus kinda destroying purpose of Kd-trees. In practice you have to use either true correct Surface Area Heuristics (further SAH), or some kind of approximation of SAH.

Also splitting triangles is not necessary - you assign it to both sub-cells and it won’t get rendered double times (correct traversal ends on splitting plane!).

#Alienizer
So i’ll try to explain how to build and traverse Kd-trees on my code…

Basically the core procedure (or method - it’s based upon whether your code is procedural or object oriented) is looking like this:

void CKDTree::RecursiveBuildTree(unsigned int mNodeID, vector<CTriangle> *mCurrent, vector<CAABB> *mPrimBounds, CAABB mBounds, unsigned int *mPrimIDs, unsigned int mPrimCount, CKDTreeBoundEdge *mEdges[3], unsigned int mRecursionDepth)

Where:
mNodeID - ID to current node we’re working on
mCurrent - pointer to whole triangle list
mPrimBounds - pointer to all triangle bounds from triangle list
mBounds - current node bounds
mPrimIDs - IDs of triangles in current node
mPrimCount - number of triangles in current node NOTE: mPrimIDs and mPrimCount could be merged
mEdges - bounding edge of all triangle bounding boxes for each axis and for each object there are 2 - start and end bouding edge
mRecursionDepth - self-explaining itself

So basically in Kd-tree building you do:
1.) Check whether the current node can be leaf - if it can, make leaf out of it:

// Check whether we hit conditions for leaf
if(mPrimCount <= this->mMaxPrimitivesInNode || mRecursionDepth > this->mMaxRecursionDepth)
{
// mTriangleIDs is std::vector containing just uint indices to triangles std::vector, grab pointer to end (e.g. where triangle ids for this node starts)
unsigned int mPointer = (unsigned int)mTriangleIDs.size();

// Push triangle ids from this node to indicies std::vector
for(unsigned int i = 0; i < mPrimCount; ++i)
{
  mTriangleIDs.push_back(mPrimIDs[i]);
}

// Init leaf node with primitive count mPrimCount at mPointer position from mTriangleIDs buffer
this->mNodes[mNodeID].InitLeaf(mPointer, mPrimCount);

// voila!
return;
}

Now if we don’t get to this code, we need to split the node…
2.) Splitting the node

2.1) Finding the split axis
There is about a zillion ways how to find sometimes better, sometimes worse splitting axis. I’ll put here correct SAH splitting:

// Get longest axis
int mAxis = mBounds.GetLongestAxis();
// Split position - if we couln't find a split with good sah cost, let's split in the middle of longest axis
float mSplit = mBounds.mMin[mAxis] + (mBounds.mMax[mAxis] - mBounds.mMin[mAxis]) * 0.5f;
// Some variables for SAH
unsigned int mSahBestAxis = 0;
int mSahBestPos = -1;
int mSahBestCost = 2000000000;
int mSahBelow, mSahAbove;
float mTotalArea = mBounds.GetSurfaceArea();
float mInvTotalArea = 1.0f / mTotalArea;
CVector3 mDimensions = mBounds.mMax - mBounds.mMin;
// Loop through all axis
for(unsigned int i = 0; i < 3; ++i)
{
// Note: this part can be pre-computed to speed up Kd-tree build
// Let's get bounding edges (start and end) from triangle bounding boxes
for(unsigned int j = 0; j < mPrimCount; ++j)
{
  int mCurrentPrimitive = mPrimIDs[j];
  mEdges[mAxis][2 * j] = CKDTreeBoundEdge((*mPrimBounds)[mCurrentPrimitive].mMin[mAxis], mCurrentPrimitive, true);
  mEdges[mAxis][2 * j + 1] = CKDTreeBoundEdge((*mPrimBounds)[mCurrentPrimitive].mMax[mAxis], mCurrentPrimitive, false);
}
sort(&mEdges[mAxis][0], &mEdges[mAxis][2 * mPrimCount]);

// Now we find the best splitting pos for current axis
// Rule: It will be on bounding edge of one of the primitives
mSahBelow = 0;
mSahAbove = mPrimCount;

// So loop through all bounding edges and compute SAH on left and right, store lowest price one
for(unsigned int j = 0; j < mPrimCount * 2; ++j)
{
  if(mEdges[mAxis][j].mType == CKDTreeBoundEdge::END)
  {
   --mSahAbove;
  }

  float mSplitCandidate = mEdges[mAxis][j].mPosition;

  if(mSplitCandidate > mBounds.mMin[mAxis] && mSplitCandidate < mBounds.mMax[mAxis])
  {
   unsigned int mOtherAxis0 = (mAxis + 1) % 3;
   unsigned int mOtherAxis1 = (mAxis + 2) % 3;
  
   float mBelowArea = 2.0f * (mDimensions[mOtherAxis0] * mDimensions[mOtherAxis1] + (mSplitCandidate - mBounds.mMin[mAxis]) * (mDimensions[mOtherAxis0] + mDimensions[mOtherAxis1]));
   float mAboveArea = 2.0f * (mDimensions[mOtherAxis0] * mDimensions[mOtherAxis1] + (mBounds.mMax[mAxis] - mSplitCandidate) * (mDimensions[mOtherAxis0] + mDimensions[mOtherAxis1]));
  
   float mBelow = mBelowArea * mInvTotalArea;
   float mAbove = mAboveArea * mInvTotalArea;
  
   float mEmptyBonus = (mBelow == 0.0f || mAbove == 0.0f) ? this->mSahEmptyBonus : 0.0f;
  
   float mCost = this->mSahTraversalCost + this->mSahIntersectCost * (1.0f - mEmptyBonus) * (mBelow * mSahBelow + mAbove * mSahAbove);
  
   if(mCost < mSahBestCost)
   {
    mSahBestCost = mCost;
    mSahBestAxis = mAxis;
    mSahBestPos = j;
   }
  }

  if(mEdges[mAxis][j].mType == CKDTreeBoundEdge::START)
  {
   ++mSahBelow;
  }
}
// Continue on next axis
mAxis = (mAxis + 1) % 3;
}
// Store best splittng axis and position (it's our split plane)
mAxis = mSahBestAxis;
mSplit = mEdges[mSahBestAxis][mSahBestPos].mPosition;

2.2) Assigning primitives to left or right or left and right sub-nodes
So let’s continue…

// Allocate left and right + set counters to zero (I know I could use std::vector - but it's slower ;) - found out through heavy profiling of this code)
unsigned int* mLeft = new unsigned int[(*mCurrent).size()];
unsigned int mLeftCount = 0;
if(!mLeft)
{
// Out of memory
}
unsigned int* mRight = new unsigned int[(*mCurrent).size()];
unsigned int mRightCount = 0;
if(!mRight)
{
// Out of memory
}
// Note: Those out of memory checks are necessary
// I used this Kd-tree on F.e. Power plant model, which is quite huge and I actually got out of memory sometimes :D
// Not a problem now though - bought more RAM :D - 16GB is enough for now :P
// Now loop through all primitives and get their min and max bound on splitting axis
for(unsigned int i = 0; i < mPrimCount; ++i)
{
float mMax = max(max((*mCurrent)[mPrimIDs[i]].mVerts[0][mAxis], (*mCurrent)[mPrimIDs[i]].mVerts[1][mAxis]), (*mCurrent)[mPrimIDs[i]].mVerts[2][mAxis]);
float mMin = min(min((*mCurrent)[mPrimIDs[i]].mVerts[0][mAxis], (*mCurrent)[mPrimIDs[i]].mVerts[1][mAxis]), (*mCurrent)[mPrimIDs[i]].mVerts[2][mAxis]);

// If their max is lesser than splitting pos, they-re left-only
if(mMax <= mSplit)
{
  mLeft[mLeftCount] = mPrimIDs[i];
  mLeftCount++;
}
// If their min is greater than spliting pos, they-re right-only
else if(mMin >= mSplit)
{
  mRight[mRightCount] = mPrimIDs[i];
  mRightCount++;
}
// Otherwise assign it to BOTH - left AND right
else
{
  mLeft[mLeftCount] = mPrimIDs[i];
  mLeftCount++;

  mRight[mRightCount] = mPrimIDs[i];
  mRightCount++;
}
}

2.3) Now calculate sub-nodes bounding boxes + recursively build them
So…

// Calculate left and right bounds
CAABB mLeftBounds = mBounds;
CAABB mRightBounds = mBounds;
mLeftBounds.mMax[mAxis] = mSplit;
mRightBounds.mMin[mAxis] = mSplit;
// Build left sub-tree (left sub-node is at node+1 address)
this->RecursiveBuildTree(mNodeID + 1, mCurrent, mPrimBounds, mLeftBounds, mLeft, mLeftCount, mEdges, mRecursionDepth + 1);
// Next child will be right bound - now we can init current node as interior
unsigned int mAboveChild = this->mNextFreeNode;
this->mNodes[mNodeID].InitInterior(mAxis, mAboveChild, mSplit);
 
// Build right sub-tree
this->RecursiveBuildTree(mAboveChild, mCurrent, mPrimBounds, mRightBounds, mRight, mRightCount, mEdges, mRecursionDepth + 1);
// Clean used data
delete [] mLeft;
delete [] mRight;

So that’s all!

3.) The Traversal

Now the another important thing is the traversal. Because we have some triangles duplicated (note: splitting is actually very wrong way to do this! - we don’t always have to work with triangles, and splitting F.e. NURBS surfaces is a nightmare), we need to write correct traversal!

// Stack traversal element, contains pointer to current node, start and end of ray
struct CKDTreeStackElem
{
  const CKDTreeNode *mNode;
  float mNear, mFar;
};
// KDtree intersect routine
int CKDTree::Intersect(vector<CTriangle> *mTriangles, const CRay &mRay, CVector3 *mBarycentric, float *mDepth, int *mID)
{
// Start and end of ray in start (yeah, no near/far clipping planes like in bad rasterizers! - just intervals!)
float mNear = 0.0f;
float mFar = INFINITY;

// Early out those rays, that miss the whole scene (bounding box of it)
if(!this->mBounds.IntersectRay(mRay, &mNear, &mFar))
{
  return 0;
}

// Stack for stack traversal
CKDTreeStackElem mStack[KD_TREE_STACK_SIZE];
int mStackPos = 0;

// Temp variables
int mHit = 0;

float mTempDistance = mFar;
CVector3 mTempBarycentric;

// We start at root node
const CKDTreeNode* mNode = &this->mNodes[0];

// While the node to traverse exists (it could be while(1), but it's necessary to handle EMPTY scenes)
while(mNode != NULL)
{
  // If current nearest hit is closer than ray start, this node won't have nearest hit
  if((*mDepth) < mNear)
  {
   break;
  }
 
  // If node is interior node
  if(!mNode->IsLeaf())
  {  
   // get it's splittng axis and hit-distance to splitting plane hit
   int mAxis = mNode->GetSplitAxis();
   float mHitPosPlane = (mNode->GetSplitPosition() - mRay.mOrigin[mAxis]) * mRay.mInvDirection[mAxis];
  
   // Order whether left or right goes first and second (in terms of distance)
   const CKDTreeNode* mFirst;
   const CKDTreeNode* mSecond;
  
   int mBelowFirst = (mRay.mOrigin[mAxis] < mNode->GetSplitPosition()) || (mRay.mOrigin[mAxis] == mNode->GetSplitPosition() && mRay.mDirection[mAxis] >= 0.0f);
  
   // Left first, then right
   if(mBelowFirst)
   {
    mFirst = mNode + 1;
    mSecond = &this->mNodes[mNode->GetAboveChild()];
   }
   // Right first, then left
   else
   {
    mFirst = &this->mNodes[mNode->GetAboveChild()];
    mSecond = mNode + 1;
   }
  
   // Only first
   if(mHitPosPlane > mFar || mHitPosPlane <= 0.0f)
   {
    mNode = mFirst;
   }
   // Only second
   else if(mHitPosPlane < mNear)
   {
    mNode = mSecond;
   }
   // Otherwise push second on stack (with correct ray start/end), and continue in first
   else
   {
    mStack[mStackPos].mNode = mSecond;
    mStack[mStackPos].mNear = mHitPosPlane;
    mStack[mStackPos].mFar = mFar;
    ++mStackPos;
   
    mNode = mFirst;
    mFar = mHitPosPlane;
   }
  }
  // Otherwise we got a leaf node
  else
  {
   // Just test against all primitives (the 1 primitive branch is just optimization for really huge scenes with tiny geometrym you can drop it out)
   unsigned int mNumPrims = mNode->GetPrimitivesCount();
  
   if(mNumPrims == 1)
   {
    const CTriangle &mTri = (*mTriangles)[mNode->mPrimitive];
   
    if(mTri.Intersect(mRay, &mTempBarycentric, &mTempDistance))
    {
     if(mTempDistance < (*mDepth) && mTempDistance > 0.0f)
     {
      (*mDepth) = mTempDistance;
      *mBarycentric = mTempBarycentric;
      *mID = mNode->mPrimitive;
      mHit = 1;
     }
    }
   }
   else
   {
    //const unsigned int *mPrims = mNode->mPrimitives;
    const unsigned int *mPrims = &this->mTriangleIDs[mNode->mPrimitivePtr];
   
    for(unsigned int i = 0; i < mNumPrims; ++i)
    {
     const CTriangle &mTri = (*mTriangles)[mPrims[i]];
    
     if(mTri.Intersect(mRay, &mTempBarycentric, &mTempDistance))
     {
      if(mTempDistance < (*mDepth) && mTempDistance > 0.0f)
      {
       (*mDepth) = mTempDistance;
       *mBarycentric = mTempBarycentric;
       *mID = this->mTriangleIDs[mNode->mPrimitivePtr + i];
       mHit = 1;
      }
     }
    }
   }
 
   // If there are nodes on stack
   if(mStackPos > 0)
   {
    // Pop out
    --mStackPos;
    mNode = mStack[mStackPos].mNode;
    mNear = mStack[mStackPos].mNear;
    mFar = mStack[mStackPos].mFar;
   }
   // Otherwise end traversal
   else
   {
    break;
   }
  }
}

// Return result
return mHit;
}

Okay, this should be all… I hope it can help a little :)

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 14, 2012 at 19:17

WOW :) :D :lol: :) Vilem Otte

This works so beautifully and fast, it fixed my problems of missing triangles in the render too!

I don’t know how to thank you for taking the time to explain in such details and provide the code too! That’s really, really, really nice of you……… me–>thank_you * INFINITY;

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 14, 2012 at 19:32

One more thing. Is there some kind of algo to determine the tree depth and the min numb of tri per nodes?

Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 14, 2012 at 23:10

@Villeim Otte , when you say its not necessary to split the triangle, are you referring to a frustum intersection test ar a ray test ?
to be honest i have not understood how the triangle spanning more nodes cannot be drawn two times if both nodes are inside the frustum .
can you explain me that please ?

B20d81438814b6ba7da7ff8eb502d039
0
Vilem_Otte 117 Jul 15, 2012 at 10:57

#v71
I’m mostly referring to ray test of course, as mentioned if you don’t work just with triangle ray tracer, splitting NURBS or such surfaces can be a nightmare (although I’m not saying that it’s impossible). Another problem is, that splitting triangle generates more than 2 triangles in general case (and this is bad - F.e. on Power plant model or such - it can generate quite a huge amount of triangles - which is very bad).

Although as for frustum testing - working with indices like me (btw. first it were pointers, but I needed to move the algorithm to OpenCL, where passing pointers is quite illogical - so the IDs are the way to go) helps a lot. It’s quite easy and fast to remove duplicated IDs of triangles after getting triangle list to-test (to-draw). Although note that this probably doesn’t count for rasterizers, where splitting would be necessary (or maybe polygon offset?) - it’s better to use VBOs - which implies it’s better to use hierarchical bounding volumes (BVH) rather than Kd-trees.

#Alienizer
Minimal number of tree nodes can be checked and stored during construction (global variables - or give CKDTree class a variable to store it).
The tree depth also (max function for recursive variable in argument of procedure/method and some variable in class/global variable to store the depth).

E.g. you can actually pre-compute (it’s actually just store, not actual computing) these variables during construction.

Ehm, and btw. no need to thank so much :) I just posted the code + little explanation I wrote some time ago and have been working on it since (today it’s actually quite complete - no memory leaks, not so slow building times, good resulting kd-trees) + i’ve got quite nice results with GPU ray tracing in these Kd-trees. :)

Ceee4d1295c32a0c1c08a9eae8c9459d
0
v71 105 Jul 15, 2012 at 15:41

@Villeim , yes i thought they were two different things, can you point me at some paper / theory for computing the best split plane ??
i am interested to revamp those level compilers of mine

B20d81438814b6ba7da7ff8eb502d039
0
Vilem_Otte 117 Jul 15, 2012 at 18:12

So I can point you to Havran’s PhD thesis for Kd-trees - http://dcgi.felk.cvut.cz/home/havran/phdthesis.html - it’s quite huge, but descriptive. But what might be even better is http://pbrt.org/pbrt-2ed-chap4.pdf - PBRT chapter 4 online for free (not just SAH Kd-trees, but also BVHs and grids are described there) … I bought the book and it’s quite awesome - if you’re into ray tracing like me, definitely buy it, it’s worth it.

Actually you can speed up SAH Kd-tree builder with few tricks, and it can even generate trees in realtime - like adding threading support and pre-sorting bounding edges, and so on… for me optimizing it further is mostly “try & fail” with profiler.

Though if you really need more speed, you can use idea of “Binned SAH building” that is used in BVHs or BIHs and adapt it to Kd-trees - I haven’t tried it yet at Kd-trees (only at BVHs and QBVHs), but there is paper from Saarland http://graphics.cg.uni-saarland.de/fileadmin/cguds/Users/cygnus/Danilewsk-GPU-kdTree.pdf describing a bit about Binning SAH Kd-tree building (on Gpu - but actually Gpu and Cpu doesn’t differ that much and I think that getting idea from this is not that hard)

So there is actually quite huge research how to find best splitting plane. Anyway if you’re not targeting ray tracing - but physics, consider median splitting instead of SAH - it’s almost instant to build (especially with pre-sorting the primitives), because it actually splits geometry to two halves, which might be better suited for physics (and maybe even for frustum testing - but one would need to profile this).

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 15, 2012 at 20:12

So in overall, is kd-tree faster (at render time, not build time) than bvh, bih, octree and the rest?

B20d81438814b6ba7da7ff8eb502d039
0
Vilem_Otte 117 Jul 15, 2012 at 21:45

It actually depends…

Kd-trees are winning a lot when you’re tracing random rays through hierarchy (e.g. path tracing) or generally incoherent rays… on the other hand they’re losing a bit when testing against packets (because they’re not well suited for packet tracing), although note that packet tracing is definitely not a way to go…

BVHs are quite good for coherent rays (but they’re winning by just a few percents over decent Kd-trees), and overally their tree quality is worse (because doing full SAH on them is way slower than doing it on Kd-trees).

So actually Kd-trees produce better quality trees well suited for incoherent rays, random rays and good for packets (they’re losing just a little).

BIHs are a mix between Kd-trees and BVH - you build them as BVH thus getting faster build times, but traversing them as packet-friendly Kd-trees - thus being very good for ray packets. On the other hand BIHs are outperformed by Kd-trees when your rays are a bit incoherent (they’re losing even with reflective surfaces, not even mentioning path tracing).

What is quite a concurrent for Kd-trees is actually QBVH - tetrary BVH. It gives quite good results even for incoherent rays, they’re a lot faster to build (although they’re still just approximating SAH with binning - we can do the same for Kd-trees and they’re not winning that much in building times).

So it really depends - even basic grid or nested grid is usable acceleration structure … somewhere. Actually SAH Kd-trees have one huge advantage - they give very good quality for almost any kind of scene (where BVH or QBVH dies on long thin triangles - Kd-trees can handle them, even triangle fans can be handled when giving correct numbers to SAH computation).

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 15, 2012 at 22:21

oh, I see, I’ll stay with your kd-tree code! Thanks for explaining, I appreciate all the help :)

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 23, 2012 at 23:49

Is it normal that the kd-tree space goes up to 5GB of RAM for only 400K triangles? It’s not always the case, but on some scenes, it does that, and it takes much longer to build. Sometimes it goes realy fast on 900K triangles and takes only 700MB !!??

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Jul 24, 2012 at 00:34

That sounds extremely excessive. :) How big is your node structure? It should be quite small, I’m guessing around 16-32 bytes. Even assuming you keep building the tree all the way down to \~1 triangle per node, that would only give 27-55 MB for 900K triangles (\~1.8M nodes). And you shouldn’t keep subdividing that far anyway; stop when there’s some small number of triangles in the node (\~8-64 I’m guessing), and that will cut the memory by 90-98%. (The memory for a tree is dominated by its leaf nodes, so every extra level of subdivision roughly doubles the memory for the tree.)

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 24, 2012 at 01:30

The node is a qword. For the depth, I use that of pbrt…

MaxDepth = int(8 + 1.3 * log2(PolyCount));

and for the min poly per leaf I use…

log2(PolyCount)

So for a 900k poly, it gives me a depth of about 33 and 19 or so poly min per leaf. Is that too much you think?

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Jul 24, 2012 at 03:41

By a “qword” you mean 8 bytes, I’m guessing? So then if you’re getting 700 MB of allocation, that’s 92 million nodes. Since that’s far more nodes than triangles, I’d say something’s very wrong! Can you make your code count the number of nodes alloc’d to see if this figure is correct?

For 900K triangles with 20 triangles per leaf node, that’s 45K leaf nodes, so 90K nodes altogether. At 8 bytes per node, that’s only \~700KB of memory. Hmm…did you by any chance confuse 700KB for 700MB?

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 24, 2012 at 04:04

I meant 700MB Reedbeta, not 700KB. The nodes are 8 bytes, but each leaf holds a list of triangle indexes, so at start, the left node may have 200k tri, and the right node 500k. then the right node get split, say 250k on each site, all the way down. Unless I’m doing this all wrong as usual.

I just loaded a 310,000 poly model, and that created 211,881 nodes and that took about 300MB (not KB).

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Jul 24, 2012 at 05:56

I still think that number of nodes is quite a bit too high compared to the number of triangles, but at least it’s more reasonable, not like 92 million or something. But at 8 bytes per node, 211,881 nodes is only 1.6 MB. So where is your 300 MB coming from?

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 24, 2012 at 12:08

Each nodes is 8 bytes, but what about the list of triangles they hold if they are leafs?

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Jul 24, 2012 at 17:01

You have 310K triangles, so if you have 300MB of triangle data that’s about a kilobyte per triangle. I assume your triangle struct is not actually that big. So either the memory is coming from somewhere else or you’re generating a hell of a lot of split triangles.

I’m not asking you because I want to play a guessing game; you should instrument your code and find out where the memory is going! :)

B20d81438814b6ba7da7ff8eb502d039
0
Vilem_Otte 117 Jul 24, 2012 at 20:57

Just for information - it might help a little:
Sample scene - Sponza Atrium
One of the worst scenes for either Kd-trees or BVHs, so you’ll get really bad trees (plenty of useless nodes) for it! Although this is actually the point why we test on it. I strongly recommend trying Sponza…

Ray tracer - old GPU ray tracer, mono-tracing in OpenCL application, no actual optimization (just for general testing of tree quality) - e.g. simple ray tracing algorithm. So this one is just brain-dead simple implementation of ray tracer. Wald triangles used, I think…

Total Triangle Count - 67462 tris

Criteria: SAH Splitting, Primitive criteria = 64, Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5
Created 6081 nodes
Size of nodes = 48648 bytes (less than 50 KiB)
Size of all Kd-tree data = 512696 bytes (0.5 MiB)
Building done in 906 ms (1 core building algorithm posted here)
Average performance = 6.32 MRays/s

Criteria: SAH Splitting, Primitive criteria = 16 (e.g. if there is 16 or less triangles, create node), Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5
Created 114775 nodes
Size of nodes = 918200 bytes (e.g. less than 1 MiB)
Size of all Kd-tree data = 3606472 bytes (e.g. a bit more than 3 MiB)
Building done in 1401 ms (1 core building algorithm posted here)
Average performance = 7.46 MRays/s

Criteria: SAH Splitting, Primitive criteria = 4, Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5
Created 5626177 nodes
Size of all nodes = 45009416 bytes (e.g. 45 MiB)
Size of all Kd-tree data = 112243156 bytes (e.g.112 MiB)
Building done in 6848 ms (1 core building algorithm posted here)
Average performance = 3.72 MRays/s

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Jul 24, 2012 at 21:22

Nice numbers Vilem! This illustrates what I was saying about building the tree too far down…see how the size explodes exponentially when you decrease the maximum triangles per node. Quite dramatic really, cutting the triangles per node by 4x could make the tree as much as 40x bigger!

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 24, 2012 at 22:22

Well, I don’t think we can calculate the size of the tree just by using the 8 bytes node! I mean, what about the list of triangles each one holds? On each split, 2 sides get a chunk, So for the very first split of 310K triangles, the minimum (not including overlapping triangles) would be 155K each sides. At 4 bytes as an index, that’s already 1.2MB just for the first split, as a minimum, for the first depth only. Or is it me that’s not doing it right?

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Jul 24, 2012 at 22:29

For mine, the Sponza using the pbrt way…

227,500 triangles
depth = 18
min tri per node = 31
30MB total size for the tree

Seems to work fine for the Sponza, but on others, it just doesn’t. at all.