187 replies to this topic

### #81rarefluid

New Member

• Members
• 14 posts

Posted 03 December 2007 - 11:33 AM

I finally implemented the algorithm with the proposed improvement of only interpolating one edge. Though it is quite logical, the implementation puzzled me quite a bit, so I'm posting here for people that are as confused as I was...
Block size is still 8.


...

...

if(DY31 < 0 || (DY31 == 0 && DX31 > 0)) C3++;

// find box test offsets for each edge

int eo1 = 0;

if (FDY12 < 0) eo1 -= FDY12 << 3; //meaning y1 is above y2, block size is 8 ==> shl 3

if (FDX12 > 0) eo1 += FDX12 << 3; //meaning x1 is right of x2

int eo2 = 0;

if (FDY23 < 0) eo2 -= FDY23 << 3; //meaning y2 is above y3

if (FDX23 > 0) eo2 += FDX23 << 3; //meaning x2 is right of x3

int eo3 = 0;

if (FDY31 < 0) eo3 -= FDY31 << 3; //meaning y3 is above y1

if (FDX31 > 0) eo3 += FDX31 << 3; //meaning x3 is right of x1

//these are the offsets fo the bottom-right block corner

int ei1 = (DX12 << 7) - (DY12 << 7) - eo1; //block size is 8 ==> shl 3 + 4

int ei2 = (DX23 << 7) - (DY23 << 7) - eo2;

int ei3 = (DX31 << 7) - (DY31 << 7) - eo3;

// Loop through blocks

for(int y = miny; y < maxy; y += q) {

unsigned char filledFlag = 0; //variable that states if we have found a somehow filled block already

for(int x = minx; x < maxx; x += q) {

// Corners of block

int x0 = x << 4;

int y0 = y << 4;

int CX1 = C1 + DX12 * y0 - DY12 * x0;

int CX2 = C2 + DX23 * y0 - DY23 * x0;

int CX3 = C3 + DX31 * y0 - DY31 * x0;

// Skip block when outside an edge

if((CX1+eo1) < 0 || (CX2+eo2) < 0 || (CX3+eo3) < 0) {

unsigned int *cbuffer = tempBuffer;

if (filledFlag & 0x80) {

//we have hit a filled block before, but now we've hit an empty one. we can skip to the next line.

break;

}

else {

//we have NOT hit a filled block before, so we just skip to the next one.

continue;

}

}

//Accept whole block when totally covered

if((CX1+ei1) > 0 && (CX2+ei2) > 0 && (CX3+ei3) > 0) {

unsigned int *cbuffer = tempBuffer;

for(int iy = 0; iy < q; iy++) {

for(int ix = x; ix < x + q; ix++) {

cbuffer[ix] = 0x00007F00; // Green

}

cbuffer = (unsigned int *)((char *)cbuffer + target->bytesPerLine);

}

}

else {

// Partially covered block

unsigned int *cbuffer = tempBuffer;

int CY1 = CX1;

int CY2 = CX2;

int CY3 = CX3;

for(int iy = y; iy < y + q; iy++) {

CX1 = CY1;

CX2 = CY2;

CX3 = CY3;

for(int ix = x; ix < x + q; ix++) {

if(CX1 > 0 && CX2 > 0 && CX3 > 0) {

cbuffer[ix] = 0x0000007F; // Blue

}

CX1 -= FDY12;

CX2 -= FDY23;

CX3 -= FDY31;

}

CY1 += FDX12;

CY2 += FDX23;

CY3 += FDX31;

cbuffer = (unsigned int *)((char *)cbuffer + target->bytesPerLine);

}

}

filledFlag = 0x80;

}

tempBuffer = (unsigned int *)((char *)tempBuffer + q * target->bytesPerLine);

}

};



Though there are much less calculations going on, the code is not much faster (as in blazing fast) than the original version, because eo1/eo2/eo3 are kept in memory.
The code to calculate + test for "fully inside" (ei1, ei2, ei3) slows down the algorithm a lot too, probably because of the lack of registers. There might be another way of calculating those offsets off the eoX variables.

A slight improvement is the filledFlag variable that skips to the next line of blocks when we have encountered a filled and an empty block in sequence (because that means the rest of the line is empty...). I tried that mechanism for partially filled blocks too (skipping lines of pixels), but noticed no improvement...

comments and flames are welcome ;)

### #82rarefluid

New Member

• Members
• 14 posts

Posted 04 December 2007 - 05:00 PM

I have to check, but I guess on an architecture with many registers this might be really fast, 'cause almost everything is precalculated and kept in registers instead of memory.

Another improvement might be to convert the block inside calculation to interpolate coordinates:
    ...
int CY1 = C1 + DX12 * (miny << 4) - DY12 * (minx << 4);
int CY2 = C2 + DX23 * (miny << 4) - DY23 * (minx << 4);
int CY3 = C3 + DX31 * (miny << 4) - DY31 * (minx << 4);
// Loop through blocks
for(int y = miny; y < maxy; y += q) {

int CX1 = CY1;
int CX2 = CY2;
int CX3 = CY3;
unsigned char filledFlag = 0; //variable stat states if we haven't found a somehow filled block/pixel already

for(int x = minx; x < maxx; x += q) {
// Skip block when outside an edge
if((CX1+eo1) < 0 || (CX2+eo2) < 0 || (CX3+eo3) < 0) {
unsigned int *cbuffer = tempBuffer;
if (filledFlag & 0x80) {
//we have hit a filled block before, but now we've hit an empty one. we can skip to the next line.
break;
}
else {
//we have NOT hit a filled block before, so we just skip to the next one.
goto skip;
}
}
//Accept whole block when totally covered
if((CX1+ei1) > 0 && (CX2+ei2) > 0 && (CX3+ei3) > 0) {
unsigned int *cbuffer = tempBuffer;
for(int iy = 0; iy < q; iy++) {
for(int ix = x; ix < x + q; ix++) {
cbuffer[ix] = 0x00007F00; // Green
}
cbuffer = (unsigned int *)((char *)cbuffer + target->bytesPerLine);
}
}
else {
// Partially covered block
unsigned int *cbuffer = tempBuffer;
for(int iy = y; iy < y + q; iy++) {
int cx1 = CX1;
int cx2 = CX2;
int cx3 = CX3;
for(int ix = x; ix < x + q; ix++) {
if(cx1 > 0 && cx2 > 0 && cx3 > 0) {
cbuffer[ix] = 0x0000007F; // Blue
}
cx1 -= FDY12;
cx2 -= FDY23;
cx3 -= FDY31;
}
CX1 += FDX12;
CX2 += FDX23;
CX3 += FDX31;
cbuffer = (unsigned int *)((char *)cbuffer + target->bytesPerLine);
}
CX1 -= (FDX12 << 3);
CX2 -= (FDX23 << 3);
CX3 -= (FDX31 << 3);
}
filledFlag = 0x80;
skip:
CX1 -= (FDY12 << 3);
CX2 -= (FDY23 << 3);
CX3 -= (FDY31 << 3);
}
CY1 += (FDX12 << 3);
CY2 += (FDX23 << 3);
CY3 += (FDX31 << 3);
tempBuffer = (unsigned int *)((char *)tempBuffer + q * target->bytesPerLine);
}
};

No multiplications anymore! There is a slight improvement for this, which again might be bigger if we could keep some stuff in registers...

I'm really keen to hear your suggestions for further improving speed algorithmically and not converting to SSE or other stuff...

### #83rarefluid

New Member

• Members
• 14 posts

Posted 02 January 2008 - 04:46 PM

Dunno if anyone reads this, but I found an interesting piece for prespective-correct interpolation from Nick :) in another forum

Quote

I think I did find a more practical result though: To compute the classical matrix inverse, a division by the deteminant is required. This takes 1 reciproke, 12 multiplications and 2 addtions. I believe this can be eliminated altogether. Without dividing by the determinant d, instead of computing u/w and 1/w we'd be computing d*u/w and d/w. Per pixel we're computing (d*u/w)/(d/w) = u.

This should be quite ideal for this algorithm...

### #84Jamie

New Member

• Members
• 8 posts

Posted 08 April 2008 - 01:55 PM

Hi,

http://winden.wordpr...ategory/coding/

A very interesting article about old rendering method, for perspective mapping we can have only 2 division per polygon, and if you have vertex cached it's only one division.

### #85Nick

Senior Member

• Members
• 1227 posts

Posted 09 April 2008 - 10:37 AM

Jamie said:

A very interesting article about old rendering method, for perspective mapping we can have only 2 division per polygon, and if you have vertex cached it's only one division.
Turning multiple divisions into one is a very basic math trick:

x = a / b
y = c / d

becomes

y = cb / db

The denominators are equal so you only have to compute 1 / bd once.

However, this has several implications. The first is that if d or b is zero you get a division by zero for both. With triangles, one or two vertices can have w = 0 (but not all three). Secondly, computers have limited precision and so b * (1 / bd) is of lower precision than 1 / d. And it gets worse when doing it for three divisions at once. And thirdly, you quickly have many more muliplications and data movement, making the gains much less worthwhile.

By the way, gradient setup for perspective correct interpolation can even be done with no division at all...

### #86Jamie

New Member

• Members
• 8 posts

Posted 09 April 2008 - 01:47 PM

I tested the optimization for the first 2 division in my scanline perspective correction, i don't see a difference but as you said i have not tested for 3 division.

For the w = 0, you can clipp the triangle with the near plane

One thing that is really good it's he compute all edge,gradient with the 3d coordinate so he has a good precision.

Of course you don't need to have division for the perspective gradient setup.

### #87n.o.p

New Member

• Members
• 1 posts

Posted 28 December 2008 - 08:15 PM

Nick said:

So how do we detect coverage as fast as possible? With the half-space functions, it's really easy. All we have to do is evaluate the half-space functions in the corners of the blocks. A non-covered block has negative half-space values for the corners for at least one edge. A completely covered block has positive half-space values for all edges. Everything else is a partially covered block.

Doesn't that give false positives (flagging a non-covered block as partially covered block)?

+-----+
\   /
\ /
v
+------+
|      |
|      |
+------+


This block detected as partitally covered even though it's not covered at all. Might not be a big problem here (except for some wasted cycles), but it should be noted.

### #88Nick

Senior Member

• Members
• 1227 posts

Posted 29 December 2008 - 01:16 PM

n.o.p said:

Doesn't that give false positives (flagging a non-covered block as partially covered block)?
No, the block in your image is outside of the bounding box of the triangle, so it's never going to be tested in the first place. :happy:

### #89Bernie

New Member

• Members
• 6 posts

Posted 18 January 2009 - 12:14 AM

Very interesting thread! I have some questions ..

Currently I'm working on accurate scan conversion functions for both pixel-centers (e.g. rasterizing triangles) and pixel-coverage (e.g. which tiles touch this triangle). However I want to use floating-point math, and avoid unecessary work such as looping over all pixels in the bounding box. I'm not trying to optimize the performance with assembly at this stage - instead I want to optimize the algorithm so I can then implement it on Altivec and SPU later.

So far I have the pixel-center algorithm working quite well - it handles the edge cases perfectly and requires only a couple multiplies and a floorf() and ceilf() per scanline. The way I do it is to first rearrange the vertices so that vertex A has the highest y-coord and then if the winding is clockwise I swap B and C. This gives me one of two configurations which I handle separately:

. A
. /|
. B |
. \|
. C

or

. A
. |\
. | B
. |/
. C

In the first configuration vertex B is to the left of edge AC ... I scan along edge AB (left) and AC (right) for the upper half and then along BC (left) and AC (right) for the lower half.

In the second configuration vertex B is to the right of edge AC ... I scan along edge AC (left) and AB (right) for the upper half and then along AC (left) and BC (right) for the lower half.

Since I'm only considering pixel centers here, the "upper" and "lower" parts of the triangle are well-defined by simply comparing the y-coord of the center of the span to the y-coord of vertex B, so things are fairly simple.

Now I'm working on the pixel-coverage scan converter (here a "pixel" can be considered as a tile, it makes no difference) and the problem seems a lot more complex. Again I'm looping over the scanlines just like the pixel-center version, and for each scanline I determine the furthest extent {xmin,xmax} of the triangle over the scanline's area. Calling floorf(xmin), ceilf(xmax) gives me the actual extent in integer values. The problem is that for some scanlines I have to consider multiple edges on the left side the right side ...

My approach so far is to process groups of spans. First the "top" span (which includes vertex A), then the "upper" spans (between A and B but not including either), then the "middle" span (B), then the "lower" spans (between B and C but not including either) and finally the "bottom" span ©. Plus, the "upper+top" case where both A and B lie on the same span has to be considered separately, as does the "lower+bottom" case, and the trivial case where all vertices lie on the same span. In each of these 8 cases there is a different equation for getting {xmin,xmax}, but it's always just a couple of multiplies and adds.

The y-extents of the spans are as follows:

"top" span will be {floorf(Ay), floorf(Ay)+1}
"upper" spans will be between {floorf(Ay)+1, floorf(By)}
"middle" span will be {floorf(By), ceilf(By)}
"lower" spans will be between {ceilf(By), ceilf(Cy)-1}
"bottom" span will be {ceilf(Cy)-1, ceilf(Cy)}

Note that some of these extents cover zero area or even negative area .. e.g. if By is an integer then the middle span will not exist. But also if vertices A and B (or vertices B and C) lie on the same span then the middle span must be skipped because it would otherwise overlap.

Has anyone implemented anything like this or does anyone know of any references to this kind of solution?

### #90Nick

Senior Member

• Members
• 1227 posts

Posted 19 January 2009 - 01:48 PM

Hi Bernie,

You should be able to compute pixel-coverage (or other tiles) by resizing the triangle with the dimensions of the tile and then using any point-coverage algorithm. Have a look at post #72 for some details. I hope that helps.

Cheers,

Nick

### #91Bernie

New Member

• Members
• 6 posts

Posted 19 January 2009 - 06:43 PM

Nick said:

Hi Bernie,

You should be able to compute pixel-coverage (or other tiles) by resizing the triangle with the dimensions of the tile and then using any point-coverage algorithm. Have a look at post #72 for some details. I hope that helps.

Cheers,

Nick

i considered that, but it seemed more complex because the expanded triangle is no longer really a triangle due to clipping (especially when the triangle vertices are very acute). also, my solution does not give false-positives. thanks though!

### #92Bernie

New Member

• Members
• 6 posts

Posted 19 January 2009 - 07:02 PM

in case anyone wants to see my pixel coverage code ... i'm pretty sure there are no remaining boundary issues.


INLINE void TriangleSetup(float &Ax, float &Ay, float &Bx, float &By, float &Cx, float &Cy)

{

float temp;

if (Ay > By || Ay >= Cy) // rotate triangle so that Ay is on top

{

temp = Ax; Ax = Bx; Bx = Cx; Cx = temp;

temp = Ay; Ay = By; By = Cy; Cy = temp;

if (Ay > By || Ay >= Cy)

{

temp = Ax; Ax = Bx; Bx = Cx; Cx = temp;

temp = Ay; Ay = By; By = Cy; Cy = temp;

}

}

if ((Bx - Ax)*(Cy - Ay) > (Cx - Ax)*(By - Ay)) // flip vertices B,C so that triangle is counter-clockwise

{

temp = Bx; Bx = Cx; Cx = temp;

temp = By; By = Cy; Cy = temp;

}

}

void RasterizeCoverage(SpanList &spans, float Ax, float Ay, float Bx, float By, float Cx, float Cy)

{

// rasterize pixel "coverage" (e.g. determining which tiles overlap with a triangle)

// debug colors ...

const V4 spanColor_TRIVIAL(0.8f, 0.8f, 0.8f, 1.0f); // white

const V4 spanColor_TOP    (0.8f, 0.0f, 0.0f, 1.0f); // red

const V4 spanColor_TOP1   (0.8f, 0.8f, 0.0f, 1.0f); // red-green

const V4 spanColor_UPPER  (0.4f, 0.4f, 0.4f, 1.0f); // red-gray (red)

const V4 spanColor_MIDDLE (0.0f, 0.8f, 0.0f, 1.0f); // green

const V4 spanColor_LOWER  (0.4f, 0.4f, 0.5f, 1.0f); // blue-gray

const V4 spanColor_BOTTOM1(0.0f, 0.8f, 0.8f, 1.0f); // blue-green

const V4 spanColor_BOTTOM (0.0f, 0.0f, 0.8f, 1.0f); // blue

TriangleSetup(Ax, Ay, Bx, By, Cx, Cy);

ASSERT(IsNormal(Ax) && IsNormal(Ay)); // INF or NaN input are not supported, also denormals are not supported

ASSERT(IsNormal(Bx) && IsNormal(By));

ASSERT(IsNormal(Cx) && IsNormal(Cy));

float Ay0 = Floor(Ay);

float xmin = Min(Ax, Bx, Cx); // triangle bounds

float xmax = Max(Ax, Bx, Cx);

float ymax = Max(By, Cy);

int iy = (int)Ay0; // index of first span

// first check for trivial cases

{

float range = Min(Ceil(xmax) - Floor(xmin), Ceil(ymax) - Ay0);

if (range == 0.0f) // entire triangle spans 0 rows or columns - zero coverage

{

return; // done

}

else if (range == 1.0f) // entire triangle spans 1 row or column

{

while (Ay0 < ymax)

{

++iy, ++Ay0;

}

return; // done

}

}

bool flip = (By > Cy);

if (flip)

{

float temp;

// negate x-coords, swap B and C

Ax = -Ax;

temp = -Bx; Bx = -Cx; Cx = temp;

temp =  By; By =  Cy; Cy = temp;

// negate and swap xmin,xmax

temp = -xmin; xmin = -xmax; xmax = temp;

}

ASSERT(By <= Cy);

float dAB = (Bx - Ax)/(By - Ay); // divide by zero is possible, but the results will not be used ..

float dBC = (Cx - Bx)/(Cy - By);

float dCA = (Ax - Cx)/(Ay - Cy);

float dAB_pos = Max(0.0f, dAB);

float dBC_pos = Max(0.0f, dBC);

float dCA_pos = Max(0.0f, dCA);

float dAB_neg = Min(0.0f, dAB);

float dBC_neg = Min(0.0f, dBC);

float dCA_neg = Min(0.0f, dCA);

float By0      = Floor(By);

float By1      = Ceil (By);

float Cy1_sub1 = Ceil (Cy) - 1.0f;

float y1 = Ay0 + 1.0f; // lower edge of first span

// TOP SPAN ....... [floor(Ay),   floor(Ay)+1]

// UPPER SPANS .... [floor(Ay)+1, floor(By)  ]

// MIDDLE SPAN .... [floor(By),   ceil (By)  ]

// LOWER SPANS .... [ceil (By),   ceil (Cy)-1]

// BOTTOM SPANS ... [ceil (Cy)-1, ceil (Cy)  ]

if (By1 == y1) // top half is 1 span [Ay0..By1], will include or touch (A) and (B) but will not touch (C)

{

ASSERT(IsFinite(dBC_neg));

ASSERT(IsFinite(dCA_pos));

ASSERT(y1 == (float)(iy + 1));

float x0 = (y1 - By)*dBC_neg + Min(Ax, Bx);

float x1 = (y1 - Ay)*dCA_pos + Ax;

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

++iy, ++y1;

By1 = By0; // skip middle span

}

else if (By1 > y1) // top half is multiple spans ..

{

// top span [Ay0..Ay0+1], will include or touch (A) on top but will not touch (B) or (C)

{

ASSERT(IsFinite(dAB_neg));

ASSERT(IsFinite(dCA_pos));

ASSERT(y1 == (float)(iy + 1));

float x0 = (y1 - Ay)*dAB_neg + Ax;

float x1 = (y1 - Ay)*dCA_pos + Ax;

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

++iy, ++y1;

}

while (y1 <= By0) // upper spans [Ay0+1..By0], may touch (B) on bottom but will not touch (A) ... may touch (C) if Cy=By is an integer

{

ASSERT(IsFinite(dAB));

ASSERT(IsFinite(dCA));

ASSERT(y1 == (float)(iy + 1));

float x0 = ((y1 - By)*dAB - dAB_pos) + Bx; // must = Bx exactly when y = By and dAB <= 0

float x1 = ((y1 - Cy)*dCA - dCA_neg) + Cx; // must = Cx exactly when y = Cy and dCA >= 0

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

++iy, ++y1;

}

}

float y0 = y1 - 1.0f; // now we track the upper edge of the span

if (By0 == Cy1_sub1) // bottom half is 1 span [By0..Cy1], will include or touch (B) and (C) but will not touch (A)

{

ASSERT(IsFinite(dAB_pos));

ASSERT(IsFinite(dCA_neg));

ASSERT(y0 == (float)(iy + 0));

float x0 = (y0 - By)*dAB_pos + Min(Bx, Cx);

float x1 = (y0 - Cy)*dCA_neg + Cx;

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

//++iy, ++y0;

}

else if (By0 < Cy1_sub1) // bottom half is multiple spans ..

{

if (By0 < By1) // middle span [By0..By1], will include or touch (B) on top or bottom but will not touch (A) or (C)

{

ASSERT(IsFinite(dAB_pos));

ASSERT(IsFinite(dBC_neg));

ASSERT(IsFinite(dCA));

ASSERT(y0 == (float)(iy + 0));

ASSERT(y1 == (float)(iy + 1));

ASSERT(dAB_pos*dBC_neg == 0.0f); // NOTE: dAB_pos and dBC_neg cannot simultaneously be non-zero due to CCW winding

float x0  = ((y0 - By)*dAB_pos + (y1 - By)*dBC_neg) + Bx;

float x1  = ((y0 - Ay)*dCA     +           dCA_pos) + Ax; // all four of these (x1,x1b,x1c,x1d) are valid ...

float x1b = ((y1 - Ay)*dCA     -           dCA_neg) + Ax;

float x1c = ((y0 - Cy)*dCA     +           dCA_pos) + Cx;

float x1d = ((y1 - Cy)*dCA     -           dCA_neg) + Cx;

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

++iy, ++y0;

}

while (y0 < Cy1_sub1) // lower spans [By1..Cy1-1], may touch (B) on top but will not touch (C) ... may touch (A) if Ay=By is an integer

{

ASSERT(IsFinite(dBC));

ASSERT(IsFinite(dCA));

ASSERT(y0 == (float)(iy + 0));

float x0 = ((y0 - By)*dBC + dBC_neg) + Bx; // must = Bx exactly when y = By and dBC >= 0

float x1 = ((y0 - Ay)*dCA + dCA_pos) + Ax; // must = Ax exactly when y = Ay and dCA <= 0

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

++iy, ++y0;

}

// bottom span [Cy1-1..Cy1], will include or touch (C) on bottom but will not touch (A) or (B)

{

ASSERT(IsFinite(dBC_pos));

ASSERT(IsFinite(dCA_neg));

ASSERT(y0 == (float)(iy + 0));

float x0 = (y0 - Cy)*dBC_pos + Cx; // must = Cx exactly when y = Cy

float x1 = (y0 - Cy)*dCA_neg + Cx; // must = Cx exactly when y = Cy

ASSERT(xmin <= x0 && x0 < x1 && x1 <= xmax);

//++iy, ++y0;

}

}

if (flip) // negate and swap spans

{

for (int i = 0; i < spans.mSpanCount; ++i)

{

Span &s = spans.mSpans[i];

float x = -s.mX0;

s.mX0   = -s.mX1;

s.mX1   = x;

}

}

}

...

e.g.:

SpanList spans;

RasterizeCoverage(spans, ax, ay, bx, by, cx, cy);

for (int i = 0; i < spans.mSpanCount; ++i)

{

const Span &s = spans.mSpans[i];

int x0 = (int)Floor(s.mX0);

int x1 = (int)Ceil (s.mX1);

int y = s.mY;

for (int x = x0; x < x1; ++x)

{

--> pixel {x,y} is covered by triangle {ax,ay,bx,by,cx,cy}

}

}



### #93Frogblast

New Member

• Members
• 3 posts

Posted 20 January 2009 - 07:26 AM

Nick said:

No, the block in your image is outside of the bounding box of the triangle, so it's never going to be tested in the first place. :happy:

Unfortunately, the answer isn't quite so easy. I've just run into the following example: there really is an intersection between the quad and the triangle, even though no sample locations are covered. This can occur at the corners of the triangle.

I haven't been able to find a way to reduce this false positive. Unfortunately, when dealing with very small triangles, this can become a significant expense. For example, in one scene I have an average triangle size of 11 pixels and a block size of 2x2 pixels. About 30% of my blocks are determined to be partially covered even though no samples are covered. As the block size increases, the problem worsens. At 4x4, 55% of all partially covered blocks contain no covered samples.

Does anyone have some insight on how to catch some of these cases before I spend time testing every single sample point?

### #94Bernie

New Member

• Members
• 6 posts

Posted 20 January 2009 - 08:12 AM

Frogblast said:

Unfortunately, the answer isn't quite so easy. I've just run into the following example: there really is an intersection between the quad and the triangle, even though no sample locations are covered. This can occur at the corners of the triangle.
...

Although I've gone with a completely different solution, I think in your case you could test whether all four corners of the quad are outside of the *same* plane. If this isn't the case, *and* if the quad intersects the bounding box of the triangle, then the quad cannot be rejected.

This is basically the same as testing whether two convex 2D polygons intersect by considering each polygon edge as a potential separating-plane.

So for each corner you would construct a 3-bit flag indicating which of the 3 edges (planes) fails the distance check. If the bitwise-AND of these four 3-bit values is nonzero then the quad is outside. If the bitwise-OR of these values is zero, then the quad is trivially inside. Otherwise, it is partially covered.

### #95Nick

Senior Member

• Members
• 1227 posts

Posted 20 January 2009 - 09:05 AM

Frogblast said:

I've just run into the following example: there really is an intersection between the quad and the triangle, even though no sample locations are covered. This can occur at the corners of the triangle.
Indeed, that type of false positives do happen. It's actually unavoidable. We really want to know the coverage of the pixel centers but the tile test determines intersection between the triangle and a rectangular shape.

Quote

Unfortunately, when dealing with very small triangles, this can become a significant expense. For example, in one scene I have an average triangle size of 11 pixels and a block size of 2x2 pixels. About 30% of my blocks are determined to be partially covered even though no samples are covered. As the block size increases, the problem worsens. At 4x4, 55% of all partially covered blocks contain no covered samples.
How significant is it really? Rasterization is only a fraction of the total graphics pipeline. Besides, I think it's just something we have to accept; you can't have higher detail geometry without additional cost. Graphics hardware struggles with this too and recent generations dedicate more logic to rasterization than before as average triangle sizes approach single digit pixel counts. In fact DirectX 11 will up the demands even more as it introduces standardized tessellation. But that's still nothing compared to the shading demands.

All we can do is ensure that our test tiles cover just the pixel centers, not the entire pixels, and maximize the use of vector instructions. SSE2 can compute the coverage of 8 pixels in just a handful of instructions (less than 1 clock cycle per pixel on modern CPUs).

Anyway, are you really using 2x2 tiles? That's actually useless since the cost of testing the tile is equivalent to the cost of just testing the four pixels (and the latterobviously gives you 0% false positives).

### #96Frogblast

New Member

• Members
• 3 posts

Posted 20 January 2009 - 05:09 PM

Nick said:

Anyway, are you really using 2x2 tiles? That's actually useless since the cost of testing the tile is equivalent to the cost of just testing the four pixels (and the latterobviously gives you 0% false positives).

For now, I'm just trying to get it to work with any block size. I haven't yet settled on a final block size. I would intuitively expect the proportion of false positives to decrease as the blocks get larger (fewer block edges for a triangle corner to happen to and on), so I need to figure that out before increasing it further.

Nick said:

SSE2 can compute the coverage of 8 pixels in just a handful of instructions (less than 1 clock cycle per pixel on modern CPUs).

The best I've come up with is 6 instructions for 4 pixels using SSE4.1, using 32bit line equation results (also should execute in 6 instructions, but I haven't yet verified that). Are you using 16bit values for your line equation results?

### #97Reedbeta

DevMaster Staff

• 5311 posts
• LocationSanta Clara, CA

Posted 20 January 2009 - 05:32 PM

Frogblast said:

The best I've come up with is 6 instructions for 4 pixels using SSE4.1, using 32bit line equation results (also should execute in 6 instructions, but I haven't yet verified that). Are you using 16bit values for your line equation results?

I think he means that with pipelining, you can get the results for a new pixel each clock cycle. Not that there is only one cycle of latency.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #98.oisyn

DevMaster Staff

• Moderators
• 1842 posts

Posted 09 March 2009 - 04:23 PM

Nick said:

Anyway, are you really using 2x2 tiles? That's actually useless since the cost of testing the tile is equivalent to the cost of just testing the four pixels (and the latterobviously gives you 0% false positives).
Actually, testing the four corners is too much in the first place. Per plane you only need the two outer-most corners, depending on the orientation of the plane. By describing the box as a center b.c and an extents vector b.x and the plane by normal p.n and distance p.d to world origin, you'll only need to test:
float c = dot(p.n, b.c);
float x = dot(abs(p.n), b.x);
if (c + x < p.d)
// box is outside for this plane
where abs(x) is the component-absolute function for a vector
-
Currently working on: the 3D engine for Tomb Raider.

### #99Nick

Senior Member

• Members
• 1227 posts

Posted 10 March 2009 - 07:20 AM

.oisyn said:

Actually, testing the four corners is too much in the first place.
I know, I was merely illustrating that a 2x2 tile doesn't win you anything as it's equivalent to determining coverage of each and every pixel.

Quote

Per plane you only need the two outer-most corners...
Actually it can be done with just one point coverage test per tile, by adjusting the half-space inequalities, as described and illustrated in a post above.

### #100.oisyn

DevMaster Staff

• Moderators
• 1842 posts

Posted 10 March 2009 - 11:01 AM

Nick said:

Actually it can be done with just one point coverage test per tile, by adjusting the half-space inequalities, as described and illustrated in a post above.
I was talking in the generic sense. For testing an arbitrary box you'll need to test two points rather than 4. But if you were going to optimize the code for this particular case, it is apparent that b.x is constant (as all boxes have the same size) and therefore the computed value of 'x' is constant for every plane. Which effectively leaves you with the center point test . This is basically the same as adjusting the planes themselves (as c + x < p.d <=> c < p.d - x).