Advanced Rasterization
#21
Posted 13 September 2004 - 11:56 PM
#22
Posted 14 September 2004 - 08:20 AM
john said:
#23
Posted 14 September 2004 - 01:39 PM
#24
Posted 14 September 2004 - 04:27 PM
Mihail121 said:
I wouldn't really say it has shader support though. :blush: You can derive a class from a PixelShader class and implement the execute() function. This allows you to simply write the 'shader' in C++. In that sense every software renderer would have shader support even before one line of it is implemented.
Anyway, it seems like it could become an open-source 'reference rasterizer'...
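To make the "derive from PixelShader and implement execute()" idea concrete, here is a minimal sketch. Only the names PixelShader and execute() come from the post above; the Pixel struct, the execute() signature, CheckerShader and shade() are hypothetical and should not be read as the renderer's actual interface.

```cpp
#include <cstdint>

// Hypothetical per-pixel data; the real renderer's layout is not known here.
struct Pixel {
    float u, v;           // interpolated texture coordinates
    std::uint32_t color;  // output color, 0x00RRGGBB
};

// Base class: the rasterizer only knows about this interface.
class PixelShader {
public:
    virtual ~PixelShader() = default;
    virtual void execute(Pixel& p) const = 0;  // signature is an assumption
};

// A 'shader' written in plain C++: an 8x8 checkerboard over the unit square.
class CheckerShader : public PixelShader {
public:
    void execute(Pixel& p) const override {
        int cu = static_cast<int>(p.u * 8.0f);
        int cv = static_cast<int>(p.v * 8.0f);
        p.color = ((cu + cv) & 1) ? 0x00FFFFFF : 0x00000000;
    }
};

// What a rasterizer would do for each covered pixel: call through the base class.
inline std::uint32_t shade(const PixelShader& shader, float u, float v) {
    Pixel p{u, v, 0};
    shader.execute(p);
    return p.color;
}
```

The virtual call per pixel is the obvious cost of this design, which is one reason it reads more like a reference rasterizer than like compiled shader support.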
#25
Posted 16 September 2004 - 02:28 PM
swShader Documentation
If there's any part that can be improved, like things that are unclear and need more explanation, please let me know!
#26
Posted 16 September 2004 - 09:40 PM
-Loving a Person is having the wish to see this Person happy, no matter what that means to yourself.
-No matter what it means to myself....
#27
Posted 16 September 2004 - 11:10 PM
davepermen said:
I don't have the money for an upgrade, but even then it would be a dilemma. The really interesting processors with a combination of all the latest technology will only be available and affordable in 2006 or so. Realistically, I'm looking forward to nForce 4 motherboards with PCI-Express and socket 939. I can then have an Athlon 64, and next year upgrade to dual-core. Dual Xeon with EMT has also crossed my mind...
#28
Posted 17 September 2004 - 04:48 AM
64bit, well, yeah, i have to wait for that as well... it will help you a lot, thanks to the extra registers.. it will help thanks to the huge memory space, too... performance should be great on them. oh, and native 64bit integer math would allow much bigger render targets, too.. (with subpixels taken care of).
i'd be happy to give you a pc or such, but, well.. i'm on a 2.7ghz celeron myself. intel's proof that mhz doesn't really matter at all... :(
#29
Posted 23 September 2004 - 09:31 PM
that'll rock. semprons are hell cheap..
#30
Posted 27 September 2004 - 04:18 AM
I hope you have more of these to share with us
#31
Posted 27 September 2004 - 02:33 PM
I have plans to write about projection and clipping. That's another stage in the rendering pipeline that is usually handled by the hardware, but again deeper knowledge of it is useful beyond software rendering.
#32
Posted 28 September 2004 - 03:40 AM
How much assembly (and MMX, SIMD, etc.) should one know in order to implement an efficient software renderer? I know little assembly (but I'm an expert in C/C++) and I'm wondering if assembly is a prerequisite for such things.
btw, I personally use DevPartner and it's great, especially for memory leaks.
#33
Posted 28 September 2004 - 06:05 AM
#34
Posted 28 September 2004 - 06:57 AM
Small right-edge optimization to your original simple algorithm, should work in the blocked one too:
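bool done = false;  // becomes true once the first pixel inside the triangle is written
for(int x = minx; x < maxx; x++)
{
    if(CX1 > 0 && CX2 > 0 && CX3 > 0)
    {
        colorBuffer[x] = 0x00FFFFFF;  // inside all three half-spaces: write a white pixel
        done = true;
    }
    else if(done)
    {
        break;  // we were inside and have now left on the right: the rest of the row is outside
    }

    // Step the three half-space functions one pixel to the right
    CX1 -= FDY12;
    CX2 -= FDY23;
    CX3 -= FDY31;
}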
I haven't done any benchmarking to see whether getting rid of, on average, half the time wasted outside the triangle is worth the extra check. I could easily see it being worth it in an ARM implementation on the GBA, for example, because of its cheap conditionals. Or you could spend some extra code size to split the inner loop into two: the first only steps forward until it finds the triangle, the second fills and exits as soon as the condition becomes false.
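The two-loop variant described above could look roughly like the sketch below. The names CX1..CX3, FDY12, FDY23, FDY31, colorBuffer, minx and maxx follow the snippet in this post; the fillSpan() wrapper, its parameter list and its return value are assumptions made so the fragment is self-contained. It relies on the triangle being convex, so each row crosses it in a single contiguous span.

```cpp
#include <cstdint>

// Fills one scanline and returns the number of pixels written.
// The half-space values are passed by value, so the caller's copies are untouched.
int fillSpan(std::uint32_t* colorBuffer, int minx, int maxx,
             int CX1, int CX2, int CX3,
             int FDY12, int FDY23, int FDY31)
{
    int x = minx;

    // Loop 1: step forward without writing until the first inside pixel.
    while (x < maxx && !(CX1 > 0 && CX2 > 0 && CX3 > 0)) {
        CX1 -= FDY12;
        CX2 -= FDY23;
        CX3 -= FDY31;
        ++x;
    }

    // Loop 2: fill while inside; exit as soon as the test fails.
    int written = 0;
    while (x < maxx && CX1 > 0 && CX2 > 0 && CX3 > 0) {
        colorBuffer[x] = 0x00FFFFFF;
        CX1 -= FDY12;
        CX2 -= FDY23;
        CX3 -= FDY31;
        ++x;
        ++written;
    }
    return written;
}
```

Compared with the single loop, this removes the `done` flag and its branch from the fill loop at the cost of duplicating the stepping code.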
Anyhow, I haven't given it much thought yet, but how would you efficiently interpolate things like texture coordinates and colors over the triangle? How to do it is obvious with the scanline algorithm, but here I guess a little bit of line-equation trickery is needed, and it's too early in the morning for me to work it out.
#35
Posted 28 September 2004 - 11:18 AM
cdgray said:
#36
Posted 28 September 2004 - 11:30 AM
ector said:
Indeed, on a GBA it could be very useful, though I doubt this parallel rasterizer is actually of any use there.
Quote
Once you have the coordinates for one pixel, you obviously only have to add du/dx and/or du/dy to get the texture coordinates for the neighboring pixels. So I do the above calculation once per block, then use this linear interpolation for the other pixels in the block.
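As an illustration of stepping by du/dx and du/dy inside a block, here is a minimal sketch. The quoted calculation itself is not reproduced in this thread, so the gradient setup below uses the standard plane equation through the three vertices as a stand-in. It is affine only (no perspective correction), assumes degenerate triangles were rejected earlier, and all names (Vertex, Gradient, computeGradient, evalAt, interpolateBlock) are hypothetical.

```cpp
struct Vertex   { float x, y, u; };    // screen position and one attribute
struct Gradient { float dudx, dudy; }; // change of u per pixel in x and in y

// Plane equation through the three (x, y, u) points.
// Assumes the triangle is not degenerate (area != 0).
Gradient computeGradient(const Vertex& a, const Vertex& b, const Vertex& c) {
    float area = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    Gradient g;
    g.dudx = ((b.u - a.u) * (c.y - a.y) - (c.u - a.u) * (b.y - a.y)) / area;
    g.dudy = ((c.u - a.u) * (b.x - a.x) - (b.u - a.u) * (c.x - a.x)) / area;
    return g;
}

// Full evaluation: done once per block.
float evalAt(const Vertex& a, const Gradient& g, float x, float y) {
    return a.u + g.dudx * (x - a.x) + g.dudy * (y - a.y);
}

// Fill an 8x8 block whose top-left pixel is (bx, by) using additions only.
void interpolateBlock(const Vertex& a, const Gradient& g,
                      int bx, int by, float out[8][8]) {
    float rowStart = evalAt(a, g, static_cast<float>(bx), static_cast<float>(by));
    for (int iy = 0; iy < 8; ++iy) {
        float u = rowStart;
        for (int ix = 0; ix < 8; ++ix) {
            out[iy][ix] = u;
            u += g.dudx;       // step one pixel right
        }
        rowStart += g.dudy;    // step one pixel down
    }
}
```

Restarting from a full evaluation at each block also keeps the accumulated floating-point error bounded to at most eight additions per direction.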
#37
Posted 28 September 2004 - 09:39 PM
Nick said:
Quote
Once you have the coordinates for one pixel, you obviously only have to add du/dx and/or du/dy to get the texture coordinates for the neighboring pixels. So I do the above calculation once per block, then use this linear interpolation for the other pixels in the block.
#38
Posted 15 October 2004 - 03:08 AM
I'm new to these boards but have been very interested in software rasterization for a long while. I have my own project in the early stages; hopefully I'll find more time to work on it, but I'm back in school now.
Anyhow, in Graphics class today I was thinking about the technique in your article. I had a moment of potential inspiration and ended up with something of a branch from your technique; I wonder what your thoughts on it might be.
What I propose is that, given a triangle, you create the smallest square bounding box whose side is a power of 2 along both axes and which includes the three points. In the case of a 2x2 box, the half-spaces are immediately calculated for each corner and each of the 4 pixels is lit accordingly. Otherwise, the box is subdivided recursively into 4 equal, square portions. The half-space values are calculated for the corners and used to determine whether the box is completely lit, rejected or partially lit, as in your technique. Partially lit boxes are further subdivided until they form a 2x2 box; the calculated half-space values are then used to light each of the 4 pixels accordingly.
This technique has the advantage that the largest possible boxes are accepted/rejected as early as possible. 2x2 and 4x4 boxes produce 4 and 16 half-space checks respectively, and 8x8 boxes produce 80 half-space checks. Boxes larger than 8x8 are where the benefit kicks in, as the likelihood of finding large boxes to accept/reject increases. Large triangles become quite efficient. I also believe that this technique lends itself to parallel processing.
I would be glad to hear your input on this technique; perhaps it's already been done and I am unaware of it. Did you decide to choose 8x8 boxes for other benefits, like per-box lighting calculations perhaps? How well do you think this technique might perform? If it interests you, I would be glad to discuss it further.
Should this idea be something new (and more importantly, a good idea) I'd call it "Rasterization by recursive bilateral reduction." Also, aside from triangles, this approach, and yours as well I believe, can be made to apply to any convex n-gon.
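The recursive subdivision proposed in this post can be sketched as follows. This is one possible reading of the description, not a tested or optimized implementation: it uses integer edge functions with a strict `> 0` inside test (no fill convention for shared edges), assumes the vertices are ordered so the interior is on the positive side of every edge, and recomputes corner values instead of sharing them between sub-boxes. Edge, Framebuffer, makeEdge, rasterizeBox and rasterizeTriangle are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Edge {
    int a, b, c;                       // half-space function a*x + b*y + c
    int eval(int x, int y) const { return a * x + b * y + c; }
};

struct Framebuffer {
    int width, height;
    std::vector<std::uint32_t> pixels;
    Framebuffer(int w, int h) : width(w), height(h), pixels(w * h, 0) {}
    void set(int x, int y) {           // bounds-checked: boxes may overhang the target
        if (x >= 0 && x < width && y >= 0 && y < height)
            pixels[y * width + x] = 0x00FFFFFF;
    }
};

Edge makeEdge(int x1, int y1, int x2, int y2) {
    int a = y1 - y2, b = x2 - x1;
    return Edge{a, b, -(a * x1 + b * y1)};
}

void rasterizeBox(Framebuffer& fb, const Edge (&e)[3], int x0, int y0, int size) {
    const int x1 = x0 + size - 1, y1 = y0 + size - 1;  // corner pixels of the box
    bool allIn = true;
    for (const Edge& ed : e) {
        int in = (ed.eval(x0, y0) > 0) + (ed.eval(x1, y0) > 0)
               + (ed.eval(x0, y1) > 0) + (ed.eval(x1, y1) > 0);
        if (in == 0) return;           // all corners outside one edge: reject the box
        if (in != 4) allIn = false;
    }
    if (allIn) {                       // all corners inside all edges: accept the box
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) fb.set(x, y);
        return;
    }
    if (size <= 2) {                   // 2x2 box: the corners are the pixels
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                if (e[0].eval(x, y) > 0 && e[1].eval(x, y) > 0 && e[2].eval(x, y) > 0)
                    fb.set(x, y);
        return;
    }
    const int h = size / 2;            // partially covered: split into four squares
    rasterizeBox(fb, e, x0,     y0,     h);
    rasterizeBox(fb, e, x0 + h, y0,     h);
    rasterizeBox(fb, e, x0,     y0 + h, h);
    rasterizeBox(fb, e, x0 + h, y0 + h, h);
}

void rasterizeTriangle(Framebuffer& fb, int x1, int y1, int x2, int y2, int x3, int y3) {
    const Edge e[3] = { makeEdge(x1, y1, x2, y2), makeEdge(x2, y2, x3, y3),
                        makeEdge(x3, y3, x1, y1) };
    const int minx = std::min({x1, x2, x3}), maxx = std::max({x1, x2, x3});
    const int miny = std::min({y1, y2, y3}), maxy = std::max({y1, y2, y3});
    const int extent = std::max(maxx - minx, maxy - miny) + 1;
    int size = 2;                      // smallest power-of-two square covering the triangle
    while (size < extent) size *= 2;
    rasterizeBox(fb, e, minx, miny, size);
}
```

The accept and reject tests are exact because each half-space function is linear, so its extremes over a box are reached at the corners.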
#39
Posted 15 October 2004 - 08:04 AM
Theoretically your algorithm has a lot of value. Indeed it would be the most efficient of all, especially for gigantic resolutions. However, I have tried similar techniques in practice and they all turned out to have more or less the same performance as the 8x8 block method. The reason is that, although the recursive algorithms are algorithmically more efficient, they are computationally more expensive. To make a close analogy: it's like using quicksort for three elements. :wink:
Like you say, it only starts to really work for boxes larger than 8x8. But for big polygons, the rasterization cost is relatively small compared to the time spent in the pixel pipelines. 8x8 blocks are skipped in around 10 clock cycles or so, while shading 8x8 pixels with one texture takes a lot more than 1,000 clock cycles. So from this perspective I think it's not all that useful to look for more efficient rasterization.
But the primary reason why I'm still using 8x8 blocks is actually the implementation complexity. As you can see in the article, it is relatively compact and straightforward to implement. All other attempts added some complexity, and this comes at the cost of flexibility. Also, I'd like to keep the setup cost for tiny polygons as low as possible. With a recursive approach it would optimize for the big polygons but make small polygons less efficient. So currently I'm keeping the 8x8 blocks approach, because it still gives me the freedom to refine it afterwards, if necessary.
Another reason is that consistently using 8x8 blocks makes it easy to implement a visibility system without complications. I already had to deal with situations where the render target's dimensions are not a multiple of 8. Using other block sizes would only make this worse, while I believe it wouldn't add much benefit to the visibility algorithm. In 640x480 mode, there are 'only' 80x60 blocks anyway.
The most successful alternative so far is to have a specialized rasterizer for tiny and huge polygons. Tiny polygons often fall into a 2x2 pixel block, so they can be passed to the pixel pipelines directly. This occurs frequently for far-away models with a large number of polygons. For huge polygons, several 8x8 blocks can be skipped at once by computing exact start and end values of the polygon edges per row. This requires a slow division though, so unfortunately I haven't seen much of a performance increase from it at medium resolutions.
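To show where the division comes from when computing exact start and end values per row, here is a hedged sketch. It is an assumption about how such a span could be derived from the half-space functions, not the renderer's own code: each edge a*x + b*y + c > 0 is solved for x at a fixed row y, and the three results are intersected. Edge, Span and rowSpan are hypothetical names, and the strict `> 0` test mirrors the inner loop shown earlier in the thread.

```cpp
#include <algorithm>
#include <cmath>

struct Edge { int a, b, c; };      // half-space function a*x + b*y + c
struct Span { int first, last; };  // inclusive pixel range; empty when first > last

// Exact covered pixel range on row y, clamped to [minx, maxx].
Span rowSpan(const Edge (&e)[3], int y, int minx, int maxx) {
    Span s{minx, maxx};
    for (const Edge& ed : e) {
        const int k = ed.b * y + ed.c;           // constant part on this row
        if (ed.a == 0) {                         // edge is horizontal: row is all in or all out
            if (k <= 0) return Span{1, 0};
            continue;
        }
        const double xb = -static_cast<double>(k) / ed.a;  // the per-row division
        if (ed.a > 0)   // inside is x > xb: raises the left bound
            s.first = std::max(s.first, static_cast<int>(std::floor(xb)) + 1);
        else            // inside is x < xb: lowers the right bound
            s.last = std::min(s.last, static_cast<int>(std::ceil(xb)) - 1);
    }
    return s;
}
```

With the span known, every 8x8 block lying entirely between first and last on all of its rows can be accepted without per-corner tests, which is the skipping the post refers to. Whether that pays for up to three divisions per row is exactly the trade-off described above.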
But your idea is definitely valuable for big resolutions. For non-real-time rendering with anti-aliasing I believe an approach like that becomes a necessity. Either way, I'll keep you updated on any new findings that apply to my software renderer, and I wish you good luck with your project!
#40
Posted 05 November 2004 - 08:44 PM
It goes without saying that your article precisely describes a fast and efficient method for a routine as basic as triangle rasterization. Simply put, there is nothing to add and nothing to remove. However, it wouldn't be smart to use it in place of existing solutions, unless none exist. The majority of programmers would rather avoid diving into low-level code, which is generally implemented as a hardware service. As a matter of fact, the presented algorithm was used in compact asm routines in simple 80x86 3D intros/games, but it could have been completely forgotten since the development of hardware dedicated to accelerating graphics. Nowadays, modern 3D applications aren't even similar to those from the past.
Discussion aimed at optimizing the basics isn't as productive as it used to be. Algorithmic analyses and theoretical complexity evaluations don't produce any interesting results. The only tool a programmer can use is a profiler (or other time-based feedback, of course), whose role is to find bottlenecks; these can only be found empirically. My experience has taught me that analysing code isn't a good method for determining time complexity. What looks inefficient is often fast. Simply put, I believe in the 80/20 rule.
Despite the fact that the algorithm in question is satisfying, it doesn't change anything.
Best regards
- Mario