you don't know swshader but want to talk in here.. i think nick will be unhappy.. :D
Yeah, I have heard of it, but don't really know much about it, sorry Nick ;) IIRC, it's a kind of replacement for the D3D reference rasterizer, no? It would be really nice if Nick would modify swShader to collect statistics about rendering, i.e. how many texels were accessed during the frame, how many quads/pixels get z-culled, what the utilization of the vertex cache is, etc. Also, PIX-like pixel debugging (as on Xbox) would be neat, i.e. you pick a pixel on the screen and all the pixel shaders executed for that pixel, along with their values, get listed ;)
A hugely important thing from a performance POV in rendering is coherency. With raytracing you totally lose that. Think about it for a second: how much data do you need to process for each ray? You first need to traverse the spatial data structure (SDS for short), which is a KD-tree in the case of SaarCOR, to find the object the ray hits in the scene. Once you have found an object, you need to transform the ray into object space (a cheap vector*matrix operation). Then you need to continue traversal of the ray in the object-space SDS to find the triangle in the object the ray hits (we don't want to test the ray against all 10000 or so triangles in the object, right?). If you are lucky, the ray hits a triangle in the object. If not, you have to go back up to the scene SDS to continue the ray traversal. That is an extremely huge amount of data processing just to find the intersection point alone (as a reference, rasterizing HW is orders of magnitude faster at this for visible pixels).
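The two-level traversal above can be sketched in a few lines of Python. This is a toy: a flat object list stands in for the scene KD-tree, a unit sphere stands in for each object's triangle soup, and the object transform is just a translation (the real SaarCOR walks actual KD-trees at both levels and uses full 4x4 matrices), but it shows the scene-level loop, the object-space transform step, and the "miss, back to scene level" case:

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def transform_ray(origin, direction, offset):
    """The cheap per-object transform step: here just a translation
    into object space (a real engine uses a full vector*matrix)."""
    return tuple(o - f for o, f in zip(origin, offset)), direction

def hit_unit_sphere(o, d):
    """Object-space intersection test (stand-in for triangle traversal)."""
    b = dot(o, d)
    c = dot(o, o) - 1.0
    disc = b * b - dot(d, d) * c
    if disc < 0.0:
        return None                      # ray misses: back to scene level
    t = (-b - math.sqrt(disc)) / dot(d, d)
    return t if t > 0.0 else None

def trace(origin, direction, object_offsets):
    """Scene-level loop: visit each candidate object, transform the
    ray into its space, and test there."""
    best = None
    for offset in object_offsets:
        o, d = transform_ray(origin, direction, offset)
        t = hit_unit_sphere(o, d)
        if t is not None and (best is None or t < best):
            best = t
    return best                          # None means the ray escaped

# One unit sphere 5 units down -z; the ray hits it at t = 4:
t = trace((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), [(0.0, 0.0, -5.0)])
```

Every ray repeats this whole dance from scratch, which is exactly where the incoherent memory traffic comes from.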
Now you may think: let's not trace only 1 ray at a time, but a bunch of them, say a 2x2 packet like the SaarCOR chip does IIRC. Everything goes fine and dandy until the rays get further away from the viewpoint. The rays start to hit different scene SDS nodes and require different object-space SDS traversals. And this was exactly the case where raytracing was supposed to be powerful, i.e. output sensitive!
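A toy way to see the packet falling apart: shoot a 2x2 bundle of nearly parallel rays and count how many distinct cells of a uniform grid they occupy at increasing distance. The grid stands in for KD-tree leaves, and the cell size of 1.0 and the half-degree-ish ray spread are arbitrary assumptions on my part:

```python
def cells_hit(directions, distance, cell=1.0):
    """How many distinct grid cells the ray endpoints fall into at the
    given distance (each ray starts at the origin)."""
    pts = {tuple(int(d_i * distance // cell) for d_i in d)
           for d in directions}
    return len(pts)

# Four nearly parallel rays, a tiny step apart in direction:
dirs = [(0.00, 0.00, -1.0), (0.01, 0.00, -1.0),
        (0.00, 0.01, -1.0), (0.01, 0.01, -1.0)]

near = cells_hit(dirs, 10)    # close to the eye: packet stays in 1 cell
far  = cells_hit(dirs, 500)   # far away: all 4 rays in different cells
```

Near the eye all four rays share one cell (one traversal amortized over the packet); far away each ray needs its own traversal and the packet buys you nothing.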
And what's even worse, the story doesn't end here. Once you have actually found the intersection point of the ray and a triangle, after that huge amount of number and data crunching, you need to figure out the lighting for the point (let's forget recursive ray traversal for a sec). For the sake of simplicity, let's also forget GI for a sec and just go with old-fashioned dot3 diffuse lighting. You'll need to start the same ray traversal AGAIN for EACH light which might potentially light the surface! If you go for Monte Carlo raytracing, that's like 100 rays / pixel for decent results, and for each of those 100 rays you might want a secondary bounce of N rays for better radiosity, so you might be talking about multiplying the first-hit ray cost by, say, 1000, and that would be a conservative estimate!
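Plugging the numbers above into a back-of-the-envelope budget (the 100 samples is from the post; picking N = 10 secondary bounce rays per sample is my own assumption, just to make the total concrete):

```python
# Rough ray budget per pixel, each ray paying the full traversal cost:
primary_rays = 1
mc_samples   = 100    # "like 100 rays / pixel for decent results"
bounce_rays  = 10     # hypothetical N secondary rays per sample

total = primary_rays + mc_samples + mc_samples * bounce_rays
print(total)          # 1101 full traversals per pixel
```

So even with a modest N you land right around that ~1000x multiplier on the first-hit cost, and that's before shadow rays per light.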
Talking about real-world scenarios and "advanced effects", how does raytracing handle, for instance, per-pixel reflections and refractions? You say it's a built-in feature of raytracing. Sure, but at what cost? A normal map texture totally splits up the 2x2 block (or whatever size you choose) of rays. This is bad even on current GPUs because it causes random accesses to the cubemap, but with raytracing you are talking about random accesses to the entire scene! I wonder why none of those SaarCOR chip screenshots actually have reflections or refractions other than a few flat reflections. How about other basic rendering features, like, say, 1-bit alpha textures? That's 4 fetches to a texture (bilinear) per hit, and possibly a trip back to scene traversal.
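The 1-bit alpha case can be sketched as a loop where a transparent texel throws the ray back into traversal instead of letting it terminate. The `alpha_lookup` here is a made-up stand-in for the 4-tap bilinear texture fetch, and the front-to-back hit list stands in for repeated SDS traversal:

```python
def first_opaque_hit(hits, alpha_lookup):
    """hits: candidate (t, uv) intersections in front-to-back order,
    each one produced by another round of scene/object traversal."""
    for t, uv in hits:
        if alpha_lookup(uv):   # texture fetch: 1 = opaque, ray stops
            return t
        # 0 = transparent texel: back to traversal for the next hit
    return None

alpha = lambda uv: uv[0] > 0.5            # hypothetical alpha pattern
hit = first_opaque_hit([(1.0, (0.2, 0.2)),
                        (2.5, (0.8, 0.1))], alpha)
```

So a fence or foliage texture can turn one ray into several full traversals plus a texture fetch per candidate hit, which a rasterizer handles with a simple per-pixel kill.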
So what if we parallelized the raytracing HW like crazy and scaled it to the level of GF6? Now that we know that raytracing is an extremely incoherent way to render, think about how big the caches would need to be per pipeline. As a reference, I think it was mentioned somewhere (unofficially) that the L2 cache on GF6 is something like 8 KB, shared between all 16 pipes. And even if you assume all of GF6's resources could go toward raytracing speed-ups, the paper itself says that the most prominent thing missing from the raytracing HW is programmable shaders, which alone would eat all of those GF6 resources.
I'm no raytracing nor HW design guru, but it sadly seems that raytracing HW has no future, at least not until we harness quantum processing power ;) But maybe I'm missing something obvious here.
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein