This paper details the architecture of a multithreaded software rasterizer designed specifically for current and future generation multi-core processors. The pipeline rasterizes and shades four fragments in parallel using SSE instructions, and uses an extensive SIMD-optimized math library for all transformations. By strategically using the vector units widely available in modern desktop processors, as well as multiple threads, the rasterizer performs drastically better than a fully serial implementation. Additionally, rendering order is preserved and a rich set of features is supported, including the ability to write custom vertex and fragment shaders in C++, set different render targets, and use z-buffering, backface culling, polygon clipping, and perspective-correct texture mapping.
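To give a rough idea of what "four fragments in parallel" can look like, here is a minimal sketch, not the code from the paper. It assumes the four fragments form a 2x2 pixel quad, that coverage is tested with edge functions at pixel centers, and that the triangle is wound so the edge functions are non-negative inside. The names `Float4`, `edge`, and `coverage_mask` are made up for this illustration.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical wrapper around one SSE register holding four floats.
struct Float4 {
    __m128 v;
    explicit Float4(__m128 x) : v(x) {}
    explicit Float4(float s) : v(_mm_set1_ps(s)) {}
    Float4(float a, float b, float c, float d) : v(_mm_setr_ps(a, b, c, d)) {}
};

inline Float4 operator-(Float4 a, Float4 b) { return Float4(_mm_sub_ps(a.v, b.v)); }
inline Float4 operator*(Float4 a, Float4 b) { return Float4(_mm_mul_ps(a.v, b.v)); }

// Edge function for the edge (ax, ay) -> (bx, by), evaluated at four
// sample points at once. Assumed convention: >= 0 means "inside".
inline Float4 edge(float ax, float ay, float bx, float by, Float4 px, Float4 py) {
    return (px - Float4(ax)) * Float4(by - ay) - (py - Float4(ay)) * Float4(bx - ax);
}

// Returns a 4-bit mask for the 2x2 quad whose top-left pixel is (x, y).
// Bit 0 = (x, y), bit 1 = (x+1, y), bit 2 = (x, y+1), bit 3 = (x+1, y+1).
inline int coverage_mask(const float tx[3], const float ty[3], int x, int y) {
    // Sample positions are the four pixel centers of the quad.
    Float4 px(x + 0.5f, x + 1.5f, x + 0.5f, x + 1.5f);
    Float4 py(y + 0.5f, y + 0.5f, y + 1.5f, y + 1.5f);

    Float4 e0 = edge(tx[0], ty[0], tx[1], ty[1], px, py);
    Float4 e1 = edge(tx[1], ty[1], tx[2], ty[2], px, py);
    Float4 e2 = edge(tx[2], ty[2], tx[0], ty[0], px, py);

    // A fragment is covered only if all three edge functions are >= 0.
    __m128 zero = _mm_setzero_ps();
    __m128 inside = _mm_and_ps(
        _mm_and_ps(_mm_cmpge_ps(e0.v, zero), _mm_cmpge_ps(e1.v, zero)),
        _mm_cmpge_ps(e2.v, zero));

    // Pack the sign bit of each lane into the low four bits of an int.
    return _mm_movemask_ps(inside);
}
```

The mask can then gate the write for each of the four fragments, so partially covered quads still go through the same SIMD path. A real pipeline would step the edge functions incrementally instead of recomputing them for every quad.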
Here is the link to the final paper:
When I started the project, I knew very little about concurrent programming with threads and SSE. This is also my first software rasterizer; my only previous experience was reading LaMothe's second Tricks book. The framework is built on SDL, which handles the frame buffer, input, threading, and bitmap loading. The shaders are written in C++ and use classes that encapsulate the SSE intrinsic functions. The demo features per-pixel lighting, simple shadow mapping, and precomputed ambient occlusion. I learned a lot and I'm excited about the results. Here are a couple of screenshots of the final product:
I'd like to thank you guys for helping me when I had questions. Several of you are quite brilliant and I learned a lot from you. So thanks! I'm surprised at the lack of resources about designing a good concurrent software rendering pipeline. I pretty much had to piece things together myself from various papers, articles, and forum posts (hence the research aspect), which ended up being a lot of fun! I doubt my architecture is ideal, but I am pleased with the performance.
If I were to do it again, I would probably incorporate AVX into the pipeline. The problem was that I didn't own a Sandy Bridge processor until just a couple of months ago (new laptop)! I am curious what kind of performance benefit I could get from moving in that direction. To be honest, I chose this project simply because I thought it would be fun. But after finding out about Larrabee, SwiftShader, and Microsoft WARP, it seems like software rendering could be coming back in the near future. Intel seems especially interested in it.
If you have any questions, feel free to ask!