Hi, I wrote a DX11 DirectCompute implementation of a Buddhabrot/Nebulabrot fractal renderer. The submitted picture is a Nebulabrot with max iteration values set to red:10,000 green:1,000 blue:100. I rendered the above image (original image was 1592×1028) at around 14 fps (140,000 samples a second) for around 35 minutes.
On my HD5750, with the above max iteration values, I actually get frame rates around 42 fps, but I didn't change the default yielding behavior of DXUT when the application becomes inactive, so the above render was performed at 14 fps (doh!).
I wrote a CPU implementation (non-SIMD and single-threaded) as well, and my DirectCompute implementation is around 4-6 times faster than the CPU version. My CPU is an Intel Core 2 Quad Q6600 at 2.4 GHz (not overclocked). I had earlier written a Mandelbrot DirectCompute implementation, and that was 50+ times faster than the CPU. Since the Buddhabrot is more complex than the Mandelbrot, I guess reduced performance is to be expected. My guess is that the extensive scattered global memory writes of a Buddhabrot implementation are what slow down the DirectCompute version: every escaping sample increments a counter at each point its orbit visits, so the writes land all over the image instead of at one pixel per thread as in the Mandelbrot case.
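To make the scattered-write pattern concrete, here is a minimal single-threaded C++ sketch of Buddhabrot accumulation. It is only an illustration, not the code from the blog post: the names `Histogram`, `pixelIndex`, `accumulateSample` and `accumulateNebulabrot` are hypothetical, and the view rectangle [-2, 1] x [-1.5, 1.5] is an assumption. Each escaping sample increments one counter per orbit point, and consecutive orbit points usually fall on unrelated pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-channel hit counter (not taken from the original source).
struct Histogram {
    int width, height;
    std::vector<std::uint32_t> counts;  // width * height counters, row-major
    Histogram(int w, int h)
        : width(w), height(h), counts(static_cast<std::size_t>(w) * h, 0) {
        assert(w > 0 && h > 0);
    }
};

// Maps a complex point to a pixel index, assuming the view rectangle
// [-2, 1] x [-1.5, 1.5]. Returns -1 if the point is outside the view.
long pixelIndex(const Histogram& h, double re, double im) {
    double u = (re + 2.0) / 3.0;
    double v = (im + 1.5) / 3.0;
    if (u < 0.0 || u >= 1.0 || v < 0.0 || v >= 1.0) return -1;
    int x = static_cast<int>(u * h.width);
    int y = static_cast<int>(v * h.height);
    return static_cast<long>(y) * h.width + x;
}

// Iterates z = z^2 + c from z = 0. If the orbit escapes (|z|^2 > 4) within
// maxIter steps, replays it and increments the counter under every orbit
// point. Returns the number of counters written.
std::size_t accumulateSample(Histogram& h, double cre, double cim, int maxIter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < maxIter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cre;
        zi = 2.0 * zr * zi + cim;
        zr = t;
        ++n;
    }
    if (zr * zr + zi * zi <= 4.0) return 0;  // did not escape: contributes nothing

    std::size_t writes = 0;
    zr = 0.0;
    zi = 0.0;
    for (int i = 0; i < n; ++i) {
        double t = zr * zr - zi * zi + cre;
        zi = 2.0 * zr * zi + cim;
        zr = t;
        long idx = pixelIndex(h, zr, zi);
        if (idx >= 0) {
            ++h.counts[static_cast<std::size_t>(idx)];  // the scattered write
            ++writes;
        }
    }
    return writes;
}

// Nebulabrot: one histogram per colour channel, each with its own iteration
// limit (red 10,000, green 1,000, blue 100 as in the post). Recomputing the
// orbit per channel is wasteful but keeps the sketch short.
void accumulateNebulabrot(Histogram& r, Histogram& g, Histogram& b,
                          double cre, double cim) {
    accumulateSample(r, cre, cim, 10000);
    accumulateSample(g, cre, cim, 1000);
    accumulateSample(b, cre, cim, 100);
}
```

On a GPU the increment inside the replay loop would have to be an atomic add to a shared buffer, since many threads can hit the same pixel at once. That is one plausible reason the speedup is so much smaller than for the Mandelbrot, where each thread writes only its own pixel, though I have not profiled it to confirm.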
For more details about my implementation (source & binary provided) see my blog post: