DX11 DirectCompute Buddhabrot & Nebulabrot

yakiimo02 101 Mar 30, 2010 at 14:00


Hi, I wrote a DX11 DirectCompute implementation of a Buddhabrot/Nebulabrot fractal renderer. The submitted picture is a Nebulabrot with max iteration values set to red:10,000 green:1,000 blue:100. I rendered the above image (original image was 1592×1028) at around 14 fps (140,000 samples a second) for around 35 minutes.

On my HD5750, with the above max iteration values, I actually get frame rates around 42 fps, but I didn’t change the default yielding behavior of DXUT when the application becomes inactive, so the above render was performed at 14 fps (doh!).
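As a rough sanity check on those numbers, here is a small sketch of the throughput arithmetic. The figure of 10,000 samples per frame is an assumption: it is simply 140,000 samples per second divided by 14 fps, and it happens to match the 10,000 threads mentioned later in the thread. The function names are made up for illustration.

```python
# Back-of-the-envelope throughput arithmetic for the render described above.
# ASSUMPTION: each frame processes 10,000 samples (140,000 samples/s / 14 fps).
SAMPLES_PER_FRAME = 10_000


def samples_per_second(fps, samples_per_frame=SAMPLES_PER_FRAME):
    """Samples processed per second at a given frame rate."""
    return fps * samples_per_frame


def total_samples(rate, minutes):
    """Total samples accumulated after rendering for `minutes` at `rate` samples/s."""
    return rate * minutes * 60


def minutes_needed(samples, rate):
    """Minutes required to accumulate `samples` at `rate` samples/s."""
    return samples / rate / 60


throttled = samples_per_second(14)          # rate while the window was inactive
full_speed = samples_per_second(42)         # rate with the window active
rendered = total_samples(throttled, 35)     # samples in the 35-minute render
```

Under that assumption, the 35-minute render accumulated roughly 294 million samples, and the same sample count at 42 fps would take a bit under 12 minutes.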

I wrote a CPU implementation (non-SIMD and single-threaded) as well, and my DirectCompute implementation is around 4-6 times faster than the CPU version. My CPU is an Intel Core2Quad Q6600 2.4 GHz (not overclocked). I had earlier written a Mandelbrot DirectCompute implementation and that was 50+ times faster than the CPU. Since the Buddhabrot is more complex than the Mandelbrot, I guess reduced performance is to be expected. I’m guessing the extensive scattered global memory writes of a Buddhabrot implementation may be slowing down the DirectCompute version.
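To show where the scattered writes come from, here is a minimal single-threaded CPU sketch of the general Buddhabrot accumulation step. It is not the code from my implementation: the function names, the fixed [-2, 2] x [-2, 2] viewport, and the flat list histogram are all assumptions chosen for brevity. A Mandelbrot renderer writes one value per pixel, while here every escaping sample increments a different pixel for each point on its orbit.

```python
def orbit(c, max_iter):
    """Return the points visited by z -> z*z + c if c escapes within max_iter.

    Samples that do not escape within max_iter contribute nothing, so an
    empty list is returned for them.
    """
    z = 0j
    points = []
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return points
        points.append(z)
    return []


def accumulate(hist, width, height, c, max_iter):
    """Scatter one sample's orbit into a flat histogram; return the write count.

    ASSUMPTION: the viewport is the square [-2, 2] x [-2, 2].
    """
    writes = 0
    for z in orbit(c, max_iter):
        x = int((z.real + 2.0) / 4.0 * width)
        y = int((z.imag + 2.0) / 4.0 * height)
        if 0 <= x < width and 0 <= y < height:
            hist[y * width + x] += 1  # the scattered write
            writes += 1
    return writes


def nebulabrot_sample(channels, width, height, c, limits):
    """Accumulate one sample into one histogram per channel, each with its own limit."""
    return [accumulate(h, width, height, c, m) for h, m in zip(channels, limits)]
```

For a Nebulabrot, `nebulabrot_sample` would be called with three histograms and limits such as `(10_000, 1_000, 100)`. On the GPU the `+= 1` would need to be an atomic or otherwise race-tolerant write, since many threads can hit the same pixel.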

For more details about my implementation (source & binary provided) see my blog post:


4 Replies


Reedbeta 168 Mar 30, 2010 at 16:41

Very cool! I’ve been meaning to spend some time checking out this compute shader stuff…

poita 101 Mar 31, 2010 at 03:04

Very nice!

Have you tried writing it in a good ol’ pixel shader for comparison?

roel 101 Mar 31, 2010 at 08:50

Cool indeed! And subscribed to your blog :)

yakiimo02 101 Mar 31, 2010 at 14:05

Hi everyone. Thanks for the comments!

Yeah, compute shaders seem pretty cool. It seems that not everything, but lots of cool stuff, can be sped up using them. Personally, I want to try a GI path tracer, fluid dynamics, and post-effects in the future (those seem to be what other people have had success with so far).

The Buddhabrot algorithm requires a lot of random scattered writes. The above Nebulabrot has an iteration max of 10,000, so in the worst case, 9,999+999+99 scattered writes to the output UAV buffer are going to occur in a single compute shader thread, and there are 10,000 threads executing in parallel. (The total thread count of 10,000 is not related to the iteration max; it’s just a coincidence that I have 100 thread groups, each with 100 threads, for 10,000 total.) I think it’ll be hard and unnatural to implement it in a pixel shader.
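Spelled out, the arithmetic behind those figures looks like this. It is only a sketch that takes the worst-case count above at face value; the constant and function names are made up for illustration.

```python
# Per-channel iteration limits and dispatch size quoted in this thread.
ITERATION_LIMITS = {"red": 10_000, "green": 1_000, "blue": 100}
THREAD_GROUPS = 100
THREADS_PER_GROUP = 100


def worst_case_writes_per_thread(limits):
    """Upper bound quoted above: (limit - 1) writes per channel, summed."""
    return sum(limit - 1 for limit in limits.values())


def total_threads(groups, threads_per_group):
    """Threads running in one dispatch."""
    return groups * threads_per_group


writes = worst_case_writes_per_thread(ITERATION_LIMITS)    # 9,999 + 999 + 99
threads = total_threads(THREAD_GROUPS, THREADS_PER_GROUP)  # 100 * 100
```

That comes to 11,097 writes per thread in the worst case, across 10,000 threads.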

Thanks for subscribing! :)