Fun with OpenCL
Posted 11 April 2011 - 12:38 PM
I played around with CUDA a while back, but I wasn't all that hyped on it. The syntax was shockingly similar to programming with shaders, so I just dumped it, figuring it was a tool for non-graphics programmers to make use of the GPU. I recently decided to give GPGPU another try with OpenCL. I couldn't find any decent demos out there, so I was losing interest pretty fast. I decided to go ahead and implement my own demo to see what all the fuss was about.
The first problem was learning the API. It's not like other popular APIs where you just type a single word in Google and you are bathed in helpful websites. Being a relatively new and underrated tech, you actually have to do your homework. Luckily I found a site that gives a decent intro to the OpenCL API. It's not perfect: it's missing some important cleanup routines and doesn't cover performance issues. I had to read over the OpenCL spec to learn about those (note: not fun).
So now that I had a wrapper for OpenCL, I went ahead with my first demo. I was losing patience fast, so I built something I knew was computationally insane and also easy to implement. Ahem... the Universe :)
At 32768 stars, 193 GFLOPS, the demo runs at about 10 fps on my ATI 5750 HD. Simulating the environment with all 6 cores on my AMD Thuban, it takes 63 seconds to render a single frame.
Overall I'm pleased with the experience. Runtime kernel compilation makes developing and debugging kernels a pleasure. Once you get to know the API, the rest pretty much follows. From what I know, PhysX is a good example of what some games out there today take advantage of, but I look forward to seeing some neat uses such as real time ocean (water) dynamics and real time smoke effects (see Blender 3D smoke emitter). I would not be surprised if you could make a great game on either of those two topics alone.
Posted 11 April 2011 - 01:37 PM
GPGPU looks real good to me, and OpenCL must make it better. I'm still stuck with shaders for the moment though. Like you said, it's tougher to find educational sites for CUDA and OpenCL than for shaders, so it makes it a little harder to learn.
nice images. *thumbs up*
My weapon for GPGPU is an nVidia GTX 480, and it hell kicks my CPU's bum. Makes sense to take advantage of it.
Posted 11 April 2011 - 04:14 PM
Posted 11 April 2011 - 04:20 PM
Posted 11 April 2011 - 05:04 PM
fireside, at present there's no downloads. I rushed to get it finished so I could go to sleep :) If I have some time I'll add a UI and some fun features to poke around with. It's also built for the desktop right now, although when WebCL gets released I'll be all over that.
mcneilm, at 193 GFLOPS you better believe there be physics going on :) The primary formula is Newton's law of universal gravitation. Each star is affected by the sum of gravitational forces from all other stars in the system. With 32768 stars, that's 32768 * 32767 ≈ 1.07x10^9 comparisons.
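For anyone following along, here is a rough plain-Python sketch of the all-pairs sum described above. It is not the demo's OpenCL kernel (no source was posted); the names `accelerations`, `pair_interactions` and `SOFTENING` are made up for illustration, and the softening term is my own assumption to keep close encounters from dividing by zero.

```python
import math

G = 6.674e-11      # gravitational constant in SI units
SOFTENING = 1e-3   # hypothetical softening length (an assumption, not from the post)


def accelerations(positions, masses, g=G, eps=SOFTENING):
    """Direct-sum gravitational acceleration on every body, O(N^2).

    positions: list of (x, y, z) tuples, masses: list of floats.
    """
    n = len(positions)
    result = []
    for i in range(n):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue  # a star does not pull on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            dist_sq = dx * dx + dy * dy + dz * dz + eps * eps
            # a_i += G * m_j * r_ij / |r_ij|^3 (Newton's law divided by m_i)
            scale = g * masses[j] / (dist_sq * math.sqrt(dist_sq))
            ax += scale * dx
            ay += scale * dy
            az += scale * dz
        result.append((ax, ay, az))
    return result


def pair_interactions(n):
    """Number of ordered star pairs the double loop above visits."""
    return n * (n - 1)
```

On a GPU, the outer loop is what would typically be spread across work-items, one star per work-item, with the inner loop running inside the kernel. `pair_interactions(32768)` gives the exact count behind the rounded figure quoted above.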
Posted 11 April 2011 - 05:55 PM
Posted 12 April 2011 - 05:23 AM
Is the OpenCL emulator really this slow, or are you using another method to run it on the CPU?
Posted 12 April 2011 - 08:02 AM
Posted 12 April 2011 - 12:08 PM
Reed, possible given sufficient time :)
Posted 12 April 2011 - 04:05 PM
Posted 12 April 2011 - 10:48 PM
tobeythorn, I can't release source and I don't have enough time to finish an article or tutorial on the subject anytime soon. I do plan to write a couple tutorials on varying subjects once my WebGL engine matures though. In the meantime, the link I posted above should get you jump started with the OpenCL API. It shows you what you need to do from start to finish. If you have questions along the way, I could try to answer them for you. Having a background in parallel programming is helpful.
Posted 13 April 2011 - 09:38 AM
But the biggest speedup of all would come from algorithmic optimization. Note that you can partition your stars. A group of stars is observed (in a gravitational sense) by distant stars approximately as a single heavy mass. So you can compute the center of gravity and total mass of each partition of stars, and then for each star individually add up the force from each other partition and from each of the other stars in the partition it belongs to. Using an octree probably makes sense.
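To make the partition idea concrete, here is a minimal one-level sketch in plain Python. It uses a flat uniform grid rather than an octree, and every name in it (`cell_of`, `summarize`, `pull`, `approx_accelerations`, the `cell_size` parameter) is hypothetical. It assumes all masses are positive.

```python
import math
from collections import defaultdict


def cell_of(pos, cell_size):
    """Integer grid cell containing a position (flat grid, not an octree)."""
    return tuple(math.floor(c / cell_size) for c in pos)


def summarize(positions, masses, cell_size):
    """Return {cell: (total_mass, center_of_mass, member_indices)}."""
    members = defaultdict(list)
    for i, p in enumerate(positions):
        members[cell_of(p, cell_size)].append(i)
    summary = {}
    for cell, idx in members.items():
        m = sum(masses[i] for i in idx)
        com = tuple(sum(masses[i] * positions[i][k] for i in idx) / m
                    for k in range(3))
        summary[cell] = (m, com, idx)
    return summary


def pull(target, source, source_mass, g, eps):
    """Acceleration at target caused by a point mass at source."""
    d = [s - t for s, t in zip(source, target)]
    dist_sq = sum(c * c for c in d) + eps * eps
    scale = g * source_mass / (dist_sq * math.sqrt(dist_sq))
    return [scale * c for c in d]


def approx_accelerations(positions, masses, cell_size, g=1.0, eps=1e-3):
    """Exact sum inside a star's own cell, one point mass per other cell."""
    summary = summarize(positions, masses, cell_size)
    result = []
    for i, p in enumerate(positions):
        home = cell_of(p, cell_size)
        acc = [0.0, 0.0, 0.0]
        for cell, (m, com, idx) in summary.items():
            if cell == home:
                # Neighbours in the same partition are summed individually.
                sources = [(positions[j], masses[j]) for j in idx if j != i]
            else:
                # Every other partition acts as one heavy mass at its center.
                sources = [(com, m)]
            for src_pos, src_mass in sources:
                a = pull(p, src_pos, src_mass, g, eps)
                acc = [x + y for x, y in zip(acc, a)]
        result.append(tuple(acc))
    return result
```

With P partitions of roughly N/P stars each, the work per star drops from N - 1 terms to about N/P + P. The flat grid also lumps adjacent cells into point masses, where the approximation is at its worst, which is one reason a hierarchical structure such as the suggested octree is the more usual choice.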
Note that a GPU is much less suited for running 'intelligent' algorithms, so I wouldn't be surprised if the CPU can actually outrun the GPU...