Parallelizing N-body Force-based Physics Simulations for Point-like Objects

tobeythorn 101 Mar 03, 2011 at 20:26

Hi All,
I woke up this morning, had a moment of inspiration, and figured out a cool way to partially parallelize force-based physics simulations. I've been getting into writing articles/tutorials, so here you go:

http://tobeythorn.com/Parallelizing-N-body-Force-based-Physics-Simulations-for-Point-like-Objects-R2.pdf

I believe this method differs a bit from the one that appears in GPU Gems 3.

Peace,
Tobey

PS: My computer is old and has no parallel hardware, hence the absence of a working code sample.

Update: fixed a few minor mistakes in the article (and reduced the number of interactions that need to be calculated even further).

5 Replies


tobeythorn 101 Mar 06, 2011 at 15:06

Made a significant update to my n-body paper. It now includes better, bigger, and more diagrams and, I hope, a clearer explanation of the technique.

http://tobeythorn.com/Parallelizing-N-body-Force-based-Physics-Simulations-for-Point-like-Objects-R2.pdf

Edit: To make a correction pointed out by EricTheRed: the algorithm does not run in O(N) time (asymptotically, as N gets big); rather, it could theoretically run in time linear in N so long as there are at least N/2 cores.

EricTheRed 101 Mar 07, 2011 at 03:42

@tobeythorn

… it theoretically runs in O(N) time on highly parallel hardware.

The number of cores does not affect the complexity, so does it also run in O(n) time on a single thread?

tobeythorn 101 Mar 07, 2011 at 14:42

Hi Eric,
Maybe I do not correctly understand the exact meaning of complexity as denoted by O-notation. What I've done is taken the N-body force-interaction problem, which requires at least N(N-1)/2 forces to be calculated (typically in serial), and rearranged them into groups such that the forces can be calculated N/2 at a time without any fighting over shared memory (supposing you have N/2 cores). Ideally, the time for the computations should therefore be linear in N.
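
To make that grouping concrete, here is a minimal sketch of one standard way to build such groups: round-robin ("circle method") scheduling. This is not necessarily the exact grouping from my article, and the function name `pairing` is just for illustration, but the no-contention property is the same: each of the N-1 rounds contains N/2 disjoint pairs, so within a round no body appears twice.

```cpp
#include <cstdio>
#include <utility>

// Round-robin ("circle method") pairing: for n bodies (n even) there are
// n-1 rounds of n/2 disjoint pairs, covering all n(n-1)/2 interactions.
// Within a round no body appears twice, so the n/2 force calculations
// can run on n/2 cores without touching the same accumulators.
std::pair<int, int> pairing(int n, int round, int slot) {
    if (slot == 0)
        return { n - 1, round };                // body n-1 stays fixed
    int a = (round + slot) % (n - 1);           // the others rotate by one
    int b = (round - slot + (n - 1)) % (n - 1); // position each round
    return { a, b };
}

int main() {
    const int n = 6; // must be even; pad with a dummy body if odd
    for (int round = 0; round < n - 1; ++round) {
        std::printf("round %d:", round);
        for (int slot = 0; slot < n / 2; ++slot) {
            auto [i, j] = pairing(n, round, slot);
            std::printf(" (%d,%d)", i, j); // F_ij could be computed here, in parallel
        }
        std::printf("\n");
    }
    return 0;
}
```

With N/2 cores, each round becomes one fully parallel step, so the whole sweep takes N-1 serial steps, which is where the "linear in N" intuition comes from.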

Yesterday I spent a bit of time reading about GPGPU and was disappointed to find that GPUs are apparently quite limited in the kinds of problems they suit (e.g., one must avoid shared global memory as much as possible, work must be broken into chunks of certain sizes depending on the hardware, and transferring data to and from the GPU is slow as well). Unfortunately, it seems that the parallelization trick I came up with may only be practical on systems with many CPUs (i.e., supercomputers) or on specialized FPGAs.

EricTheRed 101 Mar 07, 2011 at 16:52

Big-O notation describes how the complexity grows asymptotically (i.e., as n gets very big). Your algorithm may run in O(n) time when the number of cores is at least n/2, but that is not the asymptotic behavior; that is one of the shortcomings of big-O. Your algorithm is still O(n^2), but the constant factor is divided by the number of cores. That constant factor is an important part of practical performance, but unfortunately it cannot be expressed in big-O. Saying that the algorithm is highly parallelizable is accurate and will get your point across.
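
Concretely (writing T_P(n) for the wall-clock time of the all-pairs sweep on P cores):

T_P(n) ≈ (1/P) · n(n-1)/2 = O(n^2 / P)

This is linear in n only while P >= n/2 holds; for any fixed P, it is still O(n^2).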

tobeythorn 101 Mar 07, 2011 at 17:04

Eric, thanks for the explanation and correction. I understand the difference now and will edit my claims accordingly.