Random probability

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 20, 2012 at 14:22

Here is my problem…

I have X number of triangles, each with a different area (different size). I would like to randomly pick one from the list (that part is easy), BUT, those with a larger area must have a greater chance of being picked, and those with mallest area, a lesser chance of getting hit.

Any idea on how to do this? I googled around by found nothing!
Thanks

17 Replies

Please log in or register to post a reply.

D619d95cddb1edb227f51ef539d15cdc
0
Nautilus 103 Mar 20, 2012 at 15:01

Here’s a 5 minutes cheap solution until you come up with a more elegant one by yourself:

1) Precompute the area of each triangle.

2) Estabilish some arbitrary chance-factor to multiply by the computed area of each triangle (ideally this must give you an integer number that increases as the area of the triangle increases). You can also just reuse the very Triangle’s area, if that ends up working for you (below you’ll see why).

3) Assign an arbitrary numeric integer Id to each triangle (no two triangles must have the same Id).

4) Now prepare an array of Id’s. In it you write a copy of each triangle’s Id, based on the magnitude of the chance factor for the triangle in exam. For example, a triangle with Area = 10 might have 10 instances of its Id in the array, whereas a triangle with Area = 3 might have only 3 instances of its Id.

5) Randomly pick any of the Ids inside the array.

Simplicity is bliss.
Bigger triangles will naturally have more Ids in the array, and thus more chances to be picked. Conversely, smaller triangles will have less Ids in the array, and so less chances to be picked. I ignore how many triangles we’re talking about here (you’ve been vague). If your list is reasonably big, you ought to find a better procedure than this. You’ll probably want to, anyway. This is only good for a draft - or for a scenario wherein performance is irrelevant..

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 20, 2012 at 15:17

That make sense thank you. Now that I’ve read your post, something comes to mind. Can I use the “Russian Roulette” based on the area? Say, I run a loop and if frand(0..1) < (TriangleArea/MaxArea) then pick that triangle else itterate?

D619d95cddb1edb227f51ef539d15cdc
0
Nautilus 103 Mar 20, 2012 at 15:41

@Alienizer

if frand(0..1) < (TriangleArea/MaxArea) then pick that triangle else itterate?

By doing so you process one triangle at a time. In that way the triangles that are processed first, do acquire more chances to be picked over the triangles with the same area (and that should have the same chances as the one being processed) further down in the ‘list’.
Instead you want to pick ‘one’ out of the ‘many’. To preserve the correct chance distribution you have to treat the many triangles as a whole.

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 20, 2012 at 16:00

So then I can randomly pick one triangle, and if it pass the “Russian Roulette” test, keep it, otherwise repeat the process?

820ce9018b365a6aeba6e23847f17eda
0
geon 101 Mar 20, 2012 at 16:23

Nautilus, with your method you get a very limited probability “resolution”. A simpler way exists;

  1. Precompute all triangle areas.
  2. Let X = sum of all triangle areas
  3. Loop over the list of triangles
  4. For each one, the chance of picking it is the area/X
  5. Subtract the area from X.

As you see, the chance of picking any one triangle is always proportional to the area divided by the ones left in the list. At the last triangle the probability would be 1. (Beware of float error build-up.)

Please tell me if I’ve screwed it up too badly.

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Mar 20, 2012 at 16:53

Even better way:

  1. Precompute all triangle areas and the sum of areas
  2. Precompute a list containing, for each triangle, the sum of all areas up to this triangle, divided by the total area of all triangles. So entry i in the list contains the sum of all triangles from 0 to i (inclusive). The list is monotonically increasing and the last entry is
  3. To pick a triangle, get a random value in [0, 1) and do a binary search on the above array, as if you were going to insert the random value in sorted order. The index you get back is the triangle to pick.
88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 20, 2012 at 20:16

@Reedbeta

Even better way: 1. Precompute all triangle areas and the sum of areas
2. Precompute a list containing, for each triangle, the sum of all areas up to this triangle, divided by the total area of all triangles. So entry i in the list contains the sum of all triangles from 0 to i (inclusive). The list is monotonically increasing and the last entry is 1.
3. To pick a triangle, get a random value in [0, 1) and do a binary search on the above array, as if you were going to insert the random value in sorted order. The index you get back is the triangle to pick.

hmmm, this must be a very clever way Reedbeta, because I can’t seem to understand the logic of it. Do you mind to elaborate on this please?

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Mar 20, 2012 at 20:37

It works by partitioning the [0, 1] interval into subintervals, each of which corresponds to a single triangle. The subintervals are sized proportional to the triangle’s area. To pick a triangle, you pick a random number on [0, 1) and see in which interval it falls, hence you’re picking a triangle with probability proportional to area.

To represent the intervals you only need a list of the points where you cross from one interval to the next, since they are a partition (they cover the whole unit interval). This is the array constructed in step 2. To see in which interval any random value X falls, you search for the smallest element in the array that is greater than X. This can be done with binary search since the array is sorted in ascending order. If you pretend you’re going to insert X into the array, it’s equivalent to finding the index at which you’d put it to maintain the sort.

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 20, 2012 at 21:11

So the list is the same size as the number of triangles? I’m still lost as to how this works Reedbeta, I’m pulling my hair on this one. oh boy you got me thinking hard LOL

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Mar 20, 2012 at 21:53

Yeah, the list is the same size as the number of triangles.

If the triangles’ areas are 1, 4, 2, and 3, then the list would be 1, 5, 7, 10 - each entry is the sum of all the preceding triangles’ areas. Then divide by the total area (10) so it would be 0.1, 0.5, 0.7, 1.0.

Then when you pick a random value between 0 and 1, 10% of the time it will be < 0.1, 40% of the time it will be between 0.1 and 0.5, 20% of the time between 0.5 and 0.7, and 30% of the time between 0.7 and 1.0. The probabilities are proportional to the areas.

For example, if the random value you pick is 0.567, that falls in the third interval (between 0.5 and 0.7), so you’d pick the third triangle. To find that, you’d do a binary search for 0.567 in the list 0.1, 0.5, 0.7, 1.0, and it should return index 2, which is where you’d insert if you were going to insert 0.567 into the list.

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 20, 2012 at 22:21

oh I get it now! But should I sort the list to 1,2,3,4 first as oppose to 1,4,2,3?

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Mar 20, 2012 at 22:35

No, the triangles don’t need to be sorted by area to begin with. The list you build in step 2 will always come out sorted, no matter what, because you’re adding items together to build it. Sorting the triangles from smaller to larger area neither hurts nor helps.

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Mar 21, 2012 at 00:03

I got it, make sense. Thanks again Reedbeta, I really appreciate your help and time. Thanks to the others who helped as well.

A8433b04cb41dd57113740b779f61acb
0
Reedbeta 167 Apr 04, 2012 at 18:54

I just saw this article on AltDevBlogADay that describes a potentially even faster method for doing this sort of thing, although it requires a more complicated preprocess and more storage.

88dc730f0f71e55be39de0ad103bd9ff
0
Alienizer 109 Apr 05, 2012 at 02:39

Interesting! it’s kind of neet, and almost like what Nautilus suggested.

6aa952514ff4e5439df1e9e6d337b864
0
roel 101 Apr 05, 2012 at 08:34

@Reedbeta

I just saw this article on AltDevBlogADay that describes a potentially even faster method for doing this sort of thing, although it requires a more complicated preprocess and more storage.

I visited this topic for exactly the same reason :)

9479f7677e636e5177fbf4cfe3c3c4ec
0
gooncorp 102 Apr 13, 2012 at 16:55

stochastic expressions are a big part of my technique