Since sqrt(x*y) is the geometric mean of x and y, I’m guessing for 3D you want the geometric mean of x, y, and z, which would be the cube root of x*y*z. But I suspect that’s not as easy to estimate by bit-twiddling as sqrt(x*y) is…

Please log in or register to post a reply.

Thanks for the reply; I see that my reference to hypot() causes confusion but actually all I want is the magnitude of a 3d vector that avoids overflow and underflow that one would normally get in the interim calculation of

sqrt(x*x + y*y + z*z).

{Square, not cube root in my case}.

Hi.

Take a look at this paper:

http://torus.untergrund.net/misc/distance_by_pythagorean_sums.pdf

It shows how to calculate the vector length in a non obvious but mathematical much more stable way. It’s not *that* fast though.

Don’t bookmark the pdf please. I’ll take it offline in a couple of days..

Cheers,

Nils

Can you not extend your original idea?

sqrt(x\^2+y\^2+z\^2) == k*sqrt((x/k)\^2+(y/k)\^2+(z/k)\^2) for k>0

Assuming you want the geometric mean (as suggested by Reedbeta) you
want k to approximate (xyz)\^(1/3) as a power of 2 so

```
2^n = (xyz)^(1/3)
2^3n = xyz
n = (ln xyz) / 3(ln2)
```

Urgh maybe its not the way to go, but only one ln needs to be
calculated. I don’t know of any useful approximations to ln x, but it
would only have to be loosely accurate

After that, x/k, y/k and z/k can be done again with bit shifts.

(Thanks to Nils for the fascinating paper).

Yes I believe that is the idea. If you assume IEEE format for the floats you can do some bit-fiddling to approximate a log_2 function by extracting the exponent bits.

Something like this might work for positive x:

floor(log_2(float x)) is something like: (floatToIntBits(x) >> 23) - 127

where in C you might implement floatToIntBits(x) as: *(int*)&x

or (preferably?) use an anonymous union:

union {

int asInt;

float asFloat;

} u;

u.asFloat = x;

return (u.asInt >> 23) - 127;

So I think you want this. The 2 float/integer conversions are probably the slowest bits.

```
float get_k(const float &x, const float &y, const float &z)
{
union
{
float mult;
int bits;
};
mult = x*y*z;
const int exp = (((*reinterpret_cast<int*>(&mult)) >> 23) & 255) - 128;
bits = (bits & ~(255 << 23)) + 127 << 23;
mult = (((-1.0f/3.0f) * mult + 2.0f) * mult - 2.0f/3.0f + static_cast<float>(exp))* 0.69314718f; // mult = ln(xyz)
mult *= 0.48089834; // mult = (ln xyz) / 3(ln2) = n
// k = 2^n = 2^mult
return (1<<static_cast<int>(mult));
}
```

for(long i = 0; i < 500000; i++){

ax = (float)rand() + 1 / 100.0f;

ay = (float)rand() + 1 / 100.0f;

az = (float)rand() + 1 / 100.0f;

bx = (float)rand() + 1 / 100.0f;

by = (float)rand() + 1 / 100.0f;

bz = (float)rand() + 1 / 100.0f;

dx = ax - bx;

dy = ay - by;

dz = az - bz;

dist = sqrt(dx + dy + dz);

}

takes 0.07 seconds on a core2 duo 2.14ghz - of course, that’s all I was doing….

- Upcoming Multiplatform Game Program...
- Our first game - looking for feedbacks
- Network Emulation Tool
- Trouble with accessing GLSL array
- Fiction
- Game Programming Patterns: Bytecode
- Interactive WebGL Water Demo
- Skeletal Animation Tutorial with GP...
- Unreal Engine 4
- Microsoft xbox one selling poorly

I need a fast way to compute sqrt(x*x + y*y + z*z) that avoids intermediate overflow and underflow.

I know that most runtime libraries (C,C++,Java) provide a 2d hypot() function that does this for 2 variables, _AND_ I realize that I can achieve my result using 2 calls to it as follows:

hypot( hypot(x, y), z)

but that seems uneccessarily slow. Is there a faster way that still avoids intermediate overflow and underflow? I am _NOT_ looking for a fast sqrt() function here. I am well aware of the _NUMEROUS_ threads on fast sqrt() and fast reciprocal sqrt(). What I am looking for is a way to scale the 3 inputs before calling <any fast sqrt method here>, and then to un-scale afterwards to obtain the accurate result.

So I started looking at how to implement a 2d hypot() function:

Define hypot(x, y) as sqrt(x*x + y*y) but that avoids intermediate overflow and underflow:

Using the fact that sqrt(x\^2+y\^2) == k*sqrt((x/k)\^2+(y/k)\^2) for k>0

you can implement a fast hypot() by choosing k to be a power of 2 close to sqrt(x*y). Since you choose k to be a power of 2, computing k, x/k, and y/k can be done using bit twiddling and knowledge of IEEE floating point representation. I can go into that further, but my problem is how to generalize this for 3 components x,y,z without simply nesting the call as hypot(hypot(x,y),z). Any suggestions?