I need a fast way to compute sqrt(x*x + y*y + z*z) that avoids intermediate overflow and underflow.
I know that most runtime libraries (C,C++,Java) provide a 2d hypot() function that does this for 2 variables, _AND_ I realize that I can achieve my result using 2 calls to it as follows:
hypot( hypot(x, y), z)
but that seems uneccessarily slow. Is there a faster way that still avoids intermediate overflow and underflow? I am _NOT_ looking for a fast sqrt() function here. I am well aware of the _NUMEROUS_ threads on fast sqrt() and fast reciprocal sqrt(). What I am looking for is a way to scale the 3 inputs before calling <any fast sqrt method here>, and then to un-scale afterwards to obtain the accurate result.
So I started looking at how to implement a 2d hypot() function:
Define hypot(x, y) as sqrt(x*x + y*y) but that avoids intermediate overflow and underflow:
Using the fact that sqrt(x^2+y^2) == k*sqrt((x/k)^2+(y/k)^2) for k>0
you can implement a fast hypot() by choosing k to be a power of 2 close to sqrt(x*y). Since you choose k to be a power of 2, computing k, x/k, and y/k can be done using bit twiddling and knowledge of IEEE floating point representation. I can go into that further, but my problem is how to generalize this for 3 components x,y,z without simply nesting the call as hypot(hypot(x,y),z). Any suggestions?
hypot()-like function but for 3 components
Started by kwhatmough, Aug 01 2008 09:53 PM
8 replies to this topic
#1
Posted 01 August 2008 - 09:53 PM
#2
Posted 01 August 2008 - 09:58 PM
Since sqrt(x*y) is the geometric mean of x and y, I'm guessing for 3D you want the geometric mean of x, y, and z, which would be the cube root of x*y*z. But I suspect that's not as easy to estimate by bit-twiddling as sqrt(x*y) is...
reedbeta.com - developer blog, OpenGL demos, and other projects
#3
Posted 01 August 2008 - 10:07 PM
Thanks for the reply; I see that my reference to hypot() causes confusion but actually all I want is the magnitude of a 3d vector that avoids overflow and underflow that one would normally get in the interim calculation of
sqrt(x*x + y*y + z*z).
{Square, not cube root in my case}.
sqrt(x*x + y*y + z*z).
{Square, not cube root in my case}.
#4
Posted 01 August 2008 - 10:08 PM
Sorry now I understand. You're saying cubeRoot(xyz) for the moderating factor. I see.
#5
Posted 02 August 2008 - 09:05 AM
Hi.
Take a look at this paper:
http://torus.untergr...gorean_sums.pdf
It shows how to calculate the vector length in a non obvious but mathematical much more stable way. It's not *that* fast though.
Don't bookmark the pdf please. I'll take it offline in a couple of days..
Cheers,
Nils
Take a look at this paper:
http://torus.untergr...gorean_sums.pdf
It shows how to calculate the vector length in a non obvious but mathematical much more stable way. It's not *that* fast though.
Don't bookmark the pdf please. I'll take it offline in a couple of days..
Cheers,
Nils
My music: http://myspace.com/planetarchh <-- my music
My stuff: torus.untergrund.net <-- some diy electronic stuff and more.
My stuff: torus.untergrund.net <-- some diy electronic stuff and more.
#6
Posted 08 August 2008 - 07:55 PM
Can you not extend your original idea?
sqrt(x^2+y^2+z^2) == k*sqrt((x/k)^2+(y/k)^2+(z/k)^2) for k>0
Assuming you want the geometric mean (as suggested by Reedbeta) you want k to approximate (xyz)^(1/3) as a power of 2 so
After that, x/k, y/k and z/k can be done again with bit shifts.
sqrt(x^2+y^2+z^2) == k*sqrt((x/k)^2+(y/k)^2+(z/k)^2) for k>0
Assuming you want the geometric mean (as suggested by Reedbeta) you want k to approximate (xyz)^(1/3) as a power of 2 so
2^n = (xyz)^(1/3) 2^3n = xyz n = (ln xyz) / 3(ln2)Urgh maybe its not the way to go, but only one ln needs to be calculated. I don't know of any useful approximations to ln x, but it would only have to be loosely accurate
After that, x/k, y/k and z/k can be done again with bit shifts.
#7
Posted 09 August 2008 - 12:31 AM
(Thanks to Nils for the fascinating paper).
Yes I believe that is the idea. If you assume IEEE format for the floats you can do some bit-fiddling to approximate a log_2 function by extracting the exponent bits.
Something like this might work for positive x:
floor(log_2(float x)) is something like: (floatToIntBits(x) >> 23) - 127
where in C you might implement floatToIntBits(x) as: *(int*)&x
or (preferably?) use an anonymous union:
union {
int asInt;
float asFloat;
} u;
u.asFloat = x;
return (u.asInt >> 23) - 127;
Yes I believe that is the idea. If you assume IEEE format for the floats you can do some bit-fiddling to approximate a log_2 function by extracting the exponent bits.
Something like this might work for positive x:
floor(log_2(float x)) is something like: (floatToIntBits(x) >> 23) - 127
where in C you might implement floatToIntBits(x) as: *(int*)&x
or (preferably?) use an anonymous union:
union {
int asInt;
float asFloat;
} u;
u.asFloat = x;
return (u.asInt >> 23) - 127;
#8
Posted 09 August 2008 - 01:52 AM
So I think you want this. The 2 float/integer conversions are probably the slowest bits.
float get_k(const float &x, const float &y, const float &z)
{
union
{
float mult;
int bits;
};
mult = x*y*z;
const int exp = (((*reinterpret_cast<int*>(&mult)) >> 23) & 255) - 128;
bits = (bits & ~(255 << 23)) + 127 << 23;
mult = (((-1.0f/3.0f) * mult + 2.0f) * mult - 2.0f/3.0f + static_cast<float>(exp))* 0.69314718f; // mult = ln(xyz)
mult *= 0.48089834; // mult = (ln xyz) / 3(ln2) = n
// k = 2^n = 2^mult
return (1<<static_cast<int>(mult));
}
#9
Posted 09 August 2008 - 06:02 PM
for(long i = 0; i < 500000; i++){
ax = (float)rand() + 1 / 100.0f;
ay = (float)rand() + 1 / 100.0f;
az = (float)rand() + 1 / 100.0f;
bx = (float)rand() + 1 / 100.0f;
by = (float)rand() + 1 / 100.0f;
bz = (float)rand() + 1 / 100.0f;
dx = ax - bx;
dy = ay - by;
dz = az - bz;
dist = sqrt(dx + dy + dz);
}
takes 0.07 seconds on a core2 duo 2.14ghz - of course, that's all I was doing....
ax = (float)rand() + 1 / 100.0f;
ay = (float)rand() + 1 / 100.0f;
az = (float)rand() + 1 / 100.0f;
bx = (float)rand() + 1 / 100.0f;
by = (float)rand() + 1 / 100.0f;
bz = (float)rand() + 1 / 100.0f;
dx = ax - bx;
dy = ay - by;
dz = az - bz;
dist = sqrt(dx + dy + dz);
}
takes 0.07 seconds on a core2 duo 2.14ghz - of course, that's all I was doing....
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users











