Normalizing the sum of two vectors without using a square root

10 replies to this topic

#1Sniffman

Member

• Members
• 48 posts

Posted 01 July 2007 - 07:50 PM

Okay, so I've been developing a nice little algorithm and have found out that determining a 3D vector's length is by far its most time-intensive part. This happens when I add two non-unit vectors and normalize the sum. To get the length of the new vector, I have to calculate sqrt(x² + y² + z²), which eats up time.

So I was wondering if there is any way to eliminate this expensive square root, either by finding the length without one, or by normalizing without having to divide by the length. So far, no luck finding any solutions.

#2Nils Pipenbrinck

Senior Member

• Members
• 597 posts

Posted 01 July 2007 - 08:45 PM

Hi Sniffman,

You can get rid of both the square root and the divide. How? Calculate 1/sqrt directly and turn the divides into multiplies.

Google a bit for inverse square root and reciprocal square root. There are hundreds of pages about that stuff out there: for floating point, fixed point, approximations and exact results. The CPU you're working on might even have a dedicated instruction for that job.

I need to normalize vectors as well, but I'm using square roots and divides, because it turned out that the approximation method and using reciprocals lost too much precision for my needs. That costs quite some cycles, but you always get what you pay for.

Depending on what you're doing, a quick approximation using a small table lookup and a Newton-Raphson step for 15 bits of precision might be sufficient for your needs. I need a full 64 bits of precision, so even float or double wouldn't do the job for me.

What exactly are you doing, what's your target hardware, and what number format are you working with? If I know what you're after, I might know the best solution for you.

Regards,
Nils
My music: http://myspace.com/planetarchh <-- my music

My stuff: torus.untergrund.net <-- some diy electronic stuff and more.

#3flux00

Valued Member

• Members
• 108 posts

Posted 02 July 2007 - 06:24 AM

Just adding to Nils: I don't believe there's a way to simplify the problem even if you know the magnitudes of the vectors you're adding. The only law you have there is the triangle inequality, |A+B| <= |A| + |B|.
Right now I'm using the fast inverse square root outlined on Wikipedia, which gets me the precision I need.

#4Nils Pipenbrinck

Senior Member

• Members
• 597 posts

Posted 02 July 2007 - 09:00 AM

And here is fixed point stuff for the inverse (ARM specific, but there is C-Code as well)

http://www.finesse.d...en/invsqrt.html

#5Sniffman

Member

• Members
• 48 posts

Posted 02 July 2007 - 12:07 PM

Oh dear, where to start...

I don't know the target hardware, since I'm writing a library for use in other projects. No way of knowing what they use it for and on what hardware.

What I'm actually doing is placing slices along a path. The z-axis of each slice has to match the average direction of the path at that point. Since the path is stored as a set of vectors from one slice to the next, I can just add the vectors of the previous and the current slice and normalize the sum, and that gives me the z-axis. This normalization is what takes a square root. I'm not out for 100 percent accuracy, though some amount of it would be nice, since the library is very likely to be used in computer graphics.

And I'm not talking about a few cycles here. The square root almost doubles the time my algorithm takes (since it's also done multiple times, once for each slice).

And to be honest, I'm not much of a hard-core mathematician, so I would be very pleased to receive an explanation of this inverse square root thing.

You already brought up the Newton-Raphson approach, of which I've heard before, but of which I know only enough to say that it takes a guess and refines it iteratively using derivatives. It would be nice to know how you get a square root out of such a Newton-Raphson approach, and maybe even an explanation of this Quake3 approach described in the Wikipedia article.

#6Nils Pipenbrinck

Senior Member

• Members
• 597 posts

Posted 02 July 2007 - 12:58 PM

Hi Sniffman,

Stick with the inverse square root. You can do a Newton-Raphson iteration on the square root itself as well, but ironically the inverse is much cheaper to compute.

About Newton-Raphson steps in general: consult Wikipedia or Google. They show how it works (it's really simple once you've got it).

Start with the Quake3 method from the Wikipedia page. The last line of it, x = x * (1.5f - xhalf * x * x);, is the Newton-Raphson step. If the precision is not sufficient for you, simply add another copy of that line (or even a third one). Each such line doubles the number of correct bits.

If you need the square root instead of the inverse a simple multiply will do the trick:

sqrt(x) = x * 1/sqrt(x) <-- cool, eh?
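For reference, here is a sketch of the Quake3-style routine as described in the Wikipedia article, with the bit reinterpretation done through memcpy rather than pointer casting to avoid undefined behavior (the function name is my own):

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root: a bit-level initial guess using the
   0x5f3759df magic constant, refined by one Newton-Raphson step. */
float fast_rsqrt(float x)
{
    float xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);        /* reinterpret float bits as integer */
    i = 0x5f3759df - (i >> 1);       /* magic initial guess */
    memcpy(&x, &i, sizeof x);        /* back to float */
    x = x * (1.5f - xhalf * x * x);  /* one Newton-Raphson step */
    /* x = x * (1.5f - xhalf * x * x);  <-- add for more precision */
    return x;
}
```

With one step the result is good to roughly 0.2% relative error; add the second line if you need more.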

Nils

#7Sniffman

Member

• Members
• 48 posts

Posted 03 July 2007 - 06:24 PM

As part of my policy of understanding the code I'm using, I stumbled across the fact that you use the roots of a certain function to determine the square root of a number. I've found two functions for this:

f(x) = x² - n,
f(x) = 1 - n / x²,
where in both, n is the number of which we want the square root.

Both I found at Wikipedia, and right now I don't understand why these functions look the way they do. How do you arrive at them? And, from my naive eleventh-grader point of view, why do you have to run a Newton-Raphson iteration on them to find the root, instead of just determining it exactly? After all, you can set them equal to zero and solve for x. (Probably a very dumb question, and I admit that laziness plays a part in my not trying harder to find out myself.)

Ah wait, found the last answer myself: Because it would involve calculating a square root, sending the whole thing ad absurdum.
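For the first function, f(x) = x² - n, the Newton-Raphson update x ← x - f(x)/f'(x) simplifies to x ← (x + n/x)/2, which needs a divide but no square root. A small sketch (the function name is hypothetical):

```c
/* Newton-Raphson on f(x) = x*x - n: the update simplifies to
   x <- 0.5 * (x + n/x), converging quadratically to sqrt(n). */
float newton_sqrt(float n, float guess, int steps)
{
    float x = guess;
    for (int i = 0; i < steps; ++i)
        x = 0.5f * (x + n / x);
    return x;
}
```

Starting from a rough guess, a handful of steps already gives float-level accuracy.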

#8Reedbeta

DevMaster Staff

• 5308 posts
• LocationSanta Clara, CA

Posted 03 July 2007 - 06:37 PM

The functions you have noted are constructed so that they are equal to zero when x = sqrt(n). Newton-Raphson iteration is a technique for finding places where a function equals zero, so it naturally homes in on the square root.
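Concretely, applying Newton-Raphson to f(y) = 1/y² - n, whose zero is y = 1/sqrt(n), gives y - f(y)/f'(y) = y·(1.5 - 0.5·n·y²): a divide-free update, and exactly the step that appears as the last line of the Quake3 code. A sketch (the function name is hypothetical):

```c
/* Newton-Raphson on f(y) = 1/(y*y) - n: each step is
   y <- y * (1.5 - 0.5 * n * y * y), no divide or sqrt needed. */
float rsqrt_newton(float n, float y, int steps)
{
    for (int i = 0; i < steps; ++i)
        y = y * (1.5f - 0.5f * n * y * y);
    return y;  /* converges to 1/sqrt(n) for a reasonable start */
}
```

This is why the inverse is cheaper than the square root itself: the iteration for f(x) = x² - n needs a divide every step, while this one needs only multiplies.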
reedbeta.com - developer blog, OpenGL demos, and other projects

#9Nils Pipenbrinck

Senior Member

• Members
• 597 posts

Posted 03 July 2007 - 10:53 PM

Sniffman said:

Ah wait, found the last answer myself: Because it would involve calculating a square root, sending the whole thing ad absurdum.

And remember the great property of the Newton-Raphson iteration: it doubles the number of valid bits each time it's applied. If you make an educated guess and know your guess is precise to about 8 bits (from a table lookup, or a guess based on the exponent such as the Quake3 method), you get 16 bits after one iteration and 32 bits after the second.

#10Sniffman

Member

• Members
• 48 posts

Posted 05 July 2007 - 01:15 PM

I included the algorithm in my code and modified the rest of the little test program I was using so it uses the inverse square root.

I am a bit disappointed that it only made things about one fourth quicker, when leaving out the entire square root line almost halved the time needed. But I guess that comes from the three multiplications and two additions needed for the Pythagorean theorem (a² + b² + c²), right? Is this all I should expect, or is there a way to get more out of it?

#11Nick

Senior Member

• Members
• 1227 posts

Posted 11 July 2007 - 11:56 AM

Sniffman said:

I am a bit disappointed that it only made it about one fourth quicker, when leaving out the entire square root line almost halved the time needed. But I guess that comes from the three multiplications and two additions needed in the Pythagorean theorem, right? (a² + b² + c²) Is this all I should expect or is there any way to get more out of this?
On an x86 platform you can get a massive speedup by using SSE.
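For example, SSE has a dedicated rsqrtss instruction that returns an approximate (about 12-bit) reciprocal square root in a single instruction; one Newton-Raphson step refines it to near full float precision. A sketch using the standard intrinsics, x86-only (the wrapper name is hypothetical):

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

/* Approximate 1/sqrt(x) via the rsqrtss instruction, then one
   Newton-Raphson step: y <- y * (1.5 - 0.5 * x * y * y). */
float sse_rsqrt(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
}
```

The packed form _mm_rsqrt_ps does the same for four floats at once, which is where the big speedup comes from when normalizing many vectors.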
