0
101 Jul 01, 2007 at 19:50

Okay, so I’ve been developing a nice little algorithm and have found that determining a 3D vector’s length is by far the most time-intensive part of it. The length comes up when adding two non-unit vectors and normalizing the sum: to get the new vector’s length, I have to calculate sqrt(x² + y² + z²), which eats up time.

So I was wondering if there is any way to eliminate the use of this immense square root, either by finding out the length without using one, or by normalizing without having to divide by the length. But so far, no luck in finding any solutions.

#### 10 Replies

0
101 Jul 01, 2007 at 20:45

Hi Sniffman,

You can get rid of the square root and divide. How? Calculate 1/sqrt directly and turn the divides into multiplies.

Google a bit for “inverse square root” and “reciprocal square root”. There are hundreds of pages about that stuff out there, covering floating point, fixed point, approximations and exact results. The CPU you’re working on might even have a dedicated instruction for that job.

I need to normalize vectors as well, but I’m using square roots and divides because it turned out that the approximation method and using reciprocals lost too much precision for my needs. That costs quite a few cycles, but you always get what you pay for.

Depending on what you’re doing, a quick approximation using a small table lookup and a Newton-Raphson step for 15 bits of precision might be sufficient for your needs. I need a full 64 bits of precision, so even float or double wouldn’t do the job for me.

What exactly are you doing, what’s your target hardware, and what number format are you working with? If I know what you’re after, I might know the best solution for you.

Regards,
Nils

0
101 Jul 02, 2007 at 06:24

Simply adding to Nils: I don’t believe there’s a way to simplify the problem even if you know the magnitudes of the vectors you’re adding. The only law you have regarding that is the triangle inequality, |A+B| <= |A| + |B|.
Right now I’m using the fast inverse square root outlined on Wikipedia, which gets me the precision I need.

0
101 Jul 02, 2007 at 09:00

And here is fixed-point stuff for the inverse square root (ARM-specific, but there is C code as well).

0
101 Jul 02, 2007 at 12:07

Oh dear, where to start…

I don’t know the target hardware, since I’m writing a library for use in other projects. There’s no way of knowing what they’ll use it for or on what hardware.

What I’m actually doing is placing slices along a path. The z-axis of each slice is to be matched with the average path direction at its point. Since the path is stored as a set of vectors from one slice to the next, I can just add the vectors of the previous and the current slice and normalize the sum, and then I have the z-axis. This normalization is what takes a square root. I’m not actually out for 100 percent accuracy, though some amount of it would be nice, since the library is very likely to be used in computer graphics.

And I’m not talking about a few cycles here. The square root almost doubles the time my algorithm takes (since it’s also done multiple times, once for each slice).

And to be honest, I’m not much of a hard-core mathematician, so I would be very pleased to receive an explanation of this inverse square root thing.

Since you already brought up the Newton-Raphson approach (which I’ve heard of before, but about which I know only that it takes a guess and refines it iteratively using derivatives), it would be nice to know how you get a square root with such a Newton-Raphson approach. Maybe even this Quake3 approach described in the Wikipedia article.

0
101 Jul 02, 2007 at 12:58

Hi Sniffman,

Stick with the inverse square root. You can do a Newton-Raphson of the square root as well, but ironically the inverse is much cheaper to compute.

About Newton-Raphson steps in general: consult Wikipedia or Google. They show how it works (it’s really simple once you’ve got it).

Start with the Quake3 method from the Wikipedia page. The last line of it, x = x * (1.5f - xhalf * x * x);, is the Newton-Raphson step. If the precision is not sufficient for you, simply add another such line (or even a third one). Each of these lines doubles the number of correct bits.
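For reference, the Quake3 routine looks roughly like this (a sketch using memcpy instead of the original pointer cast, so the type punning stays legal in standard C; the second Newton-Raphson line is the optional extra step):

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root in the Quake3 style: an initial guess from
   integer bit tricks, refined by Newton-Raphson steps. */
static float fast_rsqrt(float number) {
    float xhalf = 0.5f * number;
    float x = number;
    uint32_t i;
    memcpy(&i, &x, sizeof i);        /* reinterpret float bits as integer */
    i = 0x5f3759df - (i >> 1);       /* magic initial guess */
    memcpy(&x, &i, sizeof x);
    x = x * (1.5f - xhalf * x * x);  /* Newton-Raphson step 1 */
    x = x * (1.5f - xhalf * x * x);  /* optional step 2: doubles correct bits */
    return x;
}
```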

If you need the square root instead of the inverse, a simple multiply will do the trick:

sqrt (x) = x * 1/sqrt(x) <– cool, eh?

Nils

0
101 Jul 03, 2007 at 18:24

As part of my policy of understanding the code I’m using, I stumbled across the fact that you use the roots of a certain function to determine the square root of a number. I’ve found two functions for this:

f(x) = x² - n,
f(x) = 1 - n / x²,
where in both, n is the number of which we want the square root.

I found both on Wikipedia, and right now I don’t understand why these functions look the way they do. How do you get to them? And, from my naive eleventh-grader point of view, why do you have to use a Newton-Raphson iteration on them to find the root, instead of just determining it exactly? After all, you can set them equal to zero and solve for x. (Probably a very dumb question, and I admit that laziness plays a part in my not trying harder to find it out myself.)

Ah wait, found the last answer myself: Because it would involve calculating a square root, sending the whole thing ad absurdum.

0
167 Jul 03, 2007 at 18:37

The functions you have noted are constructed so that they are equal to zero when x = sqrt(n). Newton-Raphson iteration is a technique for finding places where a function equals zero, so it naturally homes in on the square root.

0
101 Jul 03, 2007 at 22:53

@Sniffman

Ah wait, found the last answer myself: Because it would involve calculating a square root, sending the whole thing ad absurdum.

And remember the great property of the Newton-Raphson iteration: it doubles the number of valid bits each time it’s applied. If you make an educated guess and know your guess is precise up to 8 bits or so (from a table lookup, or a guess based on the exponent such as the Quake3 method), you get 16 bits after one iteration and 32 bits after the second.

0
101 Jul 05, 2007 at 13:15

I included the algorithm in my code and modified the rest of the little test program I was using so that it uses the inverse square root.

I am a bit disappointed that it only made things about one fourth quicker, when leaving out the entire square-root line almost halved the time needed. But I guess that comes from the three multiplications and two additions needed for the Pythagorean theorem (a² + b² + c²), right? Is this all I should expect, or is there any way to get more out of this?

0
102 Jul 11, 2007 at 11:56

@Sniffman

I am a bit disappointed that it only made things about one fourth quicker, when leaving out the entire square-root line almost halved the time needed. But I guess that comes from the three multiplications and two additions needed for the Pythagorean theorem (a² + b² + c²), right? Is this all I should expect, or is there any way to get more out of this?

On an x86 platform you can get a massive speedup by using SSE.
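A sketch of what that can look like (assuming an x86 target with SSE; the rsqrtss instruction gives only about 12 bits, so one Newton-Raphson step is added to refine it):

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Approximate 1/sqrt(x) with the SSE rsqrtss instruction, then refine
   with one Newton-Raphson step for near-full float precision. */
static float rsqrt_sse(float x) {
    float r = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return r * (1.5f - 0.5f * x * r * r);
}

/* Normalizing a vector then costs one rsqrt and three multiplies. */
static void normalize3_sse(float v[3]) {
    float inv_len = rsqrt_sse(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= inv_len; v[1] *= inv_len; v[2] *= inv_len;
}
```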