# Regression curve through a set of points

17 replies to this topic

### #1Sniffman

Member

• Members
• 48 posts

Posted 30 January 2009 - 02:48 PM

As part of a larger algorithm, I need to find a nonlinear regression curve through a number of points in 3d space. I then have to position regularly spaced perpendicular planes along this line. As of now, I pretty much have no idea how to do this. I tried out just interpolating the normal of a plane between the normals of the previous and following plane, but that failed most impressively.

So, I am since reading Wiki articles about regression stuff, but came to no solution yet. Any pointing in the right direction would be greatly appreciated.

DevMaster Staff

• Moderators
• 1716 posts

Posted 30 January 2009 - 04:03 PM

Which language? Have you looked into math libraries in your language of choice?

Example for Python:

BTW, there are lots of issues with visualizing z(x,y) data...

### #3Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 30 January 2009 - 05:19 PM

Are you talking about a smooth curve through 3D space? Then a parametric spline, such as a Bezier or Catmull-Rom spline, may be what you are looking for.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #4Sniffman

Member

• Members
• 48 posts

Posted 31 January 2009 - 04:31 PM

Yes, splines are of course the way to go for this. Sorry, the question seems to be ambiguous. My question was more into the direction of how to fit such a spline to the points.

### #5Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 31 January 2009 - 07:23 PM

Well, if you're happy with assigning a t parameter value to each point, you can actually do this with linear regression. These splines are polynomials, so they're of the form sum(c_i * t^i) for constants c_i with 0 <= i <= n, the degree of the spline. If you plug in the value of t for each point to be fitted, you get constraints that are linear in the c_i and can be solved immediately by least squares.

On the other hand, if for instance the points aren't ordered already or you don't want to fix their t values ahead of time, this becomes harder...I'm afraid I don't know off the top of my head how to do the regression for those more general cases.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #6Sniffman

Member

• Members
• 48 posts

Posted 01 February 2009 - 12:27 AM

I'm sorry if I appear overly slow-witted, but I am afraid I shall have to ask you to elaborate a bit on that. I do know that the spline control points are interpolated by polynomials (are we still speaking about Bezier splines?). Im assuming you mean that when you say the splines are polynomials. I'm also not sure what you meant by your formula. What I know about Bezier splines is that they are for each t the sum of the control points weighted by the Bernstein polynomials. How does that translate into what you said?

### #7Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 01 February 2009 - 04:16 AM

Bezier, Hermite, and so on really refer to a method of finding the polynomial coefficients. Underneath they are the same kind of object - a parametric polynomial spline. I assume you know that a polynomial is in general a function of the form c_0 + c_1 * t + c_2 * t^2 + ... up to some power, where t is the parameter and c_0, c_1, etc. are constants. That's what I meant whan I said sum(c_i * t^i) for 0 <= i <= n. For a 3D spline there are three such functions, one each for x, y, and z, with different values for the c_i. Alternatively, you can think of it as being one function where the c_i are vectors.

Although you can express a Bezier spline as the sum of the control points weighted by the Bernstein polynomials, if you multiply everything out, it comes down to the form described above. In that case the c_i will be some mixture of the control points and the coefficients involved in the Bernstein polynomials.

Now, before going further, I'd like to know a bit more about the problem (so I don't lead you off down the wrong track).

- Are the points in your data set already sorted in the order that they should appear along the spline, or not?
- Are the points roughly evenly spaced along the spline that should fit them, or not?
- About how many points are we talking about - 4? 10? Hundreds?
- Is the spline required to pass through each and every data point, or just approximately come near them?
reedbeta.com - developer blog, OpenGL demos, and other projects

### #8Sniffman

Member

• Members
• 48 posts

Posted 01 February 2009 - 11:28 AM

I see that thing about polynomials now, anthough I still don't quite know how to get from the weighted point to the strictly polynomial form.

As for your questions: The points are evenly spaced and ordered along the z-axis, so the answer to the first question is definitely yes. You could also say they are very roughly evenly spaced, although there are also sometimes rather large differences in x and y. We are talking about 200-300 points. No, the spline does not have to pass through each of them exactly, I only want an approximation of the general direction, not an interpolation.

### #9Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 01 February 2009 - 09:45 PM

Okay, that makes things a bit simpler. Since the points are already spaced and ordered along z, I think what I would recommend is using functions that are parameterized by the z coordinate, rather than having a separate parameter t. This will reduce the number of degrees of freedom in the problem.

In other words, the solution will be a pair of functions x(z) and y(z) that give the curve's x and y locations for a given z.

If we're using polynomials (which seems like a good place to start), the first thing to do is to pick a degree. Lower degrees will give a smoother curve, while higher ones will fit the data more closely...you'll have to experiment to find what works best. Degree 3 (cubic) is probably a good place to start.

Let's let 'n' be the degree. I'll use a_i and b_i to refer to the coefficients of the x and y polynomials, respectively, where 0 <= i <= n. So the x and y polynomials we are trying to solve for are:

x(z) = a_0 + z * a_1 + z^2 * a_2 + ... + z^n * a_n
y(z) = b_0 + z * b_1 + z^2 * b_2 + ... + z^n * b_n

where the a_i and b_i are unknown. Let 'm' be the number of data points and x_j, y_j, z_j be their coordinates, where 1 <= j <= m. Then the constraints we are trying to satisfy are:

x(z_1) = x_1, y(z_1) = y_1
x(z_2) = x_2, y(z_2) = y_2
...
x(z_m) = x_m, y(z_m) = y_m

This forms two linear systems of m equations in n unknowns, one for x and one for y. Since m > n the system is overdetermined and doesn't have an exact soluton, but we can still use least squares to find the approximate solution with the smallest mean-squared-error.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #10Sniffman

Member

• Members
• 48 posts

Posted 02 February 2009 - 04:39 PM

Okay, that I now understand. However, why do you set x(z_i) = x_i and the same for y? That would mean that all points would lie on the resulting spline, would it not?

Also, it now appears rather as though the assumption of equal z-axis spacing doesn't always hold true. Therefore, I need a more general approach. I was thinking one could simply approximate the parameter t for each point and then do what you proposed, except now with three functions x(t), y(t) and z(t) of the form sum(t^i * c_i) for i = 0 to n.

And I had another thought: Can't we just think of the points to be approximated as the spline's control points? In that case, the coefficient c_i for t^i could be calculated easily. Although I just realize that the partial splines would always meet at their respective end control points, thereby making the curve pass through at least a number of points. If I want to find an average path through them, that might not be desired. This could in turn be prevented by using a very-high-order spline. But I somehow doubt that's a good idea.

### #11Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 02 February 2009 - 05:26 PM

Sniffman said:

However, why do you set x(z_i) = x_i and the same for y? That would mean that all points would lie on the resulting spline, would it not?

If we could solve the equations exactly, then yes. However, as mentioned, the system is overdetermined and so probably cannot be solved exactly. The least-squares algorithm produces a solution that is in a certain sense as close as possible.

Sniffman said:

I was thinking one could simply approximate the parameter t for each point and then do what you proposed, except now with three functions x(t), y(t) and z(t) of the form sum(t^i * c_i) for i = 0 to n.

You could certainly try that. But, just to clarify, the method I outlined doesn't require the z-axis points to be evenly spaced. So, I would still give that a try.

Sniffman said:

Can't we just think of the points to be approximated as the spline's control points?

Yes, this is another valid approach, and for instance a Catmull-Rom spline does exactly this. However, using all points as control points would lead to a very large number of spline segments, and depending on how much noise there is in the data, it could lead to a "wiggly" curve that is not as smooth as desired.

Another possibility is a hybrid of the two approaches; you could use more than one spline segment, but use least-squares for each one. For instance, you could take the first 20 points and fit a single spline to them, then fit a spline to the next 20 points, etc. You might have to do some fixing up to get them to link end to end smoothly.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #12Sniffman

Member

• Members
• 48 posts

Posted 02 February 2009 - 07:43 PM

Well, yes, it indeed doesn't matter if the points are regularly spaced. What I meant was rather that they don't necessarily have to be arranged along the z-axis. Although on second thought, that might not change a thing. I guess x and y can be expressed as a function of z regardless of whether z is the primary axis of arrangement.

What do you mean by getting more spline segments when using more control points? The way I see it, the more points used per spline, the larger the degree, up to the maximum of taking all points for one spline.

Well, anyway, I've got plenty to try out now.

### #13Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 02 February 2009 - 08:08 PM

Sniffman said:

What do you mean by getting more spline segments when using more control points? The way I see it, the more points used per spline, the larger the degree, up to the maximum of taking all points for one spline.

Yes, what I meant by "spline segments" is simply making the whole curve out of multiple splines joined together end to end, where each spline uses some contiguous subset of the points.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #14Sniffman

Member

• Members
• 48 posts

Posted 05 February 2009 - 06:33 PM

Another connected question, more out of interest than out of necessity. As I need to consruct planes along the spline, I need to get the curve's derivative. I have gotten that if C(t) is the Bezier curve, P_i a control point and B_i,n(t) is a Bernstein polynomial, then:

C'(t) = sum(n * (B_i-1,n-1(t) - B_i,n-1(t)) * P_i) for 0 <= i <= n

I have confirmed the correctness of this. It is stated in a few sources. However, much more often the derivative is expressed as:

C'(t) = sum(B_i,n-1(t) * n * (P_i+1 - P_i)) for 0 <= i <= n-1

How does this second expression follow from the first?

### #15Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 05 February 2009 - 07:18 PM

It's just a different way of ordering the same terms.

Let's look at an example with n = 2. The derivative by the first formula is:

2*(-B_0,1 * P_0 + (B_0,1 - B_1,1) * P_1 + B_1,1 * P_2)

By the second formula it is:

2*(B_0,1 * (P_1 - P_0) + B_1,1 * (P_2 - P_1))

If you multiply them out you'll see they're the same thing. You can probably prove by induction the two formulas are the same for all n, if you like.
reedbeta.com - developer blog, OpenGL demos, and other projects

### #16Sniffman

Member

• Members
• 48 posts

Posted 05 February 2009 - 11:05 PM

Okay, got that. But I noticed you just excluded B_-1,1 and B_2,1. They are nonsense, of course, but does that mean you can simply drop them?

### #17Reedbeta

DevMaster Staff

• 5310 posts
• LocationSanta Clara, CA

Posted 05 February 2009 - 11:09 PM

Yes; they are defined to be zero (precisely to make formulas like these work ).
reedbeta.com - developer blog, OpenGL demos, and other projects

### #18JarkkoL

Senior Member

• Members
• 475 posts

Posted 10 March 2009 - 08:47 PM

I have a generic spline fitting code on-line if you want to have a look. It's not perfect or anything, but it takes a set of position (or rotation) keyframes and spews out cubic splines within given error tolerance. It's something you could use for example compressing keyframed animations.

#### 1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users