Regression curve through a set of points
#1
Posted 30 January 2009 - 02:48 PM
So, I am since reading Wiki articles about regression stuff, but came to no solution yet. Any pointing in the right direction would be greatly appreciated.
#2
Posted 30 January 2009 - 04:03 PM
Example for Python:
http://code.google.c...ythonequations/
BTW, there are lots of issues with visualizing z(x,y) data...
#3
Posted 30 January 2009 - 05:19 PM
#4
Posted 31 January 2009 - 04:31 PM
#5
Posted 31 January 2009 - 07:23 PM
On the other hand, if for instance the points aren't ordered already or you don't want to fix their t values ahead of time, this becomes harder...I'm afraid I don't know off the top of my head how to do the regression for those more general cases.
#6
Posted 01 February 2009 - 12:27 AM
#7
Posted 01 February 2009 - 04:16 AM
Although you can express a Bezier spline as the sum of the control points weighted by the Bernstein polynomials, if you multiply everything out, it comes down to the form described above. In that case the c_i will be some mixture of the control points and the coefficients involved in the Bernstein polynomials.
Now, before going further, I'd like to know a bit more about the problem (so I don't lead you off down the wrong track).
- Are the points in your data set already sorted in the order that they should appear along the spline, or not?
- Are the points roughly evenly spaced along the spline that should fit them, or not?
- About how many points are we talking about - 4? 10? Hundreds?
- Is the spline required to pass through each and every data point, or just approximately come near them?
#8
Posted 01 February 2009 - 11:28 AM
As for your questions: The points are evenly spaced and ordered along the z-axis, so the answer to the first question is definitely yes. You could also say they are very roughly evenly spaced, although there are also sometimes rather large differences in x and y. We are talking about 200-300 points. No, the spline does not have to pass through each of them exactly, I only want an approximation of the general direction, not an interpolation.
#9
Posted 01 February 2009 - 09:45 PM
In other words, the solution will be a pair of functions x(z) and y(z) that give the curve's x and y locations for a given z.
If we're using polynomials (which seems like a good place to start), the first thing to do is to pick a degree. Lower degrees will give a smoother curve, while higher ones will fit the data more closely...you'll have to experiment to find what works best. Degree 3 (cubic) is probably a good place to start.
Let's let 'n' be the degree. I'll use a_i and b_i to refer to the coefficients of the x and y polynomials, respectively, where 0 <= i <= n. So the x and y polynomials we are trying to solve for are:
x(z) = a_0 + z * a_1 + z^2 * a_2 + ... + z^n * a_n
y(z) = b_0 + z * b_1 + z^2 * b_2 + ... + z^n * b_n
where the a_i and b_i are unknown. Let 'm' be the number of data points and x_j, y_j, z_j be their coordinates, where 1 <= j <= m. Then the constraints we are trying to satisfy are:
x(z_1) = x_1, y(z_1) = y_1
x(z_2) = x_2, y(z_2) = y_2
...
x(z_m) = x_m, y(z_m) = y_m
This forms two linear systems of m equations in n unknowns, one for x and one for y. Since m > n the system is overdetermined and doesn't have an exact soluton, but we can still use least squares to find the approximate solution with the smallest mean-squared-error.
#10
Posted 02 February 2009 - 04:39 PM
Also, it now appears rather as though the assumption of equal z-axis spacing doesn't always hold true. Therefore, I need a more general approach. I was thinking one could simply approximate the parameter t for each point and then do what you proposed, except now with three functions x(t), y(t) and z(t) of the form sum(t^i * c_i) for i = 0 to n.
And I had another thought: Can't we just think of the points to be approximated as the spline's control points? In that case, the coefficient c_i for t^i could be calculated easily. Although I just realize that the partial splines would always meet at their respective end control points, thereby making the curve pass through at least a number of points. If I want to find an average path through them, that might not be desired. This could in turn be prevented by using a very-high-order spline. But I somehow doubt that's a good idea.
#11
Posted 02 February 2009 - 05:26 PM
Sniffman said:
If we could solve the equations exactly, then yes. However, as mentioned, the system is overdetermined and so probably cannot be solved exactly. The least-squares algorithm produces a solution that is in a certain sense as close as possible.
Sniffman said:
You could certainly try that. But, just to clarify, the method I outlined doesn't require the z-axis points to be evenly spaced. So, I would still give that a try.
Sniffman said:
Yes, this is another valid approach, and for instance a Catmull-Rom spline does exactly this. However, using all points as control points would lead to a very large number of spline segments, and depending on how much noise there is in the data, it could lead to a "wiggly" curve that is not as smooth as desired.
Another possibility is a hybrid of the two approaches; you could use more than one spline segment, but use least-squares for each one. For instance, you could take the first 20 points and fit a single spline to them, then fit a spline to the next 20 points, etc. You might have to do some fixing up to get them to link end to end smoothly.
#12
Posted 02 February 2009 - 07:43 PM
What do you mean by getting more spline segments when using more control points? The way I see it, the more points used per spline, the larger the degree, up to the maximum of taking all points for one spline.
Well, anyway, I've got plenty to try out now.
#13
Posted 02 February 2009 - 08:08 PM
Sniffman said:
Yes, what I meant by "spline segments" is simply making the whole curve out of multiple splines joined together end to end, where each spline uses some contiguous subset of the points.
#14
Posted 05 February 2009 - 06:33 PM
C'(t) = sum(n * (B_i-1,n-1(t) - B_i,n-1(t)) * P_i) for 0 <= i <= n
I have confirmed the correctness of this. It is stated in a few sources. However, much more often the derivative is expressed as:
C'(t) = sum(B_i,n-1(t) * n * (P_i+1 - P_i)) for 0 <= i <= n-1
How does this second expression follow from the first?
#15
Posted 05 February 2009 - 07:18 PM
Let's look at an example with n = 2. The derivative by the first formula is:
2*(-B_0,1 * P_0 + (B_0,1 - B_1,1) * P_1 + B_1,1 * P_2)
By the second formula it is:
2*(B_0,1 * (P_1 - P_0) + B_1,1 * (P_2 - P_1))
If you multiply them out you'll see they're the same thing. You can probably prove by induction the two formulas are the same for all n, if you like.
#16
Posted 05 February 2009 - 11:05 PM
#17
Posted 05 February 2009 - 11:09 PM
#18
Posted 10 March 2009 - 08:47 PM
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users












