Computing an object's transformation matrix from intersecting pixel coordinates

lynedavid 101 Nov 28, 2008 at 14:23

Can anyone help me solve this problem?

I have three known world-space xyz points located on a plane. All three of these points are visible to the camera.
The camera's frustum is known, and the locations of each of the three points in the camera's pixel space are also known.

From this information, is it possible to calculate the transformation matrix of the plane? I can only think of an iterative method, which doesn't seem very robust.

Thanks in advance!

5 Replies


Reedbeta 167 Nov 28, 2008 at 19:18

Do you have depth information in screen space for the points as well? If not, the transformation isn’t uniquely defined because the whole plane could move farther or nearer to the camera, scaling correspondingly, and the points would stay at the same screen space locations.

If you do have depth information, you can trace rays from the camera through those pixels to the given depth to find the points in eye space. Then you should be able to solve for a 3x3 matrix connecting the object-space points to the eye-space points; you’ll have 9 unknowns (the matrix elements) and 9 constraints (3 for each point), and it should be linear, so the system should be straightforward to solve.

Note however that 3x3 matrices don’t include translation, so this will only look for a strictly linear transformation that maps the required points. This could be any weird thing or even not exist at all (if one of the points is at the origin in local space for instance). If you want to include translation, that gives you 3 more unknowns to solve for, but you still have only 9 constraints, so the system is underdetermined. You’ll have to think of 3 additional constraints to impose.
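
Concretely, that 3x3 solve is just a small linear system. A minimal numpy sketch (the coordinates below are made-up example values; np.linalg.solve will throw a singular-matrix error in exactly the degenerate cases mentioned above):

```python
import numpy as np

# Made-up example data: three points in the plane's object space and the
# corresponding eye-space points recovered by tracing rays through the
# pixels to the known depths.
p_obj = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])   # one point per row
p_eye = np.array([[2.0, 0.5, -5.0],
                  [0.3, 1.8, -4.0],
                  [0.1, 0.2, -6.0]])

# We want a 3x3 matrix M with  M @ p_obj[i] == p_eye[i]  for each i.
# Stacking the points as rows gives  p_obj @ M.T == p_eye,  which is a
# standard linear system in the nine unknown matrix elements.
M = np.linalg.solve(p_obj, p_eye).T

print(M @ p_obj[0])   # should reproduce p_eye[0]
```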

lynedavid 101 Nov 28, 2008 at 23:41

@Reedbeta

Do you have depth information in screen space for the points as well? If not, the transformation isn’t uniquely defined because the whole plane could move farther or nearer to the camera, scaling correspondingly, and the points would stay at the same screen space locations.


I don't have depth information. However, I do know the positions of the points in world xyz space. So surely that determines the intersections to be a certain distance down the view rays?

Reedbeta 167 Nov 29, 2008 at 00:08

I may have misunderstood what you’re trying to do. Are you trying to solve for the plane’s local-to-world matrix? If so, you need a reference for the points in local space, and the camera doesn’t come into the problem at all. Are you trying to solve for the world-to-camera matrix? Then no, without depth there’s not enough information - think of a camera that’s close to the plane with a wide field of view, vs one that’s far from it with a narrow field of view, and you could end up with the points in the same location in screen space. Unless you also know the camera’s field of view ahead of time, in which case you can put more constraints on the problem.

lynedavid 101 Nov 29, 2008 at 00:12

I'll go into a little more detail to explain exactly what I'm trying to achieve, as I originally worded it in a way that didn't go into specifics.

Firstly, the camera that I mention is actually a real camera (a Canon G9 that I'm operating remotely).

I know the focal length of the camera, so I can basically work out its perspective matrix (there's also a bit of lens distortion, but it's not too bad).
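
For illustration, a minimal sketch of the pinhole intrinsics matrix that can be built from the focal length (the focal length, sensor width and resolution below are placeholder values rather than actual G9 specs, and lens distortion is ignored):

```python
import numpy as np

# Pinhole-camera intrinsics from the focal length (placeholder numbers,
# not real Canon G9 specifications; distortion is ignored here).
focal_length_mm = 7.4
sensor_width_mm = 7.6
image_width_px, image_height_px = 4000, 3000

fx = focal_length_mm / sensor_width_mm * image_width_px  # focal length in pixels
fy = fx                                                  # assume square pixels
cx, cy = image_width_px / 2.0, image_height_px / 2.0     # principal point at the centre

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
print(K)
```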

The camera is located anywhere in the room but is always facing the inside of a sphere. My problem is that I need
to calculate the location of the camera in relation to the centre of the sphere.

In order to calculate this I figure that I need to physically mark three arbitrary points on the inside of the sphere that the camera can see. I can measure the XYZ locations of each of these points relative to the sphere's centre.

The resulting camera position would be a best-fit solution.

I can do this with a messy iterative method that homes in on the solution, but I'd prefer a non-iterative method.
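
For what it's worth, this is essentially the classic perspective-three-point (P3P) pose problem, for which closed-form solutions exist. A minimal non-iterative sketch, assuming OpenCV's solveP3P is available (all the measurements below are placeholder values; with only three points there are up to four candidate poses, so a fourth mark or some other constraint is needed to pick the right one):

```python
import numpy as np
import cv2  # OpenCV; solveP3P is assumed to be available in this build

# Placeholder measurements: three marked points on the inside of the
# sphere (relative to the sphere's centre) and their observed pixel
# coordinates in the photo.
object_points = np.array([[ 100.0,   0.0, 480.0],
                          [-120.0,  80.0, 470.0],
                          [  30.0, -90.0, 490.0]], dtype=np.float64)
image_points  = np.array([[1812.0, 1403.0],
                          [2290.0, 1577.0],
                          [1975.0, 1120.0]], dtype=np.float64)

K = np.array([[3900.0,    0.0, 2000.0],   # intrinsics built from the focal length
              [   0.0, 3900.0, 1500.0],   # (placeholder values)
              [   0.0,    0.0,    1.0]])
dist = np.zeros(5)                        # ignore lens distortion here

# P3P returns up to four candidate poses; a fourth marked point (or another
# constraint) is needed to disambiguate.
n_solutions, rvecs, tvecs = cv2.solveP3P(object_points, image_points, K, dist,
                                         flags=cv2.SOLVEPNP_P3P)

for rvec, tvec in zip(rvecs, tvecs):
    R, _ = cv2.Rodrigues(rvec)
    camera_position = -R.T @ tvec         # camera centre in sphere coordinates
    print(camera_position.ravel())
```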

Reedbeta 167 Nov 29, 2008 at 05:20

Ahh, I see. That makes a lot more sense.

Is the camera's direction also constrained, e.g. the camera is on a tripod and can only pan, not tilt? Or pan and tilt, but not gimbal (roll)? That could help constrain the problem.

One thing you may be able to do is look at the triangle formed by the three points. Each edge of the triangle should belong to a plane through the camera location. If you write down the three plane equations, then requiring the camera point to be on all three may be enough to solve the problem.
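
If it helps, a minimal sketch of that last step, assuming the three plane equations n_i . x = d_i have already been written down (the normals and offsets below are placeholders):

```python
import numpy as np

# Intersect three planes  n_i . x = d_i  to recover the camera location.
# The normals and offsets below are placeholder values; in practice they
# would come from the three edge planes described above.
normals = np.array([[0.2, 0.9, 0.4],
                    [0.7, 0.1, 0.7],
                    [0.5, 0.5, 0.1]])
offsets = np.array([1.0, 2.0, 0.5])

# Solving  normals @ x = offsets  gives the unique intersection point,
# provided the three planes are not (near-)parallel.
camera_position = np.linalg.solve(normals, offsets)
print(camera_position)
```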