Do you have depth information in screen space for the points as well? If not, the transformation isn't uniquely defined: the whole plane could move nearer to or farther from the camera while scaling correspondingly, and the points would still project to the same screen-space locations.

If you do have depth information, you can trace rays from the camera through those pixels to the given depth to find the points in eye space. Then you should be able to solve for a 3x3 matrix connecting the object-space points to the eye-space points; you’ll have 9 unknowns (the matrix elements) and 9 constraints (3 for each point), and it should be linear, so the system should be straightforward to solve.
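As a minimal sketch of that linear solve (assuming NumPy; the eye-space points would in practice come from unprojecting each pixel to its known depth, and here are generated from a made-up ground-truth matrix so the recovery can be checked):

```python
import numpy as np

# Object-space (local) coordinates of the three points, stacked as rows.
# For the 3x3 system to be solvable they must be linearly independent,
# i.e. their plane must not pass through the local origin.
P_obj = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

# Eye-space coordinates, as found by tracing rays through the pixels
# to the given depths. Generated here from an illustrative ground truth.
M_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 2.0]])
P_eye = P_obj @ M_true.T

# We want M with M @ p_obj = p_eye for each point. With points stacked
# as rows, that reads P_obj @ M.T = P_eye: one linear solve.
M = np.linalg.solve(P_obj, P_eye).T

print(np.allclose(M, M_true))  # True
```

The row-stacking trick turns the nine scalar equations into a single 3×3 solve, which `np.linalg.solve` handles directly.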

Note however that a 3x3 matrix cannot represent translation, so this will only find a strictly linear transformation that maps the required points. That transformation could be badly distorted, or may not exist at all: a linear map always fixes the origin, so if one of the points sits at the origin in local space (or, more generally, if the three points are linearly dependent) the system breaks down. If you want to include translation, that gives you 3 more unknowns to solve for, but you still have only 9 constraints, so the system is underdetermined. You'll have to think of 3 additional constraints to impose.
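Another route, instead of adding constraints, is to shrink the unknowns: if the plane moves rigidly (no scale or shear is acceptable for your use case), a rotation plus translation has only 6 degrees of freedom and can be recovered from the 3 point pairs with the standard Kabsch/Procrustes method. A sketch, with an illustrative made-up rotation and translation:

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch/Procrustes: find rotation R and translation t
    with Q ~= P @ R.T + t, for point sets stacked as rows."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Three local-space points and their eye-space images under an
# illustrative rigid motion (90-degree rotation about z, plus offset).
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([0.5, -0.25, -4.0])
Q = P @ R_true.T + t_true

R, t = rigid_transform(P, Q)
print(np.allclose(R, R_true) and np.allclose(t, t_true))  # True
```

The determinant check matters here: with only 3 points the cross-covariance is rank-deficient, and without it the SVD can return a reflection instead of a proper rotation.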

Can anyone help me solve this problem?

I have three known world-space xyz points located on a plane. All three of these points are visible to the camera.

The camera's frustum is known, and the locations in the camera's pixel space of each of the three points of the plane are also known.

From this information, is it possible to calculate the transformation matrix of the plane? I can only think of an iterative method, which doesn't seem very robust.

Thanks in advance!