I'm trying to understand the following least squares example:

You have three lines:

2x - y = 2

x + 2y = 1

x + y = 4

The lines do not have a common intersection point, so the example is using least squares to find a point that minimizes the distance to the lines.

It goes on to build the matrices:

A (3x2) = [ (2 1 1)^T  (-1 2 1)^T ]   (the two columns of A; its rows are (2 -1), (1 2), (1 1))

b(3x1) = (2 1 4)^T

And then uses the least squares equation:

A^T A x* = A^T b

to come up with the least squares solution, x*.
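
To make the numbers concrete, here is a minimal Python sketch of that computation for this A and b. The helper names `normal_equations` and `solve_2x2` are my own, not from the example, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Rows of A hold the coefficients of x and y in each line; b holds the right-hand sides.
A = [[2, -1], [1, 2], [1, 1]]
b = [2, 1, 4]

def normal_equations(A, b):
    """Return (A^T A, A^T b) for an m x 2 matrix A (hypothetical helper)."""
    ata = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
    atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]
    return ata, atb

def solve_2x2(M, v):
    """Solve M z = v by Cramer's rule with exact fractions (hypothetical helper)."""
    det = Fraction(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    z0 = (v[0] * M[1][1] - M[0][1] * v[1]) / det
    z1 = (M[0][0] * v[1] - v[0] * M[1][0]) / det
    return [z0, z1]

ata, atb = normal_equations(A, b)
x_star = solve_2x2(ata, atb)

# Residual of each equation at x*, i.e. the entries of A x* - b.
residuals = [row[0] * x_star[0] + row[1] * x_star[1] - bi for row, bi in zip(A, b)]

print("A^T A =", ata)          # [[6, 1], [1, 6]]
print("A^T b =", atb)          # [9, 4]
print("x*    =", x_star)       # 10/7 and 3/7
print("A x* - b =", residuals) # 3/7, 9/7, -15/7
```

So for this example x* = (10/7, 3/7), and none of the three equations is satisfied exactly at that point.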

I understand how least squares works. I also understand how it is used to come up with a solution given the matrices A and b.

We are minimizing the distance from b to the column space of A. The column space is a plane in R^3. I don't understand how this plane relates to our original problem. Why would the closest point to b in that plane be the closest point to the three lines in our initial problem formulation? That is, does the least squares solution represent the closest point to the lines as measured by perpendicular distances from the solution to each line, and if so, why? Or does it represent the closest point as measured by y-differences between the solution and each line, and if so, why? Or does it represent something else entirely, and if so, what?
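
For reference, the column-space picture I am describing can be checked numerically. This is only a sketch: it takes x* = (10/7, 3/7), which is what solving A^T A x* = A^T b gives for this A and b, and the variable names are my own.

```python
from fractions import Fraction

A = [[2, -1], [1, 2], [1, 1]]
b = [2, 1, 4]
# Assumed value: the solution of A^T A x* = A^T b for this A and b.
x_star = [Fraction(10, 7), Fraction(3, 7)]

# p = A x* is the point in the column-space plane closest to b.
p = [row[0] * x_star[0] + row[1] * x_star[1] for row in A]

# e = b - p is the error vector in R^3.
e = [bi - pi for bi, pi in zip(b, p)]

# e is orthogonal to both columns of A, i.e. A^T e = 0.
at_e = [sum(row[j] * ei for row, ei in zip(A, e)) for j in range(2)]

# Squared length of e, the quantity that least squares minimizes.
squared_error = sum(ei * ei for ei in e)

print("p =", p)
print("e =", e)
print("A^T e =", at_e)
print("|e|^2 =", squared_error)
```

Each entry of e is the amount by which one of the three line equations fails at x*. What I cannot see is how those three numbers relate to geometric distances from x* to the lines in the plane.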