where Y is an (n x 1) random vector, X is an (n x p) matrix of known coefficients, and beta is a (p x 1) vector of unknown parameters; epsilon is an (n x 1) vector of random errors (mean 0, typically with iid normal components).
A reasonable goal under these circumstances is to minimize

    Q(beta) = ||Y - X beta||^2,

where ||.||^2 is the squared Euclidean length of a vector. It can be shown that the value of beta that minimizes Q is given by

    betahat = (X'X)^(-1) X'Y,

provided X'X is invertible (i.e., X has full column rank).
This is the least squares estimate of beta. The proof is not very difficult if you have the appropriate machinery from matrix algebra (projections and so forth); with a little more effort, and less background, you can also show it via matrix calculus: set the gradient of Q(beta), which is -2X'(Y - X beta), equal to zero to obtain the normal equations X'X beta = X'Y, and note that the Hessian, 2X'X, is positive definite whenever X has full column rank.
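As a quick numerical sanity check (a minimal sketch assuming NumPy and simulated data; the array names and the chosen dimensions are illustrative, not from the original text), the closed-form estimate can be obtained by solving the normal equations and compared against a general-purpose least squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3
X = rng.normal(size=(n, p))               # (n x p) design matrix
beta_true = np.array([1.0, -2.0, 0.5])    # illustrative "true" parameters
epsilon = rng.normal(scale=0.3, size=n)   # iid normal errors, mean 0
Y = X @ beta_true + epsilon               # the linear model Y = X beta + epsilon

# Closed-form least squares estimate betahat = (X'X)^(-1) X'Y,
# computed by solving the normal equations X'X beta = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with NumPy's general least squares solver,
# which also minimizes ||Y - X beta||^2.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))  # True: both give the same minimizer
```

Solving the normal equations directly (or using a dedicated least squares routine) is generally preferred to forming (X'X)^(-1) explicitly, since it is numerically more stable.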