Least Squares Approximation Theorem:

Let $\displaystyle f$ be continuous on $\displaystyle [a,b]$, and let $\displaystyle W$ be a finite-dimensional subspace of $\displaystyle C[a,b]$. The least squares approximating function of $\displaystyle f$ with respect to $\displaystyle W$ is given by

$\displaystyle g = \langle f,\mathbf{w_1} \rangle \mathbf{w_1} + \langle f,\mathbf{w_2} \rangle \mathbf{w_2} + ... + \langle f,\mathbf{w_n} \rangle \mathbf{w_n}, $

$\displaystyle \text{where } B = \{\mathbf{w_1}, \mathbf{w_2}, ..., \mathbf{w_n}\} \text{ is an orthonormal basis for } W.$
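As a numerical sanity check of the statement, here is a minimal sketch. It assumes the inner product on $\displaystyle C[a,b]$ is the usual $\displaystyle \langle u,v \rangle = \int_a^b u(x)v(x)\,dx$, which the theorem as quoted does not spell out. The choices $\displaystyle f(x) = e^x$, $\displaystyle [a,b] = [0,1]$ and $\displaystyle W = \text{span}\{1, x\}$ are mine, purely for illustration, and the integrals are approximated with Gauss-Legendre quadrature.

```python
import numpy as np

# Gauss-Legendre nodes and weights on [-1, 1], mapped to [a, b] = [0, 1].
a, b = 0.0, 1.0
nodes, weights = np.polynomial.legendre.leggauss(50)
x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
wts = 0.5 * (b - a) * weights

def inner(u, v):
    """Approximate <u, v> = integral of u(x) v(x) over [a, b] (assumed inner product)."""
    return float(np.sum(wts * u * v))

def norm(u):
    """Norm induced by the inner product above."""
    return np.sqrt(inner(u, u))

# f sampled at the quadrature nodes (illustrative choice, not from the theorem).
f = np.exp(x)

# An orthonormal basis of W = span{1, x} on [0, 1].
w1 = np.ones_like(x)
w2 = np.sqrt(3.0) * (2.0 * x - 1.0)

# g = <f, w1> w1 + <f, w2> w2, as in the theorem.
g = inner(f, w1) * w1 + inner(f, w2) * w2

# f - g should be orthogonal to each basis vector.
residual_dots = [inner(f - g, w1), inner(f - g, w2)]

# Compare ||f - g|| with ||f - w|| for many randomly chosen w in W.
rng = np.random.default_rng(0)
best = norm(f - g)
others = []
for _ in range(1000):
    c1, c2 = rng.normal(size=2) * 3.0
    others.append(norm(f - (c1 * w1 + c2 * w2)))

print("inner products of f - g with w1, w2:", residual_dots)
print("||f - g|| =", best, " smallest ||f - w|| found =", min(others))
```

Every $\displaystyle \mathbf{w}$ in $\displaystyle W$ has the form $\displaystyle c_1\mathbf{w_1} + c_2\mathbf{w_2}$, so sampling the coefficients at random is one way to probe the inequality; it illustrates the theorem but of course does not prove it.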

Proof:

To show that $\displaystyle g$ is the least squares approximating function of $\displaystyle f$, prove that the inequality

$\displaystyle \parallel f-g \parallel \,\, \leq \,\, \parallel f-\mathbf{w} \parallel $

is true for any vector $\displaystyle \mathbf{w}$ in $\displaystyle W$. By writing $\displaystyle f-g$ as

$\displaystyle f - g = f - \langle f,\mathbf{w_1} \rangle \mathbf{w_1} - \langle f,\mathbf{w_2} \rangle \mathbf{w_2} - ... - \langle f,\mathbf{w_n} \rangle \mathbf{w_n} $

you can see that $\displaystyle f-g$ is orthogonal to each $\displaystyle \mathbf{w_i}$, which in turn implies that it is orthogonal to each vector in $\displaystyle W$. In particular, $\displaystyle f-g$ is orthogonal to $\displaystyle g-\mathbf{w}$.
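Spelling out the orthogonality step as I understand it (this check is my own, not part of the book's proof, and it uses only that $\displaystyle B$ is orthonormal, i.e. $\displaystyle \langle \mathbf{w_j},\mathbf{w_i} \rangle$ is $\displaystyle 1$ when $\displaystyle i = j$ and $\displaystyle 0$ otherwise): for each $\displaystyle i$,

$\displaystyle \langle f-g,\mathbf{w_i} \rangle = \langle f,\mathbf{w_i} \rangle - \sum_{j=1}^{n} \langle f,\mathbf{w_j} \rangle \langle \mathbf{w_j},\mathbf{w_i} \rangle = \langle f,\mathbf{w_i} \rangle - \langle f,\mathbf{w_i} \rangle = 0. $

Since every vector $\displaystyle \mathbf{v}$ in $\displaystyle W$ is a linear combination $\displaystyle c_1\mathbf{w_1} + c_2\mathbf{w_2} + ... + c_n\mathbf{w_n}$, linearity of the inner product then gives

$\displaystyle \langle f-g,\mathbf{v} \rangle = c_1 \langle f-g,\mathbf{w_1} \rangle + c_2 \langle f-g,\mathbf{w_2} \rangle + ... + c_n \langle f-g,\mathbf{w_n} \rangle = 0. $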

$\displaystyle \text{...the rest of the proof continues} $

-------------Proof Ends------------

My question has two parts.

First question:

Why is proving

$\displaystyle \parallel f-g \parallel \,\, \leq \,\, \parallel f-\mathbf{w} \parallel $

enough for this proof? There might be other vectors that give a smaller least squares error. Why choose a particular orthonormal basis?

Second question:

What is the reason behind the implicit statement that $\displaystyle g-\mathbf{w}$ is in the subspace $\displaystyle W$?

What reasoning makes $\displaystyle g-\mathbf{w} \in W $? Is there a theorem I am not remembering that answers this?