# Thread: point estimate question: sufficiency, bias, order stat

1. ## point estimate question: sufficiency, bias, order stat

Question.

Suppose that $\displaystyle Y_1, ... Y_n$ is a random sample from a population with density $\displaystyle f_Y(y)=\frac{2y}{\theta^2}$ for $\displaystyle 0<y\leq\theta$, where $\displaystyle \theta>0$.

(a) is $\displaystyle Y_{(n)}$ a sufficient statistic for $\displaystyle \theta$?

(b) Derive the density of the last order statistic $\displaystyle Y_{(n)}$.

(c) Find an unbiased estimator of $\displaystyle \theta$ that is a function of $\displaystyle Y_{(n)}$.

(d) By considering the condition that a function of $\displaystyle Y_{(n)}$ is unbiased for $\displaystyle \theta$, determine whether there is a better unbiased estimator for $\displaystyle \theta$.

I've just started statistical inference and my confidence is so low, I feel like I'm treading on eggshells... basically I worked through most of the proofs but I don't see how to apply them. This is a sample exam question (to which I don't have an answer), so I'd appreciate your pointers.

(a) sufficiency of $\displaystyle Y_{(n)}$

Should I apply the factorisation theorem here to factorise the joint density of the sample into two parts: one a function of $\displaystyle Y_{(n)}$ and $\displaystyle \theta$, the other a function of the data only?

Another thought: given that $y$ is bounded above by $\displaystyle \theta$, could it be that $y$ somehow depends on $\displaystyle \theta$, and therefore the highest order statistic $\displaystyle Y_{(n)}$ cannot be a sufficient statistic simply because of this dependency?

(b) Density for $\displaystyle Y_{(n)}$

First integrate $\displaystyle f_Y(y)$ to get the CDF:

$\displaystyle F_Y(y)=\frac{2}{\theta^2}\frac{y^2}{2}=\frac{y^2}{\theta^2}, \quad 0<y\leq\theta$ - for a single observation

$\displaystyle F_{Y_{(n)}}(y)=P(Y_{(n)}\leq y)=[F_Y(y)]^n=\left(\frac{y^2}{\theta^2}\right)^n=\frac{y^{2n}}{\theta^{2n}}, \quad 0<y\leq\theta$

$\displaystyle f_{Y_{(n)}}(y)=\frac{d}{dy}F_{Y_{(n)}}(y)=n\left(\frac{y^2}{\theta^2}\right)^{n-1}\frac{2y}{\theta^2}=\frac{2n\,y^{2n-1}}{\theta^{2n}}$
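Not part of the question, but a quick simulation can sanity-check this CDF. Since $\displaystyle F_Y(y)=y^2/\theta^2$, inverse-transform sampling gives $Y=\theta\sqrt{U}$ for uniform $U$; the values of $\theta$, $n$ and the test point below are arbitrary choices for illustration:

```python
import random

# Sanity check of F_{Y_(n)}(y) = (y/theta)^(2n), derived above.
# F_Y(y) = y^2/theta^2, so inverse-transform sampling gives Y = theta*sqrt(U).
random.seed(0)
theta, n, reps = 2.0, 5, 100_000
y0 = 1.5  # arbitrary test point in (0, theta)

hits = 0
for _ in range(reps):
    y_max = max(theta * random.random() ** 0.5 for _ in range(n))
    hits += (y_max <= y0)

empirical = hits / reps
theoretical = (y0 / theta) ** (2 * n)
print(round(empirical, 4), round(theoretical, 4))
```

The two printed numbers should agree to a couple of decimal places.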

(c) unbiased estimator of $\displaystyle \theta$ that is a function of $\displaystyle Y_{(n)}$

I cannot 'see' it just from looking at the distribution, so I was thinking of using the definition of an unbiased estimator, but the integration seems like a puzzle.

Define a function $\displaystyle g(Y_{(n)})$ of $\displaystyle Y_{(n)}$. Then
$\displaystyle E[g(Y_{(n)})]=\theta$ by the definition of an unbiased estimator.

Then I would try to find $\displaystyle g(Y_{(n)})$ from the equation

$\displaystyle E[g(Y_{(n)})]=\int_0^{\theta}g(y)f_{Y_{(n)}}(y)\,dy=\theta$

If I do that, I get an integral of a product of $\displaystyle g(y)$ and $\displaystyle y^{2n-1}$ on the left side, and an expression involving $\displaystyle \theta$ and $n$ on the right side - is it solvable?

2. the largest order stat is suff for theta

I believe that

$\displaystyle E(Y_{(n)})={2n\theta\over 2n+1}$

If that's right, then $\displaystyle {2n+1\over 2n}Y_{(n)}$ is unbiased for theta
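If it helps to convince yourself, here is a quick Monte Carlo check of both claims (the sampler $Y=\theta\sqrt{U}$ comes from inverting $\displaystyle F_Y(y)=y^2/\theta^2$; $\theta$ and $n$ below are arbitrary choices):

```python
import random

# Monte Carlo check that E[Y_(n)] = 2n*theta/(2n+1), so that
# (2n+1)/(2n) * Y_(n) has expectation theta (unbiasedness).
random.seed(1)
theta, n, reps = 3.0, 4, 200_000

total = 0.0
for _ in range(reps):
    total += max(theta * random.random() ** 0.5 for _ in range(n))

mean_max = total / reps
print(round(mean_max, 3))                          # should be near 2n*theta/(2n+1) = 8/3
print(round((2 * n + 1) / (2 * n) * mean_max, 3))  # should be near theta = 3
```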

3. ## expected value of i-th order statistic

One more (general) question related to (c): how do you find the expected value of the $i$-th order statistic - do you use the usual formula (summation or integration)

$\displaystyle \int_{\text{range of }Y}Y_{(i)}\,f_{Y_{(i)}}(y)\,dy$

I can find the density of the $i$-th order statistic, but what do I put as $\displaystyle Y_{(i)}$? $y$? Or do I need to derive a formula for $\displaystyle Y_{(i)}$ based on the given distribution?

4. Originally Posted by matheagle
(that took 1 minute, by the way)
That took you n years of studying and practising maths, right?
I wasn't able to integrate AT ALL before last August. I didn't know anything about order statistics three days ago.

5. $\displaystyle \int yf_{Y_{(i)}}(y)dy$

Lowercase $y$. $\displaystyle Y_{(i)}$ is the name/label of the rv;
$y$ is the dummy variable we are summing/integrating over.

$\displaystyle \int yf_{Y_{(i)}}(y)dy=\int uf_{Y_{(i)}}(u)du$

I always use f(u) in class
it's the only time I can say fu without getting in trouble.
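For reference (this is the standard textbook formula, not specific to this problem), the density of the $i$-th order statistic of an iid sample of size $n$ is

$\displaystyle f_{Y_{(i)}}(y)=\frac{n!}{(i-1)!\,(n-i)!}\,[F_Y(y)]^{i-1}\,[1-F_Y(y)]^{n-i}\,f_Y(y)$

and its expectation is the usual integral $\displaystyle E[Y_{(i)}]=\int y\,f_{Y_{(i)}}(y)\,dy$ over the support. For $i=n$ this reduces to $\displaystyle n[F_Y(y)]^{n-1}f_Y(y)$, which matches the density derived for $\displaystyle Y_{(n)}$ above.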

6. Originally Posted by matheagle
the largest order stat is suff for theta
Can I ask how you reached this conclusion? Is it that obvious, or were some mathematical manipulations performed?

Originally Posted by matheagle
I believe that

$\displaystyle E(Y_{(n)})={2n\theta\over 2n+1}$

If that's right, then $\displaystyle {2n+1\over 2n}Y_{(n)}$ is unbiased for theta
I see the logic now...

7. the likelihood function factors into the five parts

$\displaystyle {2^n\prod y_i\over \theta^{2n}} I(Y_{(1)}>0) I(Y_{(n)}<\theta)$

The only part involving the data that cannot be separated from $\displaystyle \theta$ is the one with the largest order stat.
That is always the case when the parameter is an upper bound for the random variables.
When the parameter is a lower bound, the smallest order stat is suff for that parameter.
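Spelling the factorisation out (the same five factors, grouped for the factorisation theorem):

$\displaystyle \prod_{i=1}^n f_Y(y_i)=\underbrace{\frac{1}{\theta^{2n}}\,I(y_{(n)}<\theta)}_{g(y_{(n)};\,\theta)}\cdot\underbrace{2^n\Big(\prod_{i=1}^n y_i\Big)I(y_{(1)}>0)}_{h(y_1,\dots,y_n)}$

The first factor depends on the data only through $y_{(n)}$, and the second does not involve $\displaystyle \theta$ at all, so $\displaystyle Y_{(n)}$ is sufficient for $\displaystyle \theta$.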

8. Originally Posted by matheagle
the likelihood function factors into the five parts

$\displaystyle {2^n\prod y_i\over \theta^{2n}} I(Y_{(1)}>0) I(Y_{(n)}<\theta)$

The only part involving the data that cannot be separated from $\displaystyle \theta$ is the one with the largest order stat.
That is always the case when the parameter is an upper bound for the random variables.
When the parameter is a lower bound, the smallest order stat is suff for that parameter.
Thanks, I got it. I factorised initially, but I missed out the indicator functions.

9. Originally Posted by Volga
That took you n years of studying and practising maths, right?
I wasn't able to integrate AT ALL before last August. I didn't know anything about order statistics three days ago.
According to certain people it's N years

10. Is that to say N>>n?

11. according to moo n mr fantasy, way bigger
maybe N!

12. Originally Posted by matheagle
the likelihood function factors into the five parts

$\displaystyle {2^n\prod y_i\over \theta^{2n}} I(Y_{(1)}>0) I(Y_{(n)}<\theta)$
So, to continue to part (d), I think I need to check 'attaining the Cramer-Rao bound'.

Given the likelihood function for $n$ observations above, the log-likelihood is $\displaystyle l(\theta;y)=n\ln 2+\sum_{i=1}^n\ln y_i-2n\ln\theta$, for $0<y_{(n)}\leq\theta$,

$\displaystyle s(\theta;y)=\frac{d}{d\theta}l(\theta;y)=-\frac{2n}{\theta}$

Now I think I should check the condition for attaining the Cramer-Rao lower bound: whether or not the score function is linear in the estimator,

$\displaystyle s(\theta;y)=b(\theta)[h(y)-g(\theta)]$

$\displaystyle -\frac{2n}{\theta}=b(\theta)\left[\frac{2n+1}{2n}Y_{(n)}-g(\theta)\right]$

Should I find $b$ and $g$ of theta here? The score does not involve the data at all, so I don't see how it could ever be written in this form with $\displaystyle h(y)=\frac{2n+1}{2n}Y_{(n)}$.

I am just copying the formula from the book. How do N! mathematicians do that?...

13. Another attempt at (d): by considering the condition that a function of $\displaystyle Y_{(n)}$ is unbiased for $\displaystyle \theta$, determine whether there is a better unbiased estimator for $\displaystyle \theta$.

If $\displaystyle {2n+1\over 2n}Y_{(n)}$ is an unbiased estimator for $\displaystyle \theta$ based on a random sample $\displaystyle Y_1, ... Y_n$, then the Cramer-Rao inequality says

$\displaystyle Var\left({2n+1\over 2n}Y_{(n)}\right)\geq\frac{1}{nI(\theta)}$ ($I$ here denotes Fisher information - what is the LaTeX code for this letter? Is it Greek?...)

With the per-observation information $\displaystyle I(\theta)=E\left[\left(\frac{\partial}{\partial\theta}\ln f_Y(Y)\right)^2\right]=\frac{4}{\theta^2}$,

$\displaystyle Var\left({2n+1\over 2n}Y_{(n)}\right)\geq\frac{1}{nI(\theta)}=\frac{\theta^2}{4n}$

Now I find the variance of my unbiased estimator and compare it to this variance threshold.

$\displaystyle Var\left({2n+1\over 2n}Y_{(n)}\right)=\frac{(2n+1)^2}{4n^2}Var(Y_{(n)})$

$\displaystyle E[Y_{(n)}^2]=\int_0^{\theta}y^2\,n\left(\frac{y^2}{\theta^2}\right)^{n-1}\frac{2y}{\theta^2}\,dy=\frac{2n}{\theta^{2n}}\int_0^{\theta}y^{2n+1}\,dy=\frac{2n\,\theta^2}{2n+2}$

then $\displaystyle Var(Y_{(n)})=E[Y_{(n)}^2]-[E(Y_{(n)})]^2=\frac{2n\,\theta^2}{2n+2}-\left(\frac{2n\theta}{2n+1}\right)^2=\theta^2\left[\frac{2n}{2n+2}-\frac{4n^2}{(2n+1)^2}\right]$

$\displaystyle Var\left({2n+1\over 2n}Y_{(n)}\right)=\frac{(2n+1)^2}{4n^2}\,\theta^2\left[\frac{2n}{2n+2}-\frac{4n^2}{(2n+1)^2}\right]=...=\frac{\theta^2}{4n^2+4n}$

Note the variance is proportional to $\theta^2$, as it should be (my first pass had a stray $\theta^2$ in the constant and everything cancelled).

Comparing this with $\displaystyle \frac{\theta^2}{4n}$: the variance $\displaystyle \frac{\theta^2}{4n^2+4n}$ is actually smaller than the bound. I gather this is possible because the support of $f_Y$ depends on $\theta$, so the regularity conditions behind the Cramer-Rao bound fail here and the bound does not apply to this family.
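A quick simulation can check the variance formula (again using the sampler $Y=\theta\sqrt{U}$ obtained by inverting $\displaystyle F_Y$; $\theta$ and $n$ are arbitrary choices):

```python
import random

# Monte Carlo check of Var((2n+1)/(2n) * Y_(n)) = theta^2/(4n^2 + 4n).
random.seed(2)
theta, n, reps = 2.0, 5, 200_000

c = (2 * n + 1) / (2 * n)
vals = [c * max(theta * random.random() ** 0.5 for _ in range(n))
        for _ in range(reps)]
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps

print(round(var, 4), round(theta ** 2 / (4 * n ** 2 + 4 * n), 4))
```

The two printed values should agree to about three decimal places.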

14. Seriously, what is the name of the letter used as notation for Fisher information?