# Thread: Expectation and variance of top 10% of any normal distribution

1. ## Expectation and variance of top 10% of any normal distribution

Short:
With the standard normal curve (i.e. mean = 0, std dev = 1), does the top 10% form its own normal distribution, with an expectation of about 1.74 and a standard deviation of about .40?

Long:
I wrote a little program that takes the numbers 0.005, 0.015, 0.025, ..., 0.995 (100 numbers in all). I take the top 10 (.905 to .995), find their z-scores (1.31, 1.37, ..., 2.58), then sum them and divide by 10 to get 1.7447. I find the variance the same way and get .1578, which is a standard deviation of about .3972.
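A minimal Python sketch of the procedure described above, assuming the standard library's `statistics.NormalDist.inv_cdf` as the z-score lookup. The post doesn't say which divisor the variance uses; `statistics.variance` divides by k - 1, which is an assumption here, and with that choice the printed values should land close to the 1.7447 and .1578 quoted above.

```python
from statistics import NormalDist, mean, variance

# Midpoint probabilities 0.005, 0.015, ..., 0.995 (n = 100), as described in the post.
n = 100
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]  # z-scores, ascending

k = n // 10                 # size of the top 10%
top = z[n - k:]             # z-scores for p = 0.905, ..., 0.995

m_top = mean(top)
v_top = variance(top)       # sample variance: divides by k - 1 (assumed divisor)
print(f"mean {m_top:.4f}, variance {v_top:.4f}, std dev {v_top ** 0.5:.4f}")
```

Changing `n` to 1000 or 10000 repeats the larger runs mentioned further down.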

All seems well so far. Then I take the lower 90 numbers and calculate their expectation: -.194. This checks out, because by the law of iterated expectations the overall expectation comes out to 0, as expected for the standard curve.
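The iterated-expectations check can be sketched the same way (again assuming `NormalDist.inv_cdf` for the z-scores): weight each group mean by its probability, 0.1 for the top group and 0.9 for the rest. The result should be 0 up to floating-point rounding, since the midpoint grid is symmetric about 0.

```python
from statistics import NormalDist, mean

n = 100
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
k = n // 10
top, bottom = z[n - k:], z[:n - k]      # top 10 and lower 90 z-scores

m_top, m_bottom = mean(top), mean(bottom)
# Law of iterated expectations: E[X] = sum over groups of P(group) * E[X | group]
overall = 0.1 * m_top + 0.9 * m_bottom
print(f"E[X|top] = {m_top:.4f}, E[X|bottom] = {m_bottom:.4f}, E[X] = {overall:.1e}")
```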

Next I apply the law of total variance and get an unexpected 1.118 instead of 1. The expectation of the two conditional variances is .658, and the variance of the two conditional expectations is .460.

That didn't seem too bad, since I'm approximating with only 100 numbers. So I ran it again with 1,000 numbers and then with 10,000 numbers. It seems to converge to a variance of about 1.116, which I did not expect.
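For the continuous limit that the finer grids approach, the standard truncated-normal moment formulas give the four conditional moments directly. A sketch, assuming c = Φ^{-1}(0.9) as the cutoff and using `NormalDist.pdf` for φ(c): for X > c the mean is φ(c)/0.1 and the variance is 1 + c·mean - mean²; for X < c the mean is -φ(c)/0.9 and the variance is 1 - c·φ(c)/0.9 - mean².

```python
from statistics import NormalDist

# Closed-form moments of a standard normal split at c = Phi^{-1}(0.9).
nd = NormalDist()
c = nd.inv_cdf(0.9)        # cutoff, about 1.2816
phi_c = nd.pdf(c)          # density at the cutoff, about 0.1755

# Top 10%: X > c, probability mass 0.1
top_mean = phi_c / 0.1
top_var = 1 + c * top_mean - top_mean ** 2

# Bottom 90%: X < c, probability mass 0.9
bot_mean = -phi_c / 0.9
bot_var = 1 - c * phi_c / 0.9 - bot_mean ** 2

print(f"top:    mean {top_mean:.4f}, variance {top_var:.4f}")
print(f"bottom: mean {bot_mean:.4f}, variance {bot_var:.4f}")
```

These come out near 1.755 / 0.169 for the top 10% and -0.195 / 0.712 for the bottom 90%, slightly away from the 100-point grid values because the coarse grid clips the tails.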

Any ideas on the discrepancy or a better way to figure this out?

TIA, Cary

2. ## Re: Expectation and variance of top 10% of any normal distribution

If I understand you correctly, in the derived distribution

$P[X < \Phi^{-1}(0.9)] = 0$

A normal distribution has non-zero probability density over its entire support, the whole real line, so the top 10% cannot itself be normal.

It's true that, given a multivariate normal distribution, conditioning one variable on another produces a normal distribution, but that's not what you're talking about here.
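To illustrate the point about support, here is a sketch on a finer midpoint grid (n = 10,000 is an arbitrary choice, and `NormalDist.inv_cdf` is assumed for the z-scores). It checks that every top-10% value lies above Φ^{-1}(0.9) and that the values are right-skewed; a normal distribution is symmetric, so its skewness is 0.

```python
from statistics import NormalDist, mean, pstdev

nd = NormalDist()
cutoff = nd.inv_cdf(0.9)       # Phi^{-1}(0.9), about 1.2816

# Midpoint grid, finer than the 100 points in the original post.
n = 10_000
z = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
top = [x for x in z if x > cutoff]    # the top 10% of grid points

m, s = mean(top), pstdev(top)
skew = mean(((x - m) / s) ** 3 for x in top)   # population skewness; 0 for any normal
print(f"{len(top)} points, min {min(top):.4f} > cutoff {cutoff:.4f}, skewness {skew:.3f}")
```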

3. ## Re: Expectation and variance of top 10% of any normal distribution

I don't fully recognize your notation, but I don't think so. I probably wasn't clear.

It was straightforward to calculate the expectation and variance of the top 10% using my sample numbers. I got 1.7447 and .1578. For the bottom 90%, it was -0.1939 and .7133.

After that, I wanted to check my work. By the law of iterated expectations (also called the law of total expectation), I did:

$$E[X] = 0.1\,E[X \mid Y=0.1] + 0.9\,E[X \mid Y=0.9] = 0.1(1.7447) + 0.9(-0.194) = 0.17447 - 0.1746 \approx 0$$

Then I wanted to check the variance by the law of total variance:
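$$\begin{aligned}
\operatorname{Var}(X) &= E[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(E[X \mid Y])\\
E[\operatorname{Var}(X \mid Y)] &= 0.1(0.1577) + 0.9(0.7133) = 0.6578\\
\operatorname{Var}(E[X \mid Y]) &= 0.1(0.1577-0)^2 + 0.9(0.7133-0)^2 = 0.4604\\
\operatorname{Var}(X) &= 0.6578 + 0.4604 = 1.118
\end{aligned}$$

But this should equal the variance of the original standard curve, which is 1.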
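The law of total variance can also be written out in code, as a sketch that takes the four conditional moments reported in this post as inputs (the dictionary names are hypothetical). By definition, the second term Var(E[X|Y]) is the probability-weighted squared deviation of the conditional means E[X|Y] from E[X], so this sketch computes it from 1.7447 and -0.1939.

```python
# Conditional moments as reported in this post (top 10% / bottom 90%).
p = {"top": 0.1, "bottom": 0.9}
cond_mean = {"top": 1.7447, "bottom": -0.1939}
cond_var = {"top": 0.1578, "bottom": 0.7133}

overall_mean = sum(p[g] * cond_mean[g] for g in p)                    # E[X]
e_var = sum(p[g] * cond_var[g] for g in p)                            # E[Var(X|Y)]
var_e = sum(p[g] * (cond_mean[g] - overall_mean) ** 2 for g in p)     # Var(E[X|Y])
total = e_var + var_e

print(f"E[Var] = {e_var:.4f}, Var(E) = {var_e:.4f}, total = {total:.4f}")
```

With these inputs the second term comes out near 0.338 and the total near 0.996; the small remaining gap from 1 plausibly reflects the 100-point grid and the n - 1 divisor in the sample variances.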

Again, it's not too far off, but it's enough that I wanted to check. Oddly, using more and more sample numbers in my program doesn't make it much better. Am I doing something wrong?

TIA again,
Cary