Standard deviation...Confused about 3 sigmas rule

Greetings. I am new to this forum so thank you in advance for the help.

From what I understand, there is a "3 sigmas" rule applied to standard deviation, which states that about 68.3% of a set of numbers will lie within one standard deviation of the mean, about 95.4% will lie within two standard deviations of the mean, and about 99.7% will lie within three standard deviations of the mean.

So when I try to apply this rule, I go into Excel and make a column of 100 numbers in order from 1 to 100. The mean is 50.5. Excel's standard deviation function gives a standard deviation of 28.87 for this set. Adding the standard deviation to the mean gives 79.4. Subtracting the standard deviation from the mean gives 21.6. So, according to the 3 sigmas rule, shouldn't about 68 of the numbers lie between 21.6 and 79.4? In fact there are 58.

Furthermore, adding two sigmas to the mean gives 108.2, which is above 100, the maximum number in the list.

I must be failing to apply this correctly. Please help and thanks again!

Hugh

Re: Standard deviation...Confused about 3 sigmas rule

The "3 sigma" rule that you referenced applies to normally distributed data, and your list of 1 through 100 is not normally distributed. One standard deviation is not defined as the range that covers 68.3% of the data. The actual definition of the (sample) standard deviation for $\displaystyle N$ discrete data points is:

$\displaystyle \sigma = \sqrt{ \frac 1 {N-1} \sum_{i=1}^N (x_i - \bar x)^2} $

where $\displaystyle x_i$ and $\displaystyle \bar x $ are the $\displaystyle i^{th}$ data point and the mean, respectively.

This is the formula that Excel's "stdev" function uses. It turns out that if the data is normally distributed, then the range within one standard deviation of the mean covers about 68% of the values.
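To make this concrete, here is a minimal Python sketch that repeats the check from the question and compares it with simulated bell-shaped data. The helper name `fraction_within`, the seed, and the simulated sample size are my own choices for illustration, not anything from Excel. One side note: the 28.87 quoted in the question matches the divide-by-$\displaystyle N$ version of the formula, while the $\displaystyle N-1$ version above gives about 29.01 for this list. Either way, 58 of the 100 numbers fall within one standard deviation of the mean.

```python
import random
import statistics

def fraction_within(data, k=1.0):
    """Fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample formula, N-1 in the denominator
    inside = [x for x in data if abs(x - mean) <= k * sigma]
    return len(inside) / len(data)

# The list from the question: evenly spread, not bell-shaped.
uniform = list(range(1, 101))
sample_sigma = statistics.stdev(uniform)       # divides by N-1
population_sigma = statistics.pstdev(uniform)  # divides by N

# Simulated normally distributed data (assumed parameters, fixed seed).
rng = random.Random(0)
normal = [rng.gauss(50.5, 29.0) for _ in range(100_000)]

print("sample sigma:", round(sample_sigma, 2))
print("population sigma:", round(population_sigma, 2))
print("1..100 within one sigma:", fraction_within(uniform))
print("normal data within one sigma:", round(fraction_within(normal), 3))
```

The evenly spread list gives 0.58, while the simulated normal data should land close to 0.68, which is where the rule's first figure comes from.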

Re: Standard deviation...Confused about 3 sigmas rule

Thanks for the reply ebaines.

So what is the definition of a "normally-distributed" data set? I am crunching football statistics. For example: If a team runs the football 25 times in a game, can I assume that about 68% of the running plays will be within one standard deviation of the average yards gained?

Thanks again!

Hugh

Re: Standard deviation...Confused about 3 sigmas rule

Quote (originally posted by **Hugh**):

So what is the definition of a "normally-distributed" data set? I am crunching football statistics. For example: If a team runs the football 25 times in a game, can I assume that about 68% of the running plays will be within one standard deviation of the average yards gained?

I hate to say it, but a normally distributed data set is one that follows the sigma rules you mentioned earlier! In essence, if you plot the data and it takes on the familiar bell-shaped curve, then it's pretty close to normally distributed. The data for yards gained in a football game may be close to normally distributed, but I don't know that 25 data points will be enough to ensure it, as one big play can cause quite a deviation. For example, if 16 plays lead to yardage gains of 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, and 7 yards, then it's pretty normal (here the average = 4 and stdev = 1.6). But add in one additional big play of 99 yards and it skews the results: the new average is 9.6 and the stdev is 23, and it no longer looks normal.
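If you want to check these numbers yourself, the following is a small Python sketch using the sample (N-1) standard deviation. The function name `summarize` is just an illustrative choice, and the yardage lists are the ones from the example above.

```python
import statistics

def summarize(yards):
    """Return (mean, sample stdev, share of plays within one stdev of the mean)."""
    mean = statistics.mean(yards)
    sigma = statistics.stdev(yards)  # N-1 in the denominator
    within = sum(1 for y in yards if abs(y - mean) <= sigma)
    return mean, sigma, within / len(yards)

plays = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]
with_big_play = plays + [99]  # same game plus one 99-yard run

for label, yards in [("16 plays", plays), ("with 99-yard play", with_big_play)]:
    mean, sigma, share = summarize(yards)
    print(f"{label}: mean={mean:.1f}, stdev={sigma:.1f}, within one stdev={share:.0%}")
```

For the 16 ordinary plays, 10 of 16 (about 62%) fall within one standard deviation of the mean, reasonably near 68% for such a small sample. With the 99-yard play added, the standard deviation balloons to about 23, so 16 of 17 plays (about 94%) fall inside the one-sigma range and the 68% rule no longer describes the data.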

Re: Standard deviation...Confused about 3 sigmas rule

Okay, got it. Very well explained. Thanks for clarifying!