# Goodness of fit - explaining standard deviations

• February 20th 2011, 11:38 AM
davemk
Goodness of fit - explaining standard deviations
Evening folks.

Hopefully someone can help me with this as stats isn't my forte and hours of internet trawling haven't been any help!

I've performed an experiment to find out whether a certain scenario fits a negative binomial distribution. After collecting the data, I performed a chi-squared goodness of fit test. The result of this was that there was no evidence to reject the null, so the data are consistent with a Neg. Bin. Distribution.

My problem lies with the mean and standard deviations. The mean for the observed data and the expected are equal. Why is that so? Is it because the expected values are calculated from the observed values??

Secondly, the standard deviations of the observed and expected are not equal (but are close: 5.6 and 6.5 approximately). Why is there a difference in the SD?

*Edit* - just to clarify, I know what SD is, it's just the reasoning behind the observed and expected SDs being different that I'm unclear on.

• February 20th 2011, 12:16 PM
pickslides
In this situation it would be beneficial to see your data set or at least some summary statistics.

As far as the mean being the same and std being different, this is just a property of the data sets you have. I would just accept this.

For example, consider two binomial distributions: $\displaystyle n_1 = 5, p_1=0.2 \implies \mu_1 = 1, \sigma_1 = 0.89$ and $\displaystyle n_2 = 10, p_2=0.1 \implies \mu_2 = 1, \sigma_2 = 0.95$. You can see that $\displaystyle \mu_1 =\mu_2$ but $\displaystyle \sigma_1 \neq \sigma_2$. This is just an example off the top of my head, but it shows that this can happen.
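The example above can be checked numerically. This is a minimal sketch, assuming the two cases are binomial distributions with mean $np$ and standard deviation $\sqrt{np(1-p)}$; the helper name `binomial_mean_sd` is made up for illustration.

```python
import math

def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a Binomial(n, p) distribution."""
    return n * p, math.sqrt(n * p * (1 - p))

# The two parameter sets quoted in the post
mean1, sd1 = binomial_mean_sd(5, 0.2)
mean2, sd2 = binomial_mean_sd(10, 0.1)

print(f"n=5,  p=0.2: mean={mean1:.2f}, sd={sd1:.4f}")
print(f"n=10, p=0.1: mean={mean2:.2f}, sd={sd2:.4f}")
```

Both means come out as 1, while the standard deviations differ, so equal means do not force equal spreads.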
• February 20th 2011, 12:26 PM
davemk

Chi-square value was 17.24. Critical value at p<.05 is 22.36 and p<.01 is 27.69

Observed & expected means are 12.8

Obs, SD = 5.29
Exp, SD = 6.47
• February 21st 2011, 12:12 PM
davemk
Still not making any headway with this.

Any assistance appreciated.
• February 22nd 2011, 02:28 AM
davemk
I think I might have the answer I was looking for and, if I'm right, then I'm stupid for not seeing this in the first place. If I'm wrong, I'm stupid for thinking this might be the answer. Either way, stats is not my thing!

So, here's why I think the difference in the SD can be explained (hopefully the graph shows).

http://i34.photobucket.com/albums/d1...emk/ObsExp.jpg

The expected distribution gives non-zero probabilities for X > 30. In the experiment, the observed values were always X < 27. So the expected distribution takes into account the possibility of seeing values of X greater than any that were observed.

Wouldn't this explain the greater SD for the expected?
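One way to probe this idea is to compare the SD of the full fitted distribution with the SD of the same distribution cut off at X = 26. The sketch below is only illustrative and rests on assumptions not confirmed in the thread: X is taken to be the total number of trials up to the 3rd success, p is taken as 3/12.8 from the quoted mean, and the standard negative binomial PMF for that setup is used (the poster's own formula is in an image that is not reproduced here). The names `nb_pmf` and `mean_sd` are hypothetical helpers.

```python
import math

n = 3            # required successes (from the thread)
p = n / 12.8     # assumed estimate: n divided by the quoted mean of 12.8
q = 1 - p

def nb_pmf(x):
    """P(X = x), where X is the trial on which the n-th success occurs."""
    return math.comb(x - 1, n - 1) * p**n * q**(x - n)

def mean_sd(values, weights):
    """Weighted mean and SD; weights are renormalised to sum to 1."""
    total = sum(weights)
    m = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total
    return m, math.sqrt(var)

xs = list(range(n, 400))          # upper limit chosen so the omitted tail is negligible
probs = [nb_pmf(x) for x in xs]

full_mean, full_sd = mean_sd(xs, probs)

# Probability mass the fitted model places above the largest observed region
tail_prob = sum(pr for x, pr in zip(xs, probs) if x > 26)

# Same distribution restricted to X <= 26 and renormalised
kept = [(x, pr) for x, pr in zip(xs, probs) if x <= 26]
trunc_mean, trunc_sd = mean_sd([x for x, _ in kept], [pr for _, pr in kept])

print(f"full:      mean={full_mean:.2f}, sd={full_sd:.2f}")
print(f"P(X > 26): {tail_prob:.4f}")
print(f"truncated: mean={trunc_mean:.2f}, sd={trunc_sd:.2f}")
```

Cutting off the upper tail lowers the SD, which is in line with the reasoning above, but it lowers the mean as well, so the missing tail is at most part of the story.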
• February 22nd 2011, 05:09 AM
CaptainBlack
How did you select the expected distribution?

CB
• February 22nd 2011, 05:45 AM
davemk
I ran an experiment and estimated the probability from that data. Then used the following formula to work out the probability of the expected values. n = no. of required successful observations, x = no. of failures, p = estimated probability.

http://i34.photobucket.com/albums/d1...mk/formula.jpg
• February 22nd 2011, 06:30 AM
CaptainBlack
There are two parameters p and n, how did you estimate them from your data?

CB
• February 22nd 2011, 06:41 AM
davemk
n was the number of successes I needed (3).

p was estimated from the observed data. 100 observations were made (needing 3 successes each). Therefore estimate of p was given by (3 x 100)/total number of trials.
• February 22nd 2011, 08:50 AM
CaptainBlack
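OK, the means are the same because you have set things up that way: p was estimated from the observed mean, so the fitted distribution reproduces it. The standard deviations are not matched in the same way, so they will not usually be equal.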

CB
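The point can be illustrated with the figures quoted earlier in the thread. This is a rough sketch, assuming X counts the total number of trials up to the 3rd success (that reading reproduces the quoted mean of 12.8) and using the standard moments for that form of the negative binomial, mean $n/p$ and SD $\sqrt{n(1-p)}/p$. The variable names are illustrative only.

```python
import math

n = 3             # required successes per observation (from the thread)
mean_obs = 12.8   # observed mean quoted in the thread

# Estimate described in the thread: (3 x 100) / total trials, i.e. n / observed mean
p_hat = n / mean_obs

# Moments of the fitted distribution (trials until the n-th success)
fitted_mean = n / p_hat
fitted_sd = math.sqrt(n * (1 - p_hat)) / p_hat

print(f"p_hat       = {p_hat:.4f}")
print(f"fitted mean = {fitted_mean:.2f}")   # equals mean_obs by construction
print(f"fitted sd   = {fitted_sd:.2f}")     # fixed by n and p_hat, not by the sample SD
```

With only p estimated, the fitted SD is fully determined by n and p, so the sample SD of 5.29 never enters the fit and there is nothing forcing the two to agree.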
• February 22nd 2011, 09:26 AM
davemk
Thank you very much pickslides and CaptainBlack.