It's been a while since college, and I can't remember everything about hypothesis testing. Now I'm faced with a situation at work where I need to refresh some of that old knowledge. Before I get too far, I should also say that I have no access to any statistical analysis software packages (like Minitab). The only thing I have is Excel.
I have some data on item failures along with a number of other variables. I have the date range in which each item was manufactured, plus environmental variables (temperature, humidity, etc.) recorded during the test of the items. Each item is tested with a result of 'pass' or 'fail'. For each range variable, I want to see if the failures show any correlation with that variable. (I don't want to check combinations between variables, at least, not yet.)
For example,
Code:
Date Range    Pass    Fail    Total
-------------------------------------
Jan 2010        14       1       15
Feb 2010         1       0        1
Mar 2010        19       2       21
Apr 2010        59       1       60
May 2010        13       0       13
Jun 2010        17       4       21
-------------------------------------
Total          123       8      131
After much research, I determined that I ought to find the Pearson chi-squared statistic, and then Cramér's V. (I also calculated it with Yates' correction, since many expected counts were < 5.) For the example above, I chose the null hypothesis that the failures are independent of date, with the alternative hypothesis being that the failures are dependent on date.
So, these are the values I calculated:
Code:
Date Range    Expected Pass    Expected Fail    Total
-----------------------------------------------------
Jan 2010             14.084            0.916       15
Feb 2010              0.939            0.061        1
Mar 2010             19.718            1.282       21
Apr 2010             56.336            3.664       60
May 2010             12.206            0.794       13
Jun 2010             19.718            1.282       21
-----------------------------------------------------
Total                   123                8      131

Date Range    Deg/Free    Chi-2    Chi-Yates    V        Yates-V
-----------------------------------------------------------------------
Jan 2010      1           0.008    0.201
Feb 2010      1           0.065    3.360
Mar 2010      1           0.428    0.039
Apr 2010      1           2.063    1.361
May 2010      1           0.846    0.116
Jun 2010      0           6.133    4.084
-----------------------------------------------------------------------
Total         5           9.543    9.162        0.270    0.264
Now, I go look in a chi-square table, and I see there is a 90% probability that chi-square will be greater than 1.61 for the 5 degrees of freedom in my example. Clearly, my chi-square value is greater than that, whether I look at the Yates-corrected one or not. So, my null hypothesis cannot be disproved, which effectively tells me nothing.
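For anyone who wants to cross-check the arithmetic outside Excel, here is a minimal sketch in Python (standard library only). It assumes Python is available, which the post does not say, and the names `OBSERVED` and `chi_squared_summary` are made up for illustration. The Yates option subtracts 0.5 from every |observed - expected| without clamping at zero, because that is what reproduces the per-row figures listed above; it is not offered as the definitive form of the correction.

```python
import math

# Observed (pass, fail) counts per date range, copied from the table in the post.
OBSERVED = {
    "Jan 2010": (14, 1),
    "Feb 2010": (1, 0),
    "Mar 2010": (19, 2),
    "Apr 2010": (59, 1),
    "May 2010": (13, 0),
    "Jun 2010": (17, 4),
}


def chi_squared_summary(table, yates=False):
    """Return (chi2, degrees_of_freedom, cramers_v) for an r x 2 table."""
    rows = list(table.values())
    n_cols = 2
    grand_total = sum(sum(row) for row in rows)
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]

    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_total * col_totals[j] / grand_total
            diff = abs(obs - expected)
            if yates:
                # Assumption: unclamped correction, to match the post's numbers.
                diff -= 0.5
            chi2 += diff ** 2 / expected

    dof = (len(rows) - 1) * (n_cols - 1)
    # Cramér's V = sqrt(chi2 / (N * min(r - 1, c - 1))).
    cramers_v = math.sqrt(chi2 / (grand_total * min(len(rows) - 1, n_cols - 1)))
    return chi2, dof, cramers_v


chi2, dof, v = chi_squared_summary(OBSERVED)
chi2_yates, _, v_yates = chi_squared_summary(OBSERVED, yates=True)
print(f"chi2={chi2:.3f} dof={dof} V={v:.3f}")
print(f"chi2_yates={chi2_yates:.3f} V_yates={v_yates:.3f}")
```

The totals should agree with the "Total" row of the second table to about three decimal places. The same expected-count formula (row total times column total divided by the grand total) can be typed directly into Excel cells.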
So, here are my issues:
1) To be perfectly honest, I can't remember how I'm supposed to choose which "side" of the table to look up the value on, and I'm assuming I chose the right value from the table, but I really don't know if I did. Did I? How do I know which side to choose? I recall graphs of a bell curve and regions in the tails on either side of the curve representing where these types of values fall, but this is a vague recollection...
--EDIT: From what I recall, if you do a two-tailed test, you have to cut your significance level in half (0.05 becomes 0.025), because there is a region of significance at each end of the probability curve. I think the chi-squared test for independence is two-tailed, but every example I find online still only looks at one p-value, and appears to use the full significance level (0.05 instead of 0.025). Can someone explain this?
--
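One way to see what a single table entry represents is to compute the tail area directly. The sketch below is in Python (an assumption; only Excel is available here) and the function name `chi2_upper_tail_5df` is hypothetical. It uses the closed form of the chi-squared upper-tail probability that holds for exactly 5 degrees of freedom, so it is not a general-purpose routine.

```python
import math


def chi2_upper_tail_5df(x):
    """P(X > x) for a chi-squared variable with exactly 5 degrees of freedom.

    Closed form for df = 5:
        erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * (x**0.5 + x**1.5 / 3)
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    tail_of_normal_part = math.erfc(math.sqrt(x / 2.0))
    polynomial_part = math.sqrt(x) + x ** 1.5 / 3.0
    return tail_of_normal_part + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * polynomial_part


# Area to the right of the table value quoted in the post.
print(chi2_upper_tail_5df(1.61))
# Area to the right of the calculated statistic.
print(chi2_upper_tail_5df(9.543))
```

For 1.61 this gives roughly 0.90, which matches the table entry quoted above, so that entry is an area to the right of the value. In Excel, `CHIDIST(x, 5)` returns the same right-tail area, so the comparison can be done without a printed table.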
2) Assuming I picked the right value, and my test really does tell me nothing, how can I reverse my null hypothesis and alternative hypothesis, such that I'm testing a null hypothesis that the failures are dependent on date?
Thanks in advance for help on this!

