Need Help With Hypothesis Test

• Mar 28th 2011, 10:28 AM
elleg
Need Help With Hypothesis Test
Let's just say it's been a while since college and I can't remember everything about hypothesis testing. Now I'm faced with a situation at work where I need to refresh some of that old knowledge. Before I get too far, I should also state that I have no access to any statistical analysis software packages (like Minitab); the only thing I have is Excel.

I have some data on item failures along with a number of other variables: the date range in which each item was manufactured, and environmental variables (temperature, humidity, etc.) recorded while the items were tested. Each item is tested with a result of 'pass' or 'fail'. For each range variable, I want to see whether the failures show any correlation with it. (I don't want to check combinations of variables, at least not yet.)

For example,
Code:

Date Range   Pass   Fail  Total
-------------------------------
Jan 2010       14      1     15
Feb 2010        1      0      1
Mar 2010       19      2     21
Apr 2010       59      1     60
May 2010       13      0     13
Jun 2010       17      4     21
-------------------------------
Total         123      8    131

After much research, I determined that I ought to find the Pearson chi-squared statistic and then Cramér's V. (I also calculated both with Yates's correction, since many expected counts were < 5.) For the example above, I chose the null hypothesis that the failures are independent of date, with the alternative hypothesis being that the failures are dependent on date.
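For reference, these are the standard textbook definitions behind the quantities named above, written for an r × c table (here r = 6 date ranges and c = 2 outcomes). O_ij and E_ij are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, and N is the grand total; this notation is introduced here for illustration and is not from the original post.

```latex
E_{ij} = \frac{R_i \, C_j}{N}
\qquad\text{e.g. } E_{\text{Jan, Fail}} = \frac{15 \times 8}{131} \approx 0.916

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}},
\qquad
\chi^2_{\text{Yates}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(\lvert O_{ij} - E_{ij} \rvert - 0.5\right)^2}{E_{ij}}

\text{df} = (r - 1)(c - 1) = (6 - 1)(2 - 1) = 5,
\qquad
V = \sqrt{\frac{\chi^2}{N \left(\min(r, c) - 1\right)}}
```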

So, these are the values I calculated:
Code:

Date Range  Expected Pass  Expected Fail  Total
-----------------------------------------------
Jan 2010           14.084          0.916     15
Feb 2010            0.939          0.061      1
Mar 2010           19.718          1.282     21
Apr 2010           56.336          3.664     60
May 2010           12.206          0.794     13
Jun 2010           19.718          1.282     21
-----------------------------------------------
Total                 123              8    131

Date Range  Deg/Free   Chi-Sq  Yates Chi-Sq  Cramér's V  Yates V
----------------------------------------------------------------
Jan 2010           1    0.008         0.201
Feb 2010           1    0.065         3.360
Mar 2010           1    0.428         0.039
Apr 2010           1    2.063         1.361
May 2010           1    0.846         0.116
Jun 2010           0    6.133         4.084
----------------------------------------------------------------
Total              5    9.543         9.162       0.270    0.264
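As a cross-check on the arithmetic in both tables, here is a minimal Python sketch (Python is only a stand-in for the spreadsheet formulas; `chi_square` and `cramers_v` are hypothetical helper names). It computes each expected count as row total × column total ÷ grand total, then the Pearson statistic, the Yates-corrected version exactly as it was applied above, (|O - E| - 0.5)² / E in every cell, and Cramér's V. One caveat: Yates's continuity correction was designed for 2×2 tables (SciPy's `chi2_contingency`, for instance, applies it only when there is 1 degree of freedom), and in cells where |O - E| < 0.5, such as Feb 2010, this formula makes the contribution larger rather than smaller (0.065 becomes 3.360).

```python
from math import sqrt

# Observed (pass, fail) counts per month, Jan-Jun 2010, copied from the first table.
observed = [(14, 1), (1, 0), (19, 2), (59, 1), (13, 0), (17, 4)]

def chi_square(table, yates=False):
    """Pearson chi-squared statistic for an r x c table of counts.

    With yates=True this mirrors the posted arithmetic, (|O - E| - 0.5)^2 / E
    in every cell. It is NOT capped at zero, so cells with |O - E| < 0.5
    (e.g. Feb 2010) grow rather than shrink.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            d = abs(o - e) - 0.5 if yates else o - e
            stat += d * d / e
    return stat

def cramers_v(stat, table):
    """Cramer's V = sqrt(chi2 / (N * (min(r, c) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(stat / (n * (k - 1)))

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2 = chi_square(observed)
chi2_yates = chi_square(observed, yates=True)
print(df, round(chi2, 3), round(chi2_yates, 3))
print(round(cramers_v(chi2, observed), 3), round(cramers_v(chi2_yates, observed), 3))
```

Running it should print the degrees of freedom followed by both statistics and both V values, agreeing with the totals row to three decimals.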

Now I look in a chi-square table, and I see there is a 90% probability that chi-square will be greater than 1.61 for the 5 degrees of freedom in my example. Clearly, my chi-square value is greater than that, whether I look at the Yates-corrected one or not. So my null hypothesis cannot be rejected, which effectively tells me nothing.
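For orientation, here are the two tail probabilities of the 5-degree-of-freedom chi-square distribution that bear on this lookup (standard table values, rounded). The 1.61 quoted above is the point with 90% of the distribution above it; the Pearson test of independence is conventionally carried out in the upper tail, where the α = 0.05 point is about 11.07.

```latex
P\left(\chi^2_5 > 1.61\right) \approx 0.90
\qquad\text{(lower 10\% point)}

P\left(\chi^2_5 > 11.07\right) \approx 0.05
\qquad\text{(upper 5\% point)}

\text{Reject } H_0 \text{ at } \alpha = 0.05 \text{ if } \chi^2 > 11.07:
\quad 9.543 < 11.07, \quad 9.162 < 11.07
```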

So, here are my issues:
1) To be perfectly honest, I can't remember how I'm supposed to choose which "side" of the table to look up the value on. I'm assuming I chose the right value from the table, but I really don't know. Did I? How do I know which side to choose? I recall graphs of a bell curve, with regions in the tails on either side representing where these types of values fall, but this is a vague recollection...
--EDIT: From what I recall, if you do a two-tailed test, you have to cut your significance level in half (0.05 becomes 0.025), because there is a region of significance at each end of the probability curve. I think the chi-squared test for independence is two-tailed, but every example I find online still looks at only one p-value and appears to use the full significance level (0.05 instead of 0.025). Can someone explain this?
--
2) Assuming I picked the right value and my test really does tell me nothing, how can I reverse my null and alternative hypotheses, so that I'm testing a null hypothesis that the failures are dependent on date?

Thanks in advance for help on this!
• Mar 28th 2011, 12:26 PM
elleg
additional info
I've been thinking about this some more, and I think what I really need to compare is the significance level of my test (I was aiming for 0.05) with the p-value that corresponds to my calculated chi-squared statistic (9.543, or 9.162 with Yates's correction; 5 degrees of freedom in both cases).

9.543 --> p-value of 0.09
9.162 --> p-value of 0.10
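These p-values can be checked without statistical software: Excel's CHIDIST(x, deg_freedom) returns the upper-tail (right-tail) probability, and CHIINV(probability, deg_freedom) returns the matching critical value. For illustration, the same quantities are sketched below in Python using only the standard library, assuming an integer number of degrees of freedom so that the closed-form survival function applies; `chi_square_sf` and `chi_square_critical` are hypothetical names, not library functions.

```python
from math import erfc, exp, factorial, gamma, sqrt

def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: closed form as a finite Poisson sum.
        return exp(-h) * sum(h**i / factorial(i) for i in range(df // 2))
    # Odd df: start from df = 1 (the erfc term) and add one term per 2 extra df.
    return erfc(sqrt(h)) + exp(-h) * sum(
        h ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )

def chi_square_critical(alpha, df, lo=0.0, hi=1000.0):
    """Upper-tail critical value c with P(X > c) = alpha, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi_square_sf(mid, df) > alpha:
            lo = mid  # too much probability still above mid: c lies further right
        else:
            hi = mid
    return (lo + hi) / 2

for stat in (9.543, 9.162):
    print(stat, round(chi_square_sf(stat, 5), 3))
print("critical value, alpha = 0.05:", round(chi_square_critical(0.05, 5), 2))
```

This should print p ≈ 0.089 and 0.103, consistent with the rounded 0.09 and 0.10 above, and a critical value of about 11.07 for α = 0.05.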

Again, either way, the null hypothesis cannot be rejected as false.

But I'm not sure there is actually any way for me to flip my hypotheses around... To flip them, I'd have to be able to calculate the expected number of failures under the null hypothesis that they are dependent on date. I don't think that's possible, so I may just be stuck here...

Additionally, since more than 20% of the cells have expected counts < 5, how reliable is this test for my situation anyway? Is there anything I can use for a reliable test, or is my only conclusion that I can't reach one?
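One commonly used option when many expected counts are small is an exact conditional test, which avoids the large-sample chi-square approximation altogether. The sketch below swaps in a Fisher-Freeman-Halton style exact test (a substituted technique, not something from the original post): it holds the row totals and the total number of failures fixed, enumerates every way the 8 failures could have been spread across the 6 months, and adds up the (multivariate hypergeometric) probabilities of all tables no more likely than the observed one. With only 8 failures the enumeration is small enough for plain Python; `splits`, `table_prob`, and `exact_p_value` are hypothetical names.

```python
from math import comb

# Failures and row totals per month (Jan-Jun 2010), copied from the first table.
fails = (1, 0, 2, 1, 0, 4)
totals = (15, 1, 21, 60, 13, 21)

def splits(k, caps):
    """Yield every way to place k failures in rows whose sizes are given by caps."""
    if len(caps) == 1:
        if k <= caps[0]:
            yield (k,)
        return
    for f in range(min(k, caps[0]) + 1):
        for rest in splits(k - f, caps[1:]):
            yield (f,) + rest

def table_prob(f_counts, caps):
    """Multivariate hypergeometric probability of one split, given all margins."""
    num = 1
    for f, n in zip(f_counts, caps):
        num *= comb(n, f)
    return num / comb(sum(caps), sum(f_counts))

def exact_p_value(f_obs, caps):
    """Sum the probabilities of all tables no more likely than the observed one."""
    p_obs = table_prob(f_obs, caps)
    return sum(
        p for p in (table_prob(t, caps) for t in splits(sum(f_obs), caps))
        if p <= p_obs * (1 + 1e-7)  # small tolerance for floating-point ties
    )

print(round(exact_p_value(fails, totals), 3))
```

The same function can be reused for the other range variables (temperature, humidity, and so on) by substituting their failure counts and row totals, and the resulting p-value is compared with α in the usual way.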