# Not sure how to interpret this data.

• Sep 12th 2006, 05:29 PM
CysticDuct
Hello all. I found this site through a web search and I am hoping someone can help me out.

I am trying to find out the most appropriate method for interpreting some data.

This is a current ongoing research project so I will use an analogy.

So here is what I am trying to show.

Left handed people have a higher likelihood of developing arthritis than right handed people. I have divided my patients into 4 groups.

Group 1: patients 40-49 years old with arthritis
Group 2: patients 50-59 years old with arthritis
Group 3: patients 60-69 years old with arthritis
Group 4: patients >69 years old with arthritis

A = LEFT handed
B = RIGHT handed

Now, I am also using the knowledge that 10% of the population from which I am getting my sample is LEFT handed.

SO...I would think that if my data shows that more than 10% of the people in my study are LEFT handed, this would be significant. Correct?

Here is my problem.

So far my data show that close to 50% of the people who develop arthritis are LEFT handed. WOW! That is 5 times what I anticipated based on the percentage of LEFT handed people in the general population.

NOW... how do I break down the data to show this is statistically significant?

Here is my RAW data:

Group 1: A = 3 B = 5
Group 2: A = 5 B = 3
Group 3: A = 6 B = 6
Group 4: A = 3 B = 4
N = 35
Total A = 17, Total B = 18

Again, with 35 people in the study and 10% of the general population being left handed, I would think that a random sampling of arthritis patients would be 10% lefty and 90% righty. But my data does not show this.

My data is showing that DESPITE the fact that most people are right handed, patients with arthritis are split down the middle.

Therefore, being LEFT handed increases your chance of developing arthritis versus a right handed person. Correct?

I think my reasoning is correct, but now I need to prove the data is statistically significant.

So I did a chi-square test... not sure if this is the correct method.

When I did the chi-square test with the "Expected" values calculated from my data, my significance (p-value) came out to 0.77, with a chi-square of 1.115 and df = 3.
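The first test can be checked with a short sketch. Assuming the calculated "Expected" values are the standard ones for a test of independence (row total x column total / N), the Python below (standard library only) rebuilds the 4 x 2 table from the raw counts, computes the Pearson chi-square statistic, uses the usual degrees of freedom for an r x c test of independence, (r - 1)(c - 1) = 3, and takes the upper-tail p-value from the closed-form chi-square survival function for integer degrees of freedom. The function names `chi2_sf` and `chi2_independence` are illustrative, not from any library.

```python
import math

# Raw counts from the post: (left-handed A, right-handed B) per age group.
TABLE = [
    (3, 5),  # Group 1: 40-49
    (5, 3),  # Group 2: 50-59
    (6, 6),  # Group 3: 60-69
    (3, 4),  # Group 4: >69
]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: start from the df = 1 tail, then add one increment per step of 2 in df.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # increment from df = 1 to df = 3
    for k in range(1, df - 1, 2):
        total += term
        term *= x / (k + 2)  # increment from df = k + 2 to df = k + 4
    return total


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = chi2_independence(TABLE)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.2f}")
```

With the counts above this gives a chi-square of about 1.115 on 3 df and a p-value of about 0.77, matching the first set of values reported in the post.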

When I did the chi-square test and instead ASSIGNED the "Expected" values based on what I would anticipate, given that 10% of the population is lefty and 90% righty, my significance came out to 0.002, with a chi-square of 14.496 and df = 3.

Is this the correct way of doing this? Can I simply assign values in the "expected" square based on population trends, or do I need to calculate them using a formula? Is a chi-square test the proper way to interpret this data?

ANY help would be greatly appreciated.

This is the last hurdle in moving forward with further write up and analysis.

Thank you so very much!
• Sep 12th 2006, 10:01 PM
CaptainBlack
Quote:

Originally Posted by CysticDuct
NOW... how do I break down the data to show this is statistically significant?

My advice is - you don't. You most probably have another unidentified selection effect
at work in your sampling frame.

An effect so large in a common complaint would have been so obvious that you would
not have to ask the question, so there must be something peculiar with your data.

Now, if only a fraction of 0.1% of the population developed arthritis, then a difference
in rates might have gone unnoticed, but even then it is difficult to believe that half
of the patients at an arthritis clinic being left-handed would not have been noticed decades
ago.

RonL
• Sep 12th 2006, 10:08 PM
JakeD
Quote:

Originally Posted by CysticDuct
So I did a chi-square test... not sure if this is the correct method.

When I did the chi-square test with the "Expected" values calculated from my data, my significance (p-value) came out to 0.77, with a chi-square of 1.115 and df = 3.

When I did the chi-square test and instead ASSIGNED the "Expected" values based on what I would anticipate, given that 10% of the population is lefty and 90% righty, my significance came out to 0.002, with a chi-square of 14.496 and df = 3.

Is this the correct way of doing this? Can I simply assign values in the "expected" square based on population trends, or do I need to calculate them using a formula? Is a chi-square test the proper way to interpret this data?

On the surface, your methodology looks OK except for the Df calculation. But the results are so striking that I have to question the methodology and/or the data.

Your hypothesis is that left-handers have a higher proportion of arthritis sufferers than right-handers. The typical way to test this would be to collect a sample of left- and right-handers with and without arthritis and use a two-way table (neglecting age here). Instead, you are testing whether arthritis sufferers who made it into the sample are more likely to be left-handed. Neglecting age, you have a one-way table: there is no one in the sample without arthritis.

The peril of not having anyone without arthritis is that you have no statistical test of how representative the sample is of the general population in terms of left-handedness. It could be that for some reason the sample is biased toward left-handedness. If you had people without arthritis in the sample, you could test this. Since essentially half the sample is left-handed, I have to suspect that the sample has been stratified to over-sample left-handers. Did you collect the raw data yourself, so that you know there is no bias?

That said, I did not check your statistical calculations, but what you describe seems OK except for the Df. The Df for your first test should be the number of cells (8), less 1, less another 1 for the parameter estimated from the sample. The Df for your second test should be just 8 - 1, because no parameter was estimated.

The interpretation of your first test is that age does not significantly affect the handedness of people with arthritis. So you could combine the age groups and just do a one-way chi-square test. You could also do a one-tailed t-test that the proportion of left-handers is greater than 10%. I did that and found the t-statistic was 7.6, that is, off the charts in terms of significance. That is what led me to question the data.
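The figure of about 7.6 can be reproduced by pooling the age groups and using the large-sample one-proportion statistic z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), with the hypothesized p0 = 0.10 in the standard error. The sketch below assumes that is the statistic meant; strictly it is a normal-approximation (z) test rather than a t-test. The one-sided p-value uses the standard normal tail via the standard-library `math.erfc`.

```python
import math

left, n = 17, 35  # pooled counts from the post: 17 left-handed out of 35
p0 = 0.10         # reference proportion of left-handers in the population

p_hat = left / n
se = math.sqrt(p0 * (1 - p0) / n)                # standard error under the null hypothesis
z = (p_hat - p0) / se
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, one-sided p = {p_one_sided:.1e}")
```

As a cross-check, z squared (about 57.9) equals the 1-df chi-square statistic for the pooled split of 17 left- versus 18 right-handed against an expected 3.5 versus 31.5. With only 3.5 left-handers expected, the normal approximation is rough, so the printed p-value is best read as "extremely small" rather than as a precise figure.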

Good luck!
• Sep 13th 2006, 08:07 AM
CaptainBlack
Quote:

Originally Posted by CysticDuct
When I did the chi-square test with the "Expected" values calculated from my data, my significance (p-value) came out to 0.77, with a chi-square of 1.115 and df = 3.

When I did the chi-square test and instead ASSIGNED the "Expected" values based on what I would anticipate, given that 10% of the population is lefty and 90% righty, my significance came out to 0.002, with a chi-square of 14.496 and df = 3.

If I recall correctly, you should aggregate the cells so that the expected
frequencies are 5 or more.

Also, what are your cells here?
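The rule of thumb about expected frequencies can be checked directly. Under the 10%/90% reference split, the expected number of left-handers in each age-group cell is 0.8, 0.8, 1.2 and 0.7, and even after pooling all four groups it is only 35 x 0.10 = 3.5, still below 5. A common alternative when expected counts are this small is an exact one-sided binomial test; the sketch below (standard library only, `math.comb` needs Python 3.8+) computes P(X >= 17) for X ~ Binomial(35, 0.10). The helper name `binom_upper_tail` is illustrative.

```python
import math

group_sizes = [8, 8, 12, 7]  # patients per age group, from the raw data
p0 = 0.10                    # reference proportion of left-handers
observed_left = 17

# Expected left-handers per age-group cell, and after pooling all groups.
expected_per_group = [size * p0 for size in group_sizes]
n = sum(group_sizes)
expected_pooled = n * p0
print("expected per group:", [round(e, 2) for e in expected_per_group])
print("expected pooled:", round(expected_pooled, 2))


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_exact = binom_upper_tail(observed_left, n, p0)
print(f"exact one-sided p = {p_exact:.1e}")
```

The exact test only answers whether 17 of 35 is surprising if the sample were drawn from a population that is 10% left-handed; it says nothing about the sampling-bias concerns raised elsewhere in the thread.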

RonL
• Sep 13th 2006, 04:44 PM
CysticDuct
Thanks for the replies thus far.

I don't know what you mean by "cells".

I collected the data myself.

Now, the data is actually surgical in nature. I used left handedness and arthritis as an example.

I guess a better explanation would be "Patients who smoke cigarettes develop gall bladder disease", where the questionnaire was filled out by every patient who had their gall bladder removed at a single surgical site. Each patient was asked their age and whether they smoked. Only people who smoked >1/2 pack per day were included.

I know the % of the US adult population who smokes.
All of my patients are US adults.

And again, I would expect my data to indicate that the percentage of patients with GB disease who smoke would mirror the percentage of US adults who smoke. But again, this is not what I am finding. I am finding that nearly half of my patients thus far smoke... whereas the percentage of smokers in the general population is less than half THAT. So I would expect that only 1/4 of my patients would be smokers.
• Sep 13th 2006, 08:00 PM
JakeD
Quote:

Originally Posted by CysticDuct
I don't know what you mean by "cells".

These are the 8 cells:

Group 1: A = 3 B = 5
Group 2: A = 5 B = 3
Group 3: A = 6 B = 6
Group 4: A = 3 B = 4

Quote:

Originally Posted by CysticDuct
And again, I would expect my data to indicate that the percentage of patients with GB disease who smoke would mirror the percentage of US adults who smoke. But again, this is not what I am finding. I am finding that nearly half of my patients thus far smoke... whereas the percentage of smokers in the general population is less than half THAT. So I would expect that only 1/4 of my patients would be smokers.

The only issue I see is that you are assuming your patients are representative of the general population in terms of smoking and other factors influencing GB disease.

Here is a hypothetical example of how that might not be true. Suppose that in the past you did a great job on a few smokers, who spread the word to other smokers, who then came to you as patients. So by historical accident you tend to attract smokers as your patients. Then your data say nothing about the causation of GB disease.

To be statistically convincing, you have to rule out all possible stories like this. That's difficult to do without carefully generated data and that is why controlled trials and double-blind experiments are used.

However, and I don't know for sure, isn't it true that a lot of published medical articles do not report on rigorously controlled trials but are still useful? Perhaps they prompt someone else to do the controlled trials.
• Sep 13th 2006, 10:23 PM
CaptainBlack
Quote:

Originally Posted by CysticDuct
I know the % of the US adult population who smokes.
All of my patients are US adults.

The proportion of the US population that smokes varies with geographical
area. So while the "typical" rate may be ~25%, there are regions where it is
~35%, and others where it is ~19%.

As you have taken data from a single centre, you need to know the
proportion of the population served by the centre who smoke, not a
national average.

Now, even if the centre is in a high-smoking region, it may serve a smaller
sub-population whose smoking rate differs again from that of the
region. For instance, if the smoking rate varies significantly with age, you will
be sampling from a different population from that of all adults in your region
(in fact it looks like you are sampling from an older population than the ...).

In conclusion - you cannot use the US national smoking statistics as
a reference smoking rate, you have to find that data for the population that
your centre serves (including the correct age band).

RonL
• Sep 14th 2006, 04:33 PM
CysticDuct
Excellent ideas.

I will admit this is indeed a preliminary study that will hopefully increase interest in future studies of this kind. I am not looking to publish a groundbreaking paper but rather to demonstrate a possible correlation, in the hope that someone will pick this up and use it in future work. This is also a 100% unfunded study... which is why I am online trying to figure out how to do the math! No one involved is being paid to do any research on anything else, so it's really all up to me.

Can anyone direct me to a website that can walk me through step-by-step instructions on plugging these numbers into an equation?

I am embarrassed to admit that after 21 years of school I am still a "see it, then do it" sort of guy!

Thanks so much for all the help. I really do appreciate it!
• Sep 15th 2006, 12:59 AM
JakeD
Quote:

Originally Posted by CysticDuct
Can anyone direct me to a website that can walk me through step-by-step instructions on plugging these numbers into an equation?

The first site in this Google search has an extensive explanation and a calculator.