# Filtering sample data -- catastrophic failure?

• Dec 23rd 2012, 09:31 PM
JohnathanStein
Filtering sample data -- catastrophic failure?
Is it legitimate to filter results AFTER a test, but BEFORE calculating statistics? Here are two examples, each using a sample size of 100:

1. MEDICAL TEST

Testing variations (several dozen) of a medical treatment, evenly split between sexes.

Without treatment, all subjects die. When treated, all males die. However, in females, variations of the treatment result in a 25% to 90% cure rate.

QUESTION: When comparing treatment variations, should the data for males be removed before calculating basic stats? All tests are done on 100 subjects, equally divided into 50 males and 50 females, but in each test all males die, so the median would be calculated on females only.
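As a rough illustration of what dropping the males does here, the sketch below compares the overall cure rate with the female-only cure rate for a few variations. The counts and the `cure_rates` helper are hypothetical, and it assumes (as stated above) a fixed 50/50 split with no male ever cured. Under those assumptions the female-only rate is just twice the overall rate, so the ordering of the variations does not change.

```python
def cure_rates(cured_females, n_females=50, n_males=50):
    """Return (overall cure rate, female-only cure rate) for one variation.

    Assumes, as in the post, that no male subject is ever cured.
    """
    overall = cured_females / (n_females + n_males)
    females_only = cured_females / n_females
    return overall, females_only


# Hypothetical numbers of cured females for three treatment variations.
cured_by_variation = {"A": 15, "B": 30, "C": 45}

rates = {name: cure_rates(c) for name, c in cured_by_variation.items()}

# Rank the variations by each measure; with a fixed split the order matches.
rank_overall = sorted(rates, key=lambda name: rates[name][0])
rank_females = sorted(rates, key=lambda name: rates[name][1])

for name, (overall, females_only) in rates.items():
    print(f"{name}: overall={overall:.2f}, females only={females_only:.2f}")
print("same ranking:", rank_overall == rank_females)
```

This only holds because every test has the same number of males and females; if the split varied between tests, the two measures could rank the variations differently.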

2. COMMERCIAL TEST

Testing variations of a "spray on" treatment for car tires, which is expected to increase tread wear; after application, each car must travel 5,000 miles. Tread will be measured before and after the trip.

Upon application to all four tires, between 10% and 80% of them melt off the rim (all four melt or none do); thus, they do not complete (or even start) the road test. However, vehicles with tires that survive treatment do complete the road test; results vary from -20% to +150% change in tread wear, versus untreated (each test uses the same cars, with four tires of the same brand on each).

QUESTION: Is it valid to remove from each test result data set those tires which melted, before comparing successful variations? This would mean that the median would be calculated on varying numbers of non-melted tires; i.e., one data set might have 30 of 100 non-melted, another may have 60 of 100, yet another 90 of 100, etc.
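One way to make the filtering explicit is to report the failure rate next to the median computed on survivors only, so that neither number is read on its own. The sketch below is only an illustration: the `summarize` helper, the use of `None` to mark a melted unit, and all the numbers are made up for the example.

```python
from statistics import median


def summarize(results):
    """Summarize one variation.

    results: list of tread-change percentages, with None marking a unit
    that melted and therefore never ran the road test.
    """
    survivors = [r for r in results if r is not None]
    return {
        "n": len(results),
        "n_survived": len(survivors),
        "melt_rate": (len(results) - len(survivors)) / len(results),
        # Median conditional on survival; undefined if nothing survived.
        "median_given_survival": median(survivors) if survivors else None,
    }


# Hypothetical data: X looks better on survivors but melts most units.
variation_x = [None] * 7 + [120, 130, 150]
variation_y = [None] * 1 + [40, 45, 50, 50, 55, 60, 60, 65, 70]

for name, data in (("X", variation_x), ("Y", variation_y)):
    print(name, summarize(data))
```

In this made-up case, comparing survivor medians alone would favor X, even though 70% of its units never reached the road test; whether that trade-off is acceptable depends on the question being asked.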

NOTE: In both of the above tests, the sample size is 100, but for actual tests, this number could vary from 15 to 1500.

The main question is whether it is valid to remove data before comparing statistical measures such as the median, variance, etc., in the event of an unanticipated catastrophic failure.

Please feel free to elaborate; any and all replies will be considered. Thank you for your time,

--Johnathan
• Dec 23rd 2012, 10:11 PM
chiro
Re: Filtering sample data -- catastrophic failure?
Hey JohnathanStein.

The key question you should ask is: "Will filtering your data in some particular way introduce bias into your results in a way that confounds the analysis and gives you the wrong inference for your question?"

The answer will involve the question you are trying to answer, the nature of the data and the process, how data is collected and sampled, and the expert knowledge within your domain amongst other things.
• Dec 24th 2012, 12:03 AM
JohnathanStein
Re: Filtering sample data -- catastrophic failure?
Quote:

Originally Posted by chiro
The key question you should ask is: "Will filtering your data in some particular way introduce bias into your results in a way that confounds the analysis and gives you the wrong inference for your question?"

Could you possibly give an answer in the cases of the two examples listed? Details and/or pitfalls would be appreciated. For example, in the first example (MEDICAL TEST), would it be accurate to compare medians of cure rates by drug formula for females only? If not, why not?

Quote:

Originally Posted by chiro
The answer will involve the question you are trying to answer, the nature of the data and the process, how data is collected and sampled, and the expert knowledge within your domain amongst other things.

Do you have any references, web or books, that might have practical examples? "Methods of Comparisons" or some such would probably suffice.

Thanks,

--Johnathan
• Dec 24th 2012, 12:09 AM
chiro
Re: Filtering sample data -- catastrophic failure?
I'll take a look at the first question later on.

But for the second one, this is why statisticians are professional advisors: it just takes a lot of experience to know what to look for and what to ask when speaking with a client.

Unfortunately, it is hard to quantify this clearly in a book, or even to qualify some of the concepts. If I knew of a book that went through these things, I wouldn't hesitate to recommend it to you.

The mathematics is only a small part of what a statistician does: the real value of a statistician comes from the advice they give, and the better the advice, the better the statistician.