Is it legitimate to filter results AFTER a test, but BEFORE calculating statistics? Here are two examples, each using a sample size of 100:
1. MEDICAL TEST
Several dozen variations of a medical treatment are tested, each on a group of subjects evenly split between the sexes.
Without treatment, all subjects die. When treated, all males die. However, in females, variations of the treatment result in a 25% to 90% cure rate.
QUESTION: When comparing treatment variations, should the data for males be removed before calculating basic stats? All tests are done on 100 subjects, equally divided into 50 males and 50 females, but in each test all males die, so the median would be calculated on females only.
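To make the medical question concrete, here is a minimal sketch of the two ways one variation could be summarized. The function name `cure_rates` and the count of 45 cured females are made up for illustration; they are not real data.

```python
def cure_rates(cured_females, n_females=50, n_males=50):
    """Return (cure rate over all subjects, cure rate over females only).

    Assumes, as in the example above, that no male is ever cured.
    """
    overall = cured_females / (n_females + n_males)
    females_only = cured_females / n_females
    return overall, females_only


# Hypothetical variation in which 45 of the 50 females are cured.
overall, females_only = cure_rates(45)
print(f"all subjects: {overall:.0%}, females only: {females_only:.0%}")
# prints "all subjects: 45%, females only: 90%"
```

The ranking of variations is the same either way here, since the male outcome is constant; only the reported magnitude changes.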
2. COMMERCIAL TEST
Testing variations of a "spray on" treatment for car tires, which is expected to increase tread wear; after application, each car must travel 5,000 miles. Tread will be measured before and after the trip.
Upon application to all four tires, between 10% and 80% of them melt off the rim (all four melt or none do); thus, they do not complete (or even start) the road test. However, vehicles with tires that survive treatment do complete the road test; results vary from -20% to +150% change in tread wear, versus untreated (each test uses the same cars, with four tires of the same brand on each).
QUESTION: Is it valid to remove the melted tires from each test's data set before comparing the successful variations? The median would then be calculated on varying numbers of non-melted tires; i.e., one data set might have 30 of 100 non-melted, another 60 of 100, yet another 90 of 100, etc.
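As a rough sketch of what I have in mind for the tire case: compute the median on the survivors only, but keep the number of survivors and the melt rate alongside it. The function name `summarize`, the use of `None` to mark a melted unit, and the numbers below are all assumptions made for illustration.

```python
from statistics import median


def summarize(wear_changes):
    """Summarize one treatment variation.

    wear_changes: one entry per tested unit, holding the percent change
    in tread wear versus untreated, or None if the tires melted
    (an assumed encoding, not part of the actual test protocol).
    """
    survived = [w for w in wear_changes if w is not None]
    n_total = len(wear_changes)
    return {
        "n_total": n_total,
        "n_survived": len(survived),
        "melt_rate": (n_total - len(survived)) / n_total,
        # The median is taken over survivors only; None if nothing survived.
        "median_survivors": median(survived) if survived else None,
    }


# Made-up data: five units, two of which melted.
print(summarize([-20.0, None, 35.0, None, 150.0]))
```

With this layout, two variations with the same median but very different melt rates would at least be distinguishable in the comparison.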
NOTE: In both of the above tests, the sample size is 100, but for actual tests, this number could vary from 15 to 1500.
The main question is whether it is valid to remove data before comparing statistical measures such as the median, variance, etc., in the event of an unanticipated catastrophic failure.
Please feel free to elaborate; all replies will be carefully considered. Thank you for your time.