I have a few questions:
Does anyone use SAS?
Either way, can anyone think of a logical way to identify whether a variable is nominal, ordinal or continuous? I will have to code this up. At the moment I'm simply thinking of counting the number of distinct values in the variable, i.e. out of a sample of 10,000, if I have 3,000 distinct values it's obviously continuous; if I only have 5, it's probably ordinal or nominal. It's a bit of an assumption, so can anyone suggest a better method?
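Here is a minimal sketch of the distinct-count heuristic described above, with two additions: text values are treated as nominal, and numeric variables with few levels are flagged for manual review. The data alone cannot tell ordinal from nominal, because order is something you know about the variable, not something visible in its values. The function name and the `max_levels=20` cut-off are hypothetical choices, not a standard.

```python
from numbers import Number

def guess_measurement_level(values, max_levels=20):
    """Rough heuristic guess of a variable's measurement level.

    max_levels is a hypothetical cut-off; tune it to your data.
    Few-level numeric variables are flagged rather than classified,
    since ordinal vs nominal cannot be inferred from values alone.
    """
    # Ignore missing values when counting distinct levels.
    distinct = {v for v in values if v is not None}
    if not distinct:
        return "unknown"
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in distinct)
    if not numeric:
        # Text labels: assume nominal unless you know an order exists.
        return "nominal"
    if len(distinct) <= max_levels:
        return "ordinal_or_nominal"
    return "continuous"


sample_cont = [i % 3000 for i in range(10000)]  # 3,000 distinct values
sample_cat = [i % 5 for i in range(10000)]      # 5 distinct values
print(guess_measurement_level(sample_cont))
print(guess_measurement_level(sample_cat))
print(guess_measurement_level(["red", "green", "red"]))
```

An absolute level count is used rather than a distinct/total ratio so that the result does not shift just because the sample grows; either choice is a judgment call.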
Also, how many categories is too many for ordinal data? Do you think 20 distinct values is too many? Are 2 too few? What works best for predictive modelling?
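There is no universal maximum. What usually matters is whether each level has enough observations to estimate its target rate reliably. A minimal sketch of one way to act on that for ordinal data: merge adjacent levels, in order, until every bin reaches a minimum count. The function name and the `min_count` threshold are hypothetical, chosen for illustration. Merging only neighbours preserves the ordering, which lumping sparse levels into an "other" bucket would not.

```python
from collections import Counter

def merge_sparse_ordinal_levels(values, min_count):
    """Map each ordered level to a bin index, merging adjacent levels
    until each bin holds at least min_count observations.

    min_count is a hypothetical threshold; choose it for your data.
    """
    counts = Counter(values)
    mapping = {}
    bin_index = 0
    running = 0
    for level in sorted(counts):
        mapping[level] = bin_index
        running += counts[level]
        if running >= min_count:
            # This bin is full enough; start a new one.
            bin_index += 1
            running = 0
    # A leftover last bin below min_count is folded into the previous bin.
    if running and bin_index > 0:
        mapping = {lvl: (b - 1 if b == bin_index else b) for lvl, b in mapping.items()}
    return mapping


values = [1] * 50 + [2] * 60 + [3] * 200 + [4] * 30 + [5] * 20
print(merge_sparse_ordinal_levels(values, min_count=100))
```

With these made-up counts, levels 1 and 2 combine into one bin, and levels 3 to 5 form a second bin, because 4 and 5 together are too sparse to stand alone.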
If I have a sample of 20,000 observations with an even 50/50 split of my target variable, what calculation should I use to decide the optimal split for nominal data? E.g. 10% of the sample contains 90% of my target variable, 20% contains 75%, and 30% contains 50%. Is there a formula I should be using that gives a decent sample size with a decent ratio of my target?
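A minimal sketch of scoring candidate splits, assuming "10% of the sample contains 90%" means the target rate *within* that segment is 90%. (90% of all targets cannot fit into 10% of a 50/50 sample.) It builds the 2x2 table of segment-vs-rest by target-vs-non-target and computes the Pearson chi-square statistic, one common split criterion that CHAID-style trees, for example, use. Chi-square grows with both segment size and how far its rate departs from the baseline, which is the trade-off in the question. Lift (segment rate / overall rate) is shown alongside. Gini or entropy reduction would be alternative criteria; the function name is hypothetical.

```python
def evaluate_segment(n_total, targets_total, n_segment, segment_rate):
    """Score one candidate split: segment vs. the rest of the sample."""
    if not 0 < n_segment < n_total:
        raise ValueError("segment must be a proper, non-empty subset")
    targets_seg = round(n_segment * segment_rate)
    observed = [
        [targets_seg, n_segment - targets_seg],                    # segment
        [targets_total - targets_seg,                              # rest
         (n_total - n_segment) - (targets_total - targets_seg)],
    ]
    row_totals = [n_segment, n_total - n_segment]
    col_totals = [targets_total, n_total - targets_total]
    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    lift = segment_rate / (targets_total / n_total)
    return {"targets": targets_seg, "lift": lift, "chi2": chi2}


# The three candidates from the question: 20,000 obs, 10,000 targets.
for n_seg, rate in [(2000, 0.90), (4000, 0.75), (6000, 0.50)]:
    r = evaluate_segment(20000, 10000, n_seg, rate)
    print(n_seg, r["targets"], round(r["lift"], 2), round(r["chi2"], 1))
```

With these figures the 10% segment scores highest (chi-square about 1422, lift 1.8), ahead of the 20% segment (1250, lift 1.5). The 30% segment scores 0 because its 50% rate is no better than the baseline.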
I'll be back with more questions, I'm sure. I know SAS can do all this without my needing to understand it, but I want to learn why it does what it does, which also gives me more control over the model.