I have a large amount of data that I wish to analyse. The data is structured as follows:
- 1 dependent variable
- 9 independent variables
- The independent variables are categorical and do not all have the same number of levels. That is, one variable has 4 possible values, one has 10, one has 8, etc.
- The dependent variable is numeric (quantitative)
- Approx 15,000 data points, representing various, but not all, combinations of the 9 independent variables.
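
To make the layout concrete, here is a minimal sketch that builds a toy data set with the same shape. It is only an illustration, not my real data: the column names `x1`..`x9` and `y` are hypothetical, only the level counts 4, 10 and 8 come from the description above (the other six are made up), and the response is pure random noise.

```python
import random

# Level counts per independent variable. Only 4, 10 and 8 come from the
# description above; the remaining six are assumptions for illustration.
LEVEL_COUNTS = [4, 10, 8, 3, 5, 6, 2, 7, 4]
N_ROWS = 15_000


def make_toy_data(n_rows=N_ROWS, level_counts=LEVEL_COUNTS, seed=0):
    """Return a list of dict rows: x1..x9 are categorical labels, y is numeric."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Draw one level per variable, independently and uniformly.
        row = {
            f"x{i + 1}": f"x{i + 1}_level{rng.randrange(k)}"
            for i, k in enumerate(level_counts)
        }
        # Placeholder response: random noise, unrelated to the predictors.
        row["y"] = rng.gauss(100.0, 15.0)
        rows.append(row)
    return rows


rows = make_toy_data()

# Number of possible combinations of levels across the 9 variables.
n_possible = 1
for k in LEVEL_COUNTS:
    n_possible *= k

# Number of distinct combinations that actually appear in the toy data.
n_observed = len(
    {tuple(r[f"x{i + 1}"] for i in range(len(LEVEL_COUNTS))) for r in rows}
)

print(len(rows), n_observed, n_possible)
```

With these assumed level counts there are far more possible combinations than rows, so, as in my data, only some of the combinations are observed.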
I want to work out the relative impact of each of the independent variables. Common questions will be: “What is the effect of having value X compared to value Y?”, “Which variable has the largest effect on the dependent variable?” etc.
I was advised that stepwise regression would be the best approach. However, I believe that to use it, all the categorical variables must have the same number of levels. Is this true?
I am at a total loss as to how to tackle this, so any suggestions would be appreciated!