I am currently working on a project aimed at creating a statistical model for predicting domestic (household) water demand. I have obtained a large data set from a third party, with the dependent variable being daily water usage (in litres per day) and many predictor variables such as household occupancy rate, type of house, income of householders, number of adults, number of children, types of fixtures/fittings, garden or no garden, etc.
I first need to determine which predictor variables are significant, then build a statistical model to estimate water demand based on the predictor variables found to be significant. The problem is that I am not sure what approach to take for the first phase. I assume I need some kind of multiple regression model for the second phase, but how do I first determine which variables are significant?
I was thinking of using ANOVA but am not sure how to break my data set down into different groups. Do I need to analyse every combination as a separate group, e.g. a single-occupancy detached house with an income of $40,000 and a garden? Or do I just look at a smaller set of groupings, e.g. group all houses into occupancy rate bands and test those groups, ignoring for the moment the other variables such as house type and income, then do another ANOVA where I group all houses by house type only and compare those, ignoring occupancy rate, income, etc.?
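To make the second option concrete, here is a minimal sketch of what I mean by testing one grouping at a time. The records, the band labels and the helper name `one_way_f` are all made up purely for illustration (they are not from my actual data set), and the F statistic is computed by hand rather than with a statistics package:

```python
from collections import defaultdict
from statistics import mean

# Made-up records purely for illustration:
# (occupancy band, house type, litres per day)
households = [
    ("1", "detached", 150.0),
    ("1", "flat", 130.0),
    ("2", "detached", 290.0),
    ("2", "flat", 260.0),
    ("3+", "detached", 420.0),
    ("3+", "flat", 400.0),
]

def one_way_f(records, key_index, value_index=2):
    """F statistic of a one-way ANOVA, grouping on a single column.

    Hypothetical helper: every column other than the grouping
    column is ignored.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[value_index])
    values = [rec[value_index] for rec in records]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Variation of the group means around the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of individual households around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# One ANOVA per grouping variable, each ignoring the other variable
f_occupancy = one_way_f(households, key_index=0)
f_house_type = one_way_f(households, key_index=1)
print(f"occupancy bands: F = {f_occupancy:.2f}")
print(f"house type:      F = {f_house_type:.2f}")
```

Each call looks at a single grouping variable in isolation, which is exactly the part I am unsure about: it says nothing about how the variables act together.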
Can anyone please advise on the correct approach?