Appropriate Regression Analysis?
Does this even make sense?
I have been told to do a multiple regression analysis. The response variable and the explanatory variables add up to a fixed percentage of the total product. Example:
Milk = water + fat + protein + e ~= 96% (all are in terms of percentages, 96% is the average total from the dataset)
where e is the error such that e ~ NID(0, sigma)
The regression I was asked to do is
protein = β + x1*water + x2*fat + e (β is the intercept; x1 and x2 are the coefficients to be estimated)
We want this so we can calculate protein without having to measure it (data is from some test samples)
This makes absolutely no sense to me, since the equation is just
protein = -water-fat-e +100
so the coefficients should be negative 1 and beta = 100???
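One way to check this intuition is a small simulation. The sketch below is only an illustration under assumptions that are not in my data: the means and spreads are made up, the four parts sum to exactly 100, and the unmeasured part (`other`) is independent of water and fat. The names `simulate_milk` and `fit_two_predictor_ols` are hypothetical helpers.

```python
import random

def simulate_milk(n, seed=0):
    """Simulate hypothetical composition data (percent of total weight)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        water = rng.gauss(88.0, 0.6)   # assumed mean and spread
        fat = rng.gauss(4.5, 0.4)      # assumed mean and spread
        other = rng.gauss(4.0, 0.2)    # unmeasured share, about 4% on average
        protein = 100.0 - water - fat - other
        rows.append((water, fat, protein))
    return rows

def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using centered sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2         # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

rows = simulate_milk(2000, seed=1)
water = [r[0] for r in rows]
fat = [r[1] for r in rows]
protein = [r[2] for r in rows]
b0, b1, b2 = fit_two_predictor_ols(water, fat, protein)
print(f"intercept={b0:.2f}  water coef={b1:.3f}  fat coef={b2:.3f}")
```

Under these assumptions the two slopes should land close to -1 and the intercept close to 96 (100 minus the average unmeasured share) rather than 100. If the unmeasured share were correlated with water or fat, the slopes would drift away from -1, so the regression would not be entirely trivial in that case.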
Would it be more appropriate to do a hypothesis test such as:
Protein + Water + Fat ~= 96% of total milk weight (the total average is the average from data)
Hypothesis test:
H0 := (Water + Fat) - Protein = 45.1 (average difference obtained from data)
H1 := (Water + Fat) - Protein != 45.1
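For reference, a test like this could be set up as a one-sample t-test on the differences d = (Water + Fat) - Protein. The sketch below uses made-up values for `d` (not my data) and a hypothetical helper `one_sample_t`. The p-value uses a normal approximation, which is only reasonable for large samples; for small samples a t distribution would be needed (for example via `scipy.stats.ttest_1samp`).

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / se, n - 1

def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))

# Hypothetical differences (Water + Fat) - Protein, for illustration only.
d = [44.0, 45.5, 46.1, 45.0, 44.9]

# Testing against the mean of the same sample always gives t = 0.
t_same, df = one_sample_t(d, statistics.mean(d))

# Testing against a value fixed in advance (here an arbitrary 44.0).
t_fixed, _ = one_sample_t(d, 44.0)

print(f"t vs own mean: {t_same:.3f} (df={df})")
print(f"t vs 44.0: {t_fixed:.3f}, approx p = {approx_two_sided_p(t_fixed):.4f}")
```

One thing this makes visible: if the hypothesised value is the average computed from the same data, the statistic is zero by construction and H0 can never be rejected. The test is only informative when the reference value comes from somewhere else, such as a specification or an earlier batch.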
Not sure what analysis would be best and any help would be most appreciated. Thanks!
Re: Appropriate Regression Analysis?
I don't really understand your post, I'm afraid, but it sounds like you have multicollinearity and/or errors that are correlated with your explanatory variables.
But it's hard to say, as you have written Milk ~= 96%, which I can't really interpret at all.
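A quick way to look for multicollinearity with two explanatory variables is the correlation between them and the variance inflation factor, which for exactly two predictors is 1 / (1 - r^2). This is just a sketch with made-up numbers; `pearson_r` and `vif_two_predictors` are hypothetical helper names.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    if abs(r) >= 1.0:
        return math.inf  # perfect collinearity: coefficients are not identifiable
    return 1.0 / (1.0 - r * r)

# Hypothetical water and fat percentages, for illustration only.
water = [88.1, 87.4, 88.9, 87.9, 88.3, 87.2]
fat = [4.4, 5.0, 3.9, 4.6, 4.2, 5.1]

print(f"r = {pearson_r(water, fat):.3f}, VIF = {vif_two_predictors(water, fat):.2f}")
```

A VIF near 1 means the two variables carry mostly separate information; large values mean the individual coefficients will be estimated imprecisely, even if the predictions themselves are fine.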
Re: Appropriate Regression Analysis?
Sorry for being unclear. ~= is supposed to mean "approximately equal to". Maybe just ~ would be clearer, but that seemed confusing next to the ~ used for a distribution.
It should read
water+fat+protein+e ~= 96% of Milk
and the approximately equal to is because not all the milk contents are being measured and the total of what is being measured is averaging out to be around 96%.
I think my problem is one of systematic errors vs. random errors. I am going to do some studying tonight, and hopefully I will have a better-formed question tomorrow and can stick with the normal linear regression. Thanks!