# Regression fundamentals (linear, non linear, simple, multiple)

• Nov 4th 2009, 08:03 PM
ontherocks
Regression fundamentals (linear, non linear, simple, multiple)
I am trying to understand the fundamentals of regression. I have done a lot of reading, but things are still not clear to me because the terms are used ambiguously in almost every source I read.

The following is a quote obtained from wikipedia (Regression analysis - Wikipedia, the free encyclopedia)

"In linear regression, the model specification is that the dependent variable, $\displaystyle y_i$ is a linear combination of the parameters (but need not be linear in the independent variables)"

Could anyone explain, with a real example, what constitutes a "parameter", a "dependent variable" and an "independent variable"?
• Nov 6th 2009, 04:39 AM
statmajor
I'm taking Regression Analysis right now, so I'll try my best to explain it.

In regression analysis, as you said, we try to create an equation to relate the dependent variable (y) to the independent variables (x1, x2,...,xn).

It's called the dependent variable because its value depends on the values of the independent variables (whose values don't depend on anything in the model).

In a simple model, there is only 1 independent variable, while in a multiple model, there are many.

Simple model - $\displaystyle y = B_0 + B_1 x$

multiple model - $\displaystyle y = B_0 + B_1 x_1 + B_2 x_2 + ... +B_n x_n$

where $\displaystyle B_0$ is a parameter.

Example

We want to create an equation that relates the crime rate to different factors. In other words, we want to relate the dependent variable (y - crime rate) to the independent variables (x - crime factors)

X1 - Location
X2 - Status of the economy
X3 - Unemployment rate

So the multiple regression model in this example would be $\displaystyle y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3$

We just started non-linear, so I don't know much about it. The previous examples are linear, while the following isn't:

$\displaystyle y = B_0 + B_1 x_1^{B_1} + B_2 x_2$

$\displaystyle y = B_0 + B_1 x_1^2 + B_2 x_2$ is also linear, which is what you quoted from Wikipedia "but need not be linear in the independent variables". It has to be linear in terms of the parameters.
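
To make the "linear in the parameters" point concrete, here is a minimal sketch, assuming NumPy and made-up data. The data, the seed and the "true" values in `true_b` are hypothetical and chosen only for illustration. It fits $\displaystyle y = B_0 + B_1 x_1^2 + B_2 x_2$ by ordinary least squares: the squared term just becomes another column of the design matrix.

```python
import numpy as np

def fit_ols(design, y):
    """Return the least-squares coefficient estimates for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 5, size=200)
x2 = rng.uniform(0, 5, size=200)

# Hypothetical "true" parameters B0, B1, B2 (illustration only).
true_b = np.array([1.0, 2.0, -3.0])
noise = rng.normal(0, 0.1, size=200)
y = true_b[0] + true_b[1] * x1**2 + true_b[2] * x2 + noise

# Columns: intercept, x1 squared, x2. The model is curved in x1, but every
# column is multiplied by exactly one parameter and the results are summed,
# which is what "linear in the parameters" means.
design = np.column_stack([np.ones_like(x1), x1**2, x2])
b_hat = fit_ols(design, y)
print(b_hat)  # estimates of B0, B1, B2
```

The model with $\displaystyle x_1^{B_1}$ cannot be written this way, because the parameter sits in the exponent rather than multiplying a known column.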
• Nov 6th 2009, 11:52 PM
ontherocks
Are you sure only $\displaystyle B_0$ is called the parameter?
I think all the B's are called "parameters" or "regression coefficients".

Now, parameters are just numbers that are obtained when you fit data points to a model. They don't correspond to any real physical entity (unlike variables).
What does the wikipedia statement mean then? How do I know if they are linear or non-linear in B's ?
For example I have a model
$\displaystyle y = 3 + 13 x_1 + 7 x_2$
How do I know if 3 or 13 or 7 are squares or cubes or fourth power of something.

The other question is, which term in the model
$\displaystyle y = B_0 + B_1 x_1 + B_2 x_2 + ... +B_n x_n$
gives the information about which variable is dominant and which is not?
• Nov 7th 2009, 07:25 AM
theodds
Yeah, all the $\displaystyle \beta_i$ are parameters; I think statmajor meant that $\displaystyle \beta_0$ is just an example of one. The parameters, generally speaking, allow for physical interpretations, but I wouldn't get too carried away with them.

Saying that a function is linear in the $\displaystyle \beta_i$ means, more or less, that you are multiplying the $\displaystyle \beta_i$ by constants and adding them up (the constants in this case being the X's). A model of the form, say,

$\displaystyle Y = \beta_0 e^{\beta_1 X} + \epsilon$

is not linear in the parameters. Epsilon is the error term here.

Quote:

For example I have a model
$\displaystyle y = 3 + 13 x_1 + 7 x_2$
How do I know if 3 or 13 or 7 are squares or cubes or fourth power of something.
The model choice is (speaking rather loosely) up to whoever is doing the study. There would be no reason to choose the model as, say, $\displaystyle Y = \beta_0 ^2 + \beta_1 ^2 X + \epsilon$, when getting rid of the squares would do the job just as well. Maybe you would be interested in the square root of a parameter for some reason though. What generally happens is the person doing the regression says "I believe for theoretical/practical/BecauseI'mLazy reasons that the relationship is of this form." Then they go and fit the model, do diagnostics, etc. to find out what they need to know and determine if their assumptions were reasonable.
• Nov 7th 2009, 09:08 AM
ontherocks
Quote:

Maybe you would be interested in the square root of a parameter for some reason though.

I cannot fathom for what reason would anyone want to think that way.
I mean, if the parameters don't correspond to any physical entity how does it matter whether I say for example
$\displaystyle pressure = 3 + 16*temperature$ (".....you know pressure is 16 times the temperature....")
or
$\displaystyle pressure = 3 + 4^2*temperature$ (".....you know pressure is 4 squared times the temperature......." ok, so?)
• Nov 7th 2009, 09:35 AM
theodds
Okay, let me briefly explain what is going on when we are using regression to model a phenomenon.

1) We have a random response (Y) that we want to model as a function of some predictors (the Xs).
2) We decide what form our model is going to take. The choice of model can be motivated by many things. The model will have (i) a systematic component (the deterministic part of the model), which is a function of unknown parameters (the Betas) and the fixed, known independent variables (the Xs) and (ii) a random component (the part that reflects that the outcome is not deterministic; this is usually done by using a single random term epsilon, with mean 0 and some unknown variance that we also try to estimate; there are other ways of building in a random component however). Just so I'm not leaving anything out, we might also have (iii) what is called a link function, but I would just ignore this if I were you until you understand the basics.
2a) If our model happens to be linear in the parameters, i.e. to get the deterministic part of the model we sum multiples of the parameters together, then we have a linear model and we get to use all the wonderful theory developed for it. This is something that drives people to choose to model phenomena using linear models.
3) We then use the theory developed to fit the model, and estimate the parameters. Then we find out what we need to know, do diagnostics, test hypotheses, check our assumptions, whatever.

So, for your example, we want to model pressure as a function of temperature. I'm assuming that this isn't a deterministic relationship, since otherwise there would be no reason to do a regression. So we 1) have a random response (pressure) that we want to model as a function of a predictor (temperature). 2) We choose to model this phenomenon with a linear model of the form $\displaystyle Y = \beta_0 + \beta_1 X + \epsilon$. This is motivated by our theoretical background, let's say. 2a) Our model happens to be linear, so we get the benefit of all the theory that has been developed for linear models. 3) We then fit the model, and estimate the two unknown parameters and do what we need to do.
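
The steps above can be sketched in a few lines, assuming NumPy and simulated data. The values 3, 16 and the error standard deviation 2 are hypothetical, borrowed loosely from the pressure/temperature example, and the variable names are only for illustration. The sketch uses the standard closed-form least-squares estimates for a simple linear model and also estimates the error variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Steps 1-2: a random response modelled as beta0 + beta1 * X + epsilon.
# beta0, beta1 and sigma are hypothetical "true" values for the simulation.
beta0, beta1, sigma = 3.0, 16.0, 2.0
temperature = rng.uniform(10, 40, size=n)
pressure = beta0 + beta1 * temperature + rng.normal(0, sigma, size=n)

# Step 3: least-squares estimates for the simple linear model.
x_bar, y_bar = temperature.mean(), pressure.mean()
sxx = np.sum((temperature - x_bar) ** 2)
sxy = np.sum((temperature - x_bar) * (pressure - y_bar))
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Estimate of the error variance: SS(Error) over its degrees of freedom.
residuals = pressure - (beta0_hat + beta1_hat * temperature)
sigma2_hat = np.sum(residuals**2) / (n - 2)

print(beta0_hat, beta1_hat, sigma2_hat)
```

The parameters are fixed numbers in the simulation; only the estimates change from one simulated data set to the next.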
• Nov 7th 2009, 09:46 PM
ontherocks
Thanks for that detailed explanation.
(Crying) Still it's not clear to me what "parameters" are.
Are they constants? Or are they functions of something?

If they are constants then the statement that the model is a function of constants & variables has no meaning.
It's like saying "y is a function of constants".
• Nov 8th 2009, 05:41 AM
theodds
In this context, the parameters are fixed constants. Sorry about the abuse of language in saying that Y is a function of them. They don't change. A parameter is fixed and usually unknown in this setting. The experimenter decides to model the deterministic aspect of a response as a function of a set of predictors using a general functional form that is known up to some unknown constants (parameters). The random component also features parameters.

Don't get hung up on the use of the term parameter; it would probably be best to just think of them as constants. To be honest, though, there are different viewpoints in Statistics on how to think of parameters - the old Classical vs. Bayesian debate.
• Nov 8th 2009, 07:44 PM
ontherocks
Ok great.
Now my next questions.

Q1. Which term in the model (for example in a multiple linear regression)
$\displaystyle y = B_0 + B_1 x_1 + B_2 x_2 + ... +B_n x_n$
gives the information about which variable is dominant and which is not?

I think the parameters (I mean the magnitude of the parameters) tell if the corresponding variable is dominant or not, am I right?

Q2. Again a quote from wikipedia (Coefficient of determination - Wikipedia, the free encyclopedia)
"In many (but not all) instances where $\displaystyle R^2$ is used, the predictors are calculated by ordinary least-squares regression: that is, by minimizing SSerr. In this case R-squared increases as we increase the number of variables in the model ($\displaystyle R^2$ will not decrease)."
Could you explain why $\displaystyle R^2$ increases as the number of variables is increased and vice versa?
• Nov 9th 2009, 06:35 AM
theodds
Quote:

Ok great.
Now my next questions.

Q1. Which term in the model (for example in a multiple linear regression)
$\displaystyle y = B_0 + B_1 x_1 + B_2 x_2 + ... +B_n x_n$
gives the information about which variable is dominant and which is not?
I think the parameters (I mean the magnitude of the parameters) tell if the corresponding variable is dominant or not, am I right?

None of the terms in the model do. The estimates of the parameters don't give any indication as to which predictor is "dominant." This can be seen if you notice that the X's may be scaled differently. If you standardized the predictors, then you would have a better case for interpreting the Beta's in this way. The first step is usually figuring out which predictors are statistically significant though.
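
A minimal sketch of the scaling issue, assuming NumPy and made-up data (the predictors, coefficients and seed are hypothetical). The second predictor is recorded in units that make its values about a thousand times larger, so its raw coefficient is tiny even though, per standard deviation, it moves the response more than the first predictor.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(0, 1, n)       # spread of about 1
x2 = rng.normal(0, 1000, n)    # spread of about 1000 (different units)
y = 1.0 + 2.0 * x1 + 0.005 * x2 + rng.normal(0, 0.5, n)

# Fit on the raw predictors.
X_raw = np.column_stack([np.ones(n), x1, x2])
b_raw = ols(X_raw, y)

# Fit on standardized predictors (mean 0, standard deviation 1).
z1 = (x1 - x1.mean()) / x1.std(ddof=1)
z2 = (x2 - x2.mean()) / x2.std(ddof=1)
X_std = np.column_stack([np.ones(n), z1, z2])
b_std = ols(X_std, y)

print("raw slopes:         ", b_raw[1:])
print("standardized slopes:", b_std[1:])
```

Both fits give identical fitted values; only the units of the coefficients differ, which is why raw magnitudes cannot be compared across predictors.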

Quote:

Q2. Again a quote from wikipedia (Coefficient of determination - Wikipedia, the free encyclopedia)
"In many (but not all) instances where $\displaystyle R^2$ is used, the predictors are calculated by ordinary least-squares regression: that is, by minimizing SSerr. In this case R-squared increases as we increase the number of variables in the model ($\displaystyle R^2$ will not decrease)."
Could you explain why $\displaystyle R^2$ increases as the number of variables is increased and vice versa?
Before thinking about $\displaystyle R^2$, you should probably first learn about the concept of partitioning the overall variability of the response. This material always comes before discussing $\displaystyle R^2$ when learning regression. See ANOVA, particularly the section on partitioning sums of squares. Then, just know that in Regression we partition the total variability as SS(Total) = SS(Regression) + SS(Error), and we have the formula $\displaystyle R^2 = \frac{SS(Regression)}{SS(Total)}$. The intuitive reason why $\displaystyle R^2$ never decreases is that, as we add more predictors, we can only ADD predictive power to the model. After all, the worst case scenario is that we've added a predictor that is unrelated, and in that case least squares can give it a coefficient of zero, so we end up with effectively the same model anyway. The flip side is that having a lot of unnecessary predictors creates a host of problems relating to lack of parsimony and bias (among other things), so $\displaystyle R^2$ is a pretty awful criterion for determining how many predictors to have in your model.
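
A quick numerical check of this, as a sketch assuming NumPy and simulated data (everything here is hypothetical). It computes $\displaystyle R^2$ as 1 - SS(Error)/SS(Total), which equals SS(Regression)/SS(Total) when the model has an intercept, and then keeps adding predictors that are pure noise.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of the least-squares fit of y on X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_error = np.sum((y - X @ coef) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_error / ss_total

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
r2_values = [r_squared(X, y)]

# Add five predictors that are unrelated to y by construction.
for _ in range(5):
    junk = rng.normal(size=n)
    X = np.column_stack([X, junk])
    r2_values.append(r_squared(X, y))

print(r2_values)  # a non-decreasing sequence
```

The junk predictors carry no information about the response, yet $\displaystyle R^2$ still creeps upward, which is the reason it is a poor guide for choosing how many predictors to keep.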
• Nov 9th 2009, 11:58 PM
ontherocks
Thanks theodds for the R-square explanation.

I now have more terms to get used to (Happy)
Y = Dependent Variable, Response
X = Independent Variable, Predictor
B = Parameter

Quote:

The first step is usually figuring out which predictors are statistically significant though.

How do I determine that? I mean what methods/tools do I use?

My bad I was under the impression that
$\displaystyle y=b_0+{b_1}^2*x_1+{b_2}^2*x_2$ is a non-linear model. (Headbang)
I think I understand the difference between linear & non-linear model.
• Nov 10th 2009, 06:30 AM
theodds
The model, as written, is non-linear in the $\displaystyle b_i$. But it is pointless to think of a model that way. The only things you've done by fitting a model that way are (i) complicate things and (ii) force the relationship between the response and each predictor to be non-negative. You would be better off fitting a model where you set $\displaystyle \gamma_i = \beta_i ^2$, which would be linear and would remove the imposed restriction. If you can manage it, you want the model to be linear. You typically would only fit a non-linear model if there wasn't any way you could make it linear. For example, if you want to fit the model $\displaystyle Y = \beta_0 \beta_1 ^X \epsilon$ (with everything positive), you could do that by taking logs of both sides and fitting the model $\displaystyle \log Y = \gamma_0 + \gamma_1 X + \delta$ instead, where $\displaystyle \gamma_0 = \log \beta_0$, $\displaystyle \gamma_1 = \log \beta_1$ and $\displaystyle \delta = \log \epsilon$.
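
Here is a minimal sketch of that log-transform trick, assuming NumPy, simulated data, and a multiplicative error that is always positive (the "true" values 2.0 and 1.3 and the error spread are hypothetical). The linear model is fitted to $\displaystyle \log Y$, and the estimates are then transformed back with the exponential.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, n)

# Hypothetical "true" parameters of Y = beta0 * beta1**X * epsilon.
beta0, beta1 = 2.0, 1.3
epsilon = np.exp(rng.normal(0, 0.1, n))  # positive multiplicative error
y = beta0 * beta1**x * epsilon

# Fit the linearized model: log Y = gamma0 + gamma1 * X + delta.
X = np.column_stack([np.ones(n), x])
gamma, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

# Back-transform: gamma0 = log(beta0), gamma1 = log(beta1).
beta0_hat = np.exp(gamma[0])
beta1_hat = np.exp(gamma[1])
print(beta0_hat, beta1_hat)
```

Note that this changes the error assumption: the linear theory now applies to $\displaystyle \delta$ on the log scale, not to an additive error on the original scale.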

As for determining which predictors are statistically significant, you probably should read a textbook or take a course in this material. It would also clear up any misunderstandings you have. Regression is, at the very least, a semester-long undergraduate course that typically focuses 90% on Linear Regression, and has at least a semester of background material as a prerequisite that you've hopefully already had. I really can't answer that question both succinctly and well given where you are, and even if I could, I would risk causing you to screw up whatever you are doing, since the correct thing to do in a Regression setting often depends on technical details. That is, I think it would be best for you if I didn't answer that question, and just directed you to a textbook. I can post some course notes if you want.
• Nov 11th 2009, 10:35 PM
ontherocks
Yes, let me know some references I can go through.
Actually simple(univariate) linear regression was clear to me. Now simple(univariate) non linear regression is somewhat clear to me.
What is troubling me is multiple regression.(Since it calls for handling more than two dimensions.)

For univariate (X-Y axes) I would fit a line.
For bivariate (X-Y-Z axes) I would fit a plane(??)
For trivariate and above I don't know what to fit.
But all regression software fits a line to multivariate data. How is that possible?

Also for the immediate moment let me know the tools for determining the magnitude of significance among variables.
• Nov 12th 2009, 06:47 AM
theodds
Quote:

Yes, let me know some references I can go through.
Actually simple(univariate) linear regression was clear to me. Now simple(univariate) non linear regression is somewhat clear to me.
What is troubling me is multiple regression.(Since it calls for handling more than two dimensions.)

For univariate (X-Y axes) I would fit a line.
For bivariate (X-Y-Z axes) I would fit a plane(??)
For trivariate and above I don't know what to fit.
But all regression software fits a line to multivariate data. How is that possible?

For P predictors, you would fit a hyperplane in (P + 1) dimensional Euclidean space. If you have polynomial terms or interaction terms or, in general, functions of predictors with associated Beta's of their own, you can get more complicated curves in space using OLS. The fact that Linear Models are called such is a little misleading; we can get quite complicated curves using them. Trigonometric curves, hyper-parabolas, etc are all available to us. But obviously we aren't restricted to 3 dimensions; it's just that we can't visualize a hyperplane, but that's what the technique does.
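
A small sketch of both points, assuming NumPy and noise-free made-up data (the coefficient values are hypothetical). With P = 4 predictors the fit has P + 1 = 5 coefficients, an intercept plus four slopes, which is the hyperplane. Adding columns that are functions of the predictors, here a sine term and an interaction, bends the fitted surface while the model stays linear in the Beta's.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 4
predictors = rng.normal(size=(n, p))

# Hypothetical intercept and four slopes defining a hyperplane.
true_beta = np.array([0.5, 1.0, -2.0, 3.0, 0.25])
X = np.column_stack([np.ones(n), predictors])
y = X @ true_beta  # noise-free: every point lies exactly on the hyperplane

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same machinery, curved surface: add sin(x1) and the interaction x2 * x3.
X_curved = np.column_stack([
    X,
    np.sin(predictors[:, 0]),
    predictors[:, 1] * predictors[:, 2],
])
true_curved = np.append(true_beta, [4.0, -1.0])
y_curved = X_curved @ true_curved

curved_hat, *_ = np.linalg.lstsq(X_curved, y_curved, rcond=None)
print(beta_hat)
print(curved_hat)
```

Because the data are noise-free, least squares recovers the coefficients up to rounding error in both cases.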

Quote:

Also for the immediate moment let me know the tools for determining the magnitude of significance among variables.
Okay. It turns out that, if you assume that the error term in an Ordinary Least Squares model is normally distributed, then the least-squares estimates for the Beta's (usually denoted $\displaystyle \hat{\beta}_i$) are also normally distributed (actually, the joint distribution of the $\displaystyle \hat{\beta}_i$'s is multivariate normal). To test for the significance of a single predictor, given that all the others are to be included in the model, you just conduct a t-test of the hypothesis that $\displaystyle \beta_i = 0$ using the test statistic $\displaystyle \frac{\hat{\beta}_i}{s\{ \hat{\beta}_i\}}$, where $\displaystyle s\{ \hat{\beta}_i\}$ is the estimated standard error of $\displaystyle \hat{\beta}_i$. This all comes standard in the output of any automated procedure, so don't worry about having to calculate anything, just make sure you can interpret it. For more complicated hypotheses, your test statistic is different. But to tell you how to do that would be to teach you the entire subject matter.
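
For the curious, here is roughly what the software computes, as a sketch assuming NumPy and simulated data (the data and coefficient values are hypothetical). It uses the standard OLS formulas: the error variance is estimated by SS(Error) over its degrees of freedom, and the standard errors are the square roots of the diagonal of that estimate times $\displaystyle (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated error variance: SS(Error) / (n - number of coefficients).
residuals = y - X @ beta_hat
df = n - X.shape[1]
mse = residuals @ residuals / df

# Estimated covariance matrix of beta_hat and the standard errors s{beta_hat_i}.
cov_beta = mse * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_beta))

# t statistic for each hypothesis beta_i = 0.
t_stats = beta_hat / std_err
print(t_stats)
```

Each statistic is compared with a t distribution on `df` degrees of freedom. A large absolute value, as expected here for `x1`, is evidence against $\displaystyle \beta_i = 0$, while the statistic for `x2` should usually be small.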

On the subject of practical significance, you would need to look at the size of the Beta's relative to the scale of their predictors. Or, you could standardize the predictors, so that all the Beta's are directly comparable in terms of scale, or use the partial-R-Square or one of its many brothers. This gets quite hazy, however, when you have predictors that are themselves highly correlated, since you run into the issue of multicollinearity. Again, I really think you should study the entirety of the subject matter before you start trying to figure out things like this.

Another thing to consider is that, unless you are doing a controlled experiment, Regression CANNOT establish a causal relationship at all, let alone the relative importance of the predictors.

My undergraduate text was Kutner: "Applied Linear Statistical Models" which I think would be adequate for your needs. The first 13 chapters are on Ordinary Least Squares.