Logistic regression

Feb 2016
21
0
UK
Hi again,

This one will probably be very simple for those of you well-versed in statistics.

I am looking into discriminant analysis at the moment (as per my recent thread in this same sub-forum) and I got onto logistic regression. I came across an example which is as follows:

$L(B \mid x_1, x_2, \ldots, x_m) = \prod_{j=1}^{k} \left[\frac{1}{1+\exp(-{\boldsymbol x}^\mathrm{t}{\boldsymbol B})}\right]^{y_j} \left[1-\frac{1}{1+\exp(-{\boldsymbol x}^\mathrm{t}{\boldsymbol B})}\right]^{1-y_j}$

Where:
$L(B \mid x_1, x_2, \ldots, x_m)$ = Likelihood function
$B$ = Supposed to be the Greek letter beta but I couldn't figure out how to add it (is there another menu for this math text I should be using?). Beta is a vector representing my scalar weight parameters. My understanding is that for the logistic regression I have to maximise these values in order to best discriminate between my cases.
${\boldsymbol\Pi}$ = product symbol, i.e. I need to get the product of respective probabilities (from j=1 to k)
${\boldsymbol x}$ = Vector of parameters upon which the probability depends
${y_j}$ = A dichotomous variable (0 or 1) depending on whether point j (=1 to k) is a member of class 1 or class 2 that I am trying to discriminate between.

What I'm stuck with is what point j is supposed to be (aside from a value between 1 and k)? Moreover, what is k?

The purpose of the $\mathrm{y_j}$ and $\mathrm{1-y_j}$ exponents seems to be to remove one of the terms in the product so that only one goes through. That is:

[left term]$^\mathrm{y_j}$ [right term]$^\mathrm{1-y_j}$

...turns into:

[left term]$^1$ [right term]$^0$

... when $\mathrm{y_j}$ = 1. In this example the above equals:

[left term]

... this then gets multiplied by the next probability until we reach j = k.
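
The selection effect of the two exponents can be checked numerically. Below is a minimal Python sketch; the function name `bernoulli_term` and the value 0.8 for the left term are made up purely for illustration.

```python
def bernoulli_term(p, y):
    """Contribution of one data point: p**y * (1 - p)**(1 - y)."""
    return p ** y * (1 - p) ** (1 - y)

p = 0.8  # hypothetical value of the left term for one data point

print(bernoulli_term(p, 1))  # y_j = 1: only the left term, p, survives
print(bernoulli_term(p, 0))  # y_j = 0: only the right term, 1 - p, survives
```

Whichever value $\mathrm{y_j}$ takes, the other factor is raised to the power 0 and becomes 1, so it drops out of the product.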

So yes, what is j and how does it link to the other parameters?

Thanks again for any help explaining this. I'm sure I must have missed some piece of basic understanding on this.
 

chiro

MHF Helper
Sep 2012
6,608
1,263
Australia
Hey Unwise.

Usually a likelihood function is a function of a sample, and in this case k will usually be the number of observations in that sample.

Following on from this, j is the dummy index of the product. So $y_j$ will take one value for class 1 and another for class 2 - and I'm guessing it's either 0 or 1.

This could be interpreted as a sub-distribution inside the product term and a whole distribution when these sub-distributions are multiplied.

If the x's are in the x-vector and the y's represent class information then your total sample will contain the x's for each class category that ranges from 1 to k in terms of index.

This means there should be m*k sample variables - of which some may be correlated in some way, if the above is true.
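
To make the layout concrete, here is a rough Python sketch of a sample with k data points and m variables each, together with the likelihood from the first post evaluated on it. The data values, the function name `likelihood`, and the use of a constant 1 as the first entry of each row (acting as an intercept) are all assumptions for illustration, not something given in the thread.

```python
import math

def likelihood(beta, X, y):
    """Product over the k data points of p_j**y_j * (1 - p_j)**(1 - y_j)."""
    total = 1.0
    for x_j, y_j in zip(X, y):
        z = sum(b * x for b, x in zip(beta, x_j))  # x_j^t B for data point j
        p_j = 1.0 / (1.0 + math.exp(-z))           # left term of the formula
        total *= p_j ** y_j * (1.0 - p_j) ** (1 - y_j)
    return total

# Hypothetical sample: k = 4 data points, m = 2 entries each
# (a constant 1 for the intercept, then one measured variable).
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]  # observed class of each data point

print(likelihood([0.0, 0.0], X, y))   # all p_j equal 0.5 here
print(likelihood([-4.0, 2.0], X, y))  # weights that suit this sample better
```

Note that the class values in `y` are part of the observed sample, in the same way as the rows of `X`; only the weights are unknown.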
 
Feb 2016
21
0
UK
Okay, so how do I know whether the jth sample is part of class 1 or 2, and therefore whether yj = 1 or 0? I thought that was the aim of this analysis - to determine the boundaries of different classes. Or do I determine whether the jth sample is part of class 1 or class 2 by maximising B? If so, do I not get caught in a chicken/egg situation whereby I don't know the value of yj and therefore don't know what probability (left or right side) gets included in the product?
 

chiro

That should be determined by the y(j,k) value - or the class information for a particular data point. If the class value is constant across all values of k then it will just be y(j) - i.e. a function of j.

The best way to figure this out is to either look at the sample data yourself or look at the experimental protocol to see what data is collected and how it's collected.

In terms of what is being maximized you will need to look at what test statistic you are trying to measure.

These will typically measure parameters of a distribution - which include probabilities.

You should not get caught in a chicken/egg situation if all data points are independent. If the sample values are all considered independent then this should never happen.

The distribution being multiplied looks like a standard Bernoulli distribution - or a Beta distribution when generalized. You should read the literature for more on this distribution and how it's used in analysis and modeling.
 
Feb 2016
21
0
UK
I'm sorry - you probably feel that you are banging your head against a wall - but I still don't quite understand.

If I have datapoint, j, with parameters (x1, x2, ..., xm), how do I know whether j is part of class 1 or 2, and therefore what value of yj to use (0 or 1)? I thought the likelihood function was supposed to help me decide what class to put the data points in rather than be dependent on my (subjective) classification?

Again, sorry if this is infuriating for you - I'm just not understanding.
 
Feb 2016
21
0
UK
Actually, I think I've got it: given the various parameters (x1, x2, ..., xm), for each datapoint (j) I have an observed binary outcome (yj = 0 or 1). Values of beta in the likelihood function are then optimised to best discriminate between the binary outcomes, and a logistic regression curve is thereafter formed.

Is this it?
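
The optimisation step described here can be sketched in Python as follows. This is only one way to do it: plain gradient ascent on the log of the likelihood, which is an assumption on my part (statistical packages typically use other optimisers). The data, the function names, the step size, and the intercept column of 1s are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y):
    """Log of the product in the likelihood, i.e. a sum over data points."""
    ll = 0.0
    for x_j, y_j in zip(X, y):
        p_j = sigmoid(sum(b * x for b, x in zip(beta, x_j)))
        ll += y_j * math.log(p_j) + (1 - y_j) * math.log(1.0 - p_j)
    return ll

def fit(X, y, steps=2000, rate=0.1):
    """Gradient ascent: repeatedly move beta uphill on the log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x_j, y_j in zip(X, y):
            p_j = sigmoid(sum(b * x for b, x in zip(beta, x_j)))
            for i, x in enumerate(x_j):
                grad[i] += (y_j - p_j) * x  # derivative of the log-likelihood
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical data: a constant 1 (intercept) plus one measured variable.
# The classes overlap, so the best-fitting weights are finite.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]  # observed outcomes

beta = fit(X, y)
print(beta)
print(log_likelihood([0.0, 0.0], X, y), log_likelihood(beta, X, y))
```

Maximising the log-likelihood gives the same beta as maximising the likelihood itself, since the logarithm is increasing; it is used here only because sums are numerically easier to handle than long products.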
 

chiro

Are you just estimating the probabilities of getting a class given your various parameters?

If that is the case then I'm guessing it's a basic binary logistic regression where you are estimating your probabilities given the various inputs.

If that is the case then judging by the above formula - that looks right.
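
Once the weights have been estimated, the probability of a class for a new data point follows from the same left term of the formula. A minimal Python sketch, assuming the weights below have already been fitted (the numbers are invented, as are the function name and the 0.5 cut-off, which is a common convention rather than anything stated in this thread):

```python
import math

def predict_probability(beta, x):
    """Estimated probability that a point with inputs x has y = 1."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

beta = [-4.0, 2.0]      # hypothetical fitted weights (intercept, slope)
new_point = [1.0, 3.0]  # constant 1 for the intercept, then one measurement

p = predict_probability(beta, new_point)
label = 1 if p >= 0.5 else 0  # assumed cut-off for assigning a class
print(p, label)
```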
 
Feb 2016
21
0
UK
Excellent! Thanks a lot for your help!