Hello,
My knowledge of random variables and probability distributions is limited to the Normal, Binomial and Poisson distributions. However, I have received a task involving logistic regression and I need some help with it.
I have to write a computer program that takes some data as input and applies logistic regression to it to predict the output for any given query number. The data consists of a two-column table and a query number. The first column of the table contains a continuous variable and the second column contains a binary variable. Below is a sample table:
Col. A | Col. B
10 | 0
20 | 0
30 | 0
35 | 1
40 | 0
41 | 1
45 | 1
60 | 1
Query Number: 42
Now my program has to apply logistic regression to the data in the table to predict the value for the query number, in this case 42. Column A is plotted on the x-axis and Column B, its output, on the y-axis. The predicted value can be either '1' or '0'.
I have studied some material related to Logistic Regression and this is my understanding of how to solve the problem using it:
1. Use this equation: P = 1/(1+e^-(a+bx)).
2. 'a' and 'b' are regression coefficients.
3. Use a numerical method to find values for the regression coefficients from the input data in the table.
4. Finally compute P = 1/(1+e^-(a+bx)) using values for the regression coefficients.
5. If P > 0.5, the answer is 1; otherwise it is 0.
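The steps above can be sketched as follows. This is only a minimal sketch, assuming Newton-Raphson on the log-likelihood as the numerical method in step 3 (one common choice, not necessarily the right one for this task). The names DATA, sigmoid, fit_logistic and predict are hypothetical and exist only for illustration.

```python
import math

# Sample table from the post: (Col. A, Col. B)
DATA = [(10, 0), (20, 0), (30, 0), (35, 1), (40, 0), (41, 1), (45, 1), (60, 1)]


def sigmoid(z):
    """Numerically stable form of 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(data, iterations=25, tol=1e-10):
    """Estimate a and b (step 3), assuming Newton-Raphson is acceptable."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g_*) and negated Hessian (h_*) of the log-likelihood
        g_a = g_b = 0.0
        h_aa = h_ab = h_bb = 0.0
        for x, y in data:
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            break  # singular system; stop rather than divide by zero
        # Solve the 2x2 linear system for the Newton step
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


def predict(a, b, x, threshold=0.5):
    """Steps 4 and 5: compute P and compare it with the threshold."""
    p = sigmoid(a + b * x)
    return (1 if p > threshold else 0), p


if __name__ == "__main__":
    a, b = fit_logistic(DATA)
    label, p = predict(a, b, 42)
    print(f"a={a:.4f}, b={b:.4f}, P(42)={p:.4f}, predicted={label}")
```

The sketch has no safeguard for perfectly separable data (where every 0 lies below every 1), in which case the coefficients grow without bound. The sample table is not separable, because 35 maps to 1 while 40 maps to 0.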
My questions:
1) Which numerical method should I use to find values for regression coefficients?
2) I have read some websites that talk about maximizing a likelihood function. What is it and how is it related to logistic regression?
3) I have also seen some websites using the t-test or the chi-squared distribution to find results. Is there any way I can avoid using them?
4) Is correlation required to solve this problem?
When you write an answer, please state which question number you are answering.
Regards