My knowledge of continuous random variables and probability distributions is limited to the Normal, Binomial and Poisson distributions. However, I have been given a task involving logistic regression, and I need some help with it.
I have to write a program that takes some data as input and applies logistic regression to it to predict the output for any given query number. The data consists of a two-column table and a query number. The first column of the table contains a continuous variable and the second column contains a binary variable. Below is a sample table:
Col. A | Col. B
10 | 0
20 | 0
30 | 0
35 | 1
40 | 0
41 | 1
45 | 1
60 | 1
Query Number: 42
My program has to apply logistic regression to the data in the table to predict the value for the query number, in this case 42. Column A is the input (plotted on the x-axis) and Column B is the corresponding output (plotted on the y-axis). The predicted value can be either '1' or '0'.
I have studied some material related to Logistic Regression and this is my understanding of how to solve the problem using it:
1. Use this equation: P = 1/(1+e^-(a+bx)).
2. 'a' and 'b' are regression coefficients.
3. Use a numerical method to find values for the regression coefficients 'a' and 'b' from the input data in the table.
4. Finally compute P = 1/(1+e^-(a+bx)) using values for the regression coefficients.
5. If P > 0.5 the answer is 1, otherwise it is 0.
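To make the five steps concrete, here is a rough Python sketch of what I have in mind. It is only a sketch under assumptions: I have picked Newton's method on the log-likelihood as the numerical method for step 3 without knowing whether it is the right choice, and the function names (sigmoid, fit, predict) are my own, not from any library.

```python
import math

# Sample table from above: Col. A (continuous) and Col. B (binary)
xs = [10, 20, 30, 35, 40, 41, 45, 60]
ys = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Step 1: P = 1/(1+e^-z), written so exp() cannot overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit(xs, ys, iterations=25, tol=1e-10):
    """Step 3: estimate a and b with Newton's method (an assumed choice)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ga = gb = 0.0           # gradient of the log-likelihood w.r.t. a and b
        haa = hab = hbb = 0.0   # entries of the negated Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        # Solve the 2x2 linear system for the Newton step
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def predict(x, a, b):
    """Steps 4 and 5: compute P and threshold it at 0.5."""
    p = sigmoid(a + b * x)
    return (1 if p > 0.5 else 0), p


a, b = fit(xs, ys)
label, p = predict(42, a, b)
print("a =", a, "b =", b)
print("P(42) =", p, "-> predicted", label)
```

The sketch has no safeguards: if the 0s and 1s in the table were perfectly separated by some value of Column A, the coefficients would grow without bound and this loop would not settle.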
My questions:
1) Which numerical method should I use to find values for the regression coefficients?
2) I have read some websites that talk about maximizing the likelihood function. What is it, and how is it related to logistic regression?
3) I have also seen some websites using t-tests or chi-squared distributions to find results. Is there any way I can avoid using them?
4) Is correlation required to solve this problem?
When you write an answer, please state which question number you are answering.