# Independent boolean values for Negative Binomial Regression

#### Manitoba

Hi,

I am trying to analyze some data using negative binomial regression. The input for the Python library StatsModels is an independent variable x containing true (1) or false (0) values, and corresponding y-values that are floating-point numbers. Does it make sense to use negative binomial regression with such boolean independent x-values? If I simply compute the average y-value for the true and the false cases, the average of the true cases is higher. My goal is just to verify that true cases are more likely to happen.

The main problem could be that the y-values are ratios, normalized by two other parameters such as the length of a track.

Any help would be fantastic, as I am definitely not an expert in statistics.

Thanks


#### chiro

MHF Helper
Hey Manitoba.

The main thing about the negative binomial is that it assumes the independent and identically distributed criterion holds for each sample element.

If that is satisfied (which you suggest is the case), then a regression model based on this distribution should suffice.

Do the probabilities change as a function of the sample index or time or are they assumed constant?

#### Manitoba

Hi Chiro,

then it may be more of an issue with the Python library. Sadly, I can't find anything about its capability to work with such boolean values for the independent x-variable. The values (a sample of ~35,000) were collected over a certain time range, so I guess the error rate is quite low. The current result, using StatsModels, is the following:

```
==============================================================================
Dep. Variable:                      y   No. Observations:                35963
Model:                            GLM   Df Residuals:                    35961
Model Family:        NegativeBinomial   Df Model:                            1
Method:                          IRLS   Log-Likelihood:                -957.53
Date:                Thu, 25 Aug 2016   Deviance:                       450.85
Time:                        10:07:00   Pearson chi2:                 5.77e+03
No. Iterations:                     7
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -6.4628      0.131    -49.483      0.000        -6.719    -6.207
x              1.0924      0.135      8.094      0.000         0.828     1.357
==============================================================================
```

Recall that the true values (1) have a higher average y-value; accordingly, the coefficient (coef) for the independent variable x is positive, at 1.0924. If I just swap the 0s and 1s (as a test), the value is, of course, negative at -1.0924. Could that really be a correct result? I still have some doubts.
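The sign flip is exactly what one would expect. With a log link, exponentiating the coefficient gives the multiplicative effect of x = 1 versus x = 0, and swapping the coding just inverts that comparison, so the ratio becomes the reciprocal. A quick check with the reported value:

```python
import math

coef = 1.0924                    # coefficient for x from the summary above
# exp(coef) is the ratio of the fitted mean for x = 1 to that for x = 0:
rate_ratio = math.exp(coef)      # roughly 3: true cases have ~3x the mean
# Swapping the 0/1 coding flips the coefficient's sign, which is the
# same as taking the reciprocal of the ratio:
flipped = math.exp(-coef)
assert abs(flipped - 1 / rate_ratio) < 1e-12
print(rate_ratio, flipped)
```

So a coefficient of -1.0924 after recoding is not a contradiction; it describes the identical effect from the other direction.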

Thanks

Best regards


#### chiro

MHF Helper
Is what you are writing on a log scale?

If you are using a binomial-style regression (including negative binomial), then getting the estimate for the probability requires a transformation.

Since you get negative values, I'm assuming you have the log-scale estimate, which needs to be transformed to a probability between 0 and 1 (both inclusive).
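To make the back-transformation concrete: with a log link the coefficients live on the log scale, so the fitted group means are recovered by exponentiating. Using the intercept and coefficient from the summary posted earlier:

```python
import math

intercept, coef = -6.4628, 1.0924   # values from the posted summary
# The log link means exp(...) maps the linear predictor back to the
# original scale of y:
mean_false = math.exp(intercept)          # fitted mean for x = 0
mean_true = math.exp(intercept + coef)    # fitted mean for x = 1
print(mean_false, mean_true)
```

The negative intercept is therefore not an alarming result in itself; it simply corresponds to a fitted mean well below 1 on the original scale, which matches y-values like 0.025.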

Have you done any courses or work on Generalized Linear Models before?

#### Manitoba

Hi chiro,

No, I just need these skills for one single project. The dependent variable contains floating-point values such as 0.025241052 (the result of a ratio computation). So each value of the independent variable, which is either 0 or 1, is mapped to such a value.
The log is used by the implementation as the link function. Since the coefficient value is 1.0924 (shown in the result above), what negative value do you mean?

Thanks.

Kind regards