I have a data set of soccer matches, their respective results and the respective pre-game odds. I would like to study a simple strategy based on historical information.

I would like to create a simple model that predicts the goals scored by a team from the probability implied by the odds. To do so, I would like to carry out a Poisson or negative binomial regression to estimate the mean number of goals scored by a team in a match, given the bookmaker odds for the various outcomes. I could then plug these means (lambdas) into the Poisson probability function to estimate, roughly, the probabilities of particular scores.
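To make the last step concrete, here is a minimal sketch of how I imagine turning two estimated means into score probabilities. It assumes the home and away goal counts are independent Poisson variables, which is a simplification on my part; the function names and the two lambda values are hypothetical and not estimated from any data.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_probabilities(lam_home, lam_away, max_goals=10):
    """Grid with grid[i][j] = P(home scores i, away scores j).

    Assumes the two goal counts are independent, and truncates at
    max_goals, so the grid sums to slightly less than 1.
    """
    return [
        [poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
         for j in range(max_goals + 1)]
        for i in range(max_goals + 1)
    ]


def outcome_probabilities(grid):
    """Collapse the score grid into (home win, draw, away win)."""
    home = sum(p for i, row in enumerate(grid) for j, p in enumerate(row) if i > j)
    draw = sum(p for i, row in enumerate(grid) for j, p in enumerate(row) if i == j)
    away = sum(p for i, row in enumerate(grid) for j, p in enumerate(row) if i < j)
    return home, draw, away


# Hypothetical lambdas, chosen only for illustration.
grid = score_probabilities(1.6, 1.1)
p_home, p_draw, p_away = outcome_probabilities(grid)
print("P(2-1) =", round(grid[2][1], 4))
print("home/draw/away =", round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```

The truncation at `max_goals` is only there to keep the grid finite; with means of this size the probability mass beyond 10 goals is negligible.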

Many authors have done similar modelling, but lacking proper knowledge of statistics, I do not fully understand how the regressions should be carried out.

For example, D. Dyte and S. R. Clarke (2000) study whether FIFA rankings explain goals scored by estimating the model

ln(m) = a + bTR + cOR + v

where m is the expected number of goals scored, TR is the team's FIFA ranking, OR is the opponent's FIFA ranking and v is a parameter that captures the venue (home, away or neutral) *.

What I am actually wondering then is this:

Isn't it the case that the explanatory variables in a Poisson regression model the mean of the response variable? If so, it seems I cannot simply regress the goals scored by, e.g., the home team on the implied probability of the home team. What should my dependent variable be?


In my case the model would be

ln(m) = a + bPR

where m is the expected number of goals scored and PR is the implied probability of the team. Does anyone know what 'the expected number of goals scored' is here, given that it cannot be the actual number of goals scored? Or is the equation just a way of writing the model, and you actually regress the realized scores on the explanatory variables?
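Under the second reading (the realized goal counts are the response, and m is their conditional mean given PR, which is how a Poisson regression is usually fitted), a minimal sketch would look like the following. The match data are made up purely for illustration, the helper names are hypothetical, and rescaling the inverse odds so that they sum to 1 is just one simple way of removing the bookmaker margin. In practice a statistics package would do the fitting; the hand-written Newton iteration is only there to show which quantity plays which role.

```python
import math


def implied_probabilities(odds_home, odds_draw, odds_away):
    """Decimal odds -> probabilities, rescaled to sum to 1.

    Proportional rescaling is an assumption; other margin
    adjustments exist.
    """
    inverse = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    total = sum(inverse)
    return [p / total for p in inverse]


def fit_poisson_regression(x, y, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of ln(m) = a + b*x by Newton's method.

    y holds the observed counts; m never appears in the data, it is
    the fitted mean exp(a + b*x).
    """
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        m = [math.exp(a + b * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, m))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, m))
        # Negative Hessian (Fisher information).
        h00 = sum(m)
        h01 = sum(mi * xi for xi, mi in zip(x, m))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, m))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Made-up matches: (home odds, draw odds, away odds, home goals).
matches = [
    (1.50, 4.20, 6.50, 3), (2.10, 3.30, 3.60, 1),
    (3.40, 3.30, 2.20, 0), (1.80, 3.60, 4.50, 2),
    (2.60, 3.20, 2.80, 1), (4.50, 3.70, 1.80, 1),
    (1.30, 5.50, 9.00, 2), (2.30, 3.20, 3.20, 2),
]
x = [implied_probabilities(h, d, aw)[0] for h, d, aw, _ in matches]  # PR
y = [goals for _, _, _, goals in matches]  # realized goals = response

a, b = fit_poisson_regression(x, y)
fitted = [math.exp(a + b * xi) for xi in x]  # estimated m per match
print("a =", round(a, 3), "b =", round(b, 3))
```

The values in `fitted` would then be the lambdas that go into the Poisson probability function.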

Sorry about the confusing question, and thanks to everyone who's willing to help!

* D. Dyte and S. R. Clarke (2000). A ratings based Poisson model for World Cup soccer simulation. Journal of the Operational Research Society, Vol. 51, pp. 993-998.