# Soccer Prediction Problem

• Jul 12th 2008, 01:37 PM
Jameselaprendi
Soccer Prediction Problem
I have a specific problem concerning the prediction of how many goals TEAM A will score in a soccer match against TEAM B.

I've decided which variables I want to use, but I'm struggling to figure out exactly how to use them...

I am comparing TEAM A's attack with TEAM B's defence rather than the team's overall strengths.

For the entire previous soccer season, where every team played each other twice, I have:

AGS - Average goals scored per team per game
AGC - Average goals conceded per team per game
OAGSC - Average goals scored/conceded in all games
%FTS - % of games a team fails to score a goal
%CS - % of games a team doesn't concede a goal (Clean Sheet)

My original formula was:

PREDICTED TEAM A GOALS vs TEAM B = (AGS TEAM A + AGC TEAM B)/2
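
As a quick check of the formula above, here is a minimal Python sketch. The function name `simple_prediction` is only an illustrative choice, and the inputs are the TEAM A and TEAM B figures given further down in this post.

```python
def simple_prediction(ags_attacker: float, agc_defender: float) -> float:
    """Average the attacker's goals scored with the defender's goals conceded."""
    return (ags_attacker + agc_defender) / 2


# TEAM A AGS = 2.11, TEAM B AGC = 1.05 (figures from this post)
predicted = simple_prediction(2.11, 1.05)
print(round(predicted, 2))  # 1.58
```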

However, I feel this is too simplistic, so I created Overall Average Goals Scored per match and Overall Average Goals Conceded per match (OAGSC), which are of course the same figure, in order to compare each team's attacking and defensive strength with the league average.

OAGSC is 1.32
TEAM A AGS is 2.11
TEAM B AGC is 1.05

So TEAM A scores an average 0.79 (=2.11 - 1.32) goals per game above the league average but TEAM B concedes 0.27 (=1.32 - 1.05) goals per game below the average.
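
The arithmetic above can be reproduced in a few lines of Python. The variable names are assumptions made for this sketch; the numbers are the ones quoted in the post.

```python
OAGSC = 1.32       # league average goals scored/conceded
TEAM_A_AGS = 2.11  # TEAM A average goals scored
TEAM_B_AGC = 1.05  # TEAM B average goals conceded

# Positive value: TEAM A scores more than the league average.
attack_above_avg = TEAM_A_AGS - OAGSC
# Positive value: TEAM B concedes fewer than the league average.
defence_below_avg = OAGSC - TEAM_B_AGC

print(round(attack_above_avg, 2))   # 0.79
print(round(defence_below_avg, 2))  # 0.27
```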

Therefore TEAM A has a significantly more effective attack than the league average while TEAM B has a slightly better defence than the league average.

I feel that this means that while TEAM A would normally score 2.11 goals against an average defence, they might score fewer because TEAM B has a slightly better than average defence. However, TEAM B would normally concede 1.05 goals to an average attack, but TEAM A has a significantly stronger than average attack, so TEAM B should concede more than their average per game.

Q1) How do I incorporate this into my formula and get a more accurate prediction for TEAM A goals scored in the match vs TEAM B?

I also have figures, as stated at the beginning, telling me how often a team failed to score and how often a team didn't concede.

TEAM A %FTS = 7.9% of games they played (3 of 38 matches)
TEAM B %CS = 42.1% of games they played (16 of 38 matches)

So TEAM A score at least 1 goal in 92.1% of their matches but TEAM B also manage not to concede in 42.1% of their matches (almost half!)

My instinct is to use these figures to create a second prediction, perhaps the % chance that TEAM A won't score. Again I would like to weight this using the league averages as above, to recognise the relative strength of the attack and defence.

League Average %FTS and %CS are also of course the same, at 28.7%, i.e. 28.7% of the time a team in this league fails to score, and 28.7% of the time a team doesn't concede.

Q2) How do I use these percentages to create a prediction of TEAM A failing to score? Should I combine all variables somehow or keep them as two separate predictions?

Thanks in advance.
• Jul 13th 2008, 05:54 AM
CaptainBlack
There was some discussion in the Mathematical Gazette some years ago, where it was suggested that a reasonable model of a game is that the number of goals scored by each team is a Poisson RV (presumably with means determined by each team's form, though how one would prove this escapes me).

If this is so, the probability of a clean sheet for either team is determined by the mean number of goals expected from the other team when playing the team in question, given the current state of form.
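
To make the Poisson point concrete: if a team's goals follow a Poisson distribution with mean m, the chance of it scoring zero (a clean sheet for the opponent) is exp(-m). Below is a minimal Python sketch of that calculation. The function names are hypothetical, and the mean of 1.58 is only an assumed illustrative value (it happens to be what the original averaging formula gives for TEAM A vs TEAM B), not a fitted estimate.

```python
import math


def poisson_pmf(k: int, mean_goals: float) -> float:
    """P(exactly k goals) when goals follow a Poisson distribution."""
    return math.exp(-mean_goals) * mean_goals ** k / math.factorial(k)


def prob_fail_to_score(mean_goals: float) -> float:
    """P(0 goals), i.e. the opponent keeps a clean sheet: exp(-mean)."""
    return poisson_pmf(0, mean_goals)


expected_goals = 1.58  # assumed value, for illustration only
print(round(prob_fail_to_score(expected_goals), 3))  # 0.206
```

Under this assumption the goals prediction and the "fails to score" prediction come from the same number, the expected goals, so they would not need to be kept as two separate calculations.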

A simple model might be (assuming I have understood your notation):

Score(A)=OAGSC/2 + (AGS(A)-OAGSC/2) - (AGC(B)-OAGSC/2)
Score(B)=OAGSC/2 + (AGS(B)-OAGSC/2) - (AGC(A)-OAGSC/2)
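
For anyone who wants to try the numbers, here is a short Python transcription of the formula exactly as written above. The function name `linear_score` is an assumption for this sketch, and the inputs are the TEAM A and TEAM B figures from the first post.

```python
def linear_score(ags_attacker: float, agc_defender: float, oagsc: float) -> float:
    """Linear model as posted: baseline + attack deviation - defence deviation."""
    baseline = oagsc / 2
    return baseline + (ags_attacker - baseline) - (agc_defender - baseline)


# TEAM A attacking TEAM B: AGS(A) = 2.11, AGC(B) = 1.05, OAGSC = 1.32
print(round(linear_score(2.11, 1.05, 1.32), 2))  # 1.72
```

Note that the expression simplifies algebraically to AGS(A) - AGC(B) + OAGSC/2, which makes it easy to check by hand.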

But you should also include some correction for home team advantage (I believe there was a paper on this recently on ArXiv.org but I don't have a link for that).

RonL
• Jul 13th 2008, 11:10 PM
CaptainBlack
Quote:

Originally Posted by CaptainBlack
There was some discussion in the Mathematical Gazette some years ago, where it was suggested that a reasonable model of a game is that the number of goals scored by each team is a Poisson RV (presumably with means determined by each team's form, though how one would prove this escapes me).

If this is so, the probability of a clean sheet for either team is determined by the mean number of goals expected from the other team when playing the team in question, given the current state of form.

A simple model might be (assuming I have understood your notation):

Score(A)=OAGC/2 + (AGS(A)-OAGS/2) - (AGC(B)-OAGS/2)
Score(B)=OAGC/2 + (AGS(B)-OAGS/2) - (AGC(A)-OAGS/2)

But you should also include some correction for home team advantage (I believe there was a paper on this recently on ArXiv.org but I don't have a link for that).

RonL

I have been doing some research to find the ArXiv.org reference which is this. You have to go to the section on Myths for the home advantage discussion.

RonL
• Jul 14th 2008, 11:10 AM
Jameselaprendi
Hi RonL

Thanks for the response.

I have done similar research and found out about Poisson, etc., but it really was just too difficult for me to get stuck into, and overkill for my purpose.

The simple formula you've given me looks like a good indicator on the face of it.

I had already planned to do completely separate home and away calculations, which I didn't mention, sorry. For example, I would actually compare TEAM A's AGS at home with TEAM B's AGC away, etc.
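
A rough sketch of how that home/away split could be computed in Python is shown here. The match list is made up purely for illustration, and the function names and tuple layout are assumptions, not anything from a real data set.

```python
# Hypothetical results, invented for illustration:
# (home_team, away_team, home_goals, away_goals)
matches = [
    ("A", "B", 2, 0),
    ("A", "C", 3, 1),
    ("B", "A", 1, 1),
    ("C", "B", 2, 2),
]


def _average(values):
    values = list(values)
    return sum(values) / len(values)


def home_ags(team, results):
    """Average goals scored by `team` in its home games."""
    return _average(hg for home, _, hg, _ in results if home == team)


def away_agc(team, results):
    """Average goals conceded by `team` in its away games."""
    return _average(hg for _, away, hg, _ in results if away == team)


print(home_ags("A", matches))  # 2.5
print(away_agc("B", matches))  # 2.0
```

These two figures would then take the place of AGS and AGC when TEAM A is at home to TEAM B.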

I do notice two slight problems with your formula though. Firstly, you've treated OAGS and OAGC as separate things, but they are always the same figure: whenever a goal is scored in the league, someone else is conceding!

Another potential problem is that your formula allows some teams to have a negative prediction, e.g. that they'll score -0.25 goals or concede -0.5 goals. This occurs when a team's average goals scored or conceded is less than the difference between the League's OAGSC and the team's AGS or AGC.

I had come up with something very similar to yours and had these problems too (Speechless)

I'll post again when I've actually tested your version of the formula, thanks (Cool)
• Jul 14th 2008, 07:09 PM
CaptainBlack
Quote:

Originally Posted by Jameselaprendi
I do notice two slight problems with your formula though. Firstly, you've treated OAGS and OAGC as separate things, but they are always the same figure: whenever a goal is scored in the league, someone else is conceding!

That is a typo; they should be the same thing.

RonL
• Jul 14th 2008, 11:20 PM
CaptainBlack
Quote:

Originally Posted by Jameselaprendi
Another potential problem is that your formula allows some teams to have a negative prediction, e.g. that they'll score -0.25 goals or concede -0.5 goals. This occurs when a team's average goals scored or conceded is less than the difference between the League's OAGSC and the team's AGS or AGC.

I had come up with something

This is a generic problem with a linear model of this form; the usual solution is to treat anything negative as zero.
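
In code, treating negatives as zero is a one-line clamp. The sketch below applies it to the linear formula from earlier in the thread; the function name is hypothetical, and the weak-attack figures (AGS 0.60 against AGC 1.80) are invented solely to trigger a negative raw value.

```python
def clamped_linear_score(ags_attacker: float, agc_defender: float, oagsc: float) -> float:
    """Linear model from the thread, with negative predictions floored at zero."""
    baseline = oagsc / 2
    raw = baseline + (ags_attacker - baseline) - (agc_defender - baseline)
    return max(0.0, raw)


# Made-up figures: the raw value is 0.66 + 0.60 - 1.80 = -0.54, so it is clamped.
print(clamped_linear_score(0.60, 1.80, 1.32))            # 0.0
# Figures from the first post are unaffected by the clamp.
print(round(clamped_linear_score(2.11, 1.05, 1.32), 2))  # 1.72
```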

RonL