
Originally Posted by
iancao
Hi,
I wonder if you guys can help me with a statistics problem I'm facing:
In football (soccer, for Americans), there's a common view that referees tend to compensate a team after calling a penalty kick against it. Supposedly, once a referee has called a penalty against a team, he is more likely to call one for that team later in the game. I have data on penalty kicks and want to run a statistical test of that hypothesis. I thought a good approach would be to look at the likelihood of a penalty being called for a team after one has been called against it. But there's a problem: by the time a penalty is called against a team, part of the game has already gone by, so there's less time left for another penalty to be called. If the probability of a penalty being called is spread evenly over time, this would bias the statistical test. For example, if a penalty is called with only 5 minutes left (out of the full 90-minute game), it is obviously less likely that another penalty will happen.
I then thought about using a rate variable, penalties per unit of time, that would fix this problem. I would build a series of "penalties for, following an earlier penalty against, per hour". For example, if a penalty is called against a team with 10 minutes left in the game and a penalty is then called for that team, my variable would take the value 6 (6 penalties per hour).
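To make the per-hour rate concrete, here is a minimal Python sketch of the calculation I have in mind. The function name `penalties_per_hour` and its parameters are just made up for illustration, and it assumes I know how many minutes were left when the first penalty was called.

```python
def penalties_per_hour(penalties_for: int, minutes_remaining: float) -> float:
    """Rate of penalties called FOR a team after one was called AGAINST it.

    penalties_for: penalties awarded to the team after the earlier
        penalty against it (hypothetical input, taken from the data).
    minutes_remaining: minutes left in the game when the penalty
        against the team was called.
    """
    if minutes_remaining <= 0:
        raise ValueError("minutes_remaining must be positive")
    # Scale the count to a 60-minute window so that games with
    # different amounts of remaining time can be compared.
    return penalties_for * 60 / minutes_remaining


# The example from the post: 10 minutes left, then one penalty for the team.
rate = penalties_per_hour(1, 10)
print(rate)  # 6.0
```

Games where no penalty follows would simply get a rate of 0 under this definition.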
I'm not sure, though, how to compare this series with one that represents the average penalty rate per game. I mean: what is the likelihood, per hour, of a penalty being called in a soccer game? Should I create a series from the number of penalties per game per minute and use the average? If I do this, my sample for the average game would include the games with a penalty for each side, which are exactly what I'm studying, and I think this would probably create a bias...
Thank you very much.