Hey Tabby.
Can you tell us which routines in R you used and the arguments you passed to them (i.e. your R code)?
I think I see what is going on, but when I run it in R I get results that cannot be right. I am thrown off by the subscripts, and I would like to see your take on the write-up.
My interpretation is that I have two 32 x 32 (number of teams) systems of the form A*x = b where, for example, b is the historical defensive statistic and each row of A represents the matchups for team k. So in the first step of the algorithm, A is filled with 1's and 0's representing the matchups for that season. However, this can only be done per season, while the paper indicates the use of historical statistics in the system. I am not sure I can use matchup information at that level: every team has faced every other team if you go back far enough, which means A would be filled with 1's and, as a result, be singular. When I do it this way I end up with negative values, so I don't think it is correct.
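To make the singularity point concrete: if every team has faced every other team, the 0/1 matchup matrix is all 1's, which has rank 1, so `solve()` fails, while an SVD-based pseudoinverse (`MASS::ginv()`) still returns a least-squares solution. This is a small sketch on a hypothetical 4-team case; `A` and `b` are made-up values for illustration, not data from the paper.

```r
library(MASS)  # provides ginv(), the Moore-Penrose pseudoinverse (computed via the SVD)

# Hypothetical 4-team case where every team has faced every other team
A <- matrix(1, nrow = 4, ncol = 4)
b <- c(10, 12, 9, 11)  # made-up stat values

# Rank 1: the matrix is singular
print(qr(A)$rank)

# solve() cannot invert an exactly singular matrix, so it raises an error
solve_failed <- tryCatch({ solve(A, b); FALSE }, error = function(e) TRUE)
print(solve_failed)

# The pseudoinverse returns the minimum-norm least-squares solution instead
x <- ginv(A) %*% b
print(x)
```

Because A x here depends only on the sum of x, any vector whose entries sum to 10.5 fits equally well, including vectors with negative entries; `ginv()` simply picks the one with the smallest norm (every entry 2.625). With a rank-deficient matchup matrix, odd-looking or negative coefficients can therefore come from the non-uniqueness of the solution rather than from a coding error.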
I think the subscripts are throwing me off or are possibly incorrect. Any feedback would be appreciated.
Thanks
Any thoughts?
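Here is the R code:
library(MASS)
W <- X <- matrix(0, nrow = 32, ncol = 32)
y <- matrix(wide.off[, 34], nrow = 32, ncol = 1)
z <- matrix(wide.def[, 34], nrow = 32, ncol = 1)
max_itr <- 3
for (n in 1:max_itr) {
  if (n == 1) {
    # first pass: 1 where the two teams have a matchup, 0 where the entry is 'NaN'
    W <- matrix(ifelse(wide.off[, 2:33] == 'NaN', 0, 1), nrow = 32, ncol = 32)
  } else {
    # later passes: replace each nonzero W[r, c] with the current estimate def.stat[c]
    W <- sweep(W != 0, 2, as.vector(def.stat), `*`)
  }
  # least squares via the SVD pseudoinverse: singular values that are numerically
  # zero are dropped rather than inverted, since 1/0 gives Inf when W is singular
  sol.svd <- svd(W)
  tol <- max(sol.svd$d) * sqrt(.Machine$double.eps)
  d.inv <- ifelse(sol.svd$d > tol, 1 / sol.svd$d, 0)
  def.stat <- sol.svd$v %*% diag(d.inv) %*% t(sol.svd$u) %*% y
}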
The wide.off and wide.def datasets look like this (columns 2 to 33 hold the values against each opponent, with 'NaN' where there is no matchup):
The far right column (34) has the defensive stat in this example; wide.def is the opposite.
I have also used the normal equations to get the multipliers, but the results are the same. The more I look at it, the more it seems like an eigenvalue problem, where I am solving Ax = v*x with v a constant. That would make more sense, since an SVD is recommended and would produce the singular values. As you can see, I am just using the SVD to solve the least-squares problem, which seems kind of circuitous...
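For what it's worth, the SVD route and the normal equations should give identical coefficients whenever the matrix has full column rank, and the link to an eigenvalue problem is that the squared singular values of A are the eigenvalues of A^T A. A quick check on a random, hypothetical full-rank system (not the team data):

```r
set.seed(1)
# Hypothetical full-rank 5 x 3 system, for illustration only
A <- matrix(rnorm(15), nrow = 5, ncol = 3)
b <- rnorm(5)

# Least squares via the SVD: x = V D^{-1} U^T b
s <- svd(A)
x_svd <- s$v %*% diag(1 / s$d) %*% t(s$u) %*% b

# Least squares via the normal equations: (A^T A) x = A^T b
x_ne <- solve(t(A) %*% A, t(A) %*% b)

# Same coefficients when A has full column rank
print(all.equal(x_svd, x_ne))

# Squared singular values of A equal the eigenvalues of A^T A (both in decreasing order)
print(all.equal(s$d^2, eigen(t(A) %*% A, symmetric = TRUE)$values))
```

If this holds, matching results from the SVD and the normal equations are expected rather than a sign of a bug; the SVD mainly matters when the matrix is close to singular, where forming A^T A makes things numerically worse.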
Well, that's where I am confused. The line W <- matrix(ifelse(wide.off[,2:33]=='NaN',0,1),nrow=32,ncol=32) codes the matrix as 1 when there is a value and 0 otherwise; that is how I understand the algorithm in the paper. I have also used the for loops below to set the 'NaN' values to zero, keep the numeric values, and perform a least-squares fit to derive the coefficients. I get a number of negative values that don't seem correct.
for(i in 1:32){
for(j in 1:32){
W[i,j] <- ifelse(wide.off[i,j+1]=='NaN',0,wide.off[i,j+1])
X[i,j] <- ifelse(wide.def[i,j+1]=='NaN',0,wide.def[i,j+1])
}
}
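The two codings described above (a 0/1 matchup indicator versus the observed values with 'NaN' set to 0) can also be built without loops. This is a sketch on a hypothetical 3-team layout that mimics the assumed shape of wide.off (a label column followed by one column per opponent); the data frame `wide` and its values are made up.

```r
# Hypothetical 3-team layout: column 1 is the team label, columns 2:4 hold
# the stat against each opponent, NaN where the two teams did not play
wide <- data.frame(team = c("A", "B", "C"),
                   A = c(NaN, 21, 14),
                   B = c(17, NaN, NaN),
                   C = c(24, NaN, NaN))

vals <- as.matrix(wide[, 2:4])

# Coding 1: indicator matrix, 1 if the teams played, 0 otherwise
W_ind <- ifelse(is.na(vals), 0, 1)

# Coding 2: keep the observed values, set the missing ones to 0
W_val <- vals
W_val[is.na(W_val)] <- 0

print(W_ind)
print(W_val)
```

`is.na()` is TRUE for numeric NaN, so this matches the loops when the columns are numeric; if the columns were read in as character strings "NaN", the `== 'NaN'` comparison would still be needed.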
Either way, I don't understand the algorithm that is implemented or why the SVD and the iteration would even be necessary.