Dear math-lovers!
Can anyone help a clustering beginner?
Given vectors of different lengths and a distance measure:
d = sqrt[sum{(x(i)-y(i))^2}], i = 1...s
i.e. almost Euclidean distance, but the lengths are different.
In our application we duplicate the shorter vector if its length is less than half the length of the longer one. So if x is the shorter vector, s becomes:
s = t*length(x), where t = floor(length(y)/length(x)), if length(x) < 0.5*length(y)
s = length(x) otherwise (the remaining values of y are ignored)
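To make the definition above concrete, here is a minimal Python sketch of the distance as I read it. The function name truncated_distance is made up for illustration, and "duplicate" is assumed to mean repeating the whole shorter vector t times end to end (np.tile); if the application repeats each element instead, that line would need to change. Empty vectors are rejected, since the definition does not cover them.

```python
import numpy as np

def truncated_distance(a, b):
    """Distance between two vectors of possibly different lengths.

    Assumption: "duplicating" the shorter vector means tiling it
    t = floor(len(y) / len(x)) times; extra values of y are ignored.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both vectors must be non-empty")
    # x is the shorter vector, y the longer one
    x, y = (a, b) if len(a) <= len(b) else (b, a)
    if len(x) < 0.5 * len(y):
        t = len(y) // len(x)
        x = np.tile(x, t)  # assumed meaning of "duplicate"
    s = len(x)
    # Only the first s values of y take part in the sum
    return float(np.sqrt(np.sum((x - y[:s]) ** 2)))

print(truncated_distance([0, 0], [3, 4, 7]))        # 5.0 (the 7 is ignored)
print(truncated_distance([1, 2], [1, 2, 3]))        # 0.0 although the vectors differ
print(truncated_distance([1, 2], [1, 2, 1, 2, 9]))  # 0.0 after tiling x twice
```

The last two calls already show that d can be 0 for two different vectors, because the tail of the longer vector never enters the sum.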
I can see that d is not Euclidean, so clustering techniques that assume Euclidean distance cannot be used (k-means and others based on average distances). The question is: which clustering technique can be used? I have received a hint to use this formula:
w = sqrt(sum(d^2)/n)
But I have no clue what it means or how I should use it.
Does anyone know?
Also, I am wondering whether all clustering methods require d to be a metric. If so, how can I check whether d is a metric in my case with different lengths (especially the triangle inequality and the condition that d=0 only for x=y)?
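One way to approach the check is by brute force on sample vectors: look for pairs of different vectors with d = 0, and for triples that break the triangle inequality. The sketch below is only an illustration under the same assumptions as the definition above (the shorter vector is tiled, extra values of the longer one are ignored); the names truncated_distance and metric_violations are hypothetical. Such a search can only disprove the metric properties by finding a counterexample; finding none proves nothing.

```python
from itertools import combinations
import numpy as np

def truncated_distance(a, b):
    # Assumed reading of the distance: tile the shorter vector when it is
    # less than half as long as the longer one, then compare the first s values.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    x, y = (a, b) if len(a) <= len(b) else (b, a)
    if len(x) < 0.5 * len(y):
        x = np.tile(x, len(y) // len(x))
    return float(np.sqrt(np.sum((x - y[:len(x)]) ** 2)))

def metric_violations(vectors, dist, tol=1e-12):
    """Return (identity, triangle) lists of index tuples that violate
    'd = 0 only for x = y' and 'd(x, z) <= d(x, y) + d(y, z)'."""
    n = len(vectors)
    identity = [
        (i, j) for i, j in combinations(range(n), 2)
        if dist(vectors[i], vectors[j]) <= tol
        and not np.array_equal(np.asarray(vectors[i]), np.asarray(vectors[j]))
    ]
    triangle = [
        (i, j, k)
        for i, k in combinations(range(n), 2)  # endpoints x, z
        for j in range(n) if j not in (i, k)   # intermediate point y
        if dist(vectors[i], vectors[k])
        > dist(vectors[i], vectors[j]) + dist(vectors[j], vectors[k]) + tol
    ]
    return identity, triangle

samples = [[0], [0, 5], [0, -5]]
identity, triangle = metric_violations(samples, truncated_distance)
print(identity)  # [(0, 1), (0, 2)]: d = 0 for different vectors
print(triangle)  # [(1, 0, 2)]: d([0,5],[0,-5]) = 10 > 0 + 0 via [0]
```

With these three sample vectors both conditions fail, so under the assumed reading d is not a metric; whether that matters depends on the clustering method.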
Thanks,
Elena