
Distance Metric Proof
Say I have two lists, List1 and List2 containing elements such as words. Some words are common two both List1 and List2. I want to create a distance metric that tells me how far apart the two lists are based on a similarity "score". The similarity score and distance metric are as follows:
Similarity score: Intersection(List1,List2) / Union(List1,List2)
Distance = 1  Similarity Score
In other words, the similarity score is just the percentage overlap between the two lists and the distance is 0 when the two lists are the same. Say I generalize this to n lists and I calculate the distances between lists (a symmetric matrix of distances). My question is, is this distance formula valid? In other words, does it satisfy the triangle inequality? How do I check this?
A first attempt if all list sizes are equal:
Let N = size of the list, x, y & z = pairwise overlaps between three lists. You must have that x >= y+zN. Distance between lists that give you x is 1x/(2Nx). With some algebra you can conclude that the triangle inequality is satisfied for all valid x, y & z.
Is this true for arbitrary list sizes?

Hello, please check this out Symmetric difference  Wikipedia, the free encyclopedia. At the bottom it talks about symmetric difference being somewhat of a metric. Here is how it relates to your distance metric. Let $\displaystyle A$ and $\displaystyle B$ be your two lists (sets). Your distance function is:
$\displaystyle \displaystyle d(A,B) = 1  \frac{A \cap B}{A \cup B} = \frac{A \cup B  A \cap B}{A \cup B } = \frac{(A/ B) \cup (B /A)}{A \cup B } = \frac{A \Delta B}{A \cup B}$
where $\displaystyle \Delta$ is the symmetric difference.