1. ## Distance Metric Proof

Say I have two lists, List1 and List2 containing elements such as words. Some words are common two both List1 and List2. I want to create a distance metric that tells me how far apart the two lists are based on a similarity "score". The similarity score and distance metric are as follows:

Similarity score: Intersection(List1,List2) / Union(List1,List2)
Distance = 1 - Similarity Score

In other words, the similarity score is just the percentage overlap between the two lists and the distance is 0 when the two lists are the same. Say I generalize this to n lists and I calculate the distances between lists (a symmetric matrix of distances). My question is, is this distance formula valid? In other words, does it satisfy the triangle inequality? How do I check this?

A first attempt if all list sizes are equal:

Let N = size of the list, x, y & z = pairwise overlaps between three lists. You must have that x >= y+z-N. Distance between lists that give you x is 1-x/(2N-x). With some algebra you can conclude that the triangle inequality is satisfied for all valid x, y & z.

Is this true for arbitrary list sizes?

2. Hello, please check this out Symmetric difference - Wikipedia, the free encyclopedia. At the bottom it talks about symmetric difference being somewhat of a metric. Here is how it relates to your distance metric. Let $A$ and $B$ be your two lists (sets). Your distance function is:

$\displaystyle d(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cup B| - |A \cap B|}{|A \cup B| } = \frac{|(A/ B) \cup (B /A)|}{|A \cup B| } = \frac{|A \Delta B|}{|A \cup B|}$

where $\Delta$ is the symmetric difference.