# Help on direction needed

• Mar 23rd 2010, 06:33 PM
hbetter
Set similarity
Hi,

I need to compare two sets and determine how similar they are, based on the following considerations:

For example, given two sets $s1=\{a, b, ...\}$ and $s2=\{a,c,...\}$:

1. Inter-connections. The two sets are inter-connected: elements in $s1$ have connections to elements in $s2$, e.g. $b$ and $c$ are connected with a weight $w_{bc} \in [0,1]$. For an element that appears in both sets, like $a$, the weight is 1; weights between different elements are usually less than 1. The more connections there are, and the stronger they are, the more similar the two sets should be.

2. Element frequency. Each element has a frequency, e.g. $a$ appears 3 times in $s1$ and twice in $s2$. Higher frequencies should contribute more to the similarity. Cosine similarity handles the frequencies, but it cannot take the inter-connections into account.

3. Normalization. The final score should be normalized to $[0,1]$.
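One way to combine all three requirements is a "soft" cosine similarity: treat each set as a frequency vector $f$ and replace the ordinary dot product with $f_1^\top W f_2 = \sum_{x,y} f_1(x)\, w_{xy}\, f_2(y)$, then normalize by $\sqrt{f_1^\top W f_1}\,\sqrt{f_2^\top W f_2}$. With $w_{xx}=1$ and all other weights 0 this reduces to plain cosine similarity. The sketch below is only an illustration under stated assumptions: the weight function `w` and the weight table `W` are hypothetical, and the score is guaranteed to lie in $[0,1]$ only when all weights are non-negative and the weight matrix is positive semidefinite (otherwise it can exceed 1).

```python
import math
from collections import Counter

def soft_cosine(s1, s2, weight):
    """Soft cosine similarity between two multisets (sketch).

    s1, s2: iterables of elements; repeated elements count as frequency.
    weight: hypothetical function (x, y) -> w_xy in [0, 1], with weight(x, x) == 1.
    """
    f1, f2 = Counter(s1), Counter(s2)  # element -> frequency (0 if absent)
    vocab = list(set(f1) | set(f2))

    def inner(fa, fb):
        # Generalized inner product: sum over all pairs of fa[x] * w_xy * fb[y]
        return sum(fa[x] * weight(x, y) * fb[y] for x in vocab for y in vocab)

    denom = math.sqrt(inner(f1, f1)) * math.sqrt(inner(f2, f2))
    return inner(f1, f2) / denom if denom > 0 else 0.0

# Hypothetical connection weights: only b and c are connected, with w_bc = 0.5
W = {frozenset(("b", "c")): 0.5}

def w(x, y):
    if x == y:
        return 1.0  # an element shared by both sets has weight 1
    return W.get(frozenset((x, y)), 0.0)

s1 = ["a", "a", "a", "b"]  # a appears 3 times in s1
s2 = ["a", "a", "c"]       # a appears twice in s2
print(round(soft_cosine(s1, s2, w), 4))  # → 0.9192
```

For comparison, ignoring the $b$–$c$ connection (plain cosine on the frequencies) gives $6/\sqrt{50} \approx 0.8485$, so the connection raises the score as requirement 1 asks. The double loop is $O(n^2)$ in the number of distinct elements, which is fine for small sets but may need a sparse weight table for large ones.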

Does this problem belong to set theory? If so, what's it called in set theory?

Btw, if this isn't the right place to post my question, please let me know. Thank you.