# Help on direction needed

• Mar 23rd 2010, 05:33 PM
hbetter
Set similarity
Hi,

I need to compare two sets and determine their similarity, based on the following considerations.

For example, take two sets: \$\displaystyle s1=\{a, b, ...\}\$ and \$\displaystyle s2=\{a,c,...\}\$

1. The two sets are inter-connected. Each element in \$\displaystyle s1\$ has connections to elements in \$\displaystyle s2\$, e.g. \$\displaystyle b\$ and \$\displaystyle c\$ are connected with a weight \$\displaystyle w_{bc}\$. Each weight lies in [0,1]. For overlapping elements like \$\displaystyle a\$, the weight is 1; weights between different elements are usually less than 1. The more connections there are, and the stronger they are, the more similar the two sets should be.

2. Count the frequency of each element. Each element has a frequency value, e.g. \$\displaystyle a\$ appears 3 times in \$\displaystyle s1\$ and twice in \$\displaystyle s2\$. Higher frequency contributes more to the similarity. The cosine similarity measure could be used here, but it does not take the inter-connections into account.

3. Normalization. The final score should be normalized to [0,1].
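
To make the three requirements concrete, here is a rough sketch of one measure that might fit: a cosine similarity over frequency counts in which the dot product is weighted by the pairwise connection weights (sometimes called soft cosine similarity). Several things here are assumptions of the sketch and not part of the problem statement: the function name `soft_cosine`, the representation of the weights as a dict keyed by unordered pairs, treating missing pairs as weight 0, and the weights being symmetric and defined between any two elements, not only across the two sets. The score is non-negative for non-negative inputs, but it is only guaranteed to stay at or below 1 if the full weight matrix is positive semidefinite.

```python
from collections import Counter
from math import sqrt


def soft_cosine(s1, s2, weight):
    """Weighted (soft) cosine similarity between two collections.

    s1, s2: iterables of hashable elements; repeats are counted as frequency.
    weight: dict mapping frozenset({x, y}) -> weight in [0, 1] for x != y.
            Missing pairs are treated as 0 (an assumption of this sketch);
            identical elements always get weight 1.
    """
    f1, f2 = Counter(s1), Counter(s2)

    def w(x, y):
        # Overlapping elements have weight 1, as in consideration 1.
        return 1.0 if x == y else weight.get(frozenset((x, y)), 0.0)

    def form(fa, fb):
        # Frequency-weighted sum over all connected pairs (considerations 1 and 2).
        return sum(fa[x] * fb[y] * w(x, y) for x in fa for y in fb)

    # Dividing by the self-similarities gives the normalization (consideration 3).
    denom = sqrt(form(f1, f1) * form(f2, f2))
    return form(f1, f2) / denom if denom else 0.0


# Example values: a appears 3 times in s1 and twice in s2; w_bc = 0.5 is made up.
s1 = ["a", "a", "a", "b"]
s2 = ["a", "a", "c"]
weights = {frozenset(("b", "c")): 0.5}
print(soft_cosine(s1, s2, weights))  # about 0.919; plain cosine would give about 0.849
```

With all cross weights set to 0 this reduces to the ordinary cosine similarity on the frequency vectors, so the connections can only raise the score in this formulation.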

Does this problem belong to set theory? If so, what's it called in set theory?

Btw, if this isn't the right place to post my question, please let me know. Thank you.