Results 1 to 1 of 1

Math Help - Need help finding best match between two unordered sets

  1. #1
    Newbie
    Joined
    Dec 2010
    Posts
    1

    Need help finding best match between two unordered sets

    I have two sets of data, each with approximately 300,000 rows, and 20 columns. My goal is to find the row in set 'b' that would require the fewest changes for it to match the row in set 'a'. The order of the 20 columns is important. The data in each column can be any one of about 300 different values. The data in each column can be utilized as a number or string.

    A simple scenario using 10 columns of data:
    =================================
    Given a row 'a' of: 1,3,19,4,12,56,200,9,110,15

    Using the sample rows 'b' below, row 3 would be the one I'd want as it would require the fewest changes to become row 'a'

    1: 1,3,19,4,99,22,30,66,110,15 (matches 6 columns)
    2: 12,56,1,3,19,4,110,15,200,9 (all same values, but different order)
    3: 64,1,3,19,4,12,56,200,9,110 (removing 1st & adding last produces 100% match)

    I've been looking over different matching algorithms (Levenshtein Distance, etc.) at String Similarity Metrics for Information Integration but haven't found anything I could adapt to my need.

    For reference, this is data in a SQL Server database, but can be pulled out and worked on in a client application if necessary.

    Thank you in advance for any help you may be able to provide.

    Edit - I did want to mention that I'd attempted scoring each row on both sides, and matching based on the closest scores. My plan was to then go through the scores iteratively and find the best one to use. Unfortunately, I wasn't able to produce a score that provided values I could rely on.
    Last edited by airjrdn; December 7th 2010 at 08:08 AM.
    Follow Math Help Forum on Facebook and Google+

Similar Math Help Forum Discussions

  1. sorting of unordered elements - worst case scenario
    Posted in the Discrete Math Forum
    Replies: 2
    Last Post: January 7th 2012, 08:24 AM
  2. Replies: 1
    Last Post: October 17th 2011, 05:19 AM
  3. Finding Data Sets
    Posted in the Advanced Statistics Forum
    Replies: 0
    Last Post: September 30th 2011, 06:55 AM
  4. Counting Unordered Selections: Card Selection
    Posted in the Discrete Math Forum
    Replies: 7
    Last Post: September 14th 2010, 02:23 PM
  5. ordered and unordered partitions
    Posted in the Statistics Forum
    Replies: 2
    Last Post: October 12th 2009, 11:39 AM

Search Tags


/mathhelpforum @mathhelpforum