
numerical problem: efficient Euclidean distance with very large numbers

  1. #1
    Newbie
    Joined
    Dec 2007
    Posts
    14

    numerical problem: efficient Euclidean distance with very large numbers

    Hi all,


    I'm looking for an efficient solution to the following problem:

    Let V be a set containing m vectors, V = {v1, v2, ..., vm}. Given an input vector vi, I need to find the vector from V that has the smallest Euclidean distance to vi (find the best matching unit).
    Normally not an issue. However, every element of V's vectors and of the input vector vi always has a value from the open interval (0,1).

    example 2D vector:
    v1 = (0.0123123423423 , 0.00023567887)

    Assume that every number has several thousand or even a million digits. So, practically speaking, if you implement the Euclidean distance for numbers of this kind, you will need special high-precision libraries, where every mathematical operation (such as add, multiply, or sqrt) runs many times slower than with primitives such as double.


    Recently, I had the idea of tokenizing a long number into smaller pieces and executing the Euclidean distance comparison only on these pieces.
    So, for instance, let's assume we are in 1D (for simplicity) and we have the following two vectors v1, v2 and the input vector vi:

    v1 = 0.001222345566
    v2 = 0.123456649900
    vi = 0.1923343233232

    Let's say we tokenize the numbers into pieces of length 5; then the first token (t1) of each vector would look like this:

    v1_t1 = 00122
    v2_t1 = 12345
    vi_t1 = 19233
    (here I am leaving out the "0." since it is shared by all numbers)

    Here we see that the first 5 digits of v2 are closer to vi than v1's. Thus, whatever comes after these 5 digits, v2 will always be closer to vi, and so v2 is the best matching unit.
    If the first tokens of v1 and v2 had the same (or very close) distance to vi, then the second token, containing the next 5 digits, would be compared.
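    A minimal sketch of this tokenized comparison (function names and chunk size are my own; coordinates are assumed to be equal-length digit strings with the leading "0." stripped). One subtlety: a chunk whose distance trails by only 1 can still be overtaken by later digits (e.g. 19999... vs 20000...), so the sketch accumulates running digit prefixes and discards a candidate only once its prefix distance exceeds the best by at least 2:

```python
def best_match(vi: str, candidates: dict, size: int = 5) -> str:
    """Pick the candidate closest to vi in 1D, comparing digit
    prefixes chunk by chunk instead of the full numbers at once."""
    base = 10 ** size
    t_pref = 0                            # running prefix of vi as an integer
    prefs = {name: 0 for name in candidates}
    alive = set(candidates)               # candidates still in the running
    dists = {}
    for start in range(0, len(vi), size):
        t_pref = t_pref * base + int(vi[start:start + size])
        dists = {}
        for name in alive:
            prefs[name] = prefs[name] * base + int(candidates[name][start:start + size])
            dists[name] = abs(prefs[name] - t_pref)
        best = min(dists.values())
        # The unseen tail can shift each distance by less than one unit
        # of the current prefix, so a gap of 2 or more is decisive.
        alive = {n for n in alive if dists[n] <= best + 1}
        if len(alive) == 1:
            return next(iter(alive))
    return min(alive, key=lambda n: dists[n])

print(best_match("1923343233", {"v1": "0012223455", "v2": "1234566499"}))  # v2
```

    On the example above the very first chunk is decisive, so the remaining digits are never touched; only near-ties force the loop to read further chunks.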


    The questions:

    Does this approach make sense? (By "approach" I mean tokenizing the numbers and comparing them part by part.)
    Am I overlooking anything fundamental?
    How can this idea be improved?


    edit @ 18.12.07, 17:56:
    For new readers and future responses: please accept the fact that I'm using huge numbers; this is simply a requirement and can't be changed.


    thanks in advance,
    Andreas
    Last edited by sirandreus; December 18th 2007 at 08:57 AM. Reason: some corrections again

  2. #2
    Member
    Joined
    May 2006
    Posts
    244
    Quote Originally Posted by sirandreus View Post
    Hi all,


    I'm looking for an efficient solution to the following problem:

    Let V be a set containing m vectors, V = {v1, v2, ..., vm}. Given an input vector vi, I need to find the vector from V that has the smallest Euclidean distance to vi (find the best matching unit).
    Normally not an issue. However, every element of V's vectors and of the input vector vi always has a value from the open interval (0,1).

    example 2D vector:
    v1 = (0.0123123423423 , 0.00023567887)

    Assume that every number has several thousand or even a million digits. So, practically speaking, if you implement the Euclidean distance for numbers of this kind, you will need special high-precision libraries, where every mathematical operation (such as add, multiply, or sqrt) runs many times slower than with primitives such as double.


    Recently, I had the idea of tokenizing a long number into smaller pieces and executing the Euclidean distance comparison only on these pieces.
    So, for instance, let's assume we are in 1D (for simplicity) and we have the following two vectors v1, v2 and the input vector vi:

    v1 = 0.001222345566
    v2 = 0.123456649900
    vi = 0.1923343233232

    Let's say we tokenize the numbers into pieces of length 5; then the first token (t1) of each vector would look like this:

    v1_t1 = 00122
    v2_t1 = 12345
    vi_t1 = 19233
    (here I am leaving out the "0." since it is shared by all numbers)

    Here we see that the first 5 digits of v2 are closer to vi than v1's. Thus, whatever comes after these 5 digits, v2 will always be closer to vi, and so v2 is the best matching unit.
    If the first tokens of v1 and v2 had the same (or very close) distance to vi, then the second token, containing the next 5 digits, would be compared.


    The questions:

    Does this approach make sense?
    Am I overlooking anything fundamental?
    How can this idea be improved?

    Note that the solution to this problem does not need to be exact. Thus approximations are welcome as well.


    thanks in advance,
    Andreas
    If it does not need to be "exact", then why not just truncate all the numbers
    to 10 significant digits and work with those in standard double-precision
    floating point?
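    This truncation idea can be sketched as follows (a minimal illustration, assuming the numbers arrive as decimal strings; the helper name is my own):

```python
def truncate(s: str, digits: int = 10) -> float:
    # Keep "0." plus the first `digits` fractional digits,
    # then parse the result as an ordinary double.
    return float(s[:2 + digits])

vi = "0.1923343233232"
v1 = "0.001222345566"
v2 = "0.123456649900"

# Ordinary double arithmetic on the truncated values.
best = min([v1, v2], key=lambda v: abs(truncate(v) - truncate(vi)))
print(best)  # 0.123456649900
```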

    ZB

  3. #3
    Newbie
    Joined
    Dec 2007
    Posts
    14
    hi,

    the representation of the numbers must be of high precision but, practically speaking, "not exact".

    I can't just consider the first 10 digits; see this counterexample:

    0.00000000000002354534534
    0.00000000000001123545455

    You cannot compare these two numbers if you cut off everything but the first 10 positions.

    In my original idea (see the first post), you would first compare the first 10 digits, which are equal. Because they are equal, you would compare the next 10 positions (assuming you are moving in steps of 10).
    There you would compare:

    0002354534
    0001123545
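    The counterexample can be checked directly: truncating to 10 fractional digits collapses both values to the same double, while scanning successive 10-digit chunks finds the first chunk that separates them (a minimal sketch, assuming equal-length digit strings):

```python
a = "0.00000000000002354534534"
b = "0.00000000000001123545455"

# Truncation to 10 fractional digits loses the distinction entirely.
print(float(a[:12]), float(b[:12]))  # 0.0 0.0

# Chunk-by-chunk comparison finds the first differing chunk.
da, db = a[2:], b[2:]          # strip the shared "0."
for start in range(0, len(da), 10):
    ca, cb = int(da[start:start + 10]), int(db[start:start + 10])
    if ca != cb:
        print("differ at chunk starting at digit", start + 1, ":", ca, cb)
        break
```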



    cheers,
    Andreas
    Last edited by sirandreus; December 18th 2007 at 04:52 AM. Reason: corrected constraints

  4. #4
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4
    Quote Originally Posted by sirandreus View Post
    hi,

    the results have to be exact (corrected original post as well).

    cheers,
    Andreas
    Why do you need such high precision?

    Working with, say, 13-digit precision, the probability of two distances being indistinguishable is \sim 10^{-13} (with a bit of scaling and to a hand-waving approximation), so you will need a lot of vectors before the chance of encountering a problem becomes significant.


    RonL

  5. #5
    Newbie
    Joined
    Dec 2007
    Posts
    14
    hi Captain,

    Why I need such high precision is described here: http://www.mathhelpforum.com/math-he...html#post90666

    In short: given (0,1)^n as an input space and k vectors v1, ..., vk in that space, these k vectors are mapped uniquely onto one vector in (0,1)^n.
    This works by interleaving the decimal expansions of the vectors, and thus a very huge vector can be created.
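    The interleaving construction can be sketched like this, as a toy version on finite digit strings of equal length (the function names are my own):

```python
def interleave(*fracs: str) -> str:
    # fracs: decimal fraction strings like "0.123", all the same length.
    digit_runs = [f[2:] for f in fracs]                # strip "0."
    # Take one digit from each input in turn, column by column.
    merged = "".join("".join(col) for col in zip(*digit_runs))
    return "0." + merged

def deinterleave(packed: str, k: int) -> list:
    # Invert the construction: every k-th digit belongs to one vector.
    digits = packed[2:]
    return ["0." + digits[i::k] for i in range(k)]

packed = interleave("0.123", "0.456")
print(packed)                    # 0.142536
print(deinterleave(packed, 2))   # ['0.123', '0.456']
```

    The round trip is lossless on these finite strings, which is why the mapping is unique; the price is that the packed number's digit count is the sum of the inputs' digit counts.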


    Sorry CaptainBlack, but I can't figure out how your statement about indistinguishable distances is related to my problem.
    As I understand it, you are talking about the probability that 2 vectors have the same digits (to a length of 13). In fact, this probability is not relevant, because I'm looking for the best matching unit (the vector from the set V which is closest, in terms of Euclidean distance, to the input vector v_i).


    kind regards,
    Andreas
    Last edited by sirandreus; December 18th 2007 at 01:26 AM.

  6. #6
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4
    Quote Originally Posted by sirandreus View Post
    hi Captain,

    Why I need such high precision is described here: http://www.mathhelpforum.com/math-he...html#post90666

    In short: given (0,1)^n as an input space and k vectors v1, ..., vk in that space, these k vectors are mapped uniquely onto one vector in (0,1)^n.
    This works by interleaving the decimal expansions of the vectors, and thus a very huge vector can be created.


    Sorry CaptainBlack, but I can't figure out how your statement about indistinguishable distances is related to my problem.
    As I understand it, you are talking about the probability that 2 vectors have the same digits (to a length of 13). In fact, this probability is not relevant, because I'm looking for the best matching unit (the vector from the set V which is closest, in terms of Euclidean distance, to the input vector v_i).


    kind regards,
    Andreas
    You are doomed to failure unless you recast your problem.

    Almost all reals in (0,1) have non-terminating decimal expansions, which leaves you with infinitely long calculations.

    The interleaving of the decimal expansions is an "in-principle" process; in practice it is impossible to do exactly.

    RonL

  7. #7
    Newbie
    Joined
    Dec 2007
    Posts
    14
    OK, let's assume exactness is not a major requirement. For me it's more important whether the modified Euclidean distance computation works (see the first post).

    In practice you could use a maximum precision, such as 100,000 digits. In practice nothing is exact, or at least not efficiently possible.
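    A fixed maximum precision like this is exactly what arbitrary-precision libraries provide. In Python, for example, the decimal module lets you set the working precision up front (a sketch, with a small precision for readability; the scenario above would set it far higher):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # working precision in significant digits

a = Decimal("0.00000000000002354534534")
b = Decimal("0.00000000000001123545455")

# Exact within the configured precision, unlike double subtraction,
# which would lose most of these leading-zero digits.
print(abs(a - b))  # 1.230989079E-14
```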

  8. #8
    Newbie
    Joined
    Dec 2007
    Posts
    14
    Quote Originally Posted by CaptainBlack View Post
    You are doomed to failure unless you recast your problem.

    Almost all reals in (0,1) have non-terminating decimal expansions, which leaves you with infinitely long calculations.

    RonL

    Correct, but not an issue for me, because I already have the numbers in a finite representation before I do the interleaving.

    I don't care what these numbers should exactly be. I only see the numbers and assume that they are accurate. From a practical and numerical perspective there is nothing else you can do. I mean, if your sensor gives you 0.33333333335, that's it; you can't ask yourself whether it could mean 0.3333333333333.....33333....

    cheers,
    Andreas

  9. #9
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4
    Quote Originally Posted by sirandreus View Post
    Correct, but not an issue for me, because I already have the numbers in a finite representation before I do the interleaving.

    I don't care what these numbers should exactly be. I only see the numbers and assume that they are accurate. From a practical and numerical perspective there is nothing else you can do. I mean, if your sensor gives you 0.33333333335, that's it; you can't ask yourself whether it could mean 0.3333333333333.....33333....

    cheers,
    Andreas
    1. Check your sensors' real accuracy; 24-bit ADCs don't give 24-bit accuracy, maybe only 20 bits. You may not have as much real precision as you think.

    2. Work with int64s (or, if you have them, int128s).

    3. Look at getting an arbitrary-precision arithmetic package/class (google for one for the programming language you intend to use).
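    Points 2 and 3 can be combined: since Python's built-in int is already arbitrary precision, each coordinate can be scaled to an integer and the squared Euclidean distance computed exactly. No square root is needed, because sqrt is monotone and only the ordering of distances matters. A sketch (assuming all coordinates are decimal strings with the same number of fractional digits; the sample vectors are invented):

```python
def to_int(s: str) -> int:
    # "0.00123" -> 123; assumes equal fractional length across inputs,
    # so all values share one implicit scale factor.
    return int(s[2:])

def sq_dist(u, v) -> int:
    # Exact squared Euclidean distance on scaled-integer coordinates.
    return sum((to_int(a) - to_int(b)) ** 2 for a, b in zip(u, v))

vi = ("0.01231", "0.00023")
V = {
    "v1": ("0.01232", "0.00025"),
    "v2": ("0.09999", "0.00001"),
}
best = min(V, key=lambda name: sq_dist(V[name], vi))
print(best)  # v1
```

    Dropping the square root also keeps everything in integer arithmetic, which is the cheapest operation an arbitrary-precision package offers.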

    RonL

  10. #10
    Newbie
    Joined
    Dec 2007
    Posts
    14
    Hi,

    Captain, thank you for your feedback.

    However, the main question of this thread still remains unanswered: does the modified Euclidean distance computation work? Please read the first post for that.

    If not, is it possible to modify it so that it works?
    Or is the idea fundamentally wrong? I mean the idea of tokenizing the numbers and comparing them part by part.

    kind regards,
    Andreas

