numerical problem: efficient euclid distance with very large numbers

Hi all,

Iīm looking for an efficient solution for the following problem:

Let V be a set containing m vectors V={v1,v2,...,vm}. Given an input vector vi I need to find the vector from V which has the smallest euclid distance to vi (find best matching unit).

Normally not an issue. However, every element of Vīs vectors and the input vector vi always have a value from the open interval (0,1).

example 2D vector:

v1 = (0.0123123423423 , 0.00023567887)

Assume that every number has several thousand or even million digits. So practically spoken, if you implement euclid distance for this kind of numbers, you will need special high precision libraries, where every mathematical operation (like add, mult or sqrt) will run many times slower than with primitives such as double.

Recently, I had the idea to tokenize a long number into smaller peaces and execute the euclid distance only on this peaces.

so for instance lets assume we are in 1d (for simplicity) and we have the following two vectors v1, v2 and the input vector vi

v1 = 0.001222345566

v2 = 0.123456649900

vi = 0.1923343233232

lets say we are tokenizing the numbers in peaces of length 5, then the first token (t1) for each vector would look like this:

v1_t1 = 00122

v2_t1 = 12345

vi_t1 = 19233

(here I am leaving away "0." since it is shared by all numbers)

here we see that the 5 first numbers of v2 are closer to vi than v1īs. Thus whatever comes after this 5 numbers, v2 will always be closer to vi. And thus vi is the best matching unit.

If the first tokens of v1 and v2 had the same (or very close) distance to vi, then the the second token containing the next 5 numbers would be compared.

The questions:

does this approach make sense ? (with approach I mean tokenizing the numbers and comparing them part by part)

Am I overlooking anything fundamental ?

How can this idea be improved ?

edit @ 18.12.07, 17:56 :

for new readers and future responses : Please accept the fact that Iīm using huge numbers, this is simply a requirement and canīt be changed.

thanks in advance,

Andreas