I imagine this is a very basic question, but I am having a real problem finding the answer. I think the big issue is that I don't know the proper vocabulary to express my question mathematically, so my searches just aren't turning up the right kinds of results.
Anyway, say I have an arbitrarily large number of points in space (I have upwards of 20,000 in my data set). I know the distances or path lengths between some of them, but not all of them. How do I go about using the known path lengths to calculate the unknowns?
I know how to do this by hand with a small number of points. That's easy. But surely there is some method for doing this with very large data sets?
So, to make my example more clear, say I have this matrix of values. Each value is the distance between the point indicated by the row and the point indicated by the column. Blank cells indicate that the distance for those points is unknown.
      A    B    C    D
 A    0    5   10    7
 B    5    0         2
 C   10         0
 D    7    2         0
So, in this simplified case, we have four points. A, B, C, and D. We know the distances between A and B, A and C, A and D, and B and D. We do not know the distances between B and C or between C and D.
What sort of method would solve for those values?
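Working the small example by hand already shows what can and cannot be recovered. Here is a minimal sketch in Python, assuming the distances are ordinary Euclidean distances in a plane; the coordinate frame is an arbitrary choice of mine, not part of the data:

```python
import math

# Known distances from the 4x4 example above:
# AB = 5, AC = 10, AD = 7, BD = 2; BC and CD are unknown.
AB, AC, AD, BD = 5.0, 10.0, 7.0, 2.0

# Fix a coordinate frame: A at the origin, B on the positive x-axis.
A = (0.0, 0.0)
B = (AB, 0.0)

# Trilaterate D from its distances to A and B (law of cosines):
#   x = (AD^2 + AB^2 - BD^2) / (2*AB),  y = +/- sqrt(AD^2 - x^2)
x = (AD**2 + AB**2 - BD**2) / (2 * AB)
y = math.sqrt(max(AD**2 - x**2, 0.0))
D = (x, y)  # here y == 0, so A, B, D happen to be collinear

# C is only tied to A (AC = 10), so it can sit anywhere on a circle of
# radius 10 around A.  BC is therefore only bounded, not determined:
#   |AC - AB| <= BC <= AC + AB   (triangle inequality)
BC_min, BC_max = abs(AC - AB), AC + AB
print(D)               # (7.0, 0.0)
print(BC_min, BC_max)  # 5.0 15.0
```

So with exactly these four known distances, BC can be anywhere between 5 and 15: the example matrix is underdetermined, which is why the replies below ask about extra constraints.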
No. The points are not arranged linearly. Even though we know the distances between points, we don't know their absolute positions, only the relative distances.
We don't even necessarily know the dimensionality of these points - it could be in 2, 3, or more dimensions. I imagine the degree of dimensionality will change the calculations required, however.
I just don't understand what you mean by "properties." Exactly what information do you need?
Wouldn't that only be true for increasing dimensionality? For example, if the points lie on a line, each unknown distance can take at most a couple of values; in a two-dimensional plane there could likewise be two possible correct values, and so on. There aren't going to be arbitrarily many possible correct distances, because they are constrained by the distances we do know. And my real data set contains thousands upon thousands of values, which constrains things even further - say we have 2210 known distances from one point to other points and only one unknown; there are not going to be infinitely many possible values for that unknown.
There has to be some way to be able to calculate the distance, as by a formula or algorithm. Essentially, you need to know the properties of the "space" that you are working in.
Given a large data set, it is possible to fix a certain distance, but you would still need to know the properties of the "distance". For instance, is it Cartesian (i.e. d^2 = x^2 + y^2), or is it a "taxicab" metric, |d| = |x| + |y| (imagine a staircase going from one point to another)? How are you defining distance?
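The two metrics really do give different numbers for the same pair of points. A quick illustration (the points are made up):

```python
import math

# Two example points in the plane.
p, q = (1.0, 2.0), (4.0, 6.0)
dx, dy = q[0] - p[0], q[1] - p[1]

euclidean = math.hypot(dx, dy)  # d^2 = x^2 + y^2
taxicab = abs(dx) + abs(dy)     # |d| = |x| + |y|

print(euclidean)  # 5.0
print(taxicab)    # 7.0
```

Any scheme for filling in missing distances has to commit to one such metric first.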
I see. We are treating our data as Cartesian.
In our situation we are actually working with a dissimilarity matrix. So the "distance" is actually just the difference between the dissimilarity scores of the stimulus specified by the row and the one specified by the column. In our case, the dissimilarity "scores" are determined by examining neuronal activity patterns, so the "distance" represents the degree of difference between these firing patterns.
That is our specific data set. However, I am interested in this technique for a variety of reasons not limited to analysis of this particular data set. So even if you think this is not an appropriate method for our data, I am still interested in figuring out how to do it, assuming Cartesian distance.
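Under that Cartesian (Euclidean) assumption, this kind of problem falls under distance geometry; when enough distances are known, each point can be placed in a coordinate frame by trilateration and the missing entries read off. A sketch with hypothetical toy numbers of my own (not the matrix above), where five of the six pairwise distances are known and CD is the unknown:

```python
import math

def trilaterate(d_a, d_b, base):
    """All planar positions at distance d_a from (0,0) and d_b from (base,0)."""
    x = (d_a**2 + base**2 - d_b**2) / (2 * base)
    y2 = d_a**2 - x**2
    if y2 < 0:
        return []  # distances inconsistent with a planar layout
    y = math.sqrt(y2)
    return [(x, y)] if y == 0 else [(x, y), (x, -y)]

# Hypothetical known Euclidean distances among A, B, C, D:
AB, AC, BC, AD, BD = 3.0, 4.0, 5.0, 5.0, 4.0

A, B = (0.0, 0.0), (AB, 0.0)     # fix the coordinate frame
C = trilaterate(AC, BC, AB)[0]   # pick the y >= 0 copy for C

# D's distances to A and B leave a mirror ambiguity (above/below the AB
# line), so the unknown CD has two candidate values, not one:
candidates = [math.dist(C, D) for D in trilaterate(AD, BD, AB)]
print(sorted(candidates))
```

Here the unknown comes out as either 3 or sqrt(73), depending on which side of the AB line D sits; one more known distance involving D would pick between them. For a fully known dissimilarity matrix, the standard tool for recovering coordinates in this way is multidimensional scaling.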