Little trick for shortest path
A neat trick to calculate the shortest path between vertices of a connected graph. Wanted to share it and validate it's reasoning.
Let f(x) be a positive strictly monotone decreasing real-valued function over the +ve naturals (plenty of choices) and f(0) = 0. Say the initial vertex is A and the target is B.
a) Assign f(0) to all edges.
b) Assign edges incident on A i.e. having A as an end point, the value f(1) and consider the set of end points S1 of these edges which are not A.
c) For each member X of S1 repeat the step b) and collect the vertices resulting into a set S2.
d) repeat step c) obtaining successive sets Sn till Sn is empty. Obviously n can be at most equal to (number of vertices in the graph - 1).
Claim: The first time B appears in one of the sets Sm, m is the length of the shortest path.
Each vertex V can appear in an Sm exactly once. This is because at step m when V appears for the first time in S(m), all edges incident on V will be assigned an f-value. If V is reachable through a longer path it would be through one of the edges incident on V and since each edge already has an f-value, V will not figure in any set Sj where j > m.
Since the graph is connected, each vertex will certainly appear in one of the sets Sn. The initial vertex can be thought to be the the sole member of S0, if you please.
To find the shortest path from B to A:
1) Choose the edge incident on B which has the highest f-value. Consider the end-point B1 != B of this edge.
2) repeat the step 1) with B1 to get B2, getting successive Bn until you hit A. The sequence of Bi where B0=B upto Bk=A will be a shortest path from B to A.
If there were a shorter path it would be assigned f-values first by construction and given the highest f-values possible (remember f is monotone decreasing so that edges getting f-values first get higher f-values) along it's edges.
Again, you are guaranteed to hit A since the graph is connected.
Clean, constructive and quick too.
Observations on the number of steps involved:
Obs1) You only assign f-values to edges once. You run into each edge at most twice, since each of it's vertices will appear exactly in one of the sets Sn. The number of operations per edge is at most 3 - compare and assign the first time; compare (and dont assign) at most one other time. The number of operations is thus 3xNumber of Edges; besides you could stop assigning f-values when you encounter B (the target vertex). In case of sparse graphs the Number of Edges is a small multiple of the Number of Vertices - the work is then ord(Number of Vertices).
Obs2) The remaining work is in creating each of the sets Sn and involves sorting to guarantee uniqueness of each added vertex. We can eliminate the need for sorting, by attempting to assign f-values to the incident edges of a vertex V just as we are about to add it to a set Sn, a pre-assignment of f-values. This ensures uniqueness of elements in Sn (in cases there are multiple shortest paths leading up to an element in Sn) without the need to sort. This modification only re-organizes the work done in obs1) and does not add to it, since a vertex is chosen for addition to a set Sn only when there is an edge without an assigned f-value incident on it - pre-assignment ensures that you still run into an edge exactly twice. Since Sn are a partition of the vertices of the graph, the work involved including the work considered in obs1) is at most 3x(Number of Edges) + (Number of Vertices). For sparse graphs as in obs1) this is ord(Number of Vertices)
Obs3) Finally, the actual work of constructing the shortest path is at most equal to the number of edges of the graph (some trivial cases hit the bound). As in obs1), for sparse graphs this is ord(Number of Vertices). The total work done is ord(Number of Vertices). The memory overhead only involves the sets Sn which are a partition of vertices of the graph and f-values of the edges. The memory overhead is thus = (Number of Vertices) + (Number of Edges). For sparse graphs this overhead is as in obs1) ord(Number of Vertices).
-sj (for a friend who discovered it)
positive and negative costs
I think you are trying to work with a strict +ve monotone increasing f if you choose f(x) = x. Then you can construct the path using decreasing f-values. The f-values record the shortest path for you.
Diskstra (only for +ve edge costs) will have work ord(n*log(n)). This one has ord(n).
You are right to point out that the edge weights are all 1, but you can easily modify. Say the edge weights were +ve naturals, then when you run into an edge E with weight K you assign it the f-value f(m+k) where f(m) is the f-value upto Sm at which point the edge E is being considered in order to construct S(m+1). The proof works along the same lines as before.
If the costs are +ve real then you can come up with a small-enough constant C such that each edge cost is sufficiently well-approximated by N*C for some N. Then when you run into an edge E with cost K*C you assign the f-value f(m+K) where f(m) is the f-value upto Sm at which point the edge E is being considered to construct S(m+1). The proof is same as in the +ve naturals case.
For -ve weights you would first have to "adjust" all edge weights to be positive by adding a large enough +ve constant K1 to all edge costs and then use the f-value assignment for +ve weights case. The proof is obviously same as the +ve reals case.
It's was probably better to have the arbitrary cost considerations in a seperate message.
Explanation for the extension to positive natural edge costs
It is of course computationally very cheap.
As for this:
And as far as I can tell, the part marked in red above does not hold, because f(m) doesn't necessarily correspond to the shortest path from A to that vertex, just some path with the least number of steps, because we didn't account for comparisons of multiple paths to the same vertex like in Dijkstra's algorithm.
We are merely "breaking down edges" into "sequences of virtual edges" where each "virtual edge" is of cost 1. The step counts would be aggregate step counts of these "virtual edges". With this view you can directly apply the original argument.
As a concrete example if A-B costs 50 and A-C-B was another path with costs 20, 10, for the first path you would have the f-value sequence f(0+50) corresponding to A-B and for the other the sequence f(0+20),f(20+10) corresponding to the (A-C, C-B) path. The construction would discover the shorter path since f is monotone decreasing and not the path with the least number of steps.
The f-value at each point is a function of the (aggregate virtual) step count uptil that point and f being monotone we again see the shortest path characterized with a "minimum valued sequence of f-values" if I may use the phrase. You choose to ignore the f-values at virtual edges in these minimum valued sequences.