1. Understanding Montgomery Multiplication

Hi
I am writing the code I explained at
Puppy Linux Discussion Forum :: View topic - Checking overflow in C
to implement Montgomery Multiplication for 1024 bit numbers. I am having a bit of a problem the theory behind it. I am attaching here the article I am referencing as a bitmap file. I understand what it is trying to say in equation 1, but how does it relate to the algorithm given at the bottom of the page? I can see that the loop from 0 t m-1 and the divide by 2 represent the multiplication by r^-1, but why add the M though (line 5 of algorithm). I know it has something to do with the division by M that should be performed (since we are working mod M) but I don't quite see the connection.
Also, when I tried the following numerical:
X=8 (01000b)
Y=11 (01011b)
M=17 (10001b)
with n = 5 (obvious, depending on the number of bits in binary representation of M; please see the line just below equation 1),
supplying the values of X, Y and M in equation 1 gives

88/32 mod 17 = 2 mod 17 = 2

However, taking their binary values and applying the given algorithm gives me the result 00111b (7 decimal)

Where did I go wrong???