The following two images should be enough to illustrate the remaining parts of the algorithm.

So, once you have an arrangement of most significant digits, choose a top row of least significant digits, for example:

This corresponds to the initial output of the program, above. (Zeros stand in for empty slots.)

This is just one of 4! = 24 possibilities for the top row; the program exhaustively tries all of them. But as it turns out, the very first one already produces a regular square.
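In Python (assuming that is the implementation language, which the text doesn't specify), the exhaustive loop over top rows is a one-liner:

```python
from itertools import permutations

# All 4! orderings of least significant digits for the top row.
# itertools.permutations yields them in lexicographic order, so the
# "very first one" is the identity ordering (0, 1, 2, 3).
top_rows = list(permutations(range(4)))
print(len(top_rows))   # 24
print(top_rows[0])     # (0, 1, 2, 3)
```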

Now, notice that for each remaining cell we have only two options for the least significant digit.

To see why, consider the cell in the second row, first column, containing the digits 1 (blue) and 2 3 (red). We can't have 11, because that number already appears in the first row. And we can't have 10, because 0 is already used as the least significant digit in that column, in the first row. That leaves 12 and 13.

Now, suppose we choose 12 for that cell. Then the cell in row 4, column 1 must become 31, and consequently the cell in row 3, column 1 must become 23. Carry this reasoning through the entire square, and you'll see that choosing that initial 2 in row 2, column 1 uniquely determines every other cell. The result is the regular square I mentioned above.
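To make the forcing concrete, here is a small backtracking sketch. The MSD arrangement below is a hypothetical cyclic Latin square (the actual arrangement in the images may differ, though this one is consistent with the cells discussed above), and the search enforces exactly the two rules used in the argument: no repeated least significant digit within a column, and no repeated two-digit number anywhere.

```python
# Hypothetical cyclic arrangement of most significant digits.
MSD = [[0, 1, 2, 3],
       [1, 2, 3, 0],
       [2, 3, 0, 1],
       [3, 0, 1, 2]]

def complete(top_row):
    """Find all completions of the least-significant-digit square, given
    its top row, under the two rules from the text: an LSD may not repeat
    within a column, and the combined base-4 number 4*msd + lsd may not
    repeat anywhere in the square."""
    lsd = [list(top_row)] + [[None] * 4 for _ in range(3)]
    used = {4 * MSD[0][c] + top_row[c] for c in range(4)}
    found = []

    def fill(pos):                      # pos 0..11 walks rows 2-4 in order
        if pos == 12:
            found.append([row[:] for row in lsd])
            return
        r, c = 1 + pos // 4, pos % 4
        for d in range(4):
            n = 4 * MSD[r][c] + d
            if n in used or any(lsd[rr][c] == d for rr in range(r)):
                continue
            lsd[r][c] = d
            used.add(n)
            fill(pos + 1)
            used.remove(n)
            lsd[r][c] = None

    fill(0)
    return found

candidates = complete((0, 1, 2, 3))
print(len(candidates))   # 2 -- one per choice of LSD in row 2, column 1
```

Note that because this sketch enforces both rules at every step, each completion it finds automatically uses all 16 numbers; a program that propagates the forced choices more loosely would instead filter its candidates afterward.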

As it turns out, there are 24 different initial arrangements of most significant digits. Each of those admits 4! possible first rows of least significant digits, and each of those yields 2 candidate squares.

So the program, if run to completion, would find and test 24 * 4! * 2 = 1152 candidates.

The only part I haven't explained is how a candidate can fail to be a regular square. Following the steps above does not guarantee that every number in {0, 1, ..., 15} appears, so each candidate has to be checked for that property.
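The check itself is simple. Here is one way to write it in Python, applied to a hypothetical square built from a cyclic arrangement of most significant digits (shown in base 10, and not necessarily the square in the images):

```python
def covers_all_sixteen(square):
    """A candidate is only regular if each of 0..15 appears exactly once."""
    return sorted(v for row in square for v in row) == list(range(16))

# Hypothetical candidate, built from a cyclic MSD square.
candidate = [[ 0,  5, 10, 15],
             [ 6,  8, 12,  2],
             [11, 14,  1,  4],
             [13,  3,  7,  9]]
print(covers_all_sixteen(candidate))        # True
print(covers_all_sixteen([[0] * 4] * 4))    # False -- 0 appears 16 times
```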

Finally, for the purposes of counting how many 4 by 4 regular squares exist, we would need to specify whether two squares are considered equivalent up to permutations of {0, 1, 2, 3}, rotations, and reflections.
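One way to settle that, sketched in Python, is to pick a canonical representative of each equivalence class by minimizing over the whole symmetry group. Here I assume the permutation of {0, 1, 2, 3} is applied to both base-4 digits of every entry; other conventions are possible.

```python
from itertools import permutations

def transforms(square):
    """Yield the 8 rotations and reflections of a square."""
    s = square
    for _ in range(4):
        yield s
        yield [row[::-1] for row in s]         # mirror image
        s = [list(r) for r in zip(*s[::-1])]   # rotate 90 degrees

def relabel(square, p):
    """Apply a permutation p of {0,1,2,3} to both base-4 digits of each entry."""
    return [[4 * p[v // 4] + p[v % 4] for v in row] for row in square]

def canonical(square):
    """Lexicographically smallest equivalent square, flattened to a tuple."""
    return min(tuple(v for row in relabel(t, p) for v in row)
               for t in transforms(square)
               for p in permutations(range(4)))
```

Two squares are then equivalent exactly when their canonical forms are equal, and the number of distinct regular squares is the number of distinct canonical forms.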