Here are a few further thoughts on the problem. On an infinite grid, the best possible layout is something like this:

Code:

. . . . . . . . . .
. . . . . . . . . .
... 1 2 3 4 5 1 2 3 4 5 ...
... 4 5 1 2 3 4 5 1 2 3 ...
... 2 3 4 5 1 2 3 4 5 1 ...
... 5 1 2 3 4 5 1 2 3 4 ...
... 3 4 5 1 2 3 4 5 1 2 ...
... 1 2 3 4 5 1 2 3 4 5 ...
... 4 5 1 2 3 4 5 1 2 3 ...
... 2 3 4 5 1 2 3 4 5 1 ...
. . . . . . . . . .
. . . . . . . . . .

Here, each of the numbers 1,2,3,4,5 occurs with equal frequency, so the mean density is 3. To see that this is the best possible, notice first that every cell in the grid must either contain a 1 or be adjacent to a 1. But each 1 can only "cover" one cell together with its four neighbours. Therefore at least one-fifth of all the cells must contain a 1. Each remaining cell must either contain a 2 or be adjacent to a 2. But each 2 must have a 1 as a neighbour, and the cell itself together with its three remaining neighbours can only cover four cells. Therefore at least one-quarter of these remaining cells (or one-fifth of the total) must contain a 2. Continuing this argument, you see that in order to maximise the mean density, each number must occur with equal frequency, as in the above example.

In a large finite grid, it looks as though the above layout will again provide the optimal solution, except that some modification will be needed round the edges (for example, a 5 can never occur in a cell on the edge of the grid, and a 4 can never appear in a corner cell). As the grid gets larger, the proportion of border cells decreases, so for a sufficiently large grid it will be possible to get a mean density arbitrarily close to 3 (though never actually equal to 3).

Using that strategy on an 8×8 chessboard, the best total that I can get is 167 (or a mean density 167/64 ≈ 2.61), as follows:

None of this is any help with the original 5×5 problem, since the border cells then form much too great a proportion of the total.