Laplacian smoothing is quite simple. It is defined by the following formula:

p(A_i) = [N(A_i) + 1] / [N(A) + V],

where A_i is the i-th event (a symbol in a text, for instance), N(A_i) is the number of occurrences of A_i, N(A) is the total number of occurrences (the total length of the text, for instance), and V is the size of the alphabet, i.e. the number of distinct symbols that could occur. This smoothing technique ensures that a probability is assigned even to symbols that never appear in the text: when N(A_i) = 0, each such symbol gets the same small probability 1/[N(A) + V].
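For concreteness, here is a small sketch of add-one (Laplacian) smoothing in Python (the toy string and the choice of a 26-letter alphabet are mine):

```python
from collections import Counter
import string

text = "abracadabra"
counts = Counter(text)           # e.g. counts['a'] == 5
N = len(text)                    # total count N(A) = 11
V = len(string.ascii_lowercase)  # alphabet size V = 26

def laplace(symbol):
    """p(A_i) = (N(A_i) + 1) / (N(A) + V); unseen symbols get 1/(N + V)."""
    return (counts[symbol] + 1) / (N + V)

print(laplace('a'))  # seen 5 times -> 6/37
print(laplace('z'))  # never seen  -> 1/37, not zero
```

Note that the smoothed probabilities still sum to 1 over the whole alphabet, since the V extra pseudo-counts in the numerators are exactly matched by the +V in the denominator.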

Now, Lidstone's law is a generalization of Laplacian smoothing in which the "1" in the numerator is replaced by a parameter LAMBDA, and the alphabet-size term in the denominator (call it V) is multiplied by LAMBDA:

p(A_i) = [N(A_i) + LAMBDA] / [N(A) + LAMBDA*V].

Taking LAMBDA = 1 recovers Laplacian smoothing, LAMBDA = 0.5 gives the Jeffreys-Perks law, and more generally LAMBDA can take a value of 0.5 or 0.05 or 0.01 etc.
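A minimal sketch of the Lidstone estimator (Python; the function name and the toy counts are mine, with V denoting the alphabet size):

```python
def lidstone(count, total, vocab, lam):
    """p(A_i) = (N(A_i) + lam) / (N(A) + lam * vocab)."""
    return (count + lam) / (total + lam * vocab)

# lam = 1 recovers Laplacian (add-one) smoothing:
print(lidstone(5, 11, 26, 1.0))   # (5 + 1) / (11 + 26)

# lam = 0.5 is the Jeffreys-Perks / expected-likelihood case:
print(lidstone(5, 11, 26, 0.5))   # (5 + 0.5) / (11 + 13)
```

For any LAMBDA > 0 the probabilities over the alphabet still sum to 1, because the numerators contribute a total of N(A) + LAMBDA*V.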

In my case, I have an alphabet of 2^64 symbols, or letters, call them what you wish. I have a sample in which only about 10 million of the 2^64 possible symbols actually occur. I calculate the frequency of each, but now I need to do smoothing so that the remaining symbols are not left without an assigned probability. This is useful when calculating the entropy, for instance. As you can see, if I use Laplacian smoothing or Lidstone's law with LAMBDA = 0.5, a symbol with a relative frequency of, say, 50% will be re-assigned an extremely small smoothed probability, because the term 0.5*2^64 makes the denominator HUGE.
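To make the scale of the problem concrete, here is a quick numeric sketch (Python; the counts are illustrative: one symbol holding 50% of a sample of 10^7 drawn from a 2^64-symbol alphabet):

```python
V = 2**64        # alphabet size
N = 10_000_000   # total sample size, N(A)
c = N // 2       # count of a symbol with 50% relative frequency

for lam in (1.0, 0.5, 0.5e-10):
    p = (c + lam) / (N + lam * V)
    print(f"lambda = {lam:g}: smoothed p = {p:.3e}")
```

With LAMBDA = 0.5 the denominator is dominated by 0.5*2^64 (about 9.2*10^18), so the 50%-frequency symbol drops to roughly 5.4*10^-13; note that even LAMBDA = 0.5*10^(-10) still gives LAMBDA*V around 9.2*10^8, which far exceeds N, so the smoothed probability only climbs back to roughly 5.4*10^-3.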

My question is: given Lidstone's law with parameter LAMBDA, can I use an extremely small value for LAMBDA, say 0.5*10^(-10), in order to smooth the probabilities? Is this smoothing still correct? What else would you suggest?

Please, I need some input for this problem.

Thanks a lot!