I'm using the maximum entropy principle to estimate the probability of observing a rare event. For example, if I roll a fair six-sided die, I know the probability of rolling any number is 1/6 and the expected outcome is 3.5. However, if I observed something rare, say , I want to use the large deviation/maximum entropy principle and gradient descent to determine a Gibbs representation of the empirical distribution (the frequency of each event).
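
To make the setup concrete, here is a minimal sketch under stated assumptions. It assumes the rare observation is an empirical mean of the rolls; the value 4.5 is a hypothetical placeholder, not a number from this question. Under that assumption, the maximum entropy distribution with a fixed mean has the Gibbs form p_i ∝ exp(λ·i), and λ can be found by gradient descent on the convex dual log Z(λ) − λ·m, whose gradient is E_λ[X] − m. The names `gibbs`, `fit_lambda`, and `rate`, and the step size and iteration count, are illustrative choices.

```python
import math

FACES = [1, 2, 3, 4, 5, 6]

def gibbs(lam):
    """Gibbs distribution p_i proportional to exp(lam * i) on the die faces."""
    weights = [math.exp(lam * x) for x in FACES]
    z = sum(weights)  # partition function Z(lam)
    return [w / z for w in weights]

def mean(p):
    """Expected face value under distribution p."""
    return sum(x * pi for x, pi in zip(FACES, p))

def fit_lambda(target_mean, lr=0.1, steps=5000):
    """Gradient descent on the dual log Z(lam) - lam * target_mean.

    The gradient of the dual is mean(gibbs(lam)) - target_mean.
    lr and steps are assumed values, small enough to be stable here.
    """
    lam = 0.0
    for _ in range(steps):
        lam -= lr * (mean(gibbs(lam)) - target_mean)
    return lam

def rate(p):
    """Relative entropy KL(p || uniform), the large deviation rate."""
    return sum(pi * math.log(6 * pi) for pi in p if pi > 0)

# Hypothetical rare observation: an empirical mean of 4.5 instead of 3.5.
lam = fit_lambda(4.5)
p = gibbs(lam)
print("lambda:", lam)
print("probabilities:", p)
print("rate:", rate(p))
```

With this distribution, the probability that the average of n rolls of the fair die lands near the assumed mean decays roughly like exp(−n · rate(p)), which is how the Gibbs form connects to the large deviation estimate. If the rare event is something other than a mean, the exponent λ·i would need to be replaced by the corresponding constraint function.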

I would be grateful for any help.