For simplicity, I will describe the idea with a simple example, but in real life I have a problem that can only be solved with MLE.

Suppose we have 1000 integer numbers (so-called outcomes) generated from a Poisson distribution with some lambda (say, lambda_true = 10).

There are various ways to estimate lambda:

1) Build a histogram indicating the frequency of each outcome, then fit it with a Poisson distribution. The data contains enough entries for a good fit (say, the estimate comes out as lambda_fit = 9.8 +- 0.3). Fine!
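For concreteness, here is roughly what I do for method 1 (a minimal Python/scipy sketch; the random seed, the initial guess, and the helper names are my own choices, not part of the real problem):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import poisson

rng = np.random.default_rng(0)
data = rng.poisson(lam=10, size=1000)       # 1000 outcomes, lambda_true = 10

# Histogram: frequency of each observed outcome.
values, counts = np.unique(data, return_counts=True)

# Model the frequencies as n * Poisson pmf and fit for lambda.
def model(k, lam):
    return len(data) * poisson.pmf(k, lam)

popt, pcov = curve_fit(model, values, counts, p0=[5.0])
lam_fit, lam_err = popt[0], np.sqrt(pcov[0, 0])
```

The fit error `lam_err` here comes out of the least-squares covariance, on the order of a few tenths, consistent with the 9.8 +- 0.3 above.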

2) Build the MLE function, which depends on a single variable lambda_MLE:

P = p1(lambda_MLE)*p2(lambda_MLE)*...*p1000(lambda_MLE),

where pi(lambda_MLE) is the probability of observing outcome i at the given value lambda_MLE.

Then we maximize P (or logP, it doesn't matter). Say we get something like lambda_MLE = 10.1.
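In code, method 2 looks roughly like this for me (again a Python/scipy sketch; I maximize logP numerically, and the seed and bracketing bounds are arbitrary choices of mine):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(0)
data = rng.poisson(lam=10, size=1000)       # same kind of sample as above

# Negative log-likelihood: -logP = -sum_i log p_i(lambda)
def nll(lam):
    return -np.sum(poisson.logpmf(data, lam))

res = minimize_scalar(nll, bounds=(0.1, 50.0), method="bounded")
lam_mle = res.x   # for a Poisson sample this equals the sample mean
```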

Then how do we calculate the error? We read something like these notes: http://www.stat.umn.edu/geyer/old03/5102/notes/fish.pdf

and then calculate the second derivative d^2(P)/d(lambda)^2, divide it by n = 1000 to get the Fisher information I(lambda), and then calculate the variance:

var(lambda_MLE)=-1/I(lambda).
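Numerically, the curvature step looks like this for me (a sketch; I use a central finite difference on logP, as in the linked notes, and the step size h is my own choice):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
data = rng.poisson(lam=10, size=1000)       # same kind of sample as above

def logL(lam):
    return np.sum(poisson.logpmf(data, lam))

lam_mle = data.mean()                       # the Poisson MLE

# Observed information: minus the second derivative of logP at the MLE,
# approximated by a central finite difference.
h = 1e-3
d2 = (logL(lam_mle + h) - 2.0 * logL(lam_mle) + logL(lam_mle - h)) / h**2
I_obs = -d2
var = 1.0 / I_obs                           # variance of the estimate
```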

The problem is that I get a huge var(lambda_MLE), comparable to lambda_MLE itself (for example, lambda_MLE = 10.1 +- 12).

I get this much error because the MLE function is very insensitive to changes in lambda. This is because each particular outcome has a low probability under the Poisson distribution, so P is a product of something like 0.1 x 0.05 x 0.1 x 0.2 x ... for lambda = 10. Since the average probability is ~10%, a shift in lambda doesn't make the product much lower or higher: the sensitivity is low, the second derivative is low, and the variance is high.

To conclude: we get a nice low-error lambda when fitting the histogram, but a large error on lambda when the error is calculated through the Fisher information.

The question is: what am I doing wrong? I believe that, done correctly, these two methods must give the same answer not only for the estimated lambda but also for its error. How can I match these results?