What is the right narrative (i.e. in words, not in symbols) for this definition of a maximum likelihood estimator (Penzer, LSE):

Let me try.

We have a sample $\mathbf{x} = (x_1, \dots, x_n)$. This sample is parametrised by a parameter $\theta$, which can take values in a parameter space $\Theta$.
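To pin down the object being maximised, here is the standard likelihood under one added assumption: the observations are i.i.d. with density (or mass) function $f(\cdot;\theta)$. This assumption is mine; the quoted definition is not reproduced here. The likelihood is the joint density of the observed sample, read as a function of $\theta$ with $\mathbf{x}$ held fixed:

```latex
L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta), \qquad \theta \in \Theta
```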

Then we have the set of all possible likelihood estimators given that sample and that parameter. The maximum likelihood estimator (MLE) $\hat{\theta}$ is then the least upper bound of this set of all possible likelihood estimators.

Is that right? I want to check that I fully understand the notation.

Also, is this the same as another definition of the MLE (Casella and Berger): $\hat{\theta}$ is a parameter value at which the likelihood function attains its maximum as a function of $\theta$? Why is there a clear-cut 'maximum' here, while the definition above uses a 'supremum'? Does that mean this value might not be part of the set of all likelihood estimators?
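A small numerical sketch of how a supremum can fail to be a maximum. The example is my own, not from Penzer or Casella and Berger: assume a hypothetical sample of five Bernoulli($p$) trials that are all failures, so $L(p) = (1-p)^5$. On the open parameter space $\Theta = (0, 1)$ the likelihood approaches 1 as $p \to 0$ but never reaches it, so the supremum exists while no maximiser does. On the closed space $[0, 1]$ the value $p = 0$ is allowed and attains it.

```python
import math


def bernoulli_likelihood(p, sample):
    """L(p; x) = prod_i p**x_i * (1 - p)**(1 - x_i) for 0/1 observations x_i."""
    return math.prod(p**x * (1 - p) ** (1 - x) for x in sample)


# Hypothetical sample: five Bernoulli trials, all failures.
sample = [0, 0, 0, 0, 0]

# On the open parameter space (0, 1), L(p) = (1 - p)**5 keeps increasing as p -> 0,
# so its supremum is 1, but no p in (0, 1) attains it: there is no maximum.
for p in [0.1, 0.01, 0.001, 1e-6]:
    print(f"p = {p:g}: L = {bernoulli_likelihood(p, sample):.6f}")

# On the closed parameter space [0, 1], p = 0 is allowed and attains the supremum,
# so there the supremum is also a maximum.
print("p = 0: L =", bernoulli_likelihood(0.0, sample))
```

In this sketch the only difference between the two cases is whether the boundary point belongs to $\Theta$, which is the kind of situation where "supremum" and "maximum" part ways.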