With two trials and y the observed number of heads, the likelihood function is the binomial one:
L(p) = C(2,y) * p^y * (1-p)^(2-y), where C(2,y) is the binomial coefficient.
To obtain the maximum likelihood estimate, maximize L(p) over the two possible values p=1/4 and p=3/4. For
y=0 we have L(p)=(1-p)^2, which is maximized by p=1/4. For y=1 we have L(p)=2*p*(1-p), which takes the same
value at p=1/4 and p=3/4, so both are maximum likelihood estimates. Analogously, for y=2 we have L(p)=p^2,
and p=3/4 is the maximum likelihood estimate.
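The three cases can be checked with a small sketch. It assumes the binomial likelihood for two trials and uses exact fractions so that the tie at y=1 is detected reliably; the names CANDIDATES, likelihood and ml_estimates are made up for this illustration.

```python
from fractions import Fraction
from math import comb

# The only two values of p allowed in this problem.
CANDIDATES = (Fraction(1, 4), Fraction(3, 4))


def likelihood(p, y, n=2):
    """Binomial likelihood of p given y heads in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def ml_estimates(y, n=2):
    """Return every candidate p that attains the maximum likelihood."""
    values = {p: likelihood(p, y, n) for p in CANDIDATES}
    best = max(values.values())
    return [p for p in CANDIDATES if values[p] == best]


for y in range(3):
    print(y, [str(p) for p in ml_estimates(y)])
```

For y=1 the function returns both candidates, which is the non-uniqueness described above.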
The second point is invariance, one of the basic properties of maximum likelihood estimates. Let beta=t(theta) for a
one-to-one function t. If L(theta) is the likelihood function for theta, then L'(beta)=L( t^(-1)(beta) ) is the
likelihood function for beta. If theta' maximizes L, then beta'=t(theta') maximizes L'.
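As a consequence of invariance, the ML estimate is generally biased: even if theta' is unbiased for theta,
t(theta') is in general not unbiased for t(theta) when t is nonlinear.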
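A minimal sketch of this effect, under assumptions that go beyond the example above: p is taken to be unrestricted in [0,1], so the ML estimate of p is y/n, and the transformation is t(p)=p^2, which is one-to-one on [0,1]. By invariance the ML estimate of p^2 is (y/n)^2. The function names are made up for this illustration, and the expectations are computed exactly by summing over all outcomes.

```python
from fractions import Fraction
from math import comb


def expected_value(estimator, p, n):
    """Exact expectation of estimator(y, n) when y ~ Binomial(n, p)."""
    return sum(
        comb(n, y) * p**y * (1 - p) ** (n - y) * estimator(y, n)
        for y in range(n + 1)
    )


def mle_p(y, n):
    """ML estimate of p when p is unrestricted in [0, 1] (assumption)."""
    return Fraction(y, n)


def mle_p_squared(y, n):
    """ML estimate of t(p) = p^2, obtained by invariance."""
    return Fraction(y, n) ** 2


p, n = Fraction(1, 4), 2
bias_p = expected_value(mle_p, p, n) - p
bias_p_squared = expected_value(mle_p_squared, p, n) - p**2
print("bias of y/n:", bias_p)
print("bias of (y/n)^2:", bias_p_squared)
```

With p=1/4 and n=2 the first bias is zero and the second is p(1-p)/n = 3/32, so both estimates cannot be unbiased at once.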