it's just a notational issue. the image of g (under the action, viewed as a homomorphism G → Sym(X)) is an element of Sym(X), and the elements of Sym(X) are themselves mappings from X to X.
so one ought to write: the mapping that takes g → (x → g.x).
so instead of the image of g being a "point" it's an "arrow".
to amplify: usually the action is "hidden". one just defines what is meant by g.x (often written simply as gx, or sometimes x → g(x), which is what happens when you speak of "the function g").
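
a small sketch may make the "point" versus "arrow" distinction concrete. everything here is an illustrative assumption rather than part of the original question: the group is taken to be Z_4 under addition mod 4, acting on X = {0, 1, 2, 3} by rotation, and the names `act` and `phi` are made up for the example. `act(g, x)` is the "hidden" form g.x, while `phi(g)` returns the arrow x → g.x.

```python
from typing import Callable

N = 4            # hypothetical example: the cyclic group Z_4
G = range(N)     # group elements 0..3, group operation is addition mod N
X = range(N)     # the set being acted on

def act(g: int, x: int) -> int:
    """The 'hidden' form of the action: g.x, which is a point of X."""
    return (g + x) % N

def phi(g: int) -> Callable[[int], int]:
    """The explicit form: g is sent to the arrow x -> g.x, a map X -> X."""
    return lambda x: act(g, x)

# phi(1) is a function, not a point; applying it to x gives the point 1.x
arrow = phi(1)
print([arrow(x) for x in X])   # [1, 2, 3, 0]

# phi respects the group operation: phi(g + h) equals phi(g) composed with phi(h)
assert all(
    phi((g + h) % N)(x) == phi(g)(phi(h)(x))
    for g in G for h in G for x in X
)
```

so `phi(1)` is an element of the set of maps X → X, and only `phi(1)(x)` is an element of X.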