The answer is twofold: a piece of differential calculus, and a piece of linear algebra.

Let me first fix notation. Consider a diffeomorphism $\displaystyle \phi:U\to V$, where $\displaystyle U$ and $\displaystyle V$ are open subsets of $\displaystyle \mathbb{R}^n$, and let $\displaystyle f:V\to\mathbb{R}$ be a measurable function (piecewise continuous, for instance) that is integrable on $\displaystyle V$. The change of variables formula is

$\displaystyle \int_U f(\phi(x)) |J\phi(x)| dx=\int_V f(y) dy$

where $\displaystyle J\phi(x)$ is the Jacobian of $\displaystyle \phi$ at $\displaystyle x$.

In particular, if $\displaystyle V$ has finite area, then

$\displaystyle {\rm area}(V)=\int_U |J\phi(x)| dx$.

Thus $\displaystyle |J\phi(x)|$ does indeed measure the local area magnification at the point $\displaystyle x$. Let me try to explain how this can be understood. (Note that I write $\displaystyle {\rm area}$ even though this would rather be called "volume" in dimension 3 or higher.)
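As a sanity check of the area formula, here is a minimal numerical sketch. It assumes a concrete example that is not part of the discussion above: polar coordinates $\displaystyle \phi(r,\theta)=(r\cos\theta,r\sin\theta)$ on $\displaystyle U=(0,R)\times(0,2\pi)$, whose image is the disc of radius $\displaystyle R$ minus a slit of zero area. The function names are made up for illustration.

```python
import math

def jacobian_polar(r, theta):
    """Jacobian determinant of phi(r, theta) = (r cos theta, r sin theta)."""
    # Jacobian matrix: [[cos t, -r sin t], [sin t, r cos t]]
    a, b = math.cos(theta), -r * math.sin(theta)
    c, d = math.sin(theta), r * math.cos(theta)
    return a * d - b * c

def area_via_jacobian(radius, n=200):
    """Midpoint Riemann sum of |J phi| over U = (0, radius) x (0, 2*pi)."""
    dr = radius / n
    dtheta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            total += abs(jacobian_polar(r, theta)) * dr * dtheta
    return total

approx = area_via_jacobian(2.0)
exact = math.pi * 2.0 ** 2  # area of the disc of radius 2
print(approx, exact)
```

The two printed numbers should agree closely, which is the formula $\displaystyle {\rm area}(V)=\int_U |J\phi(x)| dx$ in this particular case.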

First, recall that $\displaystyle J\phi(x)=\det \left(\frac{\partial\phi_i}{\partial x_j}\right)_{1\leq i,j\leq n}$ can also be written $\displaystyle J\phi(x)=\det(d_x\phi)$, where $\displaystyle d_x\phi$ is the differential of $\displaystyle \phi$ at the point $\displaystyle x\in U$. The differential of $\displaystyle \phi$ at $\displaystyle x$ is the linear map that best approximates the increment $\displaystyle \phi(x+h)-\phi(x)$ for small $\displaystyle h$. In dimension 1, this amounts to approximating the graph of $\displaystyle \phi$ by its tangent line. To state it rigorously, we have (by definition) $\displaystyle \phi(x+h)=\phi(x)+(d_x\phi)(h)+o_{h\to 0}(\|h\|)$, hence $\displaystyle \phi(x+h)\simeq \phi(x)+(d_x\phi)(h)$ for small $\displaystyle h$.
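The $\displaystyle o(\|h\|)$ statement can be observed numerically. The sketch below assumes a hypothetical example map, $\displaystyle \phi(x,y)=(e^x\cos y, e^x\sin y)$, chosen only because its partial derivatives are easy to write down. It prints the remainder divided by $\displaystyle \|h\|$ for smaller and smaller $\displaystyle h$.

```python
import math

def phi(x, y):
    # Hypothetical example map, not taken from the discussion above.
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def d_phi(x, y, h):
    """Apply the differential of phi at (x, y) to the vector h = (h1, h2)."""
    ex = math.exp(x)
    # Jacobian matrix: [[e^x cos y, -e^x sin y], [e^x sin y, e^x cos y]]
    a, b = ex * math.cos(y), -ex * math.sin(y)
    c, d = ex * math.sin(y), ex * math.cos(y)
    return (a * h[0] + b * h[1], c * h[0] + d * h[1])

def relative_error(x, y, h):
    """Return |phi(x+h) - phi(x) - d_x phi(h)| / |h|."""
    px, py = phi(x, y)
    qx, qy = phi(x + h[0], y + h[1])
    lx, ly = d_phi(x, y, h)
    remainder = math.hypot(qx - px - lx, qy - py - ly)
    return remainder / math.hypot(h[0], h[1])

x0, y0 = 0.3, 0.7
errors = [relative_error(x0, y0, (t, -t)) for t in (1e-1, 1e-2, 1e-3)]
print(errors)
```

The printed ratios should shrink as $\displaystyle h$ shrinks, which is exactly what "the remainder is $\displaystyle o(\|h\|)$" means.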

Imagine $\displaystyle U$ is a very small neighbourhood of $\displaystyle x_0$. Then $\displaystyle \phi(x_0+h)$ can be approximated by $\displaystyle \phi(x_0)+(d_{x_0} \phi)(h)$ for any $\displaystyle h$ (necessarily small) such that $\displaystyle x_0+h\in U$. Hence $\displaystyle V=\phi(U)\simeq \phi(x_0)+(d_{x_0}\phi)(U-x_0)$, and thus, since translations do not change areas and $\displaystyle d_{x_0}\phi$ is linear, $\displaystyle {\rm area}(V)\simeq {\rm area}((d_{x_0}\phi)(U))$.

We are thus reduced to finding how the linear map $\displaystyle A=d_{x_0}\phi$ transforms the area of $\displaystyle U$. We need to find $\displaystyle {\rm area}(A(U))$ where $\displaystyle A$ is a linear (non-singular) map.

a) If $\displaystyle A(x)=\lambda x$ then it is well known that $\displaystyle {\rm area}(A(U))=|\lambda|^n {\rm area}(U)$, hence $\displaystyle {\rm area}(A(U))=|\det A| {\rm area}(U)$ since $\displaystyle \det A=\lambda^n$ (remember I work in dimension $\displaystyle n$). So far, so good.

b) If $\displaystyle A$ is a rotation, then of course $\displaystyle {\rm area}(A(U))={\rm area}(U)$, hence $\displaystyle {\rm area}(A(U))=|\det A|{\rm area}(U)$ since the determinant of a rotation is 1. The same works for any orthogonal map $\displaystyle A$ (rotations and reflections, determinant $\displaystyle \pm 1$).

c) If $\displaystyle A$ is a diagonal matrix $\displaystyle A=\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_n\end{pmatrix}$, then $\displaystyle A$ maps the cube $\displaystyle [0,1]^n$ to the "rectangle" $\displaystyle [0,a_1]\times [0,a_2]\times\cdots\times [0,a_n]$ (with $\displaystyle [0,a_i]$ read as $\displaystyle [a_i,0]$ when $\displaystyle a_i<0$), which has volume $\displaystyle |a_1|\times \cdots\times |a_n|$, and similarly $\displaystyle {\rm area}(A(C))=|a_1\cdots a_n|{\rm area}(C)=|\det A|{\rm area}(C)$ for any other cube $\displaystyle C$ with sides parallel to the axes (translate it, which changes no area, and change its side length using paragraph *a)*). Then if we decompose $\displaystyle U$ into small cubes of this kind (up to an error tending to 0), we see that we have as well $\displaystyle {\rm area}(A(U))=|\det A|{\rm area}(U)$, even if $\displaystyle U$ is not a cube.

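From a), b), c), we can deduce that $\displaystyle {\rm area}(A(U))=|\det A|{\rm area}(U)$ for any linear map $\displaystyle A$. Indeed, any linear map can be decomposed into a product of orthogonal and diagonal maps. This results from the "polar decomposition" (for matrices) or "OS-decomposition": any linear map is the product $\displaystyle A=OS$ of an orthogonal map $\displaystyle O$ and a symmetric map $\displaystyle S$, and a symmetric map is orthogonally diagonalizable (on $\displaystyle \mathbb{R}$), i.e. $\displaystyle S=PDP^{-1}$ with $\displaystyle P$ orthogonal and $\displaystyle D$ diagonal. Applying the factors one after the other, the area is multiplied by $\displaystyle |\det O|\,|\det P|\,|\det D|\,|\det P^{-1}|$, which equals $\displaystyle |\det A|$ because the determinant is multiplicative. This was the linear algebra part. Maybe someone can provide an intuitive explanation of the polar decomposition for matrices if you ask for one. I don't have one in mind right now...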
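The claim $\displaystyle {\rm area}(A(U))=|\det A|{\rm area}(U)$ is easy to test in dimension 2. The sketch below takes $\displaystyle U$ to be the unit square, maps its corners by a few sample matrices (chosen arbitrarily for illustration), and compares the area of the resulting parallelogram, computed with the shoelace formula, with $\displaystyle |\det A|$. All helper names are made up for this example.

```python
import math

def apply(matrix, point):
    """Apply a 2x2 matrix ((a, b), (c, d)) to a point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

def matmul(m1, m2):
    """Product of two 2x2 matrices."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def det(matrix):
    (a, b), (c, d) = matrix
    return a * d - b * c

def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

angle = math.pi / 6
rotation = ((math.cos(angle), -math.sin(angle)),
            (math.sin(angle), math.cos(angle)))   # case b)
diagonal = ((3.0, 0.0), (0.0, -0.5))              # case c)
shear = ((1.0, 2.0), (0.0, 1.0))                  # neither orthogonal nor diagonal

samples = [
    ("rotation", rotation),
    ("diagonal", diagonal),
    ("rotation * diagonal", matmul(rotation, diagonal)),
    ("shear", shear),
]
for name, m in samples:
    image = [apply(m, p) for p in unit_square]
    print(name, polygon_area(image), abs(det(m)))
```

For each matrix the two printed numbers should coincide up to rounding, including the shear, which is covered by neither a), b) nor c) alone.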

To connect the two parts, we finally have $\displaystyle {\rm area}(\phi(U))={\rm area}(V)\simeq {\rm area}( (d_{x_0}\phi)(U))= |\det d_{x_0}\phi|{\rm area}(U)$ $\displaystyle =|J\phi(x_0)|{\rm area}(U)$ for a small set $\displaystyle U$ containing $\displaystyle x_0$. Heuristically, if "$\displaystyle {\rm area}(U)=dx$", then "$\displaystyle {\rm area}(\phi(U))=\phi(dx)$", and we have thus proved "$\displaystyle \phi(dx)=|J\phi(x)|dx$". Summing over small pieces where $\displaystyle \phi$ is almost linear, we deduce that the integral formula $\displaystyle {\rm area}(V)=\int_U |J\phi(x)|dx$ is true for "large" sets $\displaystyle U$.
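The local statement $\displaystyle {\rm area}(\phi(U))\simeq |J\phi(x_0)|{\rm area}(U)$ can also be watched numerically. The sketch below assumes a hypothetical example map, $\displaystyle \phi(x,y)=(e^x\cos y, e^x\sin y)$, for which $\displaystyle J\phi(x,y)=e^{2x}$. It takes $\displaystyle U$ to be a small square centred at $\displaystyle x_0$ and approximates $\displaystyle {\rm area}(\phi(U))$ by the polygon through many image points of the boundary of $\displaystyle U$.

```python
import math

def phi(x, y):
    # Hypothetical example map, not taken from the discussion above.
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian(x, y):
    # det [[e^x cos y, -e^x sin y], [e^x sin y, e^x cos y]] = e^(2x)
    return math.exp(2 * x)

def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def image_area_of_square(x0, y0, side, samples_per_edge=200):
    """Approximate area of phi(square of given side centred at (x0, y0))."""
    half = side / 2
    corners = [(x0 - half, y0 - half), (x0 + half, y0 - half),
               (x0 + half, y0 + half), (x0 - half, y0 + half)]
    boundary = []
    for k in range(4):
        (ax, ay), (bx, by) = corners[k], corners[(k + 1) % 4]
        for s in range(samples_per_edge):
            t = s / samples_per_edge
            # Walk along the edge and record the image of each sample point.
            boundary.append(phi(ax + t * (bx - ax), ay + t * (by - ay)))
    return polygon_area(boundary)

def area_ratio(x0, y0, side):
    """area(phi(U)) / area(U) for the square U of the given side."""
    return image_area_of_square(x0, y0, side) / side ** 2

x0, y0 = 0.3, 0.7
for side in (0.5, 0.1, 0.02):
    print(side, area_ratio(x0, y0, side), abs(jacobian(x0, y0)))
```

As the square shrinks, the printed ratio should approach $\displaystyle |J\phi(x_0)|$; for the larger squares the gap shows that $\displaystyle \phi$ is only approximately linear.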

Differential calculus always plays on the interaction between calculus and linear algebra: near a point, any (smooth) map is almost linear, hence we can take its determinant, invert it, and so on. Looking more closely, we can use a second-order approximation of $\displaystyle \phi$, which is a quadratic map instead of a linear map. Then we can apply bilinear algebra to study local extrema... Plenty of nice things.