OK, even asking the question is difficult.
I am trying to understand how Part A works. At first blush it seems that "a" and "b" are just two different names for the same thing, so naturally they end with results that agree with one another at the conclusion of Part A.
My thinking is this: since x + y, the expression we are trying to define, contains two variables, the idea is to say "let x be any specific natural number; the result is the same", that is, the two statements forming the definition hold regardless of whether x = x_{r}, specifically, or x = x_{s}, specifically. The reason for doing this is that (x+y) then becomes a statement with a single variable, and so its definition for all natural numbers is amenable to a proof by basic induction.
I was also toying with the idea that perhaps:
a = x_{r}+y' and then b = x_{s}+y',
just to let you know what I am struggling to understand.
So, the underlying question is: what distinguishes "a" from "b"? Thanks.
The point of the theorem is to prove that there exists a unique number which corresponds to x + y. They are, basically, saying: suppose there are two such numbers, a and b, and then showing that they must be the same. You are not distinguishing "a" from "b" because you are assuming that there is only one such number to begin with.
I went through my version of what I believed part A was about in my post
Proving a fundamental Theorem using PMI
And Deveno corroborated my analysis, and I believe you are pointing in the same direction.
I paste a copy of that proof below as a convenience.
The problem I am having is that the proof seems like a tautology.
It seems like, for example ... I have a machine, call it "A", that can construct an equilateral triangle out of three equal-length line segments. I have a second, identical machine, call it "B", that can also construct an equilateral triangle out of three equal-length line segments; both machines are programmed using the same code. Ergo I have shown that I have a machine that produces a UNIQUE triangle given three line segments of equal length.
Maybe such an analogy is not helpful, but that is exactly what the proof I offered below seems to do!
Additionally, my reading indicates that inductive proofs with two variables are sometimes resolved by holding one variable constant to prove the case inductively for the remaining variable, and then using a second induction to prove that the result still holds when the variable previously held constant is returned to variable status.
Finally, it seems to make sense that something should distinguish the case a = x+y from the case b = x+y.
I DO continue to think about this and perhaps will eventually see where I am making an obvious error (2+2 = 5), but for now I just don't see how even Part A of Theorem #4 is meaningful if my analysis (below) is correct.
---------------
My version of the first half of the proof, Part A.
This theorem introduces and defines the addition operator designated by the infix symbol “+”, i.e “x + y”.
Using a prefix operator symbol one can write “x + y” = sum(x,y) which I prefer as an aid to my thinking process.
So the claim is that, given that x, y are elements of N, where N is the set of natural numbers, sum(x,y) is a function that produces a unique natural number, say "s". sum(x,y) is required to have the properties:
Property 1: sum(x, 1) = x’
Property 2: sum(x, y’) = (sum(x, y))’
The first half of the proof says let x be a particular fixed natural number and show that sum(x,y) produces a unique “s” no matter what value of y is chosen.
The proof then refers to a_y and b_y but does not say what either of those refers to. I assume to the experienced eye their meaning is obvious, but I have considered three different possibilities and finally settled on this (am I right? A critical question):
Suppose sum(x,y) is not the only possible function that has the two required properties; suppose there are a
sum_A(x, y) = a_y and a sum_B(x, y) = b_y,
each of which subscribes to the two properties but which, when loaded with the same values for "x" and "y", produce different results. The proof then endeavors to show that the two cases produce the same result, therefore they are only two different names for the same function, and so
sum_A(x, y) = sum_B(x, y) = sum(x, y).
Now then with x a fixed natural number and assuming both sum_A(x,y) and sum_B(x,y) subscribe to the two required properties we say:
Let "M" be the set of natural numbers y for which both functions, sum_A and sum_B, produce the same results. Since each function, by hypothesis, subscribes to Property 1, we can write, when y = 1:
sum_A(x, 1) = x’ and
sum_B(x, 1) = x’
Therefore sum_A(x, 1) = sum_B(x, 1) when y = 1, and so 1 is an element of M.
Now we come to the critical point for the first half of the proof. The author writes,
if “y” is an element of M then
1: sum_A(x, y) = sum_B(x, y) (by “inductive hypothesis” … is that correct?)
Since both sum_A(x, y) and sum_B(x, y) are expressions for the same natural number it follows by the uniqueness of a successor that:
2: [sum_A(x, y)]’ = [sum_B(x, y)]’
Finally, using property 2 from the right hand side back to the left hand side:
3: [sum_A(x, y)]’ = sum_A(x, y’), and
4: [sum_B(x, y)]’ = sum_B(x, y’),
So that 2: becomes
5: sum_A(x, y’) = sum_B(x, y’),
which shows that what is true for the inductive hypothesis with argument “y” : sum_A(x, y) = sum_B(x, y)
is also true for “y’ ” : sum_A(x, y’) = sum_B(x, y’),
therefore M contains all natural numbers y, and so it is established that for a fixed "x" there is only one function:
sum(x, y), that subscribes to Properties 1 and 2 and hence defines addition for a fixed value of x.
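To convince myself that the two properties really do pin down the values, here is a quick computational check. This is only a sketch, not the induction itself: the names sum_A, sum_B, and suc are mine, the recursion is written two deliberately different ways, and only finitely many cases are tested.

```python
# Sketch of Part A: any two functions satisfying Property 1 and Property 2
# must agree. Names sum_A / sum_B / suc are illustrative, not from the text.

def suc(n):
    """The successor function n'. (Python's +1 stands in as the primitive.)"""
    return n + 1

def sum_A(x, y):
    # Property 1: sum(x, 1) = x'
    if y == 1:
        return suc(x)
    # Property 2: sum(x, y') = (sum(x, y))'
    return suc(sum_A(x, y - 1))

def sum_B(x, y):
    # Coded differently (iteratively), but obeying the same two properties.
    result = suc(x)            # Property 1: sum(x, 1) = x'
    for _ in range(y - 1):     # Property 2, applied y - 1 times
        result = suc(result)
    return result

# Differently written, yet they agree everywhere we test:
assert all(sum_A(x, y) == sum_B(x, y)
           for x in range(1, 10) for y in range(1, 10))
```

Of course this only checks a finite grid; it is the induction over M in the proof that extends the agreement to all of N.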
I think I sort of understand your difficulty.
What we want to prove is that:
$\text{sum}(-,-)$ defines a function $\Bbb N \times \Bbb N \to \Bbb N$. This lets us treat a sum $x+y$ like any other natural number.
The proof actually uses a technique called "currying", where instead of seeing $\text{sum}(-,-)$ as a function of TWO variables, we create an "intermediate" function:
$\text{sum}(x,-): \Bbb N \to \Bbb N$, and then have the function:
$f: \Bbb N \to \Bbb N^{\Bbb N}$, where $f(x)$ is the FUNCTION $\text{sum}(x,-)$.
For $f$ to be a function, the function $\text{sum}(x,-)$ has to be uniquely defined, for every natural number $x$.
Why do we do this in such a fashion? Because we want to apply induction to "one variable at a time". Addition takes TWO variables.
Now, once we know $f$ is a bona-fide function, we do this:
$\text{sum}(x,y) = [f(x)](y)$.
In your text, the image $[f(a)](y)$ is denoted by $a_y$ (presumably, so the author didn't have to introduce an extra letter for the function "$f$").
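If it helps to see the currying concretely, here is a small Python sketch. The names suc and f are mine, and Python closures stand in for the set-theoretic construction; the point is only that $f(x)$ is itself a function, which we then apply to $y$.

```python
# Sketch of currying: f(x) is the one-variable FUNCTION sum(x, -),
# i.e. what the text denotes a_y when x = a.

def suc(n):
    """Successor function n'."""
    return n + 1

def f(x):
    """Return the function sum(x, -) : N -> N."""
    def sum_x(y):                        # this is sum(x, -), "Add x"
        return suc(x) if y == 1 else suc(sum_x(y - 1))
    return sum_x

add_2 = f(2)        # the function "Add 2"
print(add_2(3))     # [f(2)](3) = 5
```

Note that f itself has type N → N^N: it eats a number and returns a function.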
There is something deep, and left implicit, going on here:
There is a bijection between binary operations on a set:
$S\times S \to S$
and functions from $S$ to the set of functions from $S$ to $S$.
The set of functions from one set $A$ to another $B$ is written $B^A$, and the cartesian product $A \times A$ is written $A^2$, so the above says:
$S^{S^2} \cong (S^S)^S$
This is a bit complicated to explain, but I'll try. Suppose we have such a binary operation:
$(x,y) \to x\ast y$.
For a FIXED $x \in S$, we can create the function $L_x: S \to S$ by: $L_x(y) = x\ast y$. The function that sends $x \mapsto L_x$ is the image of our binary operation $\ast$.
On the other hand, suppose we have a function from $S$ to the set of functions $f:S \to S$. It's handy to call the function we derive from $x \in S$ by $f_x$.
This, in turn, allows us to define a binary operation:
$x\ast y = f_x(y)$.
Now it helps to have "names" for these correspondences, so let's call the mapping which sends $\ast(-,-) \mapsto L_{(-)}(-)$, by the letter $F$, and the mapping which sends $f_{(-)}(-) \mapsto \ast(-,-)$, by the letter $G$.
What happens if we compose $G \circ F$?
Well for a GIVEN binary operation $\ast$, we get the mapping $L_{(-)}(-)$ (this takes the "first argument" of $\ast$ and "hits the second argument from the left").
It should be clear that $G$ takes $L_{(-)}(-)$ and just returns $\ast$, our original binary operation.
Similarly, if we start with a function that takes an element $x$ of $S$, and creates a function $f_x$ (I am calling this function $f_{(-)}$, so that $f_{(-)}(x) = f_x$), and then use $G$ to create a binary operation from that, when we apply $F$ to that binary operation, we will recover $f_{(-)}$. In other words, $F$ and $G$ are inverse functions, so both are bijections.
I'm afraid all of this is a bit abstract, so let's look at a baby example:
Suppose $S = \{a,b\}$ and we define our operation like so:
$a\ast a = a$
$a\ast b = b$
$b \ast a = b$
$b \ast b = a$.
We are going to get TWO functions out of this, which I will call $L_a$ and $L_b$. Explicitly:
$L_a(a) = a$
$L_a(b) = b$
$L_b(a) = b$
$L_b(b) = a$.
So $F$ changes $\ast(x,y)$ to $L_x(y)$, and $G$ does the opposite.
Unfortunately, even with an $S$ this small, there are 16 possible binary operations we could define, and showing each and every one of them can be modeled by a curried function would be tedious.
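To spare some of the tedium, here is the one case above checked mechanically. This is a sketch only: dictionaries stand in for the functions, and $F$ and $G$ are written just for operations on this two-element $S$.

```python
# The baby example: S = {a, b} with the operation table given above,
# curried by F and uncurried by G.

S = ['a', 'b']
star = {('a', 'a'): 'a', ('a', 'b'): 'b',
        ('b', 'a'): 'b', ('b', 'b'): 'a'}

def F(op):
    """Curry: a binary operation -> the family of functions L_x."""
    return {x: {y: op[(x, y)] for y in S} for x in S}

def G(L):
    """Uncurry: a family L_x -> a binary operation."""
    return {(x, y): L[x][y] for x in S for y in S}

L = F(star)
assert L['a']['b'] == 'b'      # L_a(b) = b, matching the table
assert G(F(star)) == star      # G o F recovers the original operation
```

Checking all 16 binary operations on S this way is a four-line loop, which is exactly why the correspondence is worth having.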
Thank you for your explanation, and in particular in addressing my specific question concerning the meaning of “a_{y}”. I don’t know if I would have ever figured that out on my own. I am afraid that understanding theorem #4 must once again be put on hold while I digest this background material.
I have read your post many, many times and, in conjunction with some online research, have come to a certain understanding plus a couple of questions. My questions, however, come in the context of my current understanding. After many, many rewrites, here is what I think I know, condensed.
Let sum(x,y) = z = x + y, sum: NxN --> N
In this scenario, for (x,y) = (x_{1},y_{1}), all values of z are eliminated from the codomain, except the unique correct value, by use of the SINGLE OPERATION of directly locating (x_{1},y_{1}) on the x/y plane and looking above it for the value of z = x_{1}+y_{1}.
Another scenario/procedure for locating the z corresponding to (x_{1},y_{1}) is to, in a first operation, go out along the x axis to x_{1} and thereby eliminate all values of z except those corresponding to the points:
{(x_{1},1), (x_{1},2), (x_{1},3), … (x_{1},y)},
Then in a second operation, use the value of y_{1} to eliminate all values of z except the one above
(x_{1},y_{1}) , again z = x_{1}+y_{1}.
The second procedure can be generalized to produce the points:
{(x,1), (x,2), (x,3), … (x,y)} with corresponding values of
z = {x+1, x+2, x+3, … x+y},
but {x+1, x+2, x+3, … x+y} is a set of expressions that, if they can be shown to be defined, express a function that we can write as
[f(x)](y), f(x) = sum(x,-), f:N-->N^{N}
At this point, then, we have established that:
[f(x)](y) is POSSIBLY the curried equivalent of sum(x,y) … (there is still the question of bijection, going from the curried form to the binary domain set form).
Question#1:
Is the foregoing accurate? For example, can I interpret S^{SS} = (S^{S})^{S} in terms of it?
Question #2:
You have written:
#1 sum(x,−):N→N, and then have the function:
#2 f:N→N^{N}, where f(x) is the FUNCTION sum(x,−).
Now #2 seems to say that you can write
f:N→N^{N} as ..... sum(x,-):N→N^{N}, which, from #1 and #2 gives both
#1 sum(x,−):N→N, and
#3 sum(x,-):N→N^{N} i.e. the same function having two different codomains.
Now I am thinking that #3 actually encompasses #1 and you may have written #1 to make a point about the ultimate codomain of sum(x,-) once it has been applied to y. (?)
That's it ... so before going further I thought I would check in and see if I am headed North or South.
Thanks.
Wait, I just had a thought ... isn't a recursive function just a function whose output is looped back to its input? Pairing its output sequence with a set of successive natural numbers defines an equivalent function ... the base case, the domain, and the codomain are the relevant particulars. Maybe this is too obvious to mention, or perhaps I am squinting too hard.
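To pin down what I mean, a rough sketch (the helper iterate and every choice here is mine, just to illustrate the "loop the output back" idea):

```python
# Sketch: iterating a function from a base case, then pairing the
# successive outputs with 1, 2, 3, ... defines an ordinary function on N.

def iterate(g, base, n):
    """Return the n-th value of the loop base, g(base), g(g(base)), ..."""
    value = base
    for _ in range(n - 1):
        value = g(value)
    return value

suc = lambda m: m + 1
pairs = {n: iterate(suc, 3, n) for n in range(1, 5)}
print(pairs)  # {1: 3, 2: 4, 3: 5, 4: 6} -- the function "Add 2", tabulated
```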
This is OK, except you have the LHS wrong for the "binary side": it should be S^{SxS}, not S^{SS}.
#3 is written wrong. I'll endeavor to explain. A function has 3 parts:
We write $f:A \to B$, or sometimes $b = f(a)$ to say three things:
1. We start with elements of $A$. The set $A$ is called the domain of $f$.
2. We assign to each element of $A$, a UNIQUE element of $B$. The set $B$ is called the co-domain of $f$. Note that not all of $B$ may be "seen" or "covered" by $f$.
3. The assignment of such pairs $(a,b) = (a,f(a))$ is often called the RULE of $f$. But the rule doesn't have to be given "explicitly", for example, if:
$A = \{1,2\}$ and $B = \{1,2,3,4\}$, we can write: $f:A \to B$ given by $f(a) = 2a$, or we could simply write $f = \{(1,2),(2,4)\}$.
The important thing to realize is $f$ and $f(a)$ are two different things. The first is the function itself: ALL the ordered pairs $(a,f(a))$, the second is just some particular "second coordinate" (some element of $B$ that is the IMAGE of some $a \in A$) of $f$.
One often sees things like: "Consider the function $x^2$", and this is very bad. The proper thing to say is: "Consider the function defined by $f(x) = x^2$, for all $x$ in the domain of $f$".
The reason I bring this up is it is important to distinguish between the function $f:x \mapsto \text{sum}(x,-)$ and $\text{sum}(x,-)$, which is just the IMAGE $f(x)$. It's a bit hard to wrap one's mind around, because $f(x)$ in this case, is not a NUMBER, but a function.
For example, $f(2)$ is the function: "Add 2", and $f(1)$ is the function: "Add 1" (you might recognize this as the successor function, although we can't DEFINE the successor function this way, because we are using the successor function to define "add").
In effect, what we are doing is defining a single operation: "$x+y$", by defining an entire FAMILY of functions (in one variable): $x+(-)$ (we get a different function for each natural number $x$, which should make sense: if $x$ and $z$ are different numbers (in the naive intuitive sense) then "Add $x$" and "Add $z$" should give different results, even when applied to the same natural number $y$).
Now to define "Add $x$", we use the inductive nature of the natural numbers: we define what "Add $x$" means first for 1:
$(\text{Add}\ x)(1) = \text{sum}(x,1) = x' = \text{suc}(x)$.
This is, in effect, DEFINING $x + 1 = x'$.
We next need to define "Add $x$" for every OTHER natural number. Since on this set (every other natural number but 1), the successor function is ONTO, we can just define what "Add $x$" means for $y'$ (Either $y = 1$, and we already know what THAT means, or $y = w'$ and we repeat the process with $w$).
$(\text{Add}\ x)(y') = [(\text{Add}\ x)(y)]' = \text{suc}((\text{Add}\ x)(y))$
This is defining $x + y' = (x + y)'$.
So, let's use this to add 2 and 3. I'll use s(x) for x', to make it easier to keep track.
$2 + 3 = s(1) + 3 = s(1) + s(2) = s(1) + s(s(1)) = s(s(1) + s(1))$ <---first application of our recursive rule.
$ = s(s(s(1) + 1))$ <---2nd application of our recursive rule (since $s(1) + s(1) = s(s(1) + 1)$).
$ = s(s(s(s(1))))$ <---3rd application of our recursive rule, now back to the "initial point" (that is, $s(1) + 1 = s(s(1))$), which eliminates the plus sign.
$ = s(s(s(2)))$ <---using the abbreviation 2 for s(1).
$ = s(s(3))$ <---using the abbreviation 3 for s(2).
$ = s(4)$ <---using the abbreviation 4 for s(3).
$= 5$ <---using the abbreviation 5 for s(4), note we are just saying here that 5 is the fourth successor of 1.
Note how, when we do this, we in effect, bring the "s's" to the outside, so we can make a direct evaluation. This may seem a bit complicated, but it is how we teach MACHINES to add.
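In fact we can make a machine do exactly the symbolic manipulation above. A Python sketch (the string encoding of terms is my own device; only the two recursive rules are used, never Python's own arithmetic on the terms):

```python
# Numbers encoded as nested successor terms over 1: 3 is "s(s(1))".
# Addition uses only the rules x + 1 = s(x) and x + s(y) = s(x + y).

def num(n):
    """Encode the natural number n as the term s(s(...s(1)...))."""
    return '1' if n == 1 else f"s({num(n - 1)})"

def add_terms(x, y):
    """x, y are terms; apply the two recursive rules to form x + y."""
    if y == '1':
        return f"s({x})"             # x + 1 = s(x)
    inner = y[2:-1]                  # strip the outer s( ... )
    return f"s({add_terms(x, inner)})"  # x + s(y) = s(x + y)

print(add_terms(num(2), num(3)))     # s(s(s(s(1)))), i.e. 5
```

Note how each recursive call brings one "s" to the outside, exactly as in the hand computation of 2 + 3.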
A child, however, would see this as:
2 + 3 = 2 + (1 + 1 + 1) = (2 + 1 ) + (1 + 1) = 3 + (1 + 1) = (3 + 1) + 1 = 4 + 1 = 5, perhaps starting off at 2 fingers, and counting off: 3,4,5.
And that will bring you to the "next step" with adding, we need to show it doesn't matter how we "lump-sum", that is, addition is ASSOCIATIVE:
a + (b + c) = (a + b) + c.
This will prove handy, later on, because all those parentheses are annoying.
OK. Before even going into your post, I thank you for your patience; my abilities are probably about the mean, but my interest is exceptional.
I do hope and trust others are gaining by your perseverance as well.