1. Numbering Turing Programs

Does this procedure give an effective way of assigning unique Godel numbers to Turing programs?

Let $p_i$ denote the $i$th prime number. Let $Q = \{q_0,q_1,\dots\}$ be internal states, let $\{B,1\}$ denote tape symbols and let $\{L,R\}$ denote direction symbols. Assign a code number to each of these symbols as follows: $n(B) = p_1, n(1) = p_2, n(L) = p_3, n(R) = p_4, n(q_k) = p_{k+5}$.

Now suppose $L$ is a line of a Turing program $P$. Then $L = (q_i,s,q_j,s',X)$ where $s,s' \in \{B,1\}$ and $X \in \{L,R\}$. Assign a code number to $L$ as follows: $n(L) = p_1^{n(q_i)}p_2^{n(s)}p_3^{n(q_j)}p_4^{n(s')}p_5^{ n(X)}$

Lastly suppose that $P = \{L_1,...,L_k\}$ is a Turing program. Assign a code number to $P$ as follows: $n(P) = p_{n(L_1)}^{n(L_1)}\cdots p_{n(L_k)}^{n(L_k)}$.

2. Re: Numbering Turing Programs

I believe this coding does assign unique numbers to programs, but it can be made more economical. First, you can encode a program $\{L_1,\dots,L_n\}$ by $p_{n(L_1)}\cdots p_{n(L_k)}$ (this allows unambiguous recovering of $n(L_1),\dots,n(L_n)$). Also, instead of $n(B) = p_1$, $n(1) = p_2$, ..., $n(q_k) = p_{k+5}$ you can have $n(B) = 1$, $n(1) = 2$, ..., $n(q_k) = k+5$.

3. Re: Numbering Turing Programs

So when we are assigning a Godel numbering to the Turing program, the only important thing is that each program be assigned a unique natural number, correct? My text is really vauge on this point, so I was uncertain whether each symbol, formula, etc. needed a unique Godel number.

4. Re: Numbering Turing Programs

We need two facts: (1) each program is assigned a unique natural number and (2) we can determine the same properties given a program and given its number. The first fact is well-defined mathematically while the second is not. We have intuition about what a computer can find out about a Turing machine (TM): e.g., it can find the number of characters and instructions, run it for 100 steps to see if it stops, determine if it loops within the first 10^5 steps, etc. However, we don't have a mathematical definition of a computer yet; that's why we are working with TMs to begin with, and TMs work with numbers, or, rather, with two-character strings. Therefore, we have to encode a machine as such a string, but we want another machine to be able to discover all facts starting from this string code that our intuition says a computer is able to discover about the original TM. In other words, a TM as a sequence of instructions, i.e., tuples $(q_i,s,q_j,s',X)$, must contain the same information as its two-character encoding. This claim is not a theorem; rather, it is the idea behind the definition of the encoding. Note that (1) above is, in fact, a consequence of (2).

To get an example of a bad encoding, consider an injection $f:\mathbb{N}^5\to\mathbb{N}$ whose inverse is not computable and encode instructions $(q_i,s,q_j,s',X)$ as $f({n(q_i)},{n(s)},{n(q_j)},{n(s')},{n(X)})$. Then another TM would not be able to parse such instructions and, therefore, determine the run-time properties of the TM. So, we can extract fewer properties from the code than from the original TM. For another example, let $\mathop{\mbox{halt}}(P)=1$ if a TM $P$ stops on the empty tape and $\mathop{\mbox{halt}}(P)=0$ otherwise. Then let $n'(P)=p_1^{n(P)}p_2^{\mathop{\text{halt}}(P))}$ where $n(P)$ is your encoding. Now, we can get more information from encodings than from TMs: we can find out if they halt, which is impossible (halting problem).

Ideally, instead of TMs we should have a model of computation that works not with binary strings but with arbitrary datatypes (lists, trees, etc.). Then we could feed TMs as input to other TMs directly, without encoding through prime numbers and so on.