# Thread: Optimal algorithm for code cracking

1. ## Optimal algorithm for code cracking

Hi! I've thought of a problem concidering door codes. A door code consists typically of 4 digits, i.e. any number between 0000 and 9999. When I enter 5 digits, for instance 1 2 3 4 5, I test two codes if they match the correct code, namely 1 2 3 4 and 2 3 4 5. For every new digit I enter, I test the code for this digit combined with the three preceding digits. I've wondered what the optimal way of choosing my entered number sequence could be, namely the least total amount of numbers in the sequence that has to be used to test all 10000 combinations. If there is a way to construct the sequence so that every new guess is unique, the total amount of numbers in the sequence would be 10003. But using a simple brute force approach, namely entering 0000 0001 0002 etc. I guess several codes multiple times, entering a total amount of 40000 digits. So there has to logically be an optimal way of entering the sequence with a total amount of between 10003 and 40000 numbers. I have thought of the problem a little bit, but I haven't been able to draw any further conclusions. If anyone has an answer to this question, I won't use the algorithm for any bad purposes, I promise . I'm just interested from a mathematician's point of view.

Thanks in advance.

/Simon

2. A sliding window of length four.

abcd
.bcde
..cdef
...defg
....efgh

efgh is completely independent of abcd.
The sequences:
ABCD
EFGH
IJKL
MNOP
then the next number greater than ABCD or ABCD+1

0001 2501 5001 7501 . 0002 2502 5002 7502 . 0003

Just a wild random thought.
Actually, I never gave it much thought, and it shows.

Just print all the possible choices and use a computer
program to search the shortest path thorough it.
Djikstra (<- spelling?) comes to mind.

3. After several attempts -- NO EASY WAY.
All tries created some excess beyond the optimum 10003 digits.

The smallest sequence I could generate that contains all the
the numbers 0000 to 9999 is 10120 characters in length.

How can one find a specific 4 digit number in the 10000+ character string, without doing a sequential search? That is the wall, at the moment, that is stopping any advance.

Interesting question.

4. I've found that if there is a hypothetical optimal 10003 digit sequence, there can only be 3 different compositions of the digits in the first three and the last three digits combined. If the first 3 digits are of the form a,b,c where $\displaystyle a \neq b, b \neq c, a \neq c$ then the last 3 digits should be of the form a,b,c. And if the first 3 digits are of the form a,b,a, then the last 3 digits should be either c,b,c or b,b,b (again $\displaystyle a \neq b, b \neq c, a \neq c$). If the composition is:

a,b,c ... a,b,c or
a,b,a ... c,b,c

the numbers a, b and c should occur 999 times in the 9997 digit middle section, and the rest occurs 1000 times (3*999 + 7*1000 = 9997). If the composition is:

a,b,a ... b,b,b

the number a should occur 999 times and b 998 times, and the rest 1000 times (998 + 999 + 8*1000 = 9997). The composition b,b,b ... a,b,a is equivalent to this, since any optimal sequence could be reversed and still be optimal.

The proof for these conclusions is a bit messy, but it's not that hard to understand. Each number has to occur exactly 4000 times in different codes in total, and that's basically the only assumption you have to make. So if there is an optimal sequence of 10003 digits, these conclusion at least exclude some of them. I suppose there are several other different ways to exclude different 10003 digit sequences, but I can't think of any right now. I suppose an argument for the occurance of an optimal 10003 sequence could be to study a smaller system and show that there is an optimal sequence. For instance if you consider 2 digits codes constructed with 3 numbers, instead of 4 digit codes from 10 numbers. For this smaller system there is an optimal code, namely: 0 1 1 2 2 1 0 0 2 0, where every different two digit combination occurs only once.

5. Originally Posted by sitho
Hi! I've thought of a problem concidering door codes. A door code consists typically of 4 digits, i.e. any number between 0000 and 9999. When I enter 5 digits, for instance 1 2 3 4 5, I test two codes if they match the correct code, namely 1 2 3 4 and 2 3 4 5. For every new digit I enter, I test the code for this digit combined with the three preceding digits. I've wondered what the optimal way of choosing my entered number sequence could be, namely the least total amount of numbers in the sequence that has to be used to test all 10000 combinations. If there is a way to construct the sequence so that every new guess is unique, the total amount of numbers in the sequence would be 10003. But using a simple brute force approach, namely entering 0000 0001 0002 etc. I guess several codes multiple times, entering a total amount of 40000 digits. So there has to logically be an optimal way of entering the sequence with a total amount of between 10003 and 40000 numbers. I have thought of the problem a little bit, but I haven't been able to draw any further conclusions. If anyone has an answer to this question, I won't use the algorithm for any bad purposes, I promise . I'm just interested from a mathematician's point of view.

Thanks in advance.

/Simon
Hi Simon,

What you want is called a 10-ary de Bruijn sequence. See De Bruijn sequence - Wikipedia, the free encyclopedia. In the notation of the Wikipedia article, you want B(10,4). The length of the sequence is 10^4, but that's because it is cyclic. In order to get full coverage for your combination cracking you would need to wrap around when you hit the end of the sequence and enter the first 3 digits in the sequence for the second time; so you can crack the lock with 10,003 key presses.

You can find algorithms for generating de Bruijn sequences in "The Art of Computer Programming", Volume 4, Fascicle 2, "Generating All Tuples and Permutations", by Knuth.