Optimal algorithm for code cracking

• Feb 23rd 2009, 03:33 PM
sitho
Optimal algorithm for code cracking
Hi! I've thought of a problem concidering door codes. A door code consists typically of 4 digits, i.e. any number between 0000 and 9999. When I enter 5 digits, for instance 1 2 3 4 5, I test two codes if they match the correct code, namely 1 2 3 4 and 2 3 4 5. For every new digit I enter, I test the code for this digit combined with the three preceding digits. I've wondered what the optimal way of choosing my entered number sequence could be, namely the least total amount of numbers in the sequence that has to be used to test all 10000 combinations. If there is a way to construct the sequence so that every new guess is unique, the total amount of numbers in the sequence would be 10003. But using a simple brute force approach, namely entering 0000 0001 0002 etc. I guess several codes multiple times, entering a total amount of 40000 digits. So there has to logically be an optimal way of entering the sequence with a total amount of between 10003 and 40000 numbers. I have thought of the problem a little bit, but I haven't been able to draw any further conclusions. If anyone has an answer to this question, I won't use the algorithm for any bad purposes, I promise (Angel). I'm just interested from a mathematician's point of view.

/Simon
• Feb 23rd 2009, 10:39 PM
aidan
A sliding window of length four.

abcd
.bcde
..cdef
...defg
....efgh

efgh is completely independent of abcd.
The sequences:
ABCD
EFGH
IJKL
MNOP
then the next number greater than ABCD or ABCD+1

0001 2501 5001 7501 . 0002 2502 5002 7502 . 0003

Just a wild random thought.
Actually, I never gave it much thought, and it shows.

Just print all the possible choices and use a computer
program to search the shortest path thorough it.
Djikstra (<- spelling?) comes to mind.
• Feb 24th 2009, 07:25 PM
aidan
After several attempts -- NO EASY WAY.
All tries created some excess beyond the optimum 10003 digits.

The smallest sequence I could generate that contains all the
the numbers 0000 to 9999 is 10120 characters in length.

How can one find a specific 4 digit number in the 10000+ character string, without doing a sequential search? That is the wall, at the moment, that is stopping any advance.

Interesting question.
• Feb 25th 2009, 05:01 PM
sitho
I've found that if there is a hypothetical optimal 10003 digit sequence, there can only be 3 different compositions of the digits in the first three and the last three digits combined. If the first 3 digits are of the form a,b,c where $a \neq b, b \neq c, a \neq c$ then the last 3 digits should be of the form a,b,c. And if the first 3 digits are of the form a,b,a, then the last 3 digits should be either c,b,c or b,b,b (again $a \neq b, b \neq c, a \neq c$). If the composition is:

a,b,c ... a,b,c or
a,b,a ... c,b,c

the numbers a, b and c should occur 999 times in the 9997 digit middle section, and the rest occurs 1000 times (3*999 + 7*1000 = 9997). If the composition is:

a,b,a ... b,b,b

the number a should occur 999 times and b 998 times, and the rest 1000 times (998 + 999 + 8*1000 = 9997). The composition b,b,b ... a,b,a is equivalent to this, since any optimal sequence could be reversed and still be optimal.

The proof for these conclusions is a bit messy, but it's not that hard to understand. Each number has to occur exactly 4000 times in different codes in total, and that's basically the only assumption you have to make. So if there is an optimal sequence of 10003 digits, these conclusion at least exclude some of them. I suppose there are several other different ways to exclude different 10003 digit sequences, but I can't think of any right now. I suppose an argument for the occurance of an optimal 10003 sequence could be to study a smaller system and show that there is an optimal sequence. For instance if you consider 2 digits codes constructed with 3 numbers, instead of 4 digit codes from 10 numbers. For this smaller system there is an optimal code, namely: 0 1 1 2 2 1 0 0 2 0, where every different two digit combination occurs only once.
• Feb 26th 2009, 01:43 PM
awkward
Quote:

Originally Posted by sitho
Hi! I've thought of a problem concidering door codes. A door code consists typically of 4 digits, i.e. any number between 0000 and 9999. When I enter 5 digits, for instance 1 2 3 4 5, I test two codes if they match the correct code, namely 1 2 3 4 and 2 3 4 5. For every new digit I enter, I test the code for this digit combined with the three preceding digits. I've wondered what the optimal way of choosing my entered number sequence could be, namely the least total amount of numbers in the sequence that has to be used to test all 10000 combinations. If there is a way to construct the sequence so that every new guess is unique, the total amount of numbers in the sequence would be 10003. But using a simple brute force approach, namely entering 0000 0001 0002 etc. I guess several codes multiple times, entering a total amount of 40000 digits. So there has to logically be an optimal way of entering the sequence with a total amount of between 10003 and 40000 numbers. I have thought of the problem a little bit, but I haven't been able to draw any further conclusions. If anyone has an answer to this question, I won't use the algorithm for any bad purposes, I promise (Angel). I'm just interested from a mathematician's point of view.