1. ## Combinatorics and DNA

A DNA code is a sequence of the symbols A,G,C,T (called bases).

a) How many possible such sequences are there of length 20 containing 5 of each base?

b) How many such sequences (containing 5 of each base) are there if we can't distinguish a sequence from its mirror image?

c) What is the number of DNA sequences of length 10 which don't contain two consecutive bases which are the same (this means: no AA,GG,CC,TT in the sequence)?

I've been trying these 3 problems for a while and any suggestions on how to go about doing any of them would be greatly appreciated.

2. Hello, FatherMike!

A DNA code is a sequence of the symbols $A,G,C,T$ (called bases).

(a) How many possible such sequences are there of length 20 containing 5 of each base?

There are $\frac{20!}{5!\,5!\,5!\,5!} \;=\; 11,732,745,024$ possible sequences.
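If you want to check that number by machine, here is a minimal Python sketch. The helper name `multinomial` is my own choice for illustration, not something from the problem; it just divides $20!$ by each $5!$ in turn.

```python
from math import factorial

def multinomial(counts):
    """Number of distinct arrangements of a multiset with the given counts."""
    total = factorial(sum(counts))
    for c in counts:
        # Each division is exact, so integer division loses nothing.
        total //= factorial(c)
    return total

print(multinomial([5, 5, 5, 5]))  # 11732745024
```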

(b) How many such sequences (containing 5 of each base) are there if we can't distinguish a sequence from its mirror image?

The answer is half the answer of part (a): $5,866,372,512$

Note: I was worried about palindromes, which are identical to their own mirror image and so would not pair up with a different sequence.

Then I realized that, with an odd number of each base, there are no palindromes.

(c) What is the number of DNA sequences of length 10 which do not contain two consecutive bases which are the same? (No AA, GG, CC, TT in the sequence.)

The first letter can be any of the 4 bases: 4 choices.

The second must not be the first letter: 3 choices.

The third must not be the second letter: 3 choices.

The fourth must not be the third letter: 3 choices.

... and so on ...

Answer: $4 \times 3^9 \;=\; 78,732$
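The choice-by-choice argument above can be sanity-checked with a short Python sketch that compares the formula against a brute-force count for small lengths. The function names here are hypothetical, picked only for this illustration.

```python
from itertools import product

def count_no_repeats_formula(n, alphabet_size=4):
    """First base: alphabet_size choices; every later base: one fewer."""
    if n == 0:
        return 1
    return alphabet_size * (alphabet_size - 1) ** (n - 1)

def count_no_repeats_brute(n, alphabet="AGCT"):
    """Enumerate all sequences and keep those with no equal neighbours."""
    return sum(
        all(a != b for a, b in zip(seq, seq[1:]))
        for seq in product(alphabet, repeat=n)
    )

# Brute force is only practical for short sequences.
for n in range(1, 7):
    assert count_no_repeats_brute(n) == count_no_repeats_formula(n)

print(count_no_repeats_formula(10))  # 78732
```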

3. Hey, I was interested in the solution to this question. However, I do not really see why there can't be palindromes when you have an odd number of each base. Can you give me an example?

Thanks ^_^

4. Assume you have only the bases A and G and want to find how many sequences of length 4 contain 2 of each base.
You have AAGG, AGAG, AGGA, GAAG, GAGA, GGAA:
$\frac{4!}{2!2!} = 6$ possibilities.

Since there is an even number of each base, palindromes exist: AGGA and GAAG. So the number of sequences in the sense of your question (b) is 4 in this case (not half of all the possibilities, which would be 3).

Let's look at the odd case.
Assume you have only the bases A and G and want to find how many sequences of length 6 contain 3 of each base.
You have AAAGGG, AAGAGG, AAGGAG, AAGGGA, AGAAGG, AGAGAG, AGAGGA, AGGAAG, AGGAGA, AGGGAA, GAAAGG, GAAGAG, GAAGGA, GAGAAG, GAGAGA, GAGGAA, GGAAAG, GGAAGA, GGAGAA, GGGAAA:
$\frac{6!}{3!3!} = 20$ possibilities.

There are no palindromes because there is always one extra base, either A or G, in the first or second half of the sequence, so the two halves can never mirror each other. So the number of sequences in the sense of your question (b) is 10 in this case (half of the possibilities).
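If you would rather not list these by hand, the two small cases above can be reproduced with a brute-force Python sketch. The function names are hypothetical helpers for this illustration, and the approach only works for short sequences because it goes through every permutation.

```python
from itertools import permutations

def arrangements(bases):
    """All distinct orderings of the given string of bases."""
    return sorted({"".join(p) for p in permutations(bases)})

def palindromes(bases):
    """The arrangements that read the same in both directions."""
    return [s for s in arrangements(bases) if s == s[::-1]]

def classes_up_to_reversal(bases):
    """One representative per sequence/mirror-image pair."""
    return sorted({min(s, s[::-1]) for s in arrangements(bases)})

print(len(arrangements("AAGG")))            # 6
print(palindromes("AAGG"))                  # ['AGGA', 'GAAG']
print(classes_up_to_reversal("AAGG"))       # ['AAGG', 'AGAG', 'AGGA', 'GAAG']

print(len(arrangements("AAAGGG")))          # 20
print(palindromes("AAAGGG"))                # []
print(len(classes_up_to_reversal("AAAGGG")))  # 10
```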

5. Ah thank you, but let's say there are palindromes, is it possible to calculate how many there are? Or do you need to write them all out?

6. You can calculate without writing them all. I think these examples will help.

Consider the case where the length of the sequence is even. Let's take a sequence of length 4 with 2 each of A and G.
Total possibilities = $\frac{4!}{2!2!} = 6$

To count the palindromes, you only need the bases that make up the first half, in this case one each of A and G, because the second half is then forced to be the mirror image of the first. So every possible permutation of A,G generates a palindrome, and you just have to count the arrangements of this shorter sequence of length 2:

$\frac{2!}{1!1!} = 2$ (note: AGGA, GAAG are the palindromes)

Finally, you have $\frac{1}{2}(\frac{4!}{2!2!} + \frac{2!}{1!1!}) = 4$

Here are the sequences in the sense of your question (b), for length 4, using 2 each of A and G:
AAGG, AGAG, AGGA, GAAG

Now for the case of odd-length sequences. The method is the same. Consider a sequence of length 5, using 2 each of the bases A and G and one T.
Total possibilities = $\frac{5!}{2!2!1!} = 30$

Here, the contributors to the palindromes are one each of A and G. T is neutral, since every possible palindrome has T in the middle. So you have to count the arrangements of this shorter sequence of length 2:
$\frac{2!}{1!1!} = 2$ (note: AGTGA, GATAG are the palindromes)

Finally, you have $\frac{1}{2}(\frac{5!}{2!2!1!} + \frac{2!}{1!1!}) = 16$

Here are the sequences in the sense of your question (b), for length 5, using 2 each of A and G and one T:
AAGGT, AAGTG, AATGG, AGAGT, AGATG, AGGAT, AGGTA, AGTGA, AGTAG, ATGAG, ATAGG, GAAGT, GAATG, GAGAT, GATAG, GGAAT

In summary, add the number of possible palindromes to the total number of possible sequences and divide the sum by 2.

Just try this for AAATT, AAAAGGT and AAAAGGGT, and you'll get the idea. Be sure to check whether a palindrome can be formed at all.
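The rule from the summary can also be written as a small Python sketch that works from the base counts alone, without listing anything. The names `multinomial`, `palindrome_count` and `count_up_to_reversal` are my own, chosen for illustration. The palindrome check assumes what the examples above show: a palindrome is possible only if at most one base occurs an odd number of times (that base then sits in the middle), and the first half uses half of each count.

```python
from math import factorial

def multinomial(counts):
    """Number of distinct arrangements of a multiset with the given counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def palindrome_count(counts):
    """Palindromes are fixed by their first half; at most one odd count allowed."""
    if sum(c % 2 for c in counts) > 1:
        return 0
    return multinomial([c // 2 for c in counts])

def count_up_to_reversal(counts):
    """(all sequences + palindromes) / 2, as in the summary."""
    return (multinomial(counts) + palindrome_count(counts)) // 2

print(count_up_to_reversal([2, 2]))        # 4
print(count_up_to_reversal([2, 2, 1]))     # 16
print(count_up_to_reversal([3, 3]))        # 10
print(count_up_to_reversal([5, 5, 5, 5]))  # 5866372512
```

The last line reproduces the answer to the original part (b), where no palindromes exist and the count is exactly half of part (a).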