The issue of how best to implement Markov Chains piqued my interest, so here's a little script I crashed out off the top of my head. I spent about 5 minutes or so writing it, so don't expect the cleanest code, but hopefully it illustrates the point (I didn't use nucleotide sequences, I just invented a random sequence of X, Y and Z):
import random
# Define the chain.
chain = {
"XX": {"X": 8, "Y": 1, "Z": 1},
"XY": {"X": 1, "Y": 2, "Z": 5},
"XZ": {"X": 1, "Y": 3, "Z": 2},
"YX": {"X": 2, "Y": 4, "Z": 5},
"YY": {"X": 4, "Y": 1, "Z": 4},
"YZ": {"X": 3, "Y": 2, "Z": 1},
"ZX": {"X": 2, "Y": 2, "Z": 4},
"ZY": {"X": 1, "Y": 4, "Z": 2},
"ZZ": {"X": 4, "Y": 4, "Z": 1}
}
def chain_gen(chain, start, length):
prev = start[-2:]
for i in start:
yield i
for i in xrange(length):
try:
choice = random.randint(1, sum(chain[prev].values()))
cumulative = 0
for item, chance in sorted(chain[prev].items()):
cumulative += chance
if choice <= cumulative:
yield item
prev = prev[-1] + item
break
except KeyError:
break
print "".join(chain_gen(chain, "ZZ", 40))
The chain is stored as a dictionary mapping the previous two characters to another dictionary mapping the next character to the relative chance of that character being chosen. These are relative to the other choices in the dictionary, so if the chances are 1, 3 and 2 then the corresponding probabilities will be 1 in 6, 3 in 6 and 2 in 6.