
So far we’ve only used genes as data elements that are applied to an external problem. Our next project, equation generation, will introduce us to a new way of using genes called genetic programming. The essence of genetic programming is to overlay some kind of grammar on the genes, which may or may not include embedded data. These operation-genes can then be evaluated to produce a result.
When using genetic programming it is important to understand the characteristics of both the embedded data, if any, and how we expect the operations to interact with each other and/or the environment. For example, let’s say we are asked to find some combination of the numbers 1 to 7 and addition and subtraction operations that, when evaluated, produces the numeric value 29. There are many ways to solve this problem. For example: 7+7+7+7+7-6
We could simply use positive and negative integers between -7 and 7 because they essentially have the addition and subtraction operations built in, but we need to learn how to evaluate operations as independent genes, so we’ll make '+' and '-' separate tokens. That means the equation has 11 tokens or genes:
7 + 7 + 7 + 7 + 7 - 6
| | | | | | | | | | |
1 2 3 4 5 6 7 8 9 10 11
When viewed this way we can easily see the pattern, or grammar, of alternating operations and numbers. This should be easy to enforce using an odd/even check when creating and mutating chromosomes.
Next, if we visualize of the equation from the point of view of the operations:
It is easy to see that each operation has 2 parameters because each acts upon 2 other genes which can be either a number or another operation.
The heart of a genetic programming algorithm is the evaluate
function. It knows how to treat the problem-specific gene sequence like a program. The evaluate
function for this project will apply the operation genes to the neighboring numbers, rolling up to get the final result.
def evaluate(genes):
result = genes[0]
for i in range(1, len(genes), 2):
operation = genes[i]
nextValue = genes[i + 1]
if operation == '+':
result += nextValue
elif operation == '-':
result -= nextValue
return result
The function starts by initializing result
with the value of the first numeric gene. It then applies the operations, left to right, to the current result
and the value-gene that follows the operation. The output of each operation is stored back into result
.
We’re ready to write the rest of the genetic algorithm according to the usual pattern, starting with the gene set.
import unittest
import datetime
import genetic
import random
class EquationGenerationTests(unittest.TestCase):
def test(self):
numbers = [1, 2, 3, 4, 5, 6, 7]
operations = ['+', '-']
Now we can immediately see that we’ll need a custom create
function, for two reasons. First, we don’t know how many symbols we’ll need to produce a particular result, and second, we need to alternate numbers and operations.
def create(numbers, operations, minNumbers, maxNumbers):
genes = [random.choice(numbers)]
count = random.randint(minNumbers, 1 + maxNumbers)
while count > 1:
count -= 1
genes.append(random.choice(operations))
genes.append(random.choice(numbers))
return genes
This implementation prevents the creation of gene sequences that have multiple numbers or operations in a row, so we don’t have to detect that situation in the fitness function.
We also need a custom mutate
function that knows how to append an operation-number pair to the gene sequence,
def mutate(genes, numbers, operations, minNumbers, maxNumbers):
numberCount = (1 + len(genes)) / 2
adding = numberCount < maxNumbers and random.randint(0, 100) == 0
if adding:
genes.append(random.choice(operations))
genes.append(random.choice(numbers))
return
remove an operation-number pair,
removing = numberCount > minNumbers and random.randint(0, 20) == 0
if removing:
index = random.randrange(0, len(genes) - 1)
del genes[index]
del genes[index]
return
and mutate an operation or number.
index = random.randrange(0, len(genes))
genes[index] = random.choice(operations) \
if (index & 1) == 1 else random.choice(numbers)
Here we use the parity of the index to determine whether we’re replacing a number or an operation.
In display
we can simply write the genes separated by a space to see the equation.
def display(candidate, startTime):
timeDiff = datetime.datetime.now() - startTime
print("{}\t{}\t{}".format(
(' '.join(map(str, [i for i in candidate.Genes]))),
candidate.Fitness,
timeDiff))
6 + 3 + 3 - 4 + 6 - 3 - 6 - 6 - 3 - 1 -5 0:00:00.001003
That just leaves the fitness function where we’ll use the evaluate
function to get the result of the equation, then compare it to the expected result. This time, instead of using a Fitness
class we’ll use a 2-stage fitness function. The first stage tracks how far the result of evaluating the genes is from the expected total. If the result is correct then the second fitness range is used. This allows us to prefer shorter gene sequences, meaning equations with fewer operations.
def get_fitness(genes, expectedTotal):
result = evaluate(genes)
if result != expectedTotal:
fitness = expectedTotal - abs(result - expectedTotal)
else:
fitness = 1000 - len(genes)
return fitness
The following full test harness should hold no surprises.
def test(self):
numbers = [1, 2, 3, 4, 5, 6, 7]
operations = ['+', '-']
expectedTotal = 29
optimalLengthSolution = [7, '+', 7, '+', 7, '+', 7, '+', 7, '-', 6]
minNumbers = (1 + len(optimalLengthSolution)) / 2
maxNumbers = 6 * minNumbers
startTime = datetime.datetime.now()
def fnDisplay(candidate):
display(candidate, startTime)
def fnGetFitness(genes):
return get_fitness(genes, expectedTotal)
def fnCreate():
return create(numbers, operations, minNumbers, maxNumbers)
def fnMutate(child):
mutate(child, numbers, operations, minNumbers, maxNumbers)
optimalFitness = fnGetFitness(optimalLengthSolution)
best = genetic.get_best(fnGetFitness, None, optimalFitness, None,
fnDisplay, fnMutate, fnCreate, maxAge=50)
self.assertTrue(not optimalFitness > best.Fitness)
When we run the test it finds an equivalent solution.
5 + 3 + 2 + 4 + 6 + 2 + 1 23 0:00:00
5 + 3 + 2 + 4 + 6 + 6 + 1 27 0:00:00
5 + 3 + 3 + 4 + 6 + 6 + 1 28 0:00:00
6 + 3 + 3 + 4 + 6 + 6 + 1 987 0:00:00
6 + 5 + 3 + 3 + 6 + 6 989 0:00:00.000998
Nice. But having to add 6 three times and 3 two times is boring. If we introduce a multiplication operation it might find 7 * 3 + 4 * 2, which saves an operation, or even 7 * 4 + 1, which saves two. Let’s try it.
First we have to add the multiplication token (*) to the list of operation genes.
def test(self):
...
operations = ['+', '-', '*']
Next we have to implement multiplication in evaluate
. The problem is that for the math to be correct we have to perform all multiplication operations before we perform any addition or subtraction. No problem, we can just group the operations by priority.
def test(self):
...
operations = ['+', '-', '*']
prioritizedOperations = [['*'], ['+', '-']]
...
def fnEvaluate(genes):
return evaluate(genes, prioritizedOperations)
Then we’ll evaluate all operations in priority order rather than simultaneously, which means we’ll iterate over the array once for each priority group.
def evaluate(genes, prioritizedOperations):
equation = genes[:]
for operationSet in prioritizedOperations:
iOffset = 0
for i in range(1, len(equation), 2):
i += iOffset
opToken = equation[i]
if opToken in operationSet:
leftOperand = equation[i - 1]
rightOperand = equation[i + 1]
if opToken == '+':
leftOperand += rightOperand
elif opToken == '-':
leftOperand -= rightOperand
elif opToken == '*':
leftOperand *= rightOperand
equation[i - 1] = leftOperand
del equation[i + 1]
del equation[i]
iOffset -= 2
return equation[0]
In the implementation above we make a copy of the genes so we can modify it each time through the loop. Each operation stores its output in the array location to its left then deletes itself and its second operand from the array. After all operations have been evaluated the result is in array index zero.
Note, since we changed the function to pass in the list of operations, we also need to change what we pass to get_fitness
def get_fitness(genes, expectedTotal, fnEvaluate):
result = fnEvaluate(genes)
...
and update the function call in the test harness.
def test(self):
...
def fnGetFitness(genes):
return get_fitness(genes, expectedTotal, fnEvaluate)
This code finds a solution, but not a minimal length solution, because the defined optimalLengthSolution
in the test is based on addition and subtraction so the engine stops as soon as it beats that solution.
...
4 - 3 * 3 - 7 * 2 + 7 + 6 + 5 * 7 983 0:00:00.103305
5 + 2 - 7 * 2 + 6 + 5 + 5 * 5 985 0:00:00.115337
5 - 4 - 3 + 6 + 5 + 4 * 5 987 0:00:00.124361
5 - 4 - 3 + 6 + 5 * 5 989 0:00:00.128372
solve
To fix that we need to extract a solve
function so we can pass in the list of operations with the optimal solution for that set of operations.
def test_addition(self):
operations = ['+', '-']
prioritizedOperations = [['+', '-']]
optimalLengthSolution = [7, '+', 7, '+', 7, '+', 7, '+', 7, '-', 6]
self.solve(operations, prioritizedOperations, optimalLengthSolution)
def solve(self, operations, prioritizedOperations,
optimalLengthSolution):
numbers = [1, 2, 3, 4, 5, 6, 7]
expectedTotal = evaluate(optimalLengthSolution,
prioritizedOperations)
Now we can add a test for multiplication, and give it a more difficult equation to generate.
def test_multiplication(self):
operations = ['+', '-', '*']
prioritizedOperations = [['*'], ['+', '-']]
optimalLengthSolution = [6, '*', 3, '*', 3, '*', 6, '-', 7]
self.solve(operations, prioritizedOperations, optimalLengthSolution)
When we run that test it too can find an optimal length solution:
....
7 * 5 * 6 - 3 + 7 * 2 * 7 + 4 * 3 983 0:00:00.109255
7 * 7 * 7 - 4 * 4 - 6 - 5 + 1 985 0:00:00.141350
7 * 7 * 7 - 7 * 2 - 3 * 4 987 0:00:00.150363
5 * 7 * 7 + 6 * 2 * 6 989 0:00:00.158386
5 * 4 * 4 * 4 - 3 991 0:00:02.141662
Cool. When we review that solution, however, it too seems repetitive. It would be much nicer if we could reduce the equation to 5 * 43 - 3. Let’s do that.
But first, have you noticed that each time we add a new operation we have to change evaluate
? That’s an indication of a poor design that violates
the Open-Closed Principal
We can fix that by extracting the operation implementations to separate functions:
def add(a, b):
return a + b
def subtract(a, b):
return a - b
def multiply(a, b):
return a * b
Then make the prioritized operation list contain dictionaries where each dictionary key is the token for an operation and the value is a function that implements the operation.
def test_addition(self):
operations = ['+', '-']
prioritizedOperations = [{'+': add,
'-': subtract}]
...
def test_multiplication(self):
operations = ['+', '-', '*']
prioritizedOperations = [{'*': multiply},
{'+': add,
'-': subtract}]
...
Finally, use the operation function in evaluate
.
def evaluate(genes, prioritizedOperations):
equation = genes[:]
for operationSet in prioritizedOperations:
iOffset = 0
for i in range(1, len(equation), 2):
i += iOffset
opToken = equation[i]
if opToken in operationSet:
leftOperand = equation[i - 1]
rightOperand = equation[i + 1]
equation[i - 1] = operationSet[opToken](leftOperand,
rightOperand)
del equation[i + 1]
del equation[i]
iOffset += -2
return equation[0]
Now we can easily add support for a new operation, like exponentiation, without changing any other code:
def test_exponent(self):
operations = ['^', '+', '-', '*']
prioritizedOperations = [{'^': lambda a, b: a ** b},
{'*': multiply},
{'+': add,
'-': subtract}]
optimalLengthSolution = [6, '^', 3, '*', 2, '-', 5]
self.solve(operations, prioritizedOperations, optimalLengthSolution)
...
4 * 3 ^ 4 - 3 - 1 - 7 + 4 * 2 * 7 * 2 + 2 979 0:00:00.030079
4 * 3 ^ 4 - 2 - 7 + 4 * 2 * 7 * 2 983 0:00:00.035093
6 * 7 * 5 * 2 + 5 + 3 + 3 - 4 985 0:00:00.070215
6 * 6 * 6 * 2 + 1 - 6 989 0:00:00.083221
7 * 5 * 6 * 2 + 7 991 0:00:00.108286
6 ^ 3 * 2 - 5 993 0:00:13.129462
Since it is starting to take tens of seconds to find the solution, let’s add a loop to mutate
so that it can make a random number of changes instead of just one.
def mutate(genes, numbers, operations, minNumbers, maxNumbers, fnGetFitness):
count = random.randint(1, 10)
initialFitness = fnGetFitness(genes)
while count > 0:
count -= 1
if fnGetFitness(genes) > initialFitness:
return ①
numberCount = (1 + len(genes)) / 2
...
...
genes.append(random.choice(numbers))
continue ②
removing = numberCount > minNumbers and random.randint(0, 20) == 0
...
...
del genes[index]
continue ③
index = random.randrange(0, len(genes))
...
Now that we’re checking the fitness in mutate
, we also need to pass in fnGetFitness
:
def solve(self, operations, prioritizedOperations,
optimalLengthSolution):
...
def fnMutate(child):
mutate(child, numbers, operations, minNumbers, maxNumbers,
fnGetFitness)
...
As expected, it finds the solution much faster.
2 + 3 ^ 4 * 5 + 3 * 6 425 0:00:00.014006
3 + 3 ^ 4 * 5 + 3 * 6 426 0:00:00.016041
4 + 3 ^ 4 * 5 + 3 * 6 989 0:00:00.024067
6 * 2 * 5 * 7 + 7 991 0:00:00.067148
2 * 6 ^ 3 - 5 993 0:00:00.567508
Fantastic!
We’ll benchmark this project with the exponent test.
def test_benchmark(self):
genetic.Benchmark.run(self.test_exponent)
Benchmark average (seconds) | standard deviation ---------------------------------------- 0.49 | 0.36
This chapter was our first introduction to genetic programming and it opened up several really interesting problems for exploration. We also learned one way to work with both data and operations in a gene sequence. This gave us experience with symbols and grammar and "program" evaluation.
The final code for this chapter is available from:
https://github.com/handcraftsman/GeneticAlgorithmsWithPython/tree/master/ch14