
In this chapter we will solve the Card Problem. We start off with the set of cards, Ace and 2-10, which we separate into 2 groups of 5 cards. The cards in each group have different constraints. The card values in one group must have a product of 360. The card values in the other group must sum to 36. We can only use each card once. First try it by hand with a deck of cards.
For this project our genotype and phenotype can be the same, integers. Thus we can avoid an encoding/decoding step.
import unittest
import datetime
import genetic
class CardTests(unittest.TestCase):
def test(self):
geneset = [i + 1 for i in range(10)]
This keeps fitness calculation relatively easy because we can sum one range of genes and multiply the rest. In this project we have 3 values to include in the fitness. One is the sum of the numbers in the first group. Another is the product of the numbers in the second group. The last is a count of the duplicated numbers in the list; we don’t want any duplicates.
import operator
import functools
...
def get_fitness(genes):
group1Sum = sum(genes[0:5])
group2Product = functools.reduce(operator.mul, genes[5:10])
duplicateCount = (len(genes) - len(set(genes)))
return Fitness(group1Sum, group2Product, duplicateCount)
Once again we’ll use a Fitness
class.
class Fitness:
def __init__(self, group1Sum, group2Product, duplicateCount):
self.Group1Sum = group1Sum
self.Group2Product = group2Product
sumDifference = abs(36 - group1Sum)
productDifference = abs(360 - group2Product)
self.TotalDifference = sumDifference + productDifference
self.DuplicateCount = duplicateCount
When comparing two fitnesses, we’ll prefer gene sequences with fewer duplicates, and when those are the same, we’ll prefer the one whose sum and product values are closer to the optimal values.
class Fitness:
...
def __gt__(self, other):
if self.DuplicateCount != other.DuplicateCount:
return self.DuplicateCount < other.DuplicateCount
return self.TotalDifference < other.TotalDifference
In display
we’ll separate the two groups visually with a dash.
def display(candidate, startTime):
timeDiff = datetime.datetime.now() - startTime
print("{} - {}\t{}\t{}".format(
', '.join(map(str, candidate.Genes[0:5])),
', '.join(map(str, candidate.Genes[5:10])),
candidate.Fitness,
timeDiff))
We need to add a __str__
function to Fitness
:
class Fitness:
...
def __str__(self):
return "sum: {} prod: {} dups: {}".format(
self.Group1Sum,
self.Group2Product,
self.DuplicateCount)
Here’s the full test harness:
def test(self):
geneset = [i + 1 for i in range(10)]
startTime = datetime.datetime.now()
def fnDisplay(candidate):
display(candidate, startTime)
def fnGetFitness(genes):
return get_fitness(genes)
optimalFitness = Fitness(36, 360, 0)
best = genetic.get_best(fnGetFitness, 10, optimalFitness, geneset,
fnDisplay)
self.assertTrue(not optimalFitness > best.Fitness)
Now we’re ready to try it.
2, 3, 1, 6, 4 - 8, 5, 9, 7, 10 sum: 16, prod: 25200, dups: 0 0:00:00
We can see that mutate
quickly eliminates the duplicate values, but the algorithm almost always gets stuck immediately thereafter. The reason is, in order to make progress it has to be able to change 2 numbers. But, it can only change 1 at a time and the priority of eliminating duplicates keeps it from moving forward.
We need to find a way to let the engine make two changes at once, ideally without introducing duplication in the process. This means changing the way mutation works. The most flexible way to do that is to introduce an optional parameter called custom_mutate
into get_best
that allows us to perform the mutation ourselves. Then we replace the current definition of fnMutate
with a different one depending on whether or not custom_mutate
is provided.
def get_best(get_fitness, targetLen, optimalFitness, geneSet, display,
custom_mutate=None):
if custom_mutate is None:
def fnMutate(parent):
return _mutate(parent, geneSet, get_fitness)
else:
def fnMutate(parent):
return _mutate_custom(parent, custom_mutate, get_fitness)
We’ll use a variant of _mutate
instead of the built-in implementation when custom_mutate
is used.
def _mutate_custom(parent, custom_mutate, get_fitness):
childGenes = parent.Genes[:]
custom_mutate(childGenes)
fitness = get_fitness(childGenes)
return Chromosome(childGenes, fitness)
Now back in our genetic algorithm we can add a mutate
function that changes 1 random gene if there are duplicates, but otherwise swaps 2 genes.
import random
...
def mutate(genes, geneset):
if len(genes) == len(set(genes)):
indexA, indexB = random.sample(range(len(genes)), 2)
genes[indexA], genes[indexB] = genes[indexB], genes[indexA]
else:
indexA = random.randrange(0, len(genes))
indexB = random.randrange(0, len(geneset))
genes[indexA] = geneset[indexB]
Then we pass that function to get_best
in the test harness.
def test(self):
...
def fnMutate(genes): ①
mutate(genes, geneset)
optimalFitness = Fitness(36, 360, 0)
best = genetic.get_best(fnGetFitness, 10, optimalFitness, geneset,
fnDisplay, custom_mutate=fnMutate) ②
self.assertTrue(not optimalFitness > best.Fitness)
Now when we run the test, it can find the solution about 20 percent of the time. The rest of the time it still gets stuck.
8, 10, 7, 6, 4 - 9, 3, 2, 1, 5 sum: 35, prod: 270, dups: 0 0:00:00
10, 5, 6, 4, 8 - 3, 9, 2, 1, 7 sum: 33, prod: 378, dups: 0 0:00:00.001033
When we compare the last line in these results with the optimal solution, we can see that the test needs to be able to swap 2 elements on the left side with 2 on the right.
We can solve that by looping the swap portion of mutate
a random number of times.
def mutate(genes, geneset):
if len(genes) == len(set(genes)):
count = random.randint(1, 4)
while count > 0:
count -= 1
indexA, indexB = random.sample(range(len(genes)), 2)
genes[indexA], genes[indexB] = genes[indexB], genes[indexA]
else:
indexA = random.randrange(0, len(genes))
indexB = random.randrange(0, len(geneset))
genes[indexA] = geneset[indexB]
We’ll see this pattern again in other chapters.
Now, when we run the test it finds the optimal solution every time.
7, 4, 6, 10, 9 - 8, 3, 5, 1, 2 sum: 36, prod: 240, dups: 0 0:00:00
7, 4, 9, 8, 6 - 1, 3, 5, 2, 10 sum: 34, prod: 300, dups: 0 0:00:00
9, 8, 3, 7, 6 - 5, 4, 1, 2, 10 sum: 33, prod: 400, dups: 0 0:00:00.002000
9, 10, 3, 7, 6 - 5, 4, 1, 2, 8 sum: 35, prod: 320, dups: 0 0:00:00.002000
9, 7, 10, 5, 3 - 6, 1, 2, 8, 4 sum: 34, prod: 384, dups: 0 0:00:00.003000
5, 8, 10, 3, 9 - 4, 1, 7, 2, 6 sum: 35, prod: 336, dups: 0 0:00:00.003000
3, 6, 10, 7, 8 - 9, 2, 5, 1, 4 sum: 34, prod: 360, dups: 0 0:00:00.015001
7, 9, 2, 8, 10 - 6, 3, 1, 5, 4 sum: 36, prod: 360, dups: 0 0:00:00.023001
In all previous projects the only ability we’ve had to guide the algorithm towards a solution has been the fitness. We’ve seen that the fitness alone can usually be used to work around structural constraints in the genes. In this project we started using mutation to work around structural constraints as well, but custom mutation is much more powerful than that. It can also take advantage of problem-specific knowledge to narrow the search scope toward better solutions, away from unworkable ones, or both. When we start using mutation that way we’re no longer building a simple genetic algorithm, we’re building a memetic algorithm.
Memetic algorithms are capable of solving a wider range of problems than random-population-based genetic algorithms because they accelerate the search. Take some time and try to use custom mutation to improve the speed of previous project solutions.
Your mutate function for the 8 Queens Puzzle can now take advantage of problem-specific knowledge such as: no 2 queens will be in the same row or column. So, like this project’s mutate function, if there is a duplicate row index, change a random row index, otherwise swap 2 row indexes. The same can be done for the column indexes.
In the Password and One Max projects you might try changing a random number of genes. Another option is to create an empty dictionary of index-to-known in the test harness and pass it to your mutate function. Then each time you make a random change, check the fitness and compare it with the initial fitness from when the mutate function was called. If the random change makes the fitness worse, then you’ll know that the previous value at that index was the correct one. Add it to the dictionary and only change that location if it doesn’t match what is in the dictionary. There’s a lot you could do with this concept, so experiment.
The speed of the Sorted Numbers project could be improved in several ways. For example, we could increment all the values in the final run of the array by 1 if possible, or decrement all the values in the first run by 1 if possible, or change 1 value on both sides of a discontinuity to bring the lines together faster. You likely will think of more.
Also, in Graph Coloring you could take advantage of the adjacency rules to choose a compatible color for a particular gene index.
def test_benchmark(self):
genetic.Benchmark.run(lambda: self.test())
Because we changed the genetic
module we’ll update all of the benchmarks.
Updated Benchmarks project | average (seconds) | standard deviation ----------------------------------------------------------- Guess Password | 1.23 | 0.28 One Max | 1.24 | 0.17 Sorted Numbers | 1.15 | 0.66 Queens | 1.45 | 1.07 Graph Coloring | 0.79 | 0.41 Cards | 0.01 | 0.01
In this chapter we added the very useful ability to take over mutation from the engine. We also learned about memetic algorithms and how the custom mutation function can be used to take advantage of problem-specific knowledge. Finally, we used the sum-of-difference technique in the fitness calculation. This is a technique commonly used in solving numerical problems.
The final code for this chapter is available from:
https://github.com/handcraftsman/GeneticAlgorithmsWithPython/tree/master/ch06