
In this chapter we will try a type of problem known as graph coloring. Variations involve using the fewest number of colors while making each node a unique color, trying to use an equal number of each color, etc. By experimenting we can see that a shape surrounded by an even number of neighbors can generally be filled with only 3 colors.
While a shape surrounded by an odd number of neighbors requires 4 colors.
We will to try to use only 4 colors to color a map of the United States with the constraint that no adjacent states have the same color. To do this by hand, color any state then use alternating colors on its neighboring states, and only introduce a new color when necessary.
Now when we do this in code we don’t care about the visual representation, just the physical relationships between the states, which we can encode as a graph, or more simply as a set of rules indicating which states cannot have the same color.
We’ll start off with a file containing a list of states and the set of states that are adjacent to each. Each row of the file contains the information for one state. There are 2 fields on each line, separated by a comma. The first field contains a unique code for the state. The second field contains a semicolon-separated list of the codes for its neighboring states. I used 2 columns in the table below to keep the list together.
AK, | MT,ID;ND;SD;WY AL,FL;GA;MS;TN | NC,GA;SC;TN;VA AR,LA;MO;MS;OK;TN;TX | ND,MN;MT;SD AZ,CA;NM;NV;UT | NE,CO;IA;KS;MO;SD;WY CA,AZ;NV;OR | NH,MA;ME;VT CO,KS;NE;NM;OK;UT;WY | NJ,DE;NY;PA CT,MA;NY;RI | NM,AZ;CO;OK;TX DC,MD;VA | NV,AZ;CA;ID;OR;UT DE,MD;NJ;PA | NY,CT;MA;NJ;PA;VT FL,AL;GA | OH,IN;KY;MI;PA;WV GA,AL;FL;NC;SC;TN | OK,AR;CO;KS;MO;NM;TX HI, | OR,CA;ID;NV;WA IA,IL;MN;MO;NE;SD;WI | PA,DE;MD;NJ;NY;OH;WV ID,MT;NV;OR;UT;WA;WY | RI,CT;MA IL,IA;IN;KY;MO;WI | SC,GA;NC IN,IL;KY;MI;OH | SD,IA;MN;MT;ND;NE;WY KS,CO;MO;NE;OK | TN,AL;AR;GA;KY;MO;MS;NC;VA KY,IL;IN;MO;OH;TN;VA;WV | TX,AR;LA;NM;OK LA,AR;MS;TX | UT,AZ;CO;ID;NV;WY MA,CT;NH;NY;RI;VT | VA,DC;KY;MD;NC;TN;WV MD,DC;DE;PA;VA;WV | VT,MA;NH;NY ME,NH | WA,ID;OR MI,IN;OH;WI | WI,IA;IL;MI;MN MN,IA;ND;SD;WI | WV,KY;MD;OH;PA;VA MO,AR;IA;IL;KS;KY;NE;OK;TN | WY,CO;ID;MT;NE;SD;UT MS,AL;AR;LA;TN
We’ll use the csv
module to read the file and split the lines on comma. Then we’ll manually split the adjacent state list on semicolon and link the state and its list of adjacent states in a key-value table (dict
).
import csv
def load_data(localFileName):
""" expects: AA,BB;CC where BB and CC are the
initial column values in other rows
"""
with open(localFileName, mode='r') as infile:
reader = csv.reader(infile)
lookup = {row[0]: row[1].split(';') for row in reader if row}
return lookup
Now that we’ve read the data we need to build the rules. A Rule
connects two states indicating that they are adjacent. When we create the rule we always sort the state
and adjacent
codes alphabetically. This makes it possible to eliminate duplicates.
class Rule:
def __init__(self, node, adjacent):
if node < adjacent:
node, adjacent = adjacent, node
self.Node = node
self.Adjacent = adjacent
def __eq__(self, other):
return self.Node == other.Node and self.Adjacent == other.Adjacent
def __hash__(self):
return hash(self.Node) * 397 ^ hash(self.Adjacent)
We may also want to be able to display a rule so we’ll add a str
implementation.
def __str__(self):
return self.Node + " -> " + self.Adjacent
Next we will build the set of Rules
. While we’re doing so we will perform a sanity check on the data as follows: Whenever a key state says it is adjacent to another state, the adjacent state’s rules should also say the key state is adjacent to it. We do that by keeping a count of the number of times we’ve seen each state pair. We should see each twice, so if we see it only once we know we have a data problem.
def build_rules(items):
rulesAdded = {}
for state, adjacent in items.items():
for adjacentState in adjacent:
if adjacentState == '':
continue
rule = Rule(state, adjacentState)
if rule in rulesAdded:
rulesAdded[rule] += 1
else:
rulesAdded[rule] = 1
for k, v in rulesAdded.items():
if v != 2:
print("rule {} is not bidirectional".format(k))
return rulesAdded.keys()
We now have the ability to convert a file of node relationships to a set of adjacency Rules
.
Next we will build the code used by the genetic engine.
import unittest
import datetime
import genetic
...
class GraphColoringTests(unittest.TestCase):
def test(self):
We’ll start by loading the states from the file and building the rules. Since the expected optimal situation will be that all adjacent states have different colors, we can set the optimal value to the number of rules. Our Chromosome
will have 50 genes, one for each state in alphabetical order. This lets us use the index into its genes as an index into a list of sorted state codes.
def test(self):
states = load_data("adjacent_states.csv")
rules = build_rules(states)
optimalValue = len(rules)
stateIndexLookup = {key: index
for index, key in enumerate(sorted(states))}
...
Next, since we want to four-color the 50 states, our genotype can be four color codes (the first letter of each color name).
...
colors = ["Orange", "Yellow", "Green", "Blue"]
colorLookup = {color[0]: color for color in colors}
geneset = list(colorLookup.keys())
...
Now add the usual function definitions and call get_best
...
startTime = datetime.datetime.now()
def fnDisplay(candidate):
display(candidate, startTime)
def fnGetFitness(genes):
return get_fitness(genes, rules, stateIndexLookup)
best = genetic.get_best(fnGetFitness, len(states), optimalValue,
geneset, fnDisplay)
self.assertTrue(not optimalValue > best.Fitness)
...
And at the end we can write out the color of each state.
...
keys = sorted(states.keys())
for index in range(len(states)):
print(keys[index] + " is " + colorLookup[best.Genes[index]])
As for display
, it should be sufficient to output the genes because they are also color codes.
def display(candidate, startTime):
timeDiff = datetime.datetime.now() - startTime
print("{}\t{}\t{}".format(
''.join(map(str, candidate.Genes)),
candidate.Fitness,
timeDiff))
This results in output like the following. The number to the right of the gene sequence will indicate how many rules this gene sequence satisfies.
YGGBOOGOOBBYGGYYYYGBGYOOGBOYGGOOOYBOYBBGGOBYOGOGOGG 74 0:00:00.001000
Finally, we need a fitness function that counts how many constraints the gene sequence passed.
def get_fitness(genes, rules, stateIndexLookup):
rulesThatPass = sum(1 for rule in rules
if rule.IsValid(genes, stateIndexLookup))
return rulesThatPass
This uses a new function in Rule
:
class Rule:
...
def IsValid(self, genes, nodeIndexLookup):
index = nodeIndexLookup[self.Node]
adjacentStateIndex = nodeIndexLookup[self.Adjacent]
return genes[index] != genes[adjacentStateIndex]
Now when we run our main test
function we get output like the following:
...
GOGOYYGOBGYOYYOOOGOBYOGOYYGOBBYOGGYYBBGOBYBYBGOOBOO 103 0:00:00.014000
YOGOYYGOBGYYGYBOOGOBYGGOYYGOBBYOGGYYBBGOBYBYBYOOBOO 105 0:00:00.017001
YOGOYYGOBGYYGYBOOGOBYGGOYYGOBBYOGGYYBBGOBYBYBYOOYBO 106 0:00:00.017001
GOGOYYOGYBYGGYBOOGOBBBBOYYGGBBYOGGYYBBGYOYBYBYOOYOO 107 0:00:00.040002
AK is Green
AL is Orange
AR is Green
AZ is Orange
CA is Yellow
CO is Yellow
CT is Orange
DC is Green
DE is Yellow
FL is Blue
GA is Yellow
HI is Green
IA is Green
ID is Yellow
IL is Blue
IN is Orange
KS is Orange
KY is Green
LA is Orange
MA is Blue
MD is Blue
ME is Blue
MI is Blue
MN is Orange
MO is Yellow
MS is Yellow
MT is Green
NC is Green
ND is Blue
NE is Blue
NH is Yellow
NJ is Orange
NM is Green
NV is Green
NY is Yellow
OH is Yellow
OK is Blue
OR is Blue
PA is Green
RI is Yellow
SC is Orange
SD is Yellow
TN is Blue
TX is Yellow
UT is Blue
VA is Yellow
VT is Orange
WA is Orange
WI is Yellow
WV is Orange
WY is Orange
To benchmark this project we’re going to use R100_1gb, one of a set of graph coloring problem sets available from https://turing.cs.hbg.psu.edu/txn131/graphcoloring.html The file format is line based, with each line prefixed with its type. The lines we’re interested in start with e
, for edge, followed by the IDs of the two nodes connected by that edge, as follows:
e 16 10
If we change the format of our adjacent state file to the same format, then we’ll be able to reuse a lot of code. We’ll start by adding a utility test-function that reads our current file and outputs the contents in the new format.
@staticmethod
def test_convert_file():
states = load_data("adjacent_states.csv")
output = []
nodeCount = edgeCount = 0
for state, adjacents in states.items():
nodeCount += 1
for adjacent in adjacents:
if adjacent == '':
output.append("n {} 0".format(state))
else:
output.append("e {} {}".format(state, adjacent))
edgeCount += 1
with open('./adjacent_states.col', mode='w+') as outfile:
print("p edge {} {}".format(nodeCount, edgeCount), file=outfile)
for line in sorted(output):
print(line, file=outfile)
When we run this test we end up with a file whose content looks like the following:
p edge 51 214
e AL FL
e AL GA
...
e WY UT
n AK 0
n HI 0
Alaska and Hawaii have no edges, so the only way we know about them is from their (n)ode records.
The utility function and import csv
can be removed after the file is converted.
Next we need to update load_data
to read the new file format.
def load_data(localFileName):
""" expects: T D1 [D2 ... DN]
where T is the record type
and D1 .. DN are record-type appropriate data elements
"""
rules = set()
nodes = set()
with open(localFileName, mode='r') as infile:
content = infile.read().splitlines()
for row in content:
if row[0] == 'e': # e aa bb, aa and bb are node ids
nodeIds = row.split(' ')[1:3]
rules.add(Rule(nodeIds[0], nodeIds[1]))
nodes.add(nodeIds[0])
nodes.add(nodeIds[1])
continue
if row[0] == 'n': # n aa ww, aa is a node id, ww is a weight
nodeIds = row.split(' ')
nodes.add(nodeIds[1])
return rules, nodes
We’ll also take this opportunity to rename test
to color
and make the file name and color list into parameters. Plus we’ll also add a test_states
function that uses the color
function.
def test_states(self):
self.color("adjacent_states.col",
["Orange", "Yellow", "Green", "Blue"])
def color(self, file, colors):
rules, nodes = load_data(file)
...
And remove the following line from the color
function since the list of colors is now a parameter.
colors = ["Orange", "Yellow", "Green", "Blue"]
The next change to the color
function affects how we get the node indexes.
rules, nodes = load_data(file)
optimalValue = len(rules)
colorLookup = {color[0]: color for color in colors}
geneset = list(colorLookup.keys())
startTime = datetime.datetime.now()
nodeIndexLookup = {key: index
for index, key in enumerate(sorted(nodes))} ①
def fnDisplay(candidate):
display(candidate, startTime)
def fnGetFitness(genes):
return get_fitness(genes, rules, nodeIndexLookup) ②
Finally, we have to update the way we write out the results at the end of the test
function:
best = genetic.get_best(fnGetFitness, len(nodes), optimalValue,
geneset, fnDisplay)
self.assertTrue(not optimalValue > best.Fitness)
keys = sorted(nodes) ③
for index in range(len(nodes)): ④
print(keys[index] + " is " + colorLookup[best.Genes[index]])
Be sure to run the state test to verify that it still works.
Next we can add a test for R100_1gb as follows:
def test_R100_1gb(self):
self.color("R100_1gb.col",
["Red", "Orange", "Yellow", "Green", "Blue", "Indigo"])
When we run the test it is able to find a way to color the graph using 6 colors relatively quickly. It can also find a solution using 5 colors but that takes longer than we want to spend on a benchmark, so we’ll stay with 6 colors. We’ll use this test as our benchmark for graph coloring.
def test_benchmark(self):
genetic.Benchmark.run(lambda: self.test_R100_1gb())
average (seconds) | standard deviation ---------------------------------------- 0.74 | 0.42
In this chapter we read from a file and built constraints that can be applied to each candidate to determine its fitness. This was also the first project where we used test data from a standard set. Reading about others' solutions to the same standard problem is a good way to learn advanced techniques and to get inspiration for improving your fitness function and genotype.
The state files and final code for this chapter are available from:
https://github.com/handcraftsman/GeneticAlgorithmsWithPython/tree/master/ch05