Breadth-First Searching

What’s next? We can call one_step "lead" to get a list of next steps, and call one_step on any of the new words and so on. But we’re after the shortest path between two words, so we need a controlled way of exploring how steps are related in order to find such a path. The obvious thing is to try all one-step paths, then all two-step paths, then three, etc.

This kind of process is called a “breadth-first search” (BFS). I’ll show you a version of BFS that is phrased in terms of exploring a “state space.” This framework also makes it easier to experiment with other kinds of searching, like best-first search. The parent-to-list-of-children relationship suggests a tree, so let’s program it this way. (More generally, it could be a graph too, but let’s keep it simple for now.)

First, we need a type. Haskell’s standard library contains a module Data.Tree[14] that provides a suitable type and some useful related functionality.

 -- contained in the Data.Tree module
 data Tree a
  = Node { rootLabel :: a, -- label value
  subForest :: Forest a -- zero or more child trees
  }
 type Forest a = [Tree a]

This is a tree with only one kind of node, and the node holds a value or label plus a list of zero or more child trees. The line type Forest a = [Tree a] defines a type synonym, providing a possibly more meaningful name for some type expression; i.e., a forest is a list of trees. The tree is (parametrically) polymorphic, of course. Concrete values look like this:

 eg1 = Node "foo" [Node "bar" []]
 eg2 = Node True [Node False [Node True []], Node True []]

The library contains drawTree :: Tree String -> String, which shows a simple text picture of the tree, so we can run putStr $ drawTree eg1 right away. How do we convert eg2 to something showable? Well, we want to convert all of the Boolean values to strings, without changing the tree structure, and this sounds like a mapping operation. To cut a long story short, the Tree library implements the Functor interface too, which allows use of the overloaded fmap function for mapping over container types (like lists or trees), hence we can call putStr $ drawTree $ fmap show eg2.

Next, we’ll create the state space as a tree, where each node contains a state plus the trees that are reachable from that state. Notice the type first and what it describes: from a state-generating function and an initial state, we can produce the tree. It’s polymorphic too, because the tree-building process doesn’t care what’s in the tree. The implementation is straightforward: start a new tree at the current state and then collect the state space trees produced by all of the children. When a state has no children, there’s nothing to recurse on, hence no subtrees—just an empty list in the node.

 import Data.Tree -- put at top of file
 
 state_tree :: (s -> [s]) -> s -> Tree s
 state_tree children_of state
  = Node state [ state_tree children_of s | s <- children_of state ]

Now you would not run this code directly in a conventional setting, but it is fine in a lazy language because we will only need to generate enough of the tree to satisfy what the caller code requires—even if the tree’s depth was not finite.

Still, if we tried drawTree on an infinite tree, it would just keep on going or maybe run out of memory eventually. OK, let’s define another function to help us get sensible output: prune n t returns the structure from t down to a depth of n and discards anything deeper than level n.

There are two main cases, a depth of zero (or below) and a positive depth. When we hit zero, then truncate the tree by dropping its children. Otherwise, build a new tree based on the pruned child trees.

 prune :: Int -> Tree a -> Tree a
 prune d (Node x cs)
  | d <= 0 = Node x []
  | otherwise = Node x [ prune (d - 1) c | c <- cs ]

With this, code like putStr $ drawTree $ fmap show $ prune 3 $ state_tree (\s -> [s+1, s+1]) 0 now generates something useful. (Nothing significant here—it’s just a test value. The state generator here just says, from each state—which happens to be a number—it has two child states of that number with one added, hence it generates a tree that shows the depth at each level and has a fan-out of two children at each step.)

Now for BFS, we need to go through the state tree in breadth-first order. To keep things general—in other words, free from any application-specific details—we are going to write a traversal function (to visit all of the states in order) and later filter the resulting state list for goal states. The traversal function will have the type bft :: Tree a -> [a]. It’s a bit trickier, but the technique here is to get a second function (here called bft_) to handle a list of trees at a time, then the main function calls it with the original tree in a singleton list.

 bft :: Tree a -> [a]
 bft t = bft_ [t]
 bft_ :: [Tree a] -> [a]
 bft_ trees = [ x | Node x _ <- trees ]
  ++ bft_ (concat $ map subForest trees)

You can think of bft_ as taking the next list of trees, skimming off the root values of the trees, and adding on the result of traversing the various child trees. The child traversal works by taking all of the children, flattening the lists of children to a flat list, and then running bft_ on this new list. If it helps, draw the picture and check how the code mirrors an informal level-by-level sweep.

The last step is to identify when we have found “goal” states—states that solve the required problem. We can do this by passing in a predicate of type a -> Bool and using it when filtering the output of bft. Putting this all together, we get the following.

 breadth_first_search :: (s -> Bool) -- goal test
  -> (s -> [s]) -- state generator
  -> s -- initial state
  -> [s] -- solutions out
 breadth_first_search is_goal children_of
  = filter is_goal
  . bft
  . state_tree children_of

To recap: we build the state tree, then do a breadth-first traversal of it to get a list of states to try, then return the ones that correspond to solved goals.

That’s our solution. Now compare this to implementations in your favorite language. Is this more modular? More reusable?