Functional Programming: A PragPub Anthology

One Step at a Time

All right, we’re looking to construct paths, and paths contain steps, so how about starting with the step operation? We want this operation to take a word and return all of the valid words that are reachable in one step from that word. For example, from “lead” we want to get the list

	["bead","dead","head","mead","read","load","lend","lewd",
	"leaf","leak","leal","lean","leap","lear","leas"]

and we want it to exclude non-words like “xead.”

A key idea in functional programming is to split big problems into smaller ones. We use it all the time. So how can we apply it here?

At some point we probably need to use the dictionary to filter out invalid words like “xead,” but does the filtering need to be tied into the word generation? Let’s try splitting the problem right at this point. In fact, this is a common way to split problems in functional programming as well as elsewhere: via the “generate and test” pattern, where one component suggests results and the other one discards invalid ones. So let’s try creating one component to generate a list of candidate words, and another to filter out the bad candidates.

Thinking About Types

A quick thought about types first: our basic step needs to have the type String -> [String], or from a word to a list of words. You might be itching to use more complex types after Chapter 13, Functional Thinking and Haskell, but let’s keep it simple for now. What do we have so far? Let’s write it down.

	one_step :: String -> [String]
	one_step = filter is_valid_word . generate_candidates

Of course, this won’t compile because some code is missing (aka, we have at least one failing test).

What does this code mean? The definition uses function composition (the dot) and says directly that the one_step operation is a composition (or combination) of generating candidates and filtering them. Recall that f . g is defined in the Haskell Prelude as \x -> f (g x), so the composition is a new function (the \x -> ... bit) that applies f to the result of g x. Programming wise, we’re building a bigger transformation from two smaller ones.

You can read function compositions in any direction—whatever makes more sense or another version that you are happier with. For example, right to left as a series of steps (generate the candidates, then filter them) or left to right for a more “transformational” view (filtering the results from the candidate list). Notice that we don’t need to mention any parameters here: practically because the definition of function composition handles that for us, and philosophically because we want to say that the big operation is a composition of two others and that’s all.

I’m going to leave word filtering for a bit, so I will just define the following and move on to the other missing code.

is_valid_word = error "Need to finish"

You can see what that line means, but I should pause briefly to explain it a bit, because it reveals something about programming in Haskell. The function error :: String -> a takes a string and causes a runtime exception when the expression is evaluated. That is, if the program ever tried to use is_valid_word, then there would be a runtime error whose message included the "Need to finish" string. This function is useful for stubbing out unfinished code, and for signaling serious problems at runtime. (It is a bit like throw for exceptions, and indeed, some exception support is built into Haskell—though not often used because we have other, softer alternatives.)

Note the type of error: it takes a string and the result type is a, but there’s absolutely no constraint on a. In other words, it really does mean anything. This is sensible, because it allows us to flag errors anywhere; for example, we can use it where a list of Ints is expected because the type variable matches anything. Plus, if you think about it, there’s no sensible operation that could have this type, so it kind of has to be this error functionality!

Telling It Like It Is

As long as we’re paused, notice that I used the expression filter is_valid_word rather than some other single (yet to be defined) function name. There are some important points about language use here.

First, I’m making the positive choice that I want to use filtering on the candidate list, because I know what data structures are in play at the time, and have decided that filtering is going to be appropriate. So instead of a simple decomposition into two stages, I’ve gone a bit further on the particular detail.

Second, I’m using the language to say this directly rather than making up a new function name and adding another definition of that name. Compare these expressions: map (\x -> 2 + x) [1..10] vs. map add_two [1..10]. Haskell is flexible enough that quite often the code itself is the clearest explanation of the programmer’s intent, and adding in extra definitions can make the code harder to follow. In this case, you would have to check elsewhere in the code to see what add_two actually does, whereas 2 + x is immediately obvious. Of course, there are cases where such new definitions can be justified—but Haskell gives you the choice. (You can be even more terse, like map (2+) [1..10].)

Third, what does the expression filter is_valid_word actually mean? Conceptually, it’s taking the general idea of filtering a list with some predicate and then fixing the predicate to be a certain test: the result is an operation that picks out the valid words from a list. The important point here is that in Haskell, this kind of expression is meaningful and hence a key part of the coding style. Let’s consider a simpler example.

	foo = filter (\x -> reverse x == x)
	bar = foo ["ada", "fred", "bob"]

The first line names a combination of filter and a test for a list being a palindrome. (Haskell strings are a list of characters, though this test for palindromes will work with a list of anything that has an equality test.) The combination can then be used with a list of words, as in the second line.

You Don’t Have to Like Curry

What’s the type of foo? Well, filter :: (a -> Bool) -> [a] -> [a]; in other words, it takes a predicate (takes some a value, returns a Bool) and a list of such a values and returns a list of the same. For the type of \x -> reverse x == x, we have to do some type inference, but it comes out as Eq a => [a] -> Bool; in other words, testing a list of a values as long as we can do equality tests on a values themselves. Putting these together—which means matching the first parameter of filter with the predicate—we get filter (\x -> reverse x == x) :: Eq a => [[a]] -> [[a]], hence converting a list of a values into a similar list (assuming equality is known).

A nested list of something might be a bit abstract if you are new to it, but it’s fine to consider concrete examples, so in the context of bar, it’s going to take and return a list of strings.

Some people like to use the word “currying” at this point. I don’t. I think it’s a bit of a misleading distraction that adds little. Conceptually (and theoretically; i.e., the lambda calculus), all functions in Haskell take one argument at a time, and either return a new function or a concrete value, and it’s important to understand this. How filter behaved in the preceding code is a direct consequence of this.

Pragmatically, we often define and use multi-argument functions as if they were just like their equivalents in (say) Ruby, but underneath, Haskell is still using the preceding mechanism. The compilers will optimize such multi-argument functions so they behave efficiently, so we don’t have to worry about how the functions are actually implemented.