Chapter 4. Patterns and Rule-Based Programming

You are an obsession I cannot sleep I am your possession Unopened at your feet There’s no balance No equality Be still I will not accept defeat

I will have you Yes, I will have you I will find a way and I will have you Like a butterfly A wild butterly I will collect you and capture you

In Chapter 2, I argue that the functional style of programming is the preferred way to solve problems in Mathematica. Although functions form much of the brawn, pattern matching provides the brains. In fact, functions and patterns should be thought of as partners rather than competitors. By mastering both functional programming and pattern-based programming, you will be able to use Mathematica to its fullest potential. In fact, once you get the hang of pattern-based solutions they may become a bit of an obsession.

If you have done any programming that involves text manipulation, you have no doubt been exposed to regular expressions, a concise syntax for describing patterns in text and manipulating text. Mathematica’s pattern syntax generalizes regular expressions to the domain of symbolic processing, which allows you to manipulate arbitrary symbolic structures. Patterns and rules are at the foundation of Mathematica’s symbolic processing capabilities. Symbolic integration, differentiation, equation solving, and simplification are all driven by the pattern primitives explained in this chapter.

In the context of Mathematica, a pattern is an expression that acts as a template against which other expressions can be matched. Some of the most useful patterns contain variables that are bound to values as a result of the matching process. However, many times just knowing that a pattern matched is sufficient. Patterns are central to specifying constraints in function arguments (e.g., Integer). They also play roles in parsing, replacing, and counting, as we show in the recipes here. I defer the role of patterns in string manipulation to Chapter 5.

Rules build on patterns by specifying a mapping from a pattern to another expression that uses all or parts of the matched results. Rules pervade Mathematica, as you will see in this chapter’s recipes and throughout this book. It’s safe to say that Mathematica would be almost as crippled by the removal of rules as it would be by the removal of the definition for Plus.

The rest of this introduction gives a brief overview of the most important primitives associated with pattern matching. This will make the recipes a bit easier to follow if you are new to these concepts. The recipes will explore the primitives more deeply, and as usual, you should refer to the Mathematica documentation for subtle details or clarification.

Cases will work with any expression, not just lists. However, you need to keep in mind that Mathematica will rearrange the expression before the pattern is applied.

In[41]:=  Cases[x + y - z^2 + z^3 + x^5, _^_]
Out[41]=  {x5, z3}

You may have expected z^2 or -z^2 to be selected; examining the FullForm of the expression will reveal why it was not. FullForm is your friend when it comes to debugging pattern matching because that is the form that Mathematica sees.

   In[42]:=  x + y - z^2 + z^3 + x^5 // FullForm
Out[42]//FullForm=
             Plus[x, Power[x, 5], y, Times[-1, Power[z, 2]], Power[z, 3]]

Providing a level specification will allow you to reach down deeper. Level specifications are discussed in 3.9 Manipulating Deeply Nested Lists Using Functions with Level Specifications.

In[43]:=  Cases[x + y - z^2 + z^3 + x^5, _^_, 2]
Out[43]=  {x5, z2, z3}

You can also limit the number of matches using an optional fourth argument.

In[44]:= Cases [x + y - z^2 + z^3 + x^5, _^_, 2, 1]
Out[44]= {x5}

Take into account the attributes Flat and Orderless when pattern matching. Flat means nested expressions like Plus[a,Plus[b,c]] will be flattened; Orderless means the operation is communicative, and Mathematica will account for this when pattern matching.

In[45]:= Attributes[Plus]
Out[45]= {Flat, Listable, NumericFunction, OneIdentity, Orderless, Protected}

Here we select every expression that contains b +, no matter its level or order in the input expression.

In[46]:= Cases[{a + b, a + c, b + a, a^2 + b, Plus[a, Plus[b, c]]}, b + _]
Out[46]= {a + b, a + b, a2 + b, a + b + c}

Hold will suppress transformations due to Flat and Orderless, but the pattern itself is still reordered from b + a to a + b. In 4.8 Preventing Evaluation Until Replace Is Complete we show how to prevent this using HoldPattern.

In[47]:= Cases[Hold[a + b, a + c, b + a, a^2 + b, Plus[a, Plus[b, c]]], b + a]
Out[47]= {a + b}

An alternative to Cases is the combination of Position and Extract. Here Position locates the items, and Extract returns them. This variation would be more helpful than Cases, for example, if you needed to know the positions as well as the items, since Cases does not provide positional information. By default, Position will search every level, but you can restrict it with a levelspec as I do here.

In[48]:=  list = {1, 1.2, "test", 3, {2}, x +1};
          positions = Position[list, _Integer, {1}];
          Extract[list, positions]
Out[50]=  {1, 3}

One useful application of this idiom is matching on one list and extracting from a parallel list.

In[51]:=  names = {"Jane", "Jim", "Jeff", "Jessie", "Jezebel"};
          ages = {30, 20, 42, 16, 69} ;
          Extract[names, Position[ages, x_ /; x >30]]
Out[53]=  {Jeff, Jezebel}

Most of the variations supported by Cases discussed in 4.1 Collecting Items That Match (or Don’t Match) a Pattern apply to DeleteCases as well. In fact, given the existence of Except, one could argue that DeleteCases is redundant. However, given the context of the problem, usually either Cases or DeleteCases will be easier to understand compared to using pattern inversions. Also, Except has some limitations since pattern variables like x_ can’t appear inside of an Except.

Use levelspecs to constrain deletions to particular portions of an expression tree. Here is an expression that is nine levels deep.

Discussion

You can delete roots at level four.

Discussion

You can also delete roots at levels up to four.

Discussion

Or, you delete roots at every level.

Discussion

Just as Extract plus Position is the equivalent of Cases (discussed in 4.1 Collecting Items That Match (or Don’t Match) a Pattern), Delete plus Position is the equivalent for DeleteCases. Again, remember that Position looks at all levels unless you restrict it.

Discussion

This leads to a way to get the results of Cases and DeleteCases without executing the pattern match twice.

Discussion

Chapter 3 covers list manipulation in detail, including the use of Part.

The hasPath and transitiveClosure functions share a common property. They are implemented by repeated transformation of the input until some goal state is achieved. The search terminates when there are no more available transformations, as determined by FixedPoint. TransitiveClosure uses the final state as the result, whereas hasPath makes one more match using MemberQ to see if the goal was reached.

Although rule-driven algorithms tend to be small, they are not always the most efficient. HasPath finds all paths from the start node before making a determination.

The hasPath2 implementation here uses Catch-Throw to exit as soon as the solution is found.

Discussion

The main components of this solution are:

If you have trouble following one of these solutions, Mathematica will show its work if you use FixedPointList. For example, here is the expansion of the steps in hasPath.

Discussion
Discussion

This shows each step in the transition from the original graph to the one with all intermediate steps filled in. Try to work out how the rule took each line to the next line. Only by working through examples like this will you begin to master the concepts.

Sometimes applying the debugging techniques in the solution can still leave you stumped. Here is an example that one would expect to terminate based on the fact that NumberQ[Infinity] is false.

Discussion

In situations like this, you should try applying FullForm to the output to see what Mathematica sees rather than what it shows you.

  In[126]:=  FullForm[%]
Out[126]//FullForm=
             List[F[DirectedInfinity[
                F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                      F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[F[
                              DirectedInfinity[F[DirectedInfinity[
                                 F[DirectedInfinity[1]]]]]]]]]]]]]]]]]]]],
             a, F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                    F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                           F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                                  F[DirectedInfinity[1]]]]]]]]]]]]]]]]]]]],
             b, F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                    F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                           F[DirectedInfinity[F[DirectedInfinity[F[DirectedInfinity[
                                  F[DirectedInfinity[1]]]]]]]]]]]]]]]]]]]], c]

Do you see the problem? It is near the end of the output. If you can’t see it, consider this.

  In[127]:=  FullForm[Infinity]
Out[127]//FullForm=
             DirectedInfinity[1]

The full form of Infinity contains the integer 1, which is being matched and replaced with F[DirectedInfinity[l]] and so on, ad infinitum. In this simple case, ReplaceRepeated was not needed because ReplaceAll would do the trick. If ReplaceRepeated is necessary, break the process into two steps, first using a proxy for the construct that has the hidden representation that is messing you up. Here I use Inf instead of Infinity.

Discussion

Chapter 2 discusses Hold in more detail.

If the data you need to query resides in a database, it makes more sense to let that database do the query work before the data is imported into Mathematica. If this is not the case, Mathematica can easily do the job, even for rather sophisticated queries. Here are some simple examples with SQL equivalents.

Find all pairs of cities where a supplier in the first city has inventory on a part in the second city.

Discussion

In this case, ReplaceRepeated can be used to implement GROUP BY. The idea is to continually search for pairs of items that match on the grouping criteria and combine them according to some aggregation method, in this case the sum of qty. Since each replacement removes an inventory item, we are guaranteed to terminate when all items are unique. A final ReplaceAll is used to extract the relevant information. The use of Null in the replacement rule is just for aesthetics, conveying that when you aggregate two inventory records you no longer have a valid record for a particular supplier.

Discussion

Suppose you want the names of suppliers who have inventory in the part P1. This involves integrating information from S and INV. This can be done as a join, but in SQL it can also be done via a subquery. You can emulate that using rules. Here MemberQ implements the semantics of the SQL IN.

Discussion

In the examples given, I have demonstrated queries for which the data is in relational form. One feature of relational form is that it is normalized so that each column can hold only atomic data. However, Mathematica is not a relational database, so data can appear in just about any form with any level of nesting. This is no problem because patterns are much more flexible than SQL. Still, I find it easier to put data in a tabular form before trying to extract information and relationships with other collections of data. Let’s consider an example that is more in the Mathematica domain.

GraphData and PolyhedronData are two extensive data sources that are bundled with Mathematica 6 and later versions. The relationship between these data sources is that each polyhedron has an associated graph. In PolyhedronData, the property that ties the two sources together is called SkeletonGraph. In database jargon, SkeletonGraph is a foreign key to GraphData, and thus, allows us to investigate relationships between polyhedra and their associated graphs. For this example, I want to consider all graphs that are both Eulerian and Hamiltonian with their associated polyhedron being Archimedean. (An Archimedean solid is a highly symmetric, semiregular, convex polyhedron composed of two or more types of regular polygons meeting in identical vertices.)

Discussion

It’s often a good idea to see how many results you received.

Discussion

There are exactly 4 cases out of 13 Archimedean polyhedra that meet the criteria of having both Eulerian and Hamiltonian graphs.

Discussion
Discussion

You might find more intuitive ways to solve this problem, but the solution given emphasizes pattern matching. You could also use Intersection with an appropriate SameTest, as shown here. The r @@@ serves only to put the result in the same form as we used previously and is not strictly needed.

In[165]:=  results = r @@@
              Intersection[Archimedean, Graphs, SameTest ->(#1[[3]] == #2[[1]] &)];