The Recursive Nature of M4

Now let's consider the recursive nature of the M4 input stream. Whenever a name token matches a macro definition and is expanded, the expansion text is pushed back onto the input stream for complete reprocessing. This recursive reprocessing continues as long as the input stream contains macro calls that generate text.

For example:

$ m4
define(`macro', `expansion')dnl
macro ``quoted' text'
expansion `quoted' text
<ctrl-d>$

Here, I define a macro called macro, and then present this macro name on the input stream, followed by additional text, some of which is quoted, and some of which is double quoted.

The process used by M4 to parse this example is shown in Figure 10-1.

Figure 10-1. The procedure used by M4 to process an input text stream

In the bottom line of the figure, M4 is generating a stream of output text (expansion `quoted' text) from a stream of input text (macro ``quoted' text').

The diagram above this line shows how M4 actually generates the output text from the input text. When the first token (macro) is read in the top line, M4 finds a matching symbol in the symbol table, pushes that symbol's expansion text onto the front of the input stream on the second line, and then restarts the input stream. Thus, the very next token read is another name token (expansion). Since this name is not found in the symbol table, the text is sent directly to the output stream. The third line sends the next token from the input stream (a space character) directly to the output stream. Finally, in the fourth line, one level of quotes is removed from the quoted text (``quoted' text'), and the result (`quoted' text) is sent to the output stream.
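If it helps to see this procedure as code, here is a minimal sketch of it in Python. This is a toy model, not GNU M4: the names expand, read_quoted, and max_expansions are my own inventions for illustration, the symbol table is just a dictionary filled in ahead of time, and macro arguments, dnl, and comments are ignored entirely.

```python
import re

# A name token: a letter or underscore followed by letters, digits, underscores.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def read_quoted(text):
    """Given text that starts with a left quote, return (inner, rest).

    inner is the quoted string with exactly one layer of quotes removed;
    nested quotes inside it are left intact.
    """
    depth = 0
    for i, ch in enumerate(text):
        if ch == "`":
            depth += 1
        elif ch == "'":
            depth -= 1
            if depth == 0:
                return text[1:i], text[i + 1:]
    raise ValueError("unterminated quoted string")


def expand(text, macros, max_expansions=1000):
    """Toy model of M4's read, expand, and push-back loop.

    macros maps macro names to expansion text (hypothetical symbol table).
    max_expansions is a safety cap that real M4 does not have.
    """
    output = []
    expansions = 0
    while text:
        if text[0] == "`":
            # Quoted string: strip one layer of quotes, copy to output.
            inner, text = read_quoted(text)
            output.append(inner)
            continue
        match = NAME.match(text)
        if match:
            name, rest = match.group(0), text[match.end():]
            if name in macros:
                # Macro call: push the expansion back onto the FRONT of
                # the input so that it is read again from scratch.
                expansions += 1
                if expansions > max_expansions:
                    raise RecursionError("too many expansions")
                text = macros[name] + rest
            else:
                # Ordinary name: copy to output unchanged.
                output.append(name)
                text = rest
            continue
        # Any other character (such as a space) goes straight to output.
        output.append(text[0])
        text = text[1:]
    return "".join(output)


print(expand("macro ``quoted' text'", {"macro": "expansion"}))
# prints: expansion `quoted' text
```

The important design point is the line that rebuilds text from the expansion plus the remaining input: nothing produced by a macro goes directly to the output, so everything a macro generates gets a second look.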

As you might guess, there are some potentially nasty side effects of this process. For example, you can accidentally define a macro that is infinitely recursive. The expansion of such a macro would lead to a massive amount of unwanted output, followed by a stack overflow. This is easy to do:

$ m4
define(`macro', `This is a macro')dnl
macro
This is a This is a This is a This is a This is a This is a...<ctrl-c>
$

This happens because the macro name expands into text containing the macro's own name, which is then pushed back onto the input stream for reprocessing. Now consider a related question: What would the result have been if I'd left the quotes off of the expansion text in the macro definition? To help you discover the answer, let's turn next to M4 quoting rules.
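Before moving on, the runaway expansion shown above can be reproduced in a few lines of Python. Again, this is only a sketch under simplifying assumptions: input is treated as whitespace-separated words, quoting is ignored, and the rescan function and its limit parameter are hypothetical names I've chosen so that the demonstration stops on its own rather than waiting for ctrl-c.

```python
from collections import deque


def rescan(words, macros, limit=12):
    """Toy rescanning loop over whitespace-separated words.

    words  -- the initial input, as a list of words
    macros -- hypothetical symbol table: name -> expansion text
    limit  -- stop after this many output words (real M4 has no such cap)
    """
    pending = deque(words)
    output = []
    while pending and len(output) < limit:
        word = pending.popleft()
        if word in macros:
            # Push the expansion back onto the front of the input,
            # keeping its words in their original order.
            pending.extendleft(reversed(macros[word].split()))
        else:
            output.append(word)
    return output


# The expansion text contains the macro's own name, so the loop only
# ends because of the artificial limit.
runaway = rescan(["macro"], {"macro": "This is a macro"}, limit=6)
print(" ".join(runaway))
# prints: This is a This is a

# An expansion that does not mention the macro name terminates normally.
finite = rescan(["macro"], {"macro": "This is a definition"})
print(" ".join(finite))
# prints: This is a definition
```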

Quoting Rules

Proper quoting is critical. You have probably encountered situations where your invocations of Autoconf macros didn't work as you expected. The problem is often a case of under-quoting, which means you omitted a required level of quotes around some text.

You see, each time text passes through M4, a layer of quotes is stripped off. Quoted strings are not names and are thus not subject to macro expansion, but if a quoted string passes through M4 twice, the second time through, it's no longer quoted. As a result, individual words within that string are no longer part of a string, but instead are parsed as name tokens, which are subject to macro expansion. To illustrate, enter the following text at a shell prompt:

$ m4
❶ define(`abc', `def')dnl
  abc
  def
❷ define(`abc', ``def'')dnl
  abc
  def
❸ define(`abc', ```def''')dnl
  abc
  `def'
  <ctrl-d>$

In this example, the first time abc is defined (at ❶), it's quoted once. As M4 processes the macro definition, it removes a layer of quotes. Thus, the expansion text is stored in the symbol table without quotes, and we would expect the output of abc to be simply def, which it is.

As you can see, the second definition of abc (at ❷) is double quoted, so when the definition is processed, and the outer layer of quotes is stripped off, we would expect the expansion text in the symbol table to contain at least one set of quotes, and it does. Then why don't we see quotes around the output text? Remember that when macros are expanded, the expansion text is pushed onto the front of the input stream and reparsed using the usual rules. Thus, while the text of the second definition is stored quoted in the symbol table, as it's reprocessed upon use, the second layer of quotes is removed between the input and output streams.

The difference between ❶ and ❷ in this example is that the expansion text of ❷ is treated as quoted text by M4 when it's reprocessed, rather than as a potential macro name. The outer quotes are removed during definition and the inner quotes are removed during expansion, but the enclosed text is never considered for further expansion because it's still quoted when M4 rescans it.

In the third definition of abc (at ❸), we finally see the result we were trying to obtain: a quoted version of the output text. The expansion text is entered into the symbol table double quoted, because the outermost set of quotes is stripped off during processing of the definition. Then, when the macro is used, the expansion text is reprocessed and the second set of quotes is stripped off, leaving one set in the final output text.
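The bookkeeping in these three cases comes down to counting passes: one layer of quotes is lost when the definition is processed, and another when the expansion text is rescanned. Here is a small Python sketch of just that counting. It's deliberately naive: strip_layer and passes are hypothetical helper names, the argument is assumed to be a single quoted unit, and the model doesn't attempt the macro lookup that real M4 would perform on unquoted text during the second pass.

```python
def strip_layer(text):
    """Remove one outer layer of `...' quotes, if the text has one.

    Assumes the whole text is a single quoted unit, which is all the
    examples here need.
    """
    if text.startswith("`") and text.endswith("'"):
        return text[1:-1]
    return text


def passes(definition_arg):
    """Return (stored, printed) for the second argument of a define call.

    stored  -- what lands in the symbol table (pass 1: the definition)
    printed -- what reaches the output (pass 2: rescanning the expansion)
    """
    stored = strip_layer(definition_arg)
    printed = strip_layer(stored)
    return stored, printed


for arg in ["`def'", "``def''", "```def'''"]:
    stored, printed = passes(arg)
    print(f"{arg:12} stored as {stored:10} printed as {printed}")
```

Running this shows def stored unquoted for the first definition, stored with one layer for the second, and stored with two layers for the third, with only the third leaving any quotes in the printed text.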

If you keep these rules in mind as you work with macros within Autoconf (including both definitions and calls), you'll find it easier to understand why things may not work the way you think they should. The GNU M4 Manual provides a simple rule of thumb for using quotes in macro calls: For each layer of nested parentheses in a macro call, use one layer of quotes.