Pattern Matching Quick Reference with Examples

Section 32.4 gives an introduction to regular expressions. This article is intended for those of you who need just a quick listing of regular expression syntax as a refresher from time to time. It also includes some simple examples. The characters in Table 32-7 have special meaning only in search patterns.

Table 32-7. Special characters in search patterns

Pattern   What does it match?
.         Match any single character except newline.
*         Match any number (including none) of the single character that immediately precedes it. The preceding character can also be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character."
^         Match the following regular expression at the beginning of the line.
$         Match the preceding regular expression at the end of the line.
[ ]       Match any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. A caret (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or a right square bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list.
\{n,m\}   Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a regular expression. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m, inclusive.
\         Turn off the special meaning of the character that follows, except in sequences such as \{ and \(, where it turns on a special meaning instead.
\( \)     Save the pattern enclosed between \( and \) into a special holding space. Up to nine patterns can be saved on a single line. They can be "replayed" in substitutions, or later in the same search pattern, by the escape sequences \1 to \9.
\< \>     Match at the beginning (\<) or end (\>) of a word.
+         Match one or more instances of the preceding regular expression.
?         Match zero or one instance of the preceding regular expression.
|         Match the regular expression specified before or after.
( )       Apply a match to the enclosed group of regular expressions.

The last four entries (+, ?, |, and unescaped parentheses) belong to extended regular expressions, as used by egrep and awk. In the basic syntax of sed and grep they are ordinary characters unless a particular implementation adds them as an extension.
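To try the rows of Table 32-7 interactively, the sketch below uses Python's re module as a stand-in. Its syntax follows the extended forms: braces, parentheses, +, ?, and | are written without backslashes, and \b takes the place of \< and \>. The comments give the sed or grep spelling where it differs. The sample lines and the matching helper are illustrative assumptions, not part of any tool.

```python
import re

# Hypothetical sample lines to search; any text would do.
lines = ["The cat sat", "concatenate", "cats and dogs", "aaa", "color", "colour"]

def matching(pattern, items):
    """Return the items in which pattern matches anywhere, as grep would."""
    return [s for s in items if re.search(pattern, s)]

examples = [
    r"c.t",        # .      any single character
    r"colo*r",     # *      zero or more of the preceding character
    r"^cat",       # ^      at the beginning of the line
    r"sat$",       # $      at the end of the line
    r"^[A-Z]",     # [ ]    one character from the bracketed range
    r"^a{2,3}$",   # sed/grep: ^a\{2,3\}$   two or three a's
    r"(a)\1",      # sed/grep: \(a\)\1      saved group replayed
    r"\bcat\b",    # GNU grep, vi: \<cat\>  cat as a whole word
    r"colou?r",    # ?      zero or one of the preceding character
    r"dog|sat",    # |      either alternative
]

for pattern in examples:
    print(f"{pattern:10} {matching(pattern, lines)}")
```

For instance, \bcat\b selects only "The cat sat": in "concatenate" and "cats" the letters c-a-t do not form a whole word.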

The characters in Table 32-8 have special meaning only in replacement patterns.

Table 32-8. Special characters in replacement patterns

Pattern   What does it do?
\         Turn off the special meaning of the character that follows.
\n        Restore the nth pattern previously saved by \( and \). n is a number from 1 to 9, with 1 starting on the left.
&         Reuse the string that matched the search pattern as part of the replacement pattern.
\u        Convert the first character of the replacement pattern to uppercase.
\U        Convert the replacement pattern to uppercase.
\l        Convert the first character of the replacement pattern to lowercase.
\L        Convert the replacement pattern to lowercase.

The case-conversion escapes (\u, \U, \l, \L) work in ex and vi and in GNU sed, but not in every version of sed.
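Python's re.sub can imitate most of Table 32-8, again as a hedged stand-in for sed or ex. Its replacement string accepts \1 through \9 as written and spells & as \g<0>. It has no \u, \U, \l, or \L, so a replacement function does the case conversion instead. The helper names are hypothetical; each comment shows the sed or ex command being imitated.

```python
import re

# sed/ex: s/[a-z][a-z]*/<&>/g    (& reuses the whole match)
def wrap_words(text):
    return re.sub(r"[a-z]+", r"<\g<0>>", text)

# sed/ex: s/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2, \1/g    (\n replays saved groups)
def swap_names(text):
    return re.sub(r"([a-z]+) ([a-z]+)", r"\2, \1", text)

# ex or GNU sed: s/\<./\u&/g    (\u uppercases the next character)
def capitalize_words(text):
    return re.sub(r"\b[a-z]", lambda m: m.group(0).upper(), text)

# ex or GNU sed: s/.*/\U&/    (\U uppercases what follows)
def shout(text):
    return re.sub(r".*", lambda m: m.group(0).upper(), text, count=1)

print(wrap_words("jane doe"))                    # <jane> <doe>
print(swap_names("john smith, jane doe"))        # smith, john, doe, jane
print(capitalize_words("john smith, jane doe"))  # John Smith, Jane Doe
print(shout("jane doe"))                         # JANE DOE
```

The function-based replacement is the general escape hatch: anything sed or ex does with \u or \L, Python does by computing the replacement text from the match object.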

Note that many programs, especially perl, awk, and sed, implement their own programming languages and often have much more extensive support for regular expressions. Their manual pages are therefore the best place to confirm which expressions are supported and whether the program goes beyond simple regular expressions. On many systems, notably those with a large complement of GNU tools, the regular expression support is astonishing, and several generations of a tool may be implemented by one program: GNU grep, for example, also emulates the later egrep, and the expression syntax it accepts varies widely depending on how it is invoked. Don't assume that every pattern here works in every program with regex support, or that this is all there is.
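A minimal Python sketch makes the portability warning concrete. Patterns written in sed's basic syntax are still legal when handed to Python's re module, but they mean something different there, because Python reads \{, \}, \<, and \> as literal characters. The py_matches helper is a hypothetical convenience wrapper around re.search.

```python
import re

def py_matches(pattern, text):
    """True if Python's re finds pattern anywhere in text."""
    return re.search(pattern, text) is not None

# Written for sed/grep basic syntax: "exactly two a's" and "the word cat".
bre_interval = r"a\{2\}"
bre_word = r"\<cat\>"

# Python treats the backslashed braces and angle brackets as literals.
print(py_matches(bre_interval, "aa"))          # False
print(py_matches(bre_interval, "a{2}"))        # True
print(py_matches(bre_word, "the cat sat"))     # False
print(py_matches(bre_word, "<cat>"))           # True

# The Python spellings of the same two ideas.
print(py_matches(r"a{2}", "aa"))               # True
print(py_matches(r"\bcat\b", "the cat sat"))   # True
```

No error is raised in either direction, which is exactly why a pattern copied between tools can fail silently.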

When used with grep or egrep, regular expressions are surrounded by quotes. (If the pattern contains a $, you must use single quotes from the shell; e.g., 'pattern'.) When used with ed, ex, sed, and awk, regular expressions are usually surrounded by / (although the substitute commands of ed, ex, and sed accept other delimiters as well). Table 32-9 has some example patterns.
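The quoting and delimiter advice can be sketched in Python with shlex.quote from the standard library, which wraps a string in single quotes whenever it contains characters the shell would interpret, such as $ or ^. The helpers sed_subst_command and grep_command are hypothetical: they only build command strings and do not run sed or grep, and the list of candidate delimiters is an arbitrary choice.

```python
import shlex

DELIMITERS = "/|#,:@"  # candidate delimiters, tried in order (arbitrary choice)

def sed_subst_command(pattern, replacement, flags="", filename="input.txt"):
    """Build (but do not run) a shell command line for a sed s command.

    Picks the first delimiter that appears in neither the pattern nor the
    replacement, then lets shlex.quote single-quote the script so the shell
    passes $, *, brackets, and backslashes to sed untouched.
    """
    for delim in DELIMITERS:
        if delim not in pattern and delim not in replacement:
            break
    else:
        raise ValueError("no unused delimiter available")
    script = f"s{delim}{pattern}{delim}{replacement}{delim}{flags}"
    return f"sed {shlex.quote(script)} {shlex.quote(filename)}"

def grep_command(pattern, filename="input.txt"):
    """Build a grep command line with the pattern safely quoted."""
    return f"grep {shlex.quote(pattern)} {shlex.quote(filename)}"

print(sed_subst_command("/usr/local$", "/opt"))  # sed 's|/usr/local$|/opt|' input.txt
print(grep_command("^From:.*$"))                 # grep '^From:.*$' input.txt
```

The pattern /usr/local$ contains a slash, so the helper falls back to | as the delimiter, and the $ survives because the whole script is single-quoted.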

Table 32-10 shows the metacharacters available to sed or ex. (ex commands begin with a colon.) A space is marked by ·; a TAB is marked by tab.

— DG