Defining regular expressions

A regular expression (regex) is made up of characters that define a pattern. Some symbols in the expression have a special meaning to the parser; if you want to match one of those symbols literally, escape it with a backslash (\). Your code will typically pass the expression as a string object to the constructor of the regex class. The resulting regex object is then passed to functions in <regex> that use the expression to search text for sequences that match the pattern.
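
As a minimal sketch of the steps just described, the following code builds a regex object from a string and passes it to a function in <regex>. The helper name contains_price and the price pattern are hypothetical, chosen only to show escaping of the special symbols $ and . with a backslash.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` contains a price such as "$4.99".
// '$' and '.' are special symbols in a regex, so each is escaped with a backslash.
bool contains_price(const std::string& text)
{
    // A raw string literal avoids doubling every backslash for the C++ compiler.
    const std::string pattern = R"(\$[0-9]+\.[0-9]{2})";

    // The expression is passed to the regex constructor as a string object.
    const std::regex price(pattern);

    // regex_search looks for the pattern anywhere in the text.
    return std::regex_search(text, price);
}
```

Calling contains_price("cost: $4.99 each") should report a match, while "cost: 4.99 each" should not, because the escaped \$ requires a literal dollar sign.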

The following table summarizes some of the patterns that you can match with the regex class.

Pattern       Explanation                                                 Example
literals      Matches the exact characters                                li matches the li in flip, lip, and plier
[group]       Matches a single character in the group                     [at] matches the a and the t in cat, the t in top, and the a in pear
[^group]      Matches a single character not in the group                 [^at] matches the c in cat, the o and p in top, and the p, e, and r in pear
[first-last]  Matches any character in the range first to last            [0-9] matches each of the digits 1, 0, and 2 in 102
{n}           The element is matched exactly n times                      91{2} matches 911
{n,}          The element is matched n or more times                      wel{1,} matches well and the wel in welcome
{n,m}         The element is matched between n and m times                9{2,4} matches 99, 999, 9999, and the first four digits of 99999, but not 9
.             Wildcard, any character except \n                           a.e matches ate and are
*             The element is matched zero or more times                   \d*\.\d matches .1, 0.1, 10.1 but not 10
+             The element is matched one or more times                    \d+\.\d matches 0.1, 10.1 but not 10 or .1
?             The element is matched zero or one time                     tr?ap matches trap and tap
|             Matches any one of the elements separated by the |          th(e|is|at) matches the, this, that
[[:class:]]   Matches the character class                                 [[:upper:]] matches the uppercase characters I and R in I am Richard
\n            Matches a newline
\s            Matches any single whitespace character
\d            Matches any single digit                                    \d is [0-9]
\w            Matches a word character: a letter, digit, or underscore
\b            Matches at a boundary between word and non-word characters  \d{2}\b matches the last two digits of 999 and 9999; \b\d{2} matches the first two
$             End of the line                                             \s$ matches a single whitespace character at the end of a line
^             Start of line                                               ^\d matches if a line starts with a digit
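
A few of the rows above can be checked with a small helper. This is only a sketch: first_match is a hypothetical name, and it assumes the default (ECMAScript) grammar of the regex class and a search for the first match anywhere in the text.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` that matches
// `pattern`, or an empty string if nothing matches.
std::string first_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);   // default grammar is ECMAScript
    std::smatch m;                  // receives the matched range
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

For example, first_match(R"(\d*\.\d)", ".1") returns ".1", whereas first_match(R"(\d+\.\d)", ".1") returns an empty string because + requires at least one digit before the decimal point. Similarly, first_match(R"(\d{2}\b)", "9999") returns the last two digits, since \b only matches at the end of the run of digits.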


You can use regular expressions to define a pattern to be matched. The Visual C++ editor allows you to do this in the search dialog, which makes it a good test bed for developing your expressions.

It is much easier to define a pattern to match than a pattern not to match. For example, the expression \w+\b<\w+> will match the string "vector<int>", because this has one or more word characters, then a boundary between a word character and a non-word character (<), then one or more word characters followed by >. This pattern will not match the string "#include <regex>" because there is a space after include, so the < does not immediately follow the boundary that \b marks between word and non-word characters.
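
The two strings discussed above can be tried against the expression with a short sketch like the following. The function name looks_like_template is hypothetical and used only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` contains word characters
// immediately followed by <word>, as in "vector<int>".
bool looks_like_template(const std::string& text)
{
    // \w+  one or more word characters
    // \b   boundary between a word and a non-word character
    // <\w+> a '<', one or more word characters, then '>'
    static const std::regex re(R"(\w+\b<\w+>)");
    return std::regex_search(text, re);
}
```

With this helper, "vector<int>" matches, but "#include <regex>" does not, because the space between include and < prevents the < from following the word directly.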

The th(e|is|at) example in the table shows that you can use parentheses to group patterns when you want to provide alternatives. However, parentheses have another use: they allow you to capture groups. So, if you want to perform a replace action, you can search for a pattern as a group and then refer to that group by its number later (for example, search for (Joe) so that you can replace Joe with Tom). You can also refer, within the expression itself, to a sub-expression specified by parentheses (this is called a back reference):

    ([A-Za-z]+) +\1

This expression says: search for words with one or more characters in the ranges a to z and A to Z; this captured group is number 1, so \1 finds where the same word appears a second time, separated from the first by one or more spaces.
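
The back reference can be exercised with a sketch such as the one below. The function names are hypothetical; the replacement string "$1" assumes the default ECMAScript format rules of regex_replace, where $1 stands for the text captured by group 1.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` contains the same run of
// letters twice in a row, separated by one or more spaces.
bool has_repeated_word(const std::string& text)
{
    static const std::regex re(R"(([A-Za-z]+) +\1)");
    return std::regex_search(text, re);
}

// Hypothetical helper: replaces each doubled word with a single copy.
// "$1" in the replacement refers to the text captured by group 1.
std::string collapse_repeated_word(const std::string& text)
{
    static const std::regex re(R"(([A-Za-z]+) +\1)");
    return std::regex_replace(text, re, "$1");
}
```

For the input "it was the the best", has_repeated_word returns true and collapse_repeated_word returns "it was the best". Note that the expression as written has no \b anchors, so it can also match inside words (for example, the "is is" in "this is"); adding \b at each end is one way to restrict it to whole words.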