Standard Library classes

To perform matching or replacement you have to create a regular expression object. This is an object of the class basic_regex, which has template parameters for the character type and a regular expression traits class. There are two typedefs for this class: regex for char and wregex for wide characters (wchar_t), whose traits are described by the regex_traits<char> and regex_traits<wchar_t> specializations respectively.
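A minimal sketch, assuming a C++11 or later compiler, that confirms what the two typedefs stand for and uses each one with the same digit pattern. The helper names narrow_is_number and wide_is_number are hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <string>
#include <type_traits>

// regex and wregex are basic_regex instantiated for char and wchar_t,
// each paired with the matching regex_traits specialization.
static_assert(std::is_same<std::regex,
                           std::basic_regex<char, std::regex_traits<char>>>::value,
              "regex is basic_regex<char>");
static_assert(std::is_same<std::wregex,
                           std::basic_regex<wchar_t, std::regex_traits<wchar_t>>>::value,
              "wregex is basic_regex<wchar_t>");

// Hypothetical helper: narrow text, narrow pattern.
bool narrow_is_number(const std::string& text) {
    static const std::regex rx(R"(\d+)");
    return std::regex_match(text, rx);
}

// Hypothetical helper: the same pattern as a wide raw string literal.
bool wide_is_number(const std::wstring& text) {
    static const std::wregex rx(LR"(\d+)");
    return std::regex_match(text, rx);
}
```

The only difference between the two helpers is the character type of the text and of the pattern literal (the L prefix); the pattern itself is identical.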

The traits class determines how the regex class parses the expression. For example, recall from the previous text that you can use \w for a word character, \d for a digit, and \s for whitespace. The [[:name:]] syntax allows you to use a more descriptive name for the character class: alnum, digit, lower, and so on. Since these are text sequences whose meaning depends on the character set, the traits class contains the code to test whether the expression uses a supported character class.
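To see the traits class doing this work directly, the sketch below asks regex_traits<char> to look up a class name and test a character against it, and then checks that \d and [[:digit:]] agree. It assumes the classic "C" locale; the function names in_named_class, all_digits_short, and all_digits_named are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: asks the traits class whether a character belongs to a
// named class such as "digit" or "alnum". lookup_classname returns an empty
// mask when the name is not a class the traits class supports.
bool in_named_class(char c, const std::string& name) {
    std::regex_traits<char> traits;
    const auto cls = traits.lookup_classname(name.begin(), name.end());
    if (cls == std::regex_traits<char>::char_class_type()) {
        return false;  // unknown class name
    }
    return traits.isctype(c, cls);
}

// \d and [[:digit:]] name the same character class, so these two agree.
bool all_digits_short(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\d+)"));
}

bool all_digits_named(const std::string& s) {
    return std::regex_match(s, std::regex("[[:digit:]]+"));
}
```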

The appropriate regex class will parse the expression to enable functions in the <regex> library to use the expression to identify patterns in some text:

    regex rx("([A-Za-z]+) +\\1");

This searches for repeated words using a back reference. Note that the regular expression uses \1 for the back reference, but in a string literal the backslash has to be escaped (\\). If you use character classes such as \s and \d, you will need to do a lot of escaping. Instead, you can use raw strings (R"()"), but bear in mind that the first set of parentheses inside the quote marks is part of the raw string syntax and does not form a regex group:

    regex rx(R"(([A-Za-z]+) +\1)");
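A short sketch showing that the escaped literal and the raw string spell exactly the same pattern, and using it with regex_search to pull out the repeated word. The variable names escaped and raw and the function first_repeated_word are hypothetical.

```cpp
#include <regex>
#include <string>

// The escaped literal and the raw string produce identical pattern text.
const std::string escaped = "([A-Za-z]+) +\\1";
const std::string raw = R"(([A-Za-z]+) +\1)";

// Hypothetical helper: returns the first word that is immediately repeated,
// or an empty string if there is none.
std::string first_repeated_word(const std::string& text) {
    static const std::regex rx(raw);
    std::smatch m;
    if (std::regex_search(text, m, rx)) {
        return m[1].str();  // group 1 holds the captured word
    }
    return "";
}
```

Note that the pattern has no word boundaries, so text such as "the theory" also matches ("the" followed by the start of "theory"); adding \b around the pattern would tighten it.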

Which form is more readable is entirely up to you; both add extra characters inside the double quotes, which can make it harder to see at a glance what the regular expression matches.

Bear in mind that a regular expression is essentially a program in itself, so the regex parser determines whether the expression is valid; if it isn't, the constructor throws an exception of type regex_error. Exception handling is explained in the next chapter, but it is important to point out that if the exception is not caught, the application will abort at runtime. The exception's what method returns a basic description of the error, and the code method returns one of the error_type constants in the regex_constants namespace. There is no indication of where in the expression the error occurs, so you should thoroughly test your expression in an external tool (for example, Visual C++ search).
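A minimal sketch of catching the exception, assuming C++17 for std::optional. The function check_pattern is hypothetical; it reports the description from what and returns the code value so the caller can compare it against the regex_constants error constants.

```cpp
#include <iostream>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: tries to compile a pattern. Returns std::nullopt if it
// is valid; otherwise prints the description and returns the error code.
std::optional<std::regex_constants::error_type> check_pattern(const std::string& pattern) {
    try {
        const std::regex rx(pattern);  // the constructor parses and validates
        return std::nullopt;
    } catch (const std::regex_error& e) {
        // what() describes the problem but not its position in the pattern.
        std::cerr << "bad pattern \"" << pattern << "\": " << e.what() << '\n';
        return e.code();
    }
}
```

For example, an unclosed group such as "(abc" is reported as error_paren, and an unclosed bracket expression such as "[abc" as error_brack.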

The constructor can be called with a string (C or C++), a pair of iterators to a range of characters in a string (or other container), or an initializer list where each item in the list is a character. There are various flavors of the regex language; the default for the basic_regex class is ECMAScript. If you want a different grammar (basic POSIX, extended POSIX, awk, grep, or egrep), you can pass one of the syntax_option_type constants defined in the regex_constants namespace as a constructor parameter (copies are also available as constants defined in the basic_regex class, such as regex::extended).

You can specify only one grammar, but you can combine it with some of the other syntax_option_type constants: icase makes matching case insensitive, collate makes character ranges of the form [a-b] locale sensitive, nosubs means that capture groups are not stored, and optimize asks the engine to favor matching speed over the speed of constructing the regex object.
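A sketch of combining one grammar with other flags using the | operator. The functions is_greeting and result_count are hypothetical; result_count reports how many entries the match results hold, which is the whole match plus one per stored capture group, so it shows the effect of nosubs.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: one grammar (ECMAScript) combined with icase and optimize.
bool is_greeting(const std::string& text) {
    static const std::regex rx("hello (world|there)",
                               std::regex::ECMAScript | std::regex::icase | std::regex::optimize);
    return std::regex_match(text, rx);
}

// Hypothetical helper: number of entries in the match results, or 0 if the
// text does not match at all.
std::size_t result_count(const std::string& text, std::regex::flag_type flags) {
    const std::regex rx(R"((\w+)@(\w+))", flags);
    std::smatch m;
    std::regex_match(text, m, rx);
    return m.size();
}
```

With plain ECMAScript, matching "me@home" yields three entries (the whole match and two groups); adding nosubs leaves only the whole match.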

The class provides the getloc method to obtain the locale used by the parser and the imbue method to change it. If you imbue a locale, you will not be able to use the regex object for any matching until you give it an expression again with the assign method. This means there are two ways to use a regex object. If you want to use the current locale, pass the regular expression to the constructor; if you want to use a different locale, create an empty regex object with the default constructor, call imbue with the locale, and then pass the regular expression using the assign method. Once a regular expression has been parsed, you can call the mark_count method to get the number of capture groups in the expression (assuming you did not use nosubs).
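The second approach can be sketched as follows. The function make_regex_with_locale is hypothetical, and the usage below passes the classic "C" locale purely so the example does not depend on which locales are installed; in practice you would pass the locale you actually need.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: default-construct, imbue the locale, then assign the
// pattern so that it is parsed with that locale.
std::regex make_regex_with_locale(const std::string& pattern, const std::locale& loc) {
    std::regex rx;       // empty regex object
    rx.imbue(loc);       // after imbue, rx matches nothing until assign is called
    rx.assign(pattern);  // parse the pattern (ECMAScript by default)
    return rx;
}
```

For instance, make_regex_with_locale("(\\d+)-(\\d+)", std::locale::classic()) produces an object whose mark_count is 2 and whose getloc returns the classic locale.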