Once you have constructed a regex object, you can pass it to the functions in the <regex> library to search for the pattern in a string. The regex_match function is passed either a string (C or C++) or a pair of iterators to a range of characters in a container, together with a constructed regex object. In its simplest form, the function returns true only if there is an exact match, that is, the expression matches the entire search string:
regex rx("[at]"); // search for either a or t
cout << boolalpha;
cout << regex_match("a", rx) << "\n";  // true
cout << regex_match("t", rx) << "\n";  // true
cout << regex_match("at", rx) << "\n"; // false
In the previous code, the search expression is for a single character in the range given (a or t), so the first two calls to regex_match return true because each searched string is one character. The last call returns false because the expression matches only a single character, whereas the searched string has two, so the match is not the whole string. If you remove the [] from the regular expression, then only the third call returns true because you are looking for the exact string at. If the regular expression is [at]+, so that you are looking for one or more of the characters a and t, then all three calls return true. You can alter how the match is determined by passing one or more of the constants in the match_flag_type enumeration.
If you pass a reference to a match_results object to this function, then after the search the object will contain information about the position and the string that matched. The match_results object is a container of sub_match objects. If the function succeeds, the entire search string matches the expression, and in this case the first sub_match item will be the entire search string. If the expression has subgroups (patterns identified with parentheses), then these subgroups will be additional sub_match objects in the match_results object.
string str("trumpet");
regex rx("(trump)(.*)");
match_results<string::const_iterator> sm;
if (regex_match(str, sm, rx))
{
    cout << "the matches were: ";
    for (unsigned i = 0; i < sm.size(); ++i)
    {
        cout << "[" << sm[i] << "," << sm.position(i) << "] ";
    }
    cout << "\n";
} // the matches were: [trumpet,0] [trump,0] [et,5]
Here, the expression is the literal trump followed by any number of characters. The entire string matches this expression, and there are two subgroups: the literal string trump and whatever is left over after trump is removed.
Both the match_results class and the sub_match class are templated on the type of iterator used to indicate the matched item. There are typedefs called cmatch and wcmatch, where the template parameter is const char* and const wchar_t*, respectively, and smatch and wsmatch, where the parameter is the const_iterator of string and wstring objects, respectively (similarly, there are sub_match typedefs: csub_match, wcsub_match, ssub_match, and wssub_match).
The regex_match function can be quite restrictive because it looks for an exact match between the pattern and the searched string. The regex_search function is more flexible because it returns true if there is a substring within the search string that matches the expression. Note that even if there are multiple matches in the search string, regex_search finds only the first. If you want to parse through the string, you have to call the function repeatedly until it indicates that there are no more matches. This is where the overload that takes iterators to the search string becomes useful:
regex rx("\\b\\d{2}\\b");
smatch mr;
string str = "1 4 10 42 100 999";
string::const_iterator cit = str.cbegin();
while (regex_search(cit, str.cend(), mr, rx))
{
    cout << mr[0] << "\n";
    cit += mr.position() + mr.length();
} // prints 10 and then 42
Here, the expression matches a two-digit number (\d{2}) with a word boundary on each side (the two \b patterns), so it finds 10 and 42 but not the first two digits of 100 or 999. The loop starts with an iterator pointing to the start of the string; when a match is found, the iterator is advanced by the position of the match (which is relative to the iterator passed to regex_search) and then by the length of the match, so the next search starts just after it. The regex_iterator class, explained later, wraps this behavior.
The match_results class gives iterator access to the contained sub_match objects, so you can use a range-based for loop. At first, the container appears to work in an odd way: it knows the position in the searched string of each sub_match object (through the position method, which takes the index of the sub_match object), yet the sub_match object appears to know only the string it refers to. However, closer inspection of the sub_match class shows that it derives from pair, where both members are iterators into the searched string. This means that a sub_match object holds iterators specifying the range of the substring within the original string. The match_results object knows the start of the original string and can use the sub_match.first iterator to determine the character position of the start of the substring.
The match_results object has a [] operator, which returns the sub_match for the specified group, and a str method, which returns that group as a string constructed from the iterators to the range of characters in the original string. The prefix method returns the part of the string that precedes the match, and the suffix method returns the part that follows it. So, in the previous code, the first match will be 10, the prefix will be "1 4 ", and the suffix will be " 42 100 999". In contrast, if you access the sub_match object itself, it knows only its length and the string, which is obtained by calling its str method.
The match_results object can also return the results through the format method. This takes a format string in which the matched groups are identified by numbered placeholders prefixed with the $ symbol ($1, $2, and so on). The output can either be written through an output iterator (for example, one that writes to a stream) or returned from the method as a string:
string str("trumpet");
regex rx("(trump)(.*)");
match_results<string::const_iterator> sm;
if (regex_match(str, sm, rx))
{
    string fmt = "Results: [$1] [$2]";
    cout << sm.format(fmt) << "\n";
} // Results: [trump] [et]
With regex_match or regex_search, you can use parentheses to identify subgroups. If the pattern matches, you can obtain these subgroups using an appropriate match_results object passed by reference to the function. As shown earlier, the match_results object is a container for sub_match objects. You can compare sub_match objects with the <, !=, ==, <=, >, and >= operators, which compare the characters that the iterators point to (that is, the substrings). Further, sub_match objects can be inserted into a stream.