There are two sets of PHP functions that facilitate regular expressions. The preferred set is the PCRE (Perl-Compatible Regular Expressions) library. You can identify these functions because, in PHP, they start with the prefix preg. Examples of PCRE regular expression functions are preg_replace(), preg_split(), preg_match(), and preg_match_all(). The other regular expression family available within PHP is POSIX (Extended Regular Expressions). These functions begin with the prefix ereg and were included in PHP primarily for backward compatibility. The ereg functions have been deprecated since PHP 5.3.0 (and were removed entirely in PHP 7.0.0); they are mentioned here only because you’ll be exposed to them as you explore PHP regular expression functions online. We will describe only the PCRE regular expressions in this chapter, and for simplicity, we will also limit our discussion to the most frequently used functions within PHP’s implementation of PCRE.[17]
The most common uses for regular expressions are these:
Make string substitutions within a subject string.
Detect if a substring exists within a larger subject string.
Capture (parse) all matches of the pattern within the subject.
Split strings at a given location.
Next we’ll discuss the functions that use the same simple pattern to perform all of these tasks.
The function preg_replace() allows you to replace part of a string with another piece of text wherever the pattern is found within the original (subject) string. In Example 5-1, the replacement text new is substituted for every occurrence of the pattern /test/.
Example 5-1. Using simple regular expressions to pattern-match and replace
// USAGE: preg_replace(pattern, replacement, subject);
// If the pattern is found, the subject
//     "This is the test string"
// becomes
//     "This is the new string"
$resulting_string = preg_replace("/test/", "new", "This is the test string");
Note that the pattern /test/ is not the target string itself but rather a description of the criteria that match the word test. If the pattern had occurred more than once in the subject, each occurrence would have been replaced with new.
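To make the "each occurrence" behavior concrete, here is a minimal sketch. The subject string is made up for illustration; the optional fourth and fifth arguments of preg_replace() (a replacement limit and a by-reference count) are part of the standard function but are not otherwise discussed in this section.

```php
<?php
// Hypothetical subject string containing the target word twice.
$subject = "This test string mentions test twice";

// -1 as the fourth argument means "no limit"; the fifth argument
// receives the number of replacements that were made.
$resulting_string = preg_replace("/test/", "new", $subject, -1, $count);

echo $resulting_string . "\n"; // This new string mentions new twice
echo $count . "\n";            // 2
```

Checking the count is a cheap way to confirm that a pattern matched as often as you expected.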
You can use the preg_match() function to determine if the defined pattern exists in the subject string. Example 5-2 shows how preg_match() is used to detect whether the pattern /test/ occurs in the subject string.
Example 5-2. Using regular expressions to detect the occurrence of one string in another
// USAGE: preg_match(pattern, subject);
// $result = 1 (true) if pattern found in subject.
// $result = 0 (false) if pattern is not found in subject.
$result = preg_match("/test/", "This is the test string");
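In practice, the return value is usually tested in a condition. The sketch below assumes the same subject string as Example 5-2; the variable names and the /absent/ pattern are illustrative only. It also uses the optional third argument of preg_match(), which receives the matched text.

```php
<?php
$subject = "This is the test string";

// preg_match() returns 1 on a match, 0 on no match, and false on error,
// so a strict comparison against 1 separates a real match from a failure.
$found   = preg_match("/test/", $subject, $matches) === 1;
$missing = preg_match("/absent/", $subject) === 1;

echo $found ? "found: {$matches[0]}\n" : "not found\n"; // found: test
echo $missing ? "found\n" : "not found\n";              // not found
```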
This functionality is expanded with the preg_match_all() function, described next.
The difference between preg_match() and preg_match_all() is that preg_match() stops at the first match, while preg_match_all() finds every match. Instead of returning just 1 or 0, preg_match_all() returns the number of matches found, and it also fills an array with all the instances within the subject that match the pattern. So in Example 5-3, $result is 2 and $result_array[0] contains two array elements, each containing the word test, because that word matched the pattern twice in the subject.
Example 5-3. Using regular expressions to return occurrences of one string in another
// USAGE: preg_match_all(pattern, subject, result_array);
// $result = the number of matches found in subject (2 in this example).
// $result = 0 if pattern is not found in subject.
// $result_array[0] = all instances in subject that match the pattern.
$result = preg_match_all("/test/", "This is a test of the test string", $result_array);
While this example isn’t particularly useful, the function becomes more interesting when the pattern can match many different strings. For example, if the pattern had described an email address, we could have extracted all the email addresses from a web page. Or, if you were developing a spider, the pattern could have described hyperlinks and extracted all the links in a web page. We’ll cover this in detail as we progress.
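As a rough preview of the email case, the sketch below pulls addresses out of a short string. The page text and addresses are invented, and the pattern is a deliberately simplified assumption: it is good enough for illustration but is not a complete email validator.

```php
<?php
// Hypothetical page text; the addresses are made up for illustration.
$page = "Contact sales@example.com or support@example.org for help.";

// Simplified email pattern (an assumption, not a full validator):
// a run of name characters, an @ sign, a domain, a dot, and a suffix.
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

$count = preg_match_all($pattern, $page, $result_array);

echo $count . "\n";        // 2
print_r($result_array[0]); // sales@example.com and support@example.org
```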
Finally, preg_split() splits the subject string wherever the pattern matches. Example 5-4 shows how this is implemented.
Example 5-4. Using a simple regular expression pattern to split a string
// USAGE: $result_array = preg_split(pattern, subject);
// If the pattern is found in the subject, the subject is split at the pattern
// $result_array[0] = "This is the "
// $result_array[1] = " string"
$result_array = preg_split("/test/", "This is the test string");
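When the pattern occurs more than once, the subject is cut at every occurrence. A minimal sketch, reusing the subject string from Example 5-3:

```php
<?php
// The pattern matches twice, so the subject is split into three pieces.
// The matched text itself ("test") is not included in the result.
$result_array = preg_split("/test/", "This is a test of the test string");

print_r($result_array);
// [0] => "This is a "
// [1] => " of the "
// [2] => " string"
```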
You may have noticed that these regular expression functions bear a resemblance to PHP built-in functions or the parsing functions found in LIB_parse. For example:
preg_replace() has similarities to the PHP built-in function str_replace().
preg_split() has similarities to the PHP built-in functions explode() and substr().
preg_match() has similarities to the PHP built-in function strstr().
preg_match_all() has similarities to parse_array() in LIB_parse.[18]
As you will continue to discover, there are usually multiple ways to accomplish a single string manipulation, and many of the solutions can be performed with, or without, using regular expressions.
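For the simple literal pattern used so far, the regular expression functions and their plain-string counterparts give the same answers. The comparison below is only a sketch; explode() is included as the built-in that splits on a fixed delimiter.

```php
<?php
$subject = "This is the test string";

// Replacement: literal search versus pattern search.
$a = str_replace("test", "new", $subject);
$b = preg_replace("/test/", "new", $subject);

// Detection: strstr() returns false when the substring is absent.
$c = strstr($subject, "test") !== false;
$d = preg_match("/test/", $subject) === 1;

// Splitting: fixed delimiter versus pattern.
$e = explode("test", $subject);
$f = preg_split("/test/", $subject);

var_dump($a === $b); // bool(true)
var_dump($c === $d); // bool(true)
var_dump($e === $f); // bool(true)
```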
The second thing to note is that the power of regular expressions is not found in the functions that operate on patterns but in the patterns themselves. So far, you’ve only seen patterns that match a single condition. But regular expressions are much more useful when they contain complex patterns that match a variety of situations.
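As a small taste of what that means, the sketch below uses one pattern to match several variations of a word. The subject string is invented; the ? (optional character) and the trailing i (case-insensitive) modifier are standard PCRE syntax.

```php
<?php
// Hypothetical subject with three variations of the same word.
$subject = "Test one, tests two, test three";

// "s?" makes the trailing s optional; the "i" modifier ignores case.
$count = preg_match_all("/tests?/i", $subject, $result_array);

echo $count . "\n";        // 3
print_r($result_array[0]); // Test, tests, test
```

A literal function such as str_replace() would need a separate call for each variation.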
[17] The entire PHP PCRE manual is available at http://us.php.net/manual/en/ref.pcre.php.
[18] In fact, as you read in the previous chapter, parse_array() is merely a wrapper for preg_match_all().