If you want to match specific characters, you can use square brackets, [ ],
to identify the exact characters you are searching for. The pattern that will
match any line of text that consists of exactly one digit is ^[0123456789]$.
This is longer than it has to be. You can use a hyphen between two characters
to specify a range: ^[0-9]$. You can intermix explicit characters with
character ranges. This pattern will match a single character that is a letter,
digit, or underscore: [A-Za-z0-9_]. Character sets can be combined by placing
them next to one another. If you wanted to search for a word that:
started with an uppercase T,
was the first word on a line,
had a lowercase letter as its second letter,
was three letters long (followed by a space character, ·), and
had a lowercase vowel as its third letter,
the regular expression would be:
^T[a-z][aeiou]·
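As a rough sketch of how these patterns behave, the snippet below tries them with Python's standard re module. Python is an assumption here, since the text does not tie itself to a particular tool; the helper names and sample strings are invented for illustration, and the · marker is written as an ordinary space.

```python
import re

# Patterns taken from the text; the visible-space marker is a real space here.
ONE_DIGIT_LINE = re.compile(r"^[0-9]$")
WORD_CHAR = re.compile(r"[A-Za-z0-9_]")
T_WORD = re.compile(r"^T[a-z][aeiou] ")

def is_one_digit_line(line):
    """True if the whole line is a single digit (hypothetical helper)."""
    return ONE_DIGIT_LINE.search(line) is not None

def starts_with_t_word(line):
    """True if the line begins with a three-letter word of the form
    uppercase T, lowercase letter, lowercase vowel, then a space."""
    return T_WORD.search(line) is not None

print(is_one_digit_line("7"))             # a lone digit
print(is_one_digit_line("42"))            # two digits, so no match
print(starts_with_t_word("The cat sat"))  # T, h, e, space
print(starts_with_t_word("Try again"))    # third letter y is not a vowel
print(WORD_CHAR.findall("a_1-Z!"))        # letters, digits, underscore only
```

Because each bracketed set matches exactly one character, placing sets side by side is what fixes the word's length and the position of each letter.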
To be specific: a range is a contiguous series of characters, from low to
high, in the ASCII character set.[3] For example, [z-a] is not a range because
it's backwards. The range [A-z] matches both uppercase and lowercase letters,
but it also matches the six characters that fall between uppercase and
lowercase letters in the ASCII chart: [, \, ], ^, _, and `.
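One way to check this is to enumerate what [A-z] matches across the 7-bit ASCII set. The sketch below assumes Python's re module; is_valid_range is a hypothetical helper, and the rejection of [z-a] is Python's behavior, which other tools may not share.

```python
import re

# Every 7-bit ASCII character matched by the over-wide range [A-z].
matched = [chr(c) for c in range(128) if re.fullmatch(r"[A-z]", chr(c))]

# The matched characters that are not letters.
extras = [ch for ch in matched if not ch.isalpha()]

def is_valid_range(pattern):
    """True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(len(matched))             # 52 letters plus the in-between characters
print(extras)                   # the non-letters caught by [A-z]
print(is_valid_range("[z-a]"))  # backwards range
print(is_valid_range("[a-z]"))  # low-to-high range
```

Writing the two ranges separately, as in [A-Za-z], avoids picking up the in-between characters.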
— BB
[3] Some languages, notably Java and Perl, do support Unicode regular expressions, but as Unicode generally subsumes the ASCII 7-bit character set, regular expressions written for ASCII will work as well.