You have a problem and you want to solve it with regular expressions? Now you have two problems! This is just one of the many regular expression jokes on the Internet.
In this section, you will learn how regular expressions work, as we will be using them in the upcoming chapters. We have prepared a file for our playground and if you want to try the grep commands on your own, you can take it from the GitHub repository.
Let's start by opening our text file so we can see its contents, and then splitting the screen so we can see both the file and the command side by side.
First of all, the simplest and probably the most common use of a regular expression is to find a single word.
For this, we will use the following command:
grep "joe" file.txt
Here, joe is the string we are searching for and file.txt is the file where we perform the search. You can see that grep printed the line that contained our string, with the word highlighted in a different color. This matches only the exact case of the word: a pattern written as joe will not match Joe or JOE. To do a case-insensitive search, grep has an -i option. With it, grep prints the line that contains our word even if the word is in a different case, like JoE, JOE, joE, and so on:
grep -i "joe" file.txt
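To make the difference concrete, here is a small sketch that runs both searches against a throwaway file. The file and its three lines are made up for illustration and are not the playground file from the repository.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'hello joe' 'Hello JOE' 'nobody here' > "$sample"

# Case-sensitive search: only the line with lowercase "joe" matches.
exact=$(grep "joe" "$sample")

# Case-insensitive search: both spellings match.
anycase=$(grep -i "joe" "$sample")

printf 'exact:\n%s\n\nany case:\n%s\n' "$exact" "$anycase"
rm -f "$sample"
```

With this sample, the first search prints one line and the second prints two.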
If we don't know exactly which characters are in our string, we can use .* to match any number of characters. For example, to find a stretch of text beginning with "word" and ending with "day", we'd use the following command:
grep "word.*day" file.txt
.: matches any single character
*: matches the previous character zero or more times
Here you can see that it matched the first line in the file.
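The following sketch shows the same pattern against a few made-up lines (hypothetical data, not the playground file). It also illustrates that .* may match nothing at all, so "wordday" is a match, while a line where "day" comes before "word" is not.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'word of the day' 'day of the word' 'wordday' > "$sample"

# "word", then any characters (possibly none), then "day".
matches=$(grep "word.*day" "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```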
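A very common scenario is to find empty or blank lines in a file. We start with the following command:
grep "^\s$" file.txt
\s: stands for a whitespace character, such as a space
^: stands for the beginning of the line
$: stands for the end of the line
The ^ and $ characters are called anchors. We have two empty lines with no space, and this pattern does not match them, because it requires exactly one whitespace character between the anchors. If we add a space to those lines, it will match the lines containing one space. To match lines that are completely empty, use "^$", and to match lines that are either empty or contain only whitespace, use "^\s*$".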
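As a rough illustration of how the anchors behave, the sketch below compares three patterns on a made-up four-line file. It uses the POSIX class [[:space:]] in place of \s, since \s is an extension that not every grep supports, and the -n option, which prefixes each match with its line number. The sample data is hypothetical.

```shell
# Hypothetical sample: text, an empty line, a line holding one space, text.
sample=$(mktemp)
printf '%s\n' 'first' '' ' ' 'last' > "$sample"

# Nothing between the anchors: completely empty lines only.
empty=$(grep -n "^$" "$sample")

# Exactly one whitespace character between the anchors.
one_space=$(grep -n "^[[:space:]]$" "$sample")

# Zero or more whitespace characters: empty and whitespace-only lines.
blank=$(grep -n "^[[:space:]]*$" "$sample")

printf 'empty: %s\none space: %s\nblank:\n%s\n' "$empty" "$one_space" "$blank"
rm -f "$sample"
```

Here only line 2 is empty, only line 3 holds a single space, and the last pattern reports both.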
grep can do a neat little trick to count the matching lines. For this, we use the -c option, which prints the number of lines that match instead of the lines themselves.
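A minimal sketch of -c on made-up data follows. Note that -c counts lines, so a line with two matches is counted once. To count individual occurrences, one option (an addition here, not something the text above relies on) is to combine -o, which prints every match on its own line, with wc -l.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'joe and joe' 'no match' 'joe again' > "$sample"

# Number of lines that contain at least one match.
line_count=$(grep -c "joe" "$sample")

# Number of individual matches: -o prints each one on a separate line.
match_count=$(grep -o "joe" "$sample" | wc -l | tr -d ' ')

printf 'matching lines: %s\nmatches: %s\n' "$line_count" "$match_count"
rm -f "$sample"
```

With this sample, the script reports 2 matching lines and 3 matches.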
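To find all the lines that have only letters and spaces, we build the pattern up piece by piece:
grep "": the quotes enclose the pattern
^$: anchor the match from the beginning of the line to the end
[]*: match the characters listed between the brackets any number of times
A-Za-z: any uppercase or lowercase letter
If we run the command up to here, with only A-Za-z between the brackets, we get only the first line. If we add 0-9 for digits and a space, lines containing digits and spaces match as well. Inside square brackets, grep treats \s as a literal backslash and the letter s, so we type an actual space instead:
grep "^[A-Za-z0-9 ]*$" file.txt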
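The effect of widening the bracket expression can be sketched on a few made-up lines (hypothetical data, not the playground file):

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'Hello' 'Hello world' 'Room 101' 'what?!' > "$sample"

# Letters only: a space, digit, or punctuation mark disqualifies the line.
letters=$(grep "^[A-Za-z]*$" "$sample")

# Letters, digits, and a literal space.
letters_digits_space=$(grep "^[A-Za-z0-9 ]*$" "$sample")

printf 'letters only:\n%s\n\nletters, digits, space:\n%s\n' \
  "$letters" "$letters_digits_space"
rm -f "$sample"
```

The first pattern matches only "Hello"; the second also matches "Hello world" and "Room 101", while "what?!" is rejected by both.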
Sometimes we need to search for something that's not in the string:
grep "^[^0-9]*$" file.txt
This command will find all the lines that do not contain any numeric characters. [^] means match any character that is not listed inside the brackets, in our case, any non-digit.
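The anchors matter here. The sketch below, on made-up data, contrasts the anchored pattern with the same bracket expression left unanchored, which only requires one non-digit somewhere on the line.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'no digits here' 'room 101' '2024' > "$sample"

# Anchored: every character on the line must be a non-digit.
no_digits=$(grep "^[^0-9]*$" "$sample")

# Unanchored: at least one non-digit anywhere on the line is enough.
has_non_digit=$(grep "[^0-9]" "$sample")

printf 'no digits at all:\n%s\n\nat least one non-digit:\n%s\n' \
  "$no_digits" "$has_non_digit"
rm -f "$sample"
```

Only "no digits here" passes the anchored pattern, while the unanchored one also accepts "room 101".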
The square brackets are metacharacters in our regular expression. If we want to match them literally in our search string, we have to escape them with a backslash. So, in order to find lines that have content between square brackets, do this:
grep "\[.*\]" file.txt
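This matches any line that has an opening square bracket followed, later on the line, by a closing one. To find all lines that contain the ! character, type this:
grep '!' file.txt
The ! character has no special meaning in a regular expression, so grep does not need it escaped. The single quotes are there for the shell: in an interactive Bash session, a ! inside double quotes can trigger history expansion.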
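Both searches can be tried on a made-up file, as in the sketch below (hypothetical data). Single quotes are used for the ! search so that the shell passes the character to grep untouched.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'see [note 1]' 'plain text' 'done!' > "$sample"

# Escaped brackets are matched literally.
bracketed=$(grep "\[.*\]" "$sample")

# An exclamation mark needs no escaping in the pattern itself.
bang=$(grep '!' "$sample")

printf 'brackets: %s\nexclamation: %s\n' "$bracketed" "$bang"
rm -f "$sample"
```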
Now let's have a look at a basic sed command. Let's find the word Joe and replace it with the word All:
sed "s/Joe/All/g" file.txt
This prints the contents of the file with every occurrence of the string Joe replaced by the string All; the file itself is left unchanged. We will dive deeper into this in the upcoming chapters.
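As a small sketch on made-up data, the following compares the substitution with and without the trailing g flag, and checks that the input file is not modified. The sample lines are hypothetical.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'Joe met Joe' 'joe stayed home' > "$sample"

# With g: every occurrence on each line is replaced.
replaced_all=$(sed "s/Joe/All/g" "$sample")

# Without g: only the first occurrence on each line is replaced.
replaced_first=$(sed "s/Joe/All/" "$sample")

# sed wrote its result to standard output; the file still has the old text.
unchanged=$(cat "$sample")

printf 'with g:\n%s\n\nwithout g:\n%s\n' "$replaced_all" "$replaced_first"
rm -f "$sample"
```

The lowercase joe on the second line is untouched in both cases, because the pattern is case sensitive.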
Regular expressions, like Vim, are one of the things many people are afraid of because they seem complicated to learn in the beginning. Although they might seem cryptic, regular expressions are handy companions once mastered: they are not limited to our shell, because the syntax is very similar in most programming languages, databases, editors, and any other place that involves searching for strings. We will go into more detail about regular expressions in the upcoming chapters.