Basic regular expressions

You have a problem and you want to solve it with regular expressions? Now you have two problems! This is just one of the many regular expression jokes on the Internet.

In this section, you will learn how regular expressions work, as we will be using them in the upcoming chapters. We have prepared a file for our playground and if you want to try the grep commands on your own, you can take it from the GitHub repository.

Let's start by opening our text file so we can see its contents, and then splitting the screen so we can see both the file and the command side by side.

The simplest, and probably the most common, regular expression is a single literal word.

For this we will use the grep "joe" file.txt command:


joe is the string we are searching for and file.txt is the file in which we search. You can see that grep printed the line that contained our string, with the word highlighted in another color. The search is case sensitive: the pattern joe does not match Joe or JOE. For a case-insensitive search, grep has the -i option. With it, grep prints the line that contains our word even if the word is in a different case, like JoE, JOE, joE, and so on:

grep -i "joe" file.txt
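The difference between the two searches can be checked on a small throwaway file. This is a minimal sketch: the three sample lines are made up for illustration and are not the contents of the repository's file.txt.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'my name is joe' 'JOE was here' 'nobody else' > "$sample"

# Case-sensitive: only the line with lowercase "joe" is printed
grep "joe" "$sample"

# Case-insensitive: the "joe" and the "JOE" lines are both printed
grep -i "joe" "$sample"
```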


If we don't know exactly which characters appear in the middle of our string, we can use .* to match any sequence of characters. For example, to find a line that contains "word" followed somewhere later by "day", we'd use the grep "word.*day" file.txt command:

. - matches any single character
* - matches the previous character zero or more times

Here you can see that it matched the first line in the file.
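To see what .* does and does not require, here is a small sketch on made-up lines (the sample data is an assumption, not the book's file). It also shows that the pattern only pins the match to the start and end of the line when the ^ and $ anchors are added.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'a word a day' 'day after word' 'wordday' > "$sample"

# "word", then anything (including nothing), then "day":
# prints "a word a day" and "wordday"
grep 'word.*day' "$sample"

# Anchored: the line must start with "word" and end with "day":
# prints only "wordday"
grep '^word.*day$' "$sample"
```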

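A very common scenario is to find empty lines in a file. A truly empty line is matched by grep "^$" file.txt, while the grep "^\s$" file.txt command matches lines that contain exactly one whitespace character:

\s : Stands for a whitespace character (an extension supported by GNU grep),
^ : Matches the beginning of the line.
$ : Matches the end of the line.

Our file has two empty lines with no space in them, so "^\s$" does not match them. If we add a space to those lines, it matches the lines containing one space. ^ and $ are called anchors.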
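The difference between an empty line and a line holding only whitespace is easy to verify. The sketch below uses a made-up four-line file and the POSIX class [[:space:]], which plays the same role as \s but does not depend on GNU extensions.

```shell
# Hypothetical sample: "first", an empty line, a line with one space, "last"
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'first\n\n \nlast\n' > "$sample"

grep -c '^$' "$sample"              # truly empty lines only
grep -c '^[[:space:]]$' "$sample"   # lines with exactly one whitespace character
grep -c '^[[:space:]]*$' "$sample"  # blank lines: empty or whitespace only
```

The last form is usually what is meant by "blank lines", since * also accepts zero whitespace characters.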

grep can also do a neat little trick: counting. With the -c option, it prints the number of matching lines instead of the lines themselves.
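As a small sketch of how -c behaves (the sample lines are invented for illustration), note that it counts lines, not individual occurrences. One way to count occurrences is to print each match on its own line with -o and count those lines.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'joe and joe' 'joe again' 'nobody' > "$sample"

# Number of lines that contain "joe" (two lines here)
grep -c "joe" "$sample"

# Number of occurrences of "joe" (three here): one output line per match
grep -o "joe" "$sample" | wc -l
```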


To find all the lines that contain only letters and spaces, we build the command up piece by piece:

grep
"": Open quotes
^$: From the beginning of the line to the end
[]*: Match these characters any number of times
A-Za-z: Any upper or lower case letter

If we run the command at this point, with only A-Za-z inside the brackets, it matches lines made up entirely of letters. Then:

- if we add 0-9, digits are allowed too, so lines that mix letters and numbers are matched,
- if we add [:space:], whitespace is allowed too, so lines that contain spaces are matched as well.

Because * also accepts zero repetitions, empty lines match every version of the pattern. Inside square brackets grep treats a backslash literally, so \s does not mean whitespace there; the POSIX class [:space:] does:
```
grep "^[A-Za-z0-9[:space:]]*$" file.txt
```
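The step-by-step widening of the bracket expression can be sketched on a few made-up lines (assumed sample data, not the book's file). The sketch uses the POSIX [:space:] class for whitespace inside the brackets.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Hello' 'Room101' 'Hello World 42' 'what?!' > "$sample"

# Letters only: prints "Hello"
grep '^[A-Za-z]*$' "$sample"

# Letters and digits: also prints "Room101"
grep '^[A-Za-z0-9]*$' "$sample"

# Letters, digits and whitespace: also prints "Hello World 42"
grep '^[A-Za-z0-9[:space:]]*$' "$sample"
```

The line 'what?!' is never printed, because ? and ! are not in any of the three sets.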

Sometimes we need to search for lines that do not contain something:

grep "^[^0-9]*$" file.txt

This command finds all the lines that contain no digits at all. [^...] matches any character that is not listed inside the brackets, in our case any non-digit.
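A quick way to convince yourself of what the negated set selects is to compare it with grep's -v option, which inverts a match. The sample lines below are assumptions made for this sketch.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'no digits here' 'route 66' '2024' > "$sample"

# Every character on the line must be a non-digit: prints "no digits here"
grep '^[^0-9]*$' "$sample"

# Same result, phrased as "lines that do not contain a digit"
grep -v '[0-9]' "$sample"
```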

Square brackets are special characters in a regular expression. If we want to search for them literally, we have to escape them with a backslash. So, in order to find lines that have content between square brackets, do this:

grep "\[.*\]" file.txt
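Here is a minimal sketch of the escaped-bracket search on invented sample lines. A line needs both an opening and a closing bracket, in that order, to match.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'see [note 1] below' 'no brackets here' 'only [ an opening one' > "$sample"

# \[ and \] are literal brackets; .* is whatever sits between them.
# Prints only "see [note 1] below"
grep '\[.*\]' "$sample"
```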

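This matches any line that has characters between square brackets. The ! character is not special in a regular expression, but an interactive shell such as bash may treat it as a history expansion inside double quotes, so we put it in single quotes. To find all lines that contain the ! character, type this:

grep '!' file.txt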
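The search for a literal ! can be tried on a few made-up lines. In this sketch the pattern is single-quoted, so the shell passes the ! to grep untouched and no backslash is needed.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Hello!' 'Hello' 'Stop! Now' > "$sample"

# Prints "Hello!" and "Stop! Now"
grep '!' "$sample"
```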

Now let's have a look at basic sed usage. Let's find the word Joe and replace it with the word All:

sed "s/Joe/All/g" file.txt


This prints the file with every occurrence of the string Joe replaced by the string All; the file itself is not modified. We will dive deeper into this in the upcoming chapters.
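The role of the trailing g flag is easiest to see on a line that contains the word twice. The following sketch uses invented sample lines, not the book's file.

```shell
# Hypothetical sample data, not the book's file.txt
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Joe went home' 'Joe met Joe' 'joe is lowercase' > "$sample"

# With g: every "Joe" on each line is replaced ("All met All")
sed "s/Joe/All/g" "$sample"

# Without g: only the first "Joe" on each line is replaced ("All met Joe")
sed "s/Joe/All/" "$sample"

# sed wrote to standard output only; the file still contains "Joe"
grep -c "Joe" "$sample"
```

The lowercase joe line is left alone in both cases, because the match is case sensitive.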

Regular expressions, like Vim, are one of the things many people are afraid of because they seem complicated to learn in the beginning. Although they might look cryptic, regular expressions are handy companions once mastered. They are not limited to our shell: the syntax is very similar in most programming languages, databases, editors, and any other place that involves searching for strings. We will go into more detail about regular expressions in the upcoming chapters.