You can run, but you can't hide… from find

Tens of projects, hundreds of folders, and thousands of files: does this scenario sound familiar? If the answer is yes, then you have probably found yourself more than once in a situation where you couldn't find a specific file. The find command will help us locate any file in our project, and much more. But first, to create a quick playground, let's download the electron open source project from GitHub:

git clone https://github.com/electron/electron

And cd into it:

cd electron

We see here lots of different files and folders, just like in any normal-sized software project. To find a particular file, let's say package.json, we will use:

find . -name package.json

.: This starts the search in the current folder and descends into every subfolder

-name: This matches against the file name, that is, the last component of each path
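To see what matching on the file name means in practice, here is a minimal sketch that runs outside the electron checkout. The scratch directory held in `demo` and the file names in it are made up for this illustration. It shows that -name is compared against the last path component only, so the same name is found at every depth.

```shell
# Build a scratch tree; these paths are illustrative, not part of electron.
demo=$(mktemp -d)
mkdir -p "$demo/app/nested"
touch "$demo/package.json" "$demo/app/package.json" "$demo/app/nested/package.json"
touch "$demo/app/other.json"

# -name is compared against the base name only, never the directories above it.
matches=$(find "$demo" -name package.json | wc -l | tr -d ' ')
echo "found $matches package.json files"

rm -rf "$demo"
```

All three copies are reported, while other.json is ignored because its base name differs.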

If we were to look for all readme files in the project, the previous command format is not helpful, because -name is case sensitive and would miss README.md. We need to issue a case-insensitive find. For demonstration purposes, we will also create a readme.md file:

touch lib/readme.md

We will then use the -iname argument for a case-insensitive search:

find . -iname readme.md

You see here that both readme.md and README.md have been found. Now, if we were to search for all JavaScript files we would use:

find . -name "*.js"

And as you can see, there are quite a few results. To narrow down our results, let's limit the find to the default_app folder:

find default_app -name "*.js"

As you can see, there are only two js files in this folder. And if we wanted to find everything that is not a JavaScript file, we would just add a ! before the -name argument:

find default_app ! -name "*.js"

You can see here all the entries whose names don't end in .js. If we were to look for all entries in a directory that are regular files, we would use the -type f argument:

find lib -type f

In the same way, we'd use -type d to find all directories in a specific location:

find lib -type d
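The negation and -type tests combine naturally. The following sketch uses a throwaway directory (`demo` and the file names are assumptions made for this example) to show that `! -name "*.js"` on its own also reports directories, while adding -type f limits the result to regular files.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/styles"
touch "$demo/src/main.js" "$demo/src/index.html" "$demo/src/styles/site.css"

# Without -type f, the starting directory and its subdirectory also match,
# because their names do not end in .js either.
all_non_js=$(find "$demo/src" ! -name "*.js" | wc -l | tr -d ' ')

# Restricting to regular files leaves only index.html and site.css.
files_non_js=$(find "$demo/src" -type f ! -name "*.js" | wc -l | tr -d ' ')

echo "any type: $all_non_js, regular files only: $files_non_js"
rm -rf "$demo"
```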

Find can also locate files based on their timestamps. For example, in order to find all files in the /usr/share directory that were modified in the last 24 hours, issue the following command:

find /usr/share -mtime -1

On my machine this produces quite a big list, and using -mtime -3 broadens the list even more.

If we were to find, for example, all the files modified in the last hour, we can use -mmin -60:

find ~/.local/share -mmin -60

~/.local/share is a good folder to search, because applications write to it often. If we use -mmin -90, the list broadens again.

Find can also show us the list of files accessed in the last 24 hours by using the -atime -1 argument like so:

find ~/.local/share -atime -1
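To experiment with the time tests without waiting for files to age, we can backdate a few files ourselves. This is a minimal sketch that assumes GNU touch (the relative dates passed to `-d` are a GNU extension). The directory in `demo` and the file names are invented for the example.

```shell
demo=$(mktemp -d)
touch "$demo/fresh.txt"                    # modified just now
touch -d '2 hours ago' "$demo/today.txt"   # GNU touch date syntax (assumption)
touch -d '3 days ago' "$demo/old.txt"

# -mmin counts minutes, -mtime counts whole 24-hour periods.
last_hour=$(find "$demo" -type f -mmin -60 | wc -l | tr -d ' ')
last_day=$(find "$demo" -type f -mtime -1 | wc -l | tr -d ' ')
older=$(find "$demo" -type f -mtime +1 | wc -l | tr -d ' ')

echo "last hour: $last_hour, last day: $last_day, older: $older"
rm -rf "$demo"
```

A minus sign means "less than", a plus sign means "more than", and a bare number means "exactly", which is why -mtime +1 only picks up the three-day-old file.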

While working with lots of project files, it is sometimes the case that some files remain empty and we forget to delete them. In order to locate all empty files, just do a:

find . -empty

As we can see, electron has a few empty files. Find will also show us empty directories; add -type f if you only want the files.

Removing empty files will keep our project clean, but when it comes to reducing size, we sometimes want to know which files are taking up most of the space. Find can also do searches based on file size. For example, let's find all the files larger than 1 MiB:

find . -size +1M

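A minus sign selects smaller files, but be careful with the unit: GNU find rounds sizes up to whole units before comparing, so -size -1M only matches empty files. To find files smaller than 1 MiB, express the limit in a smaller unit, for example -size -1024k.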
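The way find rounds sizes is easy to check on a few files of known size. The sketch below assumes GNU find, whose manual documents this rounding; other implementations may compare sizes differently. The file names are made up for the example.

```shell
demo=$(mktemp -d)
: > "$demo/empty.bin"                                                # 0 bytes
dd if=/dev/zero of="$demo/half.bin" bs=1024 count=512 2>/dev/null    # 512 KiB
dd if=/dev/zero of="$demo/big.bin" bs=1024 count=2048 2>/dev/null    # 2 MiB

# Larger than 1 MiB: only the 2 MiB file.
larger=$(find "$demo" -type f -size +1M -exec basename {} \;)
# -1M rounds sizes up to whole MiB first, so only the empty file qualifies.
rounded=$(find "$demo" -type f -size -1M -exec basename {} \;)
# Expressing the limit in KiB keeps the 512 KiB file in the result.
smaller=$(find "$demo" -type f -size -1024k -exec basename {} \; | sort | tr '\n' ' ')

echo "+1M: $larger | -1M: $rounded | -1024k: $smaller"
rm -rf "$demo"
```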

As we said in the beginning, find can do much more than locate files in your project. Using the -exec argument, it can be combined with almost any other command, which gives it almost infinite capabilities. For example, if we want to find all JavaScript files that contain the text manager, we can combine find with the grep command as follows:

find . -name "*.js" -exec grep -li 'manager' {} \;

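This will execute the grep command on each file returned by find; -l prints only the names of the matching files and -i ignores case. Let's also open one of the files in vim and search inside it, to verify that the result is correct. As you can see, the text "manager" appears in this file. You don't have to worry about {} \; as it's just standard -exec syntax: {} is replaced with the path of the current file, and \; marks the end of the command.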
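Besides \;, -exec accepts a + terminator, which passes many paths to a single invocation of the command instead of starting it once per file. The sketch below makes the difference visible by using echo as the command, on a scratch directory whose name and contents are invented for this example.

```shell
demo=$(mktemp -d)
touch "$demo/a.js" "$demo/b.js" "$demo/c.js"

# With \; find starts one echo per file, so we get one output line per file.
per_file=$(find "$demo" -name "*.js" -exec echo {} \; | wc -l | tr -d ' ')
# With + find appends all the paths to a single echo, which prints one line.
batched=$(find "$demo" -name "*.js" -exec echo {} + | wc -l | tr -d ' ')

echo "lines with \\;: $per_file, lines with +: $batched"
rm -rf "$demo"
```

For commands such as grep that accept several file arguments, the + form usually does the same job with far fewer processes.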

Moving on with the practical examples, let's say you have a folder from which you want to remove all the files modified in the last 100 days. We can see that our default_app folder contains such files. We can combine find with rm like so:

find default_app -type f -mtime -100 -exec rm {} \;
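Because a delete cannot be undone, it is worth listing the matches before removing anything. Here is a minimal sketch of that habit on a scratch directory. It assumes GNU touch for the backdating, and the file names are hypothetical. Note the -type f test: without it, a recently modified directory (including the starting directory itself) would match as well, and a recursive rm would take its older contents with it.

```shell
demo=$(mktemp -d)
touch "$demo/recent.log"
touch -d '200 days ago' "$demo/ancient.log"   # GNU touch date syntax (assumption)

# Dry run: -print only lists what the same tests would select.
to_delete=$(find "$demo" -type f -mtime -100 -print)
echo "would delete: $to_delete"

# Real run: same tests, now with rm. -type f keeps directories out of it.
find "$demo" -type f -mtime -100 -exec rm {} \;

remaining=$(ls "$demo")
echo "left behind: $remaining"
rm -rf "$demo"
```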

This gives us a quick cleanup. Find can also be used for smart backups. For example, to back up all json files in the project, we can combine find with the cpio backup utility, using a pipe and a standard output redirection:

find . -name "*.json" | cpio -o > backup.cpio

We can see that this command has created a backup.cpio file, of type cpio archive.

Now this could probably have been written with -exec as well, but it's important to understand that pipes, together with redirects, can also be used in this type of scenario.
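One thing to keep in mind when piping find into another tool is how file names are separated. The sketch below assumes a find and xargs that support NUL separators (`-print0` and `-0`, available in the GNU and BSD versions) and uses invented file names. It shows a pipeline that still works when a name contains a space.

```shell
demo=$(mktemp -d)
printf '{}\n' > "$demo/plain.json"
printf '{}\n' > "$demo/with space.json"

# xargs splits plain input on whitespace, so "with space.json" would be read
# as two names. NUL-separated paths survive any file name.
total=$(find "$demo" -name "*.json" -print0 | xargs -0 cat | wc -l | tr -d ' ')

echo "lines read from all json files: $total"
rm -rf "$demo"
```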

When doing reports, you may have to count the number of lines written. In order to do this, we combine find with wc -l:
```
find . -iname "*.js" -exec wc -l {} \; 
```
This will give us all js files, each preceded by its number of lines. To keep only the numbers, we can pipe this to cut:
```
find . -iname "*.js" -exec wc -l {} \; | cut -f 1 -d ' ' 
```
Now that the output contains only the line counts, we pipe it to the paste command:
```
find . -iname "*.js" -exec wc -l {} \; | cut -f 1 -d ' ' | paste -sd+ 
```
The above will merge all our lines with the + sign as a delimiter. This, of course, can translate to an arithmetic operation, which we can calculate using the binary calculator (bc):
```
find . -iname "*.js" -exec wc -l {} \; | cut -f 1 -d ' ' | paste -sd+ | bc
```

This last command will tell us how many lines our JavaScript files contain. Of course, these are not actual lines of code, as they can be empty lines or comments. For a precise calculation of lines of code, you can use the sloc utility.
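If only the grand total matters, a shorter route is to let find concatenate the files and count once. This is a sketch of that alternative, not a replacement for the pipeline above; the scratch files are made up so that the expected total is known in advance.

```shell
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/a.js"
printf 'three\n' > "$demo/b.JS"     # upper-case extension, matched by -iname

# cat receives all matching files at once; wc -l then counts every line.
total=$(find "$demo" -type f -iname "*.js" -exec cat {} + | wc -l | tr -d ' ')

echo "total lines: $total"
rm -rf "$demo"
```

The cut, paste, and bc version remains useful when you also want to inspect the per-file numbers along the way.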

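In order to mass rename files, for example to change the file extension to node for all js files, we can use the following command. It relies on the Perl-based rename utility that takes a substitution expression; some distributions ship a different rename with another syntax: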

find . -type f -iname "*.js" -exec rename "s/js$/node/g" {} \;

You can see the rename syntax is quite similar to sed. In addition, there are no more .js files left, as all have been renamed to .node.
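Where that rename utility is not available, the same result can be had with mv and shell parameter expansion. The following is a sketch under that assumption, run on a scratch directory with invented file names rather than on the project itself.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/lib"
touch "$demo/app.js" "$demo/lib/util.js" "$demo/notes.txt"

# For each path handed over by find, strip the trailing .js and append .node.
find "$demo" -type f -name "*.js" -exec sh -c '
  for f do
    mv -- "$f" "${f%.js}.node"
  done
' sh {} +

renamed=$(find "$demo" -type f -name "*.node" | wc -l | tr -d ' ')
left=$(find "$demo" -type f -name "*.js" | wc -l | tr -d ' ')
echo "renamed: $renamed, js files left: $left"
rm -rf "$demo"
```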

Some software projects require all source code files to have a copyright header. As this is often not required in the beginning, we can find ourselves in a situation where we have to add copyright information at the beginning of all our files.

In order to do this, we can combine find with sed like this:

find . -name "*.node" -exec sed -i "1s/^/\/** Copyright 2016 all rights reserved *\/\n/" {} \;

This finds all .node files and adds the copyright notice at the beginning of each one, followed by a new line.

We can check one random file and, yes, the copyright notice is there.
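Running the sed command a second time would add a second header. One way around that, sketched below, is to check the first line of each file before touching it. The `add_header` function, the `header` variable, and the temporary `.tmp` file are names invented for this example, and the approach swaps sed -i for a plain rewrite through a temporary file.

```shell
demo=$(mktemp -d)
printf 'console.log(1)\n' > "$demo/a.node"
header='/** Copyright 2016 all rights reserved */'

add_header() {
  # $1 is the directory to process; the header text is passed to the inner sh.
  find "$1" -type f -name "*.node" -exec sh -c '
    header=$1; shift
    for f do
      # Skip files whose first line already is the header.
      [ "$(head -n 1 "$f")" = "$header" ] && continue
      { printf "%s\n" "$header"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
    done
  ' sh "$header" {} +
}

add_header "$demo"
add_header "$demo"   # the second run finds the header and changes nothing

count=$(grep -c 'Copyright' "$demo/a.node")
first=$(head -n 1 "$demo/a.node")
echo "header lines: $count"
rm -rf "$demo"
```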

In the same way, we can update version numbers in all files:

find . -name pom.xml -exec sed -i "s/<version>4\.02/<version>4.03/g" {} \;

As you can imagine, find has lots of use cases. The examples I've shown you are only the first slice of the pie. Learning find, along with sed and the Git CLI, can set you free from your IDE when it comes to finding files, refactoring, or working with Git. This means you can more easily switch from one IDE to another, because you don't have to learn all of its features. You just use your friendly CLI tools.