Searching for files

In this section, we will learn how to search for files in Linux. The find command, as the name implies, can find files based on versatile criteria. More than that, it can apply an action to every search result while it runs, which is a very useful feature. find accepts some options that change its default behavior, for example, how to treat files that are symbolic links. The first arguments are a list of directories, or starting points, to search in; all the remaining arguments form the search expression. It's important to discuss what a search expression is. It typically consists of tests and actions, and the tests are joined by logical operators. If no operator is given, the -and operator is assumed. If the expression contains no action, the -print action is performed for every file in the search result.

Before we start using the find command, it's important to know how it processes a search. For every file under the search paths, the expressions are evaluated from left to right. By default, find marks a file as a hit only if all the expressions are true. You can change this logical AND behavior by using an OR expression, as we will see later in one of our examples.

The find command lets you create very sophisticated search queries using a broad range of file test expressions. If you search for TESTS in the manual page (man find), you will get a full list of the available tests. For example, you can search for files that were modified or accessed at a specific time in the past, or that have a certain size. As mentioned earlier, the default action is -print for every match. Another very useful action is -exec, which lets you execute a specific command for every match. The find command is very complex and we cannot show you everything here, so for the rest of this section we will show some useful use cases.

You can use the find command without any options or arguments. This is the same as writing find . -print, because without any options and arguments the search path is the current directory and the default action is the print action. This command goes through your current directory and prints all the files and directories, including everything beneath the subdirectories, recursively. It does so because you have not provided any test expression, so it matches every file and directory and applies the print action to it. As mentioned earlier, what makes the find command so powerful is its huge list of test expressions for locating files based on a variety of conditions. Such tests can cover timestamps, permissions, owning users and groups, file type, size, and many other criteria.
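To make the default behavior concrete, here is a minimal sketch that compares find without arguments against find . -print. It assumes GNU find on Linux and uses a throwaway directory from mktemp; the names demo, a.txt, docs, and so on are hypothetical and only serve the illustration.

```shell
# Build a small throwaway tree so the output is predictable.
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"
touch "$demo/a.txt" "$demo/docs/b.txt" "$demo/docs/old/c.txt"

# No arguments: start at the current directory and print every entry.
implicit=$(cd "$demo" && find | sort)

# The same search with the starting point and the action spelled out.
explicit=$(cd "$demo" && find . -print | sort)

# Both listings contain the directory itself (.), two subdirectories
# and three files.
printf '%s\n' "$explicit"

rm -rf "$demo"
```

Both variables should hold the same listing, which is why the shorter form is usually enough for a quick look at a directory tree.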

For the following examples, we will use the root user account set up during installation, because the examples search a lot in system directories, which need special privileges. To search the /etc directory for regular files only, not directories, with the filename logrotate.conf, use the following:

find /etc -type f -name logrotate.conf  

If the file is found, find prints its path; if nothing matches, it prints nothing. What this command does in the background is go through the /etc directory, pick up all the files and subdirectories it contains, and process them recursively one by one. For every entry, it checks whether it is a regular file and whether its name is equal to the given filename. You can also use multiple directories as search starting points and use -type d to search only for directories. The following prints all subdirectory names beneath the /etc and /var directories that start with the letter y:

find /etc /var -type d -name 'y*'

Here, the -name expression takes normal POSIX file globbing characters, not regular expressions. If you want to use regular expressions for a file search, use the -regex expression instead. Note that if you use the -iname expression instead of -name, the search is case-insensitive. You can also search for files using file size as a criterion.
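Before moving on to size-based tests, the name-matching variants just described can be compared side by side. The following is a sketch only: it assumes GNU find on a case-sensitive Linux filesystem and works on a hypothetical miniature copy of /etc and /var inside a temporary directory, so it needs no root privileges.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/etc/yum" "$demo/var/yp" "$demo/var/log"
touch "$demo/etc/logrotate.conf" "$demo/etc/LogRotate.conf" "$demo/etc/yum.conf"

# Regular files only, exact (case-sensitive) name.
exact=$(find "$demo/etc" -type f -name logrotate.conf)

# Same search, case-insensitive: matches both spellings.
nocase=$(find "$demo/etc" -type f -iname logrotate.conf | sort)

# Two starting points, directories only. The glob is quoted so the
# shell passes it to find unexpanded.
ydirs=$(find "$demo/etc" "$demo/var" -type d -name 'y*' | sort)

# -regex is matched against the whole path, not just the file name.
byregex=$(find "$demo/etc" -type f -regex '.*/y[a-z]*\.conf')

printf '%s\n' "$exact" "$nocase" "$ydirs" "$byregex"
rm -rf "$demo"
```

Quoting the pattern matters: an unquoted y* could be expanded by the shell against the current directory before find ever sees it.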

The command find / -type f -size +4M -name 'l*' searches for regular files (not directories) that are larger than 4 MB and whose names start with l. Because it starts in the root directory, it recursively searches the whole filesystem tree. On our example system, only two files match all of these conditions. By the way, the + prefix stands for greater than and the - prefix stands for less than; without a prefix, the size must match exactly. You can also search for specific file permissions. File permissions in general are discussed in later sections. To get a list of all the very dangerous directories with read, write, and execute permissions for everybody, searching the whole filesystem, we use the following command:

find / -type d -perm 777  

Note that if you don't provide any action for the find command, the default print action is assumed, so the command prints every matched file to standard output. We can change that using the -exec action, which runs the command given after -exec for every matched file:

find / -type d -perm 777 -exec chmod 755 {} \;

In our example, the chmod 755 command is applied to every match using the placeholder {}, which stands for the matched file. The find command here searches for all the directories that have the very dangerous permission 777 and changes it to the more moderate permission 755. So if we search again for the dangerous permission, the result will be empty. Why do we have to escape the semicolon? Normally a semicolon in the Bash shell delimits commands, so we have to disable its special meaning here so that it reaches find and marks the end of the -exec command. In all the examples shown so far, all the tests of a single find command must be true for a file to count as a match.
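The permission fix described above can be rehearsed safely before running it against the whole filesystem. This is a minimal sketch assuming GNU find and GNU stat; the directories open and safe are hypothetical and live in a temporary directory.

```shell
demo=$(mktemp -d)
mkdir "$demo/open" "$demo/safe"
chmod 777 "$demo/open"
chmod 755 "$demo/safe"

# Only the world-writable directory is reported.
before=$(find "$demo" -type d -perm 777)

# \; ends the -exec command; chmod runs once for each match.
find "$demo" -type d -perm 777 -exec chmod 755 {} \;

# Searching again finds nothing, and the mode is now 755.
after=$(find "$demo" -type d -perm 777)
mode=$(stat -c '%a' "$demo/open")

printf 'before: %s\nafter: %s\nmode: %s\n' "$before" "$after" "$mode"
rm -rf "$demo"
```

When there are many matches, ending the command with {} + instead of {} \; lets find pass several paths to a single chmod call, which is usually faster.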

Take the earlier command find / -type f -size +4M -name 'l*': a file is matched and printed only if it is a regular file, is larger than 4 MB, and has a name starting with l. All three test expressions have to be true because they are connected by a logical AND, which is the default operator between tests. You can easily change a logical AND to a logical OR using the -or expression, like:

find /etc -type f \( -name 'p*.conf' -or -name 'p*.d' \)
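One detail worth checking when -or is combined with other tests is operator precedence: -and binds more tightly than -or. The sketch below, which assumes GNU find and uses hypothetical file names in a temporary directory, contrasts an ungrouped expression with one grouped by escaped parentheses.

```shell
demo=$(mktemp -d)
mkdir "$demo/profile.d"
touch "$demo/pam.conf" "$demo/passwd.d" "$demo/hosts.conf"

# Without grouping this reads as
#   (-type f AND -name 'p*.conf') OR (-name 'p*.d')
# so the directory profile.d is matched as well.
ungrouped=$(find "$demo" -type f -name 'p*.conf' -or -name 'p*.d' | sort)

# Escaped parentheses make -type f apply to both name tests.
grouped=$(find "$demo" -type f \( -name 'p*.conf' -or -name 'p*.d' \) | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$demo"
```

The parentheses need a backslash (or quotes) for the same reason as the semicolon after -exec: the shell would otherwise interpret them itself.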

The -or command above matches names that start with p and end in .conf or .d. Be aware that -and binds more tightly than -or, so -type f applies to both name tests only when they are grouped inside escaped parentheses, \( ... \); without grouping, the second -name test also matches directories. There are also some very useful test expressions based on the time of a file. For example, find /var -mtime -3 | head outputs files under /var that have been modified within the last three days, with head limiting the output to the first 10 hits. Using -mtime +3 instead matches files that were last modified more than three days ago (find counts age in whole 24-hour periods). Using time-based test expressions is very useful and is often needed in your daily work as a system administrator. For example, if you need to delete all the files uploaded by users of a web application running on your server that are older than 30 days, you could do something like the following:

find /var/www/webapp-uploads -mtime +30 -exec rm {} \;
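Because a deletion like this cannot be undone, it can help to try the time-based tests on sample data first. The following sketch assumes GNU find and GNU touch (for the -d option with relative dates); the upload file names are hypothetical and live in a temporary directory rather than under /var/www.

```shell
demo=$(mktemp -d)
touch -d '40 days ago' "$demo/old-upload.bin"
touch -d '2 days ago'  "$demo/recent-upload.bin"
touch "$demo/new-upload.bin"

# Dry run: list what would be deleted before deleting anything.
candidates=$(find "$demo" -type f -mtime +30)

# Files modified less than three days ago.
recent=$(find "$demo" -type f -mtime -3 | sort)

# The actual cleanup, restricted to regular files.
find "$demo" -type f -mtime +30 -exec rm {} \;
remaining=$(find "$demo" -type f | sort)

printf 'deleted: %s\nremaining:\n%s\n' "$candidates" "$remaining"
rm -rf "$demo"
```

Running the search once without -exec, as in the dry run here, is a cheap way to confirm that the tests select only the intended files.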

The cleanup command above could also easily be put into a script that runs each day, for example as a cron job, to automate deleting all the files that are older than 30 days, so you don't have to take care of this manually anymore. To search the entire filesystem for all the files whose names start with l and whose size is between 1 and 4 MB, use:

find / -type f -size +1M -size -4M -name 'l*'  
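Size tests compare whole units after rounding the file size up, which affects what counts as "between" two sizes. The sketch below assumes GNU find and the truncate tool from GNU coreutils, and creates hypothetical sparse files of known sizes in a temporary directory to show which ones the + and - prefixes select.

```shell
demo=$(mktemp -d)
truncate -s 1M "$demo/lib-1m"     # exactly 1 MiB
truncate -s 2M "$demo/lib-2m"     # 2 MiB
truncate -s 4M "$demo/lib-4m"     # exactly 4 MiB
truncate -s 5M "$demo/lib-5m"     # 5 MiB
truncate -s 5M "$demo/other-5m"   # large, but the name does not start with l

# +4M means strictly more than 4 units, so the 4 MiB file is excluded.
over=$(find "$demo" -type f -size +4M -name 'l*')

# More than 1 unit and fewer than 4 units: only the 2 MiB file qualifies,
# because the 1 MiB and 4 MiB files sit exactly on the boundaries.
range=$(find "$demo" -type f -size +1M -size -4M -name 'l*')

printf 'over 4M: %s\nbetween 1M and 4M: %s\n' "$over" "$range"
rm -rf "$demo"
```

If byte-exact boundaries matter, the c suffix (bytes) can be used instead of M, for example -size +1048576c.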

You can also quickly search for files using the locate command instead of find. You first need to install it; it is provided by the mlocate package. The locate command does not do a live search in the filesystem, but rather uses a snapshot of the filesystem taken at a specific point in time. This database gets updated every day at a certain time, but you can also regenerate the snapshot database yourself using the following:

updatedb  

Now if you run locate logrotate, it will search the database you just generated for all the files whose path contains logrotate. This searches for literal text only. If you want to use regular expressions, use the --regex option.

As we are searching a database, this is usually faster than doing a live search using the find command, but always remember that the database does not reflect the live state of the filesystem. Hence, you can run into problems, especially when searching for files that are newer than the search database.