One of the most important skills for working in the Linux command-line interface is handling files and directories. Just about every administrative task you perform on your Linux system requires working with some type of file. This chapter dives into the topic of handling files and directories from the Linux command line.
For most Linux distributions, when you start a shell session, you are placed in your user Home directory. Most often, you will need to break out of your Home directory to get to other areas in the Linux system. This section describes how to do that using command‐line commands. Before that, though, is a short tour of just what the Linux filesystem looks like.
If you're new to the Linux system, you may be confused by how it references files and directories, especially if you're used to the way that the Microsoft Windows operating system does that. Before exploring the Linux system, it helps to have an understanding of how it's laid out.
The first difference you'll notice is that Linux does not use drive letters in pathnames. In the Windows world, the physical drives installed on the PC determine the pathname of the file. Windows assigns a letter to each physical disk drive, and each drive contains its own directory structure for accessing files stored on it.
For example, in Windows you may be used to seeing file paths such as this:
C:\Users\Rich\My Documents\test.doc
This indicates that the file test.doc is located in the directory My Documents, which itself is located in the directory Rich. The Rich directory is contained under the directory Users, which is located on the hard disk partition assigned the letter C (usually the first hard drive on the PC).
The Windows file path tells you exactly which physical disk partition contains the file named test.doc. If you wanted to save a file on a USB memory stick, you would click the icon for the drive assigned to the memory stick, such as E:, which automatically uses the file path E:\test.doc. This path indicates that the file is located at the root of the drive assigned the letter E, which is often assigned to the first USB storage device plugged into the PC.
This is not the method used by Linux. Linux stores files within a single directory structure, called a virtual directory. The virtual directory contains file paths from all the storage devices installed on the PC, merged into a single directory structure.
The Linux virtual directory structure contains a single base directory, called the root. Directories and files beneath the root directory are listed based on the directory path used to get to them, similar to the way Windows does it.
The tricky part about the Linux virtual directory is how it incorporates each storage device. The first hard drive installed in a Linux PC is called the root drive. The root drive contains the core of the virtual directory. Everything else builds from there.
On the root drive, Linux creates special directories called mount points. Mount points are directories in the virtual directory where you assign additional storage devices.
The virtual directory causes files and directories to appear within these mount point directories, even though they are physically stored on a different drive.
Often the system files are physically stored on the root drive, while user files are stored on a different drive, as shown in Figure 7.1.
In Figure 7.1, there are two hard drives on the PC. One hard drive is associated with the root of the virtual directory (indicated by a single forward slash). Other hard drives can be mounted anywhere in the virtual directory structure. In this example, the second hard drive is mounted at the location /home, which is where the user directories are located.
The Linux filesystem structure has evolved from the Unix file structure. Unfortunately, the Unix file structure has been somewhat convoluted over the years by different flavors of Unix. Linux started out that way too, but a push has been made to standardize the Linux directory structure, resulting in the Linux Filesystem Hierarchy Standard (FHS). Table 7.1 lists some of the more common Linux virtual directory names defined in the FHS.
TABLE 7.1: Common Linux Directory Names
| DIRECTORY | USAGE |
|---|---|
| / | The root of the virtual directory. Normally, no files are placed here |
| /bin | The binary directory, where many GNU user-level utilities are stored |
| /boot | The boot directory, where boot files are stored |
| /dev | The device directory, where Linux creates device nodes |
| /etc | The system configuration files directory |
| /home | The Home directory, where Linux creates user directories |
| /lib | The library directory, where system and application library files are stored |
| /media | The media directory, a common place for mount points used for removable media |
| /mnt | The mount directory, another common place for mount points used for removable media |
| /opt | The optional directory, often used to store optional software packages |
| /root | The root user account's Home directory |
| /sbin | The system binary directory, where many GNU admin-level utilities are stored |
| /tmp | The temporary directory, where temporary work files can be created and destroyed |
| /usr | The user-installed software directory |
| /var | The variable directory, for files that change frequently, such as log files |
When you start a new shell prompt, your session starts in your Home directory, which is a unique directory assigned to your user account. When you create a user account, the system normally assigns a unique directory for the account.
In the Windows world, you're probably used to moving around the directory structure using a graphical interface. To move around the virtual directory from a command-line interface (CLI) prompt, you'll need to learn to use the cd command.
The change directory command (cd) is what you'll use to move your shell session to another directory in the Linux filesystem. The format of the cd command is pretty simple:
cd destination
The cd command may take a single parameter, destination, which specifies the directory name you want to go to. If you don't specify a destination on the cd command, it will take you to your Home directory.
The destination parameter can be expressed using two different methods: as an absolute filepath or as a relative filepath. The following sections describe the differences between these two methods.
You can reference a directory name within the virtual directory using an absolute filepath. The absolute filepath defines exactly where the directory is in the virtual directory structure, starting at the root of the virtual directory. It's sort of like a full name for a directory.
For example, to reference the ssl directory that's contained within the lib directory, which in turn is contained within the usr directory, you would use the following absolute filepath:
/usr/lib/ssl
With the absolute filepath, there's no doubt as to exactly where you want to go. To move to a specific location in the filesystem using the absolute filepath, you just specify the full pathname in the cd command:
sysadmin@ubuntu-server:~$ cd /usr/lib/ssl
sysadmin@ubuntu-server:/usr/lib/ssl$
On most Linux distributions, the prompt shows the current directory for the shell (the tilde represents your user Home directory). You can move to any level within the entire Linux virtual directory structure using the absolute filepath.
However, if you're just working within your own Home directory structure, using absolute filepaths can often get tedious. For example, if you're already in the directory /home/rich, it seems somewhat cumbersome to have to type the following command just to get to your Documents directory:
cd /home/rich/Documents
Fortunately, there's a simpler solution.
Relative filepaths allow you to specify a destination directory relative to your current location, without having to start at the root. A relative filepath doesn't start with a forward slash indicating the root directory.
Instead, a relative filepath starts with either a directory name (if you're traversing to a directory under your current directory) or a special character indicating a location relative to your current directory. The two special characters used for this are as follows:
The single dot (.), to represent the current directory
The double dot (..), to represent the parent directory
The double dot character is extremely handy when you're trying to traverse a directory hierarchy. For example, if you are in the systemd directory under the etc directory and need to go to the ssl directory, also under the etc directory, you can do this:
sysadmin@ubuntu-server:/etc/systemd$ cd ../ssl
sysadmin@ubuntu-server:/etc/ssl$
The double dot character takes you back up one level to the etc directory, and then the /ssl portion takes you back down into the ssl directory. You can use as many double dot characters as necessary to move around. For example, if you are in your Home directory (/home/sysadmin) and want to go to the /etc directory, you could type the following:
sysadmin@ubuntu-server:~$ cd ../../etc
sysadmin@ubuntu-server:/etc$
Of course, in a case like this, you actually have to do more typing to use the relative filepath rather than just typing the absolute filepath, /etc, which would get you to the same place!
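If you'd like to practice both kinds of filepath without touching the real /etc directory, here is a minimal sketch that builds a hypothetical etc/systemd and etc/ssl tree inside a scratch directory (created with mktemp -d, assumed to be available as it is on most Linux distributions) and uses pwd to print the current directory after each move.

```shell
#!/bin/bash
# Hypothetical scratch tree that mimics /etc/systemd and /etc/ssl
base=$(mktemp -d)
mkdir -p "$base/etc/systemd" "$base/etc/ssl"

cd "$base/etc/systemd" || exit 1   # absolute filepath: starts at the root (/)
pwd

cd ../ssl      # relative filepath: up one level to etc, then down into ssl
pwd

cd .           # single dot: stays in the current directory
cd ..          # double dot: moves up to the parent directory (etc)
pwd

# When you're done experimenting: cd && rm -r "$base"
```

Because the scratch path comes from mktemp, the exact names pwd prints will differ on your system; only the structure matches the /etc example.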
One of the things that made the Unix operating system unique when it was first created in the 1970s was that it treats everything on the computer system as a file—hardware devices, data, network connections, everything. That simplified the way programs interact with hardware, and with each other, because no matter where your data comes from, the Unix operating system handles it the same way. While at first that may seem odd, it's what revolutionized the computer world and made the Unix operating system so popular.
However, because everything is a file, there are some issues that you'll run into. One of those issues is trying to identify file types. This section walks you through how Linux handles filenames and provides some hints on how you can determine file types on a Linux system.
Linux files cover a pretty wide range of file types—everything from text data to executable programs. Because Linux files aren't required to use a file extension, it can sometimes be difficult to tell what files are programs, what files are text data, and what files are binary data. Fortunately, there's a command‐line utility that can help.
The file command returns the type of the file specified. If the file is a data file, it also attempts to detect just what the file contains.
$ file myprog.c
myprog.c: C source, ASCII text
$ file myprog
myprog: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=a0758159df7a479a54ef386b970a26f076561dfe, for GNU/Linux 3.2.0, not
stripped
$
In this example, the myprog.c file is a C program text file, and the myprog file is a binary executable program.
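To see that file examines a file's contents rather than its name, you can create a couple of files without extensions and ask file about them. This is a sketch run in a scratch directory; the filenames notes and runme are hypothetical, and the exact descriptions printed depend on the version of file installed.

```shell
#!/bin/bash
# Scratch directory so the demo files don't clutter your Home directory
work=$(mktemp -d)
cd "$work" || exit 1

printf 'hello, world\n' > notes           # plain text, no file extension
printf '#!/bin/bash\necho hi\n' > runme    # a shell script, also without an extension

file notes runme    # file inspects the contents, not the filename
file -b notes       # -b (brief) prints the type without the leading filename
```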
The first thing you'll notice as you peruse the directories on your Linux system is that Linux has a different filenaming standard than Windows. In the Windows world, you're probably used to seeing a three- or four-character file extension added onto each file (such as .docx for Word documents). Linux doesn't require file extensions to identify file types on the system (although they're allowed if you like to use them).
Linux filenames can be quite long: most Linux filesystems, such as ext4, allow up to 255 bytes per filename. Linux filenames can contain uppercase and lowercase letters, numbers, and most special characters (the only characters not allowed are the forward slash and the NULL character). Linux filenames can contain spaces as well as foreign language characters. However, it's a good idea for ordinary filenames to start with a letter or number.
Here are some examples of valid Linux filenames:
testing
A test file.txt
4_data_points
my.long.program.name
Another feature of Linux filenames that is different from Windows is hidden files. Hidden files don't appear in normal file listings. In the Windows world, you use file properties to make a file hidden from normal viewing. When a normal user displays the directory listing, the hidden files don't appear, but when an administrator displays the directory listing, the hidden files magically appear in the output. This feature helps protect important system files from being accidentally deleted or overwritten.
Linux doesn't use file properties to make a file hidden. Instead, it uses the filename. Filenames that start with a period are considered hidden files: they don't appear in a normal listing when you use the ls command and aren't displayed by default in graphical file manager tools. You can, however, use options to display hidden files with the ls command, discussed later in the “File and Directory Listing” section.
To display hidden files with the ls command, you need to include the -a command-line parameter.
$ ls
mydata.txt mydirectory myprog myprog.c
$ ls -a
. .bash_logout mydata.txt myprog.c .viminfo
.. .bashrc mydirectory .profile .zcompdump
.bash_history .cache myprog .sudo_as_admin_successful .zshrc
$
In this example, the ls command by itself shows only a few files in the directory. Adding the -a parameter shows all of the hidden files in the directory as well. Many Linux programs store the settings you make as hidden files in each user's Home directory.
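Here's a small sketch you can run to confirm the leading-period rule for yourself. It works in a scratch directory, and the filenames visible.txt and .hidden_settings are made up for the demo.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
touch visible.txt .hidden_settings   # the leading period makes the second file hidden

echo "--- ls ---"
ls          # shows only visible.txt
echo "--- ls -a ---"
ls -a       # also shows ., .., and .hidden_settings
```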
Linux has to keep track of a lot of information for each file on the system. The way it does that is by using index nodes, also called inodes. The Linux operating system creates an inode for each file on the system to store the file properties. The inodes are hidden from view, and only the operating system can access them. Each inode is also assigned a number, called the inode number. This is what Linux uses to reference the file, not the filename. The inode numbers are unique on each physical disk partition.
Linux also creates a table on each disk partition, called the inode table. The inode table contains a listing that matches each inode number assigned to each file on the disk partition. As you create and delete files, Linux automatically updates the inode table behind the scenes. However, if the system should abruptly shut down (such as due to a power failure), the inode table can become corrupt. Fortunately for us, there are utilities that can help reorganize the inode table and help prevent data loss. Unfortunately, though, if the inode table does become unrepairable, you won't be able to access the files on the disk partition, even if the actual files are still there!
To view the inode number for files, use the -i option with the ls command. You can combine it with the -l option to produce a long listing that shows more details about each file:
$ ls -il
total 36
263784 -rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 copy.c
262804 -rw-rw-r-- 1 sysadmin sysadmin 15 Dec 19 13:26 mydata.txt
395365 drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 13:26 mydirectory
264036 -rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
263784 -rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 myprog.c
$
The first number displayed in the long listing is the inode number for each file. Notice that the inode number for the myprog.c file is the same as for the copy.c file. That means there is a hard link between those two files. A hard link refers to the same inode as the original file, so both names share the same inode number and the same data on disk. We'll look at hard links more closely in the “Linking Files” section.
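You can reproduce the shared inode number with a sketch like the following. It borrows cp -l (covered later in the “Linking Files” section) to make the hard link, assumes GNU cp, and defines a hypothetical helper, inode_of, that pulls the first field from ls -i output.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
printf 'int main(void) { return 0; }\n' > myprog.c
cp -l myprog.c copy.c      # hard link: a second name for the same inode
cp myprog.c separate.c     # ordinary copy: gets an inode of its own

ls -il                     # the first column is the inode number

# Hypothetical helper: print only the inode number of one file
inode_of() { ls -i "$1" | awk '{print $1}'; }
echo "myprog.c=$(inode_of myprog.c) copy.c=$(inode_of copy.c) separate.c=$(inode_of separate.c)"
```

The copy.c and myprog.c values match, while separate.c gets a different number, which is the same pattern the listing above shows.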
The most basic feature of the shell is the ability to see what files are available on the system. The list command (ls) is the tool that helps do that. This section describes the ls command and the options available to format the information it can provide.
In its most basic form, the ls command displays the files and directories located in your current directory (also called the working directory).
sysadmin@ubuntu-server:~$ ls
copy.c mydata.txt mydirectory myprog myprog.c
sysadmin@ubuntu-server:~$
Notice that the ls command produces the listing in alphabetical order (in columns rather than rows). If you're using a terminal emulator that supports color, the ls command may also show different types of entries in different colors. The LS_COLORS environment variable controls this feature. Different Linux distributions set this environment variable depending on the capabilities of the terminal emulator.
If you don't have a color terminal emulator, you can use the -F parameter with the ls command to easily distinguish files from directories. Using the -F parameter produces the following output:
sysadmin@ubuntu-server:~$ ls -F
copy.c mydata.txt mydirectory/ myprog* myprog.c
sysadmin@ubuntu-server:~$
The -F parameter flags the directories with a forward slash to help identify them in the listing. Similarly, it flags executable files (like the myprog file in the previous code snippet) with an asterisk, making it easier to find the files that can be run on the system.
The basic ls command can be somewhat misleading. It shows the files and subdirectories contained in the current directory, but not the hidden files.
The -R parameter is another useful ls parameter. It performs a recursive listing, showing files that are contained within subdirectories in the current directory. If you have lots of subdirectories, this can be quite a long listing. Here's a simple example of what the -R parameter produces:
sysadmin@ubuntu-server:~$ ls -F -R
.:
copy.c mydata.txt mydirectory/ myprog* myprog.c
./mydirectory:
file1 file2 file3
sysadmin@ubuntu-server:~$
Notice that, first, the -R parameter shows the contents of the current directory, which includes a subdirectory (mydirectory). Following that, it traverses all of the subdirectories, showing the files contained within each subdirectory. The mydirectory subdirectory shows three files (file1, file2, and file3). If there had been further subdirectories within the mydirectory subdirectory, the -R parameter would have continued to traverse those as well. As you can see, for large directory structures, this can become quite a large output listing.
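To try the -F and -R parameters on a layout like the one in these examples, here is a sketch that recreates it in a scratch directory. The file contents are placeholders, and myprog here is just a tiny shell script marked executable with chmod so that ls -F flags it.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
mkdir mydirectory
touch mydata.txt mydirectory/file1 mydirectory/file2 mydirectory/file3
printf '#!/bin/bash\necho hi\n' > myprog
chmod +x myprog    # executable, so ls -F appends an asterisk

ls -F              # mydirectory/ and myprog* are flagged
ls -F -R           # also descends into ./mydirectory
```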
As you can see in the basic listings, the ls command doesn't produce a whole lot of information about each file by default. To list additional information, another popular parameter is -l. The -l parameter produces a long listing format, providing more information about each file in the directory.
sysadmin@ubuntu-server:~$ ls -l
total 36
-rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 copy.c
-rw-rw-r-- 1 sysadmin sysadmin 15 Dec 19 13:26 mydata.txt
drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 13:51 mydirectory
-rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
-rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 myprog.c
sysadmin@ubuntu-server:~$
The long listing format lists each file and directory contained in the directory on a single line. Besides the filename, it shows additional useful information. The first line in the output shows the total number of blocks used by the files listed in the directory. Following that, each line contains the following information about each file (or directory):
The file type, such as directory (d), file (-), character device (c), or block device (b)
The file permissions
The number of hard links to the file
The file owner username
The file primary group name
The file size in bytes
The last time the file was modified
The filename or directory name
The -l parameter is a powerful tool to have. Armed with this information, you can see just about any information you need for any file or directory on the system.
There are lots of parameters for the ls command that can come in handy as you do file management. If you use the man command for ls, you'll see several pages of available parameters for you to use to modify the output of the ls command.
The ls command uses two types of command-line parameters: single-letter parameters and full-word parameters. The single-letter parameters are always preceded by a single dash. Full-word parameters are more descriptive and are preceded by a double dash. Many parameters have both a single-letter and full-word version, while some have only one type. Table 7.2 lists some of the more popular parameters that'll help you out with using the ls command.
TABLE 7.2: Some Popular ls Command Parameters
| SINGLE LETTER | FULL WORD | DESCRIPTION |
|---|---|---|
| -a | --all | Don't ignore entries starting with a period. |
| -A | --almost-all | Don't list the . and .. files. |
| | --author | Print the author of each file. |
| -b | --escape | Print octal values for nonprintable characters. |
| | --block-size=size | Calculate the block sizes using size-byte blocks. |
| -B | --ignore-backups | Don't list entries ending with the tilde (~) symbol (used to denote backup copies). |
| -c | | Use the time of the last file status change (ctime) for sorting or, with -l, for display. |
| -C | | List entries by columns. |
| | --color=when | When to use colors (always, never, or auto). |
| -d | --directory | List directory entries instead of contents, and don't dereference symbolic links. |
| -F | --classify | Append file-type indicator to entries. |
| | --file-type | Only append file-type indicators to some filetypes (not executable files). |
| | --format=word | Format output as across, commas, horizontal, long, single-column, verbose, or vertical. |
| -g | | List full file information except for the file's owner. |
| | --group-directories-first | List all directories before files. |
| -G | --no-group | In a long listing, don't display group names. |
| -h | --human-readable | Print sizes using K for kilobytes, M for megabytes, and G for gigabytes. |
| | --si | Same as -h, but use powers of 1000 instead of 1024. |
| -i | --inode | Display the index number (inode) of each file. |
| -l | | Display the long listing format. |
| -L | --dereference | Show information for the original file for a linked file. |
| -n | --numeric-uid-gid | Show numeric userid and groupid instead of names. |
| -o | | List full file information except for the file's group. |
| -r | --reverse | Reverse the sorting order when displaying files and directories. |
| -R | --recursive | List subdirectory contents recursively. |
| -s | --size | Print the allocated size of each file, in blocks. |
| -S | --sort=size | Sort the output by file size. |
| -t | --sort=time | Sort the output by file modification time. |
| -u | | Display the last access time instead of last modification time for all files. |
| -U | --sort=none | Don't sort the output listing. |
| -v | --sort=version | Sort the output by file version. |
| -x | | List entries by line instead of columns. |
| -X | --sort=extension | Sort the output by file extension. |
You can use more than one parameter at a time if you want. The double-dash parameters must be listed separately, but the single-dash parameters can be combined into a string behind the dash. A common combination is the -a parameter to list all files, the -i parameter to list the inode for each file, the -l parameter to produce a long listing, and the -s parameter to list the number of blocks allocated to each file. Combining all of these parameters creates the easy-to-remember -sail parameter:
sysadmin@ubuntu-server:~$ ls -sail
total 132
265238 4 drwxr-xr-x 4 sysadmin sysadmin 4096 Dec 19 13:45 .
262145 4 drwxr-xr-x 3 root root 4096 Nov 4 18:56 ..
262184 8 -rw------- 1 sysadmin sysadmin 4957 Dec 19 13:27 .bash_history
265244 4 -rw-r--r-- 1 sysadmin sysadmin 220 Feb 25 2020 .bash_logout
265240 4 -rw-r--r-- 1 sysadmin sysadmin 3771 Feb 25 2020 .bashrc
265296 4 drwx------ 2 sysadmin sysadmin 4096 Nov 4 18:57 .cache
263784 4 -rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 copy.c
262804 4 -rw-rw-r-- 1 sysadmin sysadmin 15 Dec 19 13:26 mydata.txt
395365 4 drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 13:51 mydirectory
264036 20 -rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
263784 4 -rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 myprog.c
265242 4 -rw-r--r-- 1 sysadmin sysadmin 807 Feb 25 2020 .profile
265302 0 -rw-r--r-- 1 sysadmin sysadmin 0 Nov 4 18:58 .sudo_as_admin_successful
263813 12 -rw------- 1 sysadmin sysadmin 10056 Dec 19 13:17 .viminfo
262207 48 -rw-rw-r-- 1 sysadmin sysadmin 49006 Nov 23 13:36 .zcompdump
265288 4 -rw-rw-r-- 1 sysadmin sysadmin 29 Nov 23 13:36 .zshrc
sysadmin@ubuntu-server:~$
Besides the normal -l parameter output information, you'll see two additional numbers added to each line. The first number in the listing is the file or directory inode number. The second number is the number of blocks allocated to the file.
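As a quick check that combining single-letter parameters behaves the same as listing them separately, this sketch runs the -sail form, the separated single-letter form, and the GNU ls full-word equivalents in a scratch directory and compares the captured output. The full-word spellings assume GNU coreutils ls.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
touch visible .hidden

combined=$(ls -sail)                          # combined single-letter parameters
separate=$(ls -s -a -i -l)                    # the same parameters, listed separately
long_form=$(ls --size --all --inode -l)       # full-word forms (-l has no long form)

[ "$combined" = "$separate" ] && [ "$combined" = "$long_form" ] && echo "identical output"
```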
In Linux, there are a few commands that work for both files and directories, and some that work only for directories. This section discusses the commands that can work only with directories.
There's not much to creating a new directory in Linux; just use the mkdir command:
sysadmin@ubuntu-server:~$ mkdir dir3
sysadmin@ubuntu-server:~$ ls -il
total 40
263784 -rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 copy.c
395394 drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 14:09 dir3
262804 -rw-rw-r-- 1 sysadmin sysadmin 15 Dec 19 13:26 mydata.txt
395365 drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 13:51 mydirectory
264036 -rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
263784 -rw-rw-r-- 2 sysadmin sysadmin 71 Dec 19 13:17 myprog.c
sysadmin@ubuntu-server:~$
The system creates the new directory and assigns it a new inode number.
Removing directories can be tricky, but there's a reason for that. There are lots of opportunities for bad things to happen when you start deleting directories. Bash tries to protect us from accidental catastrophes as much as possible. The basic command for removing a directory is rmdir:
sysadmin@ubuntu-server:~$ rmdir dir3
sysadmin@ubuntu-server:~$ rmdir mydirectory
rmdir: failed to remove 'mydirectory': Directory not empty
sysadmin@ubuntu-server:~$
By default, the rmdir command works only for removing empty directories. Since there is a file in the mydirectory directory, the rmdir command refuses to remove it. The --ignore-fail-on-non-empty parameter doesn't change that; it only suppresses the error message, and the nonempty directory stays where it is.
You can also use the rm command when handling directories. If you try using it without parameters, as you would with files, you'll be somewhat disappointed:
sysadmin@ubuntu-server:~$ rm mydirectory
rm: cannot remove 'mydirectory': Is a directory
sysadmin@ubuntu-server:~$
However, if you really want to remove a directory, you can use the -r parameter to recursively remove the files in the directory and then the directory itself:
sysadmin@ubuntu-server:~$ rm -r mydirectory
sysadmin@ubuntu-server:~$
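The following sketch walks through the whole sequence in a scratch directory: rmdir on an empty directory, rmdir refusing a nonempty one, the GNU rmdir --ignore-fail-on-non-empty parameter silencing the error while leaving the directory in place, and finally rm -r. The directory names mirror the examples above but are otherwise arbitrary.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir3 mydirectory
touch mydirectory/file1

rmdir dir3                                       # works: dir3 is empty
rmdir mydirectory || echo "rmdir refused: mydirectory isn't empty"
rmdir --ignore-fail-on-non-empty mydirectory     # no error message, but nothing is removed
ls                                               # mydirectory is still listed
rm -r mydirectory                                # removes file1, then the directory itself
ls                                               # nothing left
```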
Bash provides lots of commands for manipulating files on the Linux filesystem. This section walks through the basic commands you will need to work with files from the CLI for all your file‐handling needs.
Every once in a while, you will run into a situation where you need to create an empty file. Sometimes applications expect a log file to be present before they can write to it. In these situations, you can use the touch command to easily create an empty file:
sysadmin@ubuntu-server:~$ touch test1
sysadmin@ubuntu-server:~$ ls -il test1
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:22 test1
sysadmin@ubuntu-server:~$
The touch command creates the new file you specify and assigns your username as the file owner.
Notice that the file size is zero, since the touch command just created an empty file. The touch command can also be used to change the access and modification times on an existing file without changing the file contents:
sysadmin@ubuntu-server:~$ touch test1
sysadmin@ubuntu-server:~$ ls -il test1
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test1
sysadmin@ubuntu-server:~$
The modification time of test1 is now updated from the original time. If you want to change only the access time, use the -a parameter. To change only the modification time, use the -m parameter.
By default, touch uses the current time. You can specify the time by using the -t parameter with a specific timestamp value in the form [[CC]YY]MMDDhhmm[.ss]:
sysadmin@ubuntu-server:~$ touch -t 202112251200 test1
sysadmin@ubuntu-server:~$ ls -l test1
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 25 2021 test1
sysadmin@ubuntu-server:~$
Now the modification time for the file is set to a date significantly in the future from the current time.
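Here's a sketch you can use to check these touch behaviors in a scratch directory. It reads the modification time back with date -r, which is GNU date's option for reporting a file's last modification time, so it assumes GNU coreutils.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
touch test1                       # creates an empty file
wc -c < test1                     # prints 0: the file has no contents

touch -t 202112251200 test1       # CCYYMMDDhhmm: December 25, 2021, 12:00
date -r test1 '+%Y-%m-%d %H:%M'   # show test1's modification time

touch -a test1                    # -a updates only the access time
date -r test1 '+%Y-%m-%d %H:%M'   # the modification time is unchanged
```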
Copying files and directories from one location in the filesystem to another is a common practice for system administrators. The cp command provides this feature.
In its most basic form, the cp command uses two parameters, the source object and the destination object:
cp source destination
When both the source and destination parameters are filenames, the cp command copies the source file to a new file with the filename specified as the destination. The new file is a completely separate file, with its own inode and with its access and modification times set to when the copy was made.
sysadmin@ubuntu-server:~$ cp test1 test2
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 25 2021 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:27 test2
sysadmin@ubuntu-server:~$
The new file test2 shows a different inode number, indicating that it's a completely new file. You'll also notice that the modification time for the test2 file shows the time that it was created.
If the destination file already exists, the cp command will overwrite the existing file by default. That can be somewhat dangerous. If you add the -i parameter, the cp command prompts you to answer whether you want to overwrite the destination file:
sysadmin@ubuntu-server:~$ cp -i test1 test2
cp: overwrite 'test2'? y
sysadmin@ubuntu-server:~$
If you don't answer y, the file copy will not proceed.
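The difference between the default overwrite and the -i prompt can be tried in a scratch directory with a sketch like this one. It pipes an "n" answer into cp -i instead of typing it; the file contents are placeholders.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
printf 'new contents\n' > test1
printf 'keep me\n' > test2

# Answer "n" to the overwrite prompt, so cp -i leaves test2 alone.
# "|| true" keeps the demo going whatever exit status cp returns for a declined prompt.
echo n | cp -i test1 test2 || true
cat test2          # still prints: keep me

cp test1 test2     # without -i, cp silently overwrites test2
cat test2          # now prints: new contents
```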
You can also copy a file to an existing directory by specifying the directory as the destination.
sysadmin@ubuntu-server:~$ mkdir dir1
sysadmin@ubuntu-server:~$ cp test1 dir1
sysadmin@ubuntu-server:~$ ls -il dir1
total 0
395391 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:31 test1
sysadmin@ubuntu-server:~$
The new file is now under the dir1 directory, using the same filename as the original.
These examples all used relative pathnames, but you can just as easily use the absolute pathname for both the source and destination objects. To copy a file to the current directory you're in, you can use the dot symbol.
sysadmin@ubuntu-server:~$ cp /home/sysadmin/dir1/test1 .
sysadmin@ubuntu-server:~$ ls -il test1
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test1
sysadmin@ubuntu-server:~$
As with most commands, the cp command has a few command-line parameters to help you. These are shown in Table 7.3.
TABLE 7.3: The cp Command Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| -a | Archive files by preserving their attributes. |
| -b | Make a backup of each existing destination file before overwriting it. |
| -d | Preserve links: copy symbolic links as links rather than the files they point to, and keep hard links between copied files. |
| -f | If an existing destination file can't be opened, remove it and try again. |
| -i | Prompt before overwriting destination files. |
| -l | Create a hard link instead of copying the file. |
| -p | Preserve file attributes if possible. |
| -r | Copy directories recursively (same as -R). |
| -R | Copy directories recursively. |
| -s | Create a symbolic link instead of copying the file. |
| -S suffix | Override the usual backup suffix (~) used for backup copies. |
| -u | Copy the source file only if it has a newer date and time than the destination (update). |
| -v | Verbose mode, explaining what's happening. |
| -x | Restrict the copy to the current filesystem. |
Use the -p parameter to preserve the access and modification times of the original file for the copied file:
sysadmin@ubuntu-server:~$ cp -p test1 test3
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test2
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test3
sysadmin@ubuntu-server:~$
Now, even though the test3 file is a completely new file, it has the same timestamps as the original test1 file.
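To compare a plain copy with a -p copy side by side, this sketch gives the original an old, easy-to-spot timestamp first and then reads the modification times back with GNU date -r (assuming GNU coreutils).

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
touch -t 202001011200 test1     # give the original an old, easy-to-spot timestamp
cp test1 test2                  # plain copy: new inode, modification time is now
cp -p test1 test3               # -p keeps mode, ownership (when permitted), and timestamps

ls -il test*
for f in test1 test2 test3; do
    echo "$f: $(date -r "$f" '+%Y-%m-%d %H:%M')"
done
```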
The -R parameter is extremely powerful. It allows you to recursively copy the contents of an entire directory in one command:
sysadmin@ubuntu-server:~$ cp -R dir1 dir2
sysadmin@ubuntu-server:~$ ls -l dir*
dir1:
total 0
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:31 test1
dir2:
total 0
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:38 test1
sysadmin@ubuntu-server:~$
Now dir2 is a complete copy of dir1.
You can also use wildcard characters in your cp commands:
sysadmin@ubuntu-server:~$ cp test* dir2
sysadmin@ubuntu-server:~$ ls -al dir2
total 8
drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 14:42 .
drwxr-xr-x 5 sysadmin sysadmin 4096 Dec 19 14:38 ..
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:42 test1
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:42 test2
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:42 test3
sysadmin@ubuntu-server:~$
This command copied all of the files that started with test to dir2.
You may have noticed that a couple of the parameters for the cp
command referred to linking files. This is a pretty cool option available in the Linux filesystems. If you need to maintain two (or more) copies of the same file on the system, instead of having separate physical copies, you can use one physical copy and multiple virtual copies, called links. A link is a placeholder in a directory that points to the real location of the file. There are two different types of file links in Linux.
A hard link creates a new directory entry (filename) that points to the same inode, and therefore the same data, as the original file. It isn't a separate copy. When you reference the hard link file, it's just as if you're referencing the original file.
sysadmin@ubuntu-server:~$ cp -l test1 test4
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test2
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test4
sysadmin@ubuntu-server:~$
The ‐l
parameter created a hard link for the test1
file called test4
. The file listing shows that the inode numbers of both the files are the same, indicating that, in reality, they are both the same file. Also notice that the link count (the third item in the listing) now shows that both files have two links.
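You can check the same facts without reading the ls columns by asking stat for the inode number and link count directly. This sketch assumes GNU stat (the -c format option) and uses a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "original data" > test1
cp -l test1 test4          # hard link: a second name for the same inode

# %i = inode number, %h = hard-link count, %n = filename (GNU stat)
stat -c '%i %h %n' test1 test4

# Data written through one name is visible through the other.
echo "more data" >> test4
cat test1
```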
Conversely, the ‐s
parameter creates a symbolic, or soft, link.
sysadmin@ubuntu-server:~$ cp -s test1 test5
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test2
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin 5 Dec 19 14:46 test5 -> test1
sysadmin@ubuntu-server:~$
There are a couple of things to notice in the file listing. First, the new test5 file has a different inode number than the test1 file, indicating that the Linux system treats it as a separate file. Second, the file size is different. A symbolic link stores only the pathname of its source file (here, the five characters in test1), not the actual data in the file. The filename area of the listing shows the relationship between the two files.
Instead of using the cp
command to link files, you can use the ln
command. By default, the ln
command creates hard links. If you want to create a soft link, you'll still need to use the ‐s
parameter.
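Here's a minimal sketch of the ln equivalents of the cp -l and cp -s commands used earlier. It assumes GNU coreutils. The readlink command prints the path that a soft link stores.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "original data" > test1

ln test1 test4        # hard link, same as: cp -l test1 test4
ln -s test1 test5     # soft link, same as: cp -s test1 test5

ls -il test*

# A soft link stores only the path it points to; its size is the
# length of that path (5 bytes for "test1").
readlink test5
stat -c '%s %n' test5
```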
Instead of copying the linked file, you can create another link to the original file. You can have many links to the same file with no problems. However, you should avoid creating soft links to other soft‐linked files. This creates a chain of links that not only can be confusing but also can be easily broken, causing all sorts of problems.
In the Linux world, renaming files is called moving. The mv
command is available to move both files and directories to another location.
sysadmin@ubuntu-server:~$ mv test2 test6
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test1
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin 5 Dec 19 14:48 test5 -> test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test6
sysadmin@ubuntu-server:~$
Notice that moving the file changed the filename but kept the same inode number and timestamp value. However, moving a file that is the target of a soft link causes a problem.
sysadmin@ubuntu-server:~$ mv test1 test8
sysadmin@ubuntu-server:~$ ls -il test*
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin 5 Dec 19 14:48 test5 -> test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test6
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test8
sysadmin@ubuntu-server:~$
The test4
file that uses a hard link still uses the same inode number, which is perfectly fine. However, the test5
file now points to an invalid file, and it is no longer a valid link. If your terminal supports colors, it is most likely displayed in a red font.
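The following sketch shows how you might spot and repair a broken link like test5. It assumes GNU find (the -xtype test) and uses a scratch directory. The test -L check is true for the link itself. The test -e check follows the link and fails because test1 no longer exists.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "data" > test1
ln test1 test4
ln -s test1 test5

mv test1 test8        # rename the file that test5 points to

# test5 still exists as a link (-L), but its target does not (-e follows links).
[ -L test5 ] && [ ! -e test5 ] && echo "test5 is a broken link"

# GNU find: -xtype l matches symbolic links whose target is missing.
find . -xtype l

# Re-point the link at the new name (-f replaces the existing link).
ln -sf test8 test5
cat test5
```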
You can also use the mv
command to move directories.
sysadmin@ubuntu-server:~$ mv dir2 dir4
sysadmin@ubuntu-server:~$
The entire contents of the directory are unchanged. The only thing that changes is the name of the directory.
Most likely at some point in your Linux career, you'll want to be able to delete existing files. Whether it's to clean up a filesystem or to remove a software package, there are always opportunities to delete files.
In the Linux world, deleting is called removing. The command to remove files in Bash is rm
. The basic format of the rm
command is pretty simple.
sysadmin@ubuntu-server:~$ rm -i test3
rm: remove regular empty file 'test3'? y
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin 5 Dec 19 14:48 test5 -> test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test6
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test8
sysadmin@ubuntu-server:~$
You can use the -i parameter to have rm prompt you before removing each file, which is a good safety habit. There's no trashcan in the CLI like there often is in the graphical desktop environment. Once you remove a file, it's gone forever.
There are a few features unique to Linux that you'll need to be aware of when working with files. This section walks you through these features.
The ls, cp, mv, and rm commands are handy, but specifying a single file or directory name in the commands makes them somewhat clunky to work with on the Linux command line. If you want to work with more than one file or directory, you need to use a technique the Linux world calls globbing.
Globbing is basically the use of wildcard characters to represent one or more characters in a file or directory name. That feature allows you to specify a pattern for the shell to match multiple files or directories against. There are two basic globbing characters that you can use.
The question mark is a stand‐in character to represent any single character to match in the filename. For example, you can specify the filename file?.txt in an rm command to remove any file that starts with file, followed by one character, and ending with .txt. Here's an example:
sysadmin@ubuntu-server:~$ ls -il file*
265217 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file.txt
265214 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file11.txt
263855 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:04 file1.txt
265146 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:04 file2.txt
sysadmin@ubuntu-server:~$ rm file?.txt
sysadmin@ubuntu-server:~$ ls -il file*
265217 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file.txt
265214 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file11.txt
sysadmin@ubuntu-server:~$
The rm command uses the glob file?.txt as the parameter. Before rm runs, the shell looks for any file in the directory that matches the pattern and passes the matching filenames to rm. Two files, file1.txt and file2.txt, match the pattern. However, the file11.txt file doesn't match the pattern, as there are two characters between the file and .txt parts of the filename, and the file.txt file doesn't match the pattern, as there aren't any characters between the file and .txt parts of the filename.
You use the asterisk glob character to match zero or more characters in the filename.
sysadmin@ubuntu-server:~$ rm file*
sysadmin@ubuntu-server:~$ ls -il file*
ls: cannot access 'file*': No such file or directory
sysadmin@ubuntu-server:~$
By using the asterisk, Linux matched all of the files, even the file.txt
file! You can use the asterisk in any list, copy, move, or delete operation in the command line.
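The shell (Bash), not rm or ls, expands glob characters into matching filenames before the command runs. To see what a pattern will match before deleting anything, you can hand it to echo, as in this sketch run in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch file.txt file1.txt file2.txt file11.txt

# echo simply prints the filenames the shell expanded the pattern to.
echo file?.txt        # file1.txt file2.txt
echo file*.txt        # all four files

# Quoting stops the shell from expanding the pattern.
echo 'file?.txt'      # file?.txt

# If nothing matches, the shell passes the pattern through unchanged,
# which is why ls reported "cannot access 'file*'" earlier.
rm file*
echo file*            # file*
```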
Another issue you may run into with Linux is files or directories that contain spaces in their names. This is perfectly legal in Linux, but it can cause headaches when you're working from the command line.
If you try to reference a file or directory that contains a space in the filename, you'll get several error messages:
sysadmin@ubuntu-server:~$ ls -l long*
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:09 'long file name.txt'
sysadmin@ubuntu-server:~$ rm long file name.txt
rm: cannot remove 'long': No such file or directory
rm: cannot remove 'file': No such file or directory
rm: cannot remove 'name.txt': No such file or directory
sysadmin@ubuntu-server:~$
The problem is that the shell uses spaces to separate the parameters on a command line, so rm receives three separate filenames to remove: long, file, and name.txt!
To get around that, you need to use quoting, which places quotes around any filenames that contain spaces:
sysadmin@ubuntu-server:~$ rm 'long file name.txt'
sysadmin@ubuntu-server:~$ ls -il long*
ls: cannot access 'long*': No such file or directory
sysadmin@ubuntu-server:~$
You can use either single or double quotes around the filename, as long as you use the same type on both ends of the filename.
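Besides quoting, you can escape each space with a backslash. This sketch shows three equivalent ways to pass the name as a single argument. One difference between the quote types is worth knowing: inside double quotes the shell still expands variables such as $HOME, while single quotes keep everything literal.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch 'long file name.txt'

# Each of these passes the name to ls as ONE argument:
ls -l 'long file name.txt'     # single quotes
ls -l "long file name.txt"     # double quotes
ls -l long\ file\ name.txt     # a backslash escapes each space

rm long\ file\ name.txt
```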
One last thing to watch out for when using the Linux file handling command‐line commands is the case of any file or directory names that you're working with. Linux is a case‐sensitive operating system, so names that differ only in case (such as file1.txt and File1.txt) refer to different files. As you're working with files, make sure you use the correct case in file and directory names.
sysadmin@ubuntu-server:~$ ls -il *.txt
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:13 file1.txt
263855 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:12 File1.txt
sysadmin@ubuntu-server:~$ rm file1.txt
sysadmin@ubuntu-server:~$ ls -il *.txt
263855 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:12 File1.txt
sysadmin@ubuntu-server:~$
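This is a good example of when filename globbing can come in handy. If you're not sure of the case of a character, you can use the question mark to represent that character in either case. Keep in mind that the question mark matches any single character, not just the uppercase and lowercase versions of one letter.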
$ rm ?ile1.txt
$
This command will remove both the file1.txt
and File1.txt
files.
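Because the question mark matches any single character, ?ile1.txt would also match a file such as mile1.txt. The shell's bracket expressions, which go beyond the two basic glob characters covered here, let you list exactly which characters are allowed. A sketch in a scratch directory (the filenames are illustrative):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch file1.txt File1.txt mile1.txt

# ? matches ANY single character, so this pattern also catches mile1.txt.
echo ?ile1.txt

# A bracket expression matches only the characters listed: f or F.
rm [fF]ile1.txt
ls
```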
With so many files stored on the Linux system, it can often become difficult to find the files you're looking for. Fortunately, Linux provides a few different file‐searching features to help. This section looks at the ones you'll most likely use as a Linux systems administrator.
You can use the which command to find where programs and utilities are stored. This can come in handy if you have two versions of a program installed on your system or if you're not sure whether a command is supplied as a separate utility. Keep in mind that which searches only the directories listed in your PATH environment variable, so it doesn't report commands built into the shell; the Bash type command does. The format for using the which command is pretty simple.
$ which touch
/usr/bin/touch
$
The output of the which
command shows the full path to where the command is stored on the system. If you have two versions of a program on the system, the which
command shows which one will run when you type it at the command line. If you need to use a different version, you have to use the full path to the program file.
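Here is a short comparison, assuming a Bash shell. The which command only searches the directories in your PATH. The shell's type built‐in also reports aliases, functions, and built‐in commands. The which -a option lists every matching program file rather than only the first. The exact output depends on your system.

```shell
# which searches only the directories listed in $PATH.
which ls

# type is a shell built-in that also reports aliases, functions, and built-ins.
type cd       # for example: "cd is a shell builtin"
type ls

# which -a lists every match in $PATH, in search order, not just the first.
which -a ls
```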
Many Linux distributions contain the locate
command by default. If your distribution doesn't, it's usually found in the mlocate
software package.
The locate
command uses a database that keeps track of the location of files on the system. When you use the locate
command, it searches the database to find the requested file. This process is often quicker than trying to search through all of the files on the filesystem.
The key to the locate
command is the information in the database. It can only find files that have been indexed into the database. The information in the database is updated by the appropriately named updatedb
program. The Linux system runs the updatedb
program in background mode on a regular basis to update the file database with any new files stored on the system.
Be careful when using the locate
command, though; you may get more than you bargained for! Here's an example:
sysadmin@ubuntu-server:~$ locate touch
/snap/core18/1932/bin/touch
/snap/core18/1932/lib/udev/hwdb.d/70-touchpad.hwdb
/snap/core18/1932/lib/udev/rules.d/70-touchpad.rules
/snap/core18/1932/usr/bin/touch
/snap/core18/1944/bin/touch
/snap/core18/1944/lib/udev/hwdb.d/70-touchpad.hwdb
/snap/core18/1944/lib/udev/rules.d/70-touchpad.rules
/snap/core18/1944/usr/bin/touch
/usr/bin/touch
…
sysadmin@ubuntu-server:~$
The locate command returns any file whose full pathname contains the word touch, not just files named touch! You'll need to filter through the results to find the file that you're looking for.
Another downside to the locate
command is that it can't find any newly added files until the next running of the updatedb
program. Some Linux systems run the updatedb
program on a regular basis, every few minutes, while others schedule it to run only once or twice a day. How often you need to run it depends on how often you get new files on your Linux system and how quickly you'd need to find them.
The whereis command is similar to the which command in that it looks for a specific occurrence of the file you're searching for. However, it looks only in binary file directories, library directories, and documentation directories, which helps speed up the search. This is great for finding not only commands but also the documentation files that go along with them.
sysadmin@ubuntu-server:~$ whereis touch
touch: /usr/bin/touch /usr/share/man/man1/touch.1.gz
sysadmin@ubuntu-server:~$
In this example, the whereis
command returned the location of the touch
program file and the location of the manual page associated with the touch
program.
The last resort to finding files on your Linux system is the find
command. It does a physical search through the virtual directory tree looking for the specified file. As you can imagine, the wider the search area, the longer it will take for the find
command to return an answer. You specify the search area as the first parameter on the find
command line.
sysadmin@ubuntu-server:~$ find /home/sysadmin -name myprog -print
/home/sysadmin/myprog
sysadmin@ubuntu-server:~$
This example restricts the find command to looking in the /home/sysadmin directory structure for the file named myprog. The -name option specifies the filename to look for, and the -print action tells the find command to display the results. (When you don't specify any action, find prints the matching files by default, so -print is optional here.)
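Here is a minimal sketch of find in a scratch directory (the file and directory names are illustrative). A wildcard pattern given to -name should be quoted. Otherwise, the shell may expand it before find ever sees it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p docs/old
touch myprog docs/notes.txt docs/old/todo.txt

# Exact name match anywhere under the current directory.
find . -name myprog -print

# Quote wildcard patterns so find, not the shell, matches them.
# -type f restricts matches to regular files; with no action given,
# find prints each match by default.
find . -type f -name '*.txt'
```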
What makes the find command so versatile is that it can find files based on lots of different criteria besides just the filename, such as the access or modification time, the file owner, the file size, or even file permissions. For example, you can use the find command to look for all files over 1 MB on your filesystem. Table 7.4 shows some of the options you can use in the find command.
TABLE 7.4: Useful find Command Options
OPTION | DESCRIPTION |
---|---|
-amin n | File was last accessed n minutes ago. |
-atime n | File was last accessed n days ago. |
-ctime n | File's status was last changed n days ago (the status changes when the file's contents or attributes, such as permissions, change). |
-inum n | Match the file inode number to the number specified. |
-name pattern | Match the filename to the pattern specified. |
-perm mode | Match the file permissions to the mode specified. |
-size n | Match the file size to the amount specified. Without a suffix, n counts 512‐byte blocks; add c for bytes, k for kibibytes, or M for mebibytes. |
-user name | Match the file owner to the name specified. |
You can also use special modifiers on the find options, such as a plus sign for “greater than” or a minus sign for “less than.” For example, to list all of the files larger than 5,000 bytes, you'd use the following:
sysadmin@ubuntu-server:~$ find . -size +5000c -print
./.zcompdump
./myprog
./.viminfo
sysadmin@ubuntu-server:~$
The +5000c parameter tells the find command to look for files in the current directory and its subdirectories that are more than 5,000 bytes in size (the c suffix stands for bytes).
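The sketch below combines a few of the options from Table 7.4 with -mtime, a close relative of -atime that tests the modification time. It assumes GNU coreutils and findutils. It creates its own test files in a scratch directory so the results are predictable. The last command searches the whole system for files larger than 1 MiB, so it is left commented out because it can take a while.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

head -c 6000 /dev/zero > big.dat      # 6,000 bytes
head -c 100 /dev/zero > small.dat     # 100 bytes
touch -d '60 days ago' old.log        # back-date the modification time

find . -type f -size +5000c           # more than 5,000 bytes
find . -type f -size -5000c           # fewer than 5,000 bytes
find . -type f -mtime +30             # modified more than 30 days ago
find . -type f -user "$(id -un)"      # owned by the current user

# Files over 1 MiB anywhere on the system (error messages for
# unreadable directories are discarded):
# find / -type f -size +1M 2>/dev/null
```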
Storing data can get ugly. The more data you need to store, the more disk space it requires. While disk sizes are getting larger these days, there is still a limit to how much space you have. To help with that, you can use some Linux file archiving tools to compress data files for storage and sharing. This section takes a look at how Linux handles compressing and archiving both files and directories.
If you've done any work in the Microsoft Windows world, no doubt you've used zip files. The PKZip compression utility became the de facto way to compress data and executable files in Windows, so much so that Microsoft eventually incorporated it into the Windows operating system, starting with XP, as the compressed directories feature. Compressed directories allow you to easily compress large files or a large group of files into a smaller file that takes up less space and is easier to copy to another location.
Linux provides a few different tools you can use to compress files to save space. While this may sound great, it can sometimes lead to confusion and chaos when trying to download and extract Linux files from the Internet. Table 7.5 lists the different file compression utilities available in Linux.
TABLE 7.5: Linux File Compression Utilities
UTILITY | FILE EXTENSION | DESCRIPTION |
---|---|---|
bzip2 | .bz2 | Uses the Burrows‐Wheeler block sorting text compression algorithm and Huffman coding |
compress | .Z | Original Unix file compression utility (Lempel‐Ziv‐Welch coding), but starting to fade away into obscurity |
gzip | .gz | The GNU Project's compression utility; uses the open DEFLATE algorithm (Lempel‐Ziv coding combined with Huffman coding) |
xz | .xz | A general‐purpose compression utility gaining in popularity |
zip | .zip | The Unix version of the PKZip program for Windows |
The compress utility can work with files compressed on standard Unix systems, but it's not often installed by default on Linux systems. If you download a file with a .Z
extension, you can usually install the compress package from the distribution software repository. The zip
utility creates compressed directories that can be extracted on Windows systems, but it's not the best compression algorithm to use if you're keeping the files on a Linux system.
The gzip utility is the most popular compression tool used in Linux. It is a creation of the GNU Project, in its attempt to create a free version of the original Unix compress utility. This package includes three main files:
- gzip for compressing files
- zcat (named gzcat on some other Unix systems) for displaying the contents of compressed text files
- gunzip for uncompressing files
The gzip command compresses the file you specify on the command line. You can also specify more than one filename or even use wildcard characters to compress multiple files at once.
$ gzip my*
$
This gzip
command compresses every file in the directory that starts with my
.
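Here is the gzip family in action, as a minimal sketch in a scratch directory (myfile.txt is an illustrative name). Note that gzip replaces the original file with the compressed .gz file rather than keeping both.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'line one\nline two\n' > myfile.txt

gzip myfile.txt           # replaces myfile.txt with myfile.txt.gz
ls

zcat myfile.txt.gz        # prints the decompressed text; the .gz file is untouched

gunzip myfile.txt.gz      # restores myfile.txt and removes the .gz file
ls
```

If you want to keep the original file when compressing, gzip 1.6 and later accept the -k (keep) option.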
Although the gzip command works well for compressing data, it compresses each file separately rather than bundling multiple files into a single archive file. By far the most popular archiving tool used in Unix and Linux is the tar command.
The tar
command was originally used to back up files to a tape device for archiving. However, it can also write the output to a file, which has become a popular way to bundle data for distribution in Linux. It's common to see source code files bundled into a tar
archive file (affectionately called a tarball) for distribution.
The following is the format of the tar
command:
tar function [options] object1 object2
The function
parameter defines what the tar
command should do, as shown in Table 7.6.
TABLE 7.6: The tar Command Functions
FUNCTION | DESCRIPTION |
---|---|
-A | Appends an existing tar archive file to another tar archive file |
-c | Creates a new tar archive file |
-d | Checks the differences between a tar archive file and the filesystem files |
-r | Appends files to an existing tar archive file |
-t | Lists the contents of an existing tar archive file |
-u | Appends files to an existing tar archive file that are newer than a file with the same name in the archive |
-x | Extracts files from an existing tar archive file |
Each function uses one or more options to define a specific behavior for the tar archive file. Table 7.7 shows the options that you can use with the tar
command.
TABLE 7.7: The tar Command Options
OPTION | DESCRIPTION |
---|---|
-C dir | Changes to the specified directory |
-f file | Uses the specified archive file (or device) |
-j | Redirects output to the bzip2 command for compression |
-p | Preserves all file permissions |
-v | Lists files as they are processed |
-z | Redirects the output to the gzip command for compression |
While the combination of several functions along with several options seems like an impossible task to remember, in reality you'll find yourself just using a handful of combinations to do common tasks. The following section takes a look at the more common archiving scenarios that you'll run into.
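Normally, there are just three basic things you'll need to do with the tar command: create a new archive file, display the contents of an existing archive file, and extract the files from an archive file. This helps narrow down the function and option features that you need to remember for the tar command.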
To start, you can create a new archive file using this command:
tar -cvf test.tar test/ test2/
This command creates an archive file called test.tar containing the contents of both the test directory and the test2 directory. The -f option specifies the name of the archive file, and the -v option is a nice feature in that it displays the files as they are added to the archive file.
Next, to display the contents of a tarball file, you just use this command:
tar -tf test.tar
The ‐t
function lists the contents of the tarball to the standard output by default, which is your monitor. The files aren't extracted, just listed.
Finally, to extract the files contained in a tarball, you'll use this command:
tar -xvf test.tar
It extracts the contents of the tar file test.tar
into the current directory. If the tar file was created from a directory structure, the entire directory structure is re‐created starting at the current directory.
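The same three tasks work with a compressed tarball. Adding the -z option from Table 7.7 passes the archive through gzip. Here is a minimal sketch, assuming GNU tar and illustrative directory names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir test test2
echo "alpha" > test/a.txt
echo "beta" > test2/b.txt

# -z filters the archive through gzip; .tar.gz (or .tgz) is the usual suffix.
tar -czvf test.tar.gz test/ test2/

# List the contents without extracting.
tar -tzf test.tar.gz

# Extract into a different directory with -C (it must already exist).
mkdir restore
tar -xzvf test.tar.gz -C restore
cat restore/test/a.txt
```

With GNU tar, -C restore tells tar to change into the restore directory before extracting. The archive's directory structure is then re‐created there instead of in the current directory.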
As you can see, using the tar
command is a simple way to create archive files of entire directory structures. That's why this has become a common method for distributing source code files for open source applications in the Linux world!
What command should you use to determine those file types?
/usr/bin/grep
/usr/bin/zcat
/etc/hosts
~/.bashrc
ls command is how to list the contents of directories from the command prompt. While there are lots of parameters associated with the ls command, you'll soon find yourself using just a handful of them to view the information that you need.
/share/HR/employees.txt. However, the user doesn't know who owns the file to ask for permission to access the file. What command and parameters should you use to determine the owner of the file?
which, locate, and whereis commands can be useful for general searches, but the find command allows you to customize your search by specifying the file or directory properties to look for.
gzip family of commands is a popular option. For archiving multiple files into a single file, the tar command is common. You can also compress a tar archive file to facilitate moving it to off‐site storage.
/Engineering directory and compress it?