Chapter 7
Exploring Linux File Management

One of the most important functions of working in the Linux command‐line interface is handling files and directories. Just about every administrative task you perform on your Linux system requires working with some type of file. This chapter dives into the topic of handling files and directories from the Linux command line.

Filesystem Navigation

For most Linux distributions, when you start a shell session, you are placed in your user Home directory. Most often, you will need to break out of your Home directory to get to other areas in the Linux system. This section describes how to do that using command‐line commands. Before that, though, is a short tour of just what the Linux filesystem looks like.

The Linux Filesystem

If you're new to the Linux system, you may be confused by how it references files and directories, especially if you're used to the way that the Microsoft Windows operating system does that. Before exploring the Linux system, it helps to have an understanding of how it's laid out.

The first difference you'll notice is that Linux does not use drive letters in pathnames. In the Windows world, the physical drives installed on the PC determine the pathname of the file. Windows assigns a letter to each physical disk drive, and each drive contains its own directory structure for accessing files stored on it.

For example, in Windows you may be used to seeing file paths such as this:

C:\Users\Rich\My Documents\test.doc

This indicates that the file test.doc is located in the directory My Documents, which itself is located in the directory Rich. The Rich directory is contained under the directory Users, which is located on the hard disk partition assigned the letter C (usually the first hard drive on the PC).

The Windows file path tells you exactly which physical disk partition contains the file named test.doc. If you wanted to save a file on a USB memory stick, you would click the icon for the drive assigned to the memory stick, such as E:, which automatically uses the file path E:\test.doc. This path indicates that the file is located at the root of the drive assigned the letter E, which is often assigned to the first USB storage device plugged into the PC.

This is not the method used by Linux. Linux stores files within a single directory structure, called a virtual directory. The virtual directory contains file paths from all the storage devices installed on the PC, merged into a single directory structure.

The Linux virtual directory structure contains a single base directory, called the root. Directories and files beneath the root directory are listed based on the directory path used to get to them, similar to the way Windows does it.

The tricky part about the Linux virtual directory is how it incorporates each storage device. The first hard drive installed in a Linux PC is called the root drive. The root drive contains the core of the virtual directory. Everything else builds from there.

On the root drive, Linux creates special directories called mount points. Mount points are directories in the virtual directory where you assign additional storage devices.

The virtual directory causes files and directories to appear within these mount point directories, even though they are physically stored on a different drive.

Often the system files are physically stored on the root drive, while user files are stored on a different drive, as shown in Figure 7.1.

In Figure 7.1, there are two hard drives on the PC. One hard drive is associated with the root of the virtual directory (indicated by a single forward slash). Other hard drives can be mounted anywhere in the virtual directory structure. In this example, the second hard drive is mounted at the location /home, which is where the user directories are located.

The Linux filesystem structure has evolved from the Unix file structure. Unfortunately, the Unix file structure has been somewhat convoluted over the years by different flavors of Unix. While Linux started out that way too, a push has been made to standardize the Linux directory structure through the Filesystem Hierarchy Standard (FHS). Table 7.1 lists some of the more common Linux virtual directory names defined in the FHS.

FIGURE 7.1: The Linux file structure

TABLE 7.1: Common Linux Directory Names

DIRECTORY	USAGE
`/`	The root of the virtual directory. Normally, no files are placed here
`/bin`	The binary directory, where many GNU user‐level utilities are stored
`/boot`	The boot directory, where boot files are stored
`/dev`	The device directory, where Linux creates device nodes
`/etc`	The system configuration files directory
`/home`	The Home directory, where Linux creates user directories
`/lib`	The library directory, where system and application library files are stored
`/media`	The media directory, a common place for mount points used for removable media
`/mnt`	The mount directory, another common place for mount points used for removable media
`/opt`	The optional directory, often used to store optional software packages
`/root`	The root user account's Home directory
`/sbin`	The system binary directory, where many GNU admin‐level utilities are stored
`/tmp`	The temporary directory, where temporary work files can be created and destroyed
`/usr`	The user‐installed software directory
`/var`	The variable directory, for files that change frequently, such as log files

When you start a new shell prompt, your session starts in your Home directory, which is a unique directory assigned to your user account. When you create a user account, the system normally assigns a unique directory for the account.

In the Windows world, you're probably used to moving around the directory structure using a graphical interface. To move around the virtual directory from a command‐line interface (CLI) prompt, you'll need to learn to use the cd command.

Traversing Directories

The change directory command (cd) is what you'll use to move your shell session to another directory in the Linux filesystem. The format of the cd command is pretty simple:

cd destination

The cd command may take a single parameter, destination, which specifies the directory name you want to go to. If you don't specify a destination on the cd command, it will take you to your Home directory.
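A quick way to confirm the no-destination behavior is a small bash sketch like the one below. It assumes a bash shell, where cd with no argument changes to the directory named by the HOME environment variable; /tmp is just an arbitrary starting point.

```shell
#!/usr/bin/env bash
# Move somewhere other than the Home directory first (/tmp is arbitrary).
cd /tmp || exit 1
echo "Before: $PWD"

# With no destination, cd goes to the directory named by $HOME.
cd || exit 1
echo "After:  $PWD"
home_result="$PWD"
```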

The destination parameter, though, can be expressed using two different methods.

An absolute filepath
A relative filepath

The following sections describe the differences between these two methods.

ABSOLUTE FILEPATHS

You can reference a directory name within the virtual directory using an absolute filepath. The absolute filepath defines exactly where the directory is in the virtual directory structure, starting at the root of the virtual directory. It's sort of like a full name for a directory.

Thus, to reference the ssl directory that's contained within the lib directory, which in turn is contained within the usr directory, you would use the following absolute filepath:

/usr/lib/ssl

With the absolute filepath, there's no doubt as to exactly where you want to go. To move to a specific location in the filesystem using the absolute filepath, you just specify the full pathname in the cd command.

sysadmin@ubuntu-server:~$ cd /usr/lib/ssl
sysadmin@ubuntu-server:/usr/lib/ssl$

On most Linux distributions, the prompt shows the current directory for the shell (the tilde represents your user Home directory). You can move to any level within the entire Linux virtual directory structure using the absolute filepath.

However, if you're just working within your own Home directory structure, using absolute filepaths can often get tedious. For example, if you're already in the directory /home/rich, it seems somewhat cumbersome to have to type the following command just to get to your Documents directory:

cd /home/rich/Documents

Fortunately, there's a simpler solution.

RELATIVE FILEPATHS

Relative filepaths allow you to specify a destination directory relative to your current location, without having to start at the root. A relative filepath doesn't start with a forward slash indicating the root directory.

Instead, a relative filepath starts with either a directory name (if you're traversing to a directory under your current directory) or a special character indicating a relative location to your current directory location. The two special characters used for this are as follows:

The dot (.) to represent the current directory
The double dot (..) to represent the parent directory

The double dot character is extremely handy when you're trying to traverse a directory hierarchy. For example, if you are in the systemd directory under the etc directory and need to go to the ssl directory, also under the etc directory, you can do this:

sysadmin@ubuntu-server:/etc/systemd$ cd ../ssl
sysadmin@ubuntu-server:/etc/ssl$

The double dot character takes you back up one level to the etc directory, and then the /ssl portion takes you back down into the ssl directory. You can use as many double dot characters as necessary to move around. For example, if you are in your Home directory (/home/sysadmin) and want to go to the /etc directory, you could type the following:

sysadmin@ubuntu-server:~$ cd ../../etc
sysadmin@ubuntu-server:/etc$

Of course, in a case like this, you actually have to do more typing to use the relative filepath rather than just typing the absolute filepath, /etc, which would get you to the same place!
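The following bash sketch replays the ../ssl example from above inside a throwaway directory, so it doesn't depend on your real /etc layout. The sandbox variable and the mktemp scratch directory are illustration-only assumptions; the point is that .. climbs one level and . stays put.

```shell
#!/usr/bin/env bash
# Build a throwaway copy of the etc/systemd and etc/ssl layout under a
# temporary directory, so nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/etc/systemd" "$sandbox/etc/ssl"

# Absolute filepath: starts with / and spells out the full path from the root.
cd "$sandbox/etc/systemd" || exit 1
echo "Start:        $PWD"

# Relative filepath: .. goes up to etc, then ssl goes back down.
cd ../ssl || exit 1
echo "After ../ssl: $PWD"
relative_result="$PWD"

# . is the current directory, so this cd doesn't move anywhere.
cd . || exit 1
dot_result="$PWD"

# Leave the sandbox and remove it.
cd / && rm -rf "$sandbox"
```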

Linux Files

One of the things that made the Unix operating system unique when it was first created in the 1970s was that it treats everything on the computer system as a file—hardware devices, data, network connections, everything. That simplified the way programs interact with hardware, and with each other, because no matter where your data comes from, the Unix operating system handles it the same way. While at first that may seem odd, it's what revolutionized the computer world and made the Unix operating system so popular.

However, because everything is a file, there are some issues that you'll run into. One of those issues is trying to identify file types. This section walks you through how Linux handles filenames and provides some hints on how you can determine file types on a Linux system.

Determining File Types

Linux files cover a pretty wide range of file types—everything from text data to executable programs. Because Linux files aren't required to use a file extension, it can sometimes be difficult to tell what files are programs, what files are text data, and what files are binary data. Fortunately, there's a command‐line utility that can help.

The file command returns the type of the file specified. If the file is a data file, it also attempts to detect just what the file contains.

$ file myprog.c
myprog.c: C source, ASCII text
$ file myprog
myprog: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
 interpreter /lib64/ld-linux-x86-64.so.2,
 BuildID[sha1]=a0758159df7a479a54ef386b970a26f076561dfe, for GNU/Linux 3.2.0, not
 stripped
$

In this example, the myprog.c file is a C program text file, and the myprog file is a binary executable program.
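The file command doesn't depend on filename extensions. Among its tests, it compares the first bytes of a file against known signatures, often called magic numbers. As a rough illustration of that one idea only, and not of how file works in full, this sketch reads the first four bytes of the ls executable and checks them against the ELF signature (the byte 0x7f followed by the letters ELF). It assumes a Linux system where ls is an ELF binary and head, od, and tr are available.

```shell
#!/usr/bin/env bash
# Locate the ls executable (an ELF binary on typical Linux systems).
target=$(command -v ls)

# Read the first four bytes as hex with no offset column, then strip spacing.
magic=$(head -c 4 "$target" | od -An -tx1 | tr -d ' \n')
echo "First bytes of $target: $magic"

# 7f 45 4c 46 is 0x7f followed by "ELF", the signature at the start of ELF files.
if [ "$magic" = "7f454c46" ]; then
  echo "$target looks like an ELF binary"
fi
```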

Filenames

The first thing you'll notice as you peruse the directories on your Linux system is that Linux has a different filenaming standard than Windows. In the Windows world, you're probably used to seeing a three‐ or four‐character file extension added onto each file (such as .docx for Word documents). Linux doesn't require file extensions to identify file types on the system (although they're allowed if you like to use them).

Most Linux filesystems limit a filename to 255 bytes. Linux filenames can contain uppercase and lowercase letters, numbers, and most special characters (the only characters not allowed are the forward slash and the NULL character). Linux filenames can contain spaces as well as non‐English characters. However, it's best to start a filename with a letter or number; a name that starts with a period, for example, becomes a hidden file, as described in the next section.

Here's an example of some valid Linux filenames:

testing
A test file.txt
4_data_points
my.long.program.name
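Spaces are legal in filenames, but the shell uses spaces to separate arguments, so names that contain them need quotes on the command line. This sketch, assuming a bash shell and scratch directories from mktemp, creates the four example names above and then shows what happens when A test file.txt is typed without quotes.

```shell
#!/usr/bin/env bash
# Scratch directories, created with mktemp so nothing real is touched.
quoted_dir=$(mktemp -d)
unquoted_dir=$(mktemp -d)

# Quoted: each of the four example names arrives as a single argument.
touch "$quoted_dir/testing" \
      "$quoted_dir/A test file.txt" \
      "$quoted_dir/4_data_points" \
      "$quoted_dir/my.long.program.name"
ls -1 "$quoted_dir"

# Unquoted: the shell splits on spaces, so touch creates A, test, and file.txt.
cd "$unquoted_dir" || exit 1
touch A test file.txt
cd - > /dev/null || exit 1

quoted_count=$(ls -1 "$quoted_dir" | wc -l)
unquoted_count=$(ls -1 "$unquoted_dir" | wc -l)
echo "quoted: $quoted_count files, unquoted: $unquoted_count files"

rm -rf "$quoted_dir" "$unquoted_dir"
```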

Hidden Files

Another feature of Linux filenames that is different from Windows is hidden files. Hidden files don't appear in normal file listings. In the Windows world, you use file properties to make a file hidden from normal viewing. Hidden files then don't appear in a directory listing unless you change the viewing options to display them. This feature helps protect important system files from being accidentally deleted or overwritten.

Linux doesn't use file properties to make a file hidden. Instead, it uses the filename. Filenames that start with a period are considered hidden files: they don't appear in a normal listing when you use the ls command, and graphical file manager tools don't display them either.

To display hidden files with the ls command, include the ‐a command‐line parameter (ls options are covered in more detail in the “File and Directory Listing” section).

$ ls
mydata.txt  mydirectory  myprog  myprog.c
$ ls -a
.              .bash_logout  mydata.txt   myprog.c                   .viminfo
..             .bashrc       mydirectory  .profile                   .zcompdump
.bash_history  .cache        myprog       .sudo_as_admin_successful  .zshrc
$

In this example, the ls command by itself shows only a few files in the directory. Adding the ‐a parameter shows all of the hidden files in the directory. Many Linux programs store any settings that you make as a hidden file in each user's Home directory.
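Here's a minimal sketch of the same behavior in a scratch directory created with mktemp: one visible file and one dot file (the name .hidden_settings is made up), listed with and without -a.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
touch "$sandbox/mydata.txt" "$sandbox/.hidden_settings"

# Plain ls skips names that start with a period.
plain=$(ls "$sandbox")
# ls -a shows them, plus the . and .. entries.
all=$(ls -a "$sandbox")

echo "ls:";    echo "$plain"
echo "ls -a:"; echo "$all"

rm -rf "$sandbox"
```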

File Inodes

Linux has to keep track of a lot of information for each file on the system. The way it does that is by using index nodes, also called inodes. The Linux operating system creates an inode for each file on the system to store the file properties. The inodes are hidden from view, and only the operating system can access them. Each inode is also assigned a number, called the inode number. This is what Linux uses to reference the file, not the filename. The inode numbers are unique on each physical disk partition.

Linux also creates a table on each disk partition, called the inode table. The inode table lists every inode number assigned to a file on the disk partition. As you create and delete files, Linux automatically updates the inode table behind the scenes. However, if the system should abruptly shut down (such as due to a power failure), the inode table can become corrupt. Fortunately, there are utilities that can help repair the inode table and prevent data loss. Unfortunately, though, if the inode table becomes unrepairable, you won't be able to access the files on the disk partition, even if the actual files are still there!

To view the inode number for files, use the ‐i option in the ls command. You can combine it with the ‐l option to produce a long listing to show more details about the file:

$ ls -il
total 36
263784 -rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 copy.c
262804 -rw-rw-r-- 1 sysadmin sysadmin    15 Dec 19 13:26 mydata.txt
395365 drwxrwxr-x 2 sysadmin sysadmin  4096 Dec 19 13:26 mydirectory
264036 -rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
263784 -rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 myprog.c
$

The first number displayed in the long listing is the inode number for each file. Notice that the inode number for the myprog.c file is the same as the copy.c file. That means the two filenames are hard links to the same file: both directory entries point to the same inode, and therefore to the same data on disk. That's also why the link count (the number after the permissions) is 2 for both entries. We'll look at hard links more closely in the “Linking Files” section.
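You can reproduce this situation with the ln command, which creates hard links (covered in the “Linking Files” section). This sketch assumes GNU coreutils, whose stat command accepts -c with format codes: %i prints the inode number and %h the hard-link count. The scratch directory and the file contents are made up for the example.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$sandbox/myprog.c"

# ln with no options creates a hard link: a second name for the same inode.
ln "$sandbox/myprog.c" "$sandbox/copy.c"

# GNU stat: %i is the inode number, %h the hard-link count, %n the name.
stat -c '%i %h %n' "$sandbox/myprog.c" "$sandbox/copy.c"

inode_orig=$(stat -c '%i' "$sandbox/myprog.c")
inode_link=$(stat -c '%i' "$sandbox/copy.c")
links=$(stat -c '%h' "$sandbox/myprog.c")

rm -rf "$sandbox"
```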

File and Directory Listing

The most basic feature of the shell is the ability to see what files are available on the system. The list command (ls) is the tool that helps do that. This section describes the ls command and the options available to format the information it provides.

Basic Listing

The ls command, in its most basic form, displays the files and directories located in your current directory (also called the working directory).

sysadmin@ubuntu-server:~$ ls
copy.c  mydata.txt  mydirectory  myprog  myprog.c
sysadmin@ubuntu-server:~$

Notice that the ls command produces the listing in alphabetical order (in columns rather than rows). If you're using a terminal emulator that supports color, the ls command may also show different types of entries in different colors. The LS_COLORS environment variable controls this feature. Different Linux distributions set this environment variable depending on the capabilities of the terminal emulator.

If you don't have a color terminal emulator, you can use the ‐F parameter with the ls command to easily distinguish files from directories. Using the ‐F parameter produces the following output:

sysadmin@ubuntu-server:~$ ls -F
copy.c  mydata.txt  mydirectory/  myprog*  myprog.c
sysadmin@ubuntu-server:~$

The ‐F parameter flags the directories with a forward slash to help identify them in the listing. Similarly, it flags executable files (like the myprog file in the previous output) with an asterisk, making it easier to find the files that can be run on the system.
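To see the -F indicators for yourself, this sketch builds a scratch directory with mktemp (the names mirror the listing above, but the contents are made up) containing a directory, an ordinary file, and a file marked executable with chmod +x.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
mkdir "$sandbox/mydirectory"
touch "$sandbox/mydata.txt" "$sandbox/myprog"
chmod +x "$sandbox/myprog"

# -F appends / to directories and * to executable files.
classified=$(ls -F "$sandbox")
echo "$classified"

rm -rf "$sandbox"
```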

The basic ls command can be somewhat misleading. It shows the files and subdirectories contained in the current directory, but not the hidden files.

The ‐R parameter is another useful ls parameter. It performs a recursive listing, showing the files contained within subdirectories of the current directory. If you have lots of subdirectories, this can be quite a long listing. Here's a simple example of what the ‐R parameter produces:

sysadmin@ubuntu-server:~$ ls -F -R
.:
copy.c  mydata.txt  mydirectory/  myprog*  myprog.c
 
./mydirectory:
file1  file2  file3
sysadmin@ubuntu-server:~$

Notice that, first, the ‐R parameter shows the contents of the current directory, which includes a subdirectory (mydirectory). Following that, it traverses all of the subdirectories, showing if any files are contained within each subdirectory. The mydirectory subdirectory shows three files (file1, file2, and file3). If there had been further subdirectories within the mydirectory subdirectory, the ‐R parameter would have continued to traverse those as well. As you can see, for large directory structures, this can become quite a large output listing.

Modifying Listing Information

As you can see in the basic listings, the ls command doesn't produce a whole lot of information about each file by default. For listing additional information, another popular parameter is ‐l. The ‐l parameter produces a long listing format, providing more information about each file in the directory.

sysadmin@ubuntu-server:~$ ls -l
total 36
-rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 copy.c
-rw-rw-r-- 1 sysadmin sysadmin    15 Dec 19 13:26 mydata.txt
drwxrwxr-x 2 sysadmin sysadmin  4096 Dec 19 13:51 mydirectory
-rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
-rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 myprog.c
sysadmin@ubuntu-server:~$

The long listing format lists each file and directory contained in the directory on a single line. Besides the filename, it shows additional useful information. The first line in the output shows the total number of blocks contained within the directory. Following that, each line contains the following information about each file (or directory):

The file type, such as directory (d), file (-), character device (c), or block device (b)
The permissions string for the file, indicating permissions for the user, group, and other users
The number of hard links to the file
The username of the owner of the file
The group name of the group the file belongs to
The size of the file in bytes
The time the file was last modified
The file or directory name
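If you want to match each field in the list above to a column of ls -l output, the stat command from GNU coreutils can print them one at a time. This sketch assumes GNU stat (-c with the format codes shown in the comments) and a scratch file created just for the example.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
printf 'hello\n' > "$sandbox/mydata.txt"

# The long listing for one file.
ls -l "$sandbox/mydata.txt"

# The same fields, one format code each (GNU stat):
#   %A  file type and permissions string (e.g. -rw-r--r--)
#   %h  number of hard links
#   %U  owner's username
#   %G  group name
#   %s  size in bytes
#   %y  time of last modification
#   %n  file name
stat -c '%A %h %U %G %s %y %n' "$sandbox/mydata.txt"

size=$(stat -c '%s' "$sandbox/mydata.txt")
type_perms=$(stat -c '%A' "$sandbox/mydata.txt")

rm -rf "$sandbox"
```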

The ‐l parameter is a powerful tool to have. With it, you can see just about any information you need for any file or directory on the system.

The Complete Parameter List

There are lots of parameters for the ls command that can come in handy as you do file management. If you view the man page for ls (man ls), you'll see several pages of parameters you can use to modify the output of the ls command.

The ls command uses two types of command‐line parameters:

Single‐letter parameters
Full‐word (long) parameters

The single‐letter parameters are always preceded by a single dash. Full‐word parameters are more descriptive and are preceded by a double dash. Many parameters have both a single‐letter and full‐word version, while some have only one type. Table 7.2 lists some of the more popular parameters that'll help you out with using the ls command.

TABLE 7.2: Some Popular ls Command Parameters

SINGLE LETTER	FULL WORD	DESCRIPTION
`-a`	`--all`	Don't ignore entries starting with a period.
`-A`	`--almost-all`	List all entries except the . and .. entries.
	`--author`	Print the author of each file.
`-b`	`--escape`	Print octal values for nonprintable characters.
	`--block-size=size`	Calculate the block sizes using size-byte blocks.
`-B`	`--ignore-backups`	Don't list entries ending with the tilde (~) symbol (used to denote backup copies).
`-c`		Use the time of the last status change (ctime) instead of the last modification time.
`-C`		List entries by columns.
	`--color=when`	When to use colors (always, never, or auto).
`-d`	`--directory`	List directory entries instead of contents, and don't dereference symbolic links.
`-F`	`--classify`	Append file-type indicator to entries.
	`--file-type`	Only append file-type indicators to some filetypes (not executable files).
	`--format=word`	Format output as across, commas, horizontal, long, single-column, verbose, or vertical.
`-g`		List full file information except for the file's owner.
	`--group-directories-first`	List all directories before files.
`-G`	`--no-group`	In a long listing, don't display group names.
`-h`	`--human-readable`	Print sizes using K for kilobytes, M for megabytes, and G for gigabytes.
	`--si`	Same as `-h`, but use powers of 1000 instead of 1024.
`-i`	`--inode`	Display the index number (inode) of each file.
`-l`		Display the long listing format.
`-L`	`--dereference`	Show information for the original file for a linked file.
`-n`	`--numeric-uid-gid`	Show numeric userid and groupid instead of names.
`-o`		Display the long listing format, but without group names.
`-r`	`--reverse`	Reverse the sorting order when displaying files and directories.
`-R`	`--recursive`	List subdirectory contents recursively.
`-s`	`--size`	Print the allocated size of each file, in blocks.
`-S`	`--sort=size`	Sort the output by file size.
`-t`	`--sort=time`	Sort the output by file modification time.
`-u`		Display the last access time instead of last modification time for all files.
`-U`	`--sort=none`	Don't sort the output listing.
`-v`	`--sort=version`	Sort the output by file version.
`-x`		List entries by line instead of columns.
`-X`	`--sort=extension`	Sort the output by file extension.

You can use more than one parameter at a time if you want. The double‐dash parameters must be listed separately, but the single‐dash parameters can be combined into a string behind the dash. A common combination to use is the ‐a parameter to list all files, the ‐i parameter to list the inode for each file, the ‐l parameter to produce a long listing, and the ‐s parameter to list the block size of the files. Combining all of these parameters creates the easy‐to‐remember ‐sail parameter.

sysadmin@ubuntu-server:~$ ls -sail
total 132
265238  4 drwxr-xr-x 4 sysadmin sysadmin  4096 Dec 19 13:45 .
262145  4 drwxr-xr-x 3 root     root      4096 Nov  4 18:56 ..
262184  8 -rw------- 1 sysadmin sysadmin  4957 Dec 19 13:27 .bash_history
265244  4 -rw-r--r-- 1 sysadmin sysadmin   220 Feb 25  2020 .bash_logout
265240  4 -rw-r--r-- 1 sysadmin sysadmin  3771 Feb 25  2020 .bashrc
265296  4 drwx------ 2 sysadmin sysadmin  4096 Nov  4 18:57 .cache
263784  4 -rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 copy.c
262804  4 -rw-rw-r-- 1 sysadmin sysadmin    15 Dec 19 13:26 mydata.txt
395365  4 drwxrwxr-x 2 sysadmin sysadmin  4096 Dec 19 13:51 mydirectory
264036 20 -rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
263784  4 -rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 myprog.c
265242  4 -rw-r--r-- 1 sysadmin sysadmin   807 Feb 25  2020 .profile
265302  0 -rw-r--r-- 1 sysadmin sysadmin     0 Nov  4 18:58 .sudo_as_admin_successful
263813 12 -rw------- 1 sysadmin sysadmin 10056 Dec 19 13:17 .viminfo
262207 48 -rw-rw-r-- 1 sysadmin sysadmin 49006 Nov 23 13:36 .zcompdump
265288  4 -rw-rw-r-- 1 sysadmin sysadmin    29 Nov 23 13:36 .zshrc
sysadmin@ubuntu-server:~$

Besides the normal ‐l parameter output information, you'll see two additional numbers added to each line. The first number in the listing is the file or directory inode number. The second number is the amount of disk space allocated to the file, in blocks.
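Because single-letter parameters can be bundled, ls -sail is shorthand for ls -s -a -i -l, and the long-form names give the same result. This sketch, assuming GNU ls and a scratch directory from mktemp, compares the three spellings. It lists a nested subdirectory so that the .. entry is a directory nothing else is modifying while the comparison runs.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
mkdir "$sandbox/inner"
touch "$sandbox/inner/visible" "$sandbox/inner/.hidden"

# Bundled single-letter form, separate single-letter form, and long-form names.
combined=$(ls -sail "$sandbox/inner")
separate=$(ls -s -a -i -l "$sandbox/inner")
long_form=$(ls --size --all --inode -l "$sandbox/inner")

echo "$combined"
if [ "$combined" = "$separate" ] && [ "$combined" = "$long_form" ]; then
  echo "All three forms produce the same listing."
fi

rm -rf "$sandbox"
```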

Directory Handling

In Linux, there are a few commands that work for both files and directories, and some that work only for directories. This section discusses the commands for creating and deleting directories.

Creating Directories

There's not much to creating a new directory in Linux; just use the mkdir command.

sysadmin@ubuntu-server:~$ mkdir dir3
sysadmin@ubuntu-server:~$ ls -il
total 40
263784 -rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 copy.c
395394 drwxrwxr-x 2 sysadmin sysadmin  4096 Dec 19 14:09 dir3
262804 -rw-rw-r-- 1 sysadmin sysadmin    15 Dec 19 13:26 mydata.txt
395365 drwxrwxr-x 2 sysadmin sysadmin  4096 Dec 19 13:51 mydirectory
264036 -rwxrwxr-x 1 sysadmin sysadmin 16696 Dec 19 13:18 myprog
263784 -rw-rw-r-- 2 sysadmin sysadmin    71 Dec 19 13:17 myprog.c
sysadmin@ubuntu-server:~$

The system creates the new directory and assigns it a new inode number.

Deleting Directories

Removing directories can be tricky, but there's a reason for that. There are lots of opportunities for bad things to happen when you start deleting directories. Bash tries to protect us from accidental catastrophes as much as possible. The basic command for removing a directory is rmdir.

sysadmin@ubuntu-server:~$ rmdir dir3
sysadmin@ubuntu-server:~$ rmdir mydirectory
rmdir: failed to remove 'mydirectory': Directory not empty
sysadmin@ubuntu-server:~$

By default, the rmdir command works only for removing empty directories. Since there are files in the mydirectory directory, the rmdir command refuses to remove it. The ‐‐ignore‐fail‐on‐non‐empty parameter doesn't change that: it only suppresses the error message, and the nonempty directory stays in place.

You can also use the rm command when handling directories.

If you try using it without parameters, as with files, you'll be somewhat disappointed.

sysadmin@ubuntu-server:~$ rm mydirectory
rm: cannot remove 'mydirectory': Is a directory
sysadmin@ubuntu-server:~$

However, if you really want to remove a directory, you can use the ‐r parameter to recursively remove the files in the directory and then the directory itself.

sysadmin@ubuntu-server:~$ rm -r mydirectory
sysadmin@ubuntu-server:~$
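The following bash sketch walks through the same deletion rules in a scratch directory created with mktemp (the directory names mirror the examples above). It also shows that the --ignore-fail-on-non-empty option of GNU rmdir doesn't delete a nonempty directory; it only hides the error message.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
mkdir "$sandbox/dir3" "$sandbox/mydirectory"
touch "$sandbox/mydirectory/file1"

# rmdir removes an empty directory.
rmdir "$sandbox/dir3"

# rmdir refuses a nonempty directory (error message discarded here).
rmdir "$sandbox/mydirectory" 2>/dev/null || echo "rmdir refused: directory not empty"

# --ignore-fail-on-non-empty only silences that error; the directory stays.
rmdir --ignore-fail-on-non-empty "$sandbox/mydirectory"
if [ -d "$sandbox/mydirectory" ]; then still_there=yes; else still_there=no; fi

# rm -r deletes the contents first, then the directory itself.
rm -r "$sandbox/mydirectory"
if [ -e "$sandbox/mydirectory" ]; then gone=no; else gone=yes; fi
if [ -e "$sandbox/dir3" ]; then dir3_gone=no; else dir3_gone=yes; fi

rm -rf "$sandbox"
```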

File Handling

Bash provides lots of commands for manipulating files on the Linux filesystem. This section walks through the basic commands you will need to work with files from the CLI for all your file‐handling needs.

Creating Files

Every once in a while, you will run into a situation where you need to create an empty file. Sometimes applications expect a log file to be present before they can write to it. In these situations, you can use the touch command to easily create an empty file.

sysadmin@ubuntu-server:~$ touch test1
sysadmin@ubuntu-server:~$ ls -il test1
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:22 test1
sysadmin@ubuntu-server:~$

The touch command creates the new file you specify and assigns your username as the file owner.

Notice that the file size is zero, since the touch command just created an empty file. The touch command can also be used to change the access and modification times on an existing file without changing the file contents.

sysadmin@ubuntu-server:~$ touch test1
sysadmin@ubuntu-server:~$ ls -il test1
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test1
sysadmin@ubuntu-server:~$

The modification time of test1 is now updated from the original time. If you want to change only the access time, use the ‐a parameter. To change only the modification time, use the ‐m parameter.

By default, touch uses the current time. You can specify the time by using the ‐t parameter with a specific timestamp value.

sysadmin@ubuntu-server:~$ touch -t 202112251200 test1
sysadmin@ubuntu-server:~$ ls -l test1
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 25  2021 test1
sysadmin@ubuntu-server:~$

Now the modification time for the file is set to a date significantly in the future from the current time.
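Here's a sketch of those touch options in a scratch directory. It assumes GNU touch, whose -t value uses the [[CC]YY]MMDDhhmm[.ss] format in local time, and GNU date, whose -r option prints a file's last modification time.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
f="$sandbox/test1"

# touch creates an empty file if it doesn't already exist.
touch "$f"
if [ -s "$f" ]; then empty=no; else empty=yes; fi

# -t sets the time from a CCYYMMDDhhmm timestamp (local time).
touch -t 202112251200 "$f"
mtime=$(date -r "$f" +%Y%m%d%H%M)
echo "Modification time: $mtime"

# -a updates only the access time, so the modification time is unchanged.
touch -a "$f"
mtime_after=$(date -r "$f" +%Y%m%d%H%M)

rm -rf "$sandbox"
```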

Copying Files

Copying files and directories from one location in the filesystem to another is a common practice for system administrators. The cp command provides this feature.

In its most basic form, the cp command uses two parameters—the source object and the destination object.

cp source destination

When both the source and destination parameters are filenames, the cp command copies the source file to a new file with the filename specified as the destination. The copy is a completely new file, so it gets its own inode and a new modification time.

sysadmin@ubuntu-server:~$ cp test1 test2
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 25  2021 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:27 test2
sysadmin@ubuntu-server:~$

The new file test2 shows a different inode number, indicating that it's a completely new file. You'll also notice that the modification time for the test2 file shows the time that it was created.
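This sketch confirms both observations in a scratch directory, assuming GNU coreutils (stat -c %i for the inode number, date -r for the modification time). The test1 timestamp is backdated with touch -t only so that the difference is easy to see.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
touch -t 202112251200 "$sandbox/test1"

cp "$sandbox/test1" "$sandbox/test2"

inode1=$(stat -c '%i' "$sandbox/test1")
inode2=$(stat -c '%i' "$sandbox/test2")
# A plain cp gives the copy the current time, not the source's time.
mtime2=$(date -r "$sandbox/test2" +%Y)
echo "test1 inode $inode1, test2 inode $inode2, test2 modified in $mtime2"

rm -rf "$sandbox"
```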

If the destination file already exists, the cp command will overwrite the existing file by default. That can be somewhat dangerous. By adding the ‐i parameter, the cp command will prompt you to answer whether you want to overwrite the destination file.

sysadmin@ubuntu-server:~$ cp -i test1 test2
cp: overwrite 'test2'? y
sysadmin@ubuntu-server:~$

If you don't answer y, the file copy will not proceed.
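Because cp -i reads its answer from standard input, you can script the prompt by piping in a reply. This sketch, using made-up file contents in a scratch directory, declines once and then accepts; the prompt text goes to standard error and is discarded here.

```shell
#!/usr/bin/env bash
sandbox=$(mktemp -d)
echo "new data" > "$sandbox/test1"
echo "keep me"  > "$sandbox/test2"

# Answer "n": cp skips the copy and test2 keeps its contents.
echo n | cp -i "$sandbox/test1" "$sandbox/test2" 2>/dev/null || true
after_no=$(cat "$sandbox/test2")

# Answer "y": the copy goes ahead and overwrites test2.
echo y | cp -i "$sandbox/test1" "$sandbox/test2" 2>/dev/null
after_yes=$(cat "$sandbox/test2")

echo "after n: $after_no / after y: $after_yes"
rm -rf "$sandbox"
```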

You can also copy a file to an existing directory by specifying the directory as the destination.

sysadmin@ubuntu-server:~$ mkdir dir1
sysadmin@ubuntu-server:~$ cp test1 dir1
sysadmin@ubuntu-server:~$ ls -il dir1
total 0
395391 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:31 test1
sysadmin@ubuntu-server:~$

The new file is now under the dir1 directory, using the same filename as the original.

These examples all used relative pathnames, but you can just as easily use the absolute pathname for both the source and destination objects. To copy a file to the current directory you're in, you can use the dot symbol.

sysadmin@ubuntu-server:~$ cp /home/sysadmin/dir1/test1 .
sysadmin@ubuntu-server:~$ ls -il test1
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test1
sysadmin@ubuntu-server:~$

As with most commands, the cp command has a few command‐line parameters to help you. These are shown in Table 7.3.

TABLE 7.3: The cp Command Parameters

PARAMETER	DESCRIPTION
`-a`	Archive files by preserving their attributes.
`-b`	Create a backup of each existing destination file before overwriting it.
`-d`	Preserve links: copy symbolic links as links rather than copying the files they point to.
`-f`	Force the overwriting of existing destination files without prompting.
`-i`	Prompt before overwriting destination files.
`-l`	Create hard links instead of copying the files.
`-p`	Preserve file attributes (mode, ownership, and timestamps) if possible.
`-r`	Copy directories recursively (same as `-R`).
`-R`	Copy directories recursively (same as `-r`).
`-s`	Create a symbolic link instead of copying the file.
`-S`	Override the usual backup suffix (~) with the suffix you specify.
`-u`	Copy the source file only if it has a newer date and time than the destination (update).
`-v`	Verbose mode, explaining what's happening.
`-x`	Restrict the copy to the current filesystem.

Use the -p parameter to preserve the file access and modification times of the original file for the copied file.

sysadmin@ubuntu-server:~$ cp -p test1 test3
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:30 test2
263725 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:33 test3
sysadmin@ubuntu-server:~$

Now, even though the test3 file is a completely new file, it has the same timestamps as the original test1 file.
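A quick way to confirm what -p preserves is to compare a plain copy with a -p copy. This sketch assumes the GNU coreutils versions of touch and stat (standard on Ubuntu); the file names plain and kept are hypothetical.

```shell
cd "$(mktemp -d)"
echo "data" > test1
# Give test1 an old modification time (GNU touch accepts a date string with -d)
touch -d '2020-01-01 12:00' test1

cp test1 plain      # ordinary copy: gets the current time
cp -p test1 kept    # -p copies the original timestamps

# %y = last modification time, %n = file name (GNU stat format sequences)
stat -c '%y %n' test1 plain kept
```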

The ‐R parameter is extremely powerful. It allows you to recursively copy the contents of an entire directory in one command.

sysadmin@ubuntu-server:~$ cp -R dir1 dir2
sysadmin@ubuntu-server:~$ ls -l dir*
dir1:
total 0
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:31 test1
 
dir2:
total 0
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 14:38 test1
sysadmin@ubuntu-server:~$

Now dir2 is a complete copy of dir1.
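The sketch below, which assumes GNU cp, shows why -R matters: without it, cp skips a directory instead of copying it. The sub directory and the dir3 name are illustrative only.

```shell
cd "$(mktemp -d)"
mkdir -p dir1/sub
touch dir1/test1 dir1/sub/test2

# Without -R, cp skips the directory and returns a nonzero exit status
cp dir1 dir3 || echo "cp refused to copy dir1 without -R"

# With -R, the whole tree, including the sub directory, is copied
cp -R dir1 dir2
ls -R dir2
```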

You can also use wildcard characters in your cp commands.

sysadmin@ubuntu-server:~$ cp test* dir2
sysadmin@ubuntu-server:~$ ls -al dir2
total 8
drwxrwxr-x 2 sysadmin sysadmin 4096 Dec 19 14:42 .
drwxr-xr-x 5 sysadmin sysadmin 4096 Dec 19 14:38 ..
-rw-rw-r-- 1 sysadmin sysadmin    0 Dec 19 14:42 test1
-rw-rw-r-- 1 sysadmin sysadmin    0 Dec 19 14:42 test2
-rw-rw-r-- 1 sysadmin sysadmin    0 Dec 19 14:42 test3
sysadmin@ubuntu-server:~$

This command copied all of the files that started with test to dir2.

Linking Files

You may have noticed that a couple of the parameters for the cp command referred to linking files. This is a pretty cool option available in the Linux filesystems. If you need to maintain two (or more) copies of the same file on the system, instead of having separate physical copies, you can use one physical copy and multiple virtual copies, called links. A link is a placeholder in a directory that points to the real location of the file. There are two different types of file links in Linux.

A symbolic, or soft, link
A hard link

A hard link is an additional directory entry that points to the same inode, and therefore the same data on disk, as the original file. When you reference the hard link file, it's just as if you're referencing the original file.

sysadmin@ubuntu-server:~$ cp -l test1 test4
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:30 test2
263725 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test4
sysadmin@ubuntu-server:~$

The ‐l parameter created a hard link for the test1 file called test4. The file listing shows that the inode numbers of both the files are the same, indicating that, in reality, they are both the same file. Also notice that the link count (the third item in the listing) now shows that both files have two links.
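You can confirm the shared inode and link count with GNU stat, and see that both names refer to the same data. This is a minimal sketch in a scratch directory; the file contents are made up for illustration.

```shell
cd "$(mktemp -d)"
echo "original text" > test1
cp -l test1 test4          # hard link, as in the listing above

# %i = inode number, %h = hard-link count, %n = file name (GNU stat)
stat -c '%i %h %n' test1 test4

# Appending through one name changes the data seen through the other
echo "more text" >> test4
cat test1
```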

Conversely, the ‐s parameter creates a symbolic, or soft, link.

sysadmin@ubuntu-server:~$ cp -s test1 test5
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test1
263324 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:30 test2
263725 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:33 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin  5 Dec 19 14:46 test5 -> test1
sysadmin@ubuntu-server:~$

There are a couple of things to notice in the file listing. First, the new test5 file has a different inode number than the test1 file, indicating that the Linux system treats it as a separate file. Second, the file size is different. A symbolic link needs to store only the path to the source file (here, the five characters of test1), not the actual data in the file. The filename area of the listing shows the relationship between the two files.

Instead of using the cp command to link files, you can use the ln command. By default, the ln command creates hard links. If you want to create a soft link, you'll still need to use the ‐s parameter.
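As a sketch, the ln equivalents of the cp -l and cp -s examples look like this. It assumes a scratch directory and uses readlink (from GNU coreutils) to show the path a symbolic link stores.

```shell
cd "$(mktemp -d)"
echo "data" > test1
ln test1 test4       # hard link: ln's default behavior
ln -s test1 test5    # symbolic link: needs -s, just like cp
ls -il test1 test4 test5
readlink test5       # prints the path stored inside the symbolic link
```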

Instead of copying the linked file, you can create another link to the original file. You can have many links to the same file with no problems. However, you should avoid creating soft links to other soft-linked files. This creates a chain of links that not only can be confusing but also can be easily broken, causing all sorts of problems.
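To see where a chain leads, GNU readlink with -f follows every link to the final file. This sketch builds a hypothetical two-link chain (link1 and link2) and then re-points link2 directly at the original file with ln -sf, where -f replaces the existing link.

```shell
cd "$(mktemp -d)"
echo "data" > test1
ln -s test1 link1     # link1 -> test1
ln -s link1 link2     # link2 -> link1 -> test1: a chain, best avoided

readlink link2        # shows only the next hop
readlink -f link2     # follows the entire chain to the real file

# Re-point link2 straight at the original file; -f replaces the existing link
ln -sf test1 link2
readlink link2
```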

Renaming Files

In the Linux world, renaming files is called moving. The mv command is available to move both files and directories to another location.

sysadmin@ubuntu-server:~$ mv test2 test6
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test1
263725 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin  5 Dec 19 14:48 test5 -> test1
263324 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:30 test6
sysadmin@ubuntu-server:~$

Notice that moving the file changed the filename but kept the same inode number and the timestamp value. However, moving a file that a soft link points to causes a problem.

sysadmin@ubuntu-server:~$ mv test1 test8
sysadmin@ubuntu-server:~$ ls -il test*
263725 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:33 test3
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin  5 Dec 19 14:48 test5 -> test1
263324 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:30 test6
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test8
sysadmin@ubuntu-server:~$

The test4 file, which is a hard link, still uses the same inode number, which is perfectly fine. However, the test5 file still points to the name test1, which no longer exists, so it is no longer a valid link. If your terminal supports colors, the broken link is most likely displayed in red.
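If you suspect a broken link, Bash's test operators and GNU find can detect it. This sketch recreates the test5 situation in a scratch directory and then repairs the link; ln -sf (where -f replaces the existing link) is one way to do that, not the only one.

```shell
cd "$(mktemp -d)"
echo "data" > test1
ln -s test1 test5
mv test1 test8            # test5 still points at the old name, test1

# -L is true for any symbolic link; -e follows the link and fails if the target is gone
if [ -L test5 ] && [ ! -e test5 ]; then
    echo "test5 is a broken link"
fi

# GNU find: -xtype l matches symbolic links whose target can't be found
find . -xtype l

# Repair the link by pointing it at the file's new name
ln -sf test8 test5
```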

You can also use the mv command to move directories.

sysadmin@ubuntu-server:~$ mv dir2 dir4
sysadmin@ubuntu-server:~$

The entire contents of the directory are unchanged. The only thing that changes is the name of the directory.
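Within a single filesystem, mv only rewrites directory entries, so a moved file keeps its inode number; moving across filesystems is done as a copy followed by a delete, which produces a new inode. The sketch below, assuming GNU stat, checks the same-filesystem case and also shows that mv can move and rename in one step. The test7 name is hypothetical.

```shell
cd "$(mktemp -d)"
mkdir dir4
echo "data" > test6
before=$(stat -c %i test6)

mv test6 dir4/test7       # move into dir4 and rename in a single step

after=$(stat -c %i dir4/test7)
echo "inode before: $before, after: $after"
```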

Deleting Files

Most likely at some point in your Linux career, you'll want to be able to delete existing files. Whether it's to clean up a filesystem or to remove a software package, there are always opportunities to delete files.

In the Linux world, deleting is called removing. The command to remove files in Bash is rm. The basic format of the rm command is pretty simple.

sysadmin@ubuntu-server:~$ rm -i test3
rm: remove regular empty file 'test3'? y
sysadmin@ubuntu-server:~$ ls -il test*
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test4
263840 lrwxrwxrwx 1 sysadmin sysadmin  5 Dec 19 14:48 test5 -> test1
263324 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 14:30 test6
263320 -rw-rw-r-- 2 sysadmin sysadmin 54 Dec 19 14:48 test8
sysadmin@ubuntu-server:~$

As with the cp command, you can use the -i parameter to have rm prompt you before it removes a file. There's no trash can in the CLI like there often is in a graphical desktop environment. Once you remove a file, it's gone forever.

Real World Scenario

WORKING WITH FILES

It's important to feel comfortable with creating, modifying, and deleting files in Linux. This exercise walks you through doing just that with some sample files so you don't have to worry about breaking anything as you experiment. Just follow these steps:

Log into your Linux server using the user account you created earlier during the installation.
From your Home directory CLI prompt, create a directory by entering the command mkdir test. Change to that directory by entering the command cd test, and then enter the command ls ‐l to look at the directory contents.
```
    sysadmin@ubuntu-server:~$ mkdir test
    sysadmin@ubuntu-server:~$ cd test
    sysadmin@ubuntu-server:~/test$ ls -l
    total 0
    sysadmin@ubuntu-server:~/test$
```

From the CLI prompt, enter the command touch test1. This creates a test file to work with.

    sysadmin@ubuntu-server:~/test$ touch test1
    sysadmin@ubuntu-server:~/test$ ls -l
    total 0
    -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 17:45 test1
    sysadmin@ubuntu-server:~/test$

From the CLI prompt, create another file that's a hard link to the first file by entering the command ln test1 test2. List the files along with their inode numbers by entering the command ls -il to ensure they are hard linked.

    sysadmin@ubuntu-server:~/test$ ln test1 test2
    sysadmin@ubuntu-server:~/test$ ls -il
    total 0
    395415 -rw-rw-r-- 2 sysadmin sysadmin 0 Dec 19 17:45 test1
    395415 -rw-rw-r-- 2 sysadmin sysadmin 0 Dec 19 17:45 test2
    sysadmin@ubuntu-server:~/test$

Save some data in the test1 file by entering the command echo "Testing" >> test1. Enter the command ls ‐l to see the file size of both the test1 and test2 files. They should have both changed.

    sysadmin@ubuntu-server:~/test$ echo "Testing">> test1
    sysadmin@ubuntu-server:~/test$ ls -l
    total 8
    -rw-rw-r-- 2 sysadmin sysadmin 8 Dec 19 17:46 test1
    -rw-rw-r-- 2 sysadmin sysadmin 8 Dec 19 17:46 test2
    sysadmin@ubuntu-server:~/test$

Remove the test1 file by entering the command rm test1. Enter the command ls -il to list the remaining file.

    sysadmin@ubuntu-server:~/test$ rm test1
    sysadmin@ubuntu-server:~/test$ ls -il
    total 4
    395415 -rw-rw-r-- 1 sysadmin sysadmin 8 Dec 19 17:46 test2
    sysadmin@ubuntu-server:~/test$

Notice that the test2 file still remains, with the data intact.

File Features

There are a few features unique to Linux that you'll need to be aware of when working with files. This section walks you through these features.

Using Wildcards

The ls, cp, mv, and rm commands are handy, but specifying a single file or directory name in the commands makes them somewhat clunky to work with on the Linux command line. If you want to work with more than one file or directory, you need to use a technique the Linux world calls globbing.

Globbing is basically the use of wildcard characters to stand in for characters in a file or directory name. That feature allows you to specify a pattern for Linux to match multiple files or directories against. There are two basic globbing characters that you can use:

The question mark—represents a single character
The asterisk—represents zero or more characters

The question mark is a stand-in character to represent any single character to match in the filename. For example, you can specify the filename file?.txt in an rm command to remove any file that starts with file, followed by one character, and ends with .txt. Here's an example:

sysadmin@ubuntu-server:~$ ls -il file*
265217 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file.txt
265214 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file11.txt
263855 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:04 file1.txt
265146 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:04 file2.txt
sysadmin@ubuntu-server:~$ rm file?.txt
sysadmin@ubuntu-server:~$ ls -il file*
265217 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file.txt
265214 -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:05 file11.txt
sysadmin@ubuntu-server:~$

The rm command uses the glob file?.txt as the parameter. Linux looks for any file in the directory that matches the pattern to remove. Two files, file1.txt and file2.txt, match the pattern. However, the file11.txt file doesn't match the pattern, as there are two characters between the file and .txt parts of the filename, and the file.txt file doesn't match the pattern, as there aren't any characters between the file and .txt parts of the filename.
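Because the shell expands globs before the command runs, a cautious habit is to preview a pattern with echo (or ls) before handing it to rm. A minimal sketch, using the same file names as the example above in a scratch directory:

```shell
cd "$(mktemp -d)"
touch file.txt file1.txt file2.txt file11.txt

# The shell expands the glob before rm runs, so echo shows the exact list rm would receive
echo rm file?.txt

# Once the list looks right, run the real command
rm file?.txt
ls
```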

You use the asterisk glob character to match zero or more characters in the filename.

sysadmin@ubuntu-server:~$ rm file*
sysadmin@ubuntu-server:~$ ls -il file*
ls: cannot access 'file*': No such file or directory
sysadmin@ubuntu-server:~$

By using the asterisk, Linux matched all of the files, even the file.txt file! You can use the asterisk in any list, copy, move, or delete operation in the command line.
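This sketch, assuming Bash with its default settings, shows the zero-character match and also explains the error message above: when a glob matches nothing, Bash passes the pattern to the command unchanged, so ls looks for a file literally named file*.

```shell
cd "$(mktemp -d)"
touch file.txt file1.txt file11.txt

echo file*     # * also matches zero characters, so file.txt is included
rm file*

# With nothing left to match, Bash passes the pattern through unchanged,
# which is why ls then complains that it cannot access 'file*'
echo file*
```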

Quoting

Another issue you may run into with Linux is files or directories that contain spaces in their names. This is perfectly legal in Linux, but it can cause headaches when you're working from the command line.

If you try to reference a file or directory that contains a space in the filename, you'll get several error messages:

sysadmin@ubuntu-server:~$ ls -l long*
-rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 15:09 'long file name.txt'
sysadmin@ubuntu-server:~$ rm long file name.txt
rm: cannot remove 'long': No such file or directory
rm: cannot remove 'file': No such file or directory
rm: cannot remove 'name.txt': No such file or directory
sysadmin@ubuntu-server:~$

The problem is that the shell uses spaces to separate command-line arguments, so the rm command thinks you're trying to remove three separate files: long, file, and name.txt!

To get around that, you need to use quoting, which places quotes around any filenames that contain spaces:

sysadmin@ubuntu-server:~$ rm 'long file name.txt'
sysadmin@ubuntu-server:~$ ls -il long*
ls: cannot access 'long*': No such file or directory
sysadmin@ubuntu-server:~$

You can use either single or double quotes around the filename, as long as you use the same type on both ends of the filename.
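Quotes aren't the only option: in Bash you can also escape each space with a backslash. One difference to keep in mind is that double quotes still allow variable expansion such as $HOME, while single quotes don't. A minimal sketch in a scratch directory, with hypothetical file names:

```shell
cd "$(mktemp -d)"
touch 'long file name.txt' 'another long name.txt'

rm "long file name.txt"       # double quotes work like single quotes here
rm another\ long\ name.txt    # a backslash escapes each individual space
ls
```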

Case Sensitivity

One last thing to watch out for when using the Linux file handling command-line commands is the case of any file or directory names that you're working with. Linux is a case-sensitive operating system, so files and directories can have both uppercase and lowercase letters in their names, and file1.txt and File1.txt are two different files. Likewise, as you're working with files, make sure you reference file and directory names using the correct case.

sysadmin@ubuntu-server:~$ ls -il *.txt
263725 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 15:13 file1.txt
263855 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 15:12 File1.txt
sysadmin@ubuntu-server:~$ rm file1.txt
sysadmin@ubuntu-server:~$ ls -il *.txt
263855 -rw-rw-r-- 1 sysadmin sysadmin  0 Dec 19 15:12 File1.txt
sysadmin@ubuntu-server:~$

This is a good example of when using filename globbing can come in handy. If you're not sure of the case of a character, you can use the question mark to represent any character of any case.

$ rm ?ile1.txt 
$

This command will remove both the file1.txt and File1.txt files.
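Be aware that ? matches any character at all, not just the uppercase and lowercase versions of one letter, so a file named mile1.txt would also be removed. Bash also supports bracket expressions, a glob feature beyond the two characters covered above, which list exactly which characters are allowed in that position. A sketch with hypothetical files:

```shell
cd "$(mktemp -d)"
touch file1.txt File1.txt mile1.txt

# ? matches ANY single character, so mile1.txt would be removed too
echo ?ile1.txt

# A bracket expression matches only the characters listed: f or F
rm [Ff]ile1.txt
ls
```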

Finding Files

With so many files stored on the Linux system, it can often become difficult to find the files you're looking for. Fortunately, Linux provides a few different file‐searching features to help. This section looks at the ones you'll most likely use as a Linux systems administrator.

The which Command

You can use the which command to find where programs and utilities are stored. This can come in handy if you have two versions of a program installed on your system or if you're not sure if a command is built into the Linux shell or supplied as a separate utility. The format for using the which command is pretty simple.

$ which touch
/usr/bin/touch
$

The output of the which command shows the full path to where the command is stored on the system. If you have two versions of a program on the system, the which command shows which one will run when you type it at the command line. If you need to use a different version, you have to use the full path to the program file.
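If you're specifically trying to tell a shell builtin apart from a separate utility, the Bash type builtin is more direct than which, since which searches only the directories in $PATH. A short sketch, assuming Bash and the which command found on Ubuntu, where -a prints every match rather than just the first:

```shell
# which searches the directories listed in $PATH for external programs
which touch

# -a lists every matching program in $PATH, not just the first one that would run
which -a touch

# The Bash type builtin also recognizes shell builtins, which which cannot report
type cd
type touch
```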

The locate Command

Many Linux distributions contain the locate command by default. If your distribution doesn't, it's usually found in the mlocate software package.

The locate command uses a database that keeps track of the location of files on the system. When you use the locate command, it searches the database to find the requested file. This process is often quicker than trying to search through all of the files on the filesystem.

The key to the locate command is the information in the database. It can only find files that have been indexed into the database. The information in the database is updated by the appropriately named updatedb program. The Linux system runs the updatedb program in background mode on a regular basis to update the file database with any new files stored on the system.

Be careful when using the locate command, though; you may get more than you bargained for! Here's an example:

sysadmin@ubuntu-server:~$ locate touch
/snap/core18/1932/bin/touch
/snap/core18/1932/lib/udev/hwdb.d/70-touchpad.hwdb
/snap/core18/1932/lib/udev/rules.d/70-touchpad.rules
/snap/core18/1932/usr/bin/touch
/snap/core18/1944/bin/touch
/snap/core18/1944/lib/udev/hwdb.d/70-touchpad.hwdb
/snap/core18/1944/lib/udev/rules.d/70-touchpad.rules
/snap/core18/1944/usr/bin/touch
/usr/bin/touch
…
sysadmin@ubuntu-server:~$

The locate command returns any file that contains the word touch anywhere in its pathname! You'll need to filter through the results to find the file that you're looking for.

Another downside to the locate command is that it can't find any newly added files until the next running of the updatedb program. Some Linux systems run the updatedb program on a regular basis, every few minutes, while others schedule it to run only once or twice a day. How often you need to run it depends on how often you get new files on your Linux system and how quickly you'd need to find them.

The whereis Command

The whereis command is similar to the which command in that it looks for a specific occurrence of the file you're searching for. However, it looks only in binary file directories, library directories, and documentation directories, which helps speed up the search process. This is great for finding not only commands but also the documentation files that go along with them.

sysadmin@ubuntu-server:~$ whereis touch
touch: /usr/bin/touch /usr/share/man/man1/touch.1.gz
sysadmin@ubuntu-server:~$

In this example, the whereis command returned the location of the touch program file and the location of the manual page associated with the touch program.
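The util-linux version of whereis (the one Ubuntu ships) can also narrow its search. As a sketch, -b limits the output to binaries and -m to manual pages:

```shell
# -b restricts whereis to binaries, -m to manual pages
whereis -b touch
whereis -m touch
```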

The find Command

The last resort for finding files on your Linux system is the find command. It does a physical search through the virtual directory tree looking for the specified file. As you can imagine, the wider the search area, the longer it takes for the find command to return an answer. You specify the search area as the first parameter on the find command line.

sysadmin@ubuntu-server:~$ find /home/sysadmin -name myprog -print
/home/sysadmin/myprog
sysadmin@ubuntu-server:~$

This example restricts the find command to looking in the /home/sysadmin directory structure for the file named myprog. The -name option specifies the filename to look for, and the -print action tells the find command to display the results (-print is also the default action if you don't specify one).
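When the filename pattern contains wildcards, quote it so the shell passes it to find unexpanded; otherwise, a matching file in the current directory would be substituted before find ever runs. A minimal sketch with hypothetical file names:

```shell
cd "$(mktemp -d)"
mkdir -p projects/old
touch projects/myprog projects/old/myprog.bak notes.txt

# Quote the pattern so the shell hands it to find unexpanded; find then
# applies it to every name in the tree, including subdirectories
find . -name 'myprog*' -print
```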

What makes the find command so versatile is that it can find files based on lots of different criteria besides just the filename, such as the access or change time, the file owner, the file size, or even the file permissions. For example, you can use the find command to look for all files over 1 MB on your filesystem. Table 7.4 shows some of the options you can use in the find command.

TABLE 7.4: Useful find Command Options

OPTION	DESCRIPTION
`-amin` `n`	File was last accessed `n` minutes ago.
`-atime` `n`	File was last accessed `n` days ago.
`-ctime` `n`	File's status (its data or attributes) was last changed `n` days ago.
`-inum` `n`	Match the file inode number to the number specified.
`-name` `pattern`	Match the file name to the `pattern` specified.
`-perm` `pattern`	Match the file permissions to the `pattern` specified.
`-size` `n`	Match the file size to the amount specified (add a suffix such as `c` for bytes, `k` for kilobytes, or `M` for megabytes; without a suffix, `n` counts 512-byte blocks).
`-user` `name`	Match the file owner to the `name` specified.

You can also use special modifiers on the find options, such as a plus sign for “greater than” or a minus sign for “less than.” For example, to list all of the files larger than 5,000 characters (bytes), you'd use the following:

sysadmin@ubuntu-server:~$ find . -size +5000c -print
./.zcompdump
./myprog
./.viminfo
sysadmin@ubuntu-server:~$

The +5000c parameter tells the find command to look for files in the current directory, and in all of its subdirectories, that are more than 5,000 characters (bytes) in size.
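Multiple find options are combined with an implied AND, which lets you narrow a search considerably. This sketch, assuming GNU find and head, creates two hypothetical files of known size and combines options from Table 7.4:

```shell
cd "$(mktemp -d)"
head -c 6000 /dev/zero > big.dat     # 6,000 bytes
head -c 100 /dev/zero > small.dat    # 100 bytes

# Options are combined with an implied AND: bigger than 5,000 bytes
# AND owned by the current user
find . -size +5000c -user "$(id -un)" -print

# A minus sign means "less than": files accessed within the last 60 minutes
find . -amin -60 -print
```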

Archiving Files

Storing data can get ugly. The more data you need to store, the more disk space it requires. While disk sizes are getting larger these days, there is still a limit to how much space you have. To help with that, you can use some Linux file archiving tools to compress data files for storage and sharing. This section takes a look at how Linux handles compressing and archiving both files and directories.

Compressing Files

If you've done any work in the Microsoft Windows world, no doubt you've used zip files. The PKZip compression utility became the de facto way to compress data and executable files in Windows, so much so that Microsoft eventually incorporated it into the Windows operating system, starting with XP, as the compressed directories feature. Compressed directories allow you to easily compress large files or a large group of files into a smaller file that takes up less space and is easier to copy to another location.

Linux provides a few different tools you can use to compress files to save space. While this may sound great, it can sometimes lead to confusion and chaos when trying to download and extract Linux files from the Internet. Table 7.5 lists the different file compression utilities available in Linux.

TABLE 7.5: Linux File Compression Utilities

UTILITY	FILE EXTENSION	DESCRIPTION
`bzip2`	`.bz2`	Uses the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding
`compress`	`.Z`	Original Unix file compression utility, but starting to fade away into obscurity
`gzip`	`.gz`	The GNU Project's compression utility; uses the open-source DEFLATE algorithm (Lempel-Ziv coding combined with Huffman coding)
`xz`	`.xz`	A general‐purpose compression utility gaining in popularity
`zip`	`.zip`	The Unix version of the PKZip program for Windows

The compress utility can work with files compressed on standard Unix systems, but it's not often installed by default on Linux systems. If you download a file with a .Z extension, you can usually install the compress package from the distribution software repository. The zip utility creates compressed directories that can be extracted on Windows systems, but it's not the best compression algorithm to use if you're keeping the files on a Linux system.

The gzip utility is the most popular compression tool used in Linux. It is a creation of the GNU Project, in its attempt to create a free version of the original Unix compress utility. This package includes three main files.

gzip for compressing files
zcat (called gzcat on some Unix systems) for displaying the contents of compressed text files
gunzip for uncompressing files

The gzip command compresses the file you specify on the command line. You can also specify more than one filename or even use wildcard characters to compress multiple files at once.

$ gzip my*
$

This gzip command compresses every file in the directory that starts with my.
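A complete round trip with the three gzip package commands might look like the following sketch; mydata.txt is a hypothetical file created just for the example.

```shell
cd "$(mktemp -d)"
printf 'line one\nline two\n' > mydata.txt

gzip mydata.txt          # replaces mydata.txt with mydata.txt.gz
ls
zcat mydata.txt.gz       # display the text without uncompressing the file on disk
gunzip mydata.txt.gz     # restores mydata.txt and removes the .gz file
ls
```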

Creating Archive Files

Although the zip command can both compress data and archive it into a single file, it's not the standard utility used for archiving large amounts of data in the Unix and Linux worlds. By far the most popular archiving tool used in Unix and Linux is the tar command.

The tar command was originally used to back up files to a tape device for archiving. However, it can also write the output to a file, which has become a popular way to bundle data for distribution in Linux. It's common to see source code files bundled into a tar archive file (affectionately called a tarball) for distribution.

The following is the format of the tar command:

tar function [options] object1 object2

The function parameter defines what the tar command should do, as shown in Table 7.6.

TABLE 7.6: The tar Command Functions

FUNCTION	DESCRIPTION
`-A`	Appends an existing tar archive file to another tar archive file
`-c`	Creates a new tar archive file
`-d`	Checks the differences between a tar archive file and the filesystem files
`-r`	Appends files to an existing tar archive file
`-t`	Lists the contents of an existing tar archive file
`-u`	Appends files to an existing tar archive file that are newer than a file with the same name in the archive
`-x`	Extracts files from an existing tar archive file

Each function uses one or more options to define a specific behavior for the tar archive file. Table 7.7 shows the options that you can use with the tar command.

TABLE 7.7: The tar Command Options

OPTION	DESCRIPTION
`-C` `dir`	Changes to the specified directory
`-f` `file`	Outputs results to the file (or device) specified
`-j`	Redirects output to the `bzip2` command for compression
`-p`	Preserves all file permissions
`-v`	Lists files as they are processed
`-z`	Redirects the output to the `gzip` command for compression

While the combination of several functions along with several options seems like an impossible task to remember, in reality you'll find yourself just using a handful of combinations to do common tasks. The following section takes a look at the more common archiving scenarios that you'll run into.

Archiving Scenarios

Normally, there are just three basic things you'll need to do with the tar command.

Archive files to create a tarball.
List the files contained in a tarball.
Extract the files from a tarball.

This helps narrow down the function and option features that you need to remember for the tar command.

To start, you can create a new archive file using this command:

tar -cvf test.tar test/ test2/

This command creates an archive file called test.tar containing the contents of both the test directory and the test2 directory. The ‐v option is a nice feature in that it displays the files as they are added to the archive file.

Next, to display the contents of a tarball file, you just use this command:

tar -tf test.tar

The ‐t function lists the contents of the tarball to the standard output by default, which is your monitor. The files aren't extracted, just listed.

Finally, to extract the files contained in a tarball, you'll use this command:

tar -xvf test.tar

It extracts the contents of the tar file test.tar into the current directory. If the tar file was created from a directory structure, the entire directory structure is re‐created starting at the current directory.

As you can see, using the tar command is a simple way to create archive files of entire directory structures. That's why this has become a common method for distributing source code files for open source applications in the Linux world!
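The same three scenarios work for compressed tarballs by adding the -z option from Table 7.7, and -C lets you extract into a different directory. A minimal sketch, assuming GNU tar and hypothetical directory contents:

```shell
cd "$(mktemp -d)"
mkdir test test2 restore
touch test/file1 test2/file2

# -z passes the archive through gzip; .tar.gz (or .tgz) is the usual extension
tar -czvf test.tar.gz test/ test2/

# Use -z again when listing and extracting the compressed archive
tar -tzf test.tar.gz

# -C switches to the restore directory before extracting
tar -xzvf test.tar.gz -C restore
ls -R restore
```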

Real World Scenario

WORKING WITH FILE ARCHIVES

In this exercise, you will create a tar archive file containing several files in one directory and then extract the tar archive file contents into another directory to simulate moving the files to another server. Just follow these steps:

Log into your Linux server using the user account you created during installation.
From the CLI prompt, create a new directory by entering the command mkdir mytest1, and then create another new directory by entering the command mkdir mytest2.

Create a few new files in the mytest1 directory by entering these commands:

    touch mytest1/test1
    touch mytest1/test2
    touch mytest1/test3
    touch mytest1/test4

Change to the mytest1 directory by entering the command cd mytest1, and then enter the command ls ‐l to ensure the files exist:

    sysadmin@ubuntu-server:~$ cd mytest1
    sysadmin@ubuntu-server:~/mytest1$ ls -l
    total 0
    -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 18:14 test1
    -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 18:14 test2
    -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 18:14 test3
    -rw-rw-r-- 1 sysadmin sysadmin 0 Dec 19 18:14 test4
    sysadmin@ubuntu-server:~/mytest1$

Archive the files by entering the command tar ‐cvf test.tar test*. You should see the following output:

    sysadmin@ubuntu-server:~/mytest1$ tar -cvf test.tar test*
    test1
    test2
    test3
    test4
    sysadmin@ubuntu-server:~/mytest1$

Copy the test.tar archive file to the mytest2 directory using the command cp test.tar ../mytest2.
Change to the mytest2 directory by entering the command cd ../mytest2, and then list the directory contents by entering the command ls ‐l.

Extract the archive file using the command tar ‐xvf test.tar. Enter the command ls ‐l to ensure the files have been extracted.

    sysadmin@ubuntu-server:~/mytest2$ ls -l
    total 12
    -rw-rw-r-- 1 sysadmin sysadmin     0 Dec 19 18:14 test1
    -rw-rw-r-- 1 sysadmin sysadmin     0 Dec 19 18:14 test2
    -rw-rw-r-- 1 sysadmin sysadmin     0 Dec 19 18:14 test3
    -rw-rw-r-- 1 sysadmin sysadmin     0 Dec 19 18:14 test4
    -rw-rw-r-- 1 sysadmin sysadmin 10240 Dec 19 18:15 test.tar
    sysadmin@ubuntu-server:~/mytest2$

You can use this same process to move files from one server to another server as a single archive file.
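As an optional final check, the -d function from Table 7.6 compares the extracted files with the archive. The sketch below repeats the exercise non-interactively in a scratch directory, assuming GNU tar:

```shell
cd "$(mktemp -d)"
mkdir mytest1 mytest2
touch mytest1/test1 mytest1/test2 mytest1/test3 mytest1/test4

# Steps from the exercise: archive, copy, and extract
cd mytest1
tar -cvf test.tar test*
cp test.tar ../mytest2
cd ../mytest2
tar -xvf test.tar

# -d compares each file in the archive with the file on disk;
# an exit status of 0 means no differences were found
tar -dvf test.tar && echo "extracted files match the archive"
```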

The Bottom Line

Describe how Linux handles files and directories. File management is an important part of the Linux system, and it helps to know the basics of how to manage files from the CLI. This chapter first showed you how to use both absolute and relative filepaths in commands to reference files and directories. Next, it showed the standard Linux file naming conventions used by Linux distributions, along with how Linux uses inodes to handle files.
- Master It Your boss has given you a list of files he saw being used on the server and wants you to find out what type of files they are. The files are as follows:
```
/usr/bin/grep
/usr/bin/zcat
/etc/hosts
~/.bashrc
```
  What command should you use to determine those file types?
Explain the different options available to list files and directories. The ls command lists the contents of directories from the command prompt. While there are lots of parameters associated with the ls command, you'll soon find yourself using just a handful of them to view the information that you need.
- Master It A user on your Linux server has an important project and needs access to the file /share/HR/employees.txt. However, the user doesn't know who owns the file to ask for permission to access the file. What command and parameters should you use to determine the owner of the file?
Submit commands to manage files and directories. The chapter showed you how to use the Linux CLI to create, move, and remove both directories and files. The chapter also went through how to use globbing to specify file and directory ranges instead of single files in the commands, as well as how to use quoting to work with file and directory names that incorporate spaces.
- Master It You have been assigned the task of creating a new directory for the Engineering team on the Linux server. Under that directory they'd also like to have separate directories for the automotive project group and the truck project group. What commands should you enter to create these directories?
Use Linux commands to find files and directories. There are a few common Linux commands used to help find files on the Linux system. The which, locate, and whereis commands can be useful for general searches, but the find command allows you to customize your search by specifying specific file or directory properties to look for.
- Master It You have been tasked with finding all files on your filesystem that are larger than 10 MB. What command would you use to easily find those files?
Use Linux commands to compress and archive files and directories. There are many different utilities available for compressing and archiving files in Linux. For compressing files, the gzip family of commands is a popular option. For archiving multiple files into a single file, the tar command is common. You can also compress a tar archive file to facilitate moving it to off-site storage.
- Master It What commands should you use to create a backup archive file of the new /Engineering directory and compress it?