Chapter 12. Moving Files Across the Network

image with no caption

This chapter surveys options for moving and sharing files between machines on a network. We’ll start by looking at some ways to copy files other than the scp and sftp utilities that you’ve already seen. Then we’ll briefly look at true file sharing, where you attach a directory on one machine to another machine.

This chapter describes some alternative ways to transfer files because not every file transfer problem is the same. Sometimes you need to provide quick, temporary access to machines that you don’t know much about, sometimes you need to efficiently maintain copies of large directory structures, and sometimes you need more constant access.

Let’s say you want to copy a file (or files) from your machine to another one on your network, and you don’t care about copying it back or need to do anything fancy. You just want to do it quickly. There’s a convenient way to do this with Python. Just go to the directory containing the file(s) and run

$ python -m SimpleHTTPServer

This starts a basic web server that makes the directory available to any browser on the network. It usually runs on port 8000, so if the machine you run this on is at 10.1.2.4, go to http://10.1.2.4:8000 on the destination and you’ll be able to grab what you need.

If you want to move an entire directory structure around, you can do so with scp -r—or if you need a little more performance, tar in a pipeline:

$ tar cBvf - directory | ssh remote_host tar xBvpf -

These methods get the job done but are not very flexible. In particular, after the transfer completes, the remote host may not have an exact copy of the directory. If directory already exists on the remote machine and contains some extraneous files, those files persist after the transfer.

If you need to do this sort of thing regularly (and especially if you plan to automate the process), use a dedicated synchronizer system. On Linux, rsync is the standard synchronizer, offering good performance and many useful ways to perform transfers. We’ll cover some of the essential rsync operation modes and look at some of its peculiarities.

To get rsync working between two hosts, the rsync program must be installed on both the source and destination, and you’ll need a way to access one machine from the other. The easiest way to transfer files is to use a remote shell account, and we’ll assume that you want to transfer files using SSH access. However, remember that rsync can be handy even for copying files and directories between locations on a single machine, such as from one filesystem to another.

On the surface, the rsync command is not much different from scp. In fact, you can run rsync with the same arguments. For example, to copy a group of files to your home directory on host, enter

$ rsync file1 file2 ... host:

On any modern system, rsync assumes that you’re using SSH to connect to the remote host.

Beware of this error message:

rsync not found
rsync: connection unexpectedly closed (0 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)

This notice says that your remote shell can’t find rsync on its system. If rsync isn’t in the remote path but is on the system, use --rsync-path=path to manually specify its location.

If your username is different on the remote host, add user@ to the hostname, where user is your username on host:

$ rsync file1 file2 ... user@host:

Unless you supply extra options, rsync copies only files. In fact, if you specify just the options described so far and you supply a directory dir as an argument, you’ll see this message:

skipping directory dir

To transfer entire directory hierarchies—complete with symbolic links, permissions, modes, and devices—use the -a option. Furthermore, if you want to copy to some place other than your home directory on the remote host, place this destination after the remote host, like this:

$ rsync -a dir host:destination_dir

Copying directories can be tricky, so if you’re not exactly sure what will happen when you transfer the files, use the -nv option combination. The -n option tells rsync to operate in “dry run” mode—that is, to run a trial without actually copying any files. The -v option is for verbose mode, which shows details about the transfer and the files involved:

$ rsync -nva dir host:destination_dir

The output looks like this:

building file list ... done
ml/nftrans/nftrans.html
[more files]
wrote 2183 bytes read 24 bytes 401.27 bytes/sec

Be particularly careful when specifying a directory as the source in an rsync command line. Consider the basic command that we’ve been working with so far:

$ rsync -a dir host:dest_dir

Upon completion, you’ll have a directory dir inside dest_dir on host. Figure 12-1 shows an example of how rsync normally handles a directory with files named a and b. However, adding a slash (/) significantly changes the behavior:

$ rsync -a dir/ host:dest_dir

Here, rsync copies everything inside dir to dest_dir on host without actually creating dir on the destination host. Therefore, you can think of a transfer of dir/ as an operation similar to cp dir/* dest_dir on the local filesystem.

For example, say you have a directory dir containing the files a and b (dir/a and dir/b). You run the trailing-slash version of the command to transfer them to the dest_dir directory on host:

$ rsync -a dir/ host:dest_dir

When the transfer completes, dest_dir contains copies of a and b but not dir. If, however, you had omitted the trailing / on dir, dest_dir would have gotten a copy of dir with a and b inside. Then, as a result of the transfer, you’d have files and directories named dest_dir/dir/a and dest_dir/dir/b on the remote host. Figure 12-2 illustrates how rsync handles the directory structure from Figure 12-1 when using a trailing slash.

When transferring files and directories to a remote host, accidentally adding a / after a path would normally be nothing more than a nuisance; you could go to the remote host, add the dir directory, and put all of the transferred items back in dir. Unfortunately, you must be careful to avoid disaster when combining the trailing / with the --delete option, because you can easily remove unrelated files this way.

To speed operation, rsync uses a quick check to determine whether any files on the transfer source are already on the destination. The quick check uses a combination of the file size and its last-modified date. The first time you transfer an entire directory hierarchy to a remote host, rsync sees that none of the files already exist at the destination, and it transfers everything. Testing your transfer with rsync -n verifies this for you.

After running rsync once, run it again using rsync -v. This time you should see that no files show up in the transfer list because the file set exists on both ends, with the same modification dates.

When the files on the source side are not identical to the files on the destination side, rsync transfers the source files and overwrites any files that exist on the remote side. The default behavior may be inadequate, though, because you may need additional reassurance that files are indeed the same before skipping over them in transfers, or you may want to put in some extra safeguards. Here are some options that come in handy:

With no special options, rsync operates quietly, only producing output when there is a problem. However, you can use rsync -v for verbose mode or rsync -vv for even more details. (You can tack on as many v options as you like, but two is probably more than you need.) For a comprehensive summary after the transfer, use rsync --stats.

Your Linux machine probably doesn’t live alone on your network, and when you have multiple machines on a network, there’s nearly always a reason to share files between them. For the remainder of this chapter, we’ll primarily be concerned with file sharing between Windows and Mac OS X machines, because it’s interesting to see how Linux adapts to completely foreign environments. For the purpose of sharing files between Linux machines, or for accessing files from a Network Area Storage (NAS) device, we’ll briefly talk about using Network File System (NFS) as a client.

If you have machines running Windows, you’ll probably want to permit access to your Linux system’s files and printers from those Windows machines using the standard Windows network protocol, Server Message Block (SMB). Mac OS X also supports SMB file sharing.

The standard file-sharing software suite for Unix is called Samba. Not only does Samba allow your network’s Windows computers to get to your Linux system, but it works the other way around: You can print and access files on Windows servers from your Linux machine with the Samba client software.

To set up a Samba server, perform these steps:

When you install Samba from a distribution package, your system should perform the steps listed above using some reasonable defaults for the server. However, it probably won’t be able to determine which particular shares (resources) on your Linux machine you offer to clients.

In general, you should only allow access to your Samba server with password authentication. Unfortunately, the basic password system on Unix is different than that on Windows, so unless you specify clear-text network passwords or authenticate passwords with a Windows server, you must set up an alternative password system. This section shows you how to set up an alternative password system using Samba’s Trivial Database (TDB) backend, which is appropriate for small networks.

First, use these entries in your smb.conf [global] section to define the Samba password database characteristics:

# use the tdb for Samba to enable encrypted passwords
security = user
passdb backend = tdbsam
obey pam restrictions = yes
smb passwd file = /etc/samba/passwd_smb

These lines allow you to manipulate the Samba password database with the smbpasswd command. The obey pam restrictions parameter ensures that any user changing their password with the smbpasswd command must obey any rules that PAM enforces for normal password changes. For the passdb backend parameter, you can add an optional pathname for the TDB file after a colon; for example, tdbsam:/etc/samba/private/passwd.tdb.

If you need only casual access to files in a disk share, use the following command. (Again, you can omit the -U username if your Linux username matches your username on the server.)

$ smbclient -U username '\\SERVER\sharename'

Upon success, you will get a prompt like this, indicating that you can now transfer files:

smb: \>

In this file transfer mode, smbclient is similar to the Unix ftp, and you can run these commands:

The standard system for file sharing among Unix systems is NFS; there are many different versions of NFS for different scenarios. You can serve NFS over TCP and UDP, with a large number of authentication and encryption techniques. Because there are so many options, NFS can be a big topic, so we’ll just stick to the basics of NFS clients.

To mount a remote directory on a server with NFS, use the same basic syntax as for mounting a CIFS directory:

# mount -t nfs server:directory mountpoint

Technically, you don’t need the -t nfs option because mount should figure this out for you, but you may want to investigate the options in the nfs(5) manual page. (You’ll find several different options for security using the sec option. Many administrators on small, closed networks use host-based access control. However, more sophisticated methods, such as Kerberos-based authentication, require additional configuration on other parts of your system.)

When you find that you’re making greater use of filesystems over a network, set up the automounter so that your system will mount the filesystems only when you actually try to use them in order to prevent problems with dependencies on boot. The traditional automounting tool is called automount, with a newer version called amd, but much of this is now being supplanted by the automount unit type in systemd.

Setting up an NFS server to share files to other Linux machines is more complicated than using a simple NFS client. You need to run the server daemons (mountd and nfsd) and set up the /etc/exports file to reflect the directories that you’re sharing. However, we won’t cover NFS servers primarily because shared storage over a network is often made much more convenient by simply purchasing an NAS device to handle it for you. Many of these devices are Linux based, so they’ll naturally have NFS server support. Vendors add value to their NAS devices by offering their own administration tools to take the pain out of tedious tasks such as setting up RAID configurations and cloud backups.

Speaking of cloud backups, another network file service option is cloud storage. This can be handy when you need the extra storage that comes with automatic backups and you don’t mind an extra hit on performance. It’s especially useful when you don’t need the service for a long time or don’t need to access it very much. You can usually mount Internet storage much as you would NFS.

Although NFS and other file-sharing systems work well for casual use, don’t expect great performance. Read-only access to larger files should work well, such as when you’re streaming audio or video, because you’re reading data in large, predictable chunks that don’t require much back-and-forth communication between the file server and its client. As long as the network is fast enough and the client has enough memory, a server can supply data as needed.

Local storage is much faster for tasks involving many small files, such as compiling software packages and starting desktop environments. The picture becomes more complicated when you have a larger network with many users accessing many different machines, because there are tradeoffs between convenience, performance, and ease of administration.