For the final chapter, I'll describe some annoyances associated with computer administration. This chapter includes a number of topics (gateways, remote logins, logfile management, automated scripts) that don't quite fit with annoyances in other chapters. Perhaps the most important annoyance is the first, which deals with how every computer on a network downloads identical copies of the same updates, overloading your Internet connection.
If all you administer is one or two Linux computers, updates are a straightforward process. All you need to do is configure updates from the most appropriate mirror on the Internet. If desired, you can automate downloads and installations of updates using a cron job. For more information on how to configure updates from yum and apt-based mirrors, see Chapter 8.
However, when you administer a large number of Linux computers, the updates can easily overload standard high-speed Internet connections. For example, if you're downloading updates to the OpenOffice.org suite, you could be downloading hundreds of megabytes of packages. If you're downloading these packages on 100 computers simultaneously, that may be too much for your Internet connection, especially when other jobs are pending.
In this annoyance, I'll show you how you can create a local mirror of your favorite update server. You can then share the appropriate directory and configure your updates locally.
Where possible, I'll show you how you can limit what you mirror to updates. For example, Fedora Linux includes dedicated update directories. Most downloads are associated with updates, so it's appropriate to limit what you mirror to such packages.
One other approach is to download just the packages and create the repository metadata yourself. For example, the createrepo command reads the header from each RPM and builds the database that helps the yum command find the dependencies associated with every package.
I assume you have the hard disk space you need on your mirror server. Repositories can be very demanding with respect to disk space; be aware, if you're synchronizing repositories for multiple architectures and distributions, that downloaded mirrors can easily take up hundreds of gigabytes of space.
There are a number of ways to download the files associated with a mirror. The most common standard is based on the rsync command. With rsync, you can synchronize your mirrors as needed, downloading only those parts of those packages that are new or have otherwise changed. I'll show you how you can use rsync in this annoyance.
There are a number of other tools available. Naturally, you can use any FTP client to download mirrors to local directories. Commands such as wget and curl do an excellent job with large downloads. If you're working with an apt repository, the apt-mirror project provides another excellent alternative (http://freshmeat.net/projects/apt-mirror/).
To create your mirror, you can take these steps, which I'll detail in the following subsections:
Find an appropriate update mirror, specifically the one that gives you the best performance for individual updates. Some trial and error may be required. While the best update mirror is usually geographically close to you, that may not always be the case.
Make room for the updates. Several gigabytes may be required, especially if you're making room for updates for multiple distributions and/or versions. You may even consider using a dedicated partition or drive.
Synchronize the mirror locally. The first time you download a mirror, you may be downloading gigabytes of data.
If required, make your local mirror usable through your preferred update system.
Test a local update after you've downloaded a mirror to make sure it works.
Automate the synchronization process.
Point your clients to the local mirror.
The best update mirror may not be the one that is physically closest to your network. Some mirrors have faster connections to the Internet. Others have less traffic. Some mirror administrators may discourage full mirror downloads or even limit the number of simultaneous connections. And many public mirrors don't support rsync connections.
Our selected distributions maintain "official" lists of update mirrors, but more mirrors may be available than those lists suggest. A mirror that carries a Fedora repository often carries a SUSE repository as well. For example, while the University of Mississippi is not (currently) on the official list of mirrors for SUSE Linux, updates are available from its server at http://mirror.phy.olemiss.edu/mirror/suse/suse/. Here's where to find the "official" list of mirrors for our selected distributions:
http://fedora.redhat.com/download/mirrors.html includes a list of mirrors accessible through the rsync protocol; don't limit yourself to those specified, as others may also work with rsync.
Official mirrors of the open source SUSE distribution can be found at http://en.opensuse.org/Mirrors_Released_Version. Trial and error is required to find rsync-capable mirrors.
Official Debian mirrors can be found at http://www.debian.org/mirror/list. Many support a limited number of architectures. Trial and error is required to find rsync-capable mirrors.
To see if a mirror works with the rsync protocol, run the rsync command with the URL in question. For example, if you want to check the mirror specified in the Debian Mirror List from the University of Southern California, run the following command (and don't forget the double colon at the end):
rsync mirrors.usc.edu::
When I ran this command, I saw a long list of directories, clearly associated with various Linux distributions, including SUSE, Fedora, and others. If there is no rsync server at your desired site, the rsync command will time out, or you'll have to press Ctrl-C to return to the command line.
Finding the best update mirror is somewhat subjective. Yes, you could go by objective measures, such as the time required for the download. But conditions change. Internet traffic can slow down in certain geographic areas. Servers do go down. Some trial and error may be required.
Updates can consume gigabytes of space. The choices you make can make a significant difference in the space you need. Key factors include:
Every architecture that you maintain locally can multiply the space you need. For example, if you're rolling out both 64-bit and 32-bit workstations, you'll need at least double the space.
If you're maintaining mirrors for more than one distribution, your space requirements increase accordingly.
If you're maintaining mirrors for more than one version of a distribution (such as for Fedora Core 4 and 5), your space requirements can multiply.
Many administrators find it convenient to include a copy of the installation trees in the update repository partition. This increases the space required by the size of the installation CDs/DVDs.
You may want to create a dedicated partition for your update repositories. That way, you can be sure that the space required by the repository does not crowd out the rest of your system.
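If you do set aside a dedicated partition, mounting it at the repository directory keeps the layout simple. Here's a minimal sketch of an /etc/fstab entry; the device name (/dev/sdb1) and mount point (/var/www/html/yum, matching the Apache-based example later in this annoyance) are assumptions you'd adjust for your own system:
/dev/sdb1    /var/www/html/yum    ext3    defaults    1 2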
Along with perhaps most of the world of Linux, I like the rsync command. With appropriate switches, it's easy to use this command to copy the files and directories that you want. Once you've set up a mirror, you can use the rsync command as needed to keep your local mirror up-to-date.
The rsync command is straightforward; I use it to back up the home directory from my laptop computer with the following command:
rsync -a -e ssh michael@laptop.example.com:/home/michael/* /backup
If you've set the environment variable RSYNC_RSH=ssh, you don't need the -e ssh option. For more information on rsync, see the "I'm Afraid of Losing Data" annoyance in Chapter 2.
In the following subsections, I illustrate some simple examples of how you can create your own rsync mirror on our selected distributions. This assumes you're using an appropriate directory, possibly configured on a separate disk or partition.
For this exercise, assume you want to synchronize your local update mirror with the one available from kernel.org. The entry in the list of Fedora mirrors is a little deceiving. When you see the following:
rsync://mirrors.kernel.org/fedora/core/
you'll need to run the following command to confirm that rsync works on that server, as well as to view the available directories (don't forget the trailing forward slash):
rsync mirrors.kernel.org::fedora/core/
When I ran this command, I saw the result shown here:
MOTD:  Welcome to the Linux Kernel Archive.
MOTD:
MOTD:  Due to U.S. Exports Regulations, all cryptographic software on this
MOTD:  site is subject to the following legal notice:
MOTD:
MOTD:  This site includes publicly available encryption source code
MOTD:  which, together with object code resulting from the compiling of
MOTD:  publicly available source code, may be exported from the United
MOTD:  States under License Exception "TSU" pursuant to 15 C.F.R. Section
MOTD:  740.13(e).
MOTD:
MOTD:  This legal notice applies to cryptographic software only.
MOTD:  Please see the Bureau of Industry and Security,
MOTD:  http://www.bis.doc.gov/ for more information about current
MOTD:  U.S. regulations.
MOTD:
drwxr-xr-x        4096 2005/06/09 09:40:43 .
drwxr-xr-x        4096 2004/03/01 08:39:30 1
drwxr-xr-x        4096 2004/05/14 04:18:24 2
drwxr-xr-x        4096 2004/11/03 15:00:14 3
drwxr-xr-x        4096 2005/06/09 09:41:47 4
drwxrwsr-x        4096 2005/12/16 23:49:44 development
drwxr-xr-x        4096 2005/11/22 06:14:23 test
drwxrwsr-x        4096 2005/06/07 08:29:19 updates
[michael@FedoraCore4 rhn]$
Naturally, Fedora Core production releases (which should also be available on the installation CDs/DVDs) are associated with the numbered directories. The focus in this annoyance, though, is on the updates directory, the last one listed on the server, which divides updates by Fedora Core release.
To make sure this server includes the updates I need, I ran the following command:
rsync mirrors.kernel.org::fedora/core/updates/
I continued the process until I confirmed that this server included the update RPMs that I wanted to mirror. I wanted to create an Apache-based repository, so I mirrored the RPMs to the /var/www/html/yum/Fedora/Core/updates/4/i386 directory.
By default, the DocumentRoot associated with the default Fedora Apache configuration points to the /var/www/html directory; if I configure a local Apache server, I can use the Fedora/Core/updates/4/ subdirectory.
Then, to synchronize the local and remote update directories, I ran the following command:
rsync -a mirrors.kernel.org::fedora/core/updates/4/i386/. \
    /var/www/html/yum/Fedora/Core/updates/4/i386
Because the SUSE list of mirrors doesn't specify which are rsync servers, some trial and error is required. For this exercise, I attempted to synchronize my local update mirror with that available from the University of Utah. The listing that I saw in the SUSE mirror list as of this writing was:
suse.cs.utah.edu/pub/
I tried the following command, which led to an error message:
rsync suse.cs.utah.edu::pub/
@ERROR: Unknown module 'pub'
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
So I tried the top-level directory and found the SUSE repositories at the top of the list:
rsync suse.cs.utah.edu::
suse            The full /pub/suse directory from ftp.suse.com.
people          The full /pub/people directory from ftp.suse.com.
projects        The full /pub/projects directory from ftp.suse.com.
And, with a little browsing, as described in the previous section, I found the SUSE update directories with the following command:
rsync suse.cs.utah.edu::suse/i386/update/10.0/
I wanted to download updates associated with SUSE 10.0 to the following directory:
/var/lib/YaST2/you/mnt/i386/update/10.0/
I could run the following command to synchronize all updates from the update directory at the University of Utah (the -v uses verbose mode, and the -z compresses the transferred data):
rsync -avz suse.cs.utah.edu::suse/i386/update/10.0/. \
    /var/lib/YaST2/you/mnt/i386/update/10.0/
But that might transfer more than you need. If you explore a bit further, you'll find source packages as well as packages built for 64-bit and PPC CPU systems. If you have only 32-bit workstations, you don't need all this extra data. You can use the --exclude switch to avoid transferring these packages:
rsync -avz --exclude='*.src.rpm' --exclude='*.ppc' --exclude='*x86_64*' \
    suse.cs.utah.edu::suse/i386/update/10.0/. \
    /var/lib/YaST2/you/mnt/i386/update/10.0/
Debian mirrors are somewhat different. Besides the different package format, Debian mirrors do not include any separate update servers. Therefore, if you want to mirror a Debian update server, you'll have to mirror all the packages on the server (except any that you specifically exclude).
Because the Debian list of mirrors does not specify rsync servers, some trial and error may be required. For this exercise, I wanted to synchronize my local update mirror with the one available from the University of California at Berkeley. Here's the command I ran, followed by the module listing it returned:
rsync linux.csua.berkeley.edu::
debian
debian-non-US
debian-cd
In other words, this revealed the directories associated with Debian CDs as well as non-U.S. packages. For now, I assume that you want to mirror the regular Debian repositories. I found them with the following command:
rsync linux.csua.berkeley.edu::debian/dists/Debian3.1r0/main/
But as you can see from the output shown below, there are a number of directories full of packages that you may not need, unless you want to include the installers, as well as the binary packages associated with the full Debian range of architectures:
drwxr-sr-x        4096 2005/06/04 10:20:54 .
drwxr-sr-x        4096 2005/12/17 00:33:29 binary-alpha
drwxr-sr-x        4096 2005/12/17 00:39:50 binary-arm
drwxr-sr-x        4096 2005/12/17 00:48:56 binary-hppa
drwxr-sr-x        4096 2005/12/17 00:55:50 binary-i386
drwxr-sr-x        4096 2005/12/17 01:01:22 binary-ia64
drwxr-sr-x        4096 2005/12/17 01:07:29 binary-m68k
drwxr-sr-x        4096 2005/12/17 01:15:06 binary-mips
drwxr-sr-x        4096 2005/12/17 01:23:07 binary-mipsel
drwxr-sr-x        4096 2005/12/17 01:29:11 binary-powerpc
drwxr-sr-x        4096 2005/12/17 01:35:33 binary-s390
drwxr-sr-x        4096 2005/12/17 01:41:44 binary-sparc
drwxr-sr-x        4096 2004/01/04 11:47:29 debian-installer
drwxr-sr-x        4096 2005/03/24 00:22:16 installer-alpha
drwxr-sr-x        4096 2005/03/24 00:22:16 installer-arm
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-hppa
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-i386
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-ia64
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-m68k
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-mips
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-mipsel
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-powerpc
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-s390
drwxr-sr-x        4096 2005/03/24 00:22:17 installer-sparc
drwxr-sr-x        4096 2005/12/17 01:45:08 source
drwxr-sr-x        4096 2005/06/04 11:40:37 upgrade-kernel
To download just the directories that you need, you can go into the appropriate subdirectory, or you can make extensive use of the --exclude switch. Debian recommends the latter. For example, if all of your workstations include Intel Itanium CPUs, you can run a command that excludes all files and directories not associated with the IA64 architecture. Debian also recommends that you include the --recursive, --times, --links, --hard-links, and --delete switches, which do the following:
Recursively download and synchronize files from all subdirectories
Preserve the date and time associated with each file
Re-create any existing symlinks
Include any hard-linked files
Delete any files that no longer exist on the mirror
If I wanted to limit the downloads to the ia64 directory, I would include the following switches:
rsync -avz --recursive --times --links --hard-links --delete \
    --exclude binary-alpha/ --exclude '*_alpha.deb' \
    --exclude binary-arm/ --exclude '*_arm.deb' \
    --exclude binary-hppa/ --exclude '*_hppa.deb' \
    --exclude binary-i386/ --exclude '*_i386.deb' \
    --exclude binary-m68k/ --exclude '*_m68k.deb' \
    --exclude binary-mips/ --exclude '*_mips.deb' \
    --exclude binary-mipsel/ --exclude '*_mipsel.deb' \
    --exclude binary-powerpc/ --exclude '*_powerpc.deb' \
    --exclude binary-s390/ --exclude '*_s390.deb' \
    --exclude binary-sparc/ --exclude '*_sparc.deb'
But things are beginning to get complicated. Debian provides a script that can help. All you'll need to do before running the script is to specify a few directives, including the rsync server, directory, and architectures to exclude. To see the script, navigate to http://www.debian.org/mirror/anonftpsync. For additional discussion of this rsync script, see http://www.debian.org/mirror/ftpmirror.
Now that you have a local mirror of Linux updates, you'll need to make sure it's usable through your update system. For our selected distributions, I'm assuming that you're using yum for Fedora, apt for Debian, or YaST for SUSE Linux. This step involves creating the repository database that the packaging system on each client consults to find available packages and resolve their dependencies.
I also assume that you've shared the update directory using a standard sharing service, such as FTP, HTTP, or NFS. I've described the basic methods associated with yum and apt updates in Chapter 8. If you're connecting to a shared NFS directory, substitute file:/// (with three forward slashes) for http:// or ftp://.
Generally, when you use rsync to copy and synchronize to local mirrors, you've also downloaded the directories that support the apt or yum databases.
If you're using apt for updates, such as for Debian Linux, you may already have the key database files: Packages.gz for regular binary packages and Sources.gz for source packages. Based on the Debian mirror described earlier, you can find these files in the following directories:
linux.csua.berkeley.edu/debian/dists/Debian3.1r0/main/binary-i386/
linux.csua.berkeley.edu/debian/dists/Debian3.1r0/main/source/
If you need to create your own versions of these database files, navigate to the directory with the binary packages and run the following command:
dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz
And for the database of source packages, navigate to the directory with those packages and run the following command:
dpkg-scansources . /dev/null | gzip -9c > Sources.gz
For more information on this process, see http://www.interq.or.jp/libra/oohara/apt-gettable/.
There are two ways to create a yum repository database. Through Fedora Core 3, the standard was the yum-arch command, which is included in the yum RPM. Since that time, the standard has become the createrepo command, based on a package of the same name. For the older Fedora distributions (as well as the rebuild distributions of Red Hat Enterprise Linux 3 and 4, which use yum for updates), you can create your own yum repository database by navigating to the package directory and running the following command:
yum-arch .
As yum "digests" the package headers, it collects them in a headers/ subdirectory.
For later Fedora distributions, assuming the packages are in the directory described earlier for Fedora updates, you'd run the following command:
createrepo /var/www/html/yum/Fedora/Core/updates/4/i386
This command creates an XML database in the repodata/ subdirectory. If your mirror process already copied either of these directories, you don't need to create it.
Now you'll want to test a local update. I described some of the update systems in Chapter 8. To summarize, for any of our three distributions, you'll need to make some configuration changes to point the package manager to the update server you created on your local network:
If you're updating yum for Fedora, you'll want to update the appropriate configuration files in the /etc/yum.repos.d directory. If your local mirror consists of Fedora updates, the file is fedora-updates.repo. For example, if you've shared the directory described in the previous section via NFS and have mounted the appropriate directory, you would substitute the following for the default baseurl directive:
baseurl=file:///var/www/html/yum/Fedora/Core/updates/4/i386/
If you're updating YaST for SUSE Linux, you'll need to point the update server to the shared local directory. In the appropriate YaST menu, you can configure a connection to any of several servers, including FTP, HTTP, or NFS servers from the local network. For example, if I've created an FTP server that points to the SUSE repository directory created earlier, I'd select FTP, cite the name of the server, and point to the following directory on that server: /var/lib/YaST2/you/mnt/i386/update/10.0/
If you're updating apt for Debian Linux, you'll want to update the appropriate URLs configured in /etc/apt/sources.list. For example, if you've mirrored a repository for Debian Sarge and created an HTTP server on your local network, on a computer named debianrep, in the web server's /repo subdirectory, you'd add the following line to each client's sources.list file:
deb http://debianrep/repo sarge main
Once you change the appropriate configuration file, you can test updates from the local server that you created.
When you're satisfied that the local update server meets your needs, you'll want to automate the synchronization process. To do so, insert the rsync command(s) that you used in a cron job file. If you had to create yum or apt database files, you'll want to add those commands described earlier to the cron job.
Even after the first time you create a mirror, the downloads for updates can be extensive. For example, updates to the OpenOffice.org suite alone can occupy several hundred megabytes.
Therefore, you'll want to schedule the cron job for a time when few or no other jobs are running. And that depends on the schedule of other cron jobs, as well as any other jobs (such as database processing) that may happen during off-hours.
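As a sketch of what such a cron job might look like, the following script reuses the Fedora paths and the kernel.org mirror from earlier in this annoyance; the filename, schedule, and paths are assumptions you'd adapt to your own mirror:
#!/bin/sh
# /etc/cron.weekly/mirror-sync: synchronize the local Fedora update mirror
# Paths and server match the examples earlier in this annoyance.
rsync -a mirrors.kernel.org::fedora/core/updates/4/i386/. \
    /var/www/html/yum/Fedora/Core/updates/4/i386
# Rebuild the yum metadata if the mirror did not bring its own repodata/
createrepo /var/www/html/yum/Fedora/Core/updates/4/i386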
Once you've tested your local mirror, and then configured regular updates to that mirror, you're ready to connect your local workstations to it. You'll need to modify the same files as described earlier in the "Test a Local Update" section.
If you want to configure automatic updates on your workstations from your local repositories, you'll need to configure cron jobs on each host.
Remember, updates always carry some degree of risk. But when you update the system with the local repository, you're testing at least some of the updates. You have to decide if you want to do more testing or allow automatic updates to the production systems on your network. You can always create a script to log in to and update each of the production systems when you're ready.
Some distributions support GUI configuration of automated updates; SUSE supports it directly via YaST (which is saved to /etc/cron.d/yast2-online-update).
If you've installed the latest version of yum on Fedora Core, there's a cron job already configured in /etc/cron.daily/yum.cron. To let it run, you'll need to activate the yum service in the /etc/init.d directory.
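One way to do so, assuming the standard Red Hat chkconfig and service utilities, is:
chkconfig yum on     # enable the yum service at boot
service yum start    # create the lock file that the daily yum.cron job checks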
Creating an update script is a straightforward process, with the following general steps:
Create a cron job in the appropriate directory. If you want a weekly update, add it to /etc/cron.weekly.
Make sure the script checks for the latest version of the update-management command. For example, if you're updating with apt, make sure it's up-to-date with the following command:
apt-get install apt
I use apt-get install and not apt-get upgrade, so I don't have to worry about pending updates to other packages. If the package is already installed, it is automatically upgraded.
If you're running apt, you'll need to make sure the local cache of packages is up-to-date:
apt-get update
Finally, apply the update command that you need, such as the following:
apt-get dist-upgrade
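Putting these steps together, here's a sketch of a weekly update script; the -y switch (which answers prompts automatically) is my addition for unattended runs, and the filename is just an example:
#!/bin/sh
# /etc/cron.weekly/auto-update: keep a Debian system current unattended
apt-get -y install apt       # make sure the update tool itself is current
apt-get update               # refresh the local cache of package lists
apt-get -y dist-upgrade      # apply all pending updates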
As distributions evolve, developers make changes. Sometimes, the developers behind a distribution choose to drop services. Sometimes the service that you're most comfortable with was never built for your distribution. Sometimes people convert from distributions or allied systems, such as HP-UX or Sun Solaris, where different services are available. In any of these situations, you'll have to look beyond the distribution repositories to install the service you want.
For example, while the WU-FTP server is the default on Sun Solaris 10, it has been dropped from Red Hat and Fedora distributions. It isn't even available in the Fedora Extras repository. Nevertheless, if a company is converting from Solaris to Red Hat Linux, the administrators would naturally look to install WU-FTP on Red Hat Enterprise Linux. (In my opinion, that would be a mistake, but we'll explore that issue in more detail in this annoyance.)
The developers behind your favorite service may have built what you want for your distribution. If they have, that is your best option, as it ensures that:
Configuration files are installed in appropriate locations.
The package becomes part of your database.
The developers are motivated to help you if there are distribution-specific problems.
If the developers behind a service have built their software, and have customized a package for a specific distribution, they have an interest in making sure that it works on that distribution.
However, if the service is not built for your distribution, don't immediately try building or compiling the service from its source code. While that might be your best option (especially if you're customizing the service), I believe there are alternatives that should be explored first.
One of the joys associated with open source software is choice. Rarely is there only one option for a service. For example, there are a wide variety of FTP servers that you can install on Linux systems. They include ProFTP, vsFTP, Muddleftp, glFTP, Pure-FTP, and WU-FTP. I've left out a few, including those built on Java.
But if it's a major service, your distribution should have at least one natively configured option for that service. For example, Red Hat Enterprise Linux includes vsFTP as the only FTP server. That's quite an expression of faith from the leading Linux distribution, enough to make many geeks take a closer look at vsFTP.
You can also explore alternative software for your service. You may be able to find alternatives in the Linux application libraries, described in "So Many Options for Applications" in Chapter 4. You may be able to find other options in the third-party repositories described in the next section. You may also be able to find alternatives online, perhaps with a search through Wikipedia (http://www.wikipedia.org) or Google.
In other words, if you find that your preferred server software is not available for your distribution, you should look for alternatives. That means:
Trying the software provided by your distribution for the service
Looking for alternatives from third parties who may have built similar software for the desired service
Examining other alternatives that can be installed on your system
If you can't find the software you want included with your distribution, you can look to third parties to help. These developers generally take the source code from original developers and build appropriate RPM or DEB packages suitable for the distribution of your choice.
There are a number of third-party repositories available for Linux distributions. They generally include software not available from the main repositories. For example, in the "I Need a Movie Viewer" annoyance in Chapter 4, I described some third-party repositories that included the libdvdcss package needed to view commercial DVDs.
The drawback of a third-party repository is that its packages may not be fully tested, especially with respect to the libraries that you might install on your distribution. In fact, there are reports of geeks who have run into incompatible libraries when they use more than one third-party repository.
You can get direct access to a third-party repository through your update software. Specifically, you can point yum, apt, and YaST systems directly to the appropriate URLs for the third-party repositories of your choice.
Generally, third-party repositories include instructions on how to include them in your update software and/or configuration files. For our preferred distributions, you can find a list of third-party repositories in the following locations.
Individuals within the Fedora project help integrate connections with a number of third-party repositories. While the focus is on Fedora Core, most of these repositories include separate URLs you can use for Red Hat Enterprise Linux (as well as rebuild distributions based on Red Hat Enterprise Linux source code). Instructions are usually available on the web page for each third-party repository. As of this writing, the status for the major Fedora repositories can be found online at http://fedoranews.org.
SUSE has traditionally included a lot of extra software with its DVDs. And more is available from third parties. Several are listed for your information at http://en.opensuse.org/YaST_package_repository. You can include them as an installation source in YaST. However, SUSE warns that "YaST fully trusts installation sources and does not perform any kind of authenticity verification on the contained packages." In other words, SUSE's third-party repositories might not include a GPG key, as you see with Fedora's repositories.
Many third-party repositories for SUSE distributions do have GPG keys. One central location for many of these repositories can be found at ftp://ftp.gwdg.de/pub/linux/misc/apt4rpm/rpmkeys.
The repositories associated with Debian Linux are extensive, which is natural for a community-based distribution. Be careful with the list at http://www.apt-get.org/main/; many of the repositories are dedicated to specific versions of Debian such as Potato, which has been obsolete since 2002.
By their very nature, these lists of third-party repositories may not be complete. And as the developers behind these repositories may not coordinate their efforts, including more than one third-party repository on your Linux system may lead to unpredictable results.
If a package used to be built for your distribution, it may still work for the newer version. For example, if you absolutely need the WU-FTP server on Red Hat Enterprise Linux (RHEL) 4, there are ways to get old versions.
For the purpose of this annoyance, I installed the latest available version of WU-FTP built for Red Hat. It's available from the Fedora Legacy project, at http://fedoralegacy.org, from the updates repository associated with Red Hat Linux 7.3. When I tried to install it on RHEL 4, I got a message suggesting that I install the OpenSSL toolkit, which addresses the security vulnerabilities associated with WU-FTP, at least as of its release in 2004.
Because of the security issues associated with it, I do not recommend WU-FTP. However, it may be helpful in a transition from a different operating system where WU-FTP is the default, such as Solaris. The security issues can be managed behind firewalls until your transition is complete.
Once the appropriate packages were installed, I was able to get WU-FTP running on RHEL 4. While using old versions is not recommended as a general solution, the installation of familiar software and services can ease transitions, even for organizations moving just from one version of Linux (or Unix) to another.
In some cases, the appropriate service is available as a source code package, customized for the desired distribution. This option is most common for the "rebuild" distributions associated with RHEL.
For RHEL, Red Hat complies with the GNU General Public License by releasing its source code. As Red Hat has released the source code in Source RPM packages, you can try to install those packages on any RPM-based distribution. These packages are publicly available from Red Hat at ftp.redhat.com, in the pub/redhat/linux/enterprise/4/en/os/i386/SRPMS/ subdirectory.
If you're running RHEL Workstation, you don't have the server packages included with the RHEL Server distributions. One example is the vsFTP server. It goes almost without saying that if you install a package available only on RHEL Server on a RHEL Workstation, you should not expect support for that package from Red Hat. I've downloaded the RHEL 4 Source RPM for the vsFTP server on my RHEL Workstation. Once downloaded, I can install it using the following steps:
Run the following command to unpack the source code from the vsFTP server to the /usr/src/redhat directory:
rpm -ivh vsftpd-2.0.1-5.src.rpm
The source code is unpacked to a .spec file in the /usr/src/redhat/SPECS directory, as well as various source and patch files in the /usr/src/redhat/SOURCES directory.
Navigate to the directory containing the .spec file.
Build the binary RPM (as well as source information) with the following command:
rpmbuild -bb vsftpd.spec
In this particular case, the .spec file creates two binary RPMs and stores them in the /usr/src/redhat/RPMS/i386 directory.
Install the binary RPMs just like any other Red Hat RPMs.
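For example, the following command would install both packages in one step; the exact filenames depend on the versions you built:
rpm -ivh /usr/src/redhat/RPMS/i386/vsftpd-*.rpm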
This process doesn't always work. For example, you generally can't use the kernel source code released by Red Hat on a RHEL rebuild distribution, because the rebuild was created by a different team of developers, using different build tools.
You can always install a Linux service (or any other Linux software) from the original source code. Generally, it's available only as a compressed tar archive. Once you download the archive, you'll want to decompress it. The command you use depends on the compression format, which is normally associated with the archive extension. For example, if the archive has a .tar.gz or .tgz extension, such as archive.tar.gz, you can decompress it with the following command:
tar xzvf archive.tar.gz
Alternatively, archives with a .tar.bz or .tar.bz2 extension can be decompressed with the tar xjvf command. Normally, archived files packaged for a service are decompressed to a separate subdirectory, with the name of the archive.
The methods for installing from source code vary widely. Detailed instructions are normally made available in a text file in the decompressed archive.
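That said, many (though by no means all) projects follow the familiar GNU build convention. Here's a sketch of that common case, using the hypothetical archive name from above; always check the project's README or INSTALL file first:
tar xzvf archive.tar.gz     # unpacks to an archive/ subdirectory
cd archive
./configure                 # adapt the build to your system (if the script exists)
make                        # compile the software
make install                # install it; normally requires root privileges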
One popular use for Linux is as a gateway between networks. The software associated with the gateway is fairly simple. In fact, it can be loaded from permanent media, such as a CD. That technique keeps crackers who break into the gateway from permanently modifying it and thus defeating the security barrier, or firewall, commonly configured between networks.
Configuring a Linux gateway normally requires three basic administrative steps:
Configuring your system to forward IP traffic.
Setting up masquerading.
Creating a firewall between networks.
The only thing you absolutely need to do is configure IP forwarding. It is disabled by default. For this annoyance, I assume you're configuring a computer with two network cards, and each card is connected to a different network.
There are many excellent firewall configuration tools, but this annoyance shows you how to configure the system by hand. If you use the tools, you'll overwrite the configuration files that you may create as you review this annoyance.
Linux normally disables IP forwarding between network cards, and it is disabled in the default configurations of our preferred distributions. The way you activate IP forwarding depends on whether you've configured an IPv4 or IPv6 network.
Here, I assume that your system supports the /proc filesystem with kernel settings, along with the sysctl program to access kernel switches. Your system meets these requirements if you have a /proc directory and an /etc/sysctl.conf file.
If there are problems, you'll want to make sure the appropriate settings are active in your kernel. Specifically, you should see the following settings in the active config-* file in the /boot directory:
CONFIG_PROC_FS=y
CONFIG_SYSCTL=y
If these settings don't reflect what you need, you can't just edit this configuration file. In that case, you'll need to recompile the kernel, as described in the "Recompiling the Kernel" annoyance in Chapter 7.
To activate forwarding on an IPv4 network, you'll need to toggle the ip_forward setting in the appropriate kernel configuration directory. The simplest way to do so is with the following command:
echo "1" > /proc/sys/net/ipv4/ip_forward
To make sure forwarding is turned on the next time you boot your computer, open /etc/sysctl.conf and add the following directive:
net.ipv4.ip_forward = 1
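If you prefer not to echo values into /proc directly, the sysctl command makes the same change, and sysctl -p rereads /etc/sysctl.conf so you can test your edit immediately:
sysctl -w net.ipv4.ip_forward=1     # apply the setting right now
sysctl -p                           # reread /etc/sysctl.conf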
To activate forwarding on an IPv6 network, you'll need to toggle the forwarding setting in the appropriate kernel configuration directory. The simplest way to do so is with the following command:
echo "1" > /proc/sys/net/ipv6/conf/all/forwarding
To make sure forwarding is turned on the next time you boot your computer, open /etc/sysctl.conf and add the following directive:
net.ipv6.conf.all.forwarding = 1
This assumes you've installed all other components required to configure an IPv6 network. For more information, see the related HOWTO written by Peter Bieringer at http://www.tldp.org/HOWTO/Linux+IPv6-HOWTO/.
When you have one IP address on the Internet for your network, you need a way to share it with all the computers on your network. The standard approach is IP masquerading. Once configured, your gateway substitutes the IP address of the network interface card it uses to reach the Internet for the address of any computer on your network that requests data from the Internet.
Naturally, IP masquerading assumes you've activated IP forwarding, as I described in the previous section.
The current standard for configuring IP address translation on a gateway is iptables, the same command used to erect firewalls. Here you use it to alter network packets with Network Address Translation, specifically with the iptables -t nat command.
As an example, if your Internet connection uses a device named wlan0 and your LAN uses IP addresses on the 10.11.12.0/16 private network, the command you need is:
iptables -t nat -A POSTROUTING -s 10.11.12.0/16 -o wlan0 -j MASQUERADE
As described earlier, this command uses Network Address Translation. It adds (-A) the rule to the end of the iptables chain. It modifies network packets as they leave the network (POSTROUTING). It specifies (-s) source IP addresses to be those from your LAN (10.11.12.0/16). It points to wlan0 as the output interface (-o). For all data that meets these standards, computers on your LAN MASQUERADE on the external network with the IP address assigned to wlan0.
To save this command, you'll need to run iptables-save and send the result to a file with a command such as:
iptables-save >> firewall
You could save the iptables commands to the standard configuration file for the distribution, but that would risk conflicts with settings written by tools such as Red Hat's Firewall Configuration tool. If you want to make these commands part of your firewall, you'll have to modify those files manually.
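To reapply the saved rules later, perhaps from a local startup script, feed the file back to iptables-restore. This sketch assumes the rules were saved to a file named firewall, as above:
iptables-restore < firewall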
Detailed instructions on creating a firewall are beyond the scope of this book. However, the gateway between networks is the best place to create a firewall, so I'll mention some of the considerations for doing so.
Both Red Hat/Fedora and SUSE Linux have their own firewall configuration tools. These tools are excellent and can be used to create a fairly simple firewall. You can build upon the firewall created by these tools as needed.
You can start the standard Red Hat/Fedora Firewall Configuration tool with the system-config-securitylevel command. Results are saved to /etc/sysconfig/iptables.
You can open the SUSE firewall tool in YaST. Results are saved to /etc/sysconfig/SUSEfirewall2.
There is no standard firewall tool available for Debian. However, there are a substantial number of available options, including several excellent GUI tools.
In addition, a number of third-party firewall generators are available online. As is standard with open source software, neither I nor O'Reilly endorses any of these systems (or anything else in the book).
For more information, see the related annoyance "My Firewall Blocks My Internet Access," in Chapter 8.
There are two reasons why you may want remote access. First, the computer you want to use may be too far away. Second, the computer, as with many servers, may not even have a monitor.
There are several ways to configure remote access to a Linux server. As described in Chapter 9, in the "Users Are Still Demanding Telnet" annoyance, Telnet is one method. While Telnet is insecure, I described methods you can use to encrypt and further secure Telnet communications in that chapter.
Perhaps the best way to configure secure access to a remote Linux system is through the Secure Shell (SSH). Connections through SSH are encrypted. You can even set up encryption keys and password phrases that are not transferred during logins. As described in the next annoyance, you can even use SSH to access GUI applications remotely.
What I describe in this annoyance just covers the basics associated with creating an SSH server and connection. For more information, see SSH, The Secure Shell: The Definitive Guide by Daniel J. Barrett et al. (O'Reilly).
Security is provided through the Secure Shell, and access can be configured through the appropriate SSH configuration file. You'll find two configuration files in the /etc/ssh directory, sshd_config and ssh_config. You can configure both files: sshd_config on the server, and ssh_config on each client. You can also use some of the switches available with the ssh command or customize a client for an individual user with a file in the appropriate home directory.
One possible security issue with SSH is related to user keys, which are stored in ~/.ssh/ under their home directories. If your workstations use NFS to mount home directories from a central server, your key files will be transmitted over the network in the clear. Anyone who intercepts this traffic can eventually decrypt those keys. If this describes your configuration, consult SSH, The Secure Shell: The Definitive Guide by Daniel J. Barrett et al. (O'Reilly) for an alternative configuration.
Generally, when you configure SSH, it's mostly done on the server. Any configuration you do on the client, through /etc/ssh/ssh_config, is secondary.
After you make any changes to the configuration files, remember that you'll have to restart the SSH server. On Debian Linux, you can do so with the following command:
/etc/init.d/ssh restart
On SUSE and Red Hat/Fedora Linux, the command is slightly different:
/etc/init.d/sshd restart
The SSH server configuration file, /etc/ssh/sshd_config, allows access by all users by default. You can limit access by user, by group, and by network. If you're supporting access through a firewall, you'll need to provide appropriate access through that barrier.
You can limit access by user with the AllowUsers directive. If there is no such directive in the /etc/ssh/sshd_config configuration file, all users are allowed on the SSH server (unless otherwise prohibited via Pluggable Authentication Modules, as described in Chapter 10).
For example, if I want to allow only donna to access this server via SSH, I can add the following directive:
AllowUsers donna
You can add AllowUsers directives for all users for whom you want to authorize access via SSH. For example, I could add the following directives to limit access to three users:
AllowUsers donna
AllowUsers nancy
AllowUsers randy
Alternatively, you can use the DenyUsers directive to prohibit access to certain accounts.
You may want to deny access to the most privileged user. This requires a different directive:
PermitRootLogin no
SSH allows root logins by default. So if you want to minimize the risk to the administrative account, this directive is important.
You can further refine the AllowUsers directive. For example, you can limit access from users on the remote computer named enterprise4a to donna's account:
AllowUsers donna@enterprise4a
Don't let the @ confuse you. This directive does not specify an email address. It specifies a local account and a remote computer from where users are allowed to log in to that account. You can substitute an FQDN for enterprise4a.
Some wildcards are supported. For instance, if you want to support access from the 192.168.0.0/24 network to all local accounts, use the following directive:
AllowUsers *@192.168.0.*
Just as the AllowUsers and DenyUsers directives can help you regulate access via SSH to accounts on the local server, the AllowGroups and DenyGroups directives can do the same, based on group accounts as defined in /etc/group.
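Pulling these directives together, an /etc/ssh/sshd_config excerpt might look like the following sketch. The user, host, network, and group names are examples only; also note that if you specify both AllowUsers and AllowGroups, a login must satisfy both tests, so most configurations stick to one or the other:
PermitRootLogin no
AllowUsers donna@enterprise4a nancy *@192.168.0.*
#AllowGroups admins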
If you have a firewall between desired SSH clients and servers, you'll need to make sure that the firewall allows SSH connections. For your convenience, allowing SSH connections is a standard option with the Red Hat/Fedora and SUSE firewall configuration tools. If you're configuring your firewall manually, you'll have to make sure to allow TCP connections through port 22.
Sending passwords over a network can be a problem. While SSH communications are encrypted, if a cracker can determine when you send your password and intercept it over your network, he can eventually decrypt it.
The SSH system supports the use of passphrases, which can be more complex than regular passwords (you can even use complete sentences such as "I live 40 feet from the North Pole."). Commands such as ssh-keygen allow you to create a private and public key based on the passphrase. The standard is 1024-bit encryption, which makes the passphrase (or the associated keys) much more difficult to crack.
Once the public key is transferred to the remote system, you'll be able to use SSH to log in to the remote system. The passphrase activates the private key. If matched to the public key on the remote system, an SSH login is allowed.
Create and transfer the private and public keys as follows:
Choose an encryption algorithm (I've arbitrarily selected DSA) and generate a private and public key in your home directory (I use /home/michael/.ssh here) with a command like:
ssh-keygen -t dsa -b 1024 -f /home/michael/.ssh/enterprise-key
When prompted, enter a passphrase. Passphrases are different from standard passwords. They can include whole sentences, such as:
I like O'Reilly's ice cream
This particular ssh-keygen command generates two keys, putting them in the enterprise-key and enterprise-key.pub files in the /home/michael/.ssh/ directory. You can (and probably should) choose a passphrase that differs from your regular login password.
Next, transmit the public key that you've created to the remote computer. The following command uses the Secure Copy command (scp) to copy the file to donna's home directory on the computer named debian:
scp .ssh/enterprise-key.pub donna@debian:/home/donna/
Now log in to donna's account on the remote computer. Assuming the Secure Shell service is enabled on debian, you can do so with the following command:
ssh donna@debian
You'll have to specify donna's password because you have not yet set up passphrase protection. You should now be in donna's home directory, /home/donna, on the debian computer.
If it doesn't already exist, you'll need to create an .ssh/ subdirectory. You'll also want to make sure it has appropriate permissions with the following commands:
mkdir /home/donna/.ssh
chmod 700 /home/donna/.ssh
Create the authorized_keys file in the .ssh/ subdirectory:
touch .ssh/authorized_keys
Now take the contents of the public SSH key that you created and put it in the authorized_keys file:
cat enterprise-key.pub >> .ssh/authorized_keys
Note that I used the append sign (>>) because I want to keep all previous keys that might be in the file; it can contain the keys for all the remote hosts from which you want to log in.
Log out of donna's account. The next time you log in, you'll be prompted for the passphrase as follows.
Enter passphrase for key '/home/michael/.ssh/enterprise-key':
Now you can connect securely, using SSH, without having to enter your password or a password on the remote system. With the other measures described earlier in this annoyance, you can also protect your SSH server by user, protect it by group, make sure SSH communications come from a permitted network, and allow SSH through firewalls.
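One detail worth noting: because the key pair created above doesn't use the default filename (id_dsa), you may need to point the ssh client at it explicitly, either with the -i switch or with an IdentityFile line in ~/.ssh/config:
ssh -i /home/michael/.ssh/enterprise-key donna@debian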
The first time you use SSH to log in to a remote system, you may see the following message, which means the remote host's public key is not yet in your known_hosts file:
The authenticity of host 'debian (10.168.0.15)' can't be established.
RSA key fingerprint is 18:d2:73:ec:53:ce:52:4f:2d:43:55:fb:0c:14:49:1e.
Are you sure you want to continue connecting (yes/no)?
Once you enter yes, you'll see the following message:
Warning: Permanently added 'debian,10.168.0.15' (RSA) to the list of known hosts.
Then you're prompted for the password on the remote system.
If you've configured passphrase-protected keys as described earlier, you'll be prompted for the passphrase rather than the remote account's password.
In either case, the remote system sends your client a public key, which is added to the user's ~/.ssh/known_hosts file. If the name or IP address of the remote system changes, you'll see an error, which you can address only by editing or deleting the known_hosts file.
Sometimes you need to run a GUI application but can't get to your computer. You may want to support users who need remote access to their applications.
I'll assume that you've already set up Secure Shell (SSH) or VNC clients for these users. In this annoyance, I'll show you how you can configure secure remote access to your GUI applications. While you can use VNC, SSH is preferred, as it provides strong encryption, making it more difficult for a cracker to track your keystrokes. An SSH configuration means that you're networking only the GUI application that you happen to be running remotely, as opposed to a whole GUI desktop environment.
There are relatively secure versions of VNC available; you can even tunnel VNC through an SSH connection. For more information on the wide variety of VNC servers and clients, Wikipedia provides an excellent starting point at http://en.wikipedia.org/wiki/VNC. If you don't like VNC, explore the increasingly popular FreeNX (which uses SSH) at http://freenx.berlios.de/.
If you absolutely need remote access for GUI applications, keep it behind a firewall. If at all possible, don't open the firewall to external clients on the SSH ports. If you do, use the directives described in the following sections (and the previous annoyance) to minimize your risks.
The configuration file for the SSH server is /etc/ssh/sshd_config. While it offers a substantial number of directives, most of the defaults configured on our target distributions don't need to be changed for SSH to work. However, these defaults may not be secure. Depending on your distribution, you may need to make a few changes. I suggest you pay particular attention to the following directives:
X11Forwarding yes
As the object of this annoyance is how to safely configure remote access of GUI applications, I assume you'll use this directive to enable remote access.
Protocol 2
Specifies the use of the SSH2 protocol, which is currently being maintained and updated for any security problems. Without this directive, the SSH server can also take logins from SSH1 clients, which are less secure.
ListenAddress
Allows you to specify the IP address of the network card that takes SSH connections, such as ListenAddress 192.168.0.12. This assumes you have more than one network card on this computer.
LoginGraceTime
Helps thwart crackers who try to break into an account by guessing passwords. The default is 120 seconds, after which the connection is dropped if the login has not completed. I would set a shorter period, such as LoginGraceTime 30.
PermitRootLogin no
The default is yes. In my opinion, you should never permit logins by the root user. Even if encrypted, root logins are a risk. If the login is intercepted, the root password may eventually be decrypted. In contrast, if you use the su or sudo commands after logging in via SSH, it's much more difficult for a cracker to determine which bits contain the root password.
Alternatively, you can create encryption keys as described in the previous annoyance. Once configured, SSH login passwords don't get sent over the network.
AllowUsers
By default, all users are allowed to log in via SSH. It's best to limit this as much as possible. You can limit logins by users, or even by users on specific systems. For example, if you wanted to limit SSH access to two users, you might use one of the following directives:
AllowUsers michael donna
AllowUsers michael@debian.example.com donna@suse.example.com
In the second directive, SSH logins to the local michael account are allowed only from the remote debian.example.com system, and logins to the local donna account only from suse.example.com.
After saving changes to the SSH server configuration file, you'll need to restart the associated daemon. The name of the daemon may vary slightly by distribution; you can use the following command for Red Hat/Fedora and SUSE Linux:
/etc/init.d/sshd restart
The appropriate command on Debian Linux is slightly different:
/etc/init.d/ssh restart
There are three ways to configure the SSH client to support networking of GUI tools and applications:
Directly, via switches and options to the ssh connection command
For all users on a client, via the /etc/ssh/ssh_config file
For a single user on a client, via the ~/.ssh/config file
By default, any authorized user can log in to an SSH server, specifying access to GUI applications with the -X switch, e.g.:
ssh -X michael@debian.example.com
But GUI access may not be secure. The most secure approach is to limit X access for all users on a client and then enable it for only the desired users. To do so, open /etc/ssh/ssh_config and set the following directives:
ForwardX11Trusted no
The default for this directive varies by distribution. This setting minimizes risks to other clients.
ForwardX11 no
Although this should be disabled by default on all Linux distributions, it doesn't hurt to make sure.
Next, in the ~/.ssh/config file for the user that you want to authorize, include:
ForwardX11 yes
This directive supersedes any default settings in the /etc/ssh/ssh_config file, and allows remote GUI access to the applications of that user's choice.
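If you'd rather enable forwarding only toward a particular server instead of globally for that user, you can scope the directive with a Host block in the same ~/.ssh/config file; the hostname here is just an example:
Host debian.example.com
    ForwardX11 yes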
Once configured, you can access remote GUI applications from the command line. To this end, you'll need to know the text commands that start GUI applications, such as /usr/bin/oowriter. Unless you're running a network with gigabit-level speeds, expect a bit of a delay as the application opens (and as its display refreshes on your workstation).
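For example, assuming michael has an account on debian.example.com and OpenOffice.org Writer is installed there, the following starts the remote application in a window on the local display (the -X switch is unnecessary if ForwardX11 is already enabled in the client configuration):
ssh -X michael@debian.example.com /usr/bin/oowriter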
I believe it's helpful for any Linux user to review her own logs on a regular basis. Familiarity can help any geek learn the value of logs. For one thing, logs record failed logins, which may indicate that a cracker is trying to break into your system. But logs can do much more. For example, web logs can give you a feel for where your customers are coming from geographically; which links associated with web ads they click; how long they stay on your web site; and more.
As a Linux administrator, chances are good that you're administering a substantial number of Linux computers. It may be useful to consolidate the logs on a single system. If a server goes down, you'll have the logs from that server available on the central log server. When there are problems, such as "critical" error messages, you may want an email sent to your address. You may need tools to help you go through all of these logs.
Logs on our selected distributions are governed by the system and kernel log daemons. While Red Hat/Fedora and Debian combine these daemons into a single package (sysklogd), SUSE includes them in separate packages (syslogd and klogd). While there are minor variations in how they're configured, they're all governed by a configuration file in the same location, /etc/syslog.conf.
If you want to dedicate a specific system as your central log server, first make sure you have enough space on that system. It may help to configure logs, such as those in the /var/log directory, on a separate partition so that they can't fill up critical system partitions if they get too big. For more information, see the next annoyance.
On the system that you're configuring as a central log server, you'll have to configure the system log daemon (syslogd) to accept remote connections. The simplest way to do so is to stop the daemon and then start it again with the -rm 0 switches (-r accepts log messages from remote systems; -m 0 disables the periodic -- MARK -- timestamp entries). The way you implement this varies slightly by distribution:
The Red Hat/Fedora distributions let you configure switches for the system log daemon in /etc/sysconfig/syslog. The key directive is SYSLOGD_OPTIONS. To support remote log reception, change this directive to:
SYSLOGD_OPTIONS="-rm 0"
SUSE Linux handles standard options for the system log daemon in a similar fashion. The daemon log options are listed in /etc/sysconfig/syslog. To support remote log reception, change the SYSLOGD_PARAMS directive to:
SYSLOGD_PARAMS="-rm 0"
Debian Linux does not provide any /etc/sysconfig files for daemon configuration. However, you can configure the system log daemon directly in the associated start script, /etc/init.d/sysklogd. To support remote log reception, change the SYSLOGD directive to:
SYSLOGD="-rm 0"
Once you've made the configuration changes, you can implement them by restarting the system log daemon on each computer with the following command:
/etc/init.d/syslog restart
On Debian Linux, the script's location is slightly different:
/etc/init.d/sysklogd restart
Naturally, if there's a firewall between the log server and log clients, you'll need to make provisions in that firewall to allow traffic through UDP port 514. As you can see in /etc/services, that's the standard port for system log communications. To make sure your system log service now receives from remote computers, check your /var/log/syslog (or, if that file doesn't exist, /var/log/messages) for the following entry (the version number may vary):
syslogd 1.4.1: restart (remote reception).
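If your firewall is based on iptables, a rule along the following lines would accept system log traffic from a local network; the 192.168.0.0/24 subnet is only an example, so substitute your own:
iptables -A INPUT -p udp --dport 514 -s 192.168.0.0/24 -j ACCEPT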
Now that the central server is ready, you can configure your other Linux systems to send copies of their logs to that computer. The log configuration file on each of our preferred distributions is /etc/syslog.conf, and the key directive is straightforward. If you want a copy of all logs sent to the logmaster.example.com computer, all you need in that file is:
*.* @logmaster
Unfortunately, the system log service can't handle fully qualified domain names, so logs on the central server from your remote systems will be tagged only with the short hostnames.
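The short name does need to resolve on each client, though. One simple way to guarantee that is an /etc/hosts entry on every log client; the address shown here is hypothetical:
192.168.0.10   logmaster.example.com   logmaster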
If you're just concerned with kernel-based issues, to help diagnose shutdowns, you can send just the kernel messages to the remote log server:
kern.* @logmaster
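Remember that these remote entries are additions, not replacements; the local rules in /etc/syslog.conf keep working. As a sketch, a client could retain its distribution's usual local rule and add a remote copy of kernel messages (the local rule shown is typical of Red Hat/Fedora and may differ on your system):
*.info;mail.none;authpriv.none   /var/log/messages
kern.*                           @logmaster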
There are many excellent tools for monitoring logfiles. Many geeks even create their own scripts for this purpose. One excellent source for different monitoring tools and scripts is Automating Unix and Linux Administration by Kirk Bauer (Apress).
One of the major standards for log monitoring is known as Logwatch. It's available from both the Debian and Red Hat/Fedora Linux repositories. A logwatch RPM that works on SUSE is available from the Logwatch home page at http://www.logwatch.org.
Logwatch is organized into three groups of files. The overall configuration file is logwatch.conf. Configuration files for many individual services are organized in a services/ subdirectory. The logfiles that Logwatch parses are defined in groups, based on configuration files in a logfiles/ subdirectory. The actual directory varies by distribution, or by the release that you may have installed from http://www.logwatch.org.
As the directories associated with Logwatch vary so widely by distribution, I generally do not use full directory paths in this annoyance. If you're uncertain about the location of a file, you'll have to do your own searching with commands such as locate, rpm -ql logwatch, or dpkg -L logwatch.
Before I show you how to configure the basic Logwatch configuration file, I need to review its location on your system. If you've downloaded the latest version from http://www.logwatch.org, you'll need to make sure key settings are compatible with the scripts and configuration directories for your distribution.
The standard Logwatch configuration file is logwatch.conf. You can find it in the /etc/log.d/conf or /etc/logwatch/conf directories. As described earlier, there is no standard SUSE Logwatch package.
If you've downloaded the latest version from the Logwatch home page, you'll find key configuration files in different locations. The logwatch.conf configuration file (as well as default services and configuration logfiles) is stored in /usr/share/logwatch/default.conf; detailed configuration changes can be added to files in the /etc/logwatch/conf directory.
Administrators are now encouraged to add changes to Logwatch settings to override.conf and patterns that Logwatch should drop to ignore.conf. But those are advanced settings beyond what I can cover in this annoyance. Refer to the Logwatch web site for the latest information.
Logwatch's standard directives include:
LogDir
The standard for logging directories on our preferred distributions is /var/log.
MailTo
While the default is MailTo = root, you're free to change this to the email address of your choice, assuming you have a working outgoing email server on this system.
Print
The Print directive is unrelated to printers; it determines whether reports are sent to standard output, which is normally the screen. The usual default is Print = no. Change this directive to yes to view output in real time on your console.
TmpDir
UseMkTemp
MkTemp
These three directives all configure the use of temporary files. By default, the TmpDir directive points to the /tmp directory. In the latest version of Logwatch, this directive points to the /var/cache/logwatch directory.
If your TmpDir is /tmp, make sure the UseMkTemp directive is active. This uses the MkTemp directive to point to the mktemp utility for changing the name and permissions of temporary logfiles to keep them secure while they're stored in the /tmp directory.
If you've activated UseMkTemp, you need to point the MkTemp directive to the full path of the mktemp utility, normally /bin/mktemp.
Range
The Range directive specifies the timeline for the report. The standard is Range = yesterday; it's consistent with a log report, processed by the cron daemon, sometime after midnight.
Detail
The Detail directive associated with a report specifies the amount of information you get. Detail = Low limits information to security and other critical service issues. A High level of Detail creates very verbose reports, especially if you're collecting information from multiple computers.
Service
The Service directive gives you an opportunity to limit the services on which Logwatch prepares reports. While the default is Service = All, you can specify individual services with a directive such as Service = pam, or specify all except an individual service with two directives, such as:
Service = All
Service = -ftpd-messages
Mailer
The Mailer directive specifies the command-line utility associated with text emails. Depending on your distribution, it should be set either to /usr/bin/mail or /bin/mail.
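Putting several of these directives together, a minimal logwatch.conf for a central log server might look like the following sketch; the email address is a placeholder, and the TmpDir value assumes a recent Logwatch release:
LogDir = /var/log
TmpDir = /var/cache/logwatch
MailTo = admin@example.com
Mailer = /usr/bin/mail
Print = no
Range = yesterday
Detail = Low
Service = All
Service = -ftpd-messages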
The service configuration files associated with Logwatch are stored in a services/ subdirectory. While the list of files may seem extensive, don't worry about configuring each file. The defaults are generally fine, unless you want to specify a special file group. One example where you may want to specify a special group is with the Clam AntiVirus software (www.clamav.net). The following is based on the package downloaded from the Clam AV web site.
These configuration files differ from those installed with specific services. For example, the clamav.conf file cited in this section, on a Debian system, is in the /etc/logwatch/conf/services directory. It configures Clam AntiVirus software interactions with the logwatch system. It is not a substitute for the main Clam AntiVirus configuration file, normally /etc/clamav/clamd.conf.
By default, the following directive in the clamav.conf file (along with the LogDir = /var/log directive in logwatch.conf) sends logs from this service to the standard /var/log/messages:
LogFile = messages
As it may be inconvenient to have so much traffic in /var/log/messages, you could send logs to a different file, such as /var/log/clam-update, with the following directive:
LogFile = clam-update
Logs can grow quickly, especially for services with a lot of activity. Logs for commercial web sites can easily add several hundred megabytes of files every day.
Unmanaged, this kind of growth can overwhelm your system, taking space needed by your users, consuming the free space required to run a GUI, and, in the worst case, making it impossible to boot your system.
If your logs grow quickly, you should consider creating dedicated partitions. Even with dedicated partitions and search scripts of dazzling sophistication, it can take quite a while to search through large logfiles for the data you may need. Therefore, you may consider configuring your system to start new logfiles more often, perhaps daily or even hourly.
Even large, dedicated partitions may not be good enough. The demands of logfiles can grow so large that you may need to move logfiles to different systems.
The associated cron jobs are run in alphabetical order; files starting with numbers come first. For example, the 00logwatch script in /etc/cron.daily is run before others.
If you have more than one log-management service installed, such as logrotate or logwatch, the associated jobs may not be fully compatible.
Logs can become quite large, and can easily grow by hundreds of megabytes of space (or more) every day. There are two basic options in this regard:
With a dedicated log partition, the space taken by a service or kernel log doesn't overwhelm the space required to run a Linux system. If you use a standard Linux distribution, the way to set this up is to mount the /var/log directory on a dedicated partition.
Even if you configure a dedicated server to collect logs from other Linux systems, I still recommend a dedicated log partition.
For most organizations, the data associated with logs isn't nearly as critical as, say, that associated with user home directories. Because logs grow quickly, one method to manage this growth is a RAID 0 volume with daily backups.
RAID 0 is the fastest of the standard RAID levels for large files and may be suitable for a log server. With appropriate controllers, it allows you to add more disks as logs grow.
Your management may have different feelings about the importance of logfiles. Perhaps you'll want to protect them as evidence, to help you track the activity of certain users, to establish patterns of visits to your web sites, or possibly even as evidence usable in a court of law. If logfiles are that important, you may want to use a more robust data storage system, such as RAID 5, or even back them up to stable archives such as DVDs.
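If you do choose RAID 0, a sketch of the setup with the mdadm utility might look like the following; the device names are hypothetical, and you'd still want the daily backups mentioned earlier:
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
mkfs.ext3 /dev/md0
mount /dev/md0 /var/log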
Log rotation means starting new files to contain incoming log messages, so that old logs can easily be backed up. Rotation also involves removing old messages to return needed disk space to the system. In Linux, log rotation is configured through the logrotate configuration files. In our preferred distributions, the master file is /etc/logrotate.conf, supplemented by per-service files in the /etc/logrotate.d directory. To understand how this process works, it's useful to analyze it in detail.
Every day, on a schedule defined by your /etc/crontab configuration file, Linux runs the /etc/cron.daily/logrotate script. It includes the following command, which runs the logrotate service, based on the settings in /etc/logrotate.conf:
/usr/sbin/logrotate /etc/logrotate.conf
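If you want to see what logrotate would do without actually rotating anything, you can run it by hand in debug mode; the -d switch reports its decisions without changing any files:
/usr/sbin/logrotate -d /etc/logrotate.conf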
To see how rotation works, we can analyze /etc/logrotate.conf. The first four default commands in this file are identical in our preferred distributions:
weekly
Logfiles are rotated by default on a weekly basis. You can set this to daily or monthly, or specify a maximum log size after which rotation occurs with a directive such as:
size=100k
rotate 4
Linux distributions normally store four weeks of backlogs for each service.
create
A new, empty file is created in place of the logfile that is now rotated.
include /etc/logrotate.d
Configuration files in the /etc/logrotate.d directory are included in the rotation process.
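For example, a small file in /etc/logrotate.d can apply its own schedule to a single log. The following sketch handles the /var/log/clam-update file suggested earlier in this chapter, rotating it daily, keeping seven compressed copies, and tolerating its absence; all of the directives shown are standard logrotate options:
/var/log/clam-update {
    daily
    rotate 7
    compress
    missingok
    create
}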
In some cases, the distribution developers have configured rotation of the wtmp and btmp login records in /var/log directly in this file, as they are not associated with any specific package, nor are they maintained by any of the /etc/logrotate.d configuration files.
If you add the following directive, you can enable compression of your logfiles, saving even more room:
compress
Compression still allows access by some dedicated logfile viewers and editors, including vim (through its gzip plugin). There are a substantial number of options available; SourceForge.net includes several hundred log-management suites, many of which can even search through directories of compressed logfiles.
Logs should normally be deleted automatically. However, if you see logs more than five weeks old, that suggests a problem with your logrotate script, or perhaps that your cron jobs aren't being run as scheduled.
For example, on my newer laptop computer, I haven't configured my winmodem to allow external logins by modem. I have no modem getty (mgetty) logs in my /var/log directory. When I run the daily cron logrotate script, I get a related error:
# /etc/cron.daily/logrotate
error: error accessing /var/log/mgetty: No such file or directory
error: mgetty:1 glob failed for /var/log/mgetty/*.log
There are several ways to address this issue. I could configure mgetty, but that would be a waste of time. I could delete the mgetty configuration file in /etc/logrotate.d, but that would cause more problems if I choose to configure it in the future. The option I chose was to create the /var/log/mgetty directory, as the root user. After creating that directory, I ran the logrotate script again, without errors.
I also ran touch .placeholder in that directory, to make sure the directory wouldn't get deleted at the next update.
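In other words, the entire fix amounted to the following commands, run as the root user; afterward, the script ran cleanly:
mkdir /var/log/mgetty
touch /var/log/mgetty/.placeholder
/etc/cron.daily/logrotate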
If you've made some of the changes suggested in "So Many Server Logs," earlier in this chapter, you may have already sent your logs to remote systems. In short, you'll need to configure the System Log daemon on the log server to receive remote logs, and configure the other computers to send their logs to that log server. See the previous annoyance for details.
The first time you run a job, it's helpful to do it manually. The more you do a job yourself, the more you learn about that job.
However, once you've run a job a few times, there's little more that you can learn about that job, as least in your current environment. At that point, it's best to automate the process. Linux already has a service that runs automated jobs on a regular basis, whether it be hourly, daily, weekly, or monthly.
Another reason why you want to automate tasks is so you can go home. With appropriate logs, you can make sure the job was properly executed when you return to work. Thus, you can configure a database job to run once per year, so you don't have to be at work on New Year's Eve.
Finally, when you administer a group of systems, the number of things you have to do can be overwhelming. Automation is often the only way to keep up with what you need to do. This is why you need to learn to manage the cron service.
It's easy to learn the workings of the cron service. Every Linux system includes numerous examples of cron jobs. The cron daemon wakes up every minute Linux is running, to see if there's a script scheduled to be run at that time.
Standard administrative jobs are run as scheduled in /etc/crontab. Red Hat and Debian configure this file in straightforward ways, with different command scripts for hourly, daily, weekly, and monthly jobs. The format starts with five time-based columns, followed by the user and the command:
minute / hour / day of month / month / day of week / user / command
Take a look at your own version of this file. While it varies by distribution, all use a variation of the same first two directives, SHELL and PATH.
SHELL=/bin/sh
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin
Both SHELL directives point to different names for the default bash shell. The PATH directives provide the baseline for other scripts executed from the embedded directories by the cron daemon. The simplest version of this script is associated with Red Hat/Fedora distributions:
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
These directives point to the run-parts command, which runs the scripts in the noted directories, as the root user. While you could use the full path to the command (/usr/bin/run-parts), that's not necessary because /usr/bin is in the PATH, as cited at the beginning of this file.
In this case, hourly scripts are run at one minute past every hour, daily scripts are run at 4:02 A.M. every day, weekly scripts are run at 4:22 A.M. every Sunday, and monthly scripts are run on the first day of each month, at 4:42 A.M.
While Debian and SUSE run more complex versions of this script, the effect is essentially the same. On our preferred Linux distributions, the cron daemon runs the scripts in the /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly directories. Many scripts in these directories use the full path to all commands, despite the PATH directive in /etc/crontab.
You can create a cron job in any of the aforementioned directories, and it will be run at the intervals established in /etc/crontab. To help you understand how all this works, I'll create a yearly cron job, with the following steps:
As SUSE's /etc/crontab calls the /usr/lib/cron/run-crons script, the final steps here (in particular, the /etc/crontab modification) won't work in that distribution.
Log in as the root user. (Alternatively, if your regular account is in the /etc/sudoers file, you can log in as a regular user and use the sudo command to invoke the commands in this section.)
Create a /etc/cron.yearly directory, with the same ownership and permissions as the other related directories. As those directories are owned by root and have 755 permissions, they happen to be compatible with standard root permissions for new directories. So all that is required is:
sudo mkdir /etc/cron.yearly
Create a new script in the /etc/cron.yearly directory; I'll call it happynewyear. Include the following commands in that script (which saves the files from user donna's home directory in user randy's home directory):
#!/bin/sh
/usr/bin/rsync -aHvz /home/donna /home/randy/
Save the file. Make sure the script is executable with the following command:
chmod 755 /etc/cron.yearly/happynewyear
Test the script. Run it using the full path to the script:
/etc/cron.yearly/happynewyear
Now make sure it runs at the next new year. Open your /etc/crontab and make a copy of the directive that runs the monthly cron scripts in /etc/cron.monthly. Change the directory to /etc/cron.yearly, and modify the time the script is run to something appropriate. For example, I use the following line in my Red Hat Enterprise Linux 4 system:
2 0 1 1 * root run-parts /etc/cron.yearly
This directive runs the script at two minutes past midnight on January 1. As the day of the week associated with New Year's Day varies, the last time entry has to be a wildcard. I chose two minutes past midnight because the scripts in the /etc/cron.hourly directory are run at one minute past the hour.
Save your /etc/crontab configuration file.
Any output from a cron job is sent to the user as an email. Most standard cron jobs you'll find in the directories discussed here are carefully designed not to create any output, so you won't see email from them. cron jobs suppress such output by redirecting both standard output and standard error to files (or to /dev/null).
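If a script such as happynewyear turns out to be chatty, you can apply the same trick in the /etc/crontab line itself. This variant of the yearly entry discards both standard output and standard error:
2 0 1 1 * root run-parts /etc/cron.yearly > /dev/null 2>&1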
Users can create and schedule their own cron jobs. As a regular user, you can open a cron file for your account with the following command:
crontab -e
Use the steps described in the previous section to create your own cron job. With the appropriate SHELL, PATH, and commands, you can run the scripts of your choice at the regular times of your choosing. To review your account's crontab configuration, run the following command:
crontab -l
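Note that per-user crontab entries omit the user column found in /etc/crontab. As a sketch, the following entry would run a hypothetical ~/bin/backup script at 11:30 P.M. every weekday:
30 23 * * 1-5 /home/michael/bin/backup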
Naturally, most regular users won't understand how to create their own cron jobs. As the administrator, you'll have to create the jobs for them. For example, if you want to create a job for user nancy and have administrative privileges, run the following command:
crontab -u nancy -e
However, for any user to access individual cron jobs, he needs permission. There are several ways to configure permissions to use cron:
If there's an empty /etc/cron.deny file (and no /etc/cron.allow file), all users are allowed to have individual cron jobs.
If there are no /etc/cron.deny or /etc/cron.allow files, only the root user is allowed to have cron jobs.
If there are specific users in /etc/cron.deny, they're not allowed to use cron jobs, and the root user isn't allowed to create a cron job for them; all others are allowed to use cron jobs.
If /etc/cron.deny includes ALL (representing all users), and specific users are listed in /etc/cron.allow, only those users listed in the latter file are allowed to have cron jobs.
What you do depends on whether some of your users need to create cron jobs, and whether they are capable and trusted to do their own cron jobs (or whether you're willing to create cron jobs for your users).
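For example, if you decide that only users donna and randy should have individual cron jobs, you could set that up along the lines just described; the usernames are simply the examples used earlier in this chapter:
echo ALL > /etc/cron.deny
echo donna > /etc/cron.allow
echo randy >> /etc/cron.allow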
Depending on how much work a cron job performs, it can noticeably increase the load on the system. If you create a user-specific cron job, try to schedule it for times when other cron jobs aren't also running. If you've authorized users to create their own cron jobs, give them times when they're authorized to run them. Audit their jobs. You can review a user's cron jobs with the crontab -u username -l command.
If you want to see how cron jobs are configured, check them out for all users in your spool directory. The actual directory varies slightly by distribution. Red Hat/Fedora uses /var/spool/cron/username, SUSE uses /var/spool/cron/tabs/username, and Debian uses /var/spool/cron/crontabs/username.
Not all jobs have to be run on a regular basis. People who crunch statistical data may need to run scripts at different times. Weathermen who are trying to model future trends may want to try some scripts just once or maybe twice. What they run often takes all of the resources on your systems. The only time they can run their jobs is in the middle of the night. They have families and like to sleep at night, so they may ask you to run that job. Well, perhaps you also have a family and like to sleep at night. But you don't want to create a cron job for this purpose because it's a one-time task.
For this purpose, Linux has the batch-job system, governed by the at daemon. To schedule a batch job at a given time, you can use the at command.
When you run the at command to create a batch job, you have to specify a time when the job is to be run. You're then taken to an at> prompt, where you can specify the commands or scripts to be executed.
Users who configure their own scripts can place them in their own ~/bin directory. Scripts in these directories (with executable permission), such as ~/bin/fatdata, can be run without specifying the full path. Debian Linux doesn't add ~/bin to the PATH unless the directory exists.
For example, if you're about to leave for the day and have already configured the fatdata script in your home directory's command bin (~/bin), take the following steps to run the script in one hour:
Run the following command:
at now + 1 hour
The at> prompt is open. If you're in SUSE or Debian Linux, you'll see a note that reflects the default bash shell. (After these steps, I'll describe some alternative ways to specify the time you need.)
At the at> prompt, enter the commands that you want to run at the specified time. In my case, that would be the single command:
/home/michael/fatdata
When you're done with the commands that you want run, press Ctrl-D.
If you want to review pending at jobs, use the atq command.
If you want to cancel a job, you can use the atrm command, based on the queue number shown in the output from atq. For example, if you know that your job will be run at 10:30 P.M. tonight, you'll see something similar to the following output from atq, which notes that this is job 7:
7 2006-01-22 22:30 a michael
You can then cancel the job with the atrm 7 command.
As with cron jobs, any output from at jobs is sent via email to the user for whom the job ran.
The at command offers a rich syntax for configuring the job at the time of your choice. While you can specify a certain amount of time in the future, such as:
at now + 12 hour
you can also set a specific time, such as 1:00 A.M. tomorrow morning:
at 1 AM tomorrow
Alternatively, you can specify a date:
at 2 AM March 15
You'll need to make sure the at daemon is running. The following command shows whether it's running:
ps aux | grep atd
If it isn't running, make sure it's installed (it's the at RPM or DEB package on our preferred distributions) and configured to run at your default runlevel.
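The exact command to enable the daemon at your default runlevels varies by distribution; the following are the usual forms for Red Hat/Fedora or SUSE (chkconfig) and for Debian (update-rc.d), followed by a command to start the daemon immediately:
chkconfig atd on
update-rc.d atd defaults
/etc/init.d/atd start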
If you want to see how at jobs will be run, you can check them out in your spool. The actual directory varies slightly by distribution: Red Hat/Fedora and Debian use /var/spool/cron/atjobs; SUSE uses /var/spool/atjobs. If you also have batch jobs that use the batch command (see the next section), you'll note that the spool files associated with regular at jobs start with an a, while spool files associated with batch jobs start with a b.
A batch job, in contrast to an at job, runs as soon as the CPU has time for it. All you need to do to create a batch job is to use the batch command. With the batch command, Linux won't run the job unless the load average on the CPU is less than a certain threshold, which depends on the distribution. If you're running Red Hat/Fedora or SUSE Linux, the threshold is .8, or 80 percent of the capacity of a single CPU. If you're running Debian Linux, the threshold is 1.5, or 150 percent of the capacity of a single CPU. Naturally, you'll want to vary this threshold depending on the CPUs on your system.
Except for the aforementioned CPU limits, the batch command works in the same way as the at command; both set you up with an at> prompt. If you want to change the parameters associated with batch jobs, you can do so with the help of the atd command. For example, if your system includes four CPUs, you may find it useful to run batch jobs unless more than three CPUs are loaded:
atd -l 3
If your batch jobs are intense, you may want to increase the time between such jobs. By default, they're run in 60-second intervals. The following command increases the interval to one hour:
atd -b 3600
For any user to access individual batch or at jobs, she needs permission. There are several ways to configure these permissions:
If there's an empty /etc/at.deny file (and no /etc/at.allow file), all users are allowed to have individual batch or at jobs.
If there are no /etc/at.deny or /etc/at.allow files, only the root user is allowed to have batch or at jobs.
If there are specific users in /etc/at.deny, they're not allowed to use batch or at jobs.
If /etc/at.deny includes ALL (representing all users) and individual users are listed in /etc/at.allow, only those users listed in the latter file are allowed to have batch or at jobs.
What you do depends on whether some of your users need to create at jobs, and whether they are capable and trusted to do them on their own (or whether you're willing to create them for your users).