Chapter 16. Introduction to Compiling Software From C Source Code

Most nonproprietary third-party Unix software packages come as source code that you can build and install. One reason for this is that Unix (and Linux itself) has so many different flavors and architectures that it would be difficult to distribute binary packages for all possible platform combinations. The other reason, which is at least as important, is that widespread source code distribution throughout the Unix community encourages users to contribute bug fixes and new features to software, giving meaning to the term open source.

You can get nearly everything you see on a Linux system as source code—from the kernel and C library to the web browsers. It’s even possible to update and augment your entire system by (re-)installing parts of your system from the source code. However, you probably shouldn’t update your machine by installing everything from source code, unless you really enjoy the process or have some other reason.

Linux distributions typically provide easier ways to update core parts of the system, such as the programs in /bin, and one particularly important property of distributions is that they usually fix security problems very quickly. But don’t expect your distribution to provide everything for you. Here are some reasons why you may want to install certain packages yourself:

There are many programming environments on Linux, from traditional C to interpreted scripting languages such as Python. Each typically has at least one distinct system for building and installing packages in addition to the tools that a Linux distribution provides.

We’re going to look at compiling and installing C source code in this chapter with only one of these build systems—the configuration scripts generated from the GNU autotools suite. This system is generally considered stable, and many of the basic Linux utilities use it. Because it’s based on existing tools such as make, after you see it in action, you’ll be able to transfer your knowledge to other build systems.

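Installing a package from C source code usually involves the following steps:

  1. Unpack the source code archive.
  2. Configure the package.
  3. Run make to build the programs.
  4. Run make install to install the package.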

Note

You should understand the basics in Chapter 15 before proceeding with this chapter.

A package’s source code distribution usually comes as a .tar.gz, .tar.bz2, or .tar.xz file, and you should unpack the file as described in 2.18 Archiving and Compressing Files. Before you unpack, though, verify the contents of the archive with tar tvf or tar ztvf, because some packages don’t create their own subdirectories in the directory where you extract the archive.

Output like this means that the package is probably okay to unpack:

package-1.23/Makefile.in
package-1.23/README
package-1.23/main.c
package-1.23/bar.c
--snip--

However, you may find that not all files are in a common directory (like package-1.23 in the preceding example):

Makefile
README
main.c
--snip--

Extracting an archive like this one can leave a big mess in your current directory. To avoid that, create a new directory and cd there before extracting the contents of the archive.

Finally, beware of packages that contain files with absolute pathnames like this:

/etc/passwd
/etc/inetd.conf

You likely won’t come across anything like this, but if you do, remove the archive from your system. It probably contains a Trojan horse or some other malicious code.
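
The checks described above can also be scripted. The following is a minimal sketch rather than a complete safety tool: it builds a tiny demonstration archive (the names demo-1.0 and demo.tar.gz are made up for illustration), lists its members without extracting them, and then tests for absolute pathnames and for a single common top-level directory. It does not handle member names that start with ./ or other unusual layouts.

```shell
# Build a small demonstration archive (hypothetical package "demo-1.0").
workdir=$(mktemp -d)
cd "$workdir"
mkdir demo-1.0
echo 'all:' > demo-1.0/Makefile
echo 'Demo package' > demo-1.0/README
tar czf demo.tar.gz demo-1.0

# List the member names without extracting anything.
members=$(tar tzf demo.tar.gz)

# Count members that start with "/" (absolute pathnames).
absolute=$(printf '%s\n' "$members" | grep -c '^/')

# Count the distinct top-level path components; 1 means that
# everything lives under a common directory.
toplevel=$(printf '%s\n' "$members" | cut -d/ -f1 | sort -u | wc -l | tr -d ' ')

if [ "$absolute" -gt 0 ]; then
    echo "archive contains absolute pathnames; do not extract it"
elif [ "$toplevel" -eq 1 ]; then
    echo "archive unpacks into a single directory"
else
    echo "no common directory; extract in a new, empty directory"
fi
```

For a real package, replace the first block with the path to the archive you downloaded and keep only the listing and the two counts.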

Once you’ve extracted the contents of a source archive and have a bunch of files in front of you, try to get a feel for the package. In particular, look for the files README and INSTALL. Always look at any README files first because they often contain a description of the package, a small manual, installation hints, and other useful information. Many packages also come with INSTALL files with instructions on how to compile and install the package. Pay particular attention to special compiler options and definitions.

In addition to README and INSTALL files, you will find other package files that roughly fall into three categories:

Even though C source code is usually fairly portable, differences on each platform make it impossible to compile most packages with a single Makefile. Early solutions to this problem were to provide individual Makefiles for every operating system or to provide a Makefile that was easy to modify. This approach evolved into scripts that generate Makefiles based on an analysis of the system used to build the package.

GNU autoconf is a popular system for automatic Makefile generation. Packages using this system come with files named configure, Makefile.in, and config.h.in. The .in files are templates; the idea is to run the configure script in order to discover the characteristics of your system, then make substitutions in the .in files to create the real build files. For the end user, it’s easy; to generate a Makefile from Makefile.in, run configure:

$ ./configure

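You should get a lot of diagnostic output as the script checks your system for prerequisites. If all goes well, configure creates one or more Makefiles and a config.h file, along with a log of its tests (config.log) and a config.status script. If you enable caching with the -C (--config-cache) option, it also writes a cache file (config.cache) so that it doesn’t need to run certain tests again.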
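
To make the template idea concrete, here is a toy illustration of the substitution step only. It is not how configure is implemented; the real script runs many tests before it substitutes anything. The @CC@ and @prefix@ placeholders follow the autoconf naming convention, but the file contents and the chosen values are invented for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A tiny stand-in for Makefile.in with two @VARIABLE@ placeholders.
cat > Makefile.in <<'EOF'
CC = @CC@
prefix = @prefix@
EOF

# Values that a real configure run would discover or take from options
# (assumed here for the sake of the example).
cc_found=gcc
prefix_chosen=/usr/local

# Replace each placeholder to produce the real Makefile.
sed -e "s|@CC@|$cc_found|" -e "s|@prefix@|$prefix_chosen|" Makefile.in > Makefile
cat Makefile
```

Comparing Makefile.in with the generated Makefile in a real package shows the same pattern on a much larger scale.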

Now you can run make to compile the package. A successful configure step doesn’t necessarily mean that the make step will work, but the chances are pretty good. (See 16.6 Troubleshooting Compiles and Installations for troubleshooting failed configures and compiles.)

Let’s get some firsthand experience with the process.

Before discussing how you can change the behavior of autoconf, let’s look at a simple example so that you know what to expect. You’ll install the GNU coreutils package in your own home directory (to make sure that you don’t mess up your system). Get the package from http://ftp.gnu.org/gnu/coreutils/ (the latest version is usually the best), unpack it, change to its directory, and configure it like this:

$ ./configure --prefix=$HOME/mycoreutils
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
--snip--
config.status: executing po-directories commands
config.status: creating po/POTFILES
config.status: creating po/Makefile

Now run make:

$ make
  GEN        lib/alloca.h
  GEN        lib/c++defs.h
--snip--
make[2]: Leaving directory '/home/juser/coreutils-8.22/gnulib-tests'
make[1]: Leaving directory '/home/juser/coreutils-8.22'

Next, try to run one of the executables that you just created, such as ./src/ls, and try running make check to run a series of tests on the package. (This might take a while, but it’s interesting to see.)

Finally, you’re ready to install the package. Do a dry run with make -n first to see what make install does without actually doing the install:

$ make -n install

Browse through the output, and if nothing seems strange (such as installing anywhere other than your mycoreutils directory), do the install for real:

$ make install

You should now have a subdirectory named mycoreutils in your home directory that contains bin, share, and other subdirectories. Check out some of the programs in bin (you just built many of the basic tools that you learned in Chapter 2). Finally, because you configured the mycoreutils directory to be independent of the rest of your system, you can remove it completely without worrying about causing damage.

You’ve just seen one of the most useful options for the configure script: using --prefix to specify the installation directory. By default, the install target from an autoconf-generated Makefile uses a prefix of /usr/local—that is, binary programs go in /usr/local/bin, libraries go in /usr/local/lib, and so on. You will often want to change that prefix like this:

$ ./configure --prefix=new_prefix

Most versions of configure have a --help option that lists other configuration options. Unfortunately, the list is usually so long that it’s sometimes hard to figure out what might be important, so here are some essential options:

You can influence configure with environment variables that the configure script puts into make variables. The most important ones are CPPFLAGS, CFLAGS, and LDFLAGS. But be aware that configure can be very picky about environment variables. For example, you should normally use CPPFLAGS instead of CFLAGS for header file directories, because configure often runs the preprocessor independently of the compiler.

In bash, the easiest way to send an environment variable to configure is by placing the variable assignment in front of ./configure on the command line. You can also pass the variable as an argument to configure. For example, to define a DEBUG macro for the preprocessor, use either of these commands:

$ CPPFLAGS=-DDEBUG ./configure
$ ./configure CPPFLAGS=-DDEBUG
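
One detail is worth seeing for yourself: an assignment placed in front of a command applies to that command only and does not linger in your shell. The sketch below uses sh -c as a stand-in for ./configure so that it runs without a source package.

```shell
# Start from a known state so the demonstration is unambiguous.
unset CPPFLAGS

# The child process (standing in for ./configure) sees the variable...
seen_by_child=$(CPPFLAGS=-DDEBUG sh -c 'printf "%s\n" "$CPPFLAGS"')
echo "child saw: $seen_by_child"

# ...but the current shell does not keep it afterward.
echo "shell has: ${CPPFLAGS:-nothing set}"
```

This is why you need to repeat the assignment each time you rerun configure, unless you export the variable first.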

Environment variables are especially handy when configure doesn’t know where to look for third-party include files and libraries. For example, to make the preprocessor search in include_dir, run this command:

$ CPPFLAGS=-Iinclude_dir ./configure

As shown in 15.2.6 Standard Macros and Variables, to make the linker look in lib_dir, use this command:

$ LDFLAGS=-Llib_dir ./configure

If lib_dir has shared libraries (see 15.1.4 Shared Libraries), the previous command probably won’t set the runtime dynamic linker path. In that case, use the -rpath linker option in addition to -L:

$ LDFLAGS="-Llib_dir -Wl,-rpath=lib_dir" ./configure

Be careful when setting variables. A small slip can trip up the compiler and cause configure to fail. For example, say you forget the - in -I, as shown here:

$ CPPFLAGS=Iinclude_dir ./configure

This yields an error like this:

configure: error: C compiler cannot create executables
See 'config.log' for more details

Digging through the config.log generated from this failed attempt yields this:

configure:5037: checking whether the C compiler works
configure:5059: gcc Iinclude_dir conftest.c >&5
gcc: error: Iinclude_dir: No such file or directory
configure:5063: $? = 1
configure:5101: result: no
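
Because config.log is long, it usually helps to search it rather than read it from the top. The following sketch writes the excerpt above into a sample file (a stand-in for a real config.log) and pulls out the error line along with the line before it.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Sample log: the excerpt shown above, saved for demonstration.
cat > config.log <<'EOF'
configure:5037: checking whether the C compiler works
configure:5059: gcc Iinclude_dir conftest.c >&5
gcc: error: Iinclude_dir: No such file or directory
configure:5063: $? = 1
configure:5101: result: no
EOF

# Show each line containing "error:" together with the line before it,
# which is usually the command that configure tried to run.
grep -n -B1 'error:' config.log
```

In a real log, the command line printed just before the error is often enough to reveal a malformed flag like the one in this example.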

Once you get configure working, you’ll find that the Makefile that it generates has a number of other useful targets in addition to the standard all and install:

  • make clean As described in Chapter 15, this removes all object files, executables, and libraries.

  • make distclean This is similar to make clean except that it removes all automatically generated files, including Makefiles, config.h, config.log, and so on. The idea is that the source tree should look like a newly unpacked distribution after running make distclean.

  • make check Some packages come with a battery of tests to verify that the compiled programs work properly; the command make check runs the tests.

  • make install-strip This is like make install except that it strips the symbol table and other debugging information from executables and libraries when installing. Stripped binaries require much less space.

There are so many third-party libraries that keeping all of them in a common location can be messy. However, installing each with a separate prefix can lead to problems when building packages that require these third-party libraries. For example, if you want to compile OpenSSH, you need the OpenSSL library. How do you tell the OpenSSH configuration process the location of the OpenSSL libraries and which libraries are required?

Many libraries now use the pkg-config program not only to advertise the locations of their include files and libraries but also to specify the exact flags that you need to compile and link a program. The syntax is as follows:

$ pkg-config options package1 package2 ...

For example, to find the libraries required for OpenSSL, you can run this command:

$ pkg-config --libs openssl

The output should be something like this:

-lssl -lcrypto

To see all libraries that pkg-config knows about, run this command:

$ pkg-config --list-all
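
In practice you rarely read pkg-config output by eye; you substitute it into a compile command. The sketch below only assembles and prints such a command, and it assumes nothing about your system: if pkg-config is not installed, or it has no description of openssl, the script says so. The file names prog and prog.c are hypothetical.

```shell
# Query pkg-config only if it is installed and knows about openssl.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists openssl; then
    flags=$(pkg-config --cflags --libs openssl)
    # Print the compile command that would use these flags.
    echo "cc -o prog prog.c $flags"
else
    flags=
    echo "pkg-config has no information about openssl on this system"
fi
```

The --cflags option supplies the preprocessor and compiler flags (such as -I paths), and --libs supplies the linker flags.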

Knowing how to build and install software is good, but knowing when and where to install your own packages is even more useful. Linux distributions try to cram in as much software as possible at installation, and you should always check whether it would be best to install a package yourself instead. Here are the advantages of doing installs on your own:

There’s not much point in installing packages such as the ones in the coreutils package that you built earlier in the chapter (ls, cat, and so on) unless you’re building a very custom system. On the other hand, if you have a vital interest in network servers such as Apache, the best way to get complete control is to install the servers yourself.

Most changes to software source code are available as branches of the developer’s online version of the source code (such as a git repository). However, every now and then, you might get a patch that you need to apply against source code to fix bugs or add features. You may also see the term diff used as a synonym for patch, because the diff program produces the patch.

The beginning of a patch looks something like this:

--- src/file.c.orig     2015-07-17 14:29:12.000000000 +0100
+++ src/file.c   2015-09-18 10:22:17.000000000 +0100
@@ -2,16 +2,12 @@

Patches usually contain alterations to more than one file. Search the patch for three dashes in a row (---) to see the files that have alterations and always look at the beginning of a patch to determine the required working directory. Notice that the preceding example refers to src/file.c. Therefore, you should change to the directory that contains src before applying the patch, not to the src directory itself.

To apply the patch, run the patch command:

$ patch -p0 < patch_file

If everything goes well, patch exits without a fuss, leaving you with an updated set of files. However, patch may ask you this question:

File to patch:

This usually means that you are not in the correct directory, but it could also indicate that your source code does not match the source code in the patch. In this case, you’re probably out of luck: Even if you could identify some of the files to patch, others would not be properly updated, leaving you with source code that you could not compile.

In some cases, you might come across a patch that refers to a package version like this:

--- package-3.42/src/file.c.orig     2015-07-17 14:29:12.000000000 +0100
+++ package-3.42/src/file.c   2015-09-18 10:22:17.000000000 +0100

If you have a slightly different version number (or you just renamed the directory), you can tell patch to strip leading path components. For example, say you were in the directory that contains src (as before). To tell patch to ignore the package-3.42/ part of the path (that is, strip one leading path component), use -p1:

$ patch -p1 < patch_file
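
To try this without risking real source code, you can build a throwaway tree. The following sketch creates a one-file package, generates a patch with diff -u, renames the directory so that the package-3.42/ prefix no longer matches, and then applies the patch with -p1. The file contents and the package-3.43 name are invented for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A one-file "package" plus a modified copy to diff against.
mkdir -p package-3.42/src
printf 'int main(void)\n{\n    return 1;\n}\n' > package-3.42/src/file.c.orig
sed 's/return 1;/return 0;/' package-3.42/src/file.c.orig > package-3.42/src/file.c

# diff exits with status 1 when the files differ; that is expected here.
diff -u package-3.42/src/file.c.orig package-3.42/src/file.c > fix.patch

# Restore the unpatched file, then rename the directory so that the
# package-3.42/ prefix in the patch no longer matches.
mv package-3.42/src/file.c.orig package-3.42/src/file.c
mv package-3.42 package-3.43

# From the directory that contains src, strip one leading component.
cd package-3.43
patch -p1 < ../fix.patch
grep -n 'return' src/file.c
```

Looking at the first two lines of fix.patch before applying it shows the same header format as the excerpts above.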

If you understand the difference between compiler errors, compiler warnings, linker errors, and shared library problems as described in Chapter 15, you shouldn’t have too much trouble fixing many of the glitches that arise when building software. This section covers some common problems. Although you’re unlikely to run into any of these when building using autoconf, it never hurts to know what these kinds of problems look like.

Before covering specifics, make sure that you can read certain kinds of make output. It’s important to know the difference between an error and an ignored error. The following is a real error that you need to investigate:

make: *** [target] Error 1

However, some Makefiles suspect that an error condition might occur but know that these errors are harmless. You can usually disregard any messages like this:

make: [target] Error 1 (ignored)

Furthermore, GNU make often calls itself many times in large packages, with each instance of make in the error message marked with [N], where N is a number. You can often quickly find the error by looking at the make error that comes directly after the compiler error message. For example:

[compiler error message involving file.c]
make[3]: *** [file.o] Error 1
make[3]: Leaving directory '/home/src/package-5.0/src'
make[2]: *** [all] Error 2
make[2]: Leaving directory '/home/src/package-5.0/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/src/package-5.0/'
make: *** [all] Error 2

The first three lines practically give it away: The trouble centers around file.c located in /home/src/package-5.0/src. Unfortunately, there is so much extra output that it can be difficult to spot the important details. Learning how to filter out the subsequent make errors goes a long way toward digging out the real cause.
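
If you save the build output to a file, a simple filter can point you at the first real make error. This is a rough sketch: the sample log is modeled on the output above, and the compiler message in it is a made-up placeholder.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Sample build log modeled on the output above. The compiler message
# on the second line is a placeholder, not real compiler output.
cat > build.log <<'EOF'
make[3]: [old.o] Error 1 (ignored)
file.c:10: error: something went wrong
make[3]: *** [file.o] Error 1
make[3]: Leaving directory '/home/src/package-5.0/src'
make[2]: *** [all] Error 2
make: *** [all] Error 2
EOF

# Keep only make's "***" lines, skip any marked (ignored), and take
# the first one; the compiler message sits just above that line.
first_error=$(grep -n '\*\*\*' build.log | grep -v '(ignored)' | head -n 1)
echo "$first_error"
```

The line number in the result tells you where to start reading upward in the log for the compiler's own message.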

Here are some common build errors that you might encounter.

Problem

Compiler error message:

src.c:22: conflicting types for 'item'
/usr/include/file.h:47: previous declaration of 'item'

Explanation and Fix

The programmer made an erroneous redeclaration of item on line 22 of src.c. You can usually fix this by removing the offending line (with a comment, an #ifdef, or whatever works).

Problem

Compiler error message:

src.c:37: 'time_t' undeclared (first use this function)
--snip--
src.c:37: parse error before '...'

Explanation and Fix

The programmer forgot a critical header file. The manual pages are the best way to find the missing header file. First, look at the offending line (in this case, line 37 in src.c). It’s probably a variable declaration like the following:

time_t v1;

Search forward for v1 in the program for its use around a function call. For example:

v1 = time(NULL);

Now run man 2 time or man 3 time to look for system and library calls named time(). In this case, the section 2 manual page has what you need:

SYNOPSIS
      #include <time.h>

      time_t time(time_t *t);

This means that time() requires time.h. Place #include <time.h> at the beginning of src.c and try again.

Problem

Compiler (preprocessor) error message:

src.c:4: pkg.h: No such file or directory
(long list of errors follows)

Explanation and Fix

The compiler ran the C preprocessor on src.c but could not find the pkg.h include file. The source code likely depends on a library that you need to install, or you may just need to provide the compiler with the nonstandard include path. Usually, you will just need to add a -I include path option to the C preprocessor flags (CPPFLAGS). (Keep in mind that you might also need a -L linker flag to go along with the include files.)

If it doesn’t look as though you’re missing a library, there’s an outside chance that you’re attempting a compile for an operating system that this source code does not support. Check the Makefile and README files for details about platforms.

If you’re running a Debian-based distribution, try the apt-file command on the header filename:

$ apt-file search pkg.h

This might find the development package that you need. For distributions that provide yum, you can try this instead:

$ yum provides '*/pkg.h'

Problem

make error message:

make: prog: Command not found

Explanation and Fix

To build the package, you need prog on your system. If prog is something like cc, gcc, or ld, you don’t have the development utilities installed on your system. On the other hand, if you think prog is already installed on your system, try altering the Makefile to specify the full pathname of prog.

In rare cases, make builds prog and then uses prog immediately, assuming that the current directory (.) is in your command path. If your $PATH does not include the current directory, you can edit the Makefile and change prog to ./prog. Alternatively, you could append . to your path temporarily.
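
Appending . to the path for a single command keeps the change from lingering in your shell. The following sketch shows the idea with a stand-in script named prog (following the text above) and sh -c in place of make.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in for a helper program that the build creates in the
# current directory.
printf '#!/bin/sh\necho "prog ran"\n' > prog
chmod +x prog

# Append . to the command path for this one command only.
result=$(PATH="$PATH:." sh -c 'prog')
echo "$result"
```

With a real package, the equivalent would be to put the same PATH assignment in front of make.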

We’ve only touched on the basics of building software. Here are some more topics that you can explore after you get the hang of your own builds:

Building software is often a stepping-stone to learning about programming and software development. The tools you’ve seen in the past two chapters take the mystery out of where your system software came from. It’s not difficult to take the next steps of looking inside the source code, making changes, and creating your own software.