Chapter 16. Securing Web Applications

Web servers are fine programs for displaying static information such as brochures, FAQs, and product catalogs. But applications that are customized for the user or that implement business logic (such as shopping carts) require that servers be extended with specialized code that executes each time the web page is fetched. This code most often takes the form of scripts or programs that are run when a particular URL is accessed. There is no limit to what a good programming team can do with a web server, a programming language, and enough time. Unfortunately, programs that provide additional functionality over the Web can have flaws that allow attackers to compromise the system on which the web server is running. These flaws are rarely evident when the program is run as intended.

This chapter focuses on programming techniques that you can use to make web programs more secure.

There are four primary techniques that web developers can use to create web-based applications:

CGI

The Common Gateway Interface (CGI) was the first means of extending web servers. When a URL referencing a CGI program is requested from the web server, the web server runs the CGI program in a separate process, captures the program’s output, and sends the results to the requesting web browser. Parameters to the CGI programs are encoded as environment variables and also provided to the program on standard input.

CGI programs can perform database queries and display the results, allow people to perform complex financial calculations, and allow web users to “chat” with others on the Internet. Indeed, practically every innovative use of the World Wide Web, from web search engines to web pages that let you track the status of overnight packages, was originally written using the CGI interface.

Plug-ins, loadable modules, and Application Programmer Interfaces (APIs)

The second technique developed to extend web servers involved modifying the web server with extension modules, usually written in C or C++. The extension module was then loaded into the web server at runtime. Plug-ins, modules, and APIs are a faster way to interface custom programs to web servers because they do not require that a new process be started for each web interaction. Instead, the web server process itself runs application code within its own address space that is invoked through a documented interface. But these techniques have a distinct disadvantage: the plug-in code can be very difficult to write, and a single bug can cause the entire web server to crash.

Embedded scripting languages

Web-based scripting languages were the third technique developed for adding programmatic functionality to web pages. These systems allow developers to place small programs, usually called scripts, directly into the web page. An interpreter embedded in the web server runs the program contained on the web page before the resulting code is sent to the web browser. Embedded scripts tend to be quite fast. Microsoft’s ASP, PHP, server-side JavaScript, and mod_perl are all examples of embedded scripting languages.

Embedded web server

Finally, some systems do away with the web server completely and embed their own HTTP server into the web application itself.

Largely as a result of their power, the extension techniques enumerated here can completely compromise the security of your web server and the host on which it is running. That’s because potentially any program can be run through these interfaces. This includes programs that have security problems, programs that give outsiders access to your computer, and even programs that change or erase critical files on your system.

Two techniques can limit the damage that can be caused by web applications:

On operating systems that allow for multiple users running at multiple authorization levels, web servers are normally run under a restricted account, usually the nobody or the httpd user. Programs that are spawned from the web server through either CGI or API interfaces are then run as the same restricted user.[167]

Unfortunately, other operating systems do not have the same notion of restricted users. On Windows 3.1, Windows 95/98/ME, and Mac OS 7 through 9 (the releases that preceded Mac OS X), there is no easy way for the operating system to restrict the reach of a CGI program.

Interpreters, shells, scripting engines, and other extensible programs should never appear in a cgi-bin directory, nor should they be located elsewhere on a computer where they might be invoked by a request to the web server process. Programs that are installed in this way allow attackers to run any program they wish on your computer.

For example, on Windows-based systems the Perl executable PERL.EXE should never appear in the cgi-bin directory. It is easy to probe a computer to see if it has been improperly configured. To make matters worse, some search engines can be used to find vulnerable machines automatically. Unfortunately, many Windows-based web servers have been configured this way because it makes it easier to set up Perl scripts on these servers.
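A check for this misconfiguration can be automated. The following is a minimal sketch, not a complete audit tool: the function name find_interpreters and the list of program names are assumptions made for illustration, and you would extend the list to suit your own site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of program names that should never be reachable
# through the web server; extend it for your own site.
my %INTERPRETER = map { $_ => 1 }
    qw(perl sh bash csh tcsh ksh zsh python ruby tclsh php);

# Return the sorted names of plain files in $dir whose name, ignoring
# case and a trailing ".exe", matches one of the interpreters above.
sub find_interpreters {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @hits;
    for my $entry (readdir $dh) {
        next unless -f "$dir/$entry";
        (my $name = lc $entry) =~ s/\.exe\z//;
        push @hits, $entry if $INTERPRETER{$name};
    }
    closedir $dh;
    my @sorted = sort @hits;
    return @sorted;
}

# Report on each directory named on the command line.
for my $dir (@ARGV) {
    print "Interpreter found in $dir: $_\n" for find_interpreters($dir);
}
```

Run it with the path of each cgi-bin directory as an argument. The sketch looks only at filenames, so it will not notice an interpreter that has been copied under another name.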

Another source of concern is the programs and scripts that are distributed with web servers and later found to have security flaws. Webmasters rarely delete programs that are part of the default installation, because it can be quite difficult to find out whether a script is in use. As a result, these dangerous programs and scripts may persist for months or even years, even if new versions of the web server are installed that do not contain the bug.

An example of such a vulnerable script is a script named phf that was distributed with the NCSA web server and many early versions of the Apache web server. Although the script was designed to give web developers another tool, attackers discovered a way to use the script to retrieve files from the computer on which the script was running. This is an example of an unintended side effect, as explained in the next section.

To understand the potential problems with server-side programming, consider the CGI script in Example 16-1.[168]

The first half of this script defines a Perl function, ReadForm, which will be used throughout this chapter for CGI form handling. There are no problems with this function—all it does is take input from a CGI GET or POST operation and stuff the variables into an associative array provided by the programmer.
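To make the form handling concrete, here is a minimal sketch of how URL-encoded form data can be decoded into a Perl hash. It is not the ReadForm function of Example 16-1: the names url_decode, parse_form_data, and read_request_data are hypothetical, and the sketch assumes the usual CGI conventions (QUERY_STRING for GET, CONTENT_LENGTH bytes on standard input for POST).

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "name=value&name=value" data into a hash of decoded values.
# (Hypothetical helper, not the ReadForm function of Example 16-1.)
sub parse_form_data {
    my ($data) = @_;
    my %form;
    for my $pair (split /&/, $data) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $form{ url_decode($name) } = url_decode(defined $value ? $value : '');
    }
    return %form;
}

# Fetch the raw form data for the current request.
sub read_request_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $data = '';
        read(STDIN, $data, $ENV{CONTENT_LENGTH} || 0);
        return $data;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

# In a CGI script: my %input = parse_form_data(read_request_data());
my %input = parse_form_data('command=spaf%40cs.purdue.edu+%26+%2Fbin%2Fls+-l');
print "$input{'command'}\n";    # prints: spaf@cs.purdue.edu & /bin/ls -l
```

Note that decoding restores every character the browser sent, including “&” and “/”. A form parser of this kind does no filtering of its own, so whatever the user typed arrives in the hash unchanged.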

The second half of this script defines a finger gateway. If called by the result of a normal HTTP GET command, it simply generates the HTML for a CGI form:

Content-type: text/html

<html><hr>
<form method="post" action="bad_finger">
Finger command: <input type="text" size="40" name="command">
</form>

This produces the expected display in a web browser, as shown in Figure 16-1.

Type a typical user ID, such as spaf@cs.purdue.edu, into the field, press Return, and you’ll get the expected result (see Figure 16-2).

Although this script works as expected, it has a serious problem: an attacker can use it to compromise the security of your computer.

You might have security problems similar to this one in the CGI scripts on your server. Security problems in scripts can remain dormant for years before they are exploited.

Sometimes, obscure security holes may even be inserted by the programmer who first wrote the scripts—a sort of “back door” that allows the programmer to gain access in the future, should the programmer’s legitimate means of access be lost. These back doors can be much harder to find than a simple undocumented account or password. We discuss this problem in the next section.

The problem with the script shown previously is the single line that executes the finger command:

print `/usr/bin/finger $input{'command'}`;

This line executes the program /usr/bin/finger with the input provided and displays the result. The problem with this line is the way in which the finger command is invoked—from Perl’s backquote function. The backquote function provides its input to the Unix shell—and the Unix shell may interpret some of that input in an unwanted manner!

Thus, when we sent the value spaf@cs.purdue.edu to this CGI script, it effectively executed the Perl statement:

print `/usr/bin/finger spaf@cs.purdue.edu`;

and that evaluated to:

/usr/bin/finger spaf@cs.purdue.edu

and that then produced the expected result.

The Unix shell is known and admired for its power and flexibility by programmers and malicious hackers alike. One of the interesting abilities of the Unix shell is the ability to put multiple commands on a single line. For example, if we wanted to run the finger command in the background and, while we are waiting, do an ls command on the current directory, we might execute this command:

/usr/bin/finger spaf@cs.purdue.edu & /bin/ls -l

And indeed, if we type in the name spaf@cs.purdue.edu & /bin/ls -l as our finger request (see Figure 16-3), the bad_finger script will happily execute it, producing the output shown in Figure 16-4.

What’s the harm in allowing a user to list the files? By looking at the files, an attacker might learn about other confidential information stored on the web server. Also, the /bin/ls command is simply one of many commands that the attacker might run. The attacker could easily run commands to delete files, send an “xterm” back to his own computer, initiate denial-of-service attacks to other computers, or even crash your machine.
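The mechanics are easy to see if you build the command string without running it. The following sketch, which uses a hypothetical helper named shell_command_for, prints the string that the backquotes in bad_finger would hand to the shell for three different form values. The third value is an additional illustration, not taken from the figures.

```perl
use strict;
use warnings;

# Build the string that bad_finger's backquotes would pass to the shell.
# Nothing is executed here; the string is only returned for inspection.
sub shell_command_for {
    my ($value) = @_;
    return "/usr/bin/finger $value";
}

for my $value ('spaf@cs.purdue.edu',
               'spaf@cs.purdue.edu & /bin/ls -l',
               'spaf@cs.purdue.edu; cat /etc/passwd') {
    print shell_command_for($value), "\n";
}
```

Perl adds no quoting of its own when it interpolates the value, so the shell sees “&” and “;” as ordinary command separators and runs whatever follows them.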

Although most operating systems are not fundamentally insecure, few operational computers are administered in such a way that they can withstand an inside attack from a determined attacker. Thus, you must ensure that attackers never get inside your system: deny the attacker the ability to run arbitrary commands. To prevent an attacker from gaining this foothold, you must be sure that your server’s scripts and programs cannot be turned against you.

Fixing the problem with the bad_finger script is remarkably easy. All you need to do is not trust the user’s input. Instead of merely sending $input{'command'} to a shell, you should filter the input, extracting legal characters for the command that you wish to execute.

In the case of finger, there is a very small set of characters that are valid in email addresses or hostnames. The next script selects those characters with a regular expression pattern match:

if(&ReadForm(*input)){
    # Keep only the leading run of letters, digits, "_", "+", "@", "." and "-"
    $input{'command'} =~ m/([\w+@\.\-]*)/;
    print "<pre>\n";
    print `/usr/bin/finger $1`;
    print "</pre>\n";
}

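This script works as before, except that it no longer passes characters such as “&”, “;”, or “'” to the subshell. The match stops at the first character outside the permitted set, so anything after that character is discarded.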

Notice that this example matches legal characters rather than filtering out disallowed ones. This is an important distinction! Many publications recommend filtering out special characters, and then they don’t tell you all of the characters that you need to remove. Indeed, it’s sometimes difficult to know, because the list of characters to remove depends on how you employ the user input as well as which shells and programs are invoked. For example, if you write a script that accepts a number or a date, you might wish to allow the characters “.” and “/”. If you are writing a script that accepts a filename, you may wish to filter out these characters to prevent an attacker from specifying a pathname that is in a different directory; for example, ../../../../../etc/passwd. This is why best practice recommends selecting which characters to let through, rather than guessing which characters should be filtered out.[169]
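The same approach can be applied to filenames. The sketch below accepts a name only if every character is on an explicit list. The function name checked_filename, the permitted character set, and the 64-character limit are all assumptions chosen for illustration, and a real application should pick the set that suits its own files.

```perl
use strict;
use warnings;

# Return the name if it is 1 to 64 characters long, contains only ASCII
# letters, digits, "_", "." and "-", and does not begin with "." or "-".
# Return undef (in scalar context) for anything else.
sub checked_filename {
    my ($name) = @_;
    return unless defined $name;
    return unless $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.\-]{0,63})\z/;
    return $1;
}

for my $name ('report.txt', '../../../../../etc/passwd', 'notes;rm') {
    my $checked = checked_filename($name);
    print defined $checked ? "accepted: $checked\n" : "rejected: $name\n";
}
```

Because “/” is not on the list, a pathname that climbs out of the intended directory is rejected without the script ever having to recognize “..” as dangerous. The pattern is anchored at both ends with \A and \z, so the whole value must qualify, and it spells out the ASCII ranges rather than relying on \w, whose meaning can vary with locale and character encoding.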

The script can be made more secure (and somewhat faster) by using Perl’s system function to run the finger command directly. When system is given the program name and its arguments as a list, it avoids calling the shell entirely:

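if(&ReadForm(*input)){
    # Keep only the leading run of letters, digits, "_", "+", "@", "." and "-"
    $input{'command'} =~ m/([\w+@\.\-]*)/;
    $| = 1;        # Unbuffer output so "<pre>" is sent before finger's output
    print "<pre>\n";
    system '/usr/bin/finger', $1;
    print "</pre>\n";
}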
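A stricter variant rejects a bad request outright instead of running finger on whatever prefix happens to match. The following is a sketch under stated assumptions rather than a finished gateway: the names finger_target, html_escape, run_finger, and finger_response are hypothetical, the permitted character set is illustrative, and the list form of open used here requires Perl 5.8 or later.

```perl
use strict;
use warnings;

# Accept the value only if every character is permitted and it does not
# begin with "-" or ".", so it cannot be mistaken for a command-line option.
sub finger_target {
    my ($value) = @_;
    return unless defined $value;
    return unless $value =~ /\A([A-Za-z0-9_\@][A-Za-z0-9_\@.\-]*)\z/;
    return $1;
}

# Escape the characters that are special in HTML.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Run finger without a shell and return everything it prints.
sub run_finger {
    my ($target) = @_;
    open(my $pipe, '-|', '/usr/bin/finger', $target)
        or die "Cannot run finger: $!";
    local $/;                       # read the whole output at once
    my $output = <$pipe>;
    close $pipe;
    return defined $output ? $output : '';
}

# Build the body of the reply for one form value.
sub finger_response {
    my ($value) = @_;
    my $target = finger_target($value);
    return "<p>Invalid finger request.</p>\n" unless defined $target;
    return "<pre>\n" . html_escape(run_finger($target)) . "</pre>\n";
}

# In the CGI script: print finger_response($input{'command'});
```

Capturing the output through a pipe, rather than letting finger write directly to the browser, keeps the reply in a single buffered stream and gives the script a chance to escape any “<” or “&” that the remote finger server returns.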

The next section gives many rules of thumb to help you avoid these kinds of problems in your CGI and API programs.



[167] In a multiuser environment, such as a web server at an ISP or a university, it is common practice to use the cgiwrap script so that CGI programs are run with the subscriber’s permissions, rather than with the web server’s.

[168] The ReadForm function is based on Steven E. Brenner’s cgi-lib.pl. It is used here so that the reader can see the entire program. The serious Perl programmer should use the CGI.pm Perl module, which is available from the CPAN archives.

[169] Another reason that you should select the characters that are matched rather than choose which characters to filter out is that different programs called by your script may treat 8-bit and multibyte characters in different ways. You may not filter out the 8-bit or multibyte versions of a special character, but when they reach the underlying system they may be interpreted as single-byte, 7-bit characters—much to your dismay.