C and C++ do not perform array bounds checking, which turns out to be a security-critical issue, particularly in handling strings. The risks increase even more dramatically when user-controlled data is on the program stack (i.e., is a local variable).
There are many solutions to this problem, but none are satisfying in every situation. You may want to rely on operational protections such as StackGuard from Immunix, use a library for safe string handling, or even use a different programming language.
Buffer overflows get a lot of attention in the technical world, partially because they constitute one of the largest classes of security problems in code, but also because they have been around for a long time and are easy to get rid of, yet still are a huge problem.
Buffer overflows are generally very easy for a C or C++ programmer to understand. An experienced programmer has invariably written off the end of an array, or indexed into the wrong memory because she improperly checked the value of the index variable.
Because we assume that you are a C or C++ programmer, we won't insult your intelligence by explaining buffer overflows to you. If you do not already understand the concept, you can consult many other software security books, including Building Secure Software by John Viega and Gary McGraw (Addison Wesley). In this recipe, we won't even focus so much on why buffer overflows are such a big deal (other resources can help you understand that if you're insatiably curious). Instead, we'll focus on state-of-the-art strategies for mitigating these problems.
Most languages do not have buffer overflow problems at all, because they ensure that writes to memory are always in bounds. This can sometimes be done at compile time, but generally it is done dynamically, right before data gets written. The C and C++ philosophy is different—you are given the ability to eke out more speed, even if it means that you risk shooting yourself in the foot.
Unfortunately, in C and C++, it is not only possible to overflow buffers but also easy, particularly when dealing with strings. The problem is that C strings are not high-level data types; they are arrays of characters. The major consequence of this nonabstraction is that the language does not manage the length of strings; you have to do it yourself. The only time C ever cares about the length of a string is in the standard library, and the length is not related to the allocated size at all; instead, it is delimited by a 0-valued (NULL) byte. Needless to say, this can be extremely error-prone.
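The gap between a string's length and its buffer's allocated size can be made concrete with a small sketch. The helper name room_left is hypothetical and not part of any library; it assumes the caller knows the buffer's capacity, which is exactly the bookkeeping the language leaves to you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many more characters fit in a buffer of
 * `capacity` bytes that currently holds the C string `s`, leaving room
 * for the terminating zero byte. Returns 0 if no terminator is found
 * within `capacity` bytes. */
size_t room_left(const char *s, size_t capacity)
{
    assert(s != NULL);

    /* Search for the terminator without reading past the buffer. */
    const char *end = memchr(s, '\0', capacity);
    if (end == NULL)
        return 0;                       /* not terminated inside the buffer */

    size_t used = (size_t)(end - s);    /* the value strlen() would report */
    return capacity - used - 1;         /* reserve one byte for the terminator */
}
```

For a buffer declared as char buf[16] = "abc", sizeof buf is 16 while strlen(buf) is 3; nothing in the language connects the two numbers.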
One of the simplest examples is the ANSI C standard library function gets( ):
char *gets(char *str);
This function reads data from the standard input device into the memory pointed to by str until there is a newline or until the end of file is reached. It then returns a pointer to the buffer. In addition, the function NULL-terminates the buffer.
If the buffer in question is a local variable or otherwise lives on the program stack, then the attacker can often force the program to execute arbitrary code by overwriting important data on the stack. This is called a stack-smashing attack. Even when the buffer is heap-allocated (that is, it is allocated with malloc() or new), a buffer overflow can be security-critical if an attacker can write over critical data that happens to be in nearby memory.
The problem with gets( ) is that, no matter how big the buffer is, an attacker can always stick more data into the buffer than it is designed to hold, simply by avoiding the newline.
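Because gets( ) has no way to learn the buffer size (it was eventually dropped from the language in the C11 standard), the usual replacement is the standard fgets( ) function, which takes the size as an argument. The following is a minimal sketch of a bounded line reader built on it; the name read_line and its "complete" flag are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read at most size - 1 characters of one line into
 * buf and strip the trailing newline if one was stored. Sets *complete to
 * 1 when the whole line (through its newline) was read, 0 otherwise.
 * Returns 1 if something was read, 0 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size, int *complete)
{
    assert(in != NULL && buf != NULL && complete != NULL);
    assert(size > 0 && size <= INT_MAX);   /* fgets() takes an int */

    if (fgets(buf, (int)size, in) == NULL)
        return 0;

    size_t len = strlen(buf);
    *complete = (len > 0 && buf[len - 1] == '\n');
    if (*complete)
        buf[len - 1] = '\0';               /* drop the newline */
    return 1;
}
```

When *complete comes back 0, the rest of the line is still waiting in the stream; the caller decides whether to read on, discard it, or reject the input.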
There are plenty of other places where it is easy to overflow strings. Pretty much any time you perform an operation that writes to a "string," there is room for a problem. One famous example is strcpy( ):
char *strcpy(char *dst, const char *src);
This function copies bytes from the address indicated by src into the buffer pointed to by dst, up to and including the first NULL byte in src. Then it returns dst. No effort is made to ensure that the dst buffer is big enough to hold the contents of the src buffer. Because the language does not track allocated sizes, there is no way for the function to do so.
To help alleviate the problems with functions like strcpy( ) that have no way of determining whether the destination buffer is big enough to hold the result from their respective operations, there are also functions like strncpy( ):
char *strncpy(char *dst, const char *src, size_t len);
The strncpy( ) function is certainly an improvement over strcpy( ), but there are still problems with it. Most notably, if the source string is len bytes or longer (not counting its terminator), the destination buffer will not be NULL-terminated. This means the programmer must ensure the destination buffer is NULL-terminated. Unfortunately, the programmer often forgets to do so; there are two reasons for this failure:
It's an additional step for what should be a simple operation.
Many programmers do not realize that the destination buffer may not be NULL-terminated.
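The extra step can be folded into a small wrapper so that it cannot be forgotten. This is only a sketch; copy_terminated is a hypothetical name, and the wrapper assumes the caller passes the true size of the destination buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: copy src into dst (dstsize bytes in total) with
 * strncpy() and always terminate the result.
 * Returns 1 if all of src fit, 0 if the copy was truncated. */
int copy_terminated(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /* Copy at most dstsize - 1 characters; strncpy() writes no terminator
     * when src is at least that long. */
    strncpy(dst, src, dstsize - 1);

    /* The step that is easy to forget. */
    dst[dstsize - 1] = '\0';

    return strlen(src) < dstsize;
}
```

Calling strncpy( ) directly with the full buffer size and a longer source leaves no zero byte anywhere in the destination, which is the failure the wrapper guards against.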
The problems with strncpy( ) are further complicated by the fact that a similar function, strncat( ), treats its length-limiting argument in a completely different manner. The difference in behavior serves only to confuse programmers, and more often than not, mistakes are made. Certainly, we recommend using strncpy( ) over using strcpy( ); however, there are better solutions.
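To make the difference concrete: the third argument of strncat( ) is the maximum number of characters to append, not the size of the destination, and strncat( ) always writes a terminator after the appended characters. A sketch of a call that respects this follows; append_bounded is a hypothetical name, and the destination is assumed to hold a properly terminated string already.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: append src to the C string in dst, where dst has
 * dstsize bytes in total. Anything that does not fit is silently dropped. */
void append_bounded(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t used = strlen(dst);
    assert(used < dstsize);     /* sanity check on the caller's bookkeeping */

    /* Space left is dstsize - used, minus one byte for the terminator
     * that strncat() adds on top of the characters it appends. */
    strncat(dst, src, dstsize - used - 1);
}
```

Passing sizeof dst as the third argument, by analogy with strncpy( ), is the classic mistake: it lets strncat( ) write well past the end of a partly filled buffer.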
OpenBSD 2.4 introduced two new functions, strlcpy( ) and strlcat( ), that are consistent in their behavior, and they provide an indication back to the caller of how much space in the destination buffer would be required to successfully complete their respective operations without truncating the results. For both functions, the length limit indicates the maximum size of the destination buffer, and the destination buffer is always NULL-terminated, even if the result must be truncated.
Unfortunately, strlcpy( ) and strlcat( ) are not available on all platforms; at present, they seem to be available only on Darwin, FreeBSD, NetBSD, and OpenBSD. Fortunately, they are easy to implement yourself, but you don't have to, because we provide implementations here:
#include <sys/types.h>
#include <string.h>

size_t strlcpy(char *dst, const char *src, size_t size) {
  char       *dstptr = dst;
  size_t     tocopy  = size;
  const char *srcptr = src;

  if (tocopy && --tocopy) {
    do {
      if (!(*dstptr++ = *srcptr++)) break;
    } while (--tocopy);
  }
  if (!tocopy) {
    if (size) *dstptr = 0;
    while (*srcptr++);
  }

  return (srcptr - src - 1);
}

size_t strlcat(char *dst, const char *src, size_t size) {
  char       *dstptr = dst;
  size_t     dstlen, tocopy = size;
  const char *srcptr = src;

  while (tocopy-- && *dstptr) dstptr++;
  dstlen = dstptr - dst;
  if (!(tocopy = size - dstlen)) return (dstlen + strlen(src));
  while (*srcptr) {
    if (tocopy != 1) {
      *dstptr++ = *srcptr;
      tocopy--;
    }
    srcptr++;
  }
  *dstptr = 0;

  return (dstlen + (srcptr - src));
}
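The return value is what makes these functions convenient: comparing it with the buffer size tells the caller whether truncation occurred. The sketch below illustrates the check. To stay self-contained and avoid clashing with a platform-provided strlcpy( ), it uses a simplified stand-in named demo_strlcpy; both that name and copy_detect_truncation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with strlcpy() semantics (hypothetical name): copy at most
 * size - 1 characters, always terminate when size > 0, and return the
 * length of src. */
size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);

    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;      /* length the untruncated copy would have had */
}

/* Returns 1 if src fit in dst, 0 if the copy was truncated. */
int copy_detect_truncation(char *dst, size_t dstsize, const char *src)
{
    /* A return value >= dstsize means the result did not fit. */
    return demo_strlcpy(dst, src, dstsize) < dstsize;
}
```

The same comparison works for strlcat( ), whose return value is the total length the concatenated string would have needed.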
As part of its security push, Microsoft has developed a new set of string-handling functions for C and C++ that are defined in the header file strsafe.h . The new functions handle both ANSI and Unicode character sets, and each function is available in byte count and character count versions. For more information regarding using strsafe.h functions in your Windows programs, visit the Microsoft Developer's Network (MSDN) reference for strsafe.h.
All of the string-handling improvements we've discussed so far operate using traditional C-style NULL-terminated strings. While strlcat( ), strlcpy( ), and Microsoft's new string-handling functions are vast improvements over the traditional C string-handling functions, they all still require diligence on the part of the programmer to maintain information regarding the allocated size of destination buffers.
An alternative to using traditional C style strings is to use the SafeStr library, which is available from http://www.zork.org/safestr/. The library is a safe string implementation that provides a new, high-level data type for strings, tracks accounting information for strings, and performs many other operations. For interoperability purposes, SafeStr strings can be passed to C string functions, as long as those functions use the string in a read-only manner. (We discuss SafeStr in some detail in Recipe 3.4.)
Finally, applications that transfer strings across a network should consider including a string's length along with the string itself, rather than requiring the recipient to rely on finding the NULL-terminating character to determine the length of the string. If the length of the string is known up front, the recipient can allocate a buffer of the proper size up front and read the appropriate amount of data into it. The alternative is to read byte-by-byte, looking for the NULL-terminator, and possibly repeatedly resizing the buffer. Dan J. Bernstein has defined a convention called Netstrings (http://cr.yp.to/proto/netstrings.txt) for encoding the length of a string with the string. This protocol simply has you send the length of the string represented in ASCII, then a colon, then the string itself, then a trailing comma. For example, if you were to send the string "Hello, World!" (13 characters) over a network, you would send:
13:Hello, World!,
Note that the Netstrings representation does not include the NULL-terminator, as that is really part of the machine-specific representation of a string, and is not necessary on the network.
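The encoding is simple enough to sketch in a few lines of C. The functions below are an illustration written for this discussion, not code from the Netstrings specification; the names netstring_encode and netstring_decode are hypothetical, and the decoder does not enforce every rule of the format (for example, it accepts leading zeros in the length).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode len bytes of data as a netstring into out (outsize bytes).
 * Returns the encoded length, or 0 if out is too small. A zero byte is
 * appended for convenience; it is not part of the netstring. */
size_t netstring_encode(char *out, size_t outsize, const char *data, size_t len)
{
    assert(out != NULL && data != NULL);

    int hdr = snprintf(out, outsize, "%zu:", len);    /* ASCII length, colon */
    if (hdr < 0 || (size_t)hdr >= outsize)
        return 0;

    size_t rem = outsize - (size_t)hdr;   /* needs payload + ',' + zero byte */
    if (rem < 2 || len > rem - 2)
        return 0;

    memcpy(out + hdr, data, len);
    out[(size_t)hdr + len] = ',';
    out[(size_t)hdr + len + 1] = '\0';
    return (size_t)hdr + len + 1;
}

/* Parse the netstring at in (inlen bytes). On success, points *payload at
 * the first payload byte, stores the payload length in *len, and returns 1.
 * Returns 0 if the input is malformed or incomplete. */
int netstring_decode(const char *in, size_t inlen,
                     const char **payload, size_t *len)
{
    assert(in != NULL && payload != NULL && len != NULL);

    size_t i = 0, n = 0;
    if (inlen == 0 || in[0] < '0' || in[0] > '9')
        return 0;
    while (i < inlen && in[i] >= '0' && in[i] <= '9') {
        size_t digit = (size_t)(in[i] - '0');
        if (n > (SIZE_MAX - digit) / 10)
            return 0;                     /* declared length would overflow */
        n = n * 10 + digit;
        i++;
    }
    if (i >= inlen || in[i] != ':')
        return 0;
    i++;                                  /* skip the colon */

    if (inlen - i <= n)                   /* need n payload bytes plus ',' */
        return 0;
    if (in[i + n] != ',')
        return 0;

    *payload = in + i;
    *len = n;
    return 1;
}
```

Because the length arrives first, a receiver can reject an oversized message or allocate exactly the right buffer before reading the payload.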
When using C++, you generally have a lot less to worry about when using the standard C++ string library, std::string. This library is designed in such a way that buffer overflows are less likely. Standard I/O using the stream operators (>> and <<) is safe when using the standard C++ string type.
However, buffer overflows when using strings in C++ are not out of the question. First, the programmer may choose to use old-fashioned C API functions, which work fine in C++ but are just as risky as they are in C. Second, while C++ usually throws an out_of_range exception when an operation would overflow a buffer, there are two cases where it doesn't.
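The first problem area occurs when using the subscript operator, []. This operator doesn't perform bounds checking for you, so be careful with it. (The at() member function, by contrast, checks the index and throws out_of_range when it is invalid.)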
The second problem area occurs when using C-style strings with the C++ standard library. C-style strings are always a risk, because even C++ doesn't know how much memory is allocated to a string. Consider the following C++ program:
#include <iostream>

// WARNING: This code has a buffer overflow in it.
int main(int argc, char *argv[]) {
  char buf[12];

  // Unbounded read before C++20; C++20 limits this read to the array size.
  std::cin >> buf;
  std::cout << "You said... " << buf << std::endl;
}
If you compile the above program without optimization, then you run it, typing in more than 11 printable ASCII characters (remember that C++ will add a NULL to the end of the string), the program will either crash or print out more characters than buf can store. Those extra characters get written past the end of buf.
Also, when indexing a C-style string through C++, C++ always assumes that the indexing is valid, even if it isn't.
Another problem occurs when converting C++-style strings to C-style strings. If you use string::c_str() to do the conversion, you will get a properly NULL-terminated C-style string. However, under the original C++ standard (C++98 and C++03), string::data() returns a pointer to an array of the string's characters that is not guaranteed to be NULL-terminated. That is, the only difference between c_str() and data() there is that c_str() adds a trailing NULL. Since C++11 the two functions are equivalent and both results are NULL-terminated, but code that must build with older compilers should not rely on that.
One final point with regard to C++ is that there are plenty of applications not using the standard string library, that are instead using third-party libraries. Such libraries are of varying quality when it comes to security. We recommend using the standard library if at all possible. Otherwise, be careful in understanding the semantics of the library you do use, and the possibilities for buffer overflow.
In C and C++, memory for local variables is allocated on the stack. In addition, information pertaining to the control flow of a program is also maintained on the stack. If an array is allocated on the stack, and that array is overrun, an attacker can overwrite the control flow information that is also stored on the stack. As we mentioned earlier, this type of attack is often referred to as a stack-smashing attack.
Recognizing the gravity of stack-smashing attacks, several technologies have been developed that attempt to protect programs against them. These technologies take various approaches. Some are implemented in the compiler (such as Microsoft's /GS compiler flag and IBM's ProPolice), while others are dynamic runtime solutions (such as Avaya Labs's LibSafe).
All of the compiler-based solutions work in much the same way, although there are some differences in the implementations. They work by placing a "canary" (which is typically some random value) on the stack between the control flow information and the local variables. The code that is normally generated by the compiler to return from the function is modified to check the value of the canary on the stack, and if it is not what it is supposed to be, the program is terminated immediately.
The idea behind using a canary is that an attacker attempting to mount a stack-smashing attack will have to overwrite the canary to overwrite the control flow information. Because the canary is a random value, the attacker cannot know what it is and therefore cannot include it in the data used to "smash" the stack.
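The canary check can be imitated by hand to show the principle. The sketch below is a simulation only: it models a stack frame as an ordinary struct (the name fake_frame and its layout are assumptions), and it is not how any compiler actually lays out or instruments a frame. A real canary is chosen at random at run time; a fixed value would be used here only to keep a demonstration repeatable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a local buffer, then the canary,
 * then the control flow data that the canary guards. */
struct fake_frame {
    char          buf[8];
    unsigned long canary;
    unsigned long return_address;
};

/* Copy n bytes to the start of the frame with no regard for the size of
 * buf, the way an unchecked string copy would. The write is clamped to the
 * struct so that the simulation itself stays within bounds. */
void unchecked_write(struct fake_frame *f, const void *data, size_t n)
{
    assert(f != NULL && data != NULL);
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy(f, data, n);
}

/* The check performed before returning: 1 if the canary still holds the
 * expected value, 0 if something wrote over it. */
int canary_intact(const struct fake_frame *f, unsigned long expected)
{
    assert(f != NULL);
    return f->canary == expected;
}
```

A write that stays inside buf leaves the canary untouched, while a write long enough to reach return_address must pass over the canary first, so the check fails before the corrupted value is ever used.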
When a program is distributed in source form, the developer of the program cannot enforce the use of StackGuard or ProPolice because they are both nonstandard extensions to the GCC compiler. It is the responsibility of the person compiling the program to make use of one of these technologies. On the other hand, although it is rare for Windows programs to be distributed in source form, the /GS compiler flag is a standard part of the Microsoft Visual C++ compiler, and the program's build scripts (whether they are Makefiles, DevStudio project files, or something else entirely) can enforce the use of the flag.
For Linux systems, Avaya Labs' LibSafe technology is not implemented as a compiler extension, but instead takes advantage of a feature of the dynamic loader that causes a dynamic library to be preloaded with every executable. Using LibSafe does not require the source code for the programs it protects, and it can be deployed on a system-wide basis.
LibSafe replaces the implementation of several standard functions that are known to be vulnerable to buffer overflows, such as gets( ), strcpy( ), and scanf( ). The replacement implementations attempt to compute the maximum possible size of a statically allocated buffer used as a destination buffer for writing using a GCC built-in function that returns the address of the frame pointer. That address is normally the first piece of information on the stack after local variables. If an attempt is made to write more than the estimated size of the buffer, the program is terminated.
Unfortunately, there are several problems with the approach taken by LibSafe. First, it cannot accurately compute the size of a buffer; the best it can do is limit the size of the buffer to the difference between the start of the buffer and the frame pointer. Second, LibSafe's protections will not work with programs that were compiled using the -fomit-frame-pointer flag to GCC, an optimization that causes the compiler not to put a frame pointer on the stack. Although relatively useless, this is a popular optimization for programmers to employ. Finally, LibSafe will not work on setuid binaries without static linking or a similar trick.
In addition to providing protection against conventional stack-smashing attacks, the newest versions of LibSafe also provide some protection against format-string attacks (see Recipe 3.2). The format-string protection also requires access to the frame pointer because it attempts to filter out arguments that are not pointers into the heap or the local variables on the stack.
MSDN reference for strsafe.h: http://msdn.microsoft.com/library/en-us/winui/winui/windowsuserinterface/resources/strings/usingstrsafefunctions.asp
SafeStr from Zork: http://www.zork.org/safestr/
StackGuard from Immunix: http://www.immunix.org/stackguard.html
ProPolice from IBM: http://www.trl.ibm.com/projects/security/ssp/
LibSafe from Avaya Labs: http://www.research.avayalabs.com/project/libsafe/
Netstrings by Dan J. Bernstein: http://cr.yp.to/proto/netstrings.txt