Chapter 1. Introduction
Information in this chapter:
• Audience
• Filtering Basics
• Regular Expressions
• Book Organization
Abstract:
It seems like everyone is connected to the Internet nowadays. Whether it is to watch a favorite TV show, read the latest best-seller, pay bills, or socialize with friends near and far, people are turning to their Internet connection for its speed, flexibility, and reach. Enabling this connectivity are Web applications that allow access to the resources necessary to perform these activities. Unfortunately, many of the security measures used to protect Web applications are inadequate, allowing attackers to identify and exploit weaknesses to compromise the applications. This chapter discusses the goal of this book, which is to highlight the weaknesses in Web application security measures today. In addition to discussing the book's audience, the chapter explains how filtering works and introduces the subject of regular expressions. The chapter concludes with a preview of the remaining chapters, which include obfuscation and attack techniques related to HTML, JavaScript, VBScript, CSS, PHP, and SQL.
Key words: Web obfuscation, Filter, Blacklist, Whitelist, Regular expression, Regular expression pattern, Regular expression character, Greedy, Nongreedy, Restricted repetition
The reach of the Internet is expanding on a daily basis. Devices such as thermostats and televisions include Internet connectivity. Offline activities such as reading a book and socializing are increasingly becoming online activities. Behind the scenes, enabling this connectivity are countless Web applications allowing devices, people, and other applications to access whatever resources they need. Having access to these Web applications is quickly turning from a nicety to a necessity.
Consider the security aspects of a simple transaction such as buying a book from an online retailer. After selecting the book you wish to purchase on the retailer's Web site, you enter your password to authenticate yourself to the shopping cart application. The network traffic between you and the server is encrypted to ensure the confidentiality of your password and your credit card number used to pay for the book. You provide certain personal details about you and your credit card to ensure that no one has stolen your card. Each of these steps includes security measures to ensure the confidentiality of the transaction. Although these security measures are directly visible to end users, the book retailer likely takes many other security measures to protect the application and end users. For example, the Web application may validate data coming from the user to ensure that it does not contain malicious data. Queries to the database may be parameterized so that an attacker cannot send malicious queries to the database. Transaction tokens may be used to ensure that the incoming requests were not maliciously initiated.
Unfortunately, the security measures used to protect Web applications are frequently inadequate. An attacker who can identify weaknesses in these measures can usually find ways to exploit them to compromise the application in one form or another. The purpose of this book is to highlight many types of weaknesses in Web application security measures. In particular, we will focus on little-known obfuscation techniques that can be used to hide malicious Web attacks. These techniques are starting to be actively used in Web attacks, and by shining a light on them, we hope to help people better defend against them.
Audience
The information contained in this book is highly technical. Nevertheless, the intent is to present the information in understandable and accessible ways. Penetration testers, security researchers, incident responders, quality assurance testers, application developers, and application architects will all greatly benefit from the contents herein. Additionally, information security and software development professionals of all types will also gain valuable insights into the nature of sophisticated Web attacks.
This book will help you understand Web obfuscation and advanced Web attacks. In particular, you will learn how attackers are able to bypass security measures such as input filters, output encoding routines, Web application firewalls (WAFs), Web-based intrusion detection and prevention systems, and so forth. You will also learn security techniques and general principles that can be used to build more secure applications that are immune to such techniques.
Web attacks can be used to initiate other types of attacks, such as network and operating system attacks. These attacks may include obfuscated shell code, networking tricks, polymorphic code techniques, and so forth. The focus of this book is entirely on Web and Web application obfuscation techniques. Other resources do a superb job presenting network, operating system, and low-level programming language obfuscation techniques; thus, these techniques are not covered here.
Many different Web attacks are discussed in this book. In each case, we, the authors, provide the necessary context to understand the obfuscation techniques being discussed. However, this book is neither intended to be an introduction to Web security nor does it address all possible Web attacks. Many quality books exist that cover this ground, including:
Web Security Testing Cookbook™: Systematic Techniques to Find Problems Fast by Paco Hope and Ben Walther
XSS Attacks: Cross Site Scripting Exploits and Defense by Seth Fogie, Jeremiah Grossman, Robert Hansen, Anton Rager, and Petko D. Petkov (ISBN: 978-1-59749-154-9, Syngress)
The Web Application Hacker's Handbook: Discovering and Exploiting Security Flaws by Dafydd Stuttard and Marcus Pinto
Seven Deadliest Web Application Attacks by Mike Shema (ISBN: 978-1-59749-543-1, Syngress)
Filtering basics
Filters are put in place to prevent bad “stuff” from reaching some destination. In a Web scenario, the destination is typically the application server, the database, or the end user's machine. Each destination has different types of bad things that may be targeting it. Common examples include command injection for the application server, SQL injection for the database, and cross-site scripting destined for a user's browser. Filters need to be able to determine the difference between “normal” data and “bad” data. This can be much harder than it seems, for numerous reasons.
Web applications frequently include filters as a security measure; that is, they are included specifically to prevent malicious input from entering the application. Filters are also used to prevent “bad” input from entering the application, where bad input is simply any input that the application is not built to handle. (For example, a filter may prevent a particular input field from containing more than 256 characters due to constraints in the database.) Filters for bad data often prohibit malicious data from entering the application as well, though this is often an unintended consequence. From the attacker's point of view, this distinction is inconsequential. In fact, a Web application may do a lot of things to process incoming data for nonsecurity reasons that end up having security ramifications. The authors use the term “filtering” very loosely to include all such instances.
In some cases, especially when discussing security, filters are used only for detection rather than prevention. The idea is that if malicious activity is detected, someone can be alerted and mitigating actions can be taken. In these cases, an attacker may still attack the application despite the filters being in place. However, an attack will be more successful if detection can be avoided; thus, evading the filters is still an important consideration. Note that detection filters differ from normal data logging since only certain types of data trigger the alert (or logging), rather than all data.
This means that from the attacker's point of view, there are two main considerations: whether the malicious data reached its destination and whether it avoided detection. If the answer to both is yes, the filters were bypassed. Otherwise, we will say the attack failed.
Aside from incidental filtering, filters generally fall into one of the two categories: blacklists or whitelists. Blacklists specify what's not allowed and allow everything else by default. Whitelists specify what's allowed and block everything else by default. The seven words forbidden from being broadcast over U.S. airwaves (http://w2.eff.org/legal/cases/FCC_v_Pacifica/fcc_v_pacifica.decision) are prime examples of a blacklist; all words are allowed to be spoken except the seven on the forbidden list. A vending machine that only accepts certain coins is an example of a whitelist; all coins are forbidden (foreign coins, fake money, etc.) except those that the machine is designed to accept.
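The difference between the two categories can be sketched in a few lines of Python. This is only an illustration: the patterns, the 32-character limit, and the function names blacklist_allows and whitelist_allows are assumptions made for the example and are not taken from any real filter.

```python
import re

# Hypothetical blacklist: a few patterns considered "bad"; anything else is allowed.
BLACKLIST = re.compile(r"<script|javascript:|onerror", re.IGNORECASE)

# Hypothetical whitelist: 1 to 32 letters, digits, or underscores; anything else is blocked.
WHITELIST = re.compile(r"[A-Za-z0-9_]{1,32}")


def blacklist_allows(value: str) -> bool:
    """Allow by default; block only when a forbidden pattern appears anywhere."""
    return BLACKLIST.search(value) is None


def whitelist_allows(value: str) -> bool:
    """Block by default; allow only when the whole value fits the permitted shape."""
    return WHITELIST.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        "alice_01",
        "<script>alert(1)</script>",
        "<img src=x onload=alert(1)>",
        "O'Brien",
    ]
    for value in samples:
        print(f"{value!r}: blacklist={blacklist_allows(value)} "
              f"whitelist={whitelist_allows(value)}")
```

In this sketch the third sample passes the blacklist simply because nothing on the list anticipates it, while the whitelist rejects it. The whitelist also rejects the harmless value O'Brien, which hints at the usability cost of a strict whitelist.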
Regular expressions
Filters are often implemented as regular expressions, and it is essential to understand regular expressions in order to understand how to obfuscate and prevent obfuscated attacks. The following is a brief introduction to regular expressions. For a more thorough introduction to the topic, please see the excellent tutorial site, www.regular-expressions.info/.
A regular expression is a pattern. Either a string of text characters will match the pattern or it will not. The pattern itself is also a string of text, and each character in this string of text has a special meaning. Understanding the special meaning of these characters is key to understanding the usefulness of regular expressions. Table 1.1 lists some of the most common characters used in regular expression patterns along with a description of how to interpret each character. When discussing the strings involved, we will refer to the regular expression string as a pattern and the string being matched as the test string.
Table 1.1 Regular Expression Components
Character       Meaning
.               Matches any character (letter, number, symbol) except for line-break characters
\               Treats the next character literally rather than with its special meaning; for example, \. will match against a period character
[               Used to start a class of characters
]               Used to end the specification of a class of characters
(               Used to start a group of characters
)               Used to close a group of characters
+               Means the previous character or group of characters can be repeated one or more times
*               Means the previous character or group of characters can be repeated zero or more times
?               Means the previous character or group of characters can be repeated zero or one time
|               Means “or”; either the group of characters before it or the group of characters after it can match
a               Means the letter a (nothing special)
(other letters) All other letters are treated as letters (nothing special)
z               Means the letter z (nothing special)
^               Matches the beginning of a test string. Note that ^ matches a position, not a character
$               Matches the end of a test string. Note that $ matches a position, not a character
Other characters have special meanings as well, but before we go any further, it will be helpful to look at some specific examples. Table 1.2 provides some complete regular expression patterns, a description of the text strings that will match the pattern, and some example strings that match and some that do not. Note that if any part of a test string matches the entire regular expression pattern we say the whole string matches too. The boldface portions of the matching test strings show the exact substring(s) that match the given regular expression.
Table 1.2 Basic Regular Expressions
Regular Expression: ^.
Description: Will match the first character of a string
Matching Test Strings:
  a
  zyxwvut
  A longer sentence
  foo.bar!and$all%that*jazz
Nonmatching Test Strings:
  Only an empty string will fail to match

Regular Expression: .+
Description: Will match any group of characters repeated one or more times
Matching Test Strings:
  I love Hawaii
  aaaabbabbaaaa
Nonmatching Test Strings:
  Only an empty string will fail to match

Regular Expression: .*
Description: Will match every character in a string (except new-line characters)
Matching Test Strings:
  a
  zyxwvut
  A longer sentence
  foo.bar!and$all%that*jazz
Nonmatching Test Strings:
  Nothing; even the empty string matches

Regular Expression: a
Description: Will match any string containing the letter a
Matching Test Strings:
  This string matches
  a
  Repeated a's are okay too
Nonmatching Test Strings:
  This string does not
  A
  Neither does this string

Regular Expression: \.
Description: Will match against any string with a period
Matching Test Strings:
  Hello world.
  foo.bar
  This one too.
Nonmatching Test Strings:
  No match here
  “period period period”

Regular Expression: [abc]
Description: Will match any lowercase a, b, or c
Matching Test Strings:
  this string matches
  a
  b
  c
Nonmatching Test Strings:
  this one does not
  !@#$%^&*()_+-=[]{}
  ABC

Regular Expression: foo|bar
Description: Will match the strings foo or bar; same as (foo)|(bar) but not fo(o|b)ar
Matching Test Strings:
  I know foo, do you?
  No, I know bar
  xyzfooxyz
Nonmatching Test Strings:
  Foo is not for me
  foxo
  b!ar

Regular Expression: fox(es)?
Description: Will match the strings fox or foxes
Matching Test Strings:
  The fox is red.
  She called me foxy
  I see two foxes
  foxEs
Nonmatching Test Strings:
  f.ox
  f ox
  The f0x is red
  foXes

Regular Expression: <.+>
Description: Will match an opening angle bracket followed by a closing angle bracket with at least one character in between
Matching Test Strings:
  <x>
  xyz<x>xyz
  Are <i>you</i> there?
Nonmatching Test Strings:
  <>
  <abcdef
  >x<

Regular Expression: ^[0-9]*\.[0-9]+$
Description: Will match a string consisting only of zero or more digits followed by a period (decimal point) followed by one or more digits
Matching Test Strings:
  3.14159265358979
  42.42
  .61803
  01234.567890
Nonmatching Test Strings:
  1
  42
  3.x
  1.41421…
  +4.0
  1.6e-19
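Several rows of Table 1.2 can be checked with a short script. The sketch below uses Python's standard re module as one possible regular expression engine; the helper name matches and the choice of sample rows are assumptions made for this illustration. Note that re.search succeeds when any part of the test string matches, which is the convention used in this chapter, whereas re.fullmatch requires the entire string to match.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when any part of the test string matches the pattern."""
    return re.search(pattern, text) is not None


# (pattern, test string, expected result) taken from rows of Table 1.2
CASES = [
    (r"^.", "zyxwvut", True),
    (r"^.", "", False),
    (r".*", "", True),
    (r"a", "Repeated a's are okay too", True),
    (r"a", "A", False),
    (r"\.", "foo.bar", True),
    (r"\.", "period period period", False),
    (r"[abc]", "ABC", False),
    (r"foo|bar", "xyzfooxyz", True),
    (r"foo|bar", "foxo", False),
    (r"fox(es)?", "She called me foxy", True),
    (r"fox(es)?", "The f0x is red", False),
    (r"<.+>", "Are <i>you</i> there?", True),
    (r"<.+>", "<>", False),
    (r"^[0-9]*\.[0-9]+$", ".61803", True),
    (r"^[0-9]*\.[0-9]+$", "1.6e-19", False),
]

for pattern, text, expected in CASES:
    assert matches(pattern, text) == expected, (pattern, text)

# The exact substring that matched is available through group(0).
foxes = re.search(r"fox(es)?", "I see two foxes").group(0)
markup = re.search(r"<.+>", "Are <i>you</i> there?").group(0)

# A partial match is not a full match.
partial_only = (re.search(r"a", "This string matches") is not None
                and re.fullmatch(r"a", "This string matches") is None)
```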
Within a character class—that is, [] (brackets)—special rules apply. Most characters are interpreted literally with the following exceptions:
• A hyphen is used to denote a range of characters. For example, [a-m] denotes all lowercase letters between a and m.
• A backslash escapes a character's special meaning. So, [\-\]] would match either a hyphen (-) or a closing bracket (]).
• A caret (^) at the beginning of a character class reverses the matching for the class. For example, [^a-zA-Z] will match any nonalphabetic character.
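The three character-class rules above can be tried out as follows. This is a minimal sketch using Python's re module; the sample strings are made up for the illustration.

```python
import re

# A hyphen denotes a range: only letters between a and m are found.
in_range = re.findall(r"[a-m]", "filter")

# A backslash escapes special meaning: match a hyphen or a closing bracket.
escaped = re.findall(r"[\-\]]", "a-b]c")

# A leading caret reverses the class: match any nonalphabetic character.
negated = re.findall(r"[^a-zA-Z]", "R2-D2!")

# Most other characters are literal inside a class, including . + and *.
literal = re.findall(r"[.+*]", "1.5+2*3")

print(in_range, escaped, negated, literal)
```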
Before giving additional examples, we must first review some additional regular expression syntax. Table 1.3 lists additional characters in regular expressions that have special meaning.
Table 1.3 Additional Regular Expression Characters
Character(s)    Meaning
-               Used between other characters to specify a range of characters (as discussed earlier)
\               Used before a special character to escape its special meaning or to start a special character or character class (discussed in more detail shortly)
^               Only special at the beginning of the character class; ^ means to reverse the matching for the class
\w              Matches any alphanumeric character or an underscore; \w is the same as [a-zA-Z0-9_]
\W              Matches any nonalphanumeric character aside from an underscore. The complement of \w
\d              Matches any digit character; \d is the same as [0-9]
\D              Matches any nondigit character. The complement of \d
\s              Matches any whitespace character, including tabs and new-line characters
\S              Matches any nonwhitespace character. The complement of \s
\n              Matches the line-feed character (0x0A)
\r              Matches the carriage return character (0x0D)
\t              Matches the Tab character (0x09)
Table 1.4 highlights some more interesting regular expressions.
Table 1.4 Additional Regular Expressions
Regular Expression: [A-F0-9]+
Description: Matches one or more consecutive characters that are uppercase letters between A and F or digits
Matching Test Strings:
  I know 6D6172696F
  Are you sure she is 28?
  Try %C0%BC
Nonmatching Test Strings:
  I went to offset deadbeef
  Where did the feff go?
  SQL injection, ftw!!!

Regular Expression: \W$
Description: Matches any string that ends with a nonalphanumeric character other than an underscore
Matching Test Strings:
  I can punctuate!
  whatever.
  and more:)
Nonmatching Test Strings:
  But sometimes I forget
  and leave off stuff
  !@#$%^&*()_+-=x

Regular Expression: ^\t+
Description: Matches one or more Tab characters at the start of a string
Matching Test Strings (each begins with a literal Tab character):
  Yep, just like that
Nonmatching Test Strings:
  Doh! no tab…
  Another tab fail
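The shorthand classes from Table 1.3 and the patterns from Table 1.4 can be exercised as shown below. This is a sketch using Python's re module, and the sample strings not found in the tables are made up. One implementation detail worth knowing: in Python 3, \w and \d applied to text strings also match non-ASCII letters and digits unless the re.ASCII flag is given, so the equivalences in Table 1.3 hold exactly only with that flag. Other engines differ on this point.

```python
import re

digits = re.findall(r"\d", "Room 42b")
words = re.findall(r"\w+", "user_name=admin!")
fields = re.split(r"\s+", "a \t b\nc")

# Table 1.4, row 1: the first run of uppercase hexadecimal characters.
first_hex = re.search(r"[A-F0-9]+", "Try %C0%BC").group(0)

# Table 1.4, row 2: the string must end with a non-word character.
ends_nonword = re.search(r"\W$", "I can punctuate!") is not None
ends_word = re.search(r"\W$", "!@#$%^&*()_+-=x") is not None

# Table 1.4, row 3: one or more Tab characters at the start of the string.
leading_tab = re.search(r"^\t+", "\tYep, just like that") is not None
no_leading_tab = re.search(r"^\t+", "Doh! no tab") is not None

# Python-specific behavior: \w is Unicode-aware unless re.ASCII is used.
unicode_word = re.fullmatch(r"\w", "é") is not None
ascii_word = re.fullmatch(r"\w", "é", re.ASCII) is not None
```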
Under normal use, the characters * and + are said to be greedy. This means they will match against as many characters as possible, when given the chance. For example, consider the regular expression <.*> and the test string “Some <b>HTML</b> markup.” Note that the part of the test string that matches is “<b>HTML</b>,” not just “<b>.” This is due to the greedy nature of *. As the test string is being parsed for a potential match, all the characters up to the end of the string are initially matched by the .*, and then, when no trailing > is found, the regular expression parser will begin to backtrack from the end of the string until a match is found which allows it to continue. In many cases, a nongreedy (or lazy) match is preferred so that the earliest possible match that allows the regular expression parser to continue will be used. This is done by following the * or + character with a ?. For example, <.*?> applied against the test string “Some <b>HTML</b> markup” will match against both <b> and </b> but not <b>HTML</b>.
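The greedy and nongreedy behavior described above can be reproduced with the same test string. The sketch below uses Python's re module; other engines should behave the same way for these two patterns.

```python
import re

text = "Some <b>HTML</b> markup"

# Greedy: .* runs to the end of the string, then backtracks to the last >.
greedy = re.findall(r"<.*>", text)

# Nongreedy: .*? stops at the earliest > that lets the match succeed.
lazy = re.findall(r"<.*?>", text)

print(greedy)
print(lazy)
```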
One final point worth covering is that of restricted repetition. Table 1.5 covers a few different cases.
Table 1.5 Restricted Repetition in Regular Expressions
Pattern         Description
{i}             Means the previous character, character class, or group must repeat exactly i times, where i is an integer greater than or equal to zero
{i,j}           Means the previous character, character class, or group must repeat between i and j times, where i is greater than or equal to zero and j is greater than or equal to i
{i,}            Means the previous character, character class, or group must repeat at least i times, where i is greater than or equal to zero
To understand restricted repetition better, consider the examples in Table 1.6.
Table 1.6 Restricted Repetition Examples
Regular Expression: ^[a-zA-Z]{4}
Description: Matches test strings that start with four alphabetic characters (upper- or lowercase); whatever follows the first four characters does not affect the match
Matching Test Strings:
  This string matches!
  abcd efgh ijkl
Nonmatching Test Strings:
  No match here
  Nor does this one

Regular Expression: kok{1,2}o
Description: Matches test strings containing koko or kokko
Matching Test Strings:
  kokko!
  koko!
Nonmatching Test Strings:
  koo
  kokkko

Regular Expression: [A-Z]{3,}
Description: Matches test strings containing three or more uppercase letters in a row
Matching Test Strings:
  NCAA
  Did you call the CDC?
  ABCDEF precedes GHIJKL
Nonmatching Test Strings:
  He belongs to AA
  No Such Agency
  A.BC
Restricted repetition matching is greedy by default. To switch to nongreedy matching, append a ? after the }, just like with * and +. For example, consider the regular expression ^[A-Z]{3,}? and the test string “ABCDEF.” Only the string “ABC” matches. However, when ^[A-Z]{3,}?F is applied to the same string “ABCDEF,” the entire string “ABCDEF” matches. This is because the [A-Z]{3,}? part must match against additional characters (despite its nongreedy-ness), so it can then match the “F.”
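The restricted repetition examples, including the nongreedy case just described, can be verified with a few calls. This is a sketch using Python's re module. It also shows a detail that is easy to overlook: {4} requires four repetitions at that point in the pattern but does not forbid more of the same characters afterward, so a string that starts with a longer run of letters still matches on its first four.

```python
import re

# {4}: four letters at the start; a longer leading run still matches.
four = re.search(r"^[a-zA-Z]{4}", "This string matches!").group(0)
longer_run = re.search(r"^[a-zA-Z]{4}", "Neither").group(0)
too_short = re.search(r"^[a-zA-Z]{4}", "No match here")

# {1,2}: one or two k characters between "ko" and "o".
kokko = re.search(r"kok{1,2}o", "kokko!").group(0)
three_ks = re.search(r"kok{1,2}o", "kokkko")

# {3,}: runs of three or more uppercase letters.
runs = re.findall(r"[A-Z]{3,}", "ABCDEF precedes GHIJKL")
no_runs = re.findall(r"[A-Z]{3,}", "No Such Agency")

# Nongreedy repetition takes as little as possible, unless more is
# needed for the rest of the pattern to match.
lazy_min = re.search(r"^[A-Z]{3,}?", "ABCDEF").group(0)
lazy_forced = re.search(r"^[A-Z]{3,}?F", "ABCDEF").group(0)
```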
Other special characters and syntax are useful to know as well. For more in-depth coverage, check out the excellent introduction and tutorials at www.regular-expressions.info/. In particular, take a look at additional examples with greedy and nongreedy matching, backreference notation, modifiers, and issues with multiline text. Remember that, although the topics introduced here are common in almost all regular expression parsers, there are many differences across implementations as well. This is especially true in their support for some of the more advanced syntax, such as some of the topics covered at www.regular-expressions.info/refadv.html.
Book organization
The remaining content in this book has been divided into nine chapters. The discussion begins with a detailed look at the three foundations of modern Web architecture: HTML, JavaScript, and CSS. The authors will present a thorough introduction to each of these languages before diving into the many rich and obscure features of each. This will be followed by a discussion of PHP and SQL, two of the staples of server-side Web development. Next comes a discussion of security mitigations to protect against obfuscated attacks. This will include details on bypassing security controls and how to successfully protect Web applications from advanced attacks. Finally, the book concludes with a discussion of where the future of Web application attacks lies in terms of new features being added to Web languages, new obfuscation techniques that will be made possible, and potentially new types of attacks. The following descriptions provide specific details on the content which can be found in each of the remaining chapters.
Chapter 2: “HTML”
HTML forms the backbone of any Web page and Web application. Parsing HTML is insanely difficult due to issues with backward compatibility, custom browser extensions, support for new and emerging specifications, and even security-related controls. This chapter will dive into many of these issues to help you understand the many ways that markup can be obfuscated. In addition to providing unique attack vectors, this chapter will also serve as a foundation to help you to understand obfuscation and advanced attacks in related topics such as JavaScript and CSS.
Chapter 3: “JavaScript and VBScript”
One of the best ways to learn the full range of features offered by JavaScript and VBScript is to understand how to obfuscate and de-obfuscate code. This chapter will give you greater knowledge regarding how JavaScript works while at the same time increasing your arsenal of obfuscation techniques. This chapter will also give you a practical understanding of language syntax, encodings, variables, and vendor-specific features and deviations.
Chapter 4: “Nonalphanumeric JavaScript”
One of the more technically interesting aspects of JavaScript is how it can be used to build JavaScript which does not contain alphabetic or numeric characters. Although the resultant code may be very verbose, you can still execute arbitrary JavaScript using these techniques. This chapter will discuss exactly how such code is constructed and provide several scenarios where the techniques can be (and have been) used in real-world attacks.
Chapter 5: “CSS”
CSS is a key component to modern Web design. Although it is not traditionally used in standard Web attacks, many CSS features may be abused in unique and interesting ways. This includes CSS expressions, attribute selectors, access to browsing history, and manipulating the UI directly. By controlling just the CSS included on a page, an attacker can compromise the privacy of both the target user and application data.
Chapter 6: “PHP”
PHP is a complete, feature-packed programming language, so there are endless ways to create obfuscated PHP code. The focus of this chapter is on basic to advanced string obfuscation techniques, how to access and abuse superglobals, and several interesting ways to execute dynamic code. To complement this material, the author will also explore the use of filters and streams in relation to file inclusion vulnerabilities while showing how local file inclusion vulnerabilities can be turned into remote file inclusion vulnerabilities.
Chapter 7: “SQL”
Many Web application frameworks provide decent protection against SQL injection attacks. However, as long as developers continue to write SQL queries manually, this will remain a viable and potent attack. This chapter will cover encoding and obfuscation techniques that can be used with the standard database management systems (DBMSes). The chapter will also discuss tools and fuzzing techniques that can be used to discover new encoding and obfuscation tricks. Many modern browsers now include databases for offline Web applications which can be accessed using SQL. This chapter will also discuss attack techniques that apply in such scenarios.
Chapter 8: “Web application firewalls and client-side filters”
WAFs are a common device used to protect Web applications from malicious attacks. Such devices typically use a list of regular expressions to detect malicious input. This makes them prime targets for bypassing and attacking using Web application obfuscation techniques. This chapter will demonstrate the ineffectiveness of many WAFs at defending against even the most basic obfuscation techniques. In addition to traditional WAFs, this chapter also discusses client-side filters built into browsers. These filters will help raise the bar an attacker must clear to perform successful attacks. We will look at the details of how these filters work and see some specific and highly obfuscated ways in which they can be bypassed.
Chapter 9: “Mitigating bypasses and attacks”
One of the most challenging scenarios related to Web code is building a sandbox in which untrusted code may be dynamically executed and evaluated. This chapter will present techniques which will help you securely analyze malicious code such as JavaScript malware. The same techniques can also be used to help you sanitize user input containing untrusted code for dynamic inclusion on a Web page.
Chapter 10: “Future developments”
We conclude the book with a discussion on the current state of security on the Web and the technologies that surround it. We will look at the future Web being enabled with technologies such as CSS3, HTML5, and plug-in security via Flash and Java. We will see some positive and negative security consequences of these technologies and how they may affect us in the near future.
Updates
As we progress through the book, we will discuss many technical details related to how browsers render content, how servers parse input, and even details on emerging specifications. Being able to include such low-level details adds immense value to the book; however, it also means that certain details will become outdated or obsolete rather quickly.
Additionally, many of the quirks and issues discussed herein will be classified as bugs or security vulnerabilities and will thus be fixed rather quickly. To this end, a Web site has been set up at http://web-obfuscation.googlecode.com in order to provide updates and corrections related to such issues. Of course, errors in the content are inevitable and errata will be included at this Web site as well.
If you find any vectors or techniques that are not working as described, please check http://web-obfuscation.googlecode.com/ to see if an update has been provided. If not, you can find details on the site regarding how to submit updates or corrections.
Summary
This chapter discussed the motivation behind creating a book on Web application obfuscation and highlighted who will benefit from reading the book. The chapter provided a high-level explanation of how filtering works, followed by a brief introduction to regular expressions. Finally, we previewed the contents of the upcoming chapters of the book, which include various obfuscation and attack techniques related to HTML, JavaScript, VBScript, CSS, PHP, and SQL. By learning the obfuscation and attack techniques discussed herein, you will be able to better assess the security of your applications, identify insufficient security protections, and build stronger security controls. To get the most out of this book, you are encouraged to spend time actually trying out the various techniques; in doing so, you'll learn the ideas much more thoroughly and have a better understanding and appreciation for the deep field of Web application security. Finally, we hope you have as much fun learning these techniques as we did compiling and documenting them in this book!