Chapter 3. JavaScript and VBScript
Information in this chapter:
• Syntax
• Encodings
• JavaScript Variables
• VBScript
• JScript
• E4X
Abstract:
JavaScript is a dynamic and expressive language. Although JavaScript is loosely typed, it has powerful features. In fact, the loosely typed nature of the language allows a great deal of strange-looking syntax to work that, at first glance, should not. Understanding JavaScript syntax is the key to good obfuscation, because understanding how languages work enables you to take full advantage of their features and produce truly unreadable code. This chapter begins with a background on JavaScript and a few simple examples to help you understand the obfuscation you will perform later in the chapter. In addition, the chapter discusses how to represent characters using the three types of escapes supported in JavaScript: Unicode, hexadecimal, and octal. Besides showing how to combine the various encodings/escapes in JavaScript to produce obfuscated code, the chapter also explains the principles of the VBScript, JScript, and E4X languages, and how to encode script in various browsers.
Key words: Unicode escape, Hexadecimal escape, Octal escape, JavaScript variable, Unicode variable, VBScript, JScript, Loosely typed language, E4X
JavaScript is a very dynamic and expressive language. People often mistake JavaScript for a basic language, but even though it is loosely typed, it has very powerful features. This chapter explains how you can use JavaScript's features in unusual ways to obfuscate your code. We start with some background on JavaScript and a couple of simple examples to help you understand the obfuscation we will perform later in the chapter. Then we will discuss how to encode script in various browsers.
Syntax
Understanding JavaScript syntax is the key to good obfuscation. The loosely typed nature of the language allows a great deal of strange-looking syntax to work that, at first glance, should not. In this section, we discuss some basic JavaScript concepts that we will use throughout this chapter. Hopefully, if you are new to JavaScript, you will find this introduction helpful and easy to understand, and you will open your mind to the possibility of abusing other languages in ways that are legal syntax but result in unintended consequences.
JavaScript background
Simple yet powerful, sometimes confusing but eventually logical: There is no better way to describe the JavaScript parser. Once you understand the parser, you will be able to understand how to use the code to your advantage.
The examples in this chapter show you how to change the expression alert(1) into a different representation that still executes the same code. In case you are not familiar with alert, here is a simple explanation. The window object in JavaScript is the container of all global variables. A page can contain more than one window object (for example, one per frame), and therefore separate global objects. When executing functions or reading values, JavaScript automatically assumes the window object is the current object and all variables are global, unless a local variable is declared. If you are used to other programming languages, you may find this concept confusing; it helps to just be aware that JavaScript relies on global variables at its core.
When we call alert we are using the window object's alert method. You can see this by running the following code in a browser of your choice:
<script type="text/javascript">
alert(1);
window.alert(1);
window.alert(window.alert);
</script>
As you can see, the alert box appears twice with the same value, 1. The last box shows you that alert is a native function of the browser. This means it's already defined before you enter any code. Let us see what happens when we define our own function called alert:
<script type="text/javascript">
function alert() {}
alert(1)
</script>
Here, we simply defined our own function called alert, with no parameters between the parentheses. The curly braces indicate the body of the function. In this case, our function does nothing. We get no alert from the browser, and we have successfully overwritten the native method of the window object. Although this will not help you with obfuscation, it should help you to understand how the code can be manipulated.
Something that will help you with obfuscation is the square bracket syntax of JavaScript. This is one of the most-used parts of the language and it shares the syntax with array literals. An array literal consists of a starting square bracket ([) and an ending square bracket (]). The values between the brackets can be any JavaScript object and are separated by commas. They can also be deeply nested to form multidimensional arrays. Let us make an array literal with some values in it. Before running the following example, try to guess the value returned by JavaScript.
<script type="text/javascript">
x=[1,alert,{},[],/a/];
alert(x[4]);
</script>
If you guessed /a/, you are correct. JavaScript arrays are indexed from zero. First we assigned an array literal to x, listing several JavaScript objects separated by commas. Next, we called alert with x[4], the element at index 4, which is the fifth and last element of the array. Notice the difference between the square bracket syntax when accessing an element and when declaring a literal.
Now things will get slightly more complicated and interesting. Take a look at the next example, which shows how the object property is accessed:
<script type="text/javascript">
objLiteral={'objProperty':123};
alert(objLiteral[0,1,2,3,'objProperty']);
</script>
In the preceding code, the curly braces declare an object literal. The ‘objProperty’ string is the name of the object's property, and the value 123 is assigned to it. We access the object literal using the square brackets. Notice how the square brackets look like an array, but in fact are accessing an object property. This is important syntax to understand, as these core techniques can enable powerful obfuscation. In this instance, the comma operator evaluates each expression inside the square brackets from left to right and returns the rightmost one, so the expression after the last comma (‘objProperty’) is used to access the property.
Now we will look at a slightly different way of doing the same thing, this time enclosing the contents with parentheses. This enables you to group expressions and use the last expression of the group as the value of the enclosing expression. The following example shows two groups of parentheses. The outer group returns the value of the inner group, and the inner group returns the string ‘objProperty’ because this is the last expression of that group.
<script type="text/javascript">
objLiteral={'objProperty':123};
alert(objLiteral[(0,1,2,3,(0,'objProperty'))]);
</script>
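As a minimal sketch of the comma behavior described above, the following fragment is written to run outside a browser (for instance in Node.js), so it stores results in variables instead of calling alert. The names viaComma and viaNested are illustrative and do not appear in the original examples.

```javascript
// The comma operator evaluates every operand and yields the rightmost one.
const objLiteral = { 'objProperty': 123 };

// Inside square brackets, only the last expression selects the property.
const viaComma = objLiteral[0, 1, 2, 3, 'objProperty'];

// Nested parentheses behave the same way: the inner group yields the string,
// and the outer group yields the value of the inner group.
const viaNested = objLiteral[(0, 1, 2, 3, (0, 'objProperty'))];

console.log(viaComma, viaNested); // both expressions read objLiteral.objProperty
```

Both lookups resolve to the same property, so a reader has to evaluate the whole bracket expression before knowing which property is accessed.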
The next step of the JavaScript learning process is to understand how strings are created. Strings are the basis of obfuscation, as without them, we cannot create our code. JavaScript supports many more ways to create strings than you may think. For instance, you can use the normal methods that JavaScript provides, such as the new String(‘I am a string’) and the standard “I am a string” and ‘I am a string.’ Although the new String constructor is less convenient than the standard syntax, and therefore is rarely used, in your quest for obfuscated code it helps to know the various ways to create a string. Let us look deeper into strings and see other ways we can create them.
<script type="text/javascript">
alert(/I am a string/+'');
alert(/I am a string/.source);
alert(/I am a string/['source']);
alert(['I am a string']+[])
</script>
In the preceding code, the first alert contains a regular expression literal, as indicated by the starting and ending forward slashes. JavaScript performs type coercion and converts the regular expression into a string when + is used; note that the resulting string still includes the forward slashes. The second example uses the standard source property of the regexp object (every regexp object has a source property), which returns the text of the regular expression without the starting and ending forward slashes. The third example reads the same property using square bracket notation. Lastly, the array is used as a string because each array has a toString method, which is called automatically when the array is concatenated with another value.
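The four forms can be compared side by side. This is only a sketch that assigns each result to an illustrative variable name (fromConcat, fromSource, fromBracket, fromArray) so that it also runs outside a browser.

```javascript
// Coercing a regular expression literal keeps the surrounding slashes.
const fromConcat = /I am a string/ + '';

// The source property returns only the pattern text.
const fromSource = /I am a string/.source;

// The same property read with square bracket notation.
const fromBracket = /I am a string/['source'];

// Concatenating two arrays calls toString on each of them.
const fromArray = ['I am a string'] + [];

console.log(fromConcat, fromSource, fromBracket, fromArray);
```

Only the first form differs from the others, because the coerced regular expression carries its delimiters with it.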
There is yet another way to use square bracket notation to access strings. This nonstandard method of using strings—which has been adopted by the major browsers (IE8, Safari, Opera, Firefox, and Chrome)—involves using strings in an array-like fashion: specifying a number will return the various parts of the string, just like an array. This is very useful for obfuscation when combined with various methods of obtaining a string.
If you use string indexes, remember that in IE7 and earlier string indexes are not supported. As a workaround, you can use String.split and convert your string into an array.
<script type="text/javascript">
alert('abcdefg'[0]);
</script>
The preceding example returns the letter a, as this is the first character of the string. This is not a true array, as it still retains the string methods, and you cannot assign to a position of the string.
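To illustrate why string indexes are handy for obfuscation, the sketch below spells a word from positions in an unrelated string and shows the String.split workaround mentioned for older browsers. The variable names are hypothetical and the fragment avoids browser-only functions.

```javascript
// Reading a single character by position.
const first = 'abcdefg'[0];

// Workaround for engines without string indexes: convert to a real array first.
const viaSplit = 'abcdefg'.split('')[0];

// Spelling a word from positions in the alphabet (a=0, l=11, e=4, r=17, t=19).
const alphabet = 'abcdefghijklmnopqrstuvwxyz';
const spelled = alphabet[0] + alphabet[11] + alphabet[4] + alphabet[17] + alphabet[19];

console.log(first, viaSplit, spelled);
```

The word never appears literally in the source, which is the property an obfuscator is looking for.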
A little-known fact is that Firefox allows some truly imprudent practices for function names. Not only can they lead to confusion by clashing with statements, but they can also lead to syntax errors and bad programming style. The following example demonstrates this quirky function-naming convention:
<script type="text/javascript">
window.function=function function(){return function function(){return function function(){alert('Works in Firefox')}()}()}()
</script>
Browser quirks
All browsers behave differently. They sometimes follow the ECMA standard and sometimes follow their own path. This is a good hunting ground for obfuscation ninjas to lurk. If we can spot specification diversions or nonstandard functionality we can often use these features in unintended ways. Browser quirks also make it more difficult to deobfuscate code because the software needs to account for these features. Learning more about browser quirks will increase our knowledge of the languages in general and can be a lot of fun in the process.
ECMA is a vendor-neutral standards body that defines the ECMAScript (JavaScript) standard.
Multiline strings
Understanding JavaScript parser behavior is the key to creating good ways to hide your code. You might not be aware that JavaScript supports multiline strings. Using the backslash character, you can continue a string assignment. The backslash has to be the very last character before the new line. After the new line, the string is continued as though it is on the same line. This can be repeated indefinitely, regardless of string length, and as the backslash is removed when the string is joined, this makes it perfect for obfuscation.
<script type="text/javascript">
alert("this is a \
\
\
\
\
string")
</script>
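A quick way to confirm that the backslash and the line break both disappear is to compare the continued string with its single-line form. This is a sketch with an illustrative variable name; note that any indentation placed on a continuation line would become part of the string, so the continued lines start in the first column.

```javascript
// Each backslash followed by a line break contributes nothing to the value.
const continued = "this is a \
\
\
string";

console.log(continued === "this is a string", continued.length);
```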
Multiline regular expressions
Certain browsers support regular expressions as multiline strings too. At the time of this writing, Firefox 3.5 and earlier versions allow backslashes to continue a regular expression. This is less useful than the string feature, as the backslash is actually added to the text string of the RegExp constructor and is not ignored. This may be because the backslash is part of an escape sequence in a RegExp constructor or because the feature is not really documented. Whatever the reason, we can still use it to understand the JavaScript engine or generate a string in a unique way for a particular browser.
<script type="text/javascript">
alert(/a\
b\
c/)
</script>
Understanding the parser
All JavaScript engines seem to support prefix (unary) operators before a function call. This is because the result of the function call isn't known until after the function is executed. Since JavaScript is a loosely typed language, this allows us to create strange-looking but valid syntax and evade detection. JavaScript has many unary operators, including +, -, ~, ++, --, and !, among others. They can also be combined with keyword operators, such as typeof and void. Because the result is evaluated, you can repeat the operation as many times as you like.
<script type="text/javascript">
!~+-++alert(1)
</script>
<script type="text/javascript">
void~void~typeof~typeof--alert(1)
</script>
<script type="text/javascript">
alert(1)/abc
</script>
You may notice in the previous examples that an error is raised after the function is executed. In the first two cases, this is because of the ++ and -- operators: the function returns undefined and then the increment or decrement operation is attempted, but the result of a function call is not a valid operand for these operators, so an error is raised once the call has already run. The last example demonstrates the same ordering by attempting to divide the result of the alert function by a nonexistent variable. The function is executed first, but if the function call came after the undeclared variable, the function would not be executed.
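The evaluation order can be observed without a browser by replacing alert with a stand-in that records its calls. In this sketch, record, calls, and undeclaredName are hypothetical names; the ++ and -- forms are left out because engines and strict mode differ in when they report that error.

```javascript
// Stand-in for alert: remembers each call and returns undefined.
const calls = [];
function record(value) {
  calls.push(value);
  return undefined;
}

// Unary operators stacked in front of a call: the call runs, then each
// operator is applied to the previous result, from right to left.
const chained = !~+-record(1);

// Keyword operators can be mixed in the same way.
const keywords = void~void~typeof~typeof record(2);

// The call runs before the undeclared variable is looked up.
let divisionError = null;
try {
  record(3) / undeclaredName;
} catch (err) {
  divisionError = err;
}

// With the operands swapped, the lookup fails first and the call never runs.
let earlyError = null;
try {
  undeclaredName / record(4);
} catch (err) {
  earlyError = err;
}

console.log(calls, chained, keywords, divisionError.name, earlyError.name);
```

The recorded calls contain 1, 2, and 3 but not 4, which matches the ordering described above.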
Regular expressions as functions
At the time of this writing, Firefox, Opera, Chrome, and Safari all allow a regular expression object to be called as a function, with the string to be matched passed as the argument. The result of the call is an array. Its first element contains the entire matched text; if you use parentheses groups inside your regular expression, the second element contains the first matching group, the third contains the second group, and so on. The array from the regular expression call also has a special property called input which returns the string sent to the regular expression.
<script type="text/javascript">
alert(/a(a)(b)|c/g('aab'));
</script>
As you can see, the regular expression matches either an “a” followed by the first group (“a”) and the second group (“b”), or a “c.” The array returned contains “aab,” “a,” and “b.” Because you can use a regular expression to match itself, this has some interesting implications for JavaScript quines and nonalphanumeric code.
A quine is a program that outputs its own source code.
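In engines where a regular expression cannot be called directly, the standard exec method returns an array of the same shape. The following sketch assumes that equivalence and uses illustrative variable names.

```javascript
// exec returns the full match followed by each captured group.
const pattern = /a(a)(b)|c/g;
const result = pattern.exec('aab');

// Copy the array-like result into a plain array for display.
const asList = Array.from(result);

console.log(asList, result.input, result.index);
```

The input property holds the original string, and index holds the position at which the match started.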
Comments in JavaScript
There are several types of comments in JavaScript. For instance, the standard single-line comment, //, and C-style comments such as /**/, are supported. But for legacy reasons, others are supported as well. In the early days of the Web, when scripting languages were first released, Web developers needed a method to hide script from older browsers so that it was not shown as text on older browsers but executed as code on newer ones. Developers and vendors came up with the solution of using HTML comments within JavaScript code. Although this hid the script from legacy browsers and executed JavaScript for newer browsers, HTML comments are not valid JavaScript, so some vendors decided to support HTML comments inside JavaScript by treating each comment as a single-line comment.
<script type="text/javascript">
<!---->I am a single line js comment
-->So am I
<!--and so am I
</script>
Encodings
In this section, we discuss the various ways to represent characters using escapes supported in JavaScript. Escapes are commonly used to represent characters outside the normal ASCII range; we can also use them to obfuscate normal characters and layer encodings. JavaScript supports three types of escapes: Unicode, hexadecimal, and octal. We will cover each one in more detail in the following sections.
Unicode escapes
JavaScript supports Unicode characters using hex escape sequences. This allows JavaScript programs to represent international characters using their Unicode hex values. Unicode escapes can be used with standard characters, and generally can be used as a variable or function reference. Firefox 2 at one time supported Unicode-encoded parentheses; this was very useful for obfuscation, as function calls could be fully encoded. Major browsers currently do not allow Unicode to be used in this way, including Internet Explorer, Opera, Firefox, Safari, and Google Chrome.
The escape sequence is always a backslash followed by a single u and then a hex sequence of four characters. Following this convention, the variable a can be represented by the Unicode escape sequence \u0061. To the JavaScript parser this is exactly the same as writing the actual character. The following example shows how to duplicate the same code on one line with mixed Unicode:
<script type="text/javascript">
alert(1);
\u0061ler\u0074(1);
</script>
Already, with just this basic encoding, we have an obfuscated vector. Both lines are exactly the same and execute alert(1). The example encodes the character a and the t of alert. It doesn't end there, though. We can also use Unicode escapes within strings and regular expressions. In this case, the Unicode refers to the string rather than the variable reference. To use these strings for obfuscation we need to evaluate the result of the strings using JavaScript native functions, such as eval, Function, and setTimeout. The following code, in which we partially obfuscate the letter a, shows how to do this:
<script type="text/javascript">
alert("\u0061lert(1)")
eval("\\u0061lert(1)")
</script>
The first example in the preceding code shows the string “alert(1).” This is because the Unicode escape is being used as a string escape. The second example is confusing because the backslash is escaped, forcing the string to be sent to eval as a Unicode escape that is not converted. Because Unicode is allowed instead of the letter, as in the previous snippet, the actual string sent to eval is \u0061lert(1), which calls the function.
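The difference between an escape in an identifier, an escape in a string, and an escaped escape can be checked with a harmless function in place of alert. In this sketch, greet is a hypothetical function introduced only for illustration.

```javascript
function greet() {
  return 'called';
}

// Unicode escapes inside an identifier: the parser reads this as greet().
const direct = gr\u0065\u0065t();

// Inside a string literal, the escape is decoded to the character itself.
const asText = "gr\u0065et";

// Escaping the backslash keeps the escape sequence as literal text...
const unconverted = "gr\\u0065et()";

// ...which eval then parses as source code containing an identifier escape.
const viaEval = eval(unconverted);

console.log(direct, asText, unconverted, viaEval);
```

The string handed to eval is twelve characters long and still contains the backslash, yet it calls the same function as the direct form.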
Unicode can be used in yet another way within regular expressions. Literal expressions support the raw Unicode escape, which matches the character provided in the escape sequence. Using the RegExp constructor allows you to use string escapes as well as RegExp escapes, which allows you to encode Unicode multiple times. In addition, the RegExp object is a function in many browsers, including, at the time of this writing, Firefox, Chrome, and Opera. This allows a regular expression to be called and returned as an array which then can be used to execute obfuscated code.
Here are some examples of using regular expressions to create obfuscated code. The first line in the following code contains the string ‘alert(1)’ and the replace function is called. This function accepts two arguments: the regular expression to match, and either a replacement string or a function to call for each match.
<script type="text/javascript">
// deobfuscated string (the parentheses are escaped so that they match literally)
'alert(1)'.replace(/alert\(1\)/,eval);
//unicode escapes
'\u0061\u006c\u0065\u0072\u0074(1)'.replace(/\u0061\u006c\u0065\u0072\u0074.+/,\u0065\u0076\u0061\u006c);
//doubled regexp unicode
\u0052\u0065\u0067\u0045\u0078\u0070(‘\u005c\u0075\u0030\u0030\u0036\u0031\u005c\u0075\u0030\u0030\u0036\u0063\u005c\u0075\u0030\u0030\u0036\u0035\u005c\u0075\u0030\u0030\u0037\u0032\u005c\u0075\u0030\u0030\u0037\u0034\u0028\u0031\u0029’)[‘\u0073\u006f\u0075\u0072\u0063\u0065’].\u0072\u0065\u0070\u006c\u0061\u0063\u0065(\u0052\u0065\u0067\u0045\u0078\u0070(‘\u005c\u0075\u0030\u0030\u0035\u0063\u005c\u0075\u0030\u0030\u0037\u0035\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0036\u005c\u0075\u0030\u0030\u0033\u0031\u005c\u0075\u0030\u0030\u0035\u0063\u005c\u0075\u0030\u0030\u0037\u0035\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0036\u005c\u0075\u0030\u0030\u0036\u0033\u005c\u0075\u0030\u0030\u0035\u0063\u005c\u0075\u0030\u0030\u0037\u0035\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0036\u005c\u0075\u0030\u0030\u0033\u0035\u005c\u0075\u0030\u0030\u0035\u0063\u005c\u0075\u0030\u0030\u0037\u0035\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0037\u005c\u0075\u0030\u0030\u0033\u0032\u005c\u0075\u0030\u0030\u0035\u0063\u005c\u0075\u0030\u0030\u0037\u0035\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0030\u005c\u0075\u0030\u0030\u0033\u0037\u005c\u0075\u0030\u0030\u0033\u0034\u005c\u0075\u0030\u0030\u0032\u0038\u005c\u0075\u0030\u0030\u0033\u0031\u005c\u0075\u0030\u0030\u0032\u0039’),\u0065\u0076\u0061\u006c);
</script>
The last example in the preceding code, labeled doubled regexp unicode, uses the RegExp constructor with a string that is encoded twice: the string escapes are decoded when the string literal is parsed, leaving a second layer of Unicode escapes in the text passed to the RegExp constructor. The source property is used to get the text of the regular expression, which is itself still escaped. That text is then matched using replace with a second RegExp constructor object, which is escaped even more heavily because Unicode escapes are also valid within the resultant regular expression. Finally, the reference to the eval function is written with standard Unicode escapes.
This is a small example of how JavaScript regular expressions can be used for obfuscation. Examples of more advanced techniques are provided in the section “Combining encodings.”
Hexadecimal escapes
There are four forms of hexadecimals within JavaScript: string escapes, the number literal, regular expression escapes, and type coercion. The string escape is probably the most popular in terms of obfuscation, as it provides an easy way to produce an alternative character. To create a string escape you use the backslash character followed by a lowercase x and a two-character hex sequence to represent the Unicode character. The number literal also supports automatic conversion of a hexadecimal number when the prefix 0x is used; for example, 0xFF will return 255 in JavaScript.
Fortunately, we can use this automatic conversion to our advantage. As demonstrated with Unicode, regular expressions also support hex sequences, which allows us to double-encode our hex escapes. Type coercion in JavaScript will automatically convert a hex sequence within a string without the \x prefix if the string contains 0x, which allows us to double-escape hex escapes without regular expressions. It is worth noting that JavaScript does not allow you to use hex escapes in the same way as Unicode escapes. Hex escapes are only supported within strings and cannot be used as a reference to a variable or object.
<script type="text/javascript">
eval('\x61lert(1)');
alert(0xFF);
alert(/\x61/.test('a'))
alert(+'0xFF');
</script>
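Producing hex string escapes by hand is tedious, so a small encoder helps when experimenting. The function below is a hypothetical helper, not something provided by JavaScript; it is a sketch that assumes the input only contains characters up to 0xFF, since \x escapes carry exactly two hex digits.

```javascript
// Convert each character to a \xNN string escape.
function toHexEscapes(text) {
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) {
      throw new RangeError('\\x escapes only cover character codes 0x00 to 0xFF');
    }
    return '\\x' + code.toString(16).padStart(2, '0');
  }).join('');
}

const encoded = toHexEscapes('alert(1)');

// Let the parser decode the escapes by evaluating them inside a string literal.
const decoded = new Function('return "' + encoded + '"')();

console.log(encoded, decoded);
```

Decoding through the parser rather than with custom code mirrors what happens when an obfuscated string literal is loaded by a browser.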
Octal escapes
JavaScript supports three forms of octal encoding. This is a common source of coding mistakes, because one way to represent octals is to use a zero prefix before a standard number literal, and in such cases, developers often think they are getting a decimal number when in fact they are receiving an octal (e.g., 0100 is 64, not 100). However, we can use this to our advantage for obfuscation, as the decoder or person reading the code will have to account for all forms of representing a number. Within strings, an octal is declared by escaping a number sequence which returns the character from the octal number:
<script type="text/javascript">
eval('\141lert(1)');
alert(0377);
alert(/\141/.test('a'))
</script>
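Zero-prefixed octal literals and octal string escapes are rejected in strict mode code, so the sketch below evaluates them through the Function constructor, whose body is non-strict unless it says otherwise. The variable names are illustrative, and the 0o prefix shown last is the newer explicit notation rather than one of the forms discussed above.

```javascript
// A zero prefix turns a number literal into octal: 0100 is 64, not 100.
const octalNumber = new Function('return 0100')();

// An octal string escape: \141 is the character with code 97.
const octalChar = new Function("return '\\141'")();

// The same conversion done explicitly, as a decoder would have to do it.
const parsed = parseInt('154', 8);
const letter = String.fromCharCode(parsed);

// The explicit octal prefix, which is also accepted in strict mode.
const modern = 0o154;

console.log(octalNumber, octalChar, parsed, letter, modern);
```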
Combining encodings
Now that you are aware of the various encodings/escapes in JavaScript, let us combine them to produce some obfuscated code. The following example will call alert(1) using all the techniques we have discussed thus far. This should help you to understand how to use each type of escape.
<script type="text/javascript">
eval(RegExp('\x5c\x75\x30\x30\x36\x31').source+String.fromCharCode(0154)+'\\u00'+0x41+/\u0072/('\x72')+'\134u0074'+'(1)')
</script>
In the preceding code, first we used the RegExp constructor to create our string. This allows us to use string escapes and regular expression escapes, as demonstrated in the “Unicode escapes” section earlier in the chapter. The letter a is first written as the Unicode escape \u0061. Then, because it is inside a string, we can escape the Unicode escape itself, so \u0061 becomes \x5c\x75\x30\x30\x36\x31; this still represents the letter a. Next, source returns the text content of the RegExp, which results in \u0061. Then we use the octal number literal 0154; the leading zero indicates an octal number, so String.fromCharCode receives 108, which is the character code for the letter l. We then concatenate the string \u00 with the hexadecimal number 0x41, which is converted to the decimal string 65, producing \u0065, the Unicode escape for e. The r is created using a Unicode literal RegExp, and uses the Firefox-, Chrome-, Safari-, and Opera-specific functionality to match a string sent to the RegExp which is hex-escaped. As a result, \x72 returns r. Finally, we use an octal escape to create a backslash, \134, which, once assembled, creates a final Unicode escape for the letter t. With (1) appended at the end, the assembled string is passed to eval, which executes our vector.
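The assembly can be followed piece by piece in a form that also runs outside a browser. This sketch makes three substitutions that are assumptions of the example, not part of the original vector: a local alert stand-in that records its argument, exec in place of calling the regular expression directly, and parseInt and \x5c in place of the zero-prefixed octal forms, which strict mode rejects.

```javascript
// Local stand-in for the browser's alert so that the result can be inspected.
const shown = [];
function alert(value) {
  shown.push(value);
}

const partA = RegExp('\x5c\x75\x30\x30\x36\x31').source; // the text \u0061
const partL = String.fromCharCode(parseInt('154', 8));   // octal 154 is 108: l
const partE = '\\u00' + 0x41;                            // 0x41 becomes "65": \u0065
const partR = /\u0072/.exec('\x72')[0];                  // matches and returns r
const partT = '\x5cu0074';                               // backslash plus u0074

const assembled = partA + partL + partE + partR + partT + '(1)';

// eval parses the escapes as part of the identifier and calls alert(1).
eval(assembled);

console.log(assembled, shown);
```

Printing the assembled text before evaluating it is also a practical first step when deobfuscating a vector of this kind.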
JavaScript Variables
The standard perception of JavaScript variables is that alphanumeric characters, underscores, and dollar signs are the only legal characters in variable names. This section aims to change that perception. Table 3.1 lists the standard JavaScript variable names supported. The first column refers to the allowed character at the beginning of the variable name. For example, you cannot have a variable beginning with a number. The second column indicates the characters allowed in the second and subsequent positions. A hyphen indicates a range of characters, such as 0 to 9.
Table 3.1 Perceived JavaScript Variables
Allowed First Characters/Ranges    None or More Characters after the First Character
$                                  0-9$_a-zA-Z
_                                  0-9$_a-zA-Z
a-z                                0-9$_a-zA-Z
A-Z                                0-9$_a-zA-Z
User-defined variables
In JavaScript, variables may be used to store numbers, strings, and other objects. A variable can be instantiated in two ways, with or without the var keyword. Variables can contain any alphabetic character along with each of the following:
• Numbers (except at the beginning of the variable)
• _ and $
• Numerous Unicode characters
Each of these may be used for obfuscation purposes. In particular, _, $, and Unicode characters can be used to develop JavaScript statements that do not even contain alphanumeric characters. In fact, nonalphanumeric JavaScript is such a rich field for Web obfuscation that an entire chapter of this book (Chapter 4) is dedicated to such techniques.
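A few such names can be tried directly. The sketch below wraps them in a hypothetical function, unusualNames, purely to keep the short names out of the global scope; the particular Unicode letters chosen are arbitrary examples.

```javascript
function unusualNames() {
  // Names made only of $ and _ contain no alphanumeric characters at all.
  const $ = 'dollar';
  const _ = 'underscore';
  const $_ = $ + '/' + _;

  // Letters outside the ASCII range are also accepted in variable names.
  const π = 3.14159;
  const ñ = 'n with tilde';

  return { combined: $_, pi: π, letter: ñ, count: [$, _, $_].length };
}

const names = unusualNames();
console.log(names);
```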
A typical variable assignment takes the following form:
var x='string';
However, there are other ways to assign variables in JavaScript, depending on the context. For example, each of the following is valid JavaScript for assigning a string to a variable:
x='string';
x="string";
(x)=('string');
this.x='string';
x={'a':'string'}.a;
[x,y,z]=['string1','string2','string3'];
x=/z(.*)/('zstring')[1]; x='string';
x=1?'string':0
Using alternative syntax such as these either alone or in conjunction with various string concatenation tricks is one of the most straightforward ways to bypass simplistic Web application firewalls (WAFs). For example, early versions of an anonymous WAF would correctly detect injections such as the following:
x='alert(0)';eval(x)
But they failed to detect injections such as this:
x=1?'ale'+'rt(0)':0;eval(x)
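Why the second form slips through can be shown with a deliberately simplistic pattern check. The function naiveFilterBlocks below is a hypothetical stand-in for a signature-based filter; it is not the logic of any particular WAF.

```javascript
// Hypothetical signature: flag any input containing "alert" followed by "(".
function naiveFilterBlocks(input) {
  return /alert\s*\(/i.test(input);
}

const plain = "x='alert(0)';eval(x)";
const split = "x=1?'ale'+'rt(0)':0;eval(x)";

const plainBlocked = naiveFilterBlocks(plain);
const splitBlocked = naiveFilterBlocks(split);

// The ternary and the concatenation still rebuild the same string at run time.
const rebuilt = 1 ? 'ale' + 'rt(0)' : 0;

console.log(plainBlocked, splitBlocked, rebuilt);
```

The signature never sees the complete word because it only exists after the expression has been evaluated.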
Built-in variables
JavaScript includes many built-in variables that are useful for interacting with browser objects. For example, the document object provides access to the Web page's DOM, URL, cookies, and other properties. Many of these variables are consistent among different browsers; however, some are browser-specific. A few of these variables are especially useful for obfuscation purposes.
The name variable
The window object is a high-level JavaScript object that contains most other JavaScript objects including document and location, among others. The window object refers to the present browser window, tab, or frame. When a new window is opened from an existing window, the new window can be given a new name. This is the case when you open a pop-up window using window.open or when you use an iframe to embed the contents of another page. For example, when using window.open the name of the new window can be specified like this:
window.open('http://example.org/popup_page.html', 'my new window');
For iframes, the name of the new window is specified in the HTML like so:
<iframe name="my new iframe window" src="http://example.org/framed_page.html"></iframe>
JavaScript located on the new page can access the name given to it from the calling page using the special variable, window.name. When calling JavaScript objects and functions, the parent object window (or this) is assumed, so new windows can refer to their assigned names using just the variable name. In the preceding iframe code example, JavaScript used on the framed page will contain a “built-in” variable called name whose value is the string “my new iframe window.”
What makes name so special is the fact that the contents of the variable are specified on a page that is different from the page executing the JavaScript. This can be abused for malicious purposes when a malicious Web page is created on an attacker's Web server that uses an iframe to load a victim Web page that is vulnerable to cross-site scripting. The attacker could create a malicious JavaScript payload and place it inside the name attribute of the calling iframe. Then, on the victim Web page, the attacker (who can also execute JavaScript via cross-site scripting) can execute the malicious payload with the following code:
eval(name)
This is incredibly useful for several reasons:
• The cross-site scripting injection code is extremely short; only 10 characters are needed for this portion of the attack. This means that even cross-site scripting injections that are limited to just a handful of characters (due to server-side constraints) can still be fully exploited. In some cases, length restrictions force an injection to use this technique.
• The actual malicious payload is never sent to the vulnerable Web application. This means any WAFs (or intrusion detection systems) can easily miss an attack with such a small fingerprint. Also, an attacker wishing to bypass server-side filtering only needs to worry about obfuscating the code eval(name) rather than the full payload.
• The payload sent to the server is completely generic. On the surface, this appears to make server-side detection easier. However, eval(name) can be obfuscated in an endless variety of ways, which always gives the attacker the upper hand. The attacker needs to identify just one variation that is not detected and the attacker wins.
• The class of characters used in the injection (lowercase alphabetical characters and parentheses) is extremely small, meaning that it can bypass filters that prevent certain characters such as []{}<>“|’)/%#&ˆ!+=−:;. Note, however, that some of these characters may be needed to initiate the injection. For example, a complete cross-site scripting injection that requires escaping from a JavaScript string may look like ”;eval(name);.”
In all of these cases, the malicious payload is not displayed anywhere the victim will easily see it.
The downside to using name to reference a malicious payload is that the code must be located on a third-party Web site. To exploit a cross-site scripting vulnerability on the target site, whether it is reflected or persistent, the attacker must trick a victim into visiting the third-party Web site. This reduces the likelihood of exploitation since it is generally more difficult to coerce potential victims to a third-party site than it is to coerce them into visiting the target site.
Cross-site scripting injections that separate the malicious payload of the injection from what gets sent to the target Web server are frequently called two-stage injections, a term coined by Stefano Di Paola (www.wisec.it/sectou.php?id=4910a68e913f1).
The location.hash variable
The location object is used to reference parts of the URL of the present window. The location.hash variable refers to the (optional) last part of the URL that begins with # (the hash symbol) and often contains a reference to an anchor tag on the present page. The hash symbol can be used for other purposes as well, though in most cases it is not required. When a user navigates to a page such as http://www.example.com/page.html#subsection, the browser sends a request for the page http://www.example.com/page.html; the hash part of the URL (i.e., #subsection) is not sent. When the browser receives a response, it looks for an anchor tag that matches the text after the #. If a match is found, it automatically scrolls the current page to that anchor tag; otherwise, it does nothing.
The # character is frequently called the hash symbol.
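The split between what is requested and what stays in the browser can be sketched with the standard URL parser found in modern browsers and Node.js. This is only an illustration: the variable names are hypothetical, and the sketch inspects a URL string rather than performing a real request.

```javascript
// Illustrative only: parse the example URL used in the text.
const pageUrl = new URL("http://www.example.com/page.html#subsection");

// The fragment, including the leading "#", never leaves the browser.
const fragment = pageUrl.hash;

// Everything before the fragment is what the browser actually requests.
const requested = pageUrl.origin + pageUrl.pathname + pageUrl.search;

console.log(fragment);  // #subsection
console.log(requested); // http://www.example.com/page.html
```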
The neat thing about location.hash is that the contents are not sent to the target Web server. This means location.hash can be used in a manner similar to the variable name. However, there are a few notable differences. First is the fact that the value of location.hash is a string that always begins with #. In most browsers, this is a problem, which means that to execute arbitrary code located in the hash variable, you will need to do something such as this:
eval(location.hash.slice(1))
In the preceding code, slice is a string function; called with a single argument n, it returns the string with its first n characters removed.
The preceding code will call the eval function on everything located after the # in location.hash. The net result is that you have a very small injection that executes the “real” payload which is located after the hash symbol in the URL. Note that this eliminates the main drawback of using eval(name); no third-party Web site is involved. In a reflected cross-site scripting attack (that exploits a vulnerable GET variable), the injected code as well as the malicious payload are included in the URL, but the target Web server never sees the malicious payload!
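As a small sketch of the slice step on its own, the following uses a hypothetical hash value in place of the real location.hash so that it can run outside a browser. Nothing is evaluated here; the point is only to show what eval would receive.

```javascript
// Hypothetical value of location.hash for a URL ending in "#alert(1)".
const hashValue = "#alert(1)";

// slice(1) returns everything from index 1 onward, dropping the "#".
const payload = hashValue.slice(1);

console.log(payload); // alert(1)
```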
The main downside with using location.hash to perform obfuscated attacks is that the malicious payload must be included in the URL. So, for both persistent and reflected cross-site scripting attacks, a potential victim may notice an unusually long or otherwise suspicious-looking URL.
The URL variable
Modern versions of Internet Explorer and Opera contain a special and little-known variable called document.URL that is not found in other browsers. By default, this variable returns as a string the present URL of the page, similar to document.location. Also, the present page can be redirected by assigning a new value to document.URL (in Internet Explorer but not in Opera). Normally, the variable must be fully spelled out as document.URL. However, when using the variable inside event handlers, it can be reduced to just URL. The fact that this variable is so short and not well known makes it a handy variable for obfuscating JavaScript. For example, each of the following could be used to execute JavaScript:
eval(unescape(URL))
eval('"'+URL)
URL='javascript:alert(0)'
The same techniques can be performed in all browsers using location rather than URL.
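One way to read the second example: prefixing the URL with a double quote turns the start of the URL into a harmless string literal, so a fragment that closes that string can continue with arbitrary statements. The following is a minimal sketch under assumptions, not a description of any particular browser: it assumes the fragment reaches the script undecoded (many current browsers percent-encode the quote, which would defeat this), and fakeUrl and recordingAlert are hypothetical stand-ins so that the sketch runs outside a browser.

```javascript
// Hypothetical URL whose fragment closes the string and appends a call.
const fakeUrl = 'http://www.example.com/page.html#";alert(1)//';

// The source text that eval('"' + URL) would receive.
const source = '"' + fakeUrl;

// Stand-in for alert that records its arguments instead of opening a dialog.
const calls = [];
const recordingAlert = (value) => calls.push(value);

// Run the source with the stub bound to the name "alert".
new Function("alert", source)(recordingAlert);

console.log(source); // "http://www.example.com/page.html#";alert(1)//
console.log(calls.length); // 1
```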
Unicode variables
In JavaScript, variable names consist of one character from a-zA-Z_$ followed by zero or more characters from a-zA-Z$_0-9. At least this is the standard perception. In fact, JavaScript supports much more than that. My coauthors and I discovered this by looking at the error responses in a JavaScript console. If the error reported that the name was undefined, it was highly likely that the character could be used as a valid variable. Undefined errors mean the developer tried to use a variable without first assigning it. This makes it easy to traverse all possible variables. Here are some examples of Unicode variables:
• ª
• µ
• º
• À
• Á
• Â
• Ã
• Ä
• Å
• Æ
All of the variables in the preceding list can be Unicode-escaped and still be valid variables. The following code demonstrates this. It takes the first Unicode variable in the list and converts it to a Unicode escape by taking the character code of the variable and converting the number to hexadecimal; this is then escaped using \u and padded with zeros until the hex sequence is four digits long.
<script type="text/javascript">\u00aa=alert,\u00aa(1)</script>
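The conversion just described (character code, then hexadecimal, then zero padding to four digits, then a \u prefix) can be sketched as a small helper. The function name toUnicodeEscape is hypothetical, and the sketch only handles characters that fit in a single four-digit escape.

```javascript
// Convert a single character into its \uXXXX escape sequence.
function toUnicodeEscape(ch) {
  const hex = ch.charCodeAt(0).toString(16); // character code in hexadecimal
  return "\\u" + hex.padStart(4, "0");       // pad to four digits, prefix \u
}

console.log(toUnicodeEscape(String.fromCharCode(0xaa))); // \u00aa
console.log(toUnicodeEscape("A"));                       // \u0041
```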
To determine how many variables JavaScript allows, I have written a little function whose start and end parameters are the character codes you wish to scan. You can certainly scan a wider range than the one used in the example, but you ought to log the results to the console if you start scanning thousands of characters. The function works on most browsers my coauthors and I tested; the Unicode variables seem to work on all browsers, but their error messages vary, so I added two checks to see if the variable is undefined. The eval statement is used to test each character, and a try and catch statement is used to handle the error. Discovering how many variables are possible is left as an exercise for the reader (there are a lot).
The following code contains a simple JavaScript variable generator that should work cross-browser. It takes two arguments, start and end, which specify the range to search.
<script type="text/javascript">
// Returns a comma-separated list of the characters in [start, end)
// that the engine accepts as variable names.
function traverseVariables(start, end) {
  var validVariables = [];
  for (var i = start; i < end; i++) {
    var variableTest = String.fromCharCode(i);
    try {
      // A valid but unassigned name throws an "undefined" error;
      // an invalid character throws a syntax error instead.
      eval(variableTest);
    } catch (e) {
      if ((e + '').indexOf('is not defined') != -1) {
        validVariables.push(variableTest);
      } else if (e.description && e.description.indexOf('is undefined') != -1) {
        // Internet Explorer exposes the message through e.description.
        validVariables.push(variableTest);
      }
    }
  }
  return validVariables.join(',');
}
alert(traverseVariables(150, 200));
</script>
Depending on the speed of your computer, it is recommended that you use a maximum of 1000 scans.
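Matching on error message text is fragile, because the wording differs between engines. As an alternative sketch (not the method used above), the parser can be asked directly: the source text "var X;" compiles only if X is a valid variable name. The function names here are hypothetical, and the result depends on the Unicode tables of the engine running the code.

```javascript
// Returns true if the character with this code can be used as a variable name.
function isValidVariable(charCode) {
  const candidate = String.fromCharCode(charCode);
  try {
    // Compile only; the generated function is never called.
    new Function("var " + candidate + ";");
    return true;
  } catch (e) {
    return false; // the parser rejected the character
  }
}

// Collect every valid single-character variable name in [start, end).
function listValidVariables(start, end) {
  const found = [];
  for (let i = start; i < end; i++) {
    if (isValidVariable(i)) found.push(String.fromCharCode(i));
  }
  return found;
}

console.log(listValidVariables(150, 200).join(","));
```

Because nothing is evaluated, this variant also avoids touching any variable that happens to be defined already.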
VBScript
Internet Explorer has supported VBScript since IE3, and it is included in IE8, the latest browser at the time of this writing. VBScript is another type of scripting language which enables us to change the syntax of our code execution. What is interesting about VBScript is the way it calls functions and the comments it supports. We can use this to our advantage by combining JavaScript and VBScript syntax to produce truly unreadable code.
Comments
Comments are quirky in VBScript. You can use ancient REM-style comments, and because VBScript is case-insensitive, the comments are quite hard to distinguish from normal code. There is an overlap with JavaScript which turns out to be confusing as well; in JavaScript, strings can be declared with single quotes, but in VBScript, single quotes are comments!
<script type="text/vbscript">
REM I am a comment
ReM Me too
REm Me too
' This is a comment too
</script>
Events
When VBScript is executed from an event, a special declaration is supported that can force a particular scripting language. This can be done in two ways: either in a separate language attribute or as the first part of an event declaration. The language attribute is supported wherever an event is supported. On an HTML tag, the default is JavaScript, but we can change this by using the language attribute with VBScript, or the abbreviation vbs.
<body onload="MsgBox 1" language="vbs">
<body onload="vbs:MsgBox 1">
Functions
In VBScript, functions can be called like JavaScript, with parentheses. However, you can also call them without parentheses. This is useful for filter evasion where the set of allowed characters is limited, or where an IDS checks for “(” and “)”. It can also help with obfuscation, because without parentheses it is harder for a reader to tell where each function argument begins and ends. As VBScript deals with the DOM, it can also share functions with JavaScript, such as window.alert and document.write. Unlike in JavaScript, these calls are case-insensitive. VBScript supports the execScript function too, which is very useful for obfuscation as you will see shortly in the section “The execScript function in VBScript.”
An intrusion detection system (IDS) is a hardware or software platform that looks for malicious patterns to determine if a request is an attack. If you avoid characters such as “(” or “)” that the IDS looks for, you are likely to avoid detection.
End of statement
The end of statement is considered to be a new line (not a semicolon, as in JavaScript). There is, however, one trick you can use for a new line to continue a string rather than execute the next statement: using multiple-line syntax you can create a string across multiple lines that is useful for obfuscating function calls.
<body onload='vbs:MsgBox "O"&amp;_&#x0a"b"&_&#x0a"f"&_&#x0a"u"&_&#x0a"s"&_&#x0a"cated"'>
You can also combine this with HTML entities, as the preceding example does: the strings are split with the &_ operator, and an HTML entity for a new line follows each one. The code displays “Obfuscated” in a VBScript message box. The & of the first &_ operator is HTML-encoded and the others appear as normal, making very strange-looking strings. As you can see, the &_ operators can be right next to the HTML-encoded new lines.
VBScript encoding
Microsoft implemented a specific script type to include encoded scripts within a script tag. This was designed to prevent casual attackers from viewing the source code. I say “casual” because the encoding can be broken quite easily, as it involves just a simple substitution cipher. For obfuscation, it's actually quite cool because Microsoft also implemented it in some unusual ways which many people are not aware of. The following code demonstrates the standard method of including encoded scripts:
<script language="vbscript.encode">#@~^CAAAAA==\ko$K6,FoQIAAA==^#~@</script>
The vector uses Microsoft's script encoder to encode a simple “MsgBox 1” function call. This is quite cool for obfuscation because, as you can see, the encoded output no longer resembles the original code, and different code will be encoded differently depending on the position of the characters in question. Recall from the “Events” section that the language attribute value can also be used as the first part of an event declaration. The same can be done using vbscript.encode, and because we are inside an event, we can take advantage of HTML entities as well. Double-encoded vectors become possible, and even more are possible depending on the context and type of execution. The next examples show vbscript.encode being used inside events and being encoded with HTML entities.
<iframe onload="vbscript.encode:#@~^CAAAAA==\ko$K6,FoQIAAA==^#~@"></iframe>
<img src=1 onerror="vbscript.encode:#@~^CAAAAA==\ko$K6,FoQIAAA==^#~@">
<img src=1 onerror="vbsc&#114;&#105;&#112;&#116;&#46;&#101;&#110;&#99;&#111;&#100;&#101;&#58;&#35;&#64;&#126;&#94;&#67;&#65;&#65;&#65;&#65;&#65;&#61;&#61;&#92;ko$K6,FoQIAAA==^#~@">
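Decimal entities like the ones in the last example can be generated mechanically. The following is a minimal sketch; the helper name toDecimalEntities is hypothetical, and it encodes every character it is given, so it is up to the caller to decide which part of the handler to encode.

```javascript
// Encode each character of the input as a decimal HTML entity.
function toDecimalEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Encoding part of the protocol prefix reproduces the start of the example.
console.log(toDecimalEntities("ript.encode:"));
// &#114;&#105;&#112;&#116;&#46;&#101;&#110;&#99;&#111;&#100;&#101;&#58;
```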
The execScript function in VBScript
Internet Explorer also supports another method of executing code. The execScript function is supported by VBScript and JScript. It is similar to the standard JavaScript eval statement, but with one important difference: A second argument is supported which declares the language that is evaluated. This allows you to call JScript code from VBScript and vice versa. The following code shows VBScript executing JScript code using execScript:
<script language="vbscript">
execScript "alert(1)","jscript"
</script>
At this point, you may be wondering whether the function accepts something other than VBScript and JScript. It does, and this makes it very useful for combining obfuscated code. We can include vbscript.encode as the second argument to execScript, which allows us to execute code in the context of a scripting event and a VBScript string, resulting in even trickier obfuscation techniques. The next example shows how to use the second argument and combine VBScript strings, events, and HTML entities:
<img src=1 onerror='vbs:execScript ch&#114;(35)&"@~^CAAAAA==\ko$K6"&chr(44)&"FoQIAAA==^#~@","vbscri&#x70;&#x74;&#x2e;encode"'>
The preceding code combines the tricks we discussed in the previous examples. First it forces VBScript inside the event using vbs:. Then it uses execScript to execute some encoded VBScript. It then splits the encoded script using the VBScript chr function, which returns the character based on the character code supplied. Finally, it encodes parts of the encoded output with HTML entities. You could fully encode the output using all of these methods, but I have partially encoded it for clarity.
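The chr splitting can also be generated rather than written by hand. The sketch below builds a VBScript string expression from chr calls joined with the & operator; the helper name toVbsChrExpression is hypothetical, and the sketch assumes plain ASCII input.

```javascript
// Build a VBScript expression such as chr(35)&chr(44) from a string.
function toVbsChrExpression(text) {
  return Array.from(text, (ch) => "chr(" + ch.charCodeAt(0) + ")").join("&");
}

// The two characters replaced by chr calls in the example above.
console.log(toVbsChrExpression("#,")); // chr(35)&chr(44)
```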
JScript
JScript¹ is an interpreted, object-based scripting language. Although it has fewer capabilities than full-fledged object-oriented languages such as C++, JScript is more than sufficiently powerful for its intended purposes.
JScript is not a cut-down version of another language (it is only distantly and indirectly related to Java, for example), nor is it a simplification of anything. It is, however, limited. You cannot write stand-alone applications in it, for instance, and it has no built-in support for reading or writing files. Moreover, JScript scripts can run only in the presence of an interpreter or “host,” such as Active Server Pages (ASP), Internet Explorer, or Windows Script Host.
JScript is a loosely typed language. Loosely typed means you do not have to declare the data types of variables explicitly. In fact, JScript takes this one step further: You cannot explicitly declare data types in JScript. Moreover, in many cases JScript performs conversions automatically when needed. For instance, if you add a number to an item consisting of text (a string), the number is converted to text.
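The automatic conversion can be seen in a few lines. This sketch is written as standard JavaScript, on the assumption that JScript behaves the same way for these expressions, as the description above suggests.

```javascript
const sum = 1 + 2;         // both operands are numbers, so this is 3
const mixed = 1 + "2";     // the number is converted to text, giving "12"
const alsoMixed = "1" + 2; // the same happens in the other order, giving "12"

console.log(sum);          // 3
console.log(mixed);        // 12
console.log(typeof mixed); // string
```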
The jscript.compact value
JScript is Internet Explorer's flavor of JavaScript and it supports some of the techniques described in the section “VBScript.” Additionally, there is an interesting language value which supports JScript for mobile devices. This is one of the discoveries that does not obscure code, but is worth knowing about, as in the future, additional techniques may be discovered, whether they involve new event protocol handlers or other undocumented functionality. If you declare JavaScript with jscript.compact this will force Internet Explorer mobile compatibility mode, which forces semicolons for each statement and disables eval.
<script language="jscript.compact">
alert(1)//This code fails because jscript.compact expects semi-colons for all statements
</script>
The jscript.encode value
JScript also supports encoding through the language attribute and event protocols, as VBScript does. This is yet another string in our bow to obfuscate our code. The more methods you combine, the more difficult you make it to decode the code. I say “difficult” because encoding can always be defeated in time, but the more difficult you make it the more likely someone will give up decoding your code. Browser-specific code is also good for protecting your code because any decoder would have to account for the features used in your encoder, making decoding more difficult. Here is how to use jscript.encode for JavaScript. Although alert(1) is encoded in the examples, you can encode your own custom code by using the Microsoft Script Encoder which is available at http://msdn.microsoft.com/en-us/library/cbfz3598%28VS.85%29.aspx.
<script language="JScript.Encode">
#@~^CAAAAA==C^+.D`8#mgIAAA==^#~@
</script>
<a href=# language="JScript.Encode" onclick="#@~^CAAAAA==C^+.D`8#mgIAAA==^#~@">test</a>
<iframe onload=JScript.Encode:#@~^CAAAAA==C^+.D`8#mgIAAA==^#~@>
Conditional comments
JScript supports conditional comments. These can be directly embedded into code or within comments. To activate them, JScript looks for the @cc_on token. This token can appear as many times as you like, but it must be used at least once before a conditional statement is used. Inside comments, the @cc_on token will only be executed if it is the first statement inside the first comment; otherwise, it will be ignored. You can layer statements and comments to add further complexity and confusion, as a statement can be initiated outside the comment and finished inside the comment, with an unlimited amount of padding.
<script>
//@cc_on@cc_on@cc_on alert@cc_on(1)
</script>
As conditionals are supported outside comments, this technique also extends the syntax of JavaScript itself. This is useful for decoder evasion if the decoder only scans for traditional JavaScript syntax. To successfully decode the JavaScript a decoder would have to parse this extension of JavaScript as well, or remove it. However, removing the code could pose a problem, as conditional statements can be embedded. Therefore, the only reliable way to decode conditional comments is to extend a decoder to support them. This makes them very useful for obfuscation, but consider that the code that is created will only work on Microsoft Internet Explorer.
<script>
@cc_on@if(1)@cc_on~alert(1)@end//demonstrates extension of JavaScript syntax
</script>
Here is how to continue code from outside a comment to inside multiple comments. This really demonstrates the power of conditionals for obfuscation. First, the @cc_on token is used within JScript to enable the use of @if syntax. Then a further @cc_on statement is used for padding, followed by a ~ operator, which is then continued with an alert statement inside a comment. The function call is then made inside multiple layered conditional comments, and the whole statement is ended with @end, which closes the @if block that was started at the beginning.
<script>
@cc_on@if(1)@cc_on~//@cc_on alert//@cc_on//@cc_on//@cc_on//@cc_on(1)@end
</script>
The execScript function in JScript
As with VBScript, JScript supports execScript, and allows us to call VBScript code from within JScript as well as use the jscript.encode technique. Because we can do this, it is possible to transfer VBScript to JScript and back again. The final JScript example shows how to use execScript and event protocols to use jscript.encode multiple times. The event starts out as a JavaScript event; a jscript.encode handler is then used, and execScript is passed a further encoded JScript string before the whole handler is encoded again with HTML hex entities.
<body
onload="&#x6a;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x2e;&#x65;&#x6e;&#x63;&#x6f;&#x64;&#x65;&#x3a;&#x23;&#x40;&#x7e;&#x5e;&#x54;&#x41;&#x41;&#x41;&#x41;&#x41;&#x3d;&#x3d;&#x6e;&#x58;&#x2b;&#x5e;&#x55;&#x6d;&#x4d;&#x6b;&#x77;&#x44;&#x60;&#x72;&#x3a;&#x40;&#x24;&#x3f;&#x37;&#x33;&#x68;&#x7a;&#x62;&#x29;&#x29;&#x7b;&#x27;&#x5a;&#x25;&#x51;&#x52;&#x47;&#x3d;&#x32;&#x9;&#x56;&#x37;&#x57;&#x42;&#x20;&#x71;&#x64;&#x47;&#x5c;&#x3a;&#x32;&#x6a;&#x62;&#x65;&#x62;&#x7a;&#x29;&#x27;&#x7b;&#x37;&#x3a;&#x3d;&#x40;&#x24;&#x4a;&#x7e;&#x45;&#x25;&#x6b;&#x6d;&#x2e;&#x6b;&#x61;&#x4f;&#x63;&#x2b;&#x55;&#x31;&#x57;&#x39;&#x2b;&#x4a;&#x2a;&#x43;&#x52;&#x63;&#x41;&#x41;&#x41;&#x3d;&#x3d;&#x5e;&#x23;&#x7e;&#x40;">
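To see what a handler like this contains, the first step is to undo the entity layer. The following is a minimal sketch of a numeric entity decoder; the function name decodeNumericEntities is hypothetical, and it only handles well-formed entities with a trailing semicolon, whereas browsers are more lenient. The sample input is the first part of the handler above.

```javascript
// Decode decimal (&#106;) and hexadecimal (&#x6a;) HTML entities.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const code = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    return String.fromCodePoint(code);
  });
}

// First fifteen entities of the onload handler shown above.
const prefix =
  "&#x6a;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x2e;" +
  "&#x65;&#x6e;&#x63;&#x6f;&#x64;&#x65;&#x3a;";

console.log(decodeNumericEntities(prefix)); // jscript.encode:
```

Decoding the remainder of the handler the same way yields the encoded script itself, which still has to be passed through a script decoder.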
E4X
If ever a language were created for JavaScript hackers, it is E4X. Currently supported only by Firefox, E4X allows XML data to be embedded directly in JavaScript. Some people (including my coauthors and me) feel E4X was implemented in Firefox in an unfinished state; the language is relatively new, and as such, some of it was not strongly defined. An example of this is that all E4X objects return an object for an undefined property, and standard JavaScript objects have E4X properties. These features are great for padding and obfuscation, but there is more: E4X also supports a special operator within XML data, {}, which allows JavaScript statements to be executed within XML. In addition, you can also use HTML entities within XML data. Depending on the context of the data, you can then double-encode the entity data.
First, let us look at how everything is an object in E4X. Accessing an undefined property should return undefined, but in E4X, a reference to an object is returned instead. Judging by the source code comments in Firefox, the developers were aware of this and acknowledged it as a limitation or quirk.
<script type="text/javascript"><></>.I.am.e4x.data.and.everything.returns.an.object;x=1</script>
Next, let us look at how to call JavaScript within JavaScript E4X data. The starting { begins the evaluation and the ending } finishes it.
<script type="text/javascript"><>{alert(1)}</>;x=1</script>
You might notice the trailing JavaScript ;x=1 in both examples. This is because a script that uses inline E4X must contain at least one JavaScript statement to pass the error check. The error check was introduced in later versions of Firefox, presumably to defend against cross-domain attacks in which external HTML data is loaded as JavaScript and E4X statements are used to return the document source of external domains.
HTML entities are supported, but they have to be well formed. Malformed entities without a trailing semicolon will produce errors. The following example shows how to encode alert(1) as an E4X string. The +[] converts the XML data into a string by using an empty array. The same effect could be achieved using +''.
<script type="text/javascript">
eval(<>&#97;&#108;&#101;&#114;&#116;&#40;&#49;&#41;</>+[])
</script>
Using this concept, we can double-encode the entities. We could do this by encoding all of the data again, but for clarity we will just encode the ampersands so that you can see how the data are used.
<img src=1
onerror="eval(<>&amp;#97;&amp;#108;&amp;#101;&amp;#114;&amp;#116;&amp;#40;&amp;#49;&amp;#41;</>+[])">
E4X also supports XML processing instructions. This again has not been strongly defined. As a result, it can be used to pad data, confuse a decoder, or create some strange-looking JavaScript statements.
<script type="text/javascript"><?Again we can have any text we like here?>/alert(1)</script>
JavaScript 1.7 introduced a cool feature that is rarely used because of a lack of support: destructuring assignments. This feature provides a way to assign multiple variables at once and was intended to work on arrays and objects. It can also work on E4X data if you use more than one XML node and return each node using the special E4X property .*. This is perfect for obfuscation, especially when you consider that XML data can be HTML-encoded and each string can be split by XML nodes. The following example shows how to use this trick to obscure a JavaScript alert:
<script type="text/javascript">
[ª, µ, º, À, Á, Â]=<_><_>&#97;</_><_>&#x6c;</_><_>&#101;</_><_>&#x72;</_><_>&#116;</_><_>{'\x28\x31\x29'}</_></_>.*;<>{eval([]+ª+µ+º+À+Á+Â+[])}</>
</script>
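E4X is not available in current JavaScript engines, so the destructuring step on its own can be sketched with a plain array in place of the XML node list. This is an illustration of the assignment pattern only, with hypothetical variable names; nothing is evaluated.

```javascript
// Each element plays the role of one XML node from the E4X example.
const [a, b, c, d, e, f] = ["a", "l", "e", "r", "t", "\x28\x31\x29"];

// Concatenating the pieces rebuilds the hidden statement text.
const rebuilt = a + b + c + d + e + f;

console.log(rebuilt); // alert(1)
```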
You can also embed JavaScript comments in E4X data, making the data even more difficult for an automated decoder or human reader to decipher. This also makes it difficult to decipher whether a statement is E4X data or standard JavaScript. As a little game, can you tell which of the following statements executes code and which does not?
Statement 1:
<script type="text/javascript">
a=1;
1+<a>123//</a>;alert(1)
</script>
Statement 2:
<script type="text/javascript">
a=1;
1<<a>123//</a>;alert(1)
</script>
Statement 1 is the correct answer. The first statement executes the alert because, after the + operator, <a> can only be parsed as the start of an E4X node; the // is therefore XML text, and the alert that follows the closing tag runs. In the second statement, << is parsed as the bitshift operator, so the // begins an actual comment rather than being part of an E4X node, and the alert is ignored. As you can see with these examples, the line between E4X statements and JavaScript is very thin and leads to surprising results. The decoder's job is getting increasingly difficult, but if we do not push the boundaries, we won't win the race.
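The reading of Statement 2 can be checked in any JavaScript engine, since no E4X support is needed once the comment is removed. A small sketch, using the same value of a as the example:

```javascript
const a = 1;

// What remains of Statement 2 before the comment: 1 << a > 123.
// The shift binds more tightly than the comparison: (1 << 1) > 123.
const result = 1 << a > 123;

console.log(1 << a); // 2
console.log(result); // false
```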
Summary
This chapter should have given you greater knowledge regarding how JavaScript works, while at the same time increasing your arsenal of obfuscation techniques. Understanding how languages work enables you to take full advantage of their features and produce truly unreadable code. The best way to learn a language is to obfuscate and deobfuscate; both practices require an in-depth knowledge of the syntax. This chapter should have given you a glimpse into the JavaScript abyss and provided you with a practical understanding of why the code works. Look out for vendor-specific features or deviations from a specification, and you will find unexpected (but positive) results.
Remember, features are good, but hidden features and unintentional hacks can lead to some amazing results.