Cryptography uses mathematics to secure data by applying well-known algorithms (or ciphers) to render the data unreadable to people who don’t have the key, the string of bits required to unlock the code. The beauty of cryptography is that it relies on standards to secure data transmission between web clients and servers. Without these standards, it would be impossible to have consistent security across the multitude of websites that require secure data transmission.
Don’t confuse cryptography with obfuscation. Obfuscation attempts to obscure or hide data without standardized protocols—as a result, it is about as reliable as hiding your house key under the doormat. And since it doesn’t rely on standard methods for “un-obfuscation,” it is not suitable for applications that need to work in a variety of circumstances.
Encryption, the use of cryptography, allowed for commerce on the Internet, mostly by making it safe to pay for online purchases with credit cards. The World Wide Web didn’t widely support encryption until 1995, shortly after the Netscape Navigator browser (paired with its Commerce Server) began supporting a protocol called Secure Sockets Layer (SSL). SSL is a private way to transmit personal data through an encrypted data transport layer. While Transport Layer Security (TLS) has superseded SSL, TLS began as only a slight revision of SSL, and SSL is still the popular term used to describe web encryption. Today, all popular web servers and browsers support encryption. (You can identify when a website begins to use encryption, because the protocol changes from http to https.[58]) If you design webbots that handle sensitive information, you will also need to know how to download encrypted websites and make encrypted requests.
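A webbot can apply the same http versus https check to any URL it is about to request. The following is a minimal sketch, assuming modern PHP with scalar type declarations; the function name uses_encryption() is hypothetical and is not part of any library discussed in this book.

```php
<?php
// Hypothetical helper: reports whether a URL names the https protocol.
function uses_encryption(string $url): bool
{
    // parse_url() returns null when the URL has no scheme
    // and false when the URL is seriously malformed.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Compare in lowercase so that "HTTPS" and "https" are treated alike.
    return is_string($scheme) && strtolower($scheme) === 'https';
}
```

A webbot might call this before submitting a form that carries a password and refuse to continue when the target is not encrypted.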
In addition to privacy, SSL also confirms the identity of websites by checking the digital certificate assigned to the website using SSL. This means, for example, that when you check your bank balance, you know that the web page you access is actually coming from your bank’s server. Authentication is enforced by validating the bank’s certificate against the certificate authority that issued it. Another feature of SSL is that web clients and servers can tell whether they received all the transmitted data, because decryption fails on partial or altered data.
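In PHP, this certificate check is exposed to stream-based requests through SSL context options. The sketch below builds a stream context that asks PHP to validate the server's certificate and to confirm that it matches the host name. The option names verify_peer and verify_peer_name are PHP's own (and are enabled by default in current PHP versions); the function name build_verified_context() is a hypothetical one chosen for illustration.

```php
<?php
// Hypothetical helper: returns a stream context that insists on
// certificate validation for encrypted requests.
function build_verified_context()
{
    return stream_context_create([
        'ssl' => [
            // Reject certificates that do not chain to a trusted authority.
            'verify_peer'      => true,
            // Reject certificates issued for a different host name.
            'verify_peer_name' => true,
        ],
    ]);
}
```

The returned context can be passed as the optional context argument of PHP's stream functions when they request an https address.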
As when downloading unencrypted web pages, PHP provides choices to the webbot developer who needs to access secure servers. The following sections explore methods for requesting and downloading web pages that use encryption.
In PHP version 5 or higher, you can use the standard PHP built-in functions (discussed in Chapter 3) to request and download encrypted files. You can download web pages from a secure server using PHP built-in functions like file() or fopen() by simply changing the protocol from http: to https:. However, I wouldn’t recommend using the built-in functions because they lack many features that are important to webbot developers, like automatic forwarding, form submission, and cookie support, just to name a few.
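For completeness, here is a minimal sketch of the built-in approach using file(). It assumes that PHP was built with OpenSSL support and that URL-aware file access is enabled; the function name download_page_lines() and the target address are hypothetical.

```php
<?php
// Hypothetical helper: downloads a page with the built-in file() function.
// Nothing here is specific to encryption; the https: protocol in the
// target is the only difference from an unencrypted request.
function download_page_lines(string $target): array
{
    // file() returns the page as an array of lines, or false on failure.
    // The @ operator suppresses the warning PHP raises on failure.
    $lines = @file($target, FILE_IGNORE_NEW_LINES);

    // Return an empty array instead of false so callers can always iterate.
    return $lines === false ? [] : $lines;
}

// Hypothetical usage:
// $lines = download_page_lines('https://www.example.com/');
```

The sketch also shows the limitation: there is no place here to manage cookies or submit forms without writing considerably more code.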
Example 19-1 shows how to download an encrypted web page using the LIB_http library. Just as with the PHP built-in functions, it’s as simple as changing the protocol to https:.
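For readers who want to see what such a request involves without the library, the following sketch makes a comparable request with PHP's cURL extension directly. It is not the LIB_http code. The function names secure_curl_options() and fetch_secure_page(), the redirect limit, the timeout, and the target address are all assumptions made for illustration.

```php
<?php
// Sketch only: plain PHP/cURL, not the LIB_http library.

// Options for an encrypted request that follows redirects and
// verifies the server's certificate.
function secure_curl_options(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true, // return the page as a string
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 4,    // arbitrary redirect limit
        CURLOPT_TIMEOUT        => 25,   // arbitrary timeout in seconds
        CURLOPT_SSL_VERIFYPEER => true, // validate the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // certificate must match the host name
    ];
}

// Downloads the target and reports the body, HTTP status, and any error.
function fetch_secure_page(string $target): array
{
    $ch = curl_init($target);
    curl_setopt_array($ch, secure_curl_options());
    $body = curl_exec($ch);

    return [
        'body'   => $body === false ? '' : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
    ];
}

// Hypothetical usage:
// $page = fetch_secure_page('https://www.example.com/');
// echo $page['status'], "\n";
```

Keeping the options in their own function makes it easy to inspect or adjust them without making a network request.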
It’s important to note that in some PHP distributions, the protocol may be case sensitive, and a protocol defined as HTTPS: will not work. Therefore, it’s a good practice to be consistent and always specify the protocol in lowercase.
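One way to enforce this habit is to normalize every target before requesting it. The following is a minimal sketch; the function name lowercase_protocol() is hypothetical, and only the protocol portion of the address is changed, because paths on many servers are case sensitive.

```php
<?php
// Hypothetical helper: lowercases the protocol at the start of a URL
// and leaves the rest of the address untouched.
function lowercase_protocol(string $url): string
{
    $result = preg_replace_callback(
        // A protocol is a letter followed by letters, digits, "+", "." or "-",
        // terminated by a colon, at the very start of the string.
        '/^[A-Za-z][A-Za-z0-9+.\-]*:/',
        function (array $match): string {
            return strtolower($match[0]);
        },
        $url
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $url;
}
```

Passing HTTPS://www.Example.com/Path through this function yields https://www.Example.com/Path, with the host and path left exactly as written.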