The binary number system was first used as a coding method in the 19th century and went on to become the language of modern computers. Today’s plethora of electronic encryption systems is based on the simple idea that ‘1’ stands for ‘on’ and ‘0’ means ‘off’.
Computers work in binary code. This is a number system with a base of two, so the only digits used are 0 and 1. In our base-ten decimal number system, ‘1234’ means one thousand, two hundreds, three tens and four units, with each column worth ten times the one to its right. In base two, each column is worth twice the one to its right, so the first seven columns have the following values:
64 32 16 8 4 2 1
So ‘2’ is expressed as 10, ‘3’ as 11, ‘4’ as 100, ‘5’ as 101 and ‘6’ as 110. Each digit in binary code is known as a ‘bit’, and each ‘bit’ is either ‘on’ or ‘off’, with a value of ‘true’ or ‘false’.
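Converting a decimal number to binary by hand means halving it repeatedly and noting the remainders. Converting back means adding up the column values wherever a 1 appears. Both steps fit in a few lines of Python. The function names to_binary and from_binary are our own, and the results are checked against Python’s built-in format and int:

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated halving."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, lowest column first
        n //= 2
    return "".join(reversed(digits))


def from_binary(bits: str) -> int:
    """Rebuild the number column by column (the same as adding 1, 2, 4, 8, ... for each '1')."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)  # shift everything one column left, then add the new bit
    return total


for n in range(2, 7):
    print(n, to_binary(n))  # prints 2 10, then 3 11, 4 100, 5 101 and 6 110

assert to_binary(1234) == format(1234, "b")
assert from_binary("1101") == int("1101", 2)
```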
In 1874, Emile Baudot patented a five-bit code, which allowed characters, numbers and some punctuation to be represented in binary. It was intended for use in the pulsing ‘on’, ‘off’ telegraphic communication system, and replaced Morse code in the mid-20th century. Baudot’s code had 32 five-digit ‘words’ (2 × 2 × 2 × 2 × 2 = 32), enough to represent every letter of the alphabet plus a few other symbols, but special shift signals doubled its capacity by indicating whether the ‘words’ that followed represented letters (11111, also expressed as +++++) or numbers (11011, also expressed as ++–++).
Baudot’s code developed into the seven-bit American Standard Code for Information Interchange (ASCII) code that continues to be used by computers today.
Capital letters are represented by ASCII binary numbers (see the chart of ASCII binary numbers below). Numbers are expressed as 1 = 0110001, 2 = 0110010, 3 = 0110011 and so on.
BLOCKS ROCK
British rock band Coldplay used Emile Baudot’s 1870 binary code to create the set of coloured blocks on the artwork of their 2005 album X&Y. The shapes are loose representations of the letters of the album’s title, achieved by using the binary code to decide whether to place a block or leave a gap. The colours are irrelevant.
BILATERAL CIPHER
In 1623, Francis Bacon published his bilateral cipher, in which every letter is represented by a group of five ‘a’s and ‘b’s. It bears an uncanny resemblance to Baudot’s five-bit binary code.
The Baconian alphabet is:
A = aaaaa     B = aaaab     C = aaaba     D = aaabb
E = aabaa     F = aabab     G = aabba     H = aabbb
IJ = abaaa    K = abaab     L = ababa     M = ababb
N = abbaa     O = abbab     P = abbba     Q = abbbb
R = baaaa     S = baaab     T = baaba     UV = baabb
W = babaa     X = babab     Y = babba     Z = babbb
So ‘Bacon’ is enciphered as
aaaabaaaaaaaabaabbababbaa.
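Each Baconian group is simply the letter’s position in Bacon’s 24-letter alphabet written as five binary digits, with ‘a’ for 0 and ‘b’ for 1. That is the resemblance to five-bit binary code. So the table can be generated rather than copied out. A minimal Python sketch, with function and table names of our own choosing:

```python
# Bacon's 24-letter alphabet: I/J share a code, as do U/V.
LETTERS = "ABCDEFGHIKLMNOPQRSTUWXYZ"


def bacon_code(position: int) -> str:
    """Write a position (0-23) as five binary digits, with 'a' for 0 and 'b' for 1."""
    return format(position, "05b").replace("0", "a").replace("1", "b")


ENCODE = {letter: bacon_code(i) for i, letter in enumerate(LETTERS)}
ENCODE["J"] = ENCODE["I"]
ENCODE["V"] = ENCODE["U"]
DECODE = {bacon_code(i): letter for i, letter in enumerate(LETTERS)}


def bacon_encipher(text: str) -> str:
    """Replace each letter with its five-letter group; anything else is dropped."""
    return "".join(ENCODE[ch] for ch in text.upper() if ch in ENCODE)


def bacon_decipher(code: str) -> str:
    """Read the groups back five at a time (I/J and U/V come back as I and U)."""
    return "".join(DECODE[code[i:i + 5]] for i in range(0, len(code), 5))


secret = bacon_encipher("Bacon")
print(secret)                  # aaaabaaaaaaaabaabbababbaa
print(bacon_decipher(secret))  # BACON
```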
ASCII, of course, also creates a simple code. In capital letters, the message ‘BINARY’ is written as:
1000010 1001001 1001110 1000001 1010010 1011001
which can also be written as a single string or broken into blocks of five. The stream and block ciphers described below are two kinds of computer encryption that can still be done by hand (at a far slower pace than a computer!), and so could still be used as simple ciphers.
ASCII binary numbers
A 1000001    B 1000010    C 1000011    D 1000100
E 1000101    F 1000110    G 1000111    H 1001000
I 1001001    J 1001010    K 1001011    L 1001100
M 1001101    N 1001110    O 1001111    P 1010000
Q 1010001    R 1010010    S 1010011    T 1010100
U 1010101    V 1010110    W 1010111    X 1011000
Y 1011001    Z 1011010
1 0110001    2 0110010    3 0110011    4 0110100
5 0110101    6 0110110    7 0110111    8 0111000
9 0111001
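The chart above does not need to be copied out by hand. Each code is just the character’s ASCII number written in seven binary digits. Here is a minimal Python sketch, assuming plain ASCII text. The function names are our own:

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return the seven-bit ASCII code of each character as a string of 0s and 1s."""
    return [format(ord(ch), "07b") for ch in text]


def from_ascii_bits(bits: str) -> str:
    """Read a single run of bits seven at a time and turn each group back into a character."""
    return "".join(chr(int(bits[i:i + 7], 2)) for i in range(0, len(bits), 7))


codes = to_ascii_bits("BINARY")
print(" ".join(codes))                  # 1000010 1001001 1001110 1000001 1010010 1011001
print(from_ascii_bits("".join(codes)))  # BINARY
```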
In binary code, simply changing every bit would produce a cipher that is very easy to break. But a keyword can be turned into a repeating string of bits (known as a key stream sequence). Each key bit can then be read as an instruction: ‘0’ means ‘leave the message bit alone’ and ‘1’ means ‘change it’. This concept was introduced in 1919 by Gilbert Vernam as a way of enciphering Baudot messages. It is best expressed in a simple table and is known as the XOR (exclusive or) operation:
Message bit   0   0   1   1
Key bit       0   1   0   1
Result        0   1   1   0
Binary code can be transmitted in ‘on’ or ‘off’ pulses where a timing mechanism checks the line, say, every tenth of a second, to see if a pulse can be sensed or not. Binary code can also be transferred to paper tape by punching, or not punching, holes at regular intervals.
Vernam devised a system in which two punched tapes, one holding the plaintext, the other a key of random characters, were fed together into an adapted teletypewriter. At each position, if exactly one of the two tapes had a hole, a pulse was transmitted. If both tapes had a hole, or neither did, no pulse was sent. This is the XOR operation carried out mechanically.
This allowed messages typed in plaintext to be transmitted instantly, encrypted automatically, and decrypted automatically by a receiver using an identical key tape. It was a huge advance in cryptography, partly because no one had to sit and encrypt or decrypt the message by hand. It was typed in as normal, transmitted in code and fed out at the other end as plaintext. You didn’t need a skilled cipher clerk at either end of the process, although you did need someone who could run the machine properly.
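Using Vernam’s method, but employing today’s seven-bit ASCII code, the plaintext ‘COMPUTERS’ can be encrypted using the keyword BAUDOT as follows (the keyword is repeated as often as is needed):
Plaintext    C        O        M        P        U        T        E        R        S
ASCII        1000011  1001111  1001101  1010000  1010101  1010100  1000101  1010010  1010011
Keyword      B        A        U        D        O        T        B        A        U
ASCII        1000010  1000001  1010101  1000100  1001111  1010100  1000010  1000001  1010101
Ciphertext   0000001  0001110  0011000  0010100  0011010  0000000  0000111  0010011  0000110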
The ciphertext is created by adding each bit of the ASCII message to the matching bit of the ASCII keyword without carrying, i.e. 0 + 0 = 0; 0 + 1 = 1; 1 + 0 = 1; 1 + 1 = 0. This is called a stream cipher.
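The message is decrypted by repeating the process. Adding the same keyword bits to the ciphertext changes back every bit that was changed, restoring the original ASCII message.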
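Vernam’s XOR can be sketched in Python with the ^ operator. The keyword BAUDOT is repeated to match the worked example. A short repeating keyword like this is far weaker than the random key tape Vernam used. The function names are our own:

```python
from itertools import cycle


def vernam_encrypt(message: str, keyword: str) -> list[str]:
    """XOR each character's seven-bit ASCII code with the matching keyword character.

    cycle() repeats the keyword as often as needed. Results are kept as bit
    strings because many of them are unprintable control codes.
    """
    return [format(ord(m) ^ ord(k), "07b") for m, k in zip(message, cycle(keyword))]


def vernam_decrypt(cipher_bits: list[str], keyword: str) -> str:
    """XOR with the same key again: (m XOR k) XOR k gives back m."""
    return "".join(chr(int(b, 2) ^ ord(k)) for b, k in zip(cipher_bits, cycle(keyword)))


cipher = vernam_encrypt("COMPUTERS", "BAUDOT")
print(" ".join(cipher))
# 0000001 0001110 0011000 0010100 0011010 0000000 0000111 0010011 0000110
print(vernam_decrypt(cipher, "BAUDOT"))  # COMPUTERS
```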
Another encryption method is the block cipher. Here the bits of the ASCII message are grouped into threes, and each group is converted into a single digit from 0 to 7 using the binary column values 4, 2 and 1. For example, the same ‘COMPUTERS’ message can now be encrypted as follows:
Block    100  001  110  011  111  001  101  101  000  010  101
Number    4    1    6    3    7    1    5    5    0    2    5

Block    011  010  100  100  010  110  100  101  010  011
Number    3    2    4    4    2    6    4    5    2    3
This shorter ciphertext of 416371550253244264523 can still easily be converted back to binary numbers and then into characters.
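Grouping bits in threes is the same as writing them in base eight. A Python sketch, with function names of our own, reproduces the digits above. The nine seven-bit characters of COMPUTERS make 63 bits, which divide exactly into 21 groups. Other messages would need padding, which this sketch does not attempt. No key is involved, which is why the result can be converted back so easily:

```python
def bits_to_digits(bits: str) -> str:
    """Split a bit string into groups of three and write each group as a digit 0-7."""
    if len(bits) % 3:
        raise ValueError("bit string length must be a multiple of three")
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))


def digits_to_bits(digits: str) -> str:
    """Turn each digit back into its three bits (the 4s, 2s and 1s columns)."""
    return "".join(format(int(d), "03b") for d in digits)


message_bits = "".join(format(ord(ch), "07b") for ch in "COMPUTERS")  # 9 x 7 = 63 bits
digits = bits_to_digits(message_bits)
print(digits)  # 416371550253244264523

restored = digits_to_bits(digits)
text = "".join(chr(int(restored[i:i + 7], 2)) for i in range(0, len(restored), 7))
print(text)  # COMPUTERS
```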
With the coming of the internet and email, electronic messages are relayed around the globe by cable and satellite. This is fast, but such signals can be picked up by anyone, so it is easy for someone with the right knowledge and software to intercept messages. Stream and block ciphers are efficient methods of encryption, but they suffer from the problem shared by every key-based encryption method: distributing the key.
ALICE AND BOB
By tradition, most explanations about message verification and the use of keys utilize the names Alice and Bob – involving both sexes and offering human-sounding substitutes for A and B. The evil interceptor is usually referred to as Eve, which is either a biblical reference or short for ‘eavesdropper’.
For decades, academics and mathematicians struggled to find a way to keep coded communication secure. In theory, there is a simple way to achieve message security:
Alice wants to send a secret message to Bob.
• She puts it in a box, padlocks it and sends it to Bob without the key.
• Bob attaches his own padlock to the box and returns it, again without the key.
• Alice now removes her padlock and returns the box.
• Bob can now open the box with his own key.
This is known as an asymmetric key system, in which a different key (or, if you prefer, combination) is required to decrypt rather than encrypt. In a symmetric key system, Bob would simply use a copy of Alice’s key to open the box, which re-introduces the key distribution problem. The asymmetric approach works with physical padlocks, because two different locks can be attached to one box side by side and removed in any order. If the ‘padlock’ is a code, however, the process generally breaks down. Encrypting Alice’s already-encrypted message is like Bob putting her locked box inside a second box of his own, which he then locks and dispatches. Alice can no longer reach her original padlock to remove it.
PRIME NUMBERS
Prime numbers have no factors apart from 1 and themselves. So 2, 3, 5 and 7 are prime, but 4, 6, 8 and 9 are not, as they can be divided by other numbers. The prime numbers used in public key encryption are far larger, typically hundreds of digits long.
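Trial division is the simplest way to test a small number for primality. If n has a factor other than 1 and itself, one of its factors must be no larger than the square root of n, so the search can stop there. This Python sketch is only meant for small numbers. It would be far too slow for the huge primes used in encryption, where much faster probabilistic tests are used instead:

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if no whole number from 2 up to sqrt(n) divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # any factor pair a * b = n has a <= sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True


print([n for n in range(2, 10) if is_prime(n)])  # [2, 3, 5, 7]
```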
In modular arithmetic, numbers wrap around once they reach a certain value, called the modulus. The clearest simple example is a clock face. Add six hours to 9 o’clock and you reach not 15 o’clock but 3 o’clock, because 9 + 6 = 15, and taking away the 12 hours of the clock face leaves 3.
In 1976, Martin Hellman hit upon a way of using modular arithmetic to allow Alice and Bob to exchange information in a similar way to the example above. They could use the results of their calculations as their key, knowing that no one listening in could work out the key. The method is called the Diffie–Hellman–Merkle key exchange scheme, named after Hellman and his two colleagues, Whitfield Diffie and Ralph Merkle. However, it needed to be made more efficient, with fewer exchanges of information, to be usable. This was achieved by public key systems such as RSA, which relies on the difficulty of factoring the product of two large prime numbers, and ElGamal, which builds directly on the Diffie–Hellman–Merkle method. Detailed explanations of the maths involved in the scheme can be found at these websites: http://en.wikipedia.org/wiki/Diffie-Hellman and www.vectorsite.net/ttcode_10.html
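The exchange can be sketched with toy numbers in Python. Everything here is chosen purely for illustration: the prime 23, the base 5 and the secret numbers 4 and 3. Python’s built-in three-argument pow computes powers in modular arithmetic. Eve can see p, g and the two public values. To find the key she would have to work backwards from 4 and 10 to a secret exponent, which is easy at this size and impractical when the prime is hundreds of digits long:

```python
# Public values, agreed openly (tiny numbers for illustration only).
p = 23  # prime modulus: a "clock face" with 23 positions
g = 5   # base

# Private values, never transmitted.
alice_secret = 4
bob_secret = 3

# Each side sends g^secret mod p over the open channel.
alice_public = pow(g, alice_secret, p)  # 5^4 mod 23 = 4
bob_public = pow(g, bob_secret, p)      # 5^3 mod 23 = 10

# Each side raises the other's public value to its own secret power.
alice_key = pow(bob_public, alice_secret, p)  # 10^4 mod 23 = 18
bob_key = pow(alice_public, bob_secret, p)    # 4^3 mod 23 = 18

print(alice_public, bob_public, alice_key, bob_key)  # 4 10 18 18
```

Both sides end up with the same key, 18, because (5^3)^4 and (5^4)^3 are both 5^12, and that equality still holds after reducing modulo 23.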
Computer encryption became a genuinely public tool in the early 1990s with the advent of Pretty Good Privacy (PGP), a computer program that provides cryptographic privacy and authentication, allowing anybody to use the RSA method of encryption without having to deal with its complex mathematics. It was created by Phil Zimmermann and is mainly used to protect email, which is otherwise often sent completely unencrypted, although it can also be used to encrypt other computer data.
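PGP hides RSA’s arithmetic from the user, but a toy-sized version fits in a few lines of Python. This is a minimal sketch, assuming two tiny primes (61 and 53) and a message already turned into a number. Real keys use primes hundreds of digits long, and real software adds padding and other safeguards left out here:

```python
# Key generation with two small primes.
p, q = 61, 53
n = p * q                # 3233, the public modulus
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent; must share no factor with phi
d = pow(e, -1, phi)      # 2753, the private exponent (modular inverse, Python 3.8+)

message = 65                       # a message already encoded as a number smaller than n
ciphertext = pow(message, e, n)    # anyone can encrypt with the public pair (n, e)
decrypted = pow(ciphertext, d, n)  # only the holder of d can decrypt

print(n, d, ciphertext, decrypted)  # 3233 2753 2790 65
```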
Its introduction was stimulated by the extraordinary rise in internet and email communication, which echoed the impact of the invention of the telegraph and radio in the past. Communication has never been easier, and nor has interference in it. Digital communication raises a number of issues, stemming from the fact that anyone, anywhere in the world, can send and intercept messages. There are enormous opportunities for fraud, scams and interference in the workings of other computers.
Most purchases made on the internet use cryptography to protect credit card numbers and other financial information. This is done via the Secure Sockets Layer (SSL) protocol and, increasingly, its successor, Transport Layer Security (TLS). Pages protected in this way have an ‘https’ prefix instead of the conventional ‘http’, and use a blend of protocols and algorithms to handle key exchange, authentication and encrypted communication.
For most of us, thinking up a password or username to type into our computer is the nearest we get to creating a secret code, and most people are terrible at it. Any system is only as strong as its weakest link, and passwords are by far the easiest part of a security system to attack.
A GOOD PASSWORD
So a good password should follow these rules:
• Use the maximum possible number of characters.
• Do not use recognized words or names.
• VaRy capiTaliZation.
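• Include random numbers or symbols as well as letters. (This is sometimes loosely called ‘salting’, although strictly that term describes random data that a system adds to each password before storing it.)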
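The rules above can be followed automatically. This Python sketch uses the standard secrets module, which is intended for security-sensitive random choices. The default length of 16 and the particular set of symbols are assumptions, since sites differ in what they allow:

```python
import secrets
import string

# Assumed set of permitted symbols; real sites differ in what they accept.
SYMBOLS = "!#$%&*+-=?@^_"
ALPHABET = string.ascii_letters + string.digits + SYMBOLS


def make_password(length: int = 16) -> str:
    """Build a random password containing lower case, upper case, digits and symbols."""
    if length < 4:
        raise ValueError("length must leave room for all four kinds of character")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Keep drawing until every kind of character is present.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate


print(make_password())  # a different random string on every run
```

Such a password is hard to remember, which is exactly the trade-off discussed next.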
There is a basic problem: we are encouraged to create passwords we can remember easily, and discouraged from writing them down. So there is a strong temptation to use words related to the task (‘password’ is a common password!) or names of family members, and to keep the password short – studies suggest more than one in seven passwords are only three keys long. Forcing people to change their password every month isn’t effective because even if they follow the rule, they tend to choose passwords they have used already, often simply alternating them with each enforced change.
One of the biggest sins is to choose letter-only passwords, especially those that spell a word. But, of course, this is the easiest way to create a password that you can be sure of remembering.
Someone who wants to find your password will try the most obvious choices first – family names, maiden name, mother’s maiden name. In the age of the internet, such data is relatively easy to find. The computer can also be used to test out other possible passwords through a dictionary attack. In this, a computer simply tries every word in the dictionary until it hits the right one. Powerful machines can do this in a matter of seconds. If they fail, they’ll do the same with reversed words, varied capitalization and extra numbers.
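A dictionary attack is simple to sketch. For illustration only, assume the attacker has stolen an unsalted SHA-256 hash of the password. Systems store passwords in many different ways. The program below tries each word in a tiny hypothetical word list, then the same words reversed, capitalized and with a digit added:

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def variations(word: str):
    """Reversed, capitalized, all capitals, then with a single digit added."""
    yield word[::-1]
    yield word.capitalize()
    yield word.upper()
    for digit in "0123456789":
        yield word + digit


def dictionary_attack(target_hash: str, wordlist: list[str]) -> Optional[str]:
    # First pass: every word exactly as it appears in the list.
    for word in wordlist:
        if sha256_hex(word) == target_hash:
            return word
    # Second pass: the same words with simple variations.
    for word in wordlist:
        for guess in variations(word):
            if sha256_hex(guess) == target_hash:
                return guess
    return None


WORDS = ["password", "dragon", "monkey", "sunshine"]  # tiny hypothetical word list
stolen_hash = sha256_hex("Monkey")  # pretend this was taken from a careless system
print(dictionary_attack(stolen_hash, WORDS))  # Monkey
```

A real word list runs to millions of entries, but the loop is the same.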
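A more sophisticated way to find your password is a timing attack. One version notes how long it takes you to key in your password, allowing the computer to estimate the likely number of characters in it. Another measures how long it takes for a guessed password to be rejected. Suppose the system compares a guess with the real password one character at a time and stops at the first mismatch. A guess that starts with more correct characters then takes slightly longer to reject, so the more time, the closer the guess is likely to be.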
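The second kind of timing attack depends on how the password check is written. In this Python sketch, counting the characters examined stands in for measuring time. Real differences are tiny and usually have to be teased out statistically over many attempts. The password and the naive_check function are hypothetical:

```python
import hmac


def naive_check(guess: str, secret: str) -> tuple[bool, int]:
    """Compare one character at a time and stop at the first mismatch.

    Returns the verdict plus the number of characters examined, used here as a
    stand-in for the time the check takes.
    """
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return len(guess) == len(secret), steps


SECRET = "tulip7"  # hypothetical password

for guess in ["xxxxxx", "tuxxxx", "tulixx"]:
    print(guess, naive_check(guess, SECRET))
# xxxxxx (False, 1)
# tuxxxx (False, 3)
# tulixx (False, 5)

# hmac.compare_digest is designed so that its running time does not depend on
# where the first difference occurs, which removes this clue.
print(hmac.compare_digest(b"tulixx", SECRET.encode()))  # False
```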
Another form of attack is ‘password sniffing’. This is when a hacker installs software on your computer that stores the first few keystrokes of every session, which is very likely to include your password.
The trouble is, of course, that a good, complex password is then hard to remember, so you’ll most probably need to write it down. Provided it is kept in a secure place (not on a note stuck to the computer!) this is a sensible option – your attacker is probably tapping a keyboard thousands of miles away, not snooping around your desk. This method allows you to create trickier passwords without needing to make them memorable. You’ve got the same level of security as the code books that were used for centuries.
Encryption is used to protect financial information sent by hole-in-the-wall Automatic Teller Machines (ATMs). The customer places their plastic card with its magnetic strip (and, increasingly, its identifying ‘chip’) in the machine and enters their Personal Identification Number (PIN). This is communicated to a central computer, which checks the data and ascertains if the customer is permitted to make a withdrawal – so information has to pass both ways.
The four-digit PIN is ‘padded’ with extra digits and all data is sent in encrypted form using a Data Encryption Standard (DES) cipher. The PIN obviously takes the place of a ‘password’ or ‘key’, and since there are only 10,000 possible combinations of four-digit numbers (a tiny figure compared to the multi-digit encryption possibilities generated by RSA) the ATM only allows three attempts to input the number before retaining the card. This is starting to seem generous given that a recently devised, extremely complex mathematical attack method can allegedly identify a PIN in about 15 guesses.
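Assume a thief knows nothing about the PIN and tries three different numbers, each equally likely to be right. The chance of success before the card is retained then works out as follows:

```latex
\text{possible PINs} = 10 \times 10 \times 10 \times 10 = 10^{4} = 10\,000
P(\text{correct within 3 attempts}) = \frac{3}{10\,000} = 0.0003 = 0.03\%
```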
Standard cryptography relies on the laws of mathematics. Quantum cryptography instead uses the (highly complex) principles of quantum mechanics and the physics of information. Communication is typically via photons (particles of light) sent through optical fibres. Because measuring a quantum state disturbs it, any eavesdropping introduces detectable changes, and communication can be halted until the channel is safe again.
Additionally, quantum computers could, in theory, factor large numbers incredibly fast. They may therefore be able to break RSA keys, and to search for DES and other block cipher keys, far faster than the present generation of conventional computers.
To summarize: quantum technology may be able to find a way to crack codes faster than ever, but also to create secure, closed communications systems.
QUANTUM MECHANICS
The theory of quantum mechanics is a completely different way of looking at the world. It replaces Newtonian mechanics and classical electromagnetism at the atomic and subatomic levels and underpins various fields of physics and chemistry, such as condensed matter physics, quantum chemistry and particle physics.
CODES IN MUSIC
Composer Edward Elgar loved puzzles and codes, and created a musical puzzle of his own in his Enigma Variations, a series of musical character portraits that is one of his best-loved works. In all, 14 people and a dog are featured in the variations. The people are identified by initials, except in the 13th variation, which may have been about a lover of Elgar’s who had left England. Elgar also revealed that he had ‘hidden’ a well-known tune in the fabric of the score, and for many years musical detectives have tried to identify it. In 1991, the musicologist Joseph Cooper proposed a solution to the conundrum: a passage from the slow movement of Mozart’s Prague Symphony.
Elgar was also the author of the Dorabella cipher, a note he sent in 1897 to his friend Dora Penny (later Mrs Richard Powell). It consists of 87 squiggly characters at various angles in three neat lines. No one has managed to decipher it, and it is thought it may be linked to the mystery surrounding his Enigma Variations.
Composers have long been able to use letters identified with musical notation to build words into their music. There are two methods:
• The Sol Fa scale creates the syllables do, re, mi, fa, sol, la, and si/ti, represented by the notes C, D, E, F, G, A and B.
• Western notation uses the letters A to G. In German musical tradition, however, B natural is called H (the letter B being used for B flat), and E flat (Es) is represented by S, providing a somewhat limited but usable alphabet.
Baroque composers often used these letters to weave the names of friends or places into their music, and Johann Sebastian Bach was particularly fond of spelling out his surname. Robert Schumann’s ABEGG Variations records the name of a woman he was in love with: Meta Abegg.
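Russian composer Dmitri Shostakovich frequently represented himself with the musical motif DSCH (D, E flat, C, B natural in German notation), taken from the German spelling of his initials, D. Sch. He also used the sequence EAEDA to represent his student Elmira Nasirova, spelling out her name as E, La, Mi, Re, A.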
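Both note alphabets are easy to turn into lookup tables. The Python sketch below maps single letters to German note names and longer tokens to fixed-do sol-fa syllables, writing B flat as Bb and E flat as Eb. The function name is our own, and the way a name is split into letters and syllables is taken as given, since that choice belongs to the composer. Letters with no musical equivalent, such as K, raise an error:

```python
# German note names: H is B natural, B is B flat, S ("Es") is E flat.
GERMAN = {"A": "A", "B": "Bb", "C": "C", "D": "D", "E": "E",
          "F": "F", "G": "G", "H": "B", "S": "Eb"}
# Fixed-do sol-fa syllables and the notes they stand for.
SOLFA = {"do": "C", "re": "D", "mi": "E", "fa": "F", "sol": "G",
         "la": "A", "si": "B", "ti": "B"}


def to_notes(tokens):
    """Single letters are read as German note names, longer tokens as sol-fa syllables."""
    notes = []
    for token in tokens:
        if len(token) == 1:
            notes.append(GERMAN[token.upper()])
        else:
            notes.append(SOLFA[token.lower()])
    return notes


print(to_notes(list("BACH")))                  # ['Bb', 'A', 'C', 'B']
print(to_notes(list("DSCH")))                  # ['D', 'Eb', 'C', 'B']
print(to_notes(["E", "La", "Mi", "Re", "A"]))  # ['E', 'A', 'E', 'D', 'A']
```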
The new doughnut-shaped spy centre in Cheltenham, headquarters of the UK’s signals intelligence agency, is the latest instalment in an international saga of concealing and probing communications that goes back centuries.
As explorers found new lands and places with which to trade, countries needed diplomats to negotiate with other governments around the world. Inevitably, their messages would be intercepted, so they started using codes to conceal their content. Thus, the ‘black chambers’ were born. In many countries from the 16th century onwards, missives sent by foreign diplomats were routinely intercepted (often by bribing low-paid or greedy officials), opened, copied, re-sealed and sent on their way while clerks set about breaking their codes. The practice became even more widespread when England broke away from the Catholic Church, as European countries and the papacy discussed the significance of the move and manoeuvred for political advantage from it.
This culture of secrecy and subterfuge stimulated cryptological endeavour. In 16th-century Venice, there were specialist schools on the subject, such was the demand for cryptography in this commercial and diplomatic centre. In England, the success of Elizabeth I’s spymaster Walsingham in trapping her rival Mary, Queen of Scots was down to his efficient team of code breakers. In 1703, William Blencowe became the first Englishman to get a regular salary for cryptanalysis, receiving £100 a year and taking on the title Decrypter. In practice the job had existed for years. Blencowe was taking over from his grandfather, John Wallis, who had trained him in the dark art of unravelling secrets from coded writing.
NEEDLING OUT INFORMATION
One efficient method for seeing letters sealed in envelopes used a long needle rather like an old-fashioned sardine tin opening key. The needle was slipped into a corner of the envelope and turned, rolling up the paper inside. This could then be removed, copied and returned with the same method, leaving the envelope seal undisturbed. The only evidence of tampering was a small hole in the corner of the envelope.
At this time, the most active and efficient black chamber in the world was the Geheime Kabinets-Kanzlei in Vienna, Austria. Here the day’s intercepted missives arrived at 7 o’clock each morning. They were immediately dictated to secretaries to prepare the copy that the cryptanalysts would set to work decoding, while the original message went on its way. Staff received financial incentives to master new languages, and successful decryptions earned a substantial bonus, paid in person by the grateful king, Karl V. They even received compensation for lost bonuses if one of their spy colleagues succeeded in stealing solutions direct from the embassies!
The analysts worked one week on, one week off in their Viennese office in recognition of the mental strain of their job. Indeed, there is a long history of rapid weight loss, stress and even nervous breakdown associated with the people working in this field. Cryptanalysts have often reported difficulty in sleeping, and recurrent dreams in which they are faced with impossibly big searches, like finding the right pebble on a beach.
Throughout the 18th and early 19th centuries, the black chambers were a secret mini-industry that recruited the brightest minds and trained them to puzzle out the secret messages of friends and foes. Such were their expertise and value that they were often kept on by new administrations and monarchs even when other officials who had served the previous government were dismissed.
NO SUCH AGENCY
The National Security Agency (NSA) is said to be the world’s major employer of top mathematicians, to own the largest group of supercomputers, and to have a bigger budget than the CIA. This is despite it being surrounded by such secrecy that its initials were said to stand for ‘No Such Agency’.
A recurring issue in the history of code breaking is what to do with the information gathered, because of the risk that your opponent will realize their codes have been broken and so change them. Diplomats would remain tight-lipped as other ambassadors expressed opinions they knew to be the opposite of the views expressed in secret communication with their superiors. At one point, the Spanish government was a laughing stock among diplomats because its codes were so easy to break. Its rivals only chuckled behind their hands, however, because the information they were receiving was too useful to give away. In World War II, the British sometimes took no action over decrypted messages for fear of alerting the Germans to their ability to read Enigma messages. At the time of the Pearl Harbor attack, the Americans had learned that Japan was about to break off diplomatic relations before the Japanese ambassador could even deliver the message.
CODE COURSES
A number of universities in the UK and the US now offer courses about information security in which much of the content is about cryptography. Graduates tend to go on to work as IT security managers or consultants.
An amusing variation on this theme of whether to reveal or conceal what you know is the story of the siege of Réalmont in 1628 by Henri II de Bourbon, Prince of Condé. A decoded intercepted message revealed that the defenders had few supplies left. Condé sent the decoded plaintext of their letter back into the town, and the defenders surrendered, knowing they had no chance of success.
After World War I, the US founded MI-8, a code-breaking team operating under the guise of a New York commercial code company. It was closed down in 1929 on the orders of Secretary of State Henry Stimson, allegedly with the comment ‘Gentlemen don’t read other gentlemen’s mail’. This deprived its key worker, Herbert Yardley, of an income, so he wrote a book about his work, The American Black Chamber (1931). The book alerted the Japanese to the fact that America had broken their codes, which they subsequently redeveloped completely.
Later reformed, MI-8 became the National Security Agency (NSA), combining its work with the Central Security Service (CSS).
Britain’s black chamber moved to Room 40 at the Admiralty in London during World War I. It later evolved into Bletchley Park and various other sites, and today the Government Communications Headquarters (GCHQ) at Cheltenham is the UK centre for signals intelligence and information protection.
Both the American and British secrets-busting organizations run websites explaining some of what they do.