In 2016, the Norwegian Consumer Council evaluated three Internet-connected dolls. The group found that the companies’ terms of use and privacy policies showed a “disconcerting lack of regard for basic consumer and privacy rights,” and were “generally vague about data retention,” and that two of the toys “transfer personal information to a commercial third party, who reserves the right to use this information for practically any purpose, unrelated to the functionality of the toys themselves.” It gets worse:
It was discovered that two of the toys have practically no embedded security. This means that anyone may gain access to the microphone and speakers within the toys, without requiring physical access to the products. . . .
Furthermore, the tests found evidence that voice data is being transferred to a company in the US, who also specialize in collecting biometric data such as voice-fingerprinting. Finally, it was revealed that two of the toys are embedded with pre-programmed phrases endorsing different commercial products, which practically constitutes product-placement within the toys themselves.
I use one of the dolls, My Friend Cayla, as a demonstration in the Internet security policy class I teach at Harvard Kennedy School. It’s ridiculously easy for even my nontechnical students to hack. All they need to do is open up their phone’s Bluetooth control panel and connect to the doll from their seats. They can then eavesdrop on what the toy hears, and send messages through the toy’s speakers. It’s a super-creepy demonstration of how bad the security of a commercial product can get. Germany banned My Friend Cayla because it’s effectively an eavesdropping device that leaves the audio it records unprotected on the Internet, although it’s still for sale in other countries. And it’s not just one-off dolls; Mattel’s Hello Barbie had similar problems.
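To give a sense of how low the bar is, here is a minimal sketch of the kind of unauthenticated connection such a toy permits. It assumes the PyBluez Python library and a hypothetical device address; a properly designed toy would refuse to talk to a stranger’s phone without pairing and authentication.

```python
# Illustrative sketch only: discovering nearby Bluetooth devices and connecting
# to one that, like the doll, accepts connections without pairing or a PIN.
# Requires the PyBluez library; the target address and channel are hypothetical.
import bluetooth  # PyBluez

# Scan for nearby discoverable devices.
nearby = bluetooth.discover_devices(duration=8, lookup_names=True)
for addr, name in nearby:
    print(f"Found {name} at {addr}")

target_addr = "00:11:22:33:44:55"  # hypothetical address of the toy
sock = bluetooth.BluetoothSocket(bluetooth.RFCOMM)
sock.connect((target_addr, 1))     # channel 1, assumed for illustration
# At this point an insecure device will happily accept audio or commands.
sock.send(b"hello")
sock.close()
```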
In 2017, the consumer credit-reporting agency Equifax announced that 150 million Americans, just under half of the population, had had their personal data stolen. The attackers gained access to full names, Social Security numbers, birth dates, addresses, and driver’s license numbers—exactly the information needed to commit the identity theft frauds I talked about in Chapter 4. This was not a sophisticated attack, and we still have no idea who did it. The attackers exploited a critical vulnerability in Apache Struts, a web-application framework, for which a patch had been available for two months. Equifax had been notified about the vulnerability by Apache, US-CERT, and the Department of Homeland Security, but didn’t get around to installing the patch until months after the attackers used the flaw to breach its network. The company’s insecurity was incredible. When I testified about it to the House Energy and Commerce Committee, I called it “laughably bad.” And it wasn’t an isolated incident; Equifax had a history of security failures.
I wish these were exceptional stories, but they’re not. It really is that bad out there. And without some serious intervention, it won’t get any better.
In a nutshell, what we need to do is to engineer “security by design.” For the engineering reasons discussed in Chapter 1 and the political/market reasons discussed in Chapter 4, security often takes a back seat to speed of development and additional features. Even at larger companies that should know better, computer security is traditionally treated as a compliance exercise that slows development and adds cost. It gets shoehorned in at the end of the development process, hastily and not very effectively. This has to change. Security needs to be engineered into every system, and every component of every system, from the beginning and throughout the development process.
I admit this sounds obvious, but you have to remember that security isn’t something that was designed into the Internet from the beginning, and it’s not something that the market generally rewards. It’s a bit like the slow process by which we all convinced automobile companies, through regulation and market pressure, to embrace “fuel efficiency by design.”
Highly regulated industries like avionics and medical devices already employ security by design. We also see it in banking applications, and from operating system companies like Apple and Microsoft. But the practice needs to spread beyond those isolated cases.
We need to secure the Internet+. We need to secure our software, data, and algorithms. We need to secure our critical infrastructure and our computing supply chain. We need to do it comprehensively, and we need to do it now. This chapter is an attempt to work out some broad outlines of what that might look like. I’m focusing on the what, saving the how and the who for the following two chapters.
To be fair, these are only the basics, and there are many subtleties to the threats discussed in Part I that I won’t address at all. The recommendations in this chapter aren’t meant to be definitive; they’re a starting place for discussion. All the principles proposed here need to be expanded, and eventually make their way into voluntary or mandatory industry standards.
But if we don’t start somewhere, we’ll never get to the hard stuff.
When the Internet was nascent, it made some sense to let anything connect to it, but that’s no longer tenable. We need to establish security standards for computers, software, and devices. That might sound easy, but it’s not. Because software is now embedded into everything, this quickly turns into security standards that encompass everything—which is simply too broad to be sensible.
But if everything is a computer, we need to think about holistic design principles. All devices need to be secure without much intervention by users. And while it is fine to have different levels of security in response to different threats, everything should start from a common base.
To that end, I offer ten high-level design principles to improve both the security and the privacy of our devices. While these are not specific enough to be standards, they are a basis from which standards can be developed.
Those principles, and some of the items in the next section, are from a working group on national security—of which I am a member—organized by the Berkman Klein Center for Internet and Society and funded by the Hewlett Foundation.
None of these principles is new or radical. While researching for this book, I collected 19 different security and privacy guidelines for the IoT, created by the IoT Security Foundation, the Online Trust Alliance, the state of New York, and other organizations. They’re all similar, which makes them a good indication of what security professionals think should be done. But they’re all voluntary, so no one actually follows them.
Just as we need security design principles for computers, we need them for data. It used to be that the two were basically the same, but today they are separate. We no longer store our most personal data on computers that are physically close to us; we store them in the cloud, on massive servers owned by others—possibly in other countries.
Often our data, too, is owned by others—collected without our knowledge or consent. These databases are tempting targets for attackers of all stripes. We need security principles surrounding data and databases that would apply to all organizations that keep personal databases:
Critical to any rules covering personal information will be a definition of what personal information is. Traditionally, we’ve defined it very narrowly, using the term “PII”—personally identifiable information. That’s not sufficient. We now know that all sorts of information can be combined to identify individuals, and that anonymizing data is much harder than it seems. We need a very broad definition of what counts as personal information, and thus needs to be protected as such—for example, data from the apps on your phone, and even the list of add-ons installed in your browser.
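As a rough illustration of why such a broad definition is needed, consider how a handful of individually innocuous attributes combine into a de facto identifier. The attribute values below are hypothetical, and real fingerprinting systems use many more signals; the point is only that the combination, not any single field, is what identifies you.

```python
# Minimal sketch: no single attribute is "personally identifiable," but the
# combination is often unique enough to act as a persistent identifier.
import hashlib

attributes = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",  # hypothetical
    "timezone": "America/New_York",
    "screen": "2560x1440",
    "language": "en-US",
    "browser_addons": "uBlock Origin, LastPass, Zotero Connector",
}

# Serialize the attributes in a stable order and hash them into a fingerprint.
canonical = "|".join(f"{k}={v}" for k, v in sorted(attributes.items()))
fingerprint = hashlib.sha256(canonical.encode()).hexdigest()
print(fingerprint[:16])  # a stable pseudonymous ID that can follow this user around
```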
These criteria appear in my 2015 book Data and Goliath. Most of them are part of the EU’s General Data Protection Regulation, which I’ll talk about in Chapter 10. Again, they are general design principles. And they are probably the hardest sell in this chapter. Companies will fight being forced to secure their devices, but it will benefit them in the long run. Rules about securing data have the potential to threaten surveillance capitalism. Companies will argue that they need to collect everything for possible future analysis, to train machine-learning systems, and because it might be valuable someday. But we’ll need such rules as databases of personal information become larger and ever more personal.
Much of this data will be in the cloud. This trend is a matter of simple economics and will be the model of computing for the foreseeable future. In many ways, this is a good thing. In fact, I think that people moving their data and processing into the cloud is our most fruitful avenue for security improvements. Already, Google does a better job of securing our data than most individuals or small businesses can do themselves. Cloud providers have both the security expertise and economies of scale that individuals and small businesses lack, and anything that gives people security without their having to become security experts is a win.
Still, there are risks: having multiple different users on the same network increases the opportunities for internal hacking, and large cloud providers—like large databases of personal data—are enticing targets for powerful attackers. We need significantly more research in cloud security. While most of the principles I listed in this section are germane to the amassers of personal databases, some also apply to cloud computing providers.
We expect a lot from our algorithms. And as they continue to replace human beings in decision processes, we’re going to need to trust them absolutely. At a high level, we expect accuracy, fairness, reproducibility, respectfulness of human and other rights, and so on. I’m focusing on security.
The threat is basically that an algorithm will behave in an unintended manner, either because it was programmed badly or because its data or software was hacked. Transparency is an obvious solution. The more transparent an algorithm is, the more it can be inspected and audited—for security or any other property we want our algorithms to have.
The problem is that transparency isn’t always achievable in algorithms, or even desirable. Companies have legitimate trade secrets that they need to keep confidential. Transparency can represent a security risk, because it gives attackers information that can help them game the system. For example, knowing Google’s algorithm for page ranking can help people optimize websites for the algorithm, and knowing the military’s algorithm for identifying people by drone can help individuals hide.
Additionally, transparency is not always sufficient. Modern algorithms are so complex that it’s not even feasible to determine if they’re accurate, let alone fair or secure. Some machine-learning algorithms have models that are simply beyond human comprehension.
That last point is important. Sometimes transparency is impossible. No one knows how some machine-learning algorithms work, including their designers. The algorithms are fundamentally incomprehensible to humans. Think of them as black boxes: data goes in, decisions come out, and what happens in between remains something of a mystery.
Even if an algorithm can’t be made public, or if there is no way to understand how it works, we can demand explainability. That is, we can demand that algorithms explain their reasoning. So, for example, when an algorithm makes a medical diagnosis or scores a job candidate for suitability, it can also be required to provide reasons for its decisions.
This isn’t a panacea. Because of the way machine learning works, explanations might not be possible or understandable by humans, and requiring them often reduces the accuracy of the underlying algorithms because it forces them to be simpler than they would otherwise be.
So maybe what we really want is accountability. Or contestability. Perhaps we need the ability to inspect an algorithm, or interrogate it with sample data and examine the results. Maybe all we need is auditability.
If nothing else, we can treat algorithms like humans. Humans are terrible at explaining their reasoning, and their decisions are filled with unconscious biases. Too often, an explanation—a logical series of steps taken to reach a decision—is really nothing more than a justification. Our subconscious brain makes the decision, and the conscious brain justifies it with an explanation. The psychological literature is filled with studies that demonstrate this.
Still, we are able to judge humans’ biases by looking at their decisions. Similarly, we can judge algorithms by looking at their outputs. After all, what we want to know is if an algorithm used to score job candidates is sexist, or if an algorithm used to make parole decisions is racist. And we might decide that, for some applications, machine-learning algorithms are simply not appropriate, because we want more control over how a decision is made.
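As a rough illustration of what judging an algorithm by its outputs can look like, here is a minimal sketch. The scoring function, the candidate records, and the 80% rule of thumb are all hypothetical; the point is that we can measure whether a black-box model treats groups differently without ever seeing inside it.

```python
# Minimal sketch of an output audit: score_candidate() stands in for any
# black-box model; the data and thresholds are invented for illustration.
from collections import defaultdict

def score_candidate(candidate):  # hypothetical black-box hiring model
    return candidate["years_experience"] * 0.1 + candidate["test_score"] * 0.9

candidates = [
    {"group": "A", "years_experience": 5, "test_score": 0.8},
    {"group": "A", "years_experience": 2, "test_score": 0.6},
    {"group": "B", "years_experience": 6, "test_score": 0.5},
    {"group": "B", "years_experience": 3, "test_score": 0.2},
]

# Count how often each group clears the hiring threshold.
passed, total = defaultdict(int), defaultdict(int)
for c in candidates:
    total[c["group"]] += 1
    if score_candidate(c) >= 0.6:
        passed[c["group"]] += 1

rates = {g: passed[g] / total[g] for g in total}
print(rates)
# If one group's selection rate falls well below another's (say, under 80% of it),
# the audit flags the model for closer scrutiny, no transparency required.
```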
I don’t have any concrete recommendations for how we can secure our algorithms, because this is all too new. We are just getting started figuring out what is possible and feasible. Right now, our goals should be as much transparency, explainability, and auditability as possible.
Most of us connect to the Internet through one or more ISPs. These are large companies like AT&T, Comcast, BT, and China Telecom, and they are very powerful. A 2011 report calculated that the top 25 telecommunications companies in the world connect 80% of all Internet traffic. This centralization might be bad for consumer choice, but it affords us a potential security benefit. Because ISPs sit between our homes and the rest of the Internet, they are in a unique position to provide security—especially for home users. We need some security principles for ISPs:
This list draws from a paper by cybersecurity consultant Melissa Hathaway, who served as a senior cybersecurity policy advisor in both the George W. Bush and Barack Obama administrations.
These principles would grant ISPs considerable power, and that comes with considerable danger. If ISPs can configure users’ security, they can configure it to allow government access. And if they can discriminate between different types of traffic, they can violate net neutrality for all sorts of economic or ideological reasons. These are real concerns, and we need better policies to alleviate them. But users shouldn’t have to be security experts to use the Internet safely, and ISPs will have to step up as a first line of defense.
Heartbleed is the cool name that researchers gave to a serious vulnerability in OpenSSL, the open-source encryption library that protects much of your web browsing. If the connection between your web browser and the website you’re reading is encrypted, the encryption is probably being done by OpenSSL. The protocol is public, and the code is open-source. Heartbleed was discovered in 2014, two years after it was accidentally introduced into the software. It was a huge vulnerability—at the time, I called it “catastrophic”—affecting an estimated 17% of the Internet’s secure web servers, as well as a wide range of other devices, from firewalls to power strips.
The vulnerability allowed attackers to extract usernames and passwords, account numbers, and more from the memory of affected servers. Fixing Heartbleed was a massive undertaking, requiring coordination among websites, certificate authorities, and web browser companies around the world.
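The underlying flaw was a missing bounds check in OpenSSL’s “heartbeat” code, written in C: the server echoed back as many bytes as the requester claimed to have sent, even when the claim exceeded what actually arrived. The sketch below simulates only that logic error, in simplified Python; it is not OpenSSL’s actual code, and the buffer contents are invented.

```python
# Simplified simulation of the Heartbleed class of bug: trusting an
# attacker-supplied length instead of checking it against the real payload size.

# Pretend server memory: the heartbeat payload sits next to sensitive data.
memory = bytearray(b"PING" + b"[secret session key][another user's password]")

def broken_heartbeat(claimed_length: int) -> bytes:
    # BUG: echoes back claimed_length bytes with no bounds check.
    return bytes(memory[:claimed_length])

def fixed_heartbeat(claimed_length: int) -> bytes:
    actual_payload_length = 4  # the "PING" that was really sent
    if claimed_length > actual_payload_length:
        return b""             # the fix: silently discard malformed heartbeats
    return bytes(memory[:claimed_length])

print(broken_heartbeat(64))  # leaks the adjacent secrets
print(fixed_heartbeat(64))   # returns nothing
```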
Two factors led to Heartbleed. One: OpenSSL, a critical piece of software, was maintained by one person and a few helpers, all working for free in their spare time. Two: no one had subjected OpenSSL to a thorough security analysis. It’s a classic collective action problem. The code is open-source, so anyone can evaluate it. But everyone assumed that someone else would, so no one actually did. The result was that the vulnerability went undetected for over two years.
In response to Heartbleed, industry created something called the Core Infrastructure Initiative. Basically, the big tech companies all got together and established a testing program for open-source software that we all rely on. It’s a good idea that should have been done a decade earlier, but it’s not enough.
In Chapter 1, I explained that the Internet was never designed with security in mind. That was okay when the Internet was primarily at research institutions and used primarily for academic communication. That’s less okay today, when the Internet supports much of the world’s critical infrastructure.
ISPs do more than connect consumers to the Internet. “Tier 1” ISPs manage the Internet backbone, running the large, high-capacity networks around the world. These are companies you have likely never heard of—Level 3, Cogent, GTT Communications—because end users aren’t their customers. These companies can also do more to secure the Internet:
There are other things Tier 1 ISPs could do that involve monitoring traffic and interdicting attacks. For instance, they could block all sorts of things: spam, child pornography, Internet attacks, and so on. All of these things, however, currently require ISPs to engage in bulk surveillance of Internet traffic, and they won’t work if the traffic is encrypted. And given the choice, we are much more secure if Internet traffic is end-to-end encrypted. I’ll talk more about that in Chapter 9.
In 2008, unidentified hackers broke into the Baku-Tbilisi-Ceyhan oil pipeline in Turkey. They gained access to the pipeline’s control system and increased the pressure of the crude oil flowing inside, causing the pipe to explode. They also hacked the sensors and video feeds that monitored the pipeline, preventing operators from learning about the explosion until 40 minutes after it happened. (Remember what I said in Chapter 1 about new vulnerabilities in the interconnections? The attackers got into the pipeline control systems through a vulnerability in the communications software of those video cameras.)
In 2013, we learned that the NSA had hacked into the Brazilian national oil company’s network. The NSA’s purpose was almost certainly to gather intelligence and not attack. I’ve already mentioned Iran’s 2012 cyberattack against Saudi Aramco, the Saudi national oil company, and Russia’s 2015 and 2016 cyberattacks against the Ukrainian power grid. In 2017, someone was able to spoof the GPS that ships use to navigate, fooling them as to their location.
I wrote in Chapter 4 that we’re in the middle of an increasingly asymmetrical cyber arms race. With nonstate actors like terrorists, the asymmetry is even greater. We need to better secure our critical infrastructure in cyberspace.
Before we can do that, though, we’ll need to decide what counts as “critical infrastructure.” The term is complex and ambiguous, and what counts varies with shifts in technological and social developments. In the US, a series of documents from the White House and the Department of Homeland Security outlines what the government counts as critical infrastructure. A 2013 presidential directive identified 16 “critical infrastructure sectors.” Much of what’s included is obvious, like air transportation, oil production and storage, and food distribution. Some of it makes less sense, like retail centers and large sports stadiums. Yes, those are places where large numbers of people gather, and it would be a national tragedy if a bomb killed hundreds or thousands in any of those places, but they hardly seem critical in the same way that the power grid does.
If everything is a priority, then nothing is a priority. We need to make some hard choices, designating certain sectors as more vital than others. The 2017 US National Security Strategy identified six key areas: “national security, energy and power, banking and finance, health and safety, communications, and transportation.” Some people add election systems. I think that energy, finance, and telecommunications are the first three to focus on, because they underpin everything else. If we’re looking for where to find most of the near-term catastrophic risks discussed in Chapter 5, it’s there. And it’s where we’ll get the most security for our money.
Why aren’t we doing more to secure critical infrastructure today? There are several reasons:
One: it’s expensive. The threat model we need to defend against is often a sophisticated foreign military unit fielding highly skilled professional attackers. This isn’t easy, and it isn’t cheap.
Two: it’s easy for both the public and policy makers to discount future hypothetical risks. Until US citizens experience an actual cyberattack against critical infrastructure in the US—neither the North Korean attack against Sony nor attacks against other countries like Saudi Arabia and Estonia count here—it’s not going to be a priority.
Three: the political process is complicated. President Obama designated 16 broad sectors as part of our critical infrastructure in order to ensure that every industry felt properly recognized. Any attempt to prioritize will be met with resistance from industries that feel slighted by a lower ranking. So, while it might be easy for me to say that our power grid and telecommunications infrastructure should be secured first because everything else is built on top of them, it’s harder for the government to say that.
Four: the government doesn’t have direct control over most of our critical infrastructure. You’ll often hear that 85% of the US critical infrastructure is in corporate hands. That statistic comes from a 2002 document issued by the Office of Homeland Security, and seems to be a rough guess. Certainly, it depends on which industry we’re talking about. As I explained earlier, private owners are more likely to underspend on security because it’s more profitable to save money every year and take the risk.
And five: spending money on infrastructure isn’t sexy. Even when a country touts its infrastructure investments, it usually means building shiny new bridges rather than repairing rickety old ones. Despite both Presidents Obama and Trump touting their infrastructure investments, spending to maintain what already exists isn’t a priority; just look at our crumbling national infrastructure in so many areas. This problem can be even worse when it comes to security. These expenditures have a long time horizon, and it’s hard to take credit for nothing going wrong. By the time it is obvious that the spending was justified, the politician who approved it might no longer be in office.
That we need to secure our critical infrastructure from cyberattack isn’t a new or controversial idea, and governments, industry groups, and academia have conducted many studies of the issue. The challenges are considerable, though. I’m not discussing specifics, because this book is meant to be general, but any defense will necessarily need to be dynamic, integrating a disparate array of people, organizations, data, and technical capabilities. And our infrastructure is made up of complex systems, with gazillions of subsystems and subcomponents—some of which have been around for decades. Fixing any of this will be expensive, but it’s doable.
One of the top-secret NSA documents disclosed by Edward Snowden was a presentation that contained a slide with then–NSA director Keith Alexander’s motto: “Collect it all.” A similar motto for the Internet+ today might be “Connect it all.” Maybe that’s not such a good idea.
We need to start disconnecting systems. If we cannot secure complex systems to the extent required by their real-world capabilities, then we must not build a world where everything is computerized and interconnected. It’s part of what I meant when I talked about engineering security by design at the beginning of this chapter: if we’re building a system and the only way to secure it is by not connecting it, that should be considered a valid option.
This might be regarded as heresy in today’s race to network everything, but large, centralized systems are not inevitable. Technical and corporate elites may be pushing us in that direction, but they really don’t have any good supporting arguments other than profit maximization.
Disconnecting can happen in several ways. It can mean creating separate “air gapped” networks. (These have vulnerabilities as well, and are not a security panacea.) It can mean going back to non-interoperable systems. And it can mean not building connectivity into systems in the first place. There are also incremental ways to do this. We can enable local communications only. We can design dedicated devices, reversing the current trend of turning everything into a general-purpose computer. We can move towards less centralization and more-distributed systems, which is how the Internet was first envisioned.
This is worth explaining. Before the Internet, the telephone network was smart. Complex call-routing algorithms resided inside the network, whereas the telephones that connected to it were dumb. Before the Internet, this was also the model for other computerized networks. The Internet turned that model on its head. Most of the smarts were pushed to computers at the edge of the network, and the network became as dumb as possible—a change that made the Internet a hotbed of innovation. Anyone could invent something new—a new piece of software, a new mode of communication, a new hardware device—and as long as it conformed to the basic Internet protocols, it could connect. There was no certification process, no centralized approval system—nothing. Smart devices, dumb network. For students of Internet architecture, this is called the “end-to-end principle.” And by the way, it’s what everyone in favor of network neutrality wants to preserve.
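As a toy illustration of “smart devices, dumb network,” here is a minimal sketch in Python: two programs at the edges invent their own little protocol on top of TCP, and nothing in the network between them has to understand it or approve of it. The address and port are arbitrary choices for the example.

```python
# Minimal sketch of the end-to-end principle: the endpoints hold all the
# intelligence; the network just delivers bytes between them.
import socket
import threading

# One endpoint: a brand-new "service" nobody had to register or certify.
srv = socket.create_server(("127.0.0.1", 9000))  # arbitrary local address/port

def serve():
    conn, _ = srv.accept()
    with conn:
        conn.sendall(b"HELLO from a protocol invented five minutes ago\n")

threading.Thread(target=serve, daemon=True).start()

# The other endpoint: any client that speaks TCP can connect and play along.
with socket.create_connection(("127.0.0.1", 9000)) as client:
    print(client.recv(1024).decode().strip())
srv.close()
```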
I anticipate that we will eventually reach a high-water mark of computerization and connectivity. There will be a backlash. It won’t be driven by the market, but by norms and laws and policy decisions that put the safety and welfare of society above individual corporations and industries. It will require a major social shift, and a hard one for many to swallow, but our safety will depend on it.
Henceforth, we will make conscious decisions about what and how we interconnect. We can draw an analogy with nuclear power. The early 1980s saw a dramatic rise in the use of nuclear power, before we recognized that it was just too difficult and dangerous to secure nuclear waste. Today, we still have nuclear power, but there’s somewhat more consideration about when and where to build nuclear plants, and when to choose one of the many alternatives. Someday, computerization is going to be like that.
But not today. We’re still in the honeymoon phase of connectivity. Governments and corporations are punch-drunk on our data, and the rush to connect everything is driven by an even greater desire for power and market share.