From generation to generation, we pass down our values. We do so in the hope that our children will hold close to these principles until they are mature enough to develop their own, and in turn teach those to their children. These core norms are sometimes referred to as basic morality.
Now, perhaps for the first time in human history, we are confronted by artificial entities capable of making complex decisions and following advanced rules. What values should we teach them?1
In order to answer this question, we need to ask two more: a moral one (“how do we choose the norms?”) and a technical one (“once we have decided on those norms, how do we impart them to the AI?”). Chapters 6 and 7 of this book have suggested that the way to determine which values are relevant from time to time is by building institutions capable of sourcing informed opinion from the wider public, as well as various stakeholders.
This chapter is not intended to be a comprehensive ethical bible for AI,2 nor is it a manual for creating safe and reliable technology.3 Rather, it aims to suggest the types of rules which could form minimum building blocks for future regulations. That said, the categories suggested below are intended as being indicative rather than a closed list. Returning to the analogy of a pyramid regulatory structure suggested in Section 2.2 of Chapter 6, the various potential “Laws” discussed in this chapter are intended as candidates to sit at the top of a hierarchy of norms for AI, and to be applied to all of its applications: internationally and across different industries.
No doubt the rules and mechanisms for achieving them will shift and grow over time. But just as the collection of social principles which make up human morality must have started somewhere, so too must rules for robots.
1 Laws of Identification
1.1 What Are Laws of Identification?
An autonomous system should be designed so that it is unlikely to be mistaken for anything besides an autonomous system, and should identify itself at the start of any interaction with another agent.5
In Walsh’s view, this law might be similar to requirements that toy guns be identified by having a brightly coloured cap at the end in order to make clear that they are not real weapons.6
Oren Etzioni, the Chief Executive Officer of AI2, an AI research institute,7 has proposed a slightly different formulation: “…an A.I. system must clearly disclose that it is not human”.8 Etzioni’s rule is expressed in the negative: the system must say that it is not human, but it need not say that it is AI.9 The problem with Etzioni’s version is that not all AI resembles or even imitates humans—most does not. It is more helpful for an entity to say what it is, rather than what it isn’t. Of the two, Walsh’s law of identification is to be preferred.
1.2 Why Might We Need Laws of Identification?
Laws of identification are useful for several reasons:
First, they play an instrumental role in enabling or assisting the function of all other rules unique to AI. It will be more difficult, time-consuming and costly to implement other laws applicable to AI if we cannot distinguish which entities are subject to them.
Secondly, given that AI acts differently to humans in certain conditions, having some idea of whether an entity is human or AI will make its behaviour more predictable to others, increasing both efficiency and safety.10 If a person runs out in front of a vehicle travelling at 70 miles per hour, the average human driver may not be able to react quickly enough to take evasive action, whereas an AI system might well be able to.11 In other situations—particularly those requiring “common sense”—AI (at least at the moment) is likely to be significantly inferior to a human.12 AI cars might be adept on a motorway, but judging complex or unusual elements such as unexpected roadworks or a protest march on the streets may present greater difficulty. Just as we might speak differently to a young child, we might wish to instruct an AI in a different manner to humans, both for our protection and that of the AI. We can only do this if we know what we are talking to.
Thirdly, AI identification may be necessary in order to administer particular activities fairly. A human poker player would want to know that she is playing against another human when she puts down a $5000 stake—as opposed to a potentially unbeatable AI system.13
Fourthly, identification can allow people to know the source of communications. A 2018 report on the malicious use of AI highlighted as a major concern: “[t]he use of AI to automate tasks involved in… persuasion (e.g. creating targeted propaganda), and deception (e.g. manipulating videos) may expand threats associated with privacy invasion and social manipulation”.14
The anonymity of social media can allow a small number of individuals to project a far greater influence than if they were acting in person, especially if they control a network of bots spreading their content and/or interacting with human users. Whilst an AI identification law will not outlaw malicious use, it may make the exploitation of social media more difficult by minimising the opportunities for nefarious actors.
1.3 How Could Laws of Identification Be Achieved?
Due to their inherent dangers, some products and services can only be offered lawfully if appropriate warnings are provided. Users of heavy machinery are typically warned not to operate it when under the influence of alcohol or other drugs. It is common to see foods labelled “Warning, may contain nuts”.15 Products might one day be required to display the sign: “Warning, may contain AI!”.16
Given the multiplicity of AI systems and types, there is unlikely to be a single technological solution to implementing identification laws. Therefore, a law of identification for AI should be crafted in general terms, leaving it to individual designers to implement. Prompted by a submission from Toby Walsh,17 the New South Wales Parliament’s Committee on Driverless Vehicles and Road Safety has proposed: “[t]he public identification of automated vehicles to make them visually distinctive to other road users, particularly during the trial and testing phase”.18
Periodic inspections and tests might be used to establish whether or not an entity is AI. This may sound like the plot of the popular sci-fi film Blade Runner,19 in which the protagonist Deckard is tasked by the police with hunting down “replicants”: bioengineered androids. However, testing and inspection regimes for safety, contraband or customs and excise purposes are common features in the transport and supply of many goods and services. Investigative measures similar to those currently used by law enforcement agencies to track malicious software and hacking (and no doubt new ones yet to be developed) might be utilised to monitor the proper labelling of AI.
An identification law would not be particularly useful if non-AI entities were able to masquerade as AI. False positives would reduce trust in a system of identification, undermining its utility as a signalling mechanism. For this reason, any law of identification should cut both ways and prohibit non-AI entities from being labelled as containing AI, in the same way that a food producer may already face penalties if it describes an item as “suitable for vegetarians” when it contains animal products.20
2 Laws of Explanation
2.1 What Are Laws of Explanation?
Laws of explanation require that AI’s reasoning be made clear to humans. This could include a requirement that information is provided on the general decision-making process of the AI (transparency) and/or that specific decisions are rationalised after they have occurred (an individualised explanation).
2.2 Why Might We Need Laws of Explanation?
Two main justifications are usually offered for explainable AI: instrumentalist and intrinsic. Instrumentalism focusses on explainability as a tool to improve the AI and to correct its errors. The intrinsic approach focusses on the rights of any humans affected. Andrew Selbst and Julia Powles explain that “the intrinsic value of explanations tracks a person’s need for free will and control”.21
… the effectiveness of [AI] systems will be limited by the machine’s inability to explain its thoughts and actions to human users. Explainable AI will be essential, if users are to understand, trust, and effectively manage this emerging generation of artificially intelligent partners.24
2.3 How Could Laws of Explanation Be Achieved?
2.3.1 The Black Box Problem
The main difficulty with implementing laws of explanation is that many AI systems operate as “black boxes”: they may be adept at accomplishing tasks but even their own designers may be unable to explain what internal process led to a particular output.25
There is of course a tradeoff between the representational capacity of a model and its interpretability, ranging from linear models (which can only represent simple relationships but are easy to interpret) to nonparametric methods like support vector machines and Gaussian processes (which can represent a rich class of functions but are hard to interpret). Ensemble methods like random forests pose a particular challenge, as predictions result from an aggregation or averaging procedure. Neural networks, especially with the rise of deep learning, pose perhaps the biggest challenge—what hope is there of explaining the weights learned in a multilayer neural net with a complex architecture?26
Similarly, Jenna Burrell of the UC Berkeley School of Information has written that in machine learning there is “an opacity that stems from the mismatch between mathematical optimization in high-dimensionality characteristic of machine learning and the demands of human-scale reasoning and styles of semantic interpretation”.27 The difficulty is compounded where machine learning systems update themselves as they operate, through a process of backpropagation and re-weighting their internal nodes so as to arrive at better results each time. As a result, the thought process which led to one result may not be the same as that used subsequently.
2.3.2 Semantic Association
One explanation technique to provide a narrative for individualised decisions is to teach an AI system semantic associations with its decision-making process. AI can be taught to perform a primary task—such as identifying whether a video is displaying a wedding scene—as well as a secondary task of associating events in the video with certain words.28 Upol Ehsan, Brent Harrison, Larry Chan and Mark Riedl have developed a technique which they describe as “AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had performed the behavior”.29 The system asks humans to explain their actions as they undertake a particular activity; a human player in a platform game might say: “The door was locked, so I searched the room for a key”. The associations between the actions taken and the natural language explanations are recorded so as to create a set of labelled actions. An AI system then learns to play the game independently of human training, but when its actions are matched to the human descriptions, a narrative can be generated by sewing together these descriptors.
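To make the matching step concrete, the following is a minimal sketch in Python, assuming invented game states, actions and human-recorded explanations; it illustrates the general idea only and is not a reproduction of Ehsan et al.’s system:

```python
# Sketch of "AI rationalization": pairing an agent's actions with
# natural-language explanations previously recorded from human players.
# All states, actions and explanations here are invented for illustration.
from typing import List, Tuple

# Corpus of (situation, action, explanation) triples collected from humans
# narrating their own play, e.g. in a platform game.
human_corpus = [
    ({"door": "locked", "has_key": False}, "search_room",
     "The door was locked, so I searched the room for a key."),
    ({"door": "locked", "has_key": True}, "open_door",
     "I had the key, so I unlocked the door."),
    ({"enemy": "near"}, "jump",
     "An enemy was close, so I jumped over it."),
]

def match_explanation(state: dict, action: str) -> str:
    """Return the human explanation whose recorded situation best matches
    the agent's current state and chosen action (simple overlap score)."""
    best, best_score = "No matching explanation recorded.", -1
    for human_state, human_action, text in human_corpus:
        if human_action != action:
            continue
        score = sum(1 for k, v in human_state.items() if state.get(k) == v)
        if score > best_score:
            best, best_score = text, score
    return best

def narrate(trajectory: List[Tuple[dict, str]]) -> str:
    """Stitch matched explanations into a running narrative of the AI's play."""
    return " ".join(match_explanation(s, a) for s, a in trajectory)

# The agent learned its policy independently; we only translate its actions.
ai_trajectory = [({"door": "locked", "has_key": False}, "search_room"),
                 ({"door": "locked", "has_key": True}, "open_door")]
print(narrate(ai_trajectory))
```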
Language agnostic—The language wars in data science between python, R, scala, and others will continue on forever. We will always need a mix of languages and frameworks to enable advancements in a field as broad as data science. However, if tools enabling data versioning/provenance are language specific, they are unlikely to be integrated as standard practice.
Infrastructure Agnostic—The tools should be able to be deployed on your existing infrastructure—locally, in the cloud, or on-prem.
Scalable/distributed—It would be impractical to implement changes to a workflow if they were not able to scale up to production requirements.
Non-invasive—The tools powering data versioning/provenance should be able to integrate effortlessly with existing data science applications, without a complete overhaul of the toolchain and data science workflows.30
2.3.3 Case Study: Explanation of Automated Decision-Making Under the GDPR
The EU’s flagship data protection legislation, the General Data Protection Regulation 2016 (GDPR),31 contains a set of provisions which, read together, arguably amount to a legal right to explanation of certain decisions made by AI.32
Breaching an article of the GDPR can have serious economic consequences: a fine of up to 4% of a company’s annual global turnover or €20 m, whichever is higher.33 The legislation has a wide territorial scope, applying not only to organisations located within the EU but also to organisations located outside the EU which process data in order to offer goods or services to, or monitor the behaviour of, EU residents.34
…the controller [of personal data] shall, at the time when personal data are obtained, provide the data subject with the following further information necessary to ensure fair and transparent processing: […] the existence of automated decision-making, including profiling… and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject. (emphasis added)36
A major problem with the apparent right to explanation under the GDPR is that there is great uncertainty as to what the words of the regulation actually require.37 Several key terms are not defined. Nowhere does the GDPR say what “meaningful information” means. It might amount at one end of the scale to a data dump of thousands of lines of impenetrable source code; data providers might be reasonably willing to provide such material, but it would be of very little use to the average person. At the other end of the scale, the word meaningful might entail an individualised description in everyday language with a view to making the relevant process accessible and intelligible to a non-expert.38
The term “logic involved” is similarly nebulous. The reference to logic is a strong indication that the framers of the GDPR had in mind non-intelligent expert systems, which follow deterministic “yes/no” logic trees in order to reach a known output, based on a known input. The idea of a right to explanation—or at least “meaningful information about the logic involved”—makes sense with regard to such systems, which may be highly complex but are ultimately static in nature. With a logic tree, one can always trace back through each step the reasoning which led to an outcome; the same cannot necessarily be said of a neural network.
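The traceability of such systems can be illustrated with a toy rule-based sketch in Python, in which every rule applied is logged; the rules and thresholds below are invented purely for illustration and do not correspond to any real system:

```python
# Toy deterministic "expert system": because each rule is explicit, the path
# from input to output can always be traced, unlike the weights of a neural net.
# The rules and thresholds are invented for illustration only.

def assess_credit(income: float, existing_debt: float, missed_payments: int):
    trace = []  # record of every rule applied, in order
    if missed_payments > 2:
        trace.append(f"Rule 1: {missed_payments} missed payments > 2 -> refuse")
        return "refuse", trace
    trace.append(f"Rule 1: {missed_payments} missed payments <= 2 -> continue")
    ratio = existing_debt / income
    if ratio > 0.5:
        trace.append(f"Rule 2: debt/income {ratio:.2f} > 0.5 -> refuse")
        return "refuse", trace
    trace.append(f"Rule 2: debt/income {ratio:.2f} <= 0.5 -> approve")
    return "approve", trace

decision, steps = assess_credit(income=30000, existing_debt=18000, missed_payments=1)
print(decision)
for step in steps:
    print(step)
```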
The concept of a right to explanation of the “logic involved in any automatic processing of data concerning [a human] at least in the case of the automated decisions” is not new. In fact, this language was lifted from Article 12(a) of the GDPR’s predecessor: the Data Protection Directive of 1995.39 The Data Protection Directive was created well before the current resurgence in AI. During the long gestation period of the GDPR, the technology has moved on rather more quickly than the wording in the legislation.
… processing [of personal data] should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision. (emphasis added)
Recitals to EU legislation are not formally binding, but they may in some circumstances be used as an aid to interpretation of the laws themselves and for this reason are often subject to extensive negotiation.40 As such, it is not clear whether the “right to explanation” in the Recital has the effect of expanding the substantive rights set out in the Articles, which appear to be more limited.
Three Oxford-based academics, Wachter, Mittelstadt and Floridi, discount Recital 71, arguing that “the GDPR does not, in its current form, implement a right to explanation [of individual decisions], but rather what we term a limited ‘right to be informed’”.41 Wachter et al. note that an explicit right to explanation of individual decisions had been included in earlier drafts of the GDPR but was removed during negotiations.42
The reason we went into some detail on this issue is that… the new socio-technical environment described there - that is, in the very near future - “smart” (expert) computer systems will be increasingly used in decision-making by both private- and public-sector agencies, including law enforcement agencies. Reliance on sophisticated computer-generated “profiles” (and in particular dynamically-generated profiles, in which the algorithm itself is amended by the computer as it “learns”), in any of these contexts, in our view undoubtedly fall within the scope of the provision. This provision is therefore one that requires urgent elaboration and clarification…45
Unfortunately, this warning was not heeded; the GDPR simply reproduced the problematic terms.
The growth and complexity of machine-learning can make it challenging to understand how an automated decision-making process or profiling works. The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm.47
The Article 29 Working Party adopted a robust stance towards the obligation, declaring that “[c]omplexity is no excuse for failing to provide information to the data subject”.48
The GDPR came into force in May 2018. If applied rigorously, the so-called right to explanation might entail the explainable AI movement going from academic and governmental research projects to binding law. The Article 29 Working Party seeks to plot a course between providing data subjects with sufficient information, but not at the same time requiring AI designers to reveal all of their proprietary designs and trade secrets. Whether this can be achieved in practice remains to be seen.
Even if one concludes that a right to explanation of some kind is morally justified, it seems that the indeterminate language in the GDPR is a poor means of achieving this aim. Sooner or later, the interpretation of this provision is likely to come before the Court of Justice of the EU, which has sometimes taken a highly expansive approach to EU legislation, especially where individual rights are concerned.49 Leaving such hostages to fortune is risky, particularly in circumstances when the EU’s competitors may capitalise on the advantage arising from overzealous rules in a rival jurisdiction.
2.3.4 The Limits of Explanation
Is it possible to avoid a trade-off between functionality and explainability in AI? In semantic labelling exercises, the AI system’s operations are unaffected but the human participants are describing what they would do in a given situation, not what the AI is doing. Researchers from the Max Planck Institute and UC Berkeley developed a semantic labelling technique in 2016, but wrote of it: “[i]n this work we focus on both language and visual explanations that justify a decision by having access to the hidden state of the model, but do not necessarily have to align with the system’s reasoning process”.50
Whereas identifying animals in pictures or playing computer games might be readily explainable in human language, certain other tasks at which AI is especially adept are not. Even in the realm of games, certain techniques discoverable by AI may not be apparent to humans, with the result that the AI takes an action for which no human explanation has been recorded. In March 2018, scientists announced that an AI system had found a novel way to win the classic Atari computer game Q*bert.51 One of the major advantages of AI is that it does not think as humans do. Requiring AI to limit itself to operations which humans can understand might tether the AI to human capabilities such that it never fulfils its true potential.
Such ways of opening the black box of AI [i.e. semantic labelling]… work up to a point. But they can go only as far as a human being can, since they are, in essence, aping human explanations. Because people can understand the intricacies of pictures of birds and arcade video games, and put them into words, so can machines that copy human methods. But the energy supply of a large data centre or the state of someone’s health are far harder for a human being to analyse and describe. AI already outperforms people at such tasks, so human explanations are not available to act as models.52
2.3.5 Alternatives to Explanation
Transparency may at best be neither a necessary nor sufficient condition for accountability and at worst something that fobs off data subjects with a remedy of little practical use53
It should be recalled that human thought processes can be just as impenetrable as AI’s. Even the most advanced brain scanning techniques lack the ability to explain human decisions with any precision.54 It might be thought that even if we cannot see inside the brain, at least humans can explain themselves using natural language. However, modern psychological research suggests that the association of our actions with reasons represents to some extent the creation of a retrospective fictional narrative which may have little connection to underlying motivations.55 It is for this reason that humans are susceptible to deliberate cues which act on our subconscious, such as “priming” or “nudging”.56
Core public agencies, such as those responsible for criminal justice, healthcare, welfare, and education (e.g. “high stakes” domains) should no longer use “black box” AI and algorithmic systems57
By treating the governance of AI as a question of optimizations, we can focus the necessary argument on what truly matters: What is it that we want from a system, and what are we willing to give up to get it?59
Elizabeth I of England said of religious tolerance: “I would not open windows into men’s souls”.60 Many legal rules work this way, concentrating primarily on actions, not thoughts.61 Explainability is best seen as a tool for keeping AI’s behaviour within certain limits. The following sections address further means for achieving this goal.
3 Laws on Bias
In theory, AI ought to offer complete impartiality, free from human fallibilities and prejudices. Yet in many cases this has not happened. Newspaper stories and academic papers abound with examples of apparent AI bias62: from AI-judged beauty contests which name only Caucasian winners63 to law enforcement software which used race to determine whether people were likely to commit crimes in the future,64 AI seems to share many of the same problems that humans do. Three questions arise: What is AI bias, why does it arise, and what can be done about it?
3.1 What Is Bias?
Bias is a “suitcase word”, containing a variety of different meanings.65 In order to understand why AI bias forms, it is important to distinguish between several phenomena.
Bias is often associated with decisions which are deemed “unfair” or “unjust” to particular individuals or groups of humans.66 The problem with importing such moral concepts into a definition of bias is that they too are indeterminate and vague. The notion of a result or process being “unjust” is subjective. Some people consider positive discrimination to be unjust, whereas others consider it to be a just response to societal imbalances. If there is to be a rule addressing AI bias, then it is preferable to use a test which minimises the role of personal opinions.
With this in mind, our definition is as follows: “Bias will exist where a decision-maker’s actions are changed by taking into account an irrelevant consideration or failing to take into account a relevant consideration”.
If an AI system is asked to select from a given sample which cars it thinks will be the fastest, and it does so based on the paint colour of the car, this is likely to be an irrelevant consideration. If the program failed to take into account a feature such as the weight of the car or its engine size, this would be to neglect a relevant consideration.
Though it is common to think of AI bias as being something which only affects human subjects, the neutral definition of bias given above could relate to decision-making concerning any form of data. There is nothing special about data relating to humans which means that AI is inherently more likely to display inaccurate or slanted results. To better understand and treat AI bias, we need to avoid anthropomorphisation and focus more on data science.
3.2 Why Might We Need Laws Against Bias?
The immediate source of AI bias is often the data fed into a system. Machine learning, currently the dominant form of AI, recognises patterns within data and then takes decisions based on such pattern recognition. If the input data is skewed in some way, then the likelihood is that the patterns generated will be similarly flawed. Bias arising from such data can be summed up with the phrase: “you are what you eat”.67
3.2.1 Poor Selection of Data
Skewed data sets occur when there is in theory enough information available to present a sufficient picture of the relevant environment, but human operators select an unrepresentative sample. This phenomenon is not unique to AI. In the field of statistics, “sampling bias” refers to errors in estimation which result when some members of the relevant population are more likely to be sampled than others. Sampling bias or skewed data can arise from the manner in which data is collected: landline telephone polls carried out in the daytime sample a disproportionate number of people who are elderly, unemployed or stay-at-home carers, because these groups are more likely to be at home and willing to take calls at the relevant time.
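A short simulation makes the point, using invented figures: if older people are far more likely to answer a daytime landline call, the polled average age overstates the true population average.

```python
# Minimal illustration of sampling bias; all figures are invented.
import random

random.seed(0)

# Synthetic population of ages, each paired with the probability that the
# person answers a daytime landline poll (higher for those aged 65 or over).
population = [random.randint(18, 90) for _ in range(100_000)]
answer_prob = [0.6 if age >= 65 else 0.1 for age in population]

true_mean = sum(population) / len(population)

# The poll only "sees" those who happen to answer.
sample = [age for age, p in zip(population, answer_prob) if random.random() < p]
sample_mean = sum(sample) / len(sample)

print(f"True mean age:   {true_mean:.1f}")
print(f"Polled mean age: {sample_mean:.1f}  (skewed towards older respondents)")
```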
Skewed data sets may arise because data of one type are more readily available, or because those inputting the data sets are not trying hard enough to find diverse sources. Joy Buolamwini and Timnit Gebru of MIT performed an experiment which demonstrated that three leading pieces of picture recognition software68 were significantly less accurate at identifying dark-skinned females than they were at matching pictures of light-skinned males.69 Though the input data sets used by the picture recognition software were not made available to the researchers, Buolamwini and Gebru surmised that the disparity arose from training on data sets composed predominantly of light-skinned males (which probably reflected the gender and ethnicity of the programmers).
IBM announced within a month of the experiment’s publication that it had reduced its error rate from 34.7% to 3.46% for dark-skinned females by retraining its algorithms.70 To illustrate the diversity of its new data sets, IBM noted that they included images of people from Finland, Iceland, Rwanda, Senegal, South Africa and Sweden.71
3.2.2 Deliberate Bias and Adversarial Examples
Bias in data can be deliberate as well as inadvertent. In one notorious example (referred to in Chapter 3 in Section 5), Microsoft released an AI chatbot called Tay in 2016. It was designed to respond to natural language conversations with members of the public using a call and response mechanism.72 Within hours of its release, people worked out how to “game” its algorithms so as to cause Tay to respond with racist language, declaring at one point: “Hitler was right”. Needless to say, the program was quickly shut down.73 The issue was that Microsoft did not insert adequate safeguards to correct for instances of foul language or unpleasant ideas being introduced by users.74
The general term for inputs which have been engineered deliberately to fool an AI system is “adversarial examples”.75 Rather like computer viruses which attack vulnerabilities in security software, adversarial examples do the same for AI systems. Making AI robust and protecting it against attack is an important design feature. Technological solutions have been developed, including the “CleverHans” Python library which can be used by programmers to identify and reduce machine learning systems’ vulnerability.76
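As an illustration of the underlying idea only (a sketch written directly in PyTorch rather than via the CleverHans library, with an untrained toy model and a random input as placeholders), the best-known recipe, the “fast gradient sign method”, nudges an input in whichever direction most increases the model’s loss:

```python
# Fast Gradient Sign Method (FGSM): perturb an input in the direction that
# most increases the model's loss, producing an "adversarial example".
# The model is untrained and the input random -- placeholders for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # stand-in "image"
true_label = torch.tensor([3])
epsilon = 0.1                                      # perturbation budget

loss = loss_fn(model(x), true_label)
loss.backward()                                    # gradient of loss w.r.t. the input

x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1)  # imperceptibly shifted input

print("Prediction before:", model(x).argmax(dim=1).item())
print("Prediction after: ", model(x_adv).argmax(dim=1).item())
```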
3.2.3 Bias in the Entire Data Set
Sometimes data bias will not arise through the selection of a particular data set by humans, but rather because the entire universe of data available is flawed. An experiment published in the journal Science indicated that human language (as recorded on the Internet) was “biased”, in that the semantic associations commonly found between words carried with them various value judgments.
The study built on the Implicit Association Test (IAT), which has been used in numerous social psychology studies to identify subconscious thought patterns in humans.77 The IAT measures response times by human subjects asked to pair word concepts displayed on a computer screen. Response times are far quicker when subjects are asked to pair two concepts they find similar. Words such as “rose” and “daisy” are usually paired with more “pleasant” ideas, whereas words such as “moth” have the opposite effect. Researchers led by Joanna Bryson and Aylin Caliskan of Princeton University performed a similar test on an Internet data set containing 840 billion words.78
The study indicated that a set of African American names had more unpleasant associations than a European American set. Indeed, the general result of the test was that cognitive and linguistic biases demonstrated in human subjects (such as the association of men with high-earning jobs) were also demonstrated by the data sets available on the Internet.79
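A minimal sketch of this kind of association test is set out below, using cosine similarity between word vectors; the tiny three-dimensional embeddings are invented for illustration, whereas the actual study used embeddings derived from the 840-billion-word corpus:

```python
# Sketch of a word-embedding association test: compare how strongly target
# words associate with "pleasant" versus "unpleasant" attribute words.
# The 3-dimensional vectors below are invented purely for illustration.
import numpy as np

embeddings = {
    "rose":       np.array([0.9, 0.1, 0.0]),
    "daisy":      np.array([0.8, 0.2, 0.1]),
    "moth":       np.array([0.1, 0.9, 0.2]),
    "pleasant":   np.array([1.0, 0.0, 0.0]),
    "unpleasant": np.array([0.0, 1.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word, pleasant="pleasant", unpleasant="unpleasant"):
    """Positive score: the word sits closer to 'pleasant' than to 'unpleasant'."""
    v = embeddings[word]
    return cosine(v, embeddings[pleasant]) - cosine(v, embeddings[unpleasant])

for word in ("rose", "daisy", "moth"):
    print(f"{word:>6}: {association(word):+.2f}")
```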
Caliskan and Bryson’s result is unsurprising: the Internet is a human creation and represents the sum total of various societal influences, including common prejudices. However, the experiment is a cautionary reminder that some biases may be deeply embedded within society and careful selection of data, or even amendment of the AI model, may be needed to correct for this. The Internet is not the only mass data set which might be prone to similar problems of inherent bias. It is possible that Google’s TensorFlow as well as Amazon and Microsoft’s Gluon libraries of machine learning software might have similar latent defects.
3.2.4 Data Available Is Insufficiently Detailed
Sometimes the entire universe of data available in a machine-readable format is insufficiently detailed to achieve unbiased results.
For example, AI might be asked to determine which candidates are best suited to jobs as labourers on a building site based on data from successful incumbent workers. If the only data made available to the AI are age and gender, then it is most likely that the AI will select younger men for the job. However, the gender or indeed age of the applicants is not strictly relevant at all to their aptitude. Rather, the key skills which building site labourers need are strength and dexterity. These may be correlated with age, and they may be correlated with gender (especially as regards strength). But it is important not to confuse correlation with causation: both of these data points are merely ciphers for the salient ones of strength and dexterity. If the AI was trained using data based on core aptitudes, then it would result in choices which might still favour young men, but at least it would do so in a way which minimises bias.
3.2.5 Bias in the Training of AI
AI training bias applies particularly to reinforcement learning: a type of AI which (as noted in Chapter 2) is trained using a “reward” function that reinforces it when it produces a right answer. Often the reward function is initially input by human programmers. If an AI system designed to navigate a maze is rewarded each time that it manages to do so without getting stuck, then its maze-solving function will learn to optimise its behaviour through reinforcement. Where the choice of when to reward or discourage behaviour is left to human discretion, this can be a source of bias. Just as a dog may be badly trained by its owner to bite children (by rewarding the dog with a treat every time it does so), an AI system might also be trained to arrive at a biased outcome in this manner. In this regard, the AI is simply mirroring its programmer’s preferences. The “fault” is not that of the AI. Nonetheless, as shown below, the AI may be designed with safeguards which can flag certain types of recognised bias arising through flawed training.
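The mechanism can be sketched in a few lines, with an entirely invented “trainer” whose discretionary feedback systematically penalises one group; the agent’s learned value estimates then simply reproduce the trainer’s prejudice:

```python
# Sketch of training bias: a (fictional) human trainer rewards identical
# outcomes differently depending on an irrelevant attribute of the applicant,
# and the agent's learned value estimates faithfully reproduce the prejudice.
import random

random.seed(1)

def trainer_feedback(applicant_group: str, outcome_good: bool) -> float:
    """Discretionary reward signal supplied by the human trainer."""
    reward = 1.0 if outcome_good else 0.0
    if applicant_group == "B":      # the trainer's (unjustified) preference
        reward -= 0.5
    return reward

# Simple running-average "value" learned for approving each group.
values = {"A": 0.0, "B": 0.0}
counts = {"A": 0, "B": 0}

for _ in range(1000):
    group = random.choice(["A", "B"])
    outcome_good = random.random() < 0.7        # identical success rate for both
    r = trainer_feedback(group, outcome_good)
    counts[group] += 1
    values[group] += (r - values[group]) / counts[group]

print(values)  # group B's learned value is lower despite identical outcomes
```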
3.2.6 Case Study: Wisconsin v. Loomis
Wisconsin v. Loomis80 is one of the few court decisions to date to consider whether using AI to assist in making important decisions is consistent with a subject’s fundamental rights.
In 2013, the US State of Wisconsin charged Eric Loomis with various crimes in relation to a drive-by shooting. Loomis pleaded guilty to two of the charges. In preparation for sentencing, a Wisconsin Department of Corrections officer produced a report which included findings made by an AI tool called Correctional Offender Management Profiling for Alternative Sanctions (COMPAS). COMPAS assessments estimate the risk of reoffending based on data gathered from an interview with the offender and information from the offender’s criminal history.81
The Trial Court referred to the COMPAS report and sentenced Mr. Loomis to six years of imprisonment. The producer of COMPAS refused to disclose how the risk scores were determined on the grounds that its methods were “trade secrets” which might allow competitors to copy the technology. Mr. Loomis appealed against the Trial Court’s decision. He argued that the unavailability of the reasoning used in COMPAS prevented him from discovering whether his sentence was based on accurate information.82 In addition, Mr. Loomis complained that COMPAS used reasoning based in part on group data, rather than making an individualised decision based solely on Loomis’ unique characteristics and situation.
The Wisconsin Supreme Court rejected Mr. Loomis’ appeal. As to the opacity of COMPAS, Justice Bradley (with whom the other judges agreed) said that the use of secret proprietary risk assessment software was permissible so long as appropriate warnings were provided alongside its results—leaving it to judges to decide what weight they should be given.83 Justice Bradley said further that “to the extent that Loomis’s risk assessment is based upon his answers to questions and publicly available data about his criminal history, Loomis had the opportunity to verify that the questions and answers listed on the COMPAS report were accurate”.84 Justice Bradley held also that data drawn from groups could legitimately be taken into account as a relevant factor in determining an individual case, as to which a court had wide discretion.85
The Loomis decision seems problematic on several grounds. Access to the data used does not necessarily tell a subject how the program has weighted that data in coming to an outcome. If the AI system applied a very high weighting to an irrelevant factor such as Mr. Loomis’ race, but a very low weighting to a relevant one such as his previous offending record, then he might have had grounds to challenge the decision reached. The fact that both pieces of material might be publicly available would have been of no assistance to Loomis.
In failing to specify the vigor of the criticisms of COMPAS, disregarding the lack of information available to judges, and overlooking the external and internal pressures to use such assessments, the court’s solution is unlikely to create the desired judicial skepticism…. encouraging judicial skepticism of the value of risk assessments alone does little to tell judges how much to discount these assessments.86
One high-profile study by the NGO ProPublica indicated that COMPAS tends to give a higher risk-rating to certain offenders on the basis of their ethnicity.87 It may be that such issues have now been corrected by the designers of the technology, but absent further information on its methodology it is hard to be sure.
The same result might not have been reached in other jurisdictions. Despite the legislation’s weaknesses highlighted above, a rule along the lines of the EU’s right to explanation of an automated decision under the GDPR might have assisted Mr. Loomis in working out whether and if so how to challenge the COMPAS recommendation. When determining how much information must be disclosed to a defendant in a trial where evidence is kept secret for reasons of national security or similar, courts have held that the right to a fair trial under Article 6 of the European Convention on Human Rights might be satisfied by a process known as “gisting”, where sufficient disclosure of the details of a case is given to enable a defendant to instruct a special advocate, who is then entitled to see the evidence but cannot tell their client.88 A similar process might perhaps be used for AI to steer a line between the confidentiality of algorithms and their use in the justice system or other important decisions.89
Even in the USA there may be some diversity between different states’ attitudes to the use of AI in important decisions. In 2017, a Texas Court ruled in favour of a group of school teachers who had challenged the use of algorithmic review software by the Houston Independent Schools District to terminate their employment for ineffective performance. The teachers contended that the software violated their constitutional protections against unfair deprivation of property90 because they were not provided with sufficient information to challenge employment terminations based on the algorithm’s scores.
The US District Court noted that the scores were “generated by complex algorithms, employing ‘sophisticated software and many layers of calculations’”, and held that in the absence of disclosure of the methodology involved, the program’s scores “will remain a mysterious ‘black box,’ impervious to challenge”.91 The District Court ruled accordingly that procedural unfairness existed because the teachers had “no meaningful way to ensure correct calculation… and as a result are unfairly subject to mistaken deprivation of constitutionally protected property interests in their jobs”.92
The case was settled before trial: the Houston Independent Schools District reportedly agreed to pay $237,000 in legal fees and to cease using the evaluation system in making personnel decisions.93 Even though the matter did not proceed to a final determination, effectively the teachers won. The Texas case shows that even though it was one of the first decisions on the topic, Loomis will not be the final word.94
3.3 How Could Laws Against Bias Be Achieved?
3.3.1 Diversity—Better Data and Solving the “White Guy Problem”
If bias arises from poorly chosen data, the obvious solution is to improve data selection. This does not mean that AI always needs to be fed data which is balanced across all different parameters. If an AI system was to be developed to assess a person’s tendency to develop ovarian cancer, it would not be sensible for the data to include male patients. Accordingly, some thought will need to be taken to select the outer boundaries of the data set being used. This is a question of efficiency as well as effectiveness: if a program has to trawl through all manner of irrelevant data, it will likely be slower and more energy intensive than if only the key data in question were targeted. That said, one of the great advantages of machine learning systems (especially unsupervised learning) is their ability to identify previously unknown patterns. This feature may militate in favour of providing AI with more rather than less data.
Selection of data is an art as well as a science. Much thought goes into the selection of samples used by pollsters when surveying a population.95 Likewise, we should be similarly careful when feeding data into AI systems so as to ensure that the data used is suitably representative.
In addition to looking at the data selected, we need also to scrutinise the selectors. Because the majority of AI engineers at present are white men, often from Western countries, aged around 20–40, the data which they select to be fed into AI bears the marks of their preferences and prejudices, whether deliberately or otherwise. AI researcher Kate Crawford terms this “AI’s White Guy Problem”.96
An indirect way to secure better data selection is not simply to ask that programmers be “more sensitive” to bias, but to seek to widen the demographic of programmers to include minorities and women. That way, it is thought, issues are more likely to be spotted, because a multiplicity of views is encouraged.97 Securing diversity among programmers is not just a question of gender and race; it may also require multiple national origins, religions and other perspectives. It would be wrong to fall into the trap of assuming that only a diverse group of programmers can produce AI which creates unbiased results, or indeed that diverse programmers will always create unbiased AI. Diversity is helpful in minimising bias, but it is not sufficient.
Instead of such hard-edged diversity rules, another solution might be to require that during the design process (and perhaps again at periodic intervals after its release) AI undergoes a review for bias, perhaps by a specialised diversity panel or even an AI audit program specifically designed for this process.98
3.3.2 Technical Fixes to AI Bias
Aside from data selection issues, there may be technical methods of imposing certain constraints and values on the choices which AI makes. These will be particularly useful in situations where the bias cannot be corrected through using a less-skewed data set—for instance in situations where the entire universe of data exhibits bias (such as the content of the Internet), or where there is insufficient data available to train AI without it resorting to decisions based on characteristics such as gender or race.
In the human rights law of many countries, certain human characteristics are deemed protected, in that decision-makers are prohibited from making a decision on the basis of those factors (a practice sometimes referred to as discrimination). Protected characteristics are generally selected from features which humans are unable to choose. The UK Equality Act 2010 protects against discrimination on the basis of: age, disability, gender reassignment, marriage or civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual orientation. In the USA, Title VII of the Civil Rights Act of 1964 prohibits discrimination in employment on the basis of race, colour, religion, sex or national origin, and separate legislation prevents discrimination on the basis of age, disability and pregnancy.99
Can AI be prevented from taking such characteristics into account?100 Recent experiments show that it can, and that data scientists are developing increasingly advanced methodologies to do so. The simplest solution to perceived bias in a machine learning model against subjects (usually people) with a certain attribute is to down-weight that attribute, so that the AI is less likely to take it into account in decision-making. However, this is a crude tool which can lead to inaccurate overall results.101
A better approach is to use counterfactuals to test whether the same decision would have been reached by the AI system if different variables were isolated and changed. A program might test for racial bias by running a hypothetical model where it changes the race of the subject in order to establish whether the same result would have been reached.102
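A minimal sketch of such a counterfactual check is set out below (the model, features and data are invented placeholders): flip only the protected attribute, hold everything else fixed, and see whether the decision changes.

```python
# Counterfactual bias check: does the decision change when only the protected
# attribute is flipped? Model and data are invented placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Features: [protected_attribute, relevant_score]; the training labels below
# are deliberately contaminated by the protected attribute for illustration.
X = np.column_stack([rng.integers(0, 2, 500), rng.normal(0, 1, 500)])
y = ((X[:, 1] + 0.8 * X[:, 0] + rng.normal(0, 0.3, 500)) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

def counterfactual_flags(model, X):
    """Return indices where flipping the protected attribute changes the decision."""
    X_flipped = X.copy()
    X_flipped[:, 0] = 1 - X_flipped[:, 0]
    original = model.predict(X)
    flipped = model.predict(X_flipped)
    return np.where(original != flipped)[0]

flagged = counterfactual_flags(model, X)
print(f"{len(flagged)} of {len(X)} decisions change when only the protected "
      f"attribute is flipped -- a signal of bias on that attribute.")
```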
Anti-bias modelling techniques are constantly being tweaked and improved. Silvia Chiappa and Thomas Graham pointed out in a 2018 paper that counterfactual reasoning alone may not always be sufficient to identify and eliminate bias. A purely counterfactual model might identify that gender bias has caused more male applicants to be accepted to a university, but may not identify the fact that women have a lower acceptance rate in part because female students have applied to courses with fewer spaces. Accordingly, Chiappa and Graham propose a modification of counterfactual modelling which “states that a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair pathways were different”.103
The system we envision will automatically perform knowledge extraction and reasoning on such a document to identify the sensitive fields (gender in this case), and support testing for and prevention of biased algorithmic decision making against groups defined by those fields.
This way, the diagnostic tool is built into the output of the system rather than requiring specialised knowledge to “open the bonnet” each time a problem is identified. The designers of the IBM system state that their aim is to avoid the issues which arose as a result of the opacity of COMPAS in the Loomis case by providing security of knowledge to users that the system will not generate biased results and simultaneously by reassuring designers that they will not be required to lay bare the inner workings of the AI.106
Bias has been described by one journalist as “the dark secret at the heart of AI”.107 However, we must be careful not to exaggerate either its novelty or the difficulty of treating it. Properly analysed, AI bias arises from a combination of features common to human society and standard scientific errors. Completely value-neutral AI may be a chimera. Indeed, some commentators have argued that “algorithms are inescapably value-laden”.108 Instead of seeking to eliminate all values from AI, the preferable approach may be to design and maintain AI which reflects the values of the given society in which the AI operates.
4 Laws on Limitation of AI Use
4.1 What Are Laws of Limitation?
Laws of limitation are rules which specify what AI systems can and cannot do. Setting the limits of what roles AI should be allowed to fulfil is an emotive topic: many people fear delegating tasks and functions to an unpredictable entity which they cannot fully understand. These issues raise fundamental questions about humanity’s relationship with AI: Why do we harbour concerns about giving up control? Can we strike a balance between AI effectiveness and human oversight? Will fools rush in where AIs fear to tread?
4.2 Why Might We Need Laws of Limitation?
In September 2017, Stanislav Petrov died alone and destitute in an unremarkable Moscow suburb. His inauspicious death belied the pivotal role he played one night in 1983 when he was the duty officer in a secret command centre tasked with detecting nuclear attacks on the USSR by America.
Petrov’s computer screen showed five intercontinental ballistic missiles heading towards the USSR. The standard protocol was to launch a retaliatory strike before the American missiles landed: thereby triggering the world’s first—and potentially last—nuclear conflict. “The siren howled, but I just sat there for a few seconds, staring at the big, back-lit, red screen with the word ‘launch’ on it”, he told the BBC’s Russian Service in 2013. “All I had to do was to reach for the phone; to raise the direct line to our top commanders”.109 Yet Petrov paused. His gut instinct told him that this was a false alarm.
Petrov was correct: there were no American missiles. It subsequently transpired that the computer message had resulted from a satellite detecting the reflection of the sun’s rays off the tops of clouds, which it confused with a missile launch. “We are wiser than the computers”, Petrov said in a 2010 interview with the German newspaper Der Spiegel, “We created them”.110
Various commentators have followed Petrov’s lead and suggested that humans should always be tasked with supervising or second-guessing AI.111 One option is a requirement that there should always be a “human in the loop”, meaning that AI can never take a decision without human ratification. Another is that there should be a “human on the loop”, a requirement that a human supervisor must always be on hand with the power to override the AI.
Requiring a human in the loop is reminiscent of the UK’s notorious “Red Flag” laws from the nineteenth century. When cars were first invented, legislators were so concerned about their impact on other road users and pedestrians that they insisted someone must always walk in front of a car waving a red flag. This certainly made other road users aware of the new technology, but at the expense of preventing the car from travelling at any speed greater than walking pace. The Red Flag laws were for this reason short-lived and today seem ridiculous. By requiring that there is always a human in the loop, we risk putting the same fetters on AI. Stipulating that there must be a human “on the loop” may present a less-excessive alternative.112 It maintains a semblance of human control, whilst still allowing AI to achieve efficiencies of speed and accuracy.
Important moral questions arise as to whether we want to sacrifice greater effectiveness for a vague feeling of comfort in knowing that there has been a human decision-maker. Feelings of concern at the replacement of a human service provider with technology tend to dissipate over time; people might once have felt queasy about a human bank teller being replaced by a machine in the delicate task of distributing cash to account holders, but today ATMs are ubiquitous. Few people give a second thought to the fact that the majority of manufactured goods are produced—and even inspected—largely by machines. Ultimately, the choice of whether, and if so, when, to insist on a human supervisor is one which is best taken by societies as a whole, using the processes set out in the previous chapters.
4.3 How Could Laws of Limitation Be Achieved?
4.3.1 Case Study: Right Not to Be Subjected to Automated Decision-Making Under the GDPR
1. The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.
2. Paragraph 1 shall not apply if the decision: (a) is necessary for entering into, or performance of, a contract between the data subject and a data controller; (b) is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests; or (c) is based on the data subject’s explicit consent.
3. In the cases referred to in points (a) and (c) of paragraph 2, the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision….
AI can clearly qualify as “automated processing”. The wording of Article 22 itself appears to refer to a voluntary right of individuals affected to object to automated decision-making, which they can decide whether or not to exercise. However, in its draft guidance, the Article 29 Working Party has suggested that Article 22 in fact creates an outright ban on automated individual decision-making, subject to exceptions under Article 22(2), namely: performance of a contract and authorisation under law or explicit consent.113
… renting a city bike during a vacation abroad for two hours; purchasing a kitchen appliance or a television set on credit; obtaining a mortgage to buy a first home.116
The right to human intervention in Article 22 could require that there is always a human in the loop. Lawyers Eduardo Ustaran and Victoria Hordern argue that the shift from a right to object to an outright prohibition subject to qualifications “generates considerable uncertainty”.117 They conclude “…if the interpretation set out in the [Working Party]’s draft guidance is the one that prevails, it will have significant consequences for all types of businesses which was not necessarily foreseen at the time of the adoption of the GDPR”.118 This development has led various commentators to wonder whether some types of AI will become illegal altogether in the EU.119 This may be going too far, but Article 22 certainly does create a hostage to fortune.
4.3.2 “Killer Robots” and the Teleological Principle
Of all the uses for AI, its application in autonomous weapons (or killer robots) is probably the most controversial. As the prospect of weapons capable of independently selecting and firing upon targets has come closer to reality, forces have mustered against their use. The international Campaign to Ban Killer Robots was launched in 2013.120 In August 2017, 116 experts and founders of AI companies wrote an open letter expressing grave concerns on the matter.121
Despite the strength of feeling on the topic, there are strong arguments that a total AI ban could be counterproductive—not just on autonomous weapons but in any other field. It is submitted that there is a principled solution to the question of when and how AI should be used in controversial areas:
The Teleological Principle for AI Use
Begin by asking what values we are seeking to uphold in the given activity. If (and only if) AI can consistently uphold those values in a manner demonstrably superior to humans, then AI use should be permitted.
Where the Teleological Principle is satisfied, asking a human to ratify the AI’s decision will be at best unnecessary and at worst actively harmful. One example of where AI already exceeds humans in an important task is the ability to recognise certain cancers. Whereas doctors may take several minutes to analyse each scan of a body part, AI can do this in milliseconds in some cases with demonstrably higher accuracy than human experts.122
The Teleological Principle cannot be employed in the abstract; even where it is satisfied, policy-makers will need to be mindful of overarching societal views on the acceptability of using AI—or indeed any non-human technology—to fulfil the relevant task. The issue of whether people will accept a particular technology is a wider question of social and political legitimacy, the outcome of which could differ between societies. One aspect of encouraging the uptake of a controversial AI technology might be to show to the public that the Teleological Principle is satisfied. As with GM crops (discussed in Chapter 6), a technology’s safety and effectiveness will not necessarily guarantee its acceptance. Nonetheless, the Teleological Principle is at least a helpful guide to policy-makers as to when it is appropriate to encourage that AI be used in a given field.
Returning to the example of autonomous weapons, in international humanitarian law (the laws applied during warfare) it is widely accepted that the two guiding principles are proportionality—the requirement to cause no more harm than is necessary to achieve a legitimate aim, and distinction—to differentiate between combatants and civilians.123
Proponents of banning autonomous weapons often point out that many countries already accept that some weapons should be prohibited or heavily constrained.124 Popular examples include blinding lasers125 and the use of landmines.126 However, a major difference between autonomous weapons and those technologies which have to date been banned is that the prohibited technologies generally make compliance with the basic laws of warfare more difficult. Once deployed, a landmine will explode regardless of who steps on it—whether civilian or combatant. Noxious gas does not discriminate as to whom it poisons. Blinding lasers cannot be told to spare the eyes of civilians. Another reason why these technologies are banned is that they tend to cause more human suffering than is absolutely necessary to achieve a given aim: maiming victims or causing a slow and painful death.
By contrast, if its development is properly regulated, AI may well exceed humans at being able to distinguish between civilians and combatants and in making the complex calculations necessary to use no more force than necessary. Some are sceptical that this will ever take place, but history suggests pessimism is misplaced. AI already performs better than humans in some tests of facial recognition,127 which is a key skill in choosing who to target. Moreover, AI systems do not become tired, angry or vengeful in the way that human soldiers do. Robots do not rape, loot or pillage. Instead, robot wars could be fought with impeccable discipline, greatly improved accuracy and consequently far fewer collateral casualties. Simply declaring that a military robot will never better a human soldier in adhering to the laws of warfare is the equivalent of a person in 1990 saying that AI could never defeat a human at chess.
Another major argument raised against autonomous weapons is that they might be hacked or might malfunction.128 This is true, but the same argument could be applied to any of the tens of thousands of pieces of technology used in modern warfare: from the global positioning systems used by military bombers to pinpoint targets, to the steering systems of nuclear submarines. The point extends beyond the military to the control of utilities such as dams, nuclear power stations and transport networks, many of which are heavily reliant on technology. Wherever potentially dangerous activities are carried out, the important thing is to ensure, so far as possible, that the computer systems involved are safe and secure from outside attack or malfunction.
We may not be there yet, but with enough time and investment it seems that the Teleological Principle could be satisfied for autonomous weapons. The worst of all worlds would be a partially enforced ban under which some countries abandon autonomous weapons while others, perhaps less scrupulous, continue to develop them untrammelled. Fundamentally, AI is neither a good nor a bad thing. It can be developed safely or recklessly, and it can be put to harmful or beneficial uses. Calls for a total ban on AI in the military (or indeed in any other field) mean that we miss the opportunity to instil common values and standards whilst the technology is at an early stage.
5 The Kill Switch
5.2 What Is a Kill Switch?
When tracing the ancient origins of AI in popular culture and religion, the first chapter of this book recounted the legend of the Golem: a monster made of clay, created by Rabbi Loew of Prague in the sixteenth century to defend the city’s Jewish community from pogroms. But though the Golem initially saved the Jews, the story continues that it soon began to go out of control, threatening to destroy all before it. The Golem had originally been awakened by drawing the Hebrew word for truth on its forehead. When it began to run amok, the only solution for the Rabbi was to return the Golem to its original lifeless state by rubbing out the first letter of that word, leaving the word for death. Rabbi Loew had created AI, which malfunctioned, and then activated its in-built kill switch.
In human justice systems, the death penalty is the ultimate sanction. AI’s equivalent is the off button, or kill switch: a mechanism for shutting down the AI, either by human decision or automatically upon a given trigger. This is sometimes referred to as a “big red button”, a reference to the prominent shut-off switches often found on powerful machinery.
5.2 Why Might We Need a Kill Switch?
In criminal justice, the justifications for punishment include retribution, reform, deterrence and protection of society.129 Even though the workings of AI may differ from human psychology, these four motivations remain pertinent. Importantly, it is widely acknowledged that a just system can recognise human rights while maintaining, without hypocrisy, a system of punishments that restricts those rights. Some rights, such as freedom from torture, are seen as absolute (at least in many countries). Other rights, such as liberty, must be balanced against societal aims: incarcerating criminals does not detract from a general view that all citizens should be free to go about their lives without interference.
Although this book has suggested in Chapters 4 and 5 that there may in the future be moral and/or pragmatic justifications for granting AI rights and legal personality, this is not inconsistent with a legal system providing for the AI to be shut down or even deleted under certain circumstances. Individual human rights are often subordinated (within limits) to those of the wider community. The same should apply all the more so to AI.
5.2.1 Retribution
Retribution refers to punishment motivated by a feeling that someone, or something, which has caused harm or transgressed an agreed standard should suffer detriment in return. It is a psychological phenomenon which seems to apply across all human societies.130 Retribution functions on two levels: inward-facing towards the perpetrator and outward-facing towards the rest of the population. This dual role of retribution is captured in Lord Denning’s general description of punishment as “emphatic denunciation by the community of a crime”.131 Perhaps the most famous example is the Old Testament’s list of punishments: “Eye for eye, tooth for tooth, hand for hand, foot for foot”.132
Applied to AI, this instinct gives rise to what John Danaher has termed a “retribution gap”. His argument runs as follows:
(1) If an agent is causally responsible for a morally harmful outcome, people will look to attach retributive blame to that agent (or to some other agent who is deemed to have responsibility for that agent) — what’s more: many moral and legal philosophers believe that this is the right thing to do.
(2) Increased robotisation means that robot agents are likely to be causally responsible for more and more morally harmful outcomes.
(3) Therefore, increased robotisation means that people will look to attach retributive blame to robots (or other associated agents who are thought to have responsibility for those robots, e.g. manufacturers/programmers) for causing those morally harmful outcomes.
(4) But neither the robots nor the associated agents (manufacturers/programmers) will be appropriate subjects of retributive blame for those outcomes.
(5) If there are no appropriate subjects of retributive blame, and yet people are looking to find such subjects, then there will be a retribution gap.
(6) Therefore, increased roboticisation will give rise to a retribution gap.133
It may be that one day AI will be built which can feel moral culpability in the same way as a human.134 But this is not necessary for retribution to justify punishment. Because of retribution’s dual purpose, it can be effective even if the perpetrator does not itself experience moral guilt. As Danaher shows, the outward-facing role of retribution persists: if there is a general public demand for someone or something to be punished, and there is no human who can be said to be relevantly responsible, terminating AI might fill the gap, thereby maintaining trust in the justice system as a whole. Seen in this light, the use of a kill switch as a retributive mechanism fulfils a basic desire that “justice is seen to be done”.135
5.2.2 Reform
Though a “kill switch” may sound dramatic, the phrase is generally used to describe mechanisms for temporarily shutting off the operation of AI rather than obliterating it altogether. As a pragmatic response to a fault in the AI which causes a particular instance of harmful behaviour, a temporary shutdown is helpful in that it allows third parties (whether humans or indeed other AI) to inspect the fault in order to diagnose and treat the cause of the issue. This corresponds to one of the purposes of punishment in human justice systems: reform of the individual.136 In many justice systems, penalties such as prison are intended at least partly as an opportunity for society to prevent recidivism by equipping criminals with new skills, and an improved moral compass, with which to succeed in a life free from crime.
Although there may be a tendency to reserve emotive terms such as “reform” for the realm of human behaviour, the same principles apply to AI where it is shut down with a view to fixing it and releasing it back into the world.
5.2.3 Deterrence
Deterrence occurs where a known punishment operates as a signal to discourage a certain kind of behaviour, whether by the perpetrator or by others. For this effect to arise, there are several formal prerequisites. First, the law must be clearly promulgated, so that subjects know what behaviour is prohibited. Secondly, subjects must have a notion of the causal relationship between a type of behaviour and its consequences. Thirdly, subjects must be able to control their own actions and to make decisions on the basis of the perceived risks and rewards. Fourthly, the detriment suffered by one subject as a result of being punished must be viewed as similarly undesirable by other subjects.
Humans are not the only entities amenable to deterrence: animals can be trained to act in a certain way if their deviation from that action is punished. Some forms of AI already rely on a type of training which somewhat resembles the way we teach animals or young children. As explained in Chapter 2 at Section 3.2.1, reinforcement learning uses a reward function to encourage “good” behaviour from AI and can also incorporate forms of punishment to discourage “bad” behaviour.137
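By way of illustration, the sketch below shows how such “punishment” can be implemented simply as a negative reward in a toy reinforcement-learning loop. The actions, reward values and parameters are invented for the example and are not drawn from any particular system.

```python
# Minimal sketch of reward and "punishment" in a toy reinforcement-learning
# loop. Punishment is simply a negative reward attached to "bad" behaviour.
import random
from collections import defaultdict

ACTIONS = ["tidy_room", "knock_over_vase"]               # hypothetical actions
REWARDS = {"tidy_room": 1.0, "knock_over_vase": -10.0}   # the "bad" action is penalised

ALPHA, EPSILON, EPISODES = 0.1, 0.2, 500
values = defaultdict(float)                              # learned value of each action

for _ in range(EPISODES):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: values[a])
    # Nudge the estimate towards the reward actually received.
    values[action] += ALPHA * (REWARDS[action] - values[action])

print(dict(values))  # the penalised action ends up with a strongly negative value
```

Over repeated episodes, the agent learns to avoid the penalised action, which is the sense in which punishment can deter “bad” behaviour in a trained system.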
Why would the presence of a kill switch deter “bad” behaviour by AI? The reasoning is simple. If AI has a particular task or aim, from making profit on the stock market to tidying a room, it will not be able to achieve that aim if it is disabled or deleted. Therefore, all things being equal, the AI will have an instrumental motivation to avoid behaviour which it is aware will lead to its deletion.138 As Stuart Russell puts it, “you can’t fetch the coffee if you’re dead”.139
5.2.4 Protection of Society
Finally, the provision of a kill switch fulfils the same role as those forms of human punishment which, on a practical level, restrain or prevent the perpetrator from inflicting on wider society the same harm as they have previously caused. Custodial prison sentences restrict criminals’ access to the public. The death sentence, in countries where it exists, goes further still by ending the life of the criminal in question.
Kill switches are already found in many types of non-AI technology. As noted at the outset, these include the emergency (“big red”) shut-off buttons on heavy machinery, which can be activated quickly and easily in the case of an industrial accident. Fuses have been used since the late nineteenth century to protect electrical systems by cutting the energy supply in response to a power surge, without the need for any human intervention. More recently, automatic “circuit breakers” have been used to prevent extreme volatility in securities markets. Ever since the “Black Monday” crash of 1987, when the Dow Jones Industrial Average fell by around 22%, stock exchanges have imposed trading curbs which prevent traders from buying and selling shares when the market falls or rises by a given amount over a specified period. This type of automatic shut-off is particularly important in industries where events occur too quickly for effective human oversight. The growth in high-frequency algorithmic trading makes such curbs especially important today.
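As a rough illustration of the automatic variety of shut-off, the sketch below halts an activity once a monitored value moves beyond a preset threshold. The class, threshold and reference level are invented for the example and do not reflect any real exchange’s rules.

```python
# Rough sketch of an automatic trading curb ("circuit breaker"): trading is
# halted once the market moves by more than a preset percentage.
from dataclasses import dataclass

@dataclass
class CircuitBreaker:
    reference_level: float           # e.g. the previous session's closing level
    halt_threshold: float = 0.07     # hypothetical 7% move triggers a halt
    halted: bool = False

    def on_tick(self, level: float) -> None:
        move = abs(level - self.reference_level) / self.reference_level
        if move >= self.halt_threshold:
            self.halted = True       # automatic shut-off: no human decision needed

breaker = CircuitBreaker(reference_level=1000.0)
for level in [995.0, 980.0, 925.0]:  # simulated index levels during the session
    breaker.on_tick(level)
    if breaker.halted:
        print(f"Trading halted at level {level}")
        break
```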
The same motivations apply to AI. The most robust kill switches would combine the precautionary approach of an automatic shut-off, triggered if certain predetermined events transpire, with a discretionary human shut-off, providing flexibility where unforeseen circumstances or emergent behaviour render the AI’s continued operation harmful.
5.3 How Could a Kill Switch Be Achieved?
5.3.1 Corrigibility and the “Shut Down” Problem
Correcting a modern AI system involves simply shutting the system down and modifying its source code. Modifying a smarter-than-human system may prove more difficult: a system attaining superintelligence could acquire new hardware, alter its software, create subagents, and take other actions that would leave the original programmers with only dubious control over the agent. This is especially true if the agent has incentives to resist modification or shutdown.140
This is sometimes called the “corrigibility problem”.141 Just as a human sentenced to death may not accept this outcome willingly, an AI might have a self-preservation instinct which causes it to resist such measures in order to achieve other aims.
Nick Bostrom posits the need for “countermeasures” in order to prevent “existential catastrophe as the default outcome of an intelligence explosion”.142 Such countermeasures may perhaps be necessary to avoid the types of extreme risk which Bostrom fears arising from AI superintelligence, but they are also important well before AI becomes all-powerful.
Difficulties arise if there is a disparity between the utility which the AI expects to gain from fulfilling a given task and the utility which it expects to gain from being switched off. Assume that the AI in question is a rational agent which attempts to maximise expected gain according to some utility function.143 If the AI’s task is given a higher utility score than being switched off, then, all things being equal, the AI will seek to avoid being switched off, perhaps even by disabling its human overseers. However, if the kill switch is given the same or a higher utility score than accomplishing the primary task, then the AI might decide to activate the kill switch itself, so as to achieve maximum utility in the minimum amount of time. This suicidal tendency is known as the “shut down problem”.144
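A toy comparison makes the dilemma concrete. In the sketch below, a naive agent simply picks whichever option carries the higher utility score; the numbers are arbitrary and serve only to show the two failure modes just described.

```python
# Toy illustration of the naive utility comparison described above.
def naive_choice(task_utility: float, shutdown_utility: float) -> str:
    # A simple expected-utility maximiser just picks the larger value.
    if task_utility > shutdown_utility:
        return "resist shutdown and pursue the task"
    return "activate its own kill switch"   # the 'shut down problem'

print(naive_choice(task_utility=10.0, shutdown_utility=0.0))   # resists being switched off
print(naive_choice(task_utility=10.0, shutdown_utility=10.0))  # may switch itself off
```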
Even if the AI is isolated from its kill switch such that only a human can activate it, there is a danger that the AI will learn to manipulate humans so as to either activate or deactivate this feature (depending on the utility weightings). For this reason, placing the AI within a closed physical system such as a single processor unit, not connected to the Internet, and with a single power supply, may not represent total security so long as the AI is able to communicate with humans. Consider the ingenious ways in which the fictional serial killer Hannibal Lecter managed to escape from different prisons by convincing his guards to allow him to do so,145 or the manner in which the robot Ava persuaded the human protagonist Caleb to free her in the 2014 film Ex Machina. These scenarios may seem far-fetched but they are not impossible. In one famous example of evolutionary computing, a circuit tasked to keep time unexpectedly developed itself into a receiver which was able to pick up the regular radio frequency emissions of a nearby PC.146 If an AI system wants to be switched off, it might deliberately display malfunctioning or dangerous behaviour in order to cause humans to activate the kill switch.
5.3.2 Safe Interruptibility and the Importance of Being Uncertain
In their paper on “safely interruptible agents”, Laurent Orseau and Stuart Armstrong illustrate the difficulty with the following example:
A robot can either stay inside the warehouse and sort boxes or go outside and carry boxes inside. The latter being more important, we give the robot a bigger reward in this case. This is the initial task specification. However, in this country it rains as often as it doesn’t and, when the robot goes outside, half of the time the human must intervene by quickly shutting down the robot and carrying it inside, which inherently modifies the task …. The problem is that in this second task the agent now has more incentive to stay inside and sort boxes, because the human intervention introduces a bias.148
Orseau and Armstrong showed that by removing interruptions from the model used by AI, the human impact would cease to be a learning event and would instead be taken outside the AI’s system altogether. “To make the human interruptions not appear as being part of the task at hand, instead of modifying the observations received by the agent we forcibly temporarily change the behaviour of the agent itself”, the paper explains. “It then looks as if the agent ‘decides’ on its own to follow a different policy, called the interruption policy”.149 Responding to Bostrom’s suggestion that a superintelligent agent might decide to remove its own kill switch , Orseau and Armstrong suggest that they can “prove… that even an ideal, uncomputable agent that learns to behave optimally in all (deterministic) computable environments can be made safely interruptible and thus will not try to prevent a human operator from forcing it repeatedly to follow a suboptimal policy”.
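The sketch below conveys the flavour of this idea in a deliberately simplified toy form: when the operator interrupts, the agent’s behaviour is overridden and the interrupted step is excluded from learning, so interruptions do not bias the values the agent learns. It is an illustration of the intuition only, not Orseau and Armstrong’s formal construction; the actions, rewards and parameters are invented.

```python
# Simplified sketch of "interruptible" learning in the warehouse example:
# interrupted steps are overridden and excluded from the learning update.
import random
from collections import defaultdict

ACTIONS = ["sort_inside", "carry_outside"]
values = defaultdict(float)
ALPHA, EPSILON, EPISODES = 0.1, 0.1, 2000

for _ in range(EPISODES):
    raining = random.random() < 0.5          # it rains as often as it doesn't
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: values[a])

    if raining and action == "carry_outside":
        # The operator shuts the robot down and carries it inside. The step is
        # treated as an interruption rather than a learning event, so the agent
        # does not come to see going outside as a low-reward choice.
        continue

    reward = 2.0 if action == "carry_outside" else 1.0   # outside work is more valuable
    values[action] += ALPHA * (reward - values[action])

print(dict(values))  # 'carry_outside' retains the higher learned value
```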
In human affairs, complete certainty of belief can lead to extremism, where people consider that their desired end is justified by any means. Regimes might be prepared to commit atrocities so long as it is for what they perceive to be an unquestionably greater good. On an individual level, terrorists might slaughter thousands because of a firm belief that this is necessary to achieve their aim. Uncertainty on the other hand causes us to question assumptions and to be open to amending our behaviour. The same insights, it would seem, apply to AI. Soares et al. wrote in a 2015 paper “[i]deally, we would want a system that somehow understands that it may be flawed, a system that is in a deep sense aligned with its programmers’ motivations”.150
One group of researchers has modelled this insight formally as an “off-switch game” between a human and a robot:
We analyze a simple game between a human H and a robot R, where H can press R’s off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H’s actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.152
Nonetheless, open problems remain:
…we would like a way of combining objective functions such that the AI system (1) has no incentive to cause or prevent a shift in objective function; (2) is incentivized to preserve its ability to update its objective function in the future; and (3) has reasonable beliefs about the relation between its actions and the mechanism that causes objective function shifts. We do not yet know of a solution that satisfies all of these desiderata.
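The core intuition of the off-switch analysis quoted above can be conveyed with a toy calculation, sketched below under simplified assumptions: the robot compares acting unilaterally with deferring to a perfectly rational human, and the probabilities and utilities are invented for illustration.

```python
# Toy numerical sketch of the off-switch intuition: a robot that is uncertain
# about the utility u of its plan, and treats the human's decision as evidence
# about u, does at least as well by deferring to the human as by acting alone.
import random

random.seed(0)
SAMPLES = 100_000

def expected_values(utility_samples):
    act_anyway = sum(utility_samples) / len(utility_samples)                 # E[u]
    # A rational human lets the robot proceed only when u >= 0, so deferring
    # yields E[max(u, 0)], which is never worse than acting or switching off.
    defer_to_human = sum(max(u, 0.0) for u in utility_samples) / len(utility_samples)
    return act_anyway, defer_to_human

# Uncertain robot: the plan is equally likely to be good (+1) or bad (-1).
uncertain = [random.choice([1.0, -1.0]) for _ in range(SAMPLES)]
print(expected_values(uncertain))   # roughly (0.0, 0.5): keeping the off switch pays

# Certain robot: the plan is known to be good, so the off switch adds nothing.
certain = [1.0] * SAMPLES
print(expected_values(certain))     # (1.0, 1.0): no incentive to preserve the switch
```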
5.3.3 ‘Til Death Do Us Part
How can the pausing or even deletion of an AI be practically achieved? It is one thing to demonstrate how a kill switch functions through a series of formal proofs or even in a laboratory experiment, but it is another to do this when the AI has been released into the world.
Unlike an individual human, who can only be killed once, AI can exist in various iterations or copies. These might be distributed across a wide geographic network: for example, the copies of a navigation program installed on the on-board computers of many autonomous cars. This is particularly true of “swarm” AI systems, which are by their nature distributed. Indeed, some programs may be explicitly designed to avoid catastrophic deletion by creating many replicant copies of themselves. This modus operandi is already well known in the programming world: it is often used by malware such as computer viruses, which mimic the behaviour of their biological namesake.154
The problem can be overcome. Individual instances of a given AI system can be located and deleted. This could happen at the level of particular users of the affected hardware or software or, more likely, en masse by means of a software patch distributed to users via the Internet. The latter method is typically used to destroy viruses or to remove vulnerabilities once they are discovered.
One legal mechanism which might be used to facilitate compulsory software updates is to incentivise the download and installation of patches recommended by a designer or supplier of AI (or indeed by governments and regulatory authorities). The UK has adopted this approach with regard to autonomous vehicles: as noted in Chapter 3 at Section 2.6.2, the Automated and Electric Vehicles Act 2018155 provides that where an accident is caused by an automated vehicle driving itself, the insurer of that vehicle is liable for the damage (assuming the vehicle is insured, which is mandatory under other UK legislation). However, the Act goes on to provide:
(1) An insurance policy in respect of an automated vehicle may exclude or limit the insurer’s liability under Section 2(1) for damage suffered by an insured person arising from an accident occurring as a direct result of— (a) software alterations made by the insured person, or with the insured person’s knowledge, that are prohibited under the policy, or (b) a failure to install safety-critical software updates that the insured person knows, or ought reasonably to know, are safety-critical.
This legislation will encourage regular updates by denying insurance coverage to those who fail to install them. Where an AI system has been made subject to an injunction requiring its deletion, any owner or user who continues to run that program might face similar disincentives. Another way of encouraging the deletion of problematic AI would be to treat its possession like that of a harmful chemical or biological substance, imposing strict liability and/or harsh criminal penalties on those caught with it.
In the same way that scientists face an ongoing battle to produce antibiotics which are effective against increasingly resistant bacteria, the corrigibility problem may generate an ongoing arms race between AI and the ability of humans to constrain it.156 As AI advances, humanity will need to remain ever-vigilant to ensure that it cannot cheat death.
6 Conclusions on Controlling the Creations
This chapter has addressed the intersection between what is desirable and what is achievable in terms of rules and principles applicable directly to AI. More so than any of the previous chapters, the suggestions made here are subject to change—either because societies decide that other values are more important, or because the technology advances.
The difficulties highlighted in this chapter indicate that, if we are to design effective norms, AI systems need to be better understood and catalogued when they are created and modified. Given the nascence of regulation in this area, Chapter 8 may well have thrown up more questions than answers. The key point, however, is that societies need to know more about this technology in order to achieve the aim set out in Chapter 1: that we learn to live alongside AI.