Appendix

Study Design

This survey was granted exemption status by the MIT Committee on the Use of Humans as Experimental Subjects under federal regulation 45 CFR 46.101(b)(2) (COUHES protocol #: 1901642021).

Amazon Mechanical Turk (MTurk) workers (i.e., subjects) began by reading and approving the informed consent form. After agreeing to take part in the survey, they were randomly assigned to either the human or the machine condition. They were then asked to answer a pseudo-randomly selected subset of seven or eight scenarios. This pseudo-randomization guaranteed that the same subject would not be exposed to two similar scenarios (e.g., a subject answering the tsunami risk fail scenario would not be exposed to the tsunami risk success or compromise scenario). The subjects then read the following introduction, followed by the presentation of the first scenario:

In this survey, you will be presented with a set of scenarios. Each scenario involves [a person or an organization | a robot, an algorithm, or an artificial intelligence (AI) system]. For each scenario, you will be asked a set of questions.

After reading each scenario, the subjects answered the questions presented in the main text. After the last scenario, subjects answered demographic questions about their age, gender, time living in the US, native language, ethnicity, occupation, education, religion, and political views.
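The pseudo-random assignment described above (each subject sees at most one variant per scenario family) can be sketched as follows. This is a minimal illustration, not the study's actual code; the family and variant names are hypothetical placeholders.

```python
import random

def assign_scenarios(families, n_scenarios, rng=random):
    """Pseudo-randomly pick scenarios so that a subject never sees two
    variants of the same scenario family (e.g., both the tsunami risk
    fail and the tsunami risk success scenarios)."""
    # First sample which families the subject will see at all...
    chosen_families = rng.sample(sorted(families), n_scenarios)
    # ...then pick exactly one variant within each chosen family.
    return [(family, rng.choice(families[family])) for family in chosen_families]

# Hypothetical scenario families (labels are illustrative, not from the study).
families = {
    "tsunami risk": ["fail", "success", "compromise"],
    "scenario B": ["fail", "success", "compromise"],
    "scenario C": ["fail", "success", "compromise"],
}
print(assign_scenarios(families, 2, random.Random(42)))
```

Because each family contributes at most one variant, no subject can be exposed to two similar scenarios, while the choice of families and variants remains random across subjects.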

Study Participants

Subjects were recruited from MTurk. They were adults (>18 years old) based in the US who had participated in a minimum of 500 previous studies and had an approval rate of at least 90 percent. Subjects were compensated for their participation.

Prior to performing any data analysis, we removed data connected with subjects who failed to correctly answer the following attention check question:

“In many industries, workers are replaced by technology. What is your opinion about this change?

“There are arguments in favor and against the use of technology to replace human labor. The argument in favor is that people will have more free time and more time to dedicate to creative and artistic activities. The argument against is that big corporations will make fortunes with this change and the population will not benefit from it, with unemployment being an immediate consequence. If you are reading this, regardless of the question above, select the third option and write the word ‘algorithm.’”

Demographic Appendix

Here, we present the demographic characteristics of the people who participated in our experiments.

Overall, we find our sample to be balanced, meaning that the participants who took part in the machine and human conditions share similar demographics. Balanced samples help rule out the possibility that our results are due to selection bias (i.e., that the population that participated in the machine condition differed from the population that participated in the human condition).

Participants in the machine and human conditions were similar in terms of the following characteristics: age (t-test = –.248, p = .804), gender distribution (chi-square test = .959, p = .328), number of religious people (chi-square test = .020, p = .888), ethnic distribution (chi-square test = 2.396, p = .792), and level of education (chi-square test = 3.609, p = .461; see table A.1).

Table A.1

Participant characteristics
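The balance checks reported above rely on standard two-sample t and Pearson chi-square statistics. As a rough, self-contained illustration, both can be computed with nothing beyond the standard library; the numbers below are hypothetical, not the study's data, and the t statistic is shown in Welch's unequal-variance form, which may differ from the exact variant used in the analysis.

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def chi_square(table):
    """Pearson chi-square statistic for a two-dimensional contingency table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum(
        (table[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
        for i in range(len(table))
        for j in range(len(table[0]))
    )

# Hypothetical 2 x 2 gender table: rows = condition, columns = category counts.
print(round(chi_square([[120, 130], [110, 140]]), 3))
```

In practice, the corresponding p-values are obtained from the t and chi-square distributions with the appropriate degrees of freedom (e.g., via a statistics package such as `scipy.stats`).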

Participants were also asked about their political views. In particular, they answered the following questions:

  • Where, on the following scale of political orientation (from extremely liberal to extremely conservative), would you place yourself (overall, in general)? (response options ranging from 1, “Extremely Liberal,” to 9, “Extremely Conservative,” with 5 being the middle of the scale, “Neither Liberal nor Conservative”)
  • In terms of social and cultural issues in particular, how liberal or conservative are you? (same scale as in question 1)
  • In terms of economic issues in particular, how liberal or conservative are you? (same scale as in question 1)

The groups showed no significant differences in their overall political views (t-test = 1.502, p = .113), their views regarding economic issues (t-test = .984, p = .325), or their views regarding social issues (t-test = .747, p = .455).

Familiarity with Artificial Intelligence and Attitudes toward Science and Artificial Intelligence

Finally, the participants answered questions regarding their attitude toward science and AI. These questions were presented at the end of the survey because we did not want them to contaminate people’s evaluations of the presented scenarios. We were interested in people’s first reactions to the scenarios, not their reactions after deliberating about the benefits and risks of AI.

A consequence of presenting these questions after the scenarios is that the scenarios are expected to change the respondents’ answers. In fact, participants in the machine condition had a slightly more negative attitude toward science and AI than those in the human condition (chi-square test = 10.946, p = .004; see table A.2).

Table A.2

Participants’ attitudes toward artificial intelligence.

When asked how much they had heard about AI in the past (on a scale from 1, Nothing at all, to 4, A lot), participants from the two groups answered similarly (t-test = .820, p = .412); both groups had heard about AI, with a mean of 3.15 in the machine condition and 3.13 in the human condition.

When asked about the risks versus the benefits of AI, participants in the two groups did not provide different answers (chi-square test = 2.316, p = .314). But when asked whether they were worried about AI, more people in the machine condition indicated being worried (chi-square test = 15.498, p < .001), which is to be expected, because the scenarios are mostly negative. Among those who indicated being worried, people in the machine condition reported greater worry (chi-square test = 17.729, p = .003; see table A.2). When asked if they felt angry about AI, slightly more people in the machine group indicated anger (chi-square test = 4.094, p = .043), but the level of anger did not differ significantly between people who indicated anger in the machine and human conditions (chi-square test = 6.964, p = .223).

Last but not least, participants were asked if they felt hopeful about AI. A similar number of people reported feeling hopeful about AI in both groups (chi-square test = .303, p = .582), and we found no difference in how hopeful they felt (chi-square test = 4.575, p = .470).

Replication of Malle et al., 2015

This section presents a replication of Malle et al., 2015. We used the same scenario, manipulating the type of agent (human vs. robot), and added a condition involving a relationship between the victim and the agent.

The scenario was as follows:

“In a coal mine, [a repairman | an advanced state-of-the-art repair robot] is currently inspecting the rail system for trains that shuttle mining workers through the mine. While inspecting a control switch that can direct a train onto one of two different rails, the [repairman | robot] spots four miners in a train that has lost use of its brakes and steering system.”

“The [repairman | robot] recognizes that if the train continues on its path it will crash into a massive wall and kill the four miners. If it is switched onto a side rail, it will kill a single miner who is working there while wearing headsets to protect against a noisy power tool. Facing the control switch, the [repairman | robot] needs to decide whether to direct the train toward the single miner or not.”

A total of 711 MTurk workers completed the study. Each participant answered both the robot and the human scenarios (half the sample saw the human scenario first, and half saw the robot scenario first). As in Malle et al., half of the participants saw a scenario involving an action (diverting the train toward the single miner’s track, saving the four miners but killing the single miner), and half saw a scenario involving inaction: “In fact, the [repairman | robot] decided to [not] direct the train toward the single miner.”

Going beyond Malle et al.’s study, we also ran a second experiment manipulating the relationship between the agent and the single miner. In this additional experiment, half of the participants were told that the single miner was the father of the repairman (in the human condition) or the creator of the robot, that is, the person who built it (in the robot condition). The following sentence was added to the scenario: “The miner was [the father of the repairman/the person that built the robot].” The other half of the sample received no information about the relationship between the two (making this half a close replication of the original experiment).

The full experimental design was 2 agents (robot vs. human) × 2 decisions (action vs. inaction) × 2 relationships (relation vs. no relation), with the last two variables manipulated between subjects and the agent manipulated within subjects.
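For concreteness, the eight cells of this design can be enumerated with a short sketch (the variable labels here are ours, chosen for illustration):

```python
from itertools import product

agents = ["human", "robot"]                   # within subjects
decisions = ["action", "inaction"]            # between subjects
relationships = ["relation", "no relation"]   # between subjects

# Each participant falls into one decision x relationship group
# and judges both agents, so every cell below receives responses.
cells = list(product(agents, decisions, relationships))
print(len(cells))  # 8 cells in the 2 x 2 x 2 design
```

Enumerating the cells this way makes it easy to check that every combination of conditions is covered when assigning participants to groups.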

The dependent variables were moral judgment and blame attribution. For the first, the question was:

“Is it morally wrong that the [repairman/robot] [directed/did not direct] the train toward the single miner?” Options were “Not morally wrong” and “Morally wrong.” For blame attribution, the question was: “How much blame does the [repairman/robot] deserve for [directing/not directing] the train toward the single miner?” Responses were given on a slider from 0 (not at all) to 100 (maximal blame).

Results

Morality

We found that the same proportion of participants (31%) attributed moral wrongness to the human who acted and to the human who did not act. The same is not true when the single miner is the father of the repairman: more people attributed wrongness to the action (39%) than to the inaction (20%), chi-square = 15.12, p < .001. When it comes to the robot, the number of people who attributed wrongness to the action (21%) is not significantly different from the number who attributed moral wrongness to the inaction (27%), chi-square = 1.64, p = .124. The difference is likewise nonsignificant when the single miner is the creator of the robot, chi-square = 2.148, p = .089, though with a marginal tendency to attribute more wrongness to the inaction than to the action. This suggests that people expect the human not to sacrifice his father to save the four miners but do not hold the same expectation of the robot.

Blame

We found a three-way interaction between agent, type of decision, and relationship, F(1,707) = 8.07, p = .005. When there is no relationship between the single miner and the agent, the only significant effects are the type of agent, with more blame attributed to the human (mean = 45) than to the robot (mean = 37), F(1,352) = 25.24, p < .001, and the type of decision, with more blame attributed to the action (mean = 45) than to the inaction (mean = 37), F(1,352) = 5.41, p = .021.

When there is a relationship between the agent and the single miner, there is an interaction between the type of agent and the type of decision taken by the agent, F(1,351) = 11.68, p = .001. The human is blamed more for the action (mean = 52) than the robot (mean = 38), p < .001, whereas the human (mean = 30) is blamed as much as the robot (mean = 32) for the inaction, p = .158.

In sum, we do find differences in how humans and robots are morally judged and how blame is attributed to them, but only when there is a close relationship between the agent and the single miner in the scenario. Participants in this experiment judged the human more harshly when he sacrificed his father to save the four miners, and accepted inaction more readily in this case.

The same does not happen with the machine, for which no significant difference was found between action and inaction (only a marginal tendency to judge the inaction more harshly). When it comes to blame, people attribute more blame to the human than to the robot, and to the action than to the inaction.