1 This is a case of a deep, confusing, and extraordinarily common mistake that E.T. Jaynes named the mind projection fallacy (Jaynes and Bretthorst, 2003). Jaynes, a physicist and theorist of Bayesian probability, coined ‘mind projection fallacy’ to refer to the error of confusing states of knowledge with properties of objects. For example, the phrase mysterious phenomenon implies that mysteriousness is a property of the phenomenon itself. If I am ignorant about a phenomenon, then this is a fact about my state of mind, not a fact about the phenomenon.
2 This story, although famous and oft-cited as fact, may be apocryphal; I could not find a first-hand report. For unreferenced reports see, for example, Crochat and Franklin (2000) or http://neil.fraser.name/writing/tank/. However, failures of the type described are a major real-world consideration when building and testing neural networks.
3 Bill Hibbard, after viewing a draft of this paper, wrote a response arguing that the analogy to the ‘tank classifier’ problem does not apply to reinforcement learning in general. His critique may be found in Hibbard (2006); my response may be found at Yudkowsky (2006). Hibbard’s model recommends a two-layer system in which expressions of agreement from humans reinforce recognition of happiness, and recognized happiness reinforces action strategies.
4 This follows from the Landauer–Brillouin limit, the maximal amount of information you can process in any classical system dissipating energy E: I_max = E/(kT ln 2), where k is Boltzmann’s constant and T is the working temperature.
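As a numerical illustration of the bound, the short sketch below evaluates I_max = E/(kT ln 2) for one joule dissipated at room temperature; the function name and the 300 K figure are illustrative choices, not from the source.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant, J/K (exact 2019 SI value)

def landauer_max_bits(energy_joules: float, temperature_kelvin: float) -> float:
    """Upper bound on bits processable by a classical system
    dissipating the given energy: I_max = E / (k T ln 2)."""
    return energy_joules / (BOLTZMANN_K * temperature_kelvin * math.log(2))

# One joule dissipated at 300 K bounds roughly 3.5e20 bit operations.
print(f"{landauer_max_bits(1.0, 300.0):.3e}")
```

The bound scales inversely with temperature, which is why a cooler working temperature raises the maximum information processed per joule.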
5 This is usually true but not universally true. The final chapter of the widely used textbook Artificial Intelligence: A Modern Approach (Russell and Norvig, 2003) includes a section on ‘The Ethics and Risks of Artificial Intelligence’; mentions I.J. Good’s intelligence explosion and the Singularity; and calls for further research, soon. But as of 2006, this attitude remains very much the exception rather than the rule.
6 After this chapter was written, a special issue on Machine Ethics appeared in IEEE Intelligent Systems (Anderson and Anderson, 2006). These articles primarily deal in ethics for domain-specific near-term AI systems, rather than superintelligence or ongoing intelligence explosions. Allen et al. (2006, p. 15), for example, remark that ‘Although 2001 has passed and HAL remains fiction, and it’s a safe bet that the doomsday scenarios of Terminator and Matrix movies will not be realized before their sell-by dates of 2029 and 2199, we’re already at a point where engineered systems make decisions that can affect our lives.’
However, the issue of machine ethics has now definitely been put on the map; though not, perhaps, the issue of superintelligent machine ethics, or AI as a positive and negative factor in global risk.