Reinforcement learning is an approach for teaching useful actions to software agents so as to maximize a cumulative reward. Famous early uses included maze solving and learning to play checkers, tic-tac-toe, and backgammon.

1951

REINFORCEMENT LEARNING

Reinforcement learning is reminiscent of a simple behavior observed in cats seeking rewards. In the early 1900s, the psychologist Edward Thorndike (1874–1949) placed cats inside boxes from which they could escape only by stepping on a switch. After some wandering, a cat would eventually step on the switch by chance, the door would open, and the cat would be rewarded, for example, with food. Once the cats learned to associate this behavior with the reward, they escaped faster and faster until they reached a maximum rate of escape.

In 1951, cognitive scientist Marvin Minsky (1927–2016) and fellow graduate student Dean Edmonds built SNARC (Stochastic Neural Analog Reinforcement Calculator), a neural network machine that used 3,000 vacuum tubes to simulate 40 connected neurons. Minsky used the machine to study a scenario involving a simulated rat navigating a maze. When, by chance, the rat made a sequence of useful moves and escaped from the maze, the connections corresponding to these moves were strengthened, reinforcing the desired behavior and thus facilitating learning. Other famous early examples of reinforcement learning include systems for playing checkers (1959), tic-tac-toe (1960), and backgammon (1992).

As implied by these examples, in its simplest definition, reinforcement learning (RL) is an area of machine learning in which a learner traverses a collection of states to receive a reward or to maximize a cumulative reward. The “learner” discovers which actions yield the highest reward by repeatedly trying actions and observing the results. Today, RL is often combined with deep learning, which involves a large simulated neural network, often used to recognize patterns in data. With RL, a system or machine can learn without explicit instructions; thus, machines like self-driving cars, industrial robots, and drones can develop and improve skills through trial, error, and experience. One practical challenge of broadly applying RL is that it requires large amounts of data and many practice simulations.
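To make the idea of trial and error concrete, here is a minimal sketch in Python. It is an illustration only, not a reconstruction of SNARC or of any system mentioned above. It assumes a toy "maze" that is just a straight corridor of six squares with a reward at the right end, and it uses tabular Q-learning, one common RL algorithm chosen here for brevity. The names `train_corridor` and `greedy_policy`, and the values of the learning rate, discount, and exploration rate, are all hypothetical choices.

```python
import random


def train_corridor(n_states=6, episodes=200, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (an assumed example, not SNARC).

    States are squares 0..n_states-1. The learner starts at square 0 and
    earns a reward of 1 only when it reaches the rightmost square.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random some of the time (and whenever the two
            # estimates are tied); otherwise take the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Walls at both ends keep the learner inside the corridor.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Strengthen the estimate for this move in proportion to the
            # reward received plus the discounted value of what follows.
            future = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future
                                         - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known move for every square except the goal square."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor()))
```

At first the learner wanders at random, much like Thorndike's cats. Each time it stumbles onto the reward, the estimates for the moves that led there are nudged upward, and after enough episodes the learned policy under these assumptions is simply to step right from every square.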