© The Author(s) 2021
J. Roberge, M. Castelle (eds.), The Cultural Life of Machine Learning, https://doi.org/10.1007/978-3-030-56286-1_3

3. What Kind of Learning Is Machine Learning?

Tyler Reigeluth (1) (corresponding author) and Michael Castelle (2)
(1) Ethics & AI Chair, Université Grenoble Alpes, Grenoble, France
(2) Centre for Interdisciplinary Methodologies, University of Warwick, Coventry, UK

Introduction

At the outset of his 1803 lectures on pedagogy, Kant (1803/2007) famously stated that “The human being is the only creature that must be educated.” He explains that human nature is indeterminate and needs to be given care, discipline, and instruction; i.e., humans can only achieve their full potential by developing a second nature. We have since grown accustomed to framing this indeterminacy of human development in terms of debates around “nurture versus nature,” a dichotomous trope which Enlightenment philosophy can be seen as foreshadowing (Keller, 2008). Kant did not, of course, foresee that humans would one day build artificial entities that could themselves learn—but if he had, he might have argued that they would also need to be educated in a similar manner to humans. While the idea that learning machines require an “education” seems strange from the perspective of contemporary machine learning, such a notion is in fact present near artificial intelligence’s inception. Specifically, in the 1948 paper “Intelligent Machinery”—which would remain unpublished until the late 1960s—Alan Turing (1948/1969) acknowledges that learning by machines would, just as in the case of learning by humans, necessarily involve the social roles of teachers, peers, and the larger community:

Although we have abandoned the plan to make a ‘whole man’, we should be wise to sometimes compare the circumstances of our machine with those of a man. It would be quite unfair to expect a machine straight from the factory to compete on equal terms with a university graduate. The graduate has had contact with human beings for twenty years or more. This contact has been modifying his behaviour pattern throughout that period. His teachers have been intentionally trying to modify it. At the end of the period a large number of standard routines will have been superimposed on the original pattern of his brain. These routines will be known to the community as a whole. He is then in a position to try out new combinations of these routines, to make slight variations on them, and to apply them in new ways.

What Turing understood (i.e., the fundamentally social and situated nature of learning), contemporary approaches to machine learning (including the twenty-first-century revival of neural networks known as deep learning) have all but forgotten or neglected. Our claim in this essay is that we need a social theory of machine learning: i.e., a theory that accounts for the interactive underpinnings and dynamics of artificial “learning” processes, even those thought to be “unsupervised.” In doing so, however, we must be careful not to resort to “social” as a self-explanatory term that is simply opposed to the “technical” and does not require further elaboration, for to do so would only reiterate a longstanding—and philosophically problematic, as pointed out by Simondon (1958)—antagonism between the realms of culture and technique. In other words, our attempt to sketch a social theory of machine learning should not be interpreted as simply another demonstration that the models learned by algorithms bear the mark of the humans who design and train them (Diakopoulos, 2014); nor are we simply advancing another claim that opening the algorithm’s “black box” will reveal the social norms embedded within its operations (Pasquale, 2015).

What we are interested in instead is how machine learning behaviors unfold and develop meaning and purpose as a socially recognizable form of activity (i.e., as Turing says above with respect to human-based education, “these routines will be known to the community as a whole”). That is, we want to approach machine learning with reference to, and in comparison with, our understandings of human learning in society. But to do so we must acknowledge that machine learning algorithms are not merely executors or implementors of prior or external social norms or knowledge; instead, their activity reshapes collective activity as much as it is shaped by it (Grosman & Reigeluth, 2019).

Developing such a social theory of machine learning is necessarily a critical enterprise—not in the sense of attacking claims of machine “learning” from the outside, but of establishing the conditions upon which claims of “machine learning” could be convincingly held. In order to do this, we will compare and contrast contemporary machine learning in practice with one of the most distinctive sociotechnical approaches to pedagogy and learning, namely, what has come to be called activity theory. This multidisciplinary perspective builds from the transformational works of the Soviet cultural-historical psychologists Vygotsky, Leont’ev, and Luria. Activity theory differs from American behaviorist and cognitivist theories of learning in that it does not depend on a dualism between behavior and mind, between individual action and sociocultural practice, or between action and interaction.1 We suggest that activity theory—aspects of which have also been taken up and expanded by other contemporary theories such as situated learning (Lave & Wenger, 1991)—can provide an essential, albeit general, framework for developing a social theory of machine learning.2

For the purposes of this essay, however, we will mostly be drawing on Vygotsky’s programmatic contributions around the processes of concept development and analogizing them to the classification practices of machine learning models. Vygotsky’s (1987) central idea is that concepts are not abstract content that is acquired, copied, or directly incorporated by an individual; instead, concepts develop through learning and through a process of generalization (p. 47)—a term which, as we will discuss below, is also used extensively in machine learning. For the learner, a concept has value and meaning by representing a solution to a lived problem, which may further enable the learner to solve future problems. Vygotsky thus offers a dialectical account of conceptualization in which learners and concepts develop together. The activity of learning drives the individual’s development forward, and both the individual and social meaning of concepts transform through that same learning activity. In other words, concept learning is a genetic (i.e., developmental and processual) social activity in which individuals take part in meaning-making and, in doing so, transform both themselves and society.3

While concept development and generalization are certainly not the only learning processes addressed by Vygotsky’s work, they do offer an ideal problem space to explore machine learning, insofar as the latter—in the form of object recognition models in computer vision, for example—is overwhelmingly presented in terms of being able to generalize the detection of higher-level concepts such as “giraffe” or “butterfly” in previously unseen images (Chollet, 2018). Vygotsky’s approach, as well as those sociocultural approaches to learning that grew out of it, can help renew our understanding of what it means to learn in ways other than the “optimization”-oriented approach that is generally used by machine learning practitioners. Our hope is that by applying socially oriented theories of learning to the field of machine learning, we can help provoke deeper discussions about what exactly we should expect from machine “learning.” For example, one central (yet largely implicit) normative premise that polarizes debates around machine learning is the idea that, ideally, machines would ultimately automate the process of learning because, after all, automation is what machines do best. This in turn implies that “full” machine learning will only be achieved when the machine learns “by itself” (in what is called unsupervised learning). Where does this fetish for self-sufficiency come from? Why should we expect that from machine learning techniques, however “deep” they may prove to be, when such a fundamental asociality is considered a contradiction in the context of human learning?4

A Brief History of Human Theories of Learning in Machine Learning and Artificial Intelligence

The recent enthusiasm on the part of researchers and the popular media for artificial intelligence represents the result of the increased relevance in recent decades of the field and methodology of machine learning (ML) and, specifically, deep learning (DL). While not all contemporary machine learning techniques are “deep” and/or connectionist in nature, it is the case that the particular methodologies of ML/DL—in which a model is trained and progressively evaluated on subsets of a large corpus of data, whether manually labeled or not—induce an experimental approach that diverges from previous AI regimes in the 1970s and early 1980s (Langley, 1988). Specifically, the importance of the term “learning” and its metaphors was at a low point in that previous era of cognitivism and “good old-fashioned AI” (GOFAI), in part due to well-known critiques—such as the attacks by Chomsky (1959) on Skinner and by Minsky and Papert (1969) on the Perceptron, which rejected behaviorist and connectionist approaches, respectively—that set the stage for an intellectual environment in which a framework largely (if not exclusively) devoted to rule-based symbol processing could become “the only game in town” (Sejnowski, 2018, p. 250).

The early-1980s Handbook of Artificial Intelligence describes how, after Minsky and Papert’s 1969 devastation of (single-layer) connectionism, “those … who continued to work with adaptive systems ceased to consider themselves AI researchers … [A]daptive-systems techniques are presently applied to problems in pattern recognition and control theory” (Cohen & Feigenbaum, 1982, p. 326). Instead, by the 1970s, according to the authors, AI researchers mainly “adopted the view that learning is a complex and difficult process and that, consequently, a learning system cannot be expected to learn high-level concepts by starting without any knowledge at all” (p. 326). This period, then, was characterized by a belief that tabula rasa-style systems that learned by example were infeasible; and indeed, the idea that the acquisition of new knowledge required an existing base of knowledge was a good fit with Chomsky’s “innate grammar” view of language. This was combined with a specific ideology, highlighted by the historian Cohen-Cole (2005), in which cognitive science and AI came to be unconsciously and reflexively based on a conception of intelligence inspired by the intellectual pursuits of cognitive scientists and AI researchers themselves (e.g., highly trained in the physical and mathematical sciences, good at chess, etc.).5 It was in part because the goals for AI were so lofty—and specifically constrained to a domain of symbolic-centric rationality—that starting from scratch seemed intuitively implausible (and we see this again today in the demands for “artificial general intelligence”).6

With cognitive science and AI practitioners in the United States considering themselves part of a revolution against the previous behaviorist tradition in psychology, which was in part characterized by a dependence on experiments on animals, it should not be surprising that there is little reference to theories of animal learning in those fields. There were, of course, some in cognitive science and AI who made explicit references to human theories of learning in their work, such as Seymour Papert, who studied under the Swiss psychologist and learning theorist Jean Piaget—best known in the United States as a proponent of a discrete, staged development of child intelligence—before joining MIT’s AI Lab (Boden, 1978; Papert, Apostel, Grize, Papert, & Piaget, 1963; Piaget, 1970). However, the cognitive science community remained largely indifferent or hostile to Piaget, a situation illustrated by a fizzled 1975 debate between Piaget and Chomsky in France (Piattelli-Palmarini, 1980).7 Even Papert (1980), in his work on the programming language Logo, described his main influence from Piaget as “a model of children as builders of their own intellectual structures” and his view of “Piagetian learning” as being equivalent to “learning without being taught” (p. 7)—emphasizing the AI field’s intrinsically asocial, individualist, and innatist perspective (Ames, 2018). References to Vygotsky were also, unsurprisingly, rare in the cognitive science and AI literature, in part perhaps because of an early, disparaging misreading of Vygotsky by the cognitivist philosopher Jerry Fodor, one of the more extremist representatives of the symbol-processing ideology of intelligence—who would later go on to attack connectionism in the late 1980s (Fodor, 1972; Leontiev & Luria, 1972).

R. S. Michalski, a Polish-American émigré at the University of Illinois, thus described this 1970s milieu as a time in which learning was “a ‘bad’ area to do research in” (Boden, 2006, p. 1047). However, by 1983, Michalski, Carbonell, and Mitchell (1983) introduced a collection of papers that would be seen as the first significant textbook on machine learning, though Michalski and his coeditors largely hewed to the hegemonic line regarding knowledge acquisition being a largely symbol-centric activity, requiring a base of preexisting symbolic knowledge. The nascent machine learning field instead distinguished itself by emphasizing not just deductive processes but those of analogy and induction, with the latter including under its umbrella the “learning by example” paradigm maligned by the GOFAI community (Carbonell, Michalski, & Mitchell, 1983a).8 Regardless, an awareness of developmental and/or socially oriented perspectives on learning is rather lacking in this era of machine learning literature, where the term “teacher” merely indicates the presence of labeled examples (Michalski, 1983)—the scenario now commonly known as supervised learning. This literature also frequently refers to the wholly individualistic concept of “unsupervised” learning or learning without a “teacher” or outside of society—a concept which, if applied to human learning, a theorist like Vygotsky would find very strange.

The lack of reflection on the relevance of existing theories of human learning to machine learning did not significantly improve with the reintroduction of connectionist methods in the mid-1980s (exemplified by the “Parallel Distributed Processing” or PDP volumes of [Rumelhart & McClelland, 1986]). However, the PDP authors were fully aware of the paradigm-shifting quality of their connectionist conception of learning:

In recent years, there has been quite a lot of interest in learning in cognitive science … All of this work shares the assumption that the goal of learning is to formulate explicit rules (propositions, productions, etc.) which capture powerful generalizations in a succinct way … The approach that we take in developing PDP models is completely different … [W]e do not assume that the goal of learning is the formulation of explicit rules. Rather, we assume it is the acquisition of connection strengths which allow a network of simple units to act as though it knew the rules. (McClelland, Rumelhart, & Hinton, 1986)

While this revival of connectionism was accompanied by some genuine technical improvements—e.g., the unsupervised learning techniques of the fully-connected “Boltzmann machines” (Ackley, Hinton, & Sejnowski, 1985) and a working backpropagation algorithm for real-valued hidden layers (Rumelhart, Hinton, & Williams, 1986)—any hint of a connection between this revised approach to neural learning and developmental theories of learning remained absent for a few years.9 This began to change in the early 1990s with a succession of texts coming from San Diego and the south of England, beginning with the cognitive scientist Jeffrey Elman at the University of California, San Diego, who popularized what we now know as the “simple” version of a recurrent neural network (RNN) in a paper suggestively titled “Finding Structure in Time” (Elman, 1990). Unlike the (single-layer or multi-layer) perceptron, in which input data is successively transformed by matrix multiplications and sigmoid functions, the RNN works with sequence data (such as sequences of words or phonemes), and each input is accompanied by the previous output, allowing a “hidden state” vector of values to evolve over time. The recognition of this “developmental” quality of RNN learning led to collaborations between Elman and the developmental psychologist Elizabeth Bates (Bates & Elman, 1993); a work bridging developmental child psychology with connectionism from a former Piaget student, Annette Karmiloff-Smith (1992), who had visited UCSD and worked with Bates and Elman; and an argument for an “epigenetic developmental interpretation” of “connectionist modelling of human cognitive processes” (Plunkett & Sinha, 1992). These authors would all come together for the volume Rethinking Innateness (Elman et al., 1996), which appears to mark the end of this brief developmental enlightenment in neural network research.

This interesting intersection of connectionism and developmental psychology seemed to wane with the (second) decline of connectionism, induced in part by the rise of support vector machines (Cortes & Vapnik, 1995) and decision tree ensembles (Breiman, 2001a)—also forms of supervised learning, but ones which operated primarily on tabular data, as opposed to the sequence-like data that had inspired Elman. It would then be another two decades before the multilayered neural network would make its most recent resurgence (Cardon, Cointet, & Mazières, 2018).10
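The contrast drawn above between the perceptron and the simple recurrent network can be made concrete with a short sketch. What follows is a minimal illustration in Python, not Elman’s original implementation: the weights, the two toy sequences, and the function names (perceptron_layer, elman_step, run_sequence) are all hypothetical choices made for this example, and no training takes place. It only shows that a feedforward layer maps the same input to the same output every time, whereas a recurrent step also receives the previous hidden state, so that the same final input can yield different states depending on what came before.

```python
import math


def sigmoid(z):
    """Logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def perceptron_layer(weights, bias, x):
    """Feedforward layer: matrix multiplication followed by a sigmoid."""
    return [sigmoid(a + b) for a, b in zip(matvec(weights, x), bias)]


def elman_step(w_in, w_rec, bias, x, h_prev):
    """One recurrent step: the new hidden state depends on the current
    input and on the previous hidden state."""
    from_input = matvec(w_in, x)
    from_context = matvec(w_rec, h_prev)
    return [sigmoid(a + c + b)
            for a, c, b in zip(from_input, from_context, bias)]


def run_sequence(w_in, w_rec, bias, sequence):
    """Feed a sequence through the network, collecting the hidden states."""
    h = [0.0] * len(bias)  # the hidden state starts empty
    states = []
    for x in sequence:
        h = elman_step(w_in, w_rec, bias, x, h)
        states.append(h)
    return states


# Made-up, untrained weights: 2 input units, 3 hidden units (illustrative only).
w_in = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
w_rec = [[0.2, 0.0, -0.1], [0.4, 0.3, 0.0], [0.0, -0.5, 0.1]]
bias = [0.0, 0.0, 0.0]

# Two sequences that end with the same input but begin differently.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [0.0, 1.0]]

states_a = run_sequence(w_in, w_rec, bias, seq_a)
states_b = run_sequence(w_in, w_rec, bias, seq_b)

# The final input is identical, yet the final hidden states differ,
# because each carries a trace of the sequence's earlier history.
history_matters = states_a[-1] != states_b[-1]
```

In this sketch the “developmental” quality amounts to nothing more than the hidden state carrying forward a trace of earlier inputs; whether that deserves to be called development in the sense discussed in this chapter is precisely the question at issue.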

We contend, however, that the current deep learning “revolution” has already begun to inspire a new intellectual shift comparable to the impact and hegemony of the cognitivists, who were able to popularize an individualistic information-processing metaphor of the mind that ultimately influenced educational practice itself—as in the case of the educational psychologist Robert Gagné’s (1977) approach to cognitive processes (Ertmer, Driscoll, & Wager, 2003), as well as Papert’s constructionism approach (inspired by a partially asocial reading of Piaget’s constructivism). As such, we believe that it will once again become increasingly common to attempt to understand human learning with reference to the new generation of success stories of twenty-first-century AI, perhaps ultimately in the form of radical shifts in educational policy.11 But at the same time—and quite unlike the early 1990s—we are at a moment in which the sociotechnical embeddedness of these machine learning and deep learning systems, radically increased in scale and scope if not in their basic architectural underpinnings, is making their limitations increasingly salient to a wide population—and the audience for understanding the nature and source of those limitations is increasing. The stakes are now arguably higher for understanding what kind of “learning” machine learning—and deep learning specifically—really represents.

In the next three sections, we will describe three significant aspects of Vygotsky’s “cultural-historical” psychology that will, later on, help us address the implicit theory of “learning” in machine learning. First, we will discuss the intrinsically developmental and specifically sociogenetic qualities of learning in the work of Vygotsky and his followers; then, we will show how Vygotsky conceived of concept development in his psychological experiments with children, which will be useful because contemporary machine learning classifiers excel in recognizing objects in images or in learning vectorial “meanings” of words; and we will discuss how Vygotsky’s zone of proximal development highlights that one must always understand “learning” as it is instituted in specific educational systems with specific forms of instruction.

What Is “Social” About Learning?

Unlike in machine learning—where, for instance, the notion of “unsupervised” learning implies that it is possible to learn without a teacher—the study of human learning (sometimes referred to as pedagogy) explicitly acknowledges the necessity of some kind of teaching agent. However, there has for some decades been a debate in pedagogical thought that, broadly speaking, turns on whether the best learning model is the “guide by the side” or the “sage on the stage” (Ferster, 2014). The former tends to involve a naturalistic or spontaneist stance whereby children learn best when “left to their own devices.” Proponents of this “child-centric” approach claim that learning is optimal and also most enjoyable when unhampered by the limitations and constraints that rigid and homogenizing education systems impose through methods, techniques, and normative expectations; in short, a child’s best teacher is experience—a view prefigured by writers such as Rousseau (1762), Pestalozzi (1827), and Montessori (1912). Education in this case is about accompanying and guiding natural learning processes in a nonintrusive manner, namely by adapting content and methods to the learner’s developmental stage. The work of Piaget (1954) represents one of the clearest expressions of this position in which education is about making the world available to what the child can do at a given stage of its natural development.

Conversely, the “sage on the stage” perspective reflects what could be called an institutional stance, according to which the central site of learning is, and needs to be, a formal educational structure where individuals develop their cognitive abilities alongside their moral character (Hegel, 1808–1811/1984; Kant, 1803/2007). In this sense, and given human nature’s inherent indeterminacy, education is seen as instituting a certain kind of individual, as “molding,” “shaping,” or “structuring” individuals through their learning. This more “content-centric” approach tends to see education as the process through which culture is reproduced; and when viewed critically, as in Bourdieu and Passeron (1977/1990), additionally as a mechanism for the reproduction of social stratification in general. Although the uncritical version of this perspective is generally considered outdated or conservative when compared to the “child-centric” approach—and disparaged as “instructionism” by Papert (1993) and others—this perspective helps us foreground the fact that education is always, at least in part, about bringing into being a specific kind of individual living within a given culture, and that education is possible; i.e., as Kant suggested, there is something in human development which is indeterminate and thus requires a process of education.

But as with most binary oppositions, this presentation is a caricature. Few thinkers or researchers in education theory or pedagogy would unilaterally call for one of these two approaches to pedagogy, and most would recognize that children need curriculum-based knowledge to learn as well as more open-ended situations in which their creativity and problem-solving skills are actively stimulated. But simply taking the middle road between two extremes does not lead us to a robust social theory of learning. Vygotsky’s cultural-historical psychology—in which the term “historical” refers to a focus on development at various interrelated biological, individual, and social levels—offers a serious basis for thinking of alternatives to this mired debate through focusing on the role of mediation in the construction of psychological and behavioral functions and processes (Engeström, 1999, p. 28).

Vygotsky developed a research program—cut short by his premature death—that his successors Leont’ev and Luria would carry on and expand (van der Veer & Valsiner, 1991). According to Vygotsky, behavioral and cognitive development through learning is not a process of socialization by which the individual becomes social (as is the case for Piaget); instead, the individual is itself the site of a process through which society is transformed (Brossard, 2012). This genetic and relational view of behavioral and cognitive development is Vygotsky’s way of expressing the Marxist doctrine of historical materialism, which tells us that we are not so much thrown into “the World” (as phenomenologists would have it) as we are thrown into determinate forms of social activity. As the Vygotsky commentator Roy Pea (1985) neatly summarizes, “our productive activities change the world, thereby changing the ways in which the world can change us.”

In this view, development is not simply about successive cognitive functions replacing or evicting one another as the individual accommodates to the pressures of communication and social life, but about transformations in the organization of cognitive functions as they relate to one another (Vygotsky, 1987, p. 175). Cognitive functions do not transcend the specificities of social activity, but instead modulate and reorganize themselves accordingly (Prot, 2012, pp. 308–309). These transformations, in turn, bring about new forms of cognitive activity. For instance, Vygotsky is not interested in concepts in and of themselves, but in how their meaning develops within social activities, on the one hand, and how the individual’s cognitive activity is transformed through the use of certain concepts, on the other.

This brings us to a fundamental aspect of Vygotsky’s sociogenetic theory of psychological development: all higher psychological concepts (such as, for example, the meanings of words) were at some point concrete social relations. As Roth and Jornet (2017) explain, drawing on Vygotsky’s 1934 book Thinking and Speech:

In [Vygotsky’s] approach, there is a primacy of the social. Whatever higher psychological function can be identified first was a social relation with another person. Vygotsky did not suggest that something (e.g. meaning, rule) initially existed in a social relation, something that the participants may have constructed together to be internalized by individuals. Instead, the higher psychological function is identical to that earlier social relation. (p. 106)

To illustrate this general idea, Vygotsky (1979) shows how a toddler’s spontaneous movements and gestures become the basis of a social activity in which meaning is progressively constructed through the mediating presence of an adult, which in turn paves the way for semiotic and linguistic mediations. Following Vygotsky’s account, the act of pointing is first an attempt to grab or reach for an object. Or rather, what matters is that this gesture is interpreted as such by an adult who responds by moving the object closer to the child or helping the child reach the object.12 The child’s spontaneous gesture becomes an indicative one that now appropriates the adult’s mediating presence, which will ultimately serve as a basis for the child being able to say “I want.” The toddler (putatively) reaching for an object is doing something more than failing to reach it. With the help of an adult, one could say that it is reaching a new developmental stage by performing actions it cannot yet do alone but some day will.

Correspondingly, when a child learns the word “apple,” it is not learning some fixed descriptor of a given object in reality; it is using the word as an instrument within a social activity. The word uttered by the child might mean “I want an apple” or “I’m hungry” or “I want to hold that red thing and throw it”; it takes the place of an entire chain of gestures and actions that have become relatively stabilized through interactions with others. Before ever becoming free-floating signifiers, words congeal and condense concrete social relations as they are used in activities. The density of social relations, the obstacles encountered in activities, and the ways in which words, as instruments, help direct action and regulate behavior, are the basis upon which concepts are developed. We can begin to see that the way a child learns the meaning of “apple” may be quite significantly different, for example, from the way a convolutional neural network “learns” to detect the presence of apples in bitmap images, or the way a word embedding model learns the vectorial “meaning” of the word “apple” in high-dimensional space.

For Vygotsky, the development of “higher psychological functions” is essentially a mediated process, insofar as these functions involve signs as a constitutive dimension of their activity.13 Vygotsky’s understanding of the term “sign” is broad and not strictly limited to a Saussurean perspective where the sign incorporates a preestablished or fixed relationship between a signifier and a meaning (as in Saussure [1915]). In addition to all the culturally relevant mediations (i.e., language, maps, arithmetic, writing, plans, mnemotechnics, etc.) involved in social activities, for Vygotsky any object can potentially become a sign, or a “psychological instrument,” if it is integrated into action as a means of controlling one’s behavior and planning one’s actions (Friedrich, 2012, p. 261; Vygotsky, 1930/1997c).14

Vygotsky provides a striking illustration of this thesis in his critique of the Piagetian conception of the child’s internal or “egocentric” language. Whereas Piaget sees egocentric language as a transition between pre-social (“autistic”) language and “socialized” language, Vygotsky does not take the beginning of the development process to be pre-social. For Vygotsky, the language a child develops when they seem to be “speaking to themselves,” using words that are directed at no one in particular, as though gratuitously accompanying an activity (e.g., coloring, building with blocks), in fact corresponds to the process of reorganizing the activity. Talking over the activity is not then an idle accompaniment, aimed simply at telling oneself what one is doing, but a way of making sense of the forces or objects that resist the child’s activity, a way of coping with the world’s will, so to speak. Vygotsky (1987) gives the example of a child he studied who was drawing a tram and pressed too hard on the paper with his pencil, declared it “broken,” and proceeded to take up a paintbrush to depict a broken tram car while continuing to talk to himself about the new scenario (p. 70). As Yves Clot (1997) comments, words are the instruments children use to think through the obstacles they encounter, but these instruments are not simply lying around waiting to be used; instead, their appropriation transforms the very activity within which they are taken up, thereby actively broadening the meaning they are supposed to “have.”

Conceptualization as Problem Solving and Meaning Transformation

Vygotsky focuses on concept development, rather than concept acquisition, as one of the central forms of mediation through which meaning emerges. In this sense, framing learning as a social activity does not merely imply that the learner progressively acquires socially constructed signifiers, but that a concept’s meaning actually develops through the social relations that underpin the learning process. As Vygotsky (1987) describes, “in the problem of interest to us, the problem of concept formation, [the] sign is the word. The word functions as the means for the formation of the concept. Later, it becomes its symbol” (p. 126). The meaning of a signifier is not presupposed nor is it intrinsically attached to a word; rather, it is the result of a dialectical process through which meaning develops socially.

Crucially, the correspondence between a word and the concept it enfolds is not learned once and for all but is itself developed through learning. A child can learn the word “tree” without having mastered “tree” as a concept. Vygotsky (1987) explains that “in itself, learning words and their connections with objects does not lead to the formation of concepts. The subject must be faced with a task that can only be resolved through the formation of concepts” (p. 124). Vygotsky is explicitly taking aim at the associationist belief that concepts somehow grow out of repeated associations between an object and a word, that particular traits are gradually superimposed to form a general concept or category that includes all the particular traits or qualities. Learning a concept, Vygotsky insists, is not about thickening the associations or increasing the number of connections. Instead, it implies a qualitatively different form of cognitive activity that is not reducible to quantitative reinforcement of associative connections (pp. 123–124). It isn’t that the child using the word “tree” has a half-baked or poor understanding of what a tree is and that when she reaches the developmental stage in which abstract concepts are mastered (i.e., adolescence) she will have a complete understanding of “tree.” For Vygotsky, a concept’s meaning is not self-contained but intimately connected to the activity within which it is used. A concept comes to life through learning as the subject is confronted with problems it is incapable, in its given developmental state, of resolving by itself.

Vygotsky’s insistence on the social nature of word meaning can be read as analogous to Marx’s critique of the fetishization of exchange-value over use-value, in which, e.g., a coat is valued as an abstract monetary quantity as opposed to an expression of the social nature of its use and the labor involved in its production and distribution. Specifically, Vygotsky (1987) is attacking contemporaneous approaches in psychology and linguistics for the way they reified meaning by matching a word to its signifier in the same way that Marx undermined classical economic theories of value by exposing the inherently social relationship that economic value translates (pp. 162–163, 169–173). What Vygotsky is trying to get at is the genesis of meaning and concepts, as well as the activity this genesis entails. Traditional experimental setups in child psychology proceeded by isolating words from their sentences and contexts, thereby depriving the subject of its ability to effectively think through its activity. For example, as Vygotsky describes, “the experimenter selects an isolated word and the child must define it. This definition of the isolated word taken in a congealed form tells us nothing of the concept in action. It tells us nothing of how the child operates with the concept in the real-life process of solving a problem, of how he uses it when some real-life need for it arises” (p. 123).

By contrast, for Vygotsky, every word, insofar as it bears meaning, enfolds a generalization which cannot be understood as a preestablished relationship between a signifier and its meaning but must be studied as an “act of thought”: “The word does not relate to a single object, but to an entire group or class of objects. Therefore, every word is a concealed generalization. From a psychological perspective, word meaning is first and foremost a generalization. It is not difficult to see that generalization is a verbal act of thought; its reflection of reality differs radically from that of immediate sensation or perception” (Vygotsky, 1987, p. 47). One can contrast this notion of generalization to the simpler notion of generalization in machine learning, which merely indicates that large numbers of association tasks (e.g., labeling cats vs. dogs in a corpus of images) can lead to greater performance on unseen images, regardless of whether or not the model truly has a coherent high-level “concept” of a cat or dog. A convolutional neural network can be exceptional at distinguishing between bitmap images of cats and dogs, but it could tell you little about the relationship between cats and dogs in Western society.

Vygotsky would thus likely be dubious of this notion of generalization in machine learning; for him, a concept develops because it can help a learner solve a problem, and the learner must experience the need for a concept as it informs its activity. As he takes care in emphasizing, however, spontaneously solving problems as we happen to run into them is not enough for a concept to truly develop. Concepts emerge and come to matter for the learner because he or she was presented with tasks or problems to solve within certain situations. As Vygotsky (1987) explains:

Where the environment does not create the appropriate tasks, advance new demands, or stimulate the development of intellect through new goals, the adolescent’s thinking does not develop all the potentials inherent in it. It may not attain the highest forms of intellect or it may attain them only after extreme delays. Therefore, it would be a mistake to ignore or fail to recognize the significance of the life-task as a factor that nourishes and directs intellectual development in the transitional age. However, it would also be a mistake to view this aspect of causal-dynamic development as the basic mechanism of the problem of concept development or as the key to this problem. (p. 132)

Learning as Instituted in Specific Educational Systems: The Zone of Proximal Development (ZPD)

This leads us to a pivotal point of our argument: concepts are learned, but they also need to be taught. And this is particularly true, as Vygotsky points out, of non-spontaneous or scientific concepts, i.e., concepts that relate to systematic forms of knowledge (e.g., “gravity” as it forms a system of concepts in physics with “mass” and “force”) and specific problem-solving skills (e.g., solving a mathematical equation). Unlike spontaneous concepts which develop through the child’s daily activities and encounters, one cannot reasonably expect scientific concepts to develop spontaneously; they require certain social conditions and forms of activity typically found in schools or other educational institutions in which learners are confronted with specific types of problems and in which their motivation to develop and use concepts is actively stimulated and nurtured.

Concepts that are developed out of daily experience—such as money, to use a particularly notorious example—are often difficult to define (Brossard, 2012, p. 102; Simmel, 1907/2004). This does not imply that scientific concepts are completely distinct or disconnected from spontaneous ones. On the contrary, scientific concepts need spontaneous concepts even though they synthesize their contradictions on a different cognitive level (Brossard, 2012, p. 103). This entails that education is not so much about inculcating content as it is about mobilizing students’ previous experiences (inside and outside of school) into knowledge and skills they can use in new situations. Although critical of the spontaneist maxim that children learn best when left to their own devices and pedagogical intervention should be used parsimoniously, Vygotsky (1987) does recognize a kernel of incontrovertible wisdom in Tolstoy’s belief that “consciously transferring new concepts or word forms to the pupil is as futile as attempting to teach the child to walk through instruction in the laws of equilibrium” (p. 171).15 He explains:

The development of concepts or word meanings presupposes the development of a whole series of functions. It presupposes the development of voluntary attention, logical memory, abstraction, comparison, and differentiation. These complex mental processes cannot simply be learned. From a theoretical perspective, then, there is little doubt concerning the inadequacy of the view that the concept is taken by the child in completed form and learned like a mental habit. The inadequacy of this view is equally apparent in connection with practice. No less than experimental research, pedagogical experience demonstrates that direct instruction in concepts is impossible. It is pedagogically fruitless. The teacher who attempts to use this approach achieves nothing but a mindless learning of words, an empty verbalism that simulates or imitates the presence of concepts in the child. Under these conditions, the child learns not the concept but the word, and this word is taken over by the child through memory rather than thought. Such knowledge turns out to be inadequate in any meaningful application. (p. 170)

Vygotsky explicitly takes aim at what he considers to be an empirical and theoretical error: we cannot expect to get at learning either through personalization—i.e., simply matching “knowledge” with the child’s current state of development—or standardization, i.e., setting up a rigid learning plan that is supposed to correspond to the students’ developmental stages. In either case, the error is to think tautologically of learning as what can be learned alone given an individual’s developmental level. Ultimately, education should be about situating learning within social activities that demand more of the learner than what the learner can perform by themselves.

This leads us to one of Vygotsky’s (1987) famous concepts, the Zone of Proximal Development (ZPD), an idea that critiques the concept that a child inhabits a particular stage of development based on their present abilities in isolation:

If the gardener decides only to evaluate the matured or harvested fruits of the apple tree, he cannot determine the state of his orchard. Maturing trees must also be taken into consideration. The psychologist must [also] not limit his analysis to functions that have matured. He must consider those that are in the process of maturing. If he is to fully evaluate the state of the child’s development, the psychologist must consider not only the actual level of development but the zone of proximal development. (pp. 208–209)

For Vygotsky, instruction must aim to be just ahead of the child’s potential zone of development, keeping in mind that this potential includes not just what the child can do independently but also what the child can do with others. In this process, imitation is taken to be essential to learning and not as its perversion or inauthentic expression. “Instruction is possible,” he says, “only when there is a potential for imitation” (p. 211); as we will see below, this idea that an interactional form of imitation is potentially productive and not simply “bad” learning finds parallels in the recent ML technique of generative adversarial networks.

In this perspective, learning does not coincide with development, then, but rather pulls it forward. Children learn how to do more than they can do independently because of the social mediations embedded in activity, which entails that learning can only occur where the child is able to imitate what lies just beyond what it is capable of doing on its own.16 The ZPD is thus the site for those concrete social relations which for Vygotsky, as described above, become internalized as words, concepts, or other cognitive mediations which are subsequently deployed in new social situations.

This view of the intrinsically social quality of the learning process is seemingly at odds with the way institutionalized education tends to evaluate “true” or authentic learning by testing the agent’s ability to solve a problem on its own—be that without the help of peers, teachers, or even technical mediations—as well as the way models in machine learning are evaluated on their predictive performance in isolation. On a broader level, this points to the deep interrelation between the way AI in general and ML in particular mirror certain normative or even political stakes involved in defining what learning should be by producing a model of what we think learning is. Proponents of Vygotsky-influenced educational psychology instead argue that determining the learner’s proximal rather than actual developmental stage allows for a differentiated approach to education whereby a learner is confronted with problems that require it to move beyond what it can do by itself, but which are accompanied by the support of social and technical mediations (Moll, 1990). In addition, higher order “scientific” concepts, which can only be learned within formal educational settings, should not all be taught exactly the same way to every student. Making scientific concepts relevant involves having a grasp of each learner’s everyday or “spontaneous” concepts that they use to make sense of the world, which the newly acquired scientific concepts will in turn help organize in new ways. Vygotsky’s account of the ZPD allows for a form of “personalization” of content and rhythm that does not atomize the learner and recognizes that the very things it is learning are social by nature. 
When it comes to machine learning, this perspective points to the trouble the field has in evaluating model performance with abstract metrics in what would otherwise be highly social tasks such as translation and image generation, and it implies the need for a radical shift in our way of evaluating what should count as learning and what kind of activity it involves.

Is Machine Learning a Social Form of Learning?

We have seen how Vygotsky’s cultural-historical psychology provides an original understanding of what is social about learning; we have seen how he applies this to concept development in particular; and we have learned about the zone of proximal development. Now we can begin to ask: is machine learning—and deep learning in particular, which has over the last decade promised repeatedly to lead to a new paradigm of artificially intelligent agents—actually a kind of learning in Vygotsky’s sense?

We can consider a now-prototypical case of contemporary (connectionist) machine learning, that of the multilayer convolutional neural network classifier model in the lineage of those developed by Yann LeCun in the 1980s on the MNIST dataset or by Geoff Hinton’s students in the early 2010s using Fei-Fei Li’s ImageNet dataset.17 The models that are trained in these scenarios are said to have distinct architectures, a kind of morphology in which each “layer” consists of one or more matrices (if two-dimensional) or tensors (if three-dimensional or higher) (with corresponding bias vectors). These architectures generally do not change throughout the “lifespan” of the model and thus DL models can be said to have a highly restricted form of “development” that psychologists would call microgenetic: namely, the potential to modify, during the training process, their weight parameters, today sometimes numbering in the hundreds of millions or billions.18 (Like the field of architecture for buildings/physical structures, neural architectures also “evolve” in a phylogenetic way, but this is highly dependent on the actions of their [human] architects and the technical environment in which they both reside.19) As with all forms of supervised learning, these models update their weight parameters through the use of a simple loss function (e.g., categorical cross-entropy) and optimizer (e.g., stochastic gradient descent) on labeled training data; i.e., each example image is accompanied by a categorical label indicating that it represents a picture of a butterfly and not a picture of a tiger or 998 other objects.
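To make the mechanics of this update rule concrete, the following is a minimal sketch of supervised learning with a categorical cross-entropy loss and stochastic gradient descent. It is not taken from any of the systems cited above: it uses a single linear layer (one weight matrix and one bias vector) rather than a convolutional network, and the class names, feature vectors, learning rate, and function names are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical labels and hand-made 2-feature "images" (assumptions for illustration).
CLASSES = ["butterfly", "tiger", "other"]
TOY = [([1.0, 0.0], 0), ([0.9, 0.1], 0),
       ([0.0, 1.0], 1), ([0.1, 0.9], 1),
       ([-1.0, -1.0], 2), ([-0.9, -0.9], 2)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, b, x):
    """One linear layer: weight matrix W and bias vector b, followed by softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk
                    for row, bk in zip(W, b)])


def mean_loss(W, b, examples):
    """Average categorical cross-entropy over labeled examples."""
    return sum(-math.log(forward(W, b, x)[y]) for x, y in examples) / len(examples)


def train(examples, n_features, n_classes, lr=0.5, epochs=200, seed=0):
    """Stochastic gradient descent: one parameter update per labeled example."""
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            probs = forward(W, b, x)
            for k in range(n_classes):
                # Gradient of the loss w.r.t. score k: p_k minus 1 if k is the label.
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    """Return the index of the most probable class."""
    probs = forward(W, b, x)
    return max(range(len(probs)), key=probs.__getitem__)


W, b = train(TOY, n_features=2, n_classes=len(CLASSES))
print(CLASSES[predict(W, b, [0.8, 0.2])], mean_loss(W, b, TOY))
```

Note that the architecture (the shapes of `W` and `b`) is fixed before training begins; only the numerical values of the parameters change, which is the restricted, microgenetic sense of "development" described above.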

So what kind of learning is supervised learning? Clearly it does not easily correspond to the “sage on the stage” or “guide by the side” caricatures mentioned above—we find no lecturer or dynamic accompaniment, only labels and the mechanical instructions of stochastic gradient descent. Supervised learning is instead what early machine learning researchers called “learning by example”—which is ideally not a rote memorization but a learning process most closely akin to the stimulus-response animal experiments of behaviorism. Specifically, supervised machine learning is more similar to the classical conditioning of Pavlov than the operant conditioning of Thorndike and Skinner, in which the animal subject has the freedom to engage in a variety of actions; the latter corresponds more closely with reinforcement learning.20 But if we appreciate supervised learning (as well as reinforcement learning) as essentially behaviorist—despite its complex ideological and technical underpinnings in a cognitivism that previously revolted against behaviorism as well as (in the case of deep learning) a connectionism that previously revolted against aspects of said cognitivism (Baars, 1986; Bechtel & Abrahamsen, 2002)—then we can ask: well, what did Vygotsky think about behaviorism?

Indeed, the conference talk that kicked off Vygotsky’s career in psychology in the mid-1920s was a critique of the Soviet genre of behaviorism known as reflexology specifically associated with Pavlov and the neurologist Vladimir Bekhterev—a field whose proponents believed that “all human conduct could be conceived as combinations of conditional reflexes” but which “deemed the scientific study of [subjective experience] impossible” (van der Veer & Valsiner, 1991, p. 41).21 Vygotsky saw this approach—which Bekhterev explicitly intended to supplant all other forms of psychology—as doomed to fail, especially if the ultimate goal involved an understanding of human consciousness. This was ironic, Vygotsky explained, because these researchers often had spoken interactions with their human subjects in practice, and speech for Vygotsky indeed represented a form of “reflexes” connected to the subject’s internal experience; but the reflexologists simply did not consider those interactions a valid form of behavioral data. (At the time, Vygotsky [1997a] instead believed that one might be able to understand higher psychological processes and even consciousness through these “reflexes of reflexes” [p. 79].)

This inattention to speech, in turn, meant that behaviorism—locked in as it was to the “stimulus-response” or “S-R” framework—was blind to the importance of mediation, which was a fundamental feature of Vygotsky’s later studies of child development. In this transitional stage of Vygotsky’s work, he and L. S. Sakharov performed experiments to determine how children learned to form concepts, given a set of differently colored and differently shaped objects along various dimensions that had been assigned nonsense names like “dek” (small and tall objects) or “mup” (large and tall objects) by the experimenters (van der Veer & Valsiner, 1991, p. 261). They argued that children specifically navigate from nonsensical “syncretic” groupings to arrangements into what are called “complexes”—in which groupings are based on objective features, but these features may be irrelevant to adults—and eventually to what Vygotsky then called “true concepts,” although this latter stage could only be reached by adolescents and adults.22 Vygotsky’s (1987) insight is that younger children thinking in complexes—specifically the highest form of complex, called the “pseudoconcept”—and adults thinking in concepts can nevertheless “establish mutual understanding and verbal interaction” (p. 145), despite the child only being able to conceive of, e.g., a word’s meaning as a “bundle of attributes or features” (Blunden, 2012, p. 231).23 Vygotsky (1930/1997c) referred to the transitions to complexes and then concepts as a process of generalization that went beyond pattern-matching and to the level of a scientific concept seen in a relational network with other concepts:

I will give an example. Let us compare the direct image of a nine, for example, the figures on playing cards, and the number 9. The group of nine on playing cards is richer and more concrete than our concept “9,” but the concept “9” involves a number of judgments which are not in the nine on the playing card; “9” is not divisible by even numbers, is divisible by 3, is 3², and the square root of 81; we connect “9” with the series of whole numbers, etc. Hence it is clear that psychologically speaking the process of concept formation resides in the discovery of the connections of the given object with a number of others, in finding the real whole. That is why a mature concept involves the whole totality of its relations, its place in the world, so to speak … the concept is not a collective photograph. It does not develop by rubbing out individual traits of the object. It is the knowledge of the object in its relations, in its connections. (p. 100)

We can contrast this distinction between complexes and concepts in Vygotsky with the way that deep convolutional models train on MNIST data, which involves correctly labeling scanned images of handwritten digits. Just as in Vygotsky’s examples, such a model can reliably emit the label ‘9’ when confronted with an image of a handwritten number nine, but it does not understand the concept of 9 as described above. This highlights the difference between generalization in Vygotsky and generalization in machine learning. For supervised machine learners, “generalization” merely means high accuracy in the model’s ability to label previously unseen input data. For Vygotsky, such a model has perhaps learned a pseudoconcept of the ten digits, but it has not learned to generalize to the true concept of a digit.24 From his perspective, even the supposedly “super-human” object classifiers of today do not have these scientific concepts—those acquired through explicit instruction guided by a teacher, like the term “square root.” (One can test this proposition today by asking a state-of-the-art generative text model like GPT-2 [OpenAI, 2019] to complete the phrase “The square root of 4 is,” a task at which it is typically unsuccessful.)
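The machine learning sense of “generalization” used here can be stated compactly. As a sketch in our own notation (the symbols are assumptions, not drawn from the sources cited), let $(x_1, y_1), \dots, (x_m, y_m)$ be $m$ labeled examples withheld from training, and let $\hat{y}(x)$ be the label the trained model emits for input $x$:

```latex
\mathrm{acc}_{\text{test}} \;=\; \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\!\left[\, \hat{y}(x_i) = y_i \,\right]
```

A model “generalizes well” in this sense when $\mathrm{acc}_{\text{test}}$ is close to 1. Nothing in the quantity refers to relations among the labels themselves, such as the fact that 9 is the square of 3.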

What are the greater implications of Vygotsky’s critique of behaviorism and his understanding of concept development on our understanding of machine learning? We can go back to our earlier characterization of machine learning as essentially the study and practice of microgenesis: a process of training that (in the case of contemporary AI), depending on the task, might take anywhere from an hour to a couple of months.25 And it is also, to some extent, a practice of phylogenesis in which new architectures are designed and “evolved” by human researchers. But what Vygotsky’s work shows is that machine learning is not about, in general, ontogenesis, i.e., the long-term development of a conscious individual, acting in the world, and “trained” by their social surroundings in both informal and formal ways. And machine learning research is also only tangentially concerned with sociogenesis, meaning the development of social groups and identities—although in many cases it is clearly implicated in such development, as in the influence of YouTube’s recommendation algorithm on political preferences (Ribeiro, Ottoni, West, Almeida, & Meira, 2020). We summarize this comparison in Table 3.1.
Table 3.1 The role of different developmental processes in a human development framework and in a machine learning “model development” approach

| | Time scale (approx.) | Human development | Machine learning |
| --- | --- | --- | --- |
| Microgenesis | Minutes to months | A given pedagogical lesson or psychological experiment | Training an ML model with a given loss function |
| Ontogenesis/sociogenesis | Days to decades (up to ~100 years) | The natural and sociocultural development of the individual | n/a (but see transfer learning and generative adversarial networks) |
| Phylogenesis/ethnogenesis | Up to hundreds to thousands/millions of years | Biological and cultural evolution of the human species | Technical and cultural development of new ML architectures |

In our descriptions in the previous sections, it is clear that it is the complexities of microgenesis and ontogenesis—specifically the ontogenesis of the higher psychological functions and of true and/or scientific concepts—with which Vygotsky is most concerned. It is also relatively clear that, to a large degree, the natural and sociocultural development of the individual is not the kind of learning with which machine learning is concerned. This indicates that as long as the individual learning subject in machine learning is taken to be the ML model/architecture, someone like Vygotsky would not recognize it as a form of learning due to its lack of emphasis on the ontogenesis of a conscious individual, who develops over time in sequences of dialogical processes, each of which permits transitions from intermental to intramental functions (Vygotsky, 1997b, p. 106).

The technical exceptions to this ontogenetic lack in ML/DL, as noted in the table, are quite interesting in that our framing manages to surface the most intriguing and even underrated features and/or innovations of contemporary deep learning techniques: the ability to retrain or fine-tune certain types of models on new corpora of data that differ in some way from (and are often much smaller than) the original “pre-training” datasets, which is known as transfer learning (Pan & Yang, 2010; Pratt & Jennings, 1996), as well as distinctively interactional training architectures like those of generative adversarial networks (GANs). The transfer learning case demonstrates a limited way in contemporary machine learning in which trained models can have an ontogenetic “lifespan” beyond an individual training session by permitting the reuse of parameters learned over longer periods of training. In the case of GANs, a distinct discriminator model, D, and a distinct generator model, G, are initialized randomly, but D is gradually trained in a supervised manner to recognize, e.g., “real” images of human faces, and G repeatedly attempts to “fool” D with generated face images by learning from D’s assessment of its generated images’ plausibility (Castelle, 2020; Goodfellow et al., 2014). This training process differs from traditional supervised learning in that the discriminator model acts as a dynamic “instructor” of sorts, who (in the ideal GAN training process) is just slightly ahead of the student in its ability to determine the validity of the generated face images and can thus pull or guide the generator model forward. This can, arguably, represent the use of a kind of zone of proximal development within the otherwise sociogenetically degenerate world of machine learning techniques.
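For readers who want the formal counterpart of this description, the adversarial relationship between D and G is usually written as the two-player minimax objective given by Goodfellow et al. (2014). We reproduce it here only as a reading aid; $p_{\text{data}}$ (the distribution of real examples) and $p_z$ (the noise distribution from which G draws its inputs) follow that paper’s notation rather than anything defined in this chapter.

```latex
\min_{G} \max_{D} \; V(D, G) \;=\;
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\!\left[ \log D(x) \right]
  \;+\;
  \mathbb{E}_{z \sim p_{z}(z)}\!\left[ \log\!\left( 1 - D(G(z)) \right) \right]
```

The first term rewards D for assigning high probability to real examples, and the second rewards D for rejecting generated ones while G is trained to make that second term small. The “instructor” reading above corresponds to the fact that G receives its only training signal through D’s current assessment.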

However, what if the subject of machine learning is not simply the model? If we instead perform an act of reconfiguration (Suchman, 2007) and situate the subject of machine learning as a human-machine activity—i.e., including the ML researcher, their models, and their associated tools and organizational resources—then we have a situation which permits closer analogies to Vygotsky’s cultural-historical approach. This is the approach taken by Adrian Mackenzie (2017) when he uses the phrase machine learners to refer “to both humans and machines or human-machine relations.” In this latter perspective, which observes the intrinsic technicity of human action and becoming—and thereby of ontogenesis and phylogenesis—we would instead argue that a supervised model that finds objects from the space of ImageNet categories permits the coexisting “user” of that model to ontogenetically improve their development of concepts. That is, as a socialized, technical human with an awareness (if not an expertise) in bird species, I can leverage deep learning classifiers to accelerate my own learning and ability to take action in new situations (such as birdwatching). In other situations, such leveraging of ML classification may be seen as hazardous; consider the court judge who, as a machine learner using an algorithm for pretrial risk assessment (Barabas, Doyle, Rubinovitz, & Dinakar, 2020), effectively “learns” a new scientific concept of “pretrial risk” and, by deploying their preexisting agency and authority, leverages it to take further action against specific groups. Vygotsky (1930/1997c) would consider such devices yet another psychological tool that individuals can use to master one’s mental processes; in this case “machine learning” would refer to the development of these psychological tools that extend the ontogenetic development of the machine learner. In order for machine learning to truly embrace theories of learning, such a reconfiguration may indeed be necessary.

Conclusion: Who Is Learning in Machine Learning?

The aim thus far has been neither to provide a comprehensive introduction to Vygotskian psychology nor a full critique of machine learning, but merely to emphasize three essential starting points for undertaking a social theory of machine learning. The first essential point is the idea that learning occurs when a problem is encountered and resolved by the learner, thereby transforming the problem space itself. To resolve a problem, a child uses instruments (in activity theory, semiotic and/or technical mediations) that are not simply extraneous to its activity but constitute and transform that activity on a much deeper level. When it comes to machine learning, this gives a novel perspective on what it means to evaluate an algorithm’s learning on an already-given problem space. From a Vygotskian point of view there is little sense in evaluating learning performances on ready-made problems without taking into account the activity within which the problem is encountered and resolved. Developmentally speaking, there is little sense in unleashing algorithms on abstract image classification or word substitution problems and hoping that their results will lead to a “generalization of generalizations” necessary for higher-level thinking (Vygotsky, 1987, p. 229). While some deep learning models are often held up as examples of successful generalization, reinforcement learning models with superhuman performance on a majority of Atari games (Mnih et al., 2015) remain wholly limited to the world of Atari games and would be useless outside that domain. From a Vygotskian perspective, the difficulty most machine learning techniques have in generalizing models beyond their given problem space is linked to the fact that practitioners tend to consider their problem spaces as equivalent and comparable when in fact they should be approached as qualitatively different learning experiences directed by the individual’s development.
The excitement around “transfer learning” reflects an implicit recognition that AI, in its current form, is ontogenetically weak: its models can only “fine tune” and—unlike the child learner over the course of years of instructional microgenetic interactions—cannot undergo repeated reformations of past generalizations.

The second essential point is that learning is not something done on one’s own, but something that occurs in the space between what one can do on one’s own and what one can do with the help of another. In this sense, learning is always a differential experience and implies being presented with certain problems to resolve rather than others and not simply running into problems randomly. Again, Turing’s quote can serve as a valuable guide when he reminds us that the college graduate has spent twenty years or so having his or her behavior intentionally modified by teachers. One of the roles of education as an institution is to determine which problems are worth solving, i.e., which problems will have a value for the learner once they are solved, which will be truly transformative in shaping their cognitive development. From this standpoint the ideal or “optimal” learning process is not one where content is made available for a learner’s given developmental stage, but one where the learner reaches beyond their developmental stage through a series of sociotechnical mediations that are not mere crutches but make up the very stuff of learning. This, however, implies that we change our ethical and epistemological expectations by not only demanding optimization and automation of ready-made problems (i.e., evaluating based solely on objective functions), but by engaging in concrete learning processes as an educator would, that is an educator who is not simply a test administrator. One can see hints of this today in generative adversarial networks and even more in the experimental genres of developmental robotics that take ontogenesis and sociogenesis more seriously (Oudeyer & Kaplan, 2006).

The third essential point is that the words, concepts, or images that are learned are never fixed relationships between a signifier and signified but are in fact social relations that have been abstracted from their original activity. Crucially, this means that learning is not merely about acquiring abstract content, knowledge, or symbols and applying them to concrete situations, but it is about turning a concrete social relation into abstract technical and semiotic mediation that can be used in other situations to regulate thinking and behavior. In other words, this is where we see learning as an activity in and of itself and not only something that merely happens within a given social context; this is where we see learning transforming the world, to paraphrase one of Vygotsky’s primary inspirations. We argue, then, that in order to formulate a learning theory of machine learning, it may be necessary to move from seeing an inert model as the machine learner to seeing the human researcher or developer—along with, and not separate from, his or her model and surrounding social relations—as the machine learner. Only then can we see the sociotechnical process of empirically observable machine learning in the light of cultural-historical psychology and/or activity theory. These machine learners are developing ontogenetically, whereas the model in isolation is largely developing only microgenetically (although see our ontogenetic caveats in Table 3.1). They are taking the social relations from which their data has been constructed (e.g., the massive human labor of the Mechanical Turk workers who labeled ImageNet), internalizing the labeled pseudoconcepts into their models, and incorporating those internalizations into their own ontogenetic development (whether by fine-tuning the existing models or by moving on to training more interesting architectures). 
On public and private forums, they educate each other in their techniques at levels appropriate to their development; through the various communicative activities of releasing research papers, codebases, and pre-trained models, they pull each other’s development forward. Machine learning as a technique is a hermetic process of training individual models—but machine learning as a cultural activity is a vast and social process of training other machine learners.

Notes
  1.

    On the dualism between behavior and mind, see Vygotsky (1997a, p. 65); on the dualism between the individual and the social see Cole and Wertsch (1996); for a discussion of the key assumptions of activity theory, see Chaiklin (2019).

     
  2.

    Our reasons for using activity theory rather than other dominant educational theories are twofold. First, more conventional theories of learning from the late twentieth century—which consider learning as a hierarchy of individual behaviorist and/or information-processing-like achievements ranging from simple associations to logical problem solving (Gagné, 1970)—tend to reduce or eliminate the social, semiotic, contextual, and technological qualities of any human learning situation. Second, alternative approaches, which vary from constructivism (Piaget, 1954) to constructionism (Papert, 1991) to situated learning (Lave & Wenger, 1991), frequently incorporate one or more of the dialectical insights of activity theory.

     
  3.

    Vygotsky’s term genetic is unrelated to the branch of biology concerned with DNA or RNA nucleotides, but instead refers to the way his experiments focused on uncovering the developmental phases of a learning process (Ageyev, 2003).

     
  4.

    While self-learning in humans is of course possible in some situations (Gibbons et al., 1980; Rousseau, 1762), few learning theorists would advocate for the elimination of teachers entirely. Vygotsky argued that instructional experience is internalized, and thus “when the school child solves a problem at home on the basis of a model that he has been shown in class, he continues to act in collaboration, though at the moment the teacher is not standing near him … It is a solution accomplished with the teacher’s help. This help – this aspect of collaboration – is invisibly present. It is contained in what looks from the outside like the child’s independent solution of the problem” (Vygotsky, 1987, p. 216; emphasis added).

     
  5.

    “As cognitive scientists like George Miller and Herbert Simon crossed back and forth between scientific descriptions of the human and normative discussions of the best way for scientists to think, they borrowed from the folk and social psychological image of right thinking to inform their own personal and public images … These very same scientific self-images would form the basis for the image of human nature that cognitive science produced” (Cohen-Cole, 2014).

     
  6.

    By contrast, DL architectures do begin in a tabula rasa manner (while the neural architecture is fixed, the parameter weights are set randomly), and their goals are arguably more limited, to what would then have been dismissively known as a kind of pattern recognition (e.g., the classification of a bitmapped image as containing a tiger).

     
  7.

    This debate, staged at the Abbaye de Royaumont outside of Paris, was later described as one in which “Chomsky’s argument focused exclusively on complex details of the learning of syntax, about which Piaget had virtually nothing to say … [and] Piaget’s ground for argument was conceptual learning, about which Chomsky had nothing to say” (Jackendoff, 1995).

     
  8.

    For internalist depictions of the history of machine learning of this era, see Carbonell, Michalski, and Mitchell (1983b) as well as Kodratoff (1992).

     
  9.

    It is remarkable to read one early text attempting to bridge PDP and cognitive psychology (Kehoe, 1988) describing how “cognitive research and theory has focused largely on the already highly productive performance by highly experienced human subjects.” On PDP and classical conditioning, see also Klopf (1988).

     
  10.

    In the intervening years, a field known as epigenetic robotics emerged that integrated developmental perspectives with robotics, which even inspired occasional reference to learning theorists like Piaget and Vygotsky (Berthouze & Ziemke, 2003). Because our focus here is on supervised learning and not reinforcement learning or interactive robotics, we do not discuss this work in detail, but it will undoubtedly be relevant to future histories of “social” reinforcement learning.

     
  11.

    This inevitable analogical reframing of deep learning toward human learning is a traditional “universalizing” quality of AI, identified by Goldstein and Papert (1977), who explained that “it may seem paradoxical that researchers in the field [of AI] have anything to say about the structure of human language or related issues in education.… [But] [a]lthough there is much practical good that can come of more intelligent machines, the fundamental theoretical goal of the discipline is understanding intelligent processes independent of their particular physical realization” (p. 85).

     
  12.

    The semiotic novelty of Vygotsky’s account of word meaning, in which indexical signs are seen as a foundation for the development of conceptual meaning, has been noted by Wertsch (1985): “The indicatory or indexical function of speech makes it possible for adults to draw young children into social interaction because it allows intersubjectivity on perceptually present objects even though complete agreement on the interpretation or categorization of these objects is lacking” (p. 57). Silverstein’s (1985) characterization is even more complete: “Vygotsky’s account [on the development of concepts] is really a logical reconstruction of the passage of words from being indexicals connected to the things they ultimately will truly denote, through an ‘egocentric’ stage in which they serve as performatives of a sort, to their ultimate emergence as sense-valued elements of propositional schemata, each stage being a functional enrichment, not a replacement, of the cognitive utility of language” (p. 231).

     
  13.

    While some interpreters, following in the line of Leont’ev, hold that signs such as speech are just one type of “tool” that includes external tools and techniques, Vygotsky focused on speech as the most fundamental form of mediation (Miller, 2011, pp. 20–24; Vygotsky, 1987, pp. 60–61).

     
  14.

    See also Davydov and Radzikhovskii (1986) for a discussion of signs as psychological tools.

     
  15.

    The passage Vygotsky cites is from Tolstoy (1967, p. 278).

     
  16.

    A similar conclusion has been reached, although with different social and cognitive implications, by advocates of the extended mind theory (Clark & Chalmers, 1998; Hutchins, 1994; Wheeler, 2011).

     
  17.

    We describe connectionist ML over the “classic” tabular-oriented ML of Breiman (2001b) so that our critiques have greater purchase on claims for twenty-first-century AI, although in principle our critiques could apply to the simpler architectures of decision trees or SVMs (the latter of which, as pointed out by LeCun (2008), can be seen as a particular genre of two-layer neural network or “glorified template match[ing]”).

     
  18.

    Microgenesis is “based on the assumption that activity patterns, percepts, thoughts, are not merely products but processes that, whether they take seconds, or hours, or days, unfold in terms of developmental sequence” (Werner, 1957, pp. 142–143).

     
  19.

    See Grosman and Reigeluth (2019) on how the “lineages” of DL architectures impose certain constraints on human invention.

     
  20.

    Rescorla and Wagner’s (1972) mathematical formalization of Pavlovian classical conditioning can be shown to be similar to simple supervised neural network models that typically update their weights based on a learning rate hyperparameter and a least-mean-squares loss function (Sutton & Barto, 1981). (Today, this similarity of machine learning to behaviorism is only really appreciated in the reinforcement learning community, where some of the language is in a direct lineage.)

     
  21.

    “Pavlov was certainly interested in behavior (a term that acquired almost as many varied meanings as the word ‘objective’), but he was not a behaviorist. Unlike John Watson and other American behaviorists of his day, he consistently acknowledged the existence and paramount importance of subjective phenomena—of the internal emotional and intellectual experiences of humans and other animals—and he always believed that science should seek to explain them” (Todes, 2014, p. 295).

     
  22.

    It is important to recognize that Vygotsky’s discussion of syncretism, complexes, and true concepts in the fifth chapter of Thinking and Speech (1987) was originally written around 1930, but his sixth chapter on spontaneous and scientific concepts was written in 1934, the year of his death (van der Veer & Valsiner, 1991, p. 257).

     
  23.

    Blunden points out that the distinction between pseudoconcepts and true concepts is related to the influence from Hegel’s Logic: “Dialectical logic is in fact nothing more than the art of dealing with concepts, that is, true concepts, rather than simplified, impoverished pseudoconcepts” (Blunden, 2012, p. 253).

     
  24.

    Such an objection recalls the recent critiques of Gary Marcus: “Deep learning doesn’t recognize a dog as an animal composed of parts like a head, a tail, and four legs, or even what an animal is, let alone what a head is, and how the concept of head varies across frogs, dogs, and people, different in details yet bearing a common relation to bodies” (Marcus & Davis, 2019).

     
  25.

    For a discussion of microgenesis in the context of Vygotsky, see Wertsch and Stone (1978).
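
The formal similarity mentioned in note 20, between the Rescorla-Wagner model of classical conditioning and a least-mean-squares weight update, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not drawn from the cited works: the function names (`rescorla_wagner_step`, `lms_step`, `simulate`), the parameter values, the binary encoding of cue presence, and the simplification that all cues share one salience `alpha`. Under those assumptions, updating associative strengths by `alpha * beta * (lam - total)` for each present cue coincides with a delta-rule step whose learning rate is `alpha * beta`.

```python
import numpy as np


def rescorla_wagner_step(V, present, lam, alpha, beta):
    """One conditioning trial in the Rescorla-Wagner model.

    V       : associative strength of each cue
    present : boolean mask of the cues shown on this trial
    lam     : maximum conditioning supported by the outcome (0 if absent)
    alpha   : cue salience (assumed identical for all cues in this sketch)
    beta    : learning-rate parameter tied to the outcome
    """
    V = V.copy()
    total = V[present].sum()  # combined prediction from all present cues
    V[present] += alpha * beta * (lam - total)
    return V


def lms_step(w, x, y, lr):
    """One least-mean-squares (delta rule) step for a linear unit."""
    error = y - w @ x  # prediction error of the linear model
    return w + lr * error * x


def simulate(n_trials=500, n_cues=3, alpha=0.3, beta=0.5, seed=0):
    """Run both update rules on the same random trials (illustrative values)."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_cues)
    w = np.zeros(n_cues)
    for _ in range(n_trials):
        x = rng.integers(0, 2, size=n_cues).astype(float)  # which cues appear
        lam = 1.0 if x[0] == 1.0 else 0.0  # only cue 0 predicts the outcome
        V = rescorla_wagner_step(V, x.astype(bool), lam, alpha, beta)
        w = lms_step(w, x, lam, lr=alpha * beta)
    return V, w


if __name__ == "__main__":
    V, w = simulate()
    print("Rescorla-Wagner strengths:", np.round(V, 3))
    print("LMS weights:              ", np.round(w, 3))
    print("Trajectories coincide:    ", np.allclose(V, w))
```

In this toy setting the two rules differ only in vocabulary: "associative strength" corresponds to a weight, "surprise" to a prediction error, and the product of salience and outcome rate to a learning-rate hyperparameter.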