An essential property of language is that it refers to things in the world. Reference specifies this relationship between objects or ideas and linguistic expressions. As speakers plan their utterances, they make rapid decisions about how exactly they will refer to the entities that are part of their intended message. For every object or concept, a range of expressions is available, from highly explicit modified Noun Phrases (NPs), to less explicit deictic or pronominal expressions. Even a familiar, concrete, co-present object can be referred to as my favourite stripy mug, the cup, this one, or it. Though each of these choices would be semantically acceptable, they differ in terms of their pragmatic appropriateness.
In this chapter we examine the question of how speakers choose linguistic expressions to refer to things. As we will show, speakers are heavily influenced by context, and exhibit strong preferences for certain referential forms. For example, I would be unlikely to say ‘it’s red’ to talk about a cup that is not either part of the context or known to my addressee. Likewise, a story about my cup would likely not involve repeated explicit descriptions (‘I love my red stripy cup. My red stripy cup was a present from my friend. I use my red stripy cup all the time’). Addressees also have expectations about the referential forms they encounter. They are sensitive to the appropriateness of referential choices, and work to find explanations for the use of unexpected or inappropriately gauged expressions (see section 28.2.2.3).
Many researchers have noted that referential expressions fall along a continuum of explicitness (Chafe, 1976, 1994; Givón, 1983; Ariel, 1990, 2001; Gundel et al., 1993). Yet this continuum results from speakers making choices along numerous dimensions, a few of which are illustrated in Table 28.1.
Table 28.1. Choices between referential expressions
In this review, we focus on two dimensions of referential choice: modification (do I need to use more than just a noun to communicate reference?) and pronominalization (can I use a pronoun instead of a more explicit expression?). Both of these dimensions are critical to successful communication because providing too little information might lead to miscommunication, and producing a too-specific expression may sound clunky and interfere with the listener’s ease of understanding.
In the next section, we review two major theoretical approaches accounting for speakers’ referential preferences, both of which highlight the fact that speakers’ choices are highly constrained by what is appropriate in context. We then examine questions about the cognitive processes underlying speakers’ choices, and the impact of these decisions on communication.
Within the theory of Conversational Implicature (1975), Grice proposed four maxims of conversation: Quality, Quantity, Relation, and Manner. Most relevant to the notion of informativeness is the maxim of Quantity, itself subdivided into two parts: (i) Make your contribution as informative as is required for the current purposes of the exchange; (ii) Do not make your contribution more informative than is required. That is, speakers are expected not to give less or more information than necessary for the purposes of the interaction. For example, when referring to one type of flower in a garden containing many other types, a rational speaker would be expected to provide sufficient information for her addressee to unambiguously identify the intended flower by using a term more specific than the basic level term ‘flowers’, for example ‘peonies’, or by modifying her expression such that it distinguishes the flower in question from others in the garden, for example ‘the big pink blousy flowers’. In doing so, the Gricean speaker conforms to the hearer’s expectation that she would make her contribution as informative as required. Regarding the second submaxim, a speaker would not be expected to provide that level of detail if the garden contained just one type of flower. Thus, under a Gricean account, speakers are expected to be minimally informative, producing utterances that convey all of the necessary information while being as economical as possible. Crucially for this theory of implicature, if a speaker fulfils these expectations, there is no need for the addressee to engage in extra pragmatic interpretation beyond what is literally stated.
So rather than prescribing proper conversational behaviour, the function of the maxims and the Cooperative Principle to which they belong is to account for the derivation of conversational implicatures via comparisons between what is expected and what is heard. For example, if a speaker flouts Quantity-1 and uses ‘flowers’ to refer to a specific type of flower in a context containing many other types, the addressee might infer that this underinformative speaker does not know the hyponymic term, or that other aspects of the context might disambiguate the apparently underinformative expression (e.g. prior reference). If the speaker flouted Quantity-2 and used ‘the big pink blousy flowers’ in a single-flower-type context, the addressee might make a contrastive inference, and assume the existence of a non-pink, non-blousy flower. Or, he might make an inference based on speaker goals, hypothesizing that the more detailed description of the flower is critical in the current exchange.
Although Grice never intended his theory of conversation to explain the psychological processes by which we understand utterances (Saul, 2002), his work has influenced the field of reference in two major ways: it has provided a theoretical backbone for the development of subsequent models of reference and informativeness; and it has inspired experimental investigations to test his philosophical ideas (see Noveck and Reboul, 2008).
Consistent with Grice’s Cooperative Principle, one theoretical tradition explains how speakers choose referential expressions on the basis of informativeness (e.g. Deutsch & Pechmann, 1982; Brown-Schmidt & Tanenhaus, 2006, 2008; Engelhardt et al., 2006; Davies & Katsos, 2010; Arts et al., 2011a,b; Pogue et al., 2016; among others). The basic assumption behind this approach is that speakers aim to be cooperative by providing the right amount of information for communicative success, where the ‘right amount’ is defined by the context. The definition of informativeness depends on the interaction between the context (including visual and discourse features, and speaker intentions) and the referring expression. That is, informativeness is affected as the context and/or the expression changes. Thus, informativeness is defined as a property of expressions within their contexts, such that more informative expressions are those that match a smaller set of candidate referents in the context.
For example, the expression ‘the cup’ is informative in the context of a lone cup, but as more cups are introduced into the context, it decreases in informativeness and ambiguity increases. And although ‘the large cup’ is informative in the lone cup context, it is not felicitous (in Gricean terms) since the size modifier is redundant and the expression risks the addressee drawing a false contrastive inference. As more potential (small) cup referents are introduced into the scene, ‘the large cup’ becomes more felicitous. Note that, under this conceptualization, the informativeness of the modified expression would not increase as more (smaller) cups are introduced, since it fits exactly the same number of candidate referents in both the single- and multiple-cup contexts. Using this reasoning, the literature has typically adopted a three-level taxonomy to categorize levels of felicity: optimally or minimally informative (e.g. ‘the large cup’ when two cups differ in size), underinformative (‘the cup’ in the same situation), or overspecified (‘the large cup’ in the lone cup context).
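The three-level taxonomy can be illustrated with a minimal sketch. It assumes, purely for illustration, that objects and expressions are represented as attribute-value dictionaries and that an expression is overspecified whenever dropping one of its modifiers would still pick out the target uniquely; the names `matches`, `candidate_indices`, and `classify` are hypothetical and do not come from the studies cited.

```python
from typing import Dict, List

Obj = Dict[str, str]  # hypothetical representation: attribute -> value


def matches(expression: Obj, obj: Obj) -> bool:
    """True if every attribute mentioned in the expression holds of the object."""
    return all(obj.get(attr) == value for attr, value in expression.items())


def candidate_indices(expression: Obj, context: List[Obj]) -> List[int]:
    """Positions of all objects in the context that fit the expression."""
    return [i for i, obj in enumerate(context) if matches(expression, obj)]


def classify(expression: Obj, target_index: int, context: List[Obj]) -> str:
    """Label an expression as underinformative, optimal, or overspecified."""
    candidates = candidate_indices(expression, context)
    if target_index not in candidates:
        raise ValueError("expression does not describe the target")
    if len(candidates) > 1:
        return "underinformative"
    # The expression is unique; check whether any modifier is redundant.
    for attr in expression:
        if attr == "type":  # the head noun is never treated as a modifier
            continue
        reduced = {a: v for a, v in expression.items() if a != attr}
        if candidate_indices(reduced, context) == [target_index]:
            return "overspecified"
    return "optimal"


lone_cup = [{"type": "cup", "size": "large"}]
two_cups = [{"type": "cup", "size": "large"}, {"type": "cup", "size": "small"}]
the_cup = {"type": "cup"}
the_large_cup = {"type": "cup", "size": "large"}

print(classify(the_cup, 0, two_cups))        # 'the cup' with two cups
print(classify(the_large_cup, 0, two_cups))  # 'the large cup' with two cups
print(classify(the_large_cup, 0, lone_cup))  # 'the large cup' with one cup
```

The sketch deliberately ignores discourse history, salience, and speaker goals, all of which the studies reviewed below show to affect whether an expression is felicitous.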
Although these levels of referential informativeness are reasonably intuitive, scholars have asked how contextual constraints are calculated and applied to both the speaker’s linguistic choices and the listener’s pragmatic inferences. Recent research has conceptualized informativeness as the extent to which a referring expression reduces uncertainty about the identity of a referent, given the set of potential referents in the concurrent domain (Frank & Goodman, 2012; Pogue et al., 2016). Informativeness in this sense has also been cast as surprisal in work formalizing informativeness by modelling human behaviour (Frank & Goodman, 2012; Goodman & Stuhlmüller, 2013). Surprisal is an information-theoretic measure of the degree to which a word reduces uncertainty about the speaker’s message (see also Hale, 2001; Levy & Jaeger, 2007; Mahowald et al., 2013). All of the conceptualizations discussed are in the spirit of the Gricean expectation of unambiguous referring.
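The uncertainty-reduction idea can be made concrete with a minimal sketch. It assumes, purely for illustration, a listener who initially treats every object in the context as equally likely and who interprets the expression literally; the function names are hypothetical and are not taken from the cited models. Under these assumptions, an expression that narrows n candidates down to k removes log2(n) - log2(k) bits of uncertainty, and the surprisal of an event with probability p is -log2(p).

```python
import math


def uncertainty_reduction_bits(n_context: int, n_matching: int) -> float:
    """Bits of uncertainty removed when n_context equally likely
    candidates are narrowed down to n_matching candidates."""
    if not 0 < n_matching <= n_context:
        raise ValueError("need 0 < n_matching <= n_context")
    return math.log2(n_context) - math.log2(n_matching)


def surprisal_bits(probability: float) -> float:
    """Surprisal, in bits, of an event with the given probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# A context of four objects, two of which are cups (illustrative numbers).
the_cup = uncertainty_reduction_bits(n_context=4, n_matching=2)
the_large_cup = uncertainty_reduction_bits(n_context=4, n_matching=1)
print(the_cup, the_large_cup)
```

Unlike a bare count of matching referents, this quantity also grows with the size of the context, which is one respect in which the conceptualizations discussed here can come apart.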
Informativeness is discussed both as a property of referring expressions and a property of speakers (Mangold & Pobel, 1988; Davies & Katsos, 2010; Frank & Goodman, 2012; Goodman & Stuhlmüller, 2013; Frank & Goodman, 2014; Davies et al., 2016; Davies & Kreysa, 2017). If a speaker composes her referring expression with words that apply to the intended referent and no/few others, ‘words which pick out relatively smaller sections of the context’ (Frank & Goodman, 2014: 85), she is informative in that situation. This behaviour may be considered a trait if speakers habitually use highly informative terms (Grodner & Sedivy, 2011; Yildirim et al., 2016). Relatedly, a message is informative about a speaker’s intended meaning in that particular communicative context. If the communication is successful, the addressee’s uncertainty about a speaker’s intended meaning will be reduced. Thus, a referring expression will only be fully informative if it matches the demands of the context (e.g. the presence of a contrast object of the same nominal class) and the speaker’s intention (e.g. for the addressee to uniquely identify a referent and/or to realize the greater or lesser importance that specificity plays in the current communicative context). In sum, informativeness is a function of referring expressions, speakers, and contexts. Even under models of informativeness that contrast with the model sketched earlier, such as those that claim that ambiguous referring expressions are efficient and desirable (Piantadosi et al., 2011), the role of context is to inform about meaning. There cannot be a measure of informativeness in the absence of any of these components.
A rich seam of research into the incidence and effects of informativeness has been opened in experimental pragmatics and psycholinguistics over the last decade. A number of studies take a functionalist approach to understanding informativeness, consistent with Grice’s proposal that speakers make choices for the purpose of communication. This work tends to show that hearers expect felicitously informative expressions in communicative exchanges, and that speakers are largely rational with respect to this. Methodologically, researchers have used referential communication games (Deutsch & Pechmann, 1982; Mangold & Pobel, 1988; Maes et al., 2004; Brown-Schmidt & Tanenhaus, 2006, 2008; Engelhardt et al., 2006; Davies & Katsos, 2010; Nieuwland et al., 2010; Arts et al., 2011b; Brown-Schmidt & Konopka, 2011; Engelhardt et al., 2011; Koolen et al., 2011; Davies & Kreysa, 2017), the visual world paradigm (Brown-Schmidt & Tanenhaus, 2006, 2008; Engelhardt et al., 2006; Brown-Schmidt & Konopka, 2011; Davies & Kreysa, 2017), and Event-Related Potentials (ERPs) (Nieuwland et al., 2010; Engelhardt et al., 2011) to manipulate aspects of the discourse context and monitor addressees’ online interpretations of referring expressions.
For example, a visual display might contain two objects of the same type, such as a large and a small apple, or just one object of that type, requiring a modified referring expression for disambiguation in the former case, and not in the latter. Typically, levels of informativeness produced by speakers and/or the timing and pattern of eye movements around the referential scene are analysed. These approaches have been successful in providing controlled contexts as a backdrop to reasonably spontaneous language and its processing from both speakers’ and addressees’ perspectives. They have also shed light on the processes linking speakers’ fixations to aspects of a scene and their referential choices. For example, Brown-Schmidt & Tanenhaus (2006) found that earlier fixations to a contrast object were associated with use of a prenominal modifying adjective, whereas later fixations were associated with postnominal modification. Thus, speakers are found to revise their utterance plans on-line.
In general, adults produce informative expressions, and in that sense they are Gricean. For example, when participants were presented with an array containing both a big and small star, they typically requested a target object (e.g. a big star) using expressions with modifiers (‘the big star’) around 80 per cent of the time (Davies & Katsos, 2010; Davies & Kreysa, 2017). Similarly, Brown-Schmidt & Tanenhaus (2006: Experiment 1) found that speakers produced a modifier like ‘with the triangles’ to identify a target square in a context of squares with and without triangles 98 per cent of the time (cf. 27% for displays with a single square), and Engelhardt et al. (2006) found that modified utterances were more common in a contrast condition than in the singleton condition (98% vs. 30%). When displays are very simple, rates of informativeness can reach ceiling. For example, Nadig & Sedivy (2002) found that adults consistently produced a modifier like ‘large’ to identify a large glass in a context of a large and small glass, and never produced a modifier when there was only one glass.
Although speakers are usually informative, they sometimes fall short of this expectation. For example, they may fail to notice that there is another object in the context that could be confused with their intended referent. Ferreira et al. (2005) found that speakers were more likely to provide modification for sets of differently sized identical objects (e.g. a big and small bat) than for sets of objects that were conceptually distinct but had the same linguistic label (e.g. a bat-for-sports and a bat-as-flying-mammal). This and other work also suggests that, unsurprisingly, modification is more likely when the speaker has attended to the competitor items in the context (Brown-Schmidt & Tanenhaus, 2006; Wardlow-Lane & Ferreira, 2008; Davies & Kreysa, 2017). Thus, knowledge of context or attention to context modulates use of that context in referential choice, a point to which we return in section 28.3 on speaker goals and feature salience.
Speakers appear to make decisions about modification based on the broad goal of communicative success, not merely as a simple response to the presence of competitor objects in the context. For example, if the conversation has already focused on an object such as a figure that looks like an ice skater, the speaker can use a simpler expression, like ‘the skater’, even if there are other figures in the context that may roughly fit this description (Clark & Wilkes-Gibbs, 1986). Brown-Schmidt & Tanenhaus (2008: Experiment 2) categorized almost half of the referring expressions as ambiguous in a referential communication task: for example, speakers asked addressees to put an object ‘above the red block’ where there were several red blocks. In such cases, the underinformative referring expression ‘the red block’ was licensed since aspects of the task ruled out many of the alternatives, for instance if only one of the red blocks had a free space above it. So, while speakers generally adhere to expectations of informativeness, expressions that superficially appear to be underinformative do not necessarily present difficulty to addressees, thanks to the integration of additional cues from context.
Speakers sometimes provide more information than is strictly needed. For example, Engelhardt et al. (2006) presented participants with a scene including a single apple on a towel, an empty towel, a puppet, and an empty box. Speakers frequently overspecified the target by saying ‘put the apple on the towel on the other towel’ (cf. ‘put the apple on the other towel’), and hearers were equally satisfied with both variants despite the fact that the first one was overspecified. However, Davies & Katsos (2013) argued that this redundancy serves other purposes, for example helping listeners interpret a complex scene, especially when the redundant information is salient (in this case, the duplication of the towel in the scene, and the fact that the target referent was composed of two objects). Thus, if we take human processing biases into account in order to identify the amount of information that is ‘required’, these situations may not represent true violations of the second Quantity submaxim. It seems that there are a number of properties that lead people to provide redundant information when it might help find the referent or identify the intended action. This is backed up by natural language generation literature documenting frequent overspecification in particular conditions (Koolen et al., 2011; Gatt et al., 2014; discussed in section 28.3.3).
Addressees make fast inferences about what their speaker intends to refer to in an unfolding discourse (Tanenhaus et al., 1995). For example, on processing modified nouns, they recruit information from prenominal adjectives to restrict the domain of potential reference not only to those objects displaying the property encoded in the adjective but also those objects to which the adjective might plausibly apply. In a classic study of this process, Sedivy et al. (1999) showed that on hearing ‘the tall glass’ listeners were quicker to resolve reference to a glass that was part of a contrast set than to a singleton tall glass, where using the adjective would have been overspecific (see also Wolter et al., 2011). Other forms (e.g. the ambiguous pronoun ‘she’ when there are two female protagonists in the discourse) may trigger a process of revision as addressees work to override incorrect predictions as the discourse proceeds (Rohde, Chapter 27 in this volume). This is not necessarily problematic: even when speakers deviate from expected forms, addressees are able to accommodate them by using aspects of the communicative context, which is pragmatic processing in a nutshell. Whether speakers diverge from expected forms intentionally or otherwise (e.g. when they misjudge the accessibility of a referent and provide inappropriately explicit referring expressions), addressees ultimately resolve reference.
Hearers are generally good at dealing with unexpected or infelicitous referential forms. As mentioned, speakers produce underinformative referring expressions when aspects of the communicative situation restrict the referential domain, ruling out competitors that would otherwise match the expression (Brown-Schmidt & Tanenhaus, 2008). In that study, addressees asked for clarification in only 7 per cent of ambiguous trials and were not generally confused by referring expressions that would have been ambiguous if the referential domain had not restricted reference (a pattern confirmed by eye movement analyses, which showed that addressees exhibited a strong preference to fixate the target). These findings provide evidence that addressees are well able to resolve ambiguous reference by using extra-linguistic information. But what about in contexts where an underinformative referring expression is truly ambiguous?
Although there has been extensive developmental research on children’s responses to underinformative expressions, there has been less direct attention on adults’ responses to utterances that do not ultimately disambiguate. Off-line acceptability judgements of underinformative expressions are lower than those of informative expressions (Davies & Katsos, 2010, 2013). In naturalistic conversation, asking for clarification is a common strategy. This has been described by Clark & Wilkes-Gibbs (1986) as the principle of mutual responsibility, under which interlocutors minimize their referential efforts using underinformative referring expressions in the knowledge that their partner can ask for more detail if required. When encountering ambiguous pronouns, adults are able to disambiguate on the basis of information presented in the preceding discourse, such as mapping a pronominal form onto the most prominent discourse referent (Light & Capps, 1986; Hendriks et al., 2014). Thus addressees recruit information from linguistic and non-linguistic sources. If these do not ultimately disambiguate, they may verbally seek clarification.
What, then, are the effects of overspecified expressions on the hearer? A speaker might produce a higher level of specification than an addressee is expecting given the referential context, for example ‘the brown rabbit’ to refer to a singleton rabbit. In such situations, the addressee may make a contrastive inference, whereby they assume the presence of a competitor in the discourse (Sedivy et al., 1999). In a study investigating the incidence of contrastive inference in a card game task, addressees inferred in around 80 per cent of cases that the speaker was holding a hidden card featuring a white rabbit after she had referred to a brown rabbit in the same hand of cards (Kronmüller et al., 2014). Thus, adults frequently enrich modified utterances. Alternatively, if after doing the necessary inferential work to justify a speaker’s inclusion of an adjective, the addressee concludes that the modifier was genuinely gratuitous, due to feedback about the function of the adjective on earlier trials (Engelhardt, 2008), or having been explicitly told about the speaker’s unreliability (Grodner & Sedivy, 2011), the contrastive inference may be suspended. Off-line measures complement this finding, revealing that acceptability judgements are lower for overspecified relative to minimally informative expressions, though they are penalized less than underinformative ones (Davies & Katsos, 2010).
A recent debate in the literature has focused on whether overspecified referring expressions help or hinder reference resolution (Rubio-Fernández, 2016). For example, some studies find that such terms lead to faster identification of the target referent (Mangold & Pobel, 1988; Arts et al., 2011b). Conversely, other studies have concluded that overspecifications impair comprehension (Engelhardt, Bailey, et al., 2006; Engelhardt, Demiral, et al., 2011). One possibility is that the differing outcomes are a methodological artefact. For instance, consider the relative simplicity of Engelhardt et al.’s (2011) materials. In that study, addressees had to identify a target from two-object displays following a modified expression, for example ‘the red circle’. Longer reaction times for the overspecified expression were taken to indicate an impairment to comprehension, though the methodology used does not clarify whether the latencies indexed delayed visual identification and/or implicit pragmatic judgements of the unexpectedly overspecified referring expression (note that overspecification is rare in simple two-figure displays; Rubio-Fernández, submitted). In addition, findings that overspecification facilitates comprehension tend to come from studies of written language (e.g., Arts and colleagues), while evidence that overspecification impairs comprehension stems from studies on spoken language (e.g. Engelhardt and colleagues). Further work on this question is needed.
Care should be taken in reconciling approaches using referential communication in the visual world, and those using extended discourses (written or spoken). In the former, evidence suggests that overspecified reference helps the search for a referent, whereas in the latter, inappropriately heavy NPs can impede comprehension (the Repeated Name Penalty; see section 28.2.3 and Rohde, Chapter 27 in this volume). As when resolving underinformative expressions, addressees recruit multimodal resources to interpret the speaker’s utterance, a strategy not available in reading or unimodal listening tasks.
A second major theoretical tradition has focused on how referential expressions are constrained by the linguistic discourse context (Chafe, 1976, 1994; Ariel, 1990, 2001; Gundel et al., 1993; Grosz et al., 1995; Gordon & Hendrick, 1998), which affects the referents’ information status (see Arnold et al., 2013, for a review). This work builds from the assumption that speakers and their addressees maintain mental representations of the discourse and situation, where information within these representations varies in its cognitive status (Bransford et al., 1972; Johnson-Laird, 1983; Van Dijk & Kintsch, 1983; Kintsch, 1988; Bower & Morrow, 1990; Zwaan & Radvansky, 1998). One of the broadest characterizations of information status is the distinction between given and new information. Things that are physically present or have been mentioned linguistically are considered ‘known’, ‘old’, or ‘given’ in context whereas information that has not been contextually evoked is ‘new’ (Halliday, 1967; Chafe, 1976, 1994; Prince, 1981, 1992). This distinction has been used to explain linguistic choices such as the use of definite (e.g. ‘the cup’) vs. indefinite expressions (‘a cup’), or the contrast between prosodically prominent (‘the CUP’) vs. reduced (‘the cup’) expressions.
Yet all given information is not equal. Certain events and entities are considered more salient, topical, accessible, or prominent. While these terms vary, they all refer to some dimension of cognitive status. Scholars in this tradition have proposed that the cognitive status determines the appropriate linguistic form. For example, highly reduced expressions, like pronouns (he, she, they, it), are specialized for highly accessible referents. On the other hand, specific expressions (‘my third grade piano teacher’) are reserved for highly inaccessible referents, such as when new referents are introduced to the discourse.
Chafe (1976, 1994) defines accessibility as the degree to which the referent is active in the hearer’s consciousness. On this account, a referent can be active (in the current focus of consciousness), semiactive (in peripheral consciousness, i.e. the prior topic) or inactive (neither active nor semiactive for the duration of the exchange). In contrast, starting with the speaker’s choice of referring expression, Ariel (1990) argues that specific linguistic terms signal to the hearer where to ‘look’ for the referent in the current discourse: in current focus, or somewhere more distant. That is, modified NPs signal least accessible information while pronouns signal highly accessible information. Working in the same direction, Gundel et al.’s (1993) givenness hierarchy claims that a pronominal form like ‘it’ signals that the referent’s cognitive status is that it is the current topic of the conversation and thus highly accessible, whereas a NP with an indefinite determiner, such as ‘a cat’, identifies the type of referent with the lowest cognitive status.
Many theorists suggest that discourse salience/accessibility can be explained in terms of attention (Bower & Morrow, 1990). Gundel et al. (1993) suggest that the speaker selects referential forms based on assumptions about their addressee’s attention, and Brennan (1995) says that speakers use word order and other linguistic devices to indicate the focus of their own attention.
The discourse-based framework has been used to explain the fact that pronouns and other reduced expressions tend to be used more often in certain linguistic contexts. For example, pronouns are most likely for referents that have been recently mentioned. These patterns emerge in analyses of written texts and elicited narrative production (e.g. Givón, 1983; Ariel, 1990; Arnold et al., 2009). For example, Arnold (1998) analysed the use of pronouns in written stories, and found that pronouns represented 86 per cent of references to something mentioned in the last clause, but only 32 per cent of references to entities mentioned two to five clauses back. This frequency pattern matches the conditions that are easiest for comprehenders. For example, Clark & Sengul (1979) asked people to read stories like:
A broadloom rug in rose and purple colors covered the floor. The light from a small brass lamp cast shadows on the walls. In one corner of the room was an upholstered chair. The chair appeared to be an antique.
They measured the reading time of the last sentence, which was faster when the referent had been in the previous clause, both when the anaphoric expression was a NP (the chair) and when it was a pronoun, compared to when the referent had appeared two or three sentences previously. They concluded that the previous clause has a privileged place in memory.
Syntactic position also has a strong effect on pronoun use (see also Rohde, Chapter 27 in this volume). One of the most robust observations is that pronouns tend to be more common when the referent was last mentioned in subject position, compared with objects or obliques (Brennan et al., 1987), as shown by corpus and speech elicitation experiments (Stevenson et al., 1994; Brennan, 1995; Arnold, 1998, 2001; Kehler et al., 2008; Arnold et al., 2009). Visual world eye-tracking studies show that when listeners hear an ambiguous pronoun, they show a preference to gaze at a picture of the subject of the previous clause, and this preference emerges as early as 400 milliseconds (ms) after the pronoun onset (Arnold et al., 2000; Järvikivi et al., 2005; Arnold, 2015; Arnold & Lao, 2015; Hartshorne, Nappa, & Snedeker, 2015). Reading studies find that pronouns are read more quickly when their antecedent was the subject (Fukumura & Van Gompel, 2015). Moreover, comprehenders also experience problems in the face of expressions that are more explicit than necessary. For example, Gordon et al. (1993; see also Gordon & Scearce, 1995; Hudson-D’Zmura & Tanenhaus, 1998; Almor, 1999; Fukumura & Van Gompel, 2015) found that readers slowed down when they read a repeated name referring to a highly prominent entity, such as the subject of the previous clause. That is, it sounds unnatural to read Bruno was the bully of the neighborhood. Bruno chased Tommy…, where you would expect a pronoun instead of the second ‘Bruno’, and this unnaturalness slows comprehension processes. The effects of grammatical role are especially strong when the pronoun and its antecedent fall in parallel grammatical roles (Sheldon, 1974; Chambers & Smyth, 1988). In addition, pronouns can be considered appropriate for referents that have occurred in syntactically focused positions (Arnold, 1998; Almor, 1999; Cowles, Walenski, & Kluender, 2007; Foraker & McElree, 2007).
The relevance of the discourse context has often been characterized in terms of the topicality of referents—the idea that the discourse is ‘about’ some referents more than others (e.g. Givón, 1983). The idea that pronouns refer to topics is also frequently instantiated in computational models: for example, Van Rij’s (2012) model suggests that pronouns refer to the subject of the previous sentence due to its high topicality. Centering Theory (Brennan, 1995; Grosz et al., 1995; see also Kim and Rohde, Chapters 25 and 27, respectively, in this volume) recasts the notion of topic as the ‘center of attention’, which is computed on the basis of grammatical function and anaphoric links across utterances (for a related model, see Gordon & Hendrick, 1998).
As in the informativeness tradition, some scholars in the discourse tradition have proposed to account for contextual constraints in terms of predictability. For example, Prince (1981) identifies three ways in which givenness has been defined: (1) recoverability/predictability, (2) salience, and (3) hearer knowledge, and she goes on to say that the three are related. Givón (1983) explains topicality in terms of properties like persistence, which is the number of clauses over which a referent will continue to be mentioned. Likewise, the Centering formalism proposes that discourse entities are ranked in a list of forward-looking centres, typically based on a grammatical function hierarchy (subject > object > oblique; Brennan et al., 1987). For example, in ‘The dog buried the bone in the yard’, the three descriptions are ranked (dog, bone, yard). The top-ranked one is the ‘preferred center’, and is the most likely one to be the backward-looking centre (similar to the topic) in the next sentence. The definition of centres in terms of likelihood of continuation highlights the relationship between referential predictability and discourse coherence. In addition, corpus analyses show that when a referent has been mentioned recently, or in a prominent syntactic or semantic position, it has a higher likelihood of being mentioned again in the next utterance than less recent or less prominent entities (Arnold, 1998, 2001). This led Arnold to propose the Expectancy hypothesis, which suggests that linguistic cues to accessibility are relevant in that they indicate likelihood of continued importance to the discourse.
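The ranking of forward-looking centres in the dog, bone, yard example can be sketched as follows. This is a simplified illustration rather than an implementation of Centering Theory: it assumes that each entity arrives already labelled with its grammatical function, and the names `FUNCTION_RANK`, `forward_looking_centres`, and `preferred_centre` are hypothetical.

```python
from typing import List, Tuple

# Assumed ordering: subjects outrank objects, which outrank obliques.
FUNCTION_RANK = {"subject": 0, "object": 1, "oblique": 2}

Entity = Tuple[str, str]  # (entity name, grammatical function)


def forward_looking_centres(entities: List[Entity]) -> List[str]:
    """Order the entities of one utterance by grammatical function."""
    ranked = sorted(entities, key=lambda e: FUNCTION_RANK[e[1]])
    return [name for name, _ in ranked]


def preferred_centre(entities: List[Entity]) -> str:
    """The top-ranked entity, i.e. the most likely topic of the next utterance."""
    return forward_looking_centres(entities)[0]


# 'The dog buried the bone in the yard', entities listed in arbitrary order.
utterance = [("yard", "oblique"), ("dog", "subject"), ("bone", "object")]
print(forward_looking_centres(utterance))
print(preferred_centre(utterance))
```

The sketch covers only the ranking step; it says nothing about how the backward-looking centre of the following utterance is actually established from anaphoric links.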
Nevertheless, the relation between discourse accessibility and predictability is debated. With respect to reference comprehension, researchers agree that comprehension is facilitated when the context increases the likelihood of the referent being mentioned at all. Much of this research comes from evidence that the semantic structure of utterances affects the predictability of reference to discourse entities. For example, in Sandy admired Kathryn because…, people expect Kathryn to be the cause of the event, and when participants are asked to invent a continuation of this story, they are more likely to mention Kathryn than Sandy (Stevenson et al., 1994; Kehler et al., 2008; Fukumura & Van Gompel, 2010; Rohde & Kehler, 2014). Similarly, in Sandy sent a letter to Kathryn, and then …, the transfer event leads to an expectation that the goal of the transfer (Kathryn) will feature in the next event more than the source (Sandy; Stevenson et al., 1994; Arnold, 2001; Rosa & Arnold, 2017). That is, the thematic roles of discourse participants are associated with referential predictability. This expectation guides pronoun comprehension, where readers are faster for expected references and more likely to choose the expected referent (Caramazza et al., 1977; McDonald & MacWhinney, 1995; Garnham et al., 1996; Stewart et al., 2000; for other types of predictability effects, see Altmann & Kamide, 1999; Arnold, Tanenhaus, Altmann, & Fagnano, 2004; Arnold, Hudson Kam, & Tanenhaus, 2007; Arnold & Lao, 2008).
By contrast, researchers debate whether predictability affects the speaker’s decision to use pronouns vs. names/descriptions. On the one hand, several researchers have found that for the causal type of sentences shown earlier, semantic inferences have no effect on pronoun production choices (Kehler et al., 2008, Fukumura & Van Gompel, 2010), and instead speakers use pronouns only for syntactically prominent referents. This led Kehler & Rohde (2013; Kehler et al., 2008; Rohde & Kehler, 2014) to make the strong claim that the speaker’s decision to use a pronoun is driven only by syntactic and topicality factors, but not by semantic factors. On the other hand, other studies have shown that pronoun production can be influenced by factors related to predictability. Rosa & Arnold (2017) found that the predictability of goal entities following transfer verbs leads to a small but robust tendency to use pronouns for goals more than for sources (see also Zerkle & Arnold, 2016). Using a different approach, Tily & Piantadosi (2009) asked participants to play a guessing game, where they gave them newspaper articles with certain words missing. They emphasized that the game was to guess the thing or person that would be mentioned, not the particular word. They found that people were more likely to guess correctly when the original context had contained a pronoun or a proper name, as opposed to a description (e.g. ‘the woman’). In sum, it appears that predictability can affect referential decisions, but it may not be the only relevant contextual property.
In addition to the constraints of information status, the discourse-based tradition recognizes the relevance of ambiguity. As described in section 28.2.2.1, a two-cup situation calls for a modified expression like ‘the red cup’, or ‘the cup on the left’. Likewise, a situation with two boys means that the English pronoun ‘he’ is ambiguous. Indeed, speakers appear to be sensitive to this ambiguity. For example, speakers are more likely to say ‘she’ for Ana following Jacob saw Ana and she… than following Liz saw Ana and Ana … (Francik, 1985; Karmiloff-Smith, 1985; Arnold et al., 2000; but for a different interpretation, see Arnold & Griffin, 2007; Fukumura et al., 2013).
The informativeness and discourse-based approaches focus on different linguistic phenomena, but they both make the same general claim: referential forms vary in their appropriateness, on the basis of the constraints on successful communication within a discourse context. Both traditions suggest that less-specific expressions are acceptable when the referent is unambiguous. Both traditions suggest that the informational context constrains the degree of specificity of referential expressions, and both suggest that these constraints are related to the communicative goals of the discourse participants.
In addition, scholars within both traditions have proposed that the context is important because it helps comprehenders predict the speaker’s meaning. When the speaker’s meaning is redundant given the context, it is highly predictable, and less explicit linguistic input is needed for successful communication. As explained in section 28.2.2, computational models of informativeness use surprisal to model contextual effects. Likewise (see section 28.2.3), the relevance of the discourse context is at least partly related to predictability, although this issue is debated.
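Surprisal is standardly defined as the negative log probability of an item given its context, so that highly predictable items carry little information. The sketch below illustrates this with invented probabilities; the numbers are assumptions chosen for the example and are not estimates from any study cited here.

```python
import math


def surprisal(probability):
    """Return surprisal in bits: -log2 of the item's probability in context."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical probabilities that a given referent is mentioned next.
predictable_referent = 0.5      # strongly expected in context
unpredictable_referent = 0.125  # weakly expected in context

print(surprisal(predictable_referent))    # 1.0
print(surprisal(unpredictable_referent))  # 3.0
```

On this measure, the less predictable referent carries more information, which is in line with the idea that it calls for a more explicit expression.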
Thus, the informativeness and discourse traditions overlap in their goal of accounting for contextual effects on reference form, and share some theoretical proposals. Nevertheless, each tradition represents different dimensions of the reference production process. The informativeness tradition focuses on what it takes to achieve successful communication. That is, have I given enough detail to allow my addressee to pick out the right referent, but without being too verbose? This focuses heavily on the presence of competing referents in the physical or linguistic context. The discourse tradition instead focuses on the functional role of referring expressions. Pronouns do more than just establish reference; they also provide a signal that the utterance should be connected with previous information, a communicative goal that goes beyond simply indicating a referent. This goal is heavily influenced by the structure of the discourse, which has led to a focus on the prior linguistic context. There are also conceptual differences between the two approaches. For example, in the informativeness and computational literature, perceptual salience is associated with things that are new, highly contrasting, or surprising (Gatt et al., 2012; Clarke et al., 2013). This is the opposite of discourse salience, which is associated with things that are old or predictable. Care should be taken not to conflate the two.
The informativeness and discourse-based frameworks suggest a regular correspondence between the context and speakers’ choices about how much information to provide when referring. However, real-life references present widespread variation in the application of these contextual constraints. This raises questions about how these models should be instantiated. How precisely is the context calculated, and does it involve calculations about what other people know, or just the speaker’s own knowledge? How is the context applied when selecting referential expressions? How is this process influenced by individual (e.g. speaker goals) or situational (e.g. referential pacts; contextual salience) differences? Do aspects of the referents themselves influence referential choice? We now address these questions by reviewing empirical investigations into some of the constraints involved in referential processing.
If I want you to pick up an apple that’s on the table in front of us, how do I choose an appropriate referring expression? Which aspects of the context do I use in my mental calculations? Multiple contextual dimensions are at play, including physical, linguistic, and interpersonal. A full theory of reference production requires an understanding of which ones are relevant, and how they interact.
The discourse-based tradition focuses heavily on the constraints from the linguistic, textual context. In fact, some computational models make the simplifying assumption that speakers choose pronouns to refer to the grammatical subject of the preceding sentence (Kehler & Rohde, 2013; Van Rij et al., 2013), or rely on other text-based calculations (e.g. Centering Theory: Brennan, 1995; Grosz et al., 1995). On the other hand, scholars agree that the non-linguistic context matters too. The non-linguistic context clearly affects modification preferences (see section 28.2.2), and pronouns can be used deictically, such as pointing at a cat while saying ‘it’s so furry!’
One question is how we should define ‘given’ information. Clark & Marshall (1981) propose that given status is defined in terms of what is mutually known amongst discourse participants, and suggest that people use heuristics based on physical and linguistic co-presence to estimate shared knowledge. If we both see something, or have both heard it, we know it is given.
Yet other work suggests that linguistic and conceptual information may have independent effects. One well-established generalization is that reference to given information tends to be prosodically reduced, while reference to new information tends to be accented, and acoustically prominent (Halliday, 1967). Kahn & Arnold (2012) tested how speakers pronounced their words as they described objects moving on screen, for example ‘the airplane rotates’, where the target object (the airplane) was pictured on screen as one of six objects. Before the object movement, participants were exposed to one of three priming conditions: linguistic priming (a voice spoke the names of three objects), conceptual priming (three objects flashed on screen), or a control condition in which nothing happened. In the linguistic and conceptual priming conditions, the target was the last of the three objects to move, which meant that it was also fully predictable. They found that the target name (airplane) was shorter in the conceptual priming condition than the control condition, but that it was even shorter in the linguistic priming condition, where the target word was given both linguistically and conceptually (since the word also evoked the concept). This suggests that linguistic and conceptual exposure have independent effects (see also Baumann & Hadelich, 2003).
By examining how the non-linguistic context affects speakers’ choices, we can independently address questions about how speech is influenced by properties of the context, as well as questions about what the speaker and hearer know jointly or independently. Later, we discuss each of these types of constraint as we consider how referential choice is influenced by both top-down (e.g. constraints stemming from the interlocutors themselves) and bottom-up (e.g. features of the referents) pressures.
Researchers agree that the context constrains reference use, but they disagree about the role of common ground. Common ground refers to information that is mutually known: that is, if you can see a cup on the table, and I know you see that cup, and you know that I know that you see the cup, etc., we consider that knowledge to be in common ground (Clark & Marshall, 1981). For many theorists, common ground defines the relevant context. For example, I cannot say ‘it’s blue’ to you if we have no basis for the assumption that you know what ‘it’ refers to. This leads scholars like Chafe (1994: 97) to suggest that the goal of speakers is to ‘categorize a shared referent in a way that allows the listener to identify it’ (see also Gundel et al., 1993).
On this view, common ground is fundamental to understanding the process of both choosing referential forms and understanding them. For example, interlocutors quickly assess their partner’s level of expertise on a subject, and design their utterances accordingly, such as providing more detail when one of the pair is a novice (Isaacs & Clark, 1987). Speakers also track their addressee’s feedback, and adjust their utterances as needed. For example, Clark & Krych (2004) examined how speakers instructed addressees to build Lego models. When both people could see the workspace, addressees sought feedback, for example pausing before placing a block, and speakers rapidly responded to these gestures. Other work demonstrates sensitivity to cultural or contextually-established common ground (Clark et al., 1983; Clark & Wilkes-Gibbs, 1986; Clark & Schaefer, 1987); see also the section on conceptual pacts (28.3.2.3). In addition, discourse participants actively seek out information about their partner’s knowledge by asking questions (Brown-Schmidt et al., 2008), and the task goals guide the degree to which people consider their partner’s perspective (Yoon et al., 2012).
On the other hand, other researchers have pointed out that the cognitive means of tracking other people’s knowledge and perspective can be demanding, and that this places limits on the extent to which real-time choices are made. For example, interlocutors’ knowledge states have to be represented based on available information deduced from various cues, such as visual context, textual distance, hearer feedback, etc. (De Cat, 2015: 266). Studies have found that when there is a conflict between the speaker’s knowledge and their addressee’s knowledge, speakers do not always ignore their own information (Keysar et al., 2000; Barr, 2008). One popular idea (Horton & Keysar, 1996; Ferreira et al., 2005; Wardlow-Lane & Ferreira, 2008) is that it requires cognitive resources to keep track of what is in common ground, and language users might fail to notice potential ambiguities in their speech. Evidence for this view primarily comes from experiments in which the speaker’s knowledge differs from the listener’s, for example where the speaker can see a hidden object (a large triangle) and uses an unnecessary modifier when referring to an object (a small triangle) in common ground.
The purpose of reference is to communicate: that is, for the speaker to get the addressee to identify the referent. As discussed in section 28.2.3, the accessibility/prominence of discourse entities has sometimes been explained as increased attention to those entities. If attention is what drives reference form choices, other indicators of attention should matter. A prime candidate is physical gestures or eye gaze, which can signal attention to a physically co-present object.
There is substantial evidence that listeners attend to the speaker’s eye gaze, and use it to help resolve ambiguous referring expressions. For example, participants in Hanna & Brennan’s (2007) study were faster to resolve expressions like ‘the blue circle with five dots’ when the speaker gazed in the direction of the referent (see also Staudte et al., 2014). Goodrich Smith & Hudson Kam (2012) showed that listeners could also follow gestures to positions in space that represented entities in the discourse, and use this to resolve ambiguous pronouns. In addition, 2- to 4-year-olds check the gaze of their interlocutor in an attempt to establish which of two possible objects she is referring to with a novel word (Nurmsoo & Bloom, 2008; see also Diesendruck, 2005; Grassmann et al., 2009). Younger still, 13- to 18-month-old infants check the gaze of a speaker more when a novel word is produced with two novel objects in view rather than one (Baldwin, 1993; Vaish et al., 2011).
Nappa & Arnold (2014) tested how the comprehension of ambiguous pronouns is influenced by gazing and pointing, and how these interact with linguistic contextual constraints. Participants in their experiment viewed a video of a woman at a table with two same-gender animal characters, one on either side, and an object in the centre. She told a story, for example This story is about Puppy and Panda Bear. Puppy is having some pizza with Panda Bear. He wants a pepperoni slice. The next screen posed a question, ‘Who wants a pepperoni slice?’, which they answered by pressing a button. In the neutral condition, people tended to pick Puppy about 80 per cent of the time, exhibiting the well-known subject bias for pronouns. On top of this, gazing was a weak cue, either supporting the subject bias when the speaker gazed at Puppy, or leading listeners to respond at chance when the speaker gazed at the non-subject, Panda Bear. Pointing had a stronger influence, leading to a strong bias to choose the pointed-at character, regardless of the linguistic context.
These findings are consistent with the idea that referring is constrained by the attention of the discourse participants. However, Nappa & Arnold (2014) demonstrated that pronoun comprehension is not driven by attention alone. In a second experiment, they manipulated a sudden-onset visual capture cue: a black square that appeared over one character as the speaker said the pronoun. If attention is the only thing that matters, irrelevant attentional capture should also influence pronoun interpretation. However, it did not. Arnold & Lao (2015) found that the listeners’ egocentric attention could partially bias the listener toward one character, but only briefly, during online comprehension. Final interpretation in that task was primarily driven by public, discourse-relevant information—that is, the linguistic context itself. Together, these experiments suggest that discourse accessibility is related to attention, but it cannot be reduced to attention. Instead, pronoun interpretation is driven by evidence about the speaker’s intentions. Pointing provides a strong, intentional indicator of what the speaker is referring to, while eye gaze reflects the speaker’s attention, which is only partially related to the speaker’s intentions.
Reference production and comprehension are influenced by the shared experience of particular speakers and addressees, for example when one person calls an object ‘the triangle shelf’, and the name is adopted by other discourse participants. If a speaker has used a specific expression taking a particular perspective on the referent, and an addressee has responded correctly to that reference, it becomes subject to a ‘pact’ in that particular discourse situation (Brennan & Clark, 1996, building on earlier work on lexical entrainment by Clark & Wilkes-Gibbs, 1986). Empirical work in this area has focused on the benefits of maintaining precedents with specific partners and the costs of breaking them (Metzing & Brennan, 2003; Van der Wege, 2009). For example, if an existing speaker uses a new expression, for example by switching from using ‘the shiny cylinder’ to ‘the silver pipe’, addressees are slower to look at the intended referent than when that new expression is produced by a new speaker, one with whom a pact has not been formed. Two divergent theories have been proposed to account for this partner-specific language use: a socially grounded account, in which addressees meta-represent their interlocutor’s perspective and come to expect consistency in referring expressions (Barr & Keysar, 2002; Metzing & Brennan, 2003), and a domain-general episodic priming account, whereby speakers act as memory cues to situationally relevant information through associations with a particular expression (Horton & Gerrig, 2005; Horton & Slaten, 2012; see Kronmüller & Barr, 2015, for a meta-analysis of referential pact research).
The literature on referential pacts is helpful in explaining apparently overspecified referring expressions (e.g. ‘the shiny cylinder’ to refer to a lone cylinder). That is, expectations of speaker consistency may override expectations of minimal informativeness. Although the precise mechanism behind referential pacts is still being debated, the calculation of context and the production of an informative form may not be a pragmatic, Gricean process: instead there may be more automatic memory-based influences at play.
Under linguistic theories of intentionality (Searle, 1969, 1983; Grice, 1975), speakers not only produce utterances but intend their utterances to have some effect, for example to bring their addressee’s attention to a specific referent or aspect of a referent, to teach or persuade their addressee of something, or to end the exchange as quickly as possible. These discourse goals have been shown to affect referential choice. In a study manipulating speaker goals, overspecified referring expressions were more common in a condition requiring a speaker to teach a hearer a long-term skill than when a hearer only had to execute the action once (Arts, 2004). Task-criticality also increases overspecification: more detailed referring expressions occurred in a task requiring participants to describe an object in a high-importance condition (long-distance medical surgery) than in a low-importance condition (straightforwardly describing an object; Arts et al., 2011a).
While these findings are perhaps intuitive (speakers give more information where precision is prioritized and the risk of misunderstanding is high), higher-level discourse features like speaker goals should be integrated into a comprehensive theory of reference production. That is, when it is critical that information be communicated precisely, it is safer for the speaker to assume that that information is harder to grasp and thus use more explicit forms (cf. redundancy in Shannon & Weaver’s (1949) information-theoretic model). Such assessments may involve grounding considerations, such as community co-membership (Clark & Marshall, 1981). If an addressee belongs to the same community of, say, fishmongers, a speaker may reduce the information contained in a referring expression to ‘the shucker’ from the alternative explicit expression ‘the stubby knife with the black handle’, since she can assume that her addressee can integrate critical specialist information about the best type of knife for opening oysters. In assessing and adjusting knowledge in this way, referring becomes more efficient (Isaacs & Clark, 1987). Parallels can be seen here with the less accessible/more explicit relationship pervasive in discourse-based models of reference.
A key part of the context in any referential situation is the referent itself, along with its referential competitors. While higher-level aspects of the discourse situation such as common ground have been extensively investigated, the role of referent features has received less attention. Needless to say, to achieve a comprehensive theory of reference, an understanding of each of the constraints on reference is required. Moreover, from a processing perspective, examining features of the referent itself can shed light on how speakers select the particular modifiers they eventually encode into their expressions. How do speakers identify the particular type of information that will be most informative when referring? For example, if you want to refer to a yellow stripy cup in the context of a plain yellow cup and a blue spotty cup, how would you do it? Informativeness accounts suggest that the optimal expression is ‘stripy cup’, though speakers frequently choose to encode colour too (Koolen et al., 2013; among others). In this section we discuss how (i) salience of features and (ii) display density might mediate discourse- or hearer-based informativeness expectations.
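The notion of an ‘optimal’ expression in the cup example can be made concrete with a small sketch. It searches exhaustively for the smallest set of attributes that rules out every distractor. This is an illustration under simplifying assumptions (referents as attribute-value records, every record having every attribute), not an implementation of any published generation algorithm, and all names in it are hypothetical.

```python
from itertools import combinations


def minimal_description(target, distractors, attributes):
    """Return the smallest attribute set that excludes every distractor.

    Tries attribute subsets in order of increasing size, so the first
    subset that distinguishes the target is minimal. Returns None if the
    target cannot be distinguished from some distractor.
    """
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # A distractor is ruled out if it differs from the target
            # on at least one of the mentioned attributes.
            if all(any(d[a] != target[a] for a in subset) for d in distractors):
                return {a: target[a] for a in subset}
    return None


# The yellow stripy cup, with a plain yellow cup and a blue spotty cup.
target = {"colour": "yellow", "pattern": "stripy"}
distractors = [
    {"colour": "yellow", "pattern": "plain"},
    {"colour": "blue", "pattern": "spotty"},
]
result = minimal_description(target, distractors, ["colour", "pattern"])
print(result)  # {'pattern': 'stripy'}
```

Colour alone fails because the plain cup is also yellow, whereas pattern alone excludes both distractors. A speaker who says ‘yellow stripy cup’ therefore includes an attribute that this kind of minimal search would leave out.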
According to discourse-based models of reference, the more conceptually accessible a referent is, the less explicit its informational form. For example, once a referent has been introduced into the discourse, a speaker is licensed to refer to it using a pronoun. However, outside of the linguistic context, there are instances in which a referent is highly accessible due to its perceptual salience and it is precisely that salience (e.g. its brightness, bigness, or redness) which leads speakers to encode the associated feature into their referring expression, even if this would strictly speaking be redundant given the context. For example, speakers favour colour over other dimensions like size when referring (Pechmann, 1989; Dale & Reiter, 1995; Belke, 2006; Koolen et al., 2013). As a low-level feature, colour is salient, especially when it differs significantly from variation in the background (Vazquez et al., 2010). It is also salient due to the fact that it is an absolute rather than a relative property of a referent. Tarenskeen et al. (2015) concluded that colour is more likely to be encoded in referring expressions due to its salience, in turn stemming from how important it is relative to the nominal class it is modifying (e.g. clothing vs. office supplies) and due to the paucity of other attributes in a referent (e.g. a coloured geometric figure). Speakers mention colour when it does not have any discriminatory power (Koolen et al., 2013), especially when its referent has an atypical colour (Westerbeek et al., 2015; Rubio-Fernández, 2016). Aside from colour, increasing the salience or accessibility of a referent’s property by duplicating that property in an array also leads speakers to include that modifier in their referring expressions (Carbary & Tanenhaus, 2007; Davies & Katsos, 2009; Koolen et al., 2015).
Thus, increasing the perceptual salience of a referent or a property by rendering it colourful, making the colour more important to the referent itself, including an atypical property of the referent, or duplicating the property across other items in the display increases the likelihood of a speaker referring to it with a more explicit NP. This body of literature provides evidence that visual contrasts are not only easier to detect than linguistic contrasts (see also Ferreira et al., 2005), but that these contrasts are readily encoded into referring expressions, overriding Gricean predictions and generating overspecified forms.
Another referent-based constraint affecting referential choice is display density. Speakers tend to overspecify more when there are two targets rather than a single target to find in a display (Koolen et al., 2011, echoing Arnold & Griffin, 2007, who found that pronouns were more common in a context with a single animate character than in contexts with two animate characters). Further, they are more likely to redundantly modify when a visual scene is cluttered, compared to when this is not the case (Paraboni et al., 2007; Clarke et al., 2013; Koolen et al., 2015). A similar effect has been found with referents showing a larger number of attributes (Van Gompel et al., 2014), and when speakers are put under time pressure (Belke, 2006). Since it requires cognitive effort to calculate the distinctive features between the target and multiple distractors, speakers might bypass such a calculation and overspecify their referring expression without posing a serious threat to communication. In other words, under live communicative pressures, they may sacrifice Gricean felicity while maintaining the informativeness required for their addressee to identify the target.
These effects have been explained from both speaker- and hearer-oriented perspectives. For example, under an efficiency-based analysis (Koolen et al., 2015), a speaker does not need to work out precise distinctive features between a target referent and the many distractors in a cluttered display and instead takes a shortcut by mentioning the salient property of colour as a preferred attribute. Rationally, colour may help reduce the search space by eliminating candidate referents of a colour other than that mentioned in the chosen referring expression, and even if it doesn’t, or does so only marginally, the speaker has not incurred much of a cost. On the other hand, addressee-oriented accounts centre around speakers enabling hearers to profit from the pop-out effect that colour affords (Gatt et al., 2012). Finally, and following the view of reference as a collaborative process (Clark & Wilkes-Gibbs, 1986), speaker-hearer accounts have also been proposed, promoting the view that overspecification is efficient in communication for reasons shared by the speaker and the addressee: what is salient for the speaker is also salient for the addressee (Rubio-Fernández, 2016).
In sum, accounting for reference and its variation requires an understanding of how multiple constraints in the context impact not only referential choice, but also its mediating influences like attention. A wide range of non-linguistic pressures are at play in addition to the multitude of linguistic constraints comprehensively studied by scholars working in the discourse tradition. Each of the factors we have discussed is deemed important at a functional level because it relates to inferences about the goals, knowledge states, and perceptions of the discourse participants. In addition, these factors affect the production and comprehension processes necessary to achieve successful reference, because they affect fundamental memory and attentional processes that are relevant to language use.
Identifying constraints on referential choice is a relatively straightforward task. Measuring their interaction and relative impact is a more complex undertaking. Moreover, although variables such as discourse accessibility or presence in common ground can predict a speaker’s choice of expression to a certain extent, the ultimate form of that expression remains probabilistic. One challenge for models of reference production is the considerable variability present in speakers’ choice of referring expressions, both between and within speakers. Different speakers choose different referring expressions in telling even a simple vignette (Castro Ferreira et al., 2016; Zerkle & Arnold, 2016), and the same speaker is unlikely to be consistent in her word choices when telling the same story on different occasions.
Although the factors influencing referential choice and interpretation have been studied extensively from various perspectives (e.g. discourse, psycholinguistic, developmental, computational), we do not yet have a comprehensive account of reference. In order to develop such an account, several outstanding questions must be addressed. Here we list a few.
• Individual differences. Novel considerations such as individual cognitive constraints, for example processing speed and memory capacity, coupled with innovative methods, for example computational modelling (Hendriks, 2016), may help the field progress towards a fuller understanding of variability in reference.
• Goals vs. processes. The research we have reviewed suggests that a theory of reference production must grapple with two overlapping questions: (1) how do contextual constraints affect the speaker’s goals, and (2) how do contextual constraints modulate the cognitive processes used to implement those goals? For example, an unusually coloured object (a purple banana) might change the communicative goals, increasing the speaker’s wish to note the colour. At the same time, the salience of the colour may attract the speaker’s attention and modulate discourse cues to accessibility.
• Linguistic vs. non-linguistic sources of information. How do physical, linguistic, and interpersonal aspects of the context interact? Bringing together these constraints has the potential to further advance our understanding of referential variability. For example, how might the effect of display density interact with recency of mention and speaker goals?
• Examination of all referential decisions. When researchers examine reference production, they typically examine just one dimension of linguistic form choice, such as pronoun vs. more explicit forms, modified vs. unmodified expressions, or acoustically prominent vs. reduced expressions. Yet speakers must weigh multiple options at once for each referring event, suggesting that a full model must account for the set of all choices at once.
In reviewing the literature from the informativeness and discourse-based traditions, this chapter has brought two highly related yet previously distinct theoretical approaches together. We have described their major assumptions and concerns, for example that referring expressions are expected to reduce uncertainty about the identity of referents, and have reviewed the methods and data that have informed their development.
We have seen that both the informativeness and discourse-based approaches provide a systematic account of referential choice. Empirically, the Gricean approach has focused on levels of informativeness and of felicity across a variety of visual contexts, and finds that although speakers frequently produce informative and felicitous referring expressions, deviations from expected forms are driven by complex pragmatic processes recruiting multimodal information. Unexpected forms can also have efficient pragmatic effects on the hearer, for example contrastive inference and faster reference resolution. The discourse tradition also takes a functionalist view of reference production, asking which form types are appropriate under different discourse conditions. Empirically these studies have aimed to characterize the linguistic contexts that affect reference form. They also examine the relation between the linguistic context and referential form choice, as well as the relation between the context and psychological mechanisms like attention and common ground.
We have examined the constraints on referential choice, exploring how aspects of context influence speakers’ means of referring. A wide range of constraints are highlighted: those coming from the extra-linguistic context (the interlocutors and the referents that make up a communicative situation), from the discourse itself, and from the cognitive representations of that discourse.
In reviewing theory and data on reference, we aim to raise awareness of the research questions, assumptions, methods, and findings from the two established traditions of discourse and informativeness. We look forward to collaborations investigating interactions between the two, with the ultimate aim of building a comprehensive model of reference production.
J. Arnold was partially supported by NSF grant #1348549.
1 We favour the terms ‘overspecified’ or ‘over-modified’ in contrast to ‘overinformative’ commonly found in the literature (Davies & Katsos, 2010; Engelhardt et al., 2006; Pogue et al., 2016; among others) since informativeness is no greater in overspecified referring expressions than in minimally specified ones.
2 There is also a large developmental literature on reference and informativeness. For reviews, see Graf & Davies (2014); Serratrice & Allen (2015).
3 For example, Nadig & Sedivy (2002) found that 5–6-year-olds used various strategies for resolving genuinely underinformative reference. See Morisseau et al. (2013) for a review of other work in this area.