5.1 INTRODUCTION

Language is often used to talk about events. While each event is in some sense a singular unique occurrence, we generally talk about them as happening in a certain way with particular consequences, carrying on or ending after some amount of time, and involving different participants who bear particular relationships to the event. These conceptually general and grammatically relevant properties of events are captured in theories of event structure. The goal of such theories is to explain the patterns in meaning that arise from the structural aspects of language. Event structures do this by decomposing an individual event into a set of sub-events which stand in particular relationships to one another. These component sub-events and their relationships are often encoded in independently articulated linguistic constituents that are scattered throughout a sentence. As a result, these components must be recovered and put back together for a comprehender to properly understand the event that a sentence denotes.

This chapter examines how some of the grammatically determined interpretations arising from event structure unfold in real time, with a focus on sentence processing. We will examine studies from the psycholinguistic literature that have investigated different aspects of event structure using a variety of experimental techniques. But let’s first motivate the theory of event structure and decomposition before turning to consider how different component parts of event structure are put to use during real-time language processing.

5.2 DECOMPOSITION AND EVENT STRUCTURE

In many respects, the observation that language encodes certain properties of events stretches back to Aristotle, but more recently it has been Davidson’s (1967) programme that put events centre stage.¹ That paper, along with Lakoff (1971), McCawley (1971), Dowty (1972), and Ross (1972), set the stage for a wider appreciation of the treatment of events as being decomposed in language. Consider the relationship between the examples in (1). The differences in meaning correspond with differences in the formal structure of each sentence. The presence of -en in (1b) changes a non-verbal predication in (1a) to a verbal predication and elicits a process meaning absent in (1a) while retaining the end state of the steel being hard. The transitivity change from (1b) to (1c) introduces an agent that causes the process described, which is missing from (1b), while retaining the process and end state meanings of (1a) and (1b).

(1)a. The steel is hard.

b. The steel hardened.

c. John hardened the steel.

To capture these form-meaning correspondences, theories of event structure decompose the main event of the sentence into sub-events where predicates like DO, CAUSE,² and BECOME act as relations over events. Roughly following Parsons (1990), the event analyses in (2) secure the intuitions that (2c) entails (2a,b) and that (2b) entails (2a). In each case the additional event predicates add processes, causes, and agents into the overall representation.

(2)a. λe₁ [ hard(e₁, the steel) ]

b. λe₁ [ ∃e₂ [ BECOME(e₁, e₂) & hard(e₂, the steel) ] ]

c. λe₁ [ DO(e₁, John) & ∃e₂ [ CAUSE(e₁, e₂) & ∃e₃ [ BECOME(e₂, e₃) & hard(e₃, the steel) ] ] ]
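The entailment pattern in (2) can be illustrated with a small computational sketch. The Python fragment below is not part of any of the cited theories. It assumes a simplified encoding in which each (sub-)event records the relations that hold of it, an optional participant, and an optional embedded sub-event, with existential quantification left implicit. Entailment is approximated as recoverability: one analysis entails another if the second can be found somewhere inside the first. The names Event, sub_events, and entails are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One (sub-)event in a decomposed event structure (illustrative encoding)."""
    relations: Tuple[str, ...]           # e.g. ("hard",), ("BECOME",), ("DO", "CAUSE")
    participant: Optional[str] = None    # e.g. "the steel", "John"
    sub_event: Optional["Event"] = None  # the event this one is related to, if any


def sub_events(event: Event) -> Iterator[Event]:
    """Yield the event itself, then each sub-event embedded in it."""
    current: Optional[Event] = event
    while current is not None:
        yield current
        current = current.sub_event


def entails(stronger: Event, weaker: Event) -> bool:
    """Assumed simplification: entailment as containment of the weaker analysis."""
    return any(candidate == weaker for candidate in sub_events(stronger))


# The three analyses in (2), built from the inside out.
state = Event(("hard",), participant="the steel")                             # (2a)
inchoative = Event(("BECOME",), sub_event=state)                              # (2b)
causative = Event(("DO", "CAUSE"), participant="John", sub_event=inchoative)  # (2c)

print(entails(causative, inchoative))  # True: (2c) entails (2b)
print(entails(causative, state))       # True: (2c) entails (2a)
print(entails(inchoative, state))      # True: (2b) entails (2a)
print(entails(state, inchoative))      # False: (2a) does not entail (2b)
```

Under this encoding, each additional event predicate wraps the previous analysis rather than altering it, which is why the entailments run in one direction only.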

Considerations of this type led researchers to generalize event structures into recurring event templates that host the verbal root of the event alongside its participants, including patients and agents, and describe the temporal unfolding of the event via the relationships between its sub-events (Dowty, 1979; Parsons, 1990; Pustejovsky, 1991, 1995b; Rappaport Hovav & Levin, 2008). The following three sections focus on these aspects of event structure and how they are employed during real-time language processing. Section 5.3 examines how event structure constrains the interpretation of verbal roots, with particular reference to expressions of manner or result, and explores how these constraints play out in language processing. Section 5.4 considers how verbal roots compose with event predicates and their arguments to establish an event’s temporal contours and whether psycholinguistic evidence has something to say about the components of that composition. Section 5.5 investigates how event participants that are implicit in different event structures are encoded and used during real-time language processing. Finally, section 5.6 offers a brief summary of the chapter with some observations on experimental study and strategy.

5.3 THE MANNER AND RESULT OF EVENTS IN THE ENCODING OF VERBAL ROOTS

At the core of events are questions about verbal roots and how they relate to event structure. Verbs contribute crucial information to event descriptions, including highly idiosyncratic information. For instance, the verb waltz provides fairly specific information about the motion of an action, whereas dance is more vague. From an event structure perspective, however, both encode the manner of motion in which an event is carried out, and thus differ from other verbs, like disintegrate and break, which, regardless of their idiosyncratic differences, encode the result or change-of-state that occurs because of the event (Gentner, 1978; Behrend, 1990; Rappaport Hovav & Levin, 2008). This distinction between manner and result is grammatically relevant in a number of respects. For instance, manner verbs can occur with unspecified or non-subcategorized objects (4a,b), whereas result verbs cannot (5a,b).

(4)a. Mary danced.

b. Mary danced her feet sore.

(5)a. *John broke.

b. *John broke his leg bloody.

To account for facts like these, Rappaport Hovav & Levin (2008) propose a limited inventory of event structures that verbal roots associate with. Manner verbs are those that modify the DO predicate, specifying the way an event’s activity occurs (6a). Result verbs are those that are an argument of the BECOME predicate, specifying the resultant state of the event (6b). Because they are associated with BECOME, result verbs require an argument to directly bear their result/change-of-state, thus blocking unspecified and non-subcategorized objects.

To say that a verbal root is associated with one primitive predicate in an event structure is to say that there is an entailment of that meaning component. For example, while the manner verb scrub encodes a particular manner of cleaning, it does not necessarily entail the result of cleaning. Thus in (7) it is acceptable to deny the implied result of scrubbing, but not acceptable to deny the entailed manner of cleaning. Similarly, the result verb clean encodes a particular resultant change-of-state, but does not provide the manner of cleaning. In (8), it is acceptable to deny the implied manner of cleaning, but not the result of cleaning.³

(7)a. Mary scrubbed the bathtub, but it didn’t get any cleaner.

b. Mary scrubbed the bathtub, #but not by scrubbing it.

(8)a. Mary cleaned the bathtub, #but it didn’t get any cleaner.

b. Mary cleaned the bathtub, but not by scrubbing it.

Similarly, the meaning component that is not lexicalized may be expressed outside of the verb itself, as shown in (9), but it is odd or redundant to have the encoded verbal component also expressed outside of the verb, as in (10).

(9)a. Mary scrubbed the bathtub clean.

b. Mary cleaned the bathtub by scrubbing it.

(10)a. #Mary scrubbed the bathtub by scouring/scrubbing it.

b. #Mary cleaned the bathtub sterile/clean.

Given that verbs can have manner and result meanings, we might expect to find verbal roots that lexically entail both a manner and a result. This third class of verbs, however, is thought not to exist.⁴ In a series of papers over the last twenty-five years, Levin and Rappaport Hovav have argued that the manner and result components of verbs are in complementary distribution (Levin & Rappaport Hovav, 1992, 1995, 2006; Rappaport Hovav & Levin, 1998, 2010; see also Kiparsky, 1997). They propose that this complementarity reflects a constraint on how much meaning a verb root can lexicalize.

(11) Lexicalization Constraint: A [verbal] root can only be associated with one primitive predicate in an event schema, as either an argument or a modifier (Rappaport Hovav & Levin, 2010: 25).
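The constraint in (11) and the deniability diagnostics in (7) and (8) can be made concrete with a brief sketch. The Python code below is an illustration under stated assumptions, not an implementation of Rappaport Hovav & Levin's proposal: it assumes that a root's lexical entry can be reduced to the set of primitive predicates it is associated with (DO for manner, BECOME for result), and the names Primitive, Root, satisfies_lexicalization_constraint, classify, and can_deny are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Primitive(Enum):
    DO = "DO"          # modified by manner roots
    BECOME = "BECOME"  # takes result roots as its argument


@dataclass(frozen=True)
class Root:
    name: str
    associations: FrozenSet[Primitive]  # primitive predicates the root is tied to


def satisfies_lexicalization_constraint(root: Root) -> bool:
    """(11): a root is associated with exactly one primitive predicate."""
    return len(root.associations) == 1


def classify(root: Root) -> str:
    """Return 'manner' or 'result' for a root that obeys the constraint."""
    if not satisfies_lexicalization_constraint(root):
        raise ValueError(f"{root.name!r} violates the lexicalization constraint")
    (primitive,) = root.associations
    return "manner" if primitive is Primitive.DO else "result"


def can_deny(root: Root, component: str) -> bool:
    """A meaning component is deniable only if the root does not lexicalize it."""
    return classify(root) != component


scrub = Root("scrub", frozenset({Primitive.DO}))
clean = Root("clean", frozenset({Primitive.BECOME}))

print(can_deny(scrub, "result"))  # True, cf. (7a)
print(can_deny(scrub, "manner"))  # False, cf. (7b)
print(can_deny(clean, "result"))  # False, cf. (8a)
print(can_deny(clean, "manner"))  # True, cf. (8b)
```

A hypothetical root associated with both DO and BECOME would fail the check in satisfies_lexicalization_constraint, which is the sense in which the third class of verbs is predicted to be absent.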

Building on this dichotomy of verbal meaning components, McKoon & Love (2011) approached manner/result complementarity from a processing complexity viewpoint. As seen in (6), result verbs represent a more complex schema than manner verbs. They hypothesized that such complexity would be revealed in a variety of experimental tasks, including lexical decision, speeded sentence acceptability, and a stops-making-sense reading task.⁵

In the lexical decision task, participants were presented with letter strings and indicated whether the letter string formed a word of English or not as quickly and accurately as they could. The logic of this task is that a more complex representation will take longer to access, predicting that response times to result verbs would be longer than to manner verbs when controlled for confounding factors like length and frequency. McKoon & Love reported that ‘yes’ responses to result verbs took an average of 28 milliseconds (ms) longer than those to manner verbs, a finding consistent with the prediction that result verbs are more complex than manner verbs.

In speeded sentence acceptability, participants were presented with a sentence and asked to judge whether it was acceptable as quickly and accurately as possible. Manner and result verbs were matched on all the characteristics of their lexical decision study and on the proportion of their transitive usage, and put in transitive sentence frames like (12). Whole-sentence reading times were the main measure.⁶ Sentences with result verbs took an average of 178ms longer to judge acceptable than those with manner verbs, again consistent with the complexity prediction.

These same sentences were included in a stops-making-sense judgement task where participants were presented with each sentence word-by-word, pressing ‘yes’ to read the next word in the sentence until the sentence no longer made sense, at which point they responded ‘no’. While somewhat artificial in terms of the naturalness of reading, this task is thought to tap more deeply into incremental semantic processes. As in speeded sentence acceptability, result verb sentences elicited slower response times compared to manner verb sentences. This difference emerged at both the verb itself (~89ms) and on the last word of the sentence (~51ms).

Taken together, these studies support the prediction that the representation of result verbs is more complex than that of manner verbs. To the extent that these verbs were consistent with only manner or result meanings, the stops-making-sense reading task further supports the idea that this information is accessed immediately during real-time processing, as these complexity costs arose at the verb itself.

The additional complexity of result meanings suggests that manner meanings might be the default interpretation during processing. This was examined in a learning study by Naigles & Terrazas (1998). They presented English speakers and Spanish speakers three training videos showing a motion event (e.g. a woman moving towards a tree while skipping) paired with a novel verb (English: kradding; Spanish: mecando) that appeared in one of three sentence frames (13). After this training, participants were shown two more videos. One video preserved the manner but changed the result of the event (‘manner preserving’, e.g. a woman moving away from a tree while skipping), the other preserved the result but changed the manner of the event (‘result preserving’, e.g. a woman moving towards a tree while marching). Participants were asked to choose which of the two videos was consistent with their interpretation of the novel verb, indicating whether they had assigned the novel verb a manner or result interpretation.

Let’s focus first on the English-speaking participants. When presented with verbs in neutral or manner-biased sentence frames, English speakers showed a strong preference for manner preserving videos, suggesting that they had interpreted the novel verb as a manner verb, in line with the prediction that manner verbs would be easier to learn. However, when the verb was in the result-biased frame, English speakers showed no preference for manner or result preserving videos.

These results suggest that more than just lexical complexity is involved in manner and result meaning; structural cues are also at play. Considering these structural effects, Mateu & Acedo-Matellán (2012) approached manner/result complementarity as a constraint not on the lexicon but rather one that arises from general syntactic principles. They propose that a verbal root can either adjoin to v as in (14a), or initially form a small clause followed by incorporation with v as in (14b).⁷ The first derives a manner interpretation while the second derives a result interpretation, in line with (6). From this analysis, manner/result complementarity arises as a simple syntactic fact: a verbal root cannot occupy both positions simultaneously in a single structure.

This analysis suggests that syntactic structure should affect whether a verb is interpreted as manner or as result.⁸ For instance, verbs like climb have been proposed as counterexamples to manner/result complementarity because they exhibit both manner and result meanings. In (15a), a manner akin to ‘clambering’ is attributed to the explorer, but in (15b) such a manner is missing; instead, the result of the upward direction of the prices is attested.

Levin & Rappaport Hovav (2013) propose that climb and its ilk are polysemous (see Schumacher and Rabagliati & Srinivasan (Chapters 19 and 22, respectively, in this volume) for further discussion), but Mateu & Acedo-Matellán (2012) observed that this apparent polysemy is structurally conditioned. They note that the presence of a directional Prepositional Phrase (PP) triggers a manner interpretation because this phrase occupies the result argument within the small clause, forcing the verb to adjoin to v.⁹ In (16a), importantly, the ‘upward direction’ result of climb is unavailable and does not clash with the directional PP. Forcing a manner interpretation in (16b) results in oddity, perhaps because it is conceptually unclear how prices can ‘clamber’.

Transitive constructions, too, condition the availability of manner and result interpretations. When the direct object is unaffected by the event, no change-of-state happens to it. Therefore, the only option is for the verbal root to adjoin with v. However, if the direct object is affected by the event, the verbal root must begin its life as the result argument within the small clause and then incorporate with v. Climb in these cases behaves like a manner verb because no change-of-state occurs to, for instance, the mountain in (17), as compared with break, a result verb (18).

Given this analysis, it is clear why the sentence frames used in Naigles & Terrazas (1998) had their effect. In English, the path-denoting PP in the manner-biased frames in (13b) occupied the result argument within the small clause, thus requiring the verb to adjoin to v and be interpreted as manner. For the transitives in the result-biased frames, however, manner or result depended on whether participants considered the action to involve a change-of-state, leading to the absence of a clear preference.

Turning to the Spanish-speaking participants, they showed a strong preference for manner preserving videos when the novel verb was in the neutral frame, much like English speakers. However, unlike English speakers, Spanish speakers showed a strong preference for result preserving videos in the result-biasing frames, and no preference for the manner-biasing frames. This difference suggests that cross-linguistic factors modulated the structural manipulations that drove participant preferences.

As a Romance language, Spanish is typologically distinct from English in that it typically conflates result rather than manner with its motion verbs (Talmy, 1975, 1985). Romance languages in general do not allow manner verbs to occur with bounded path PPs, requiring a paraphrastic translation of an English manner verb construction (19; from Levin & Rappaport Hovav, 2006).

Levin & Rappaport Hovav (2006) suggested that languages differ as to which linguistic unit manner/result complementarity holds over. For manner-conflating languages like English, this unit is the verb. Result-conflating languages like Romance and Greek use the Verb Phrase (VP). Mateu & Acedo-Matellán (2012) related this proposal to Mateu’s (2002) proposal that resultative elements such as directional prepositions obligatorily incorporate in result-conflating languages. This restricts the adjunction of a verbal root with v, blocking manner verbs in those languages when a resultative element is present. Result-conflating languages such as Catalan thus allow (20b) where the directional component is incorporated into the verb, but not (20c) where the resultative/directional component (e.g. fora, ‘out’) is stranded.

Again, this structural analysis explains the pattern of results reported in Naigles & Terrazas (1998). Spanish speakers’ strong preference for result preserving videos in the result-biasing frames derives from the obligatory incorporation of an abstract directional element (likely triggered by the oblique a preposition adjoined to the definite article in al árbol, see (13c)). This incorporation blocks the possibility of adjoining the verb to v, deriving the result interpretation. The lack of preference in the manner-biasing frame also follows since either manner or result verbs can appear with PPs denoting unbounded paths.

Finally, both English and Spanish showed a preference for manner interpretations in neutral contexts. Feist (2010) suggests a pragmatic account. She proposes that, although neutral contexts allow both manner and result verbs, the lack of an overt object reduces the possibility of the verb itself encoding a result because object and result are conceptually connected as the object is the goal of directed motion. Invoking Grice’s Maxim of Quantity, she argues that without overt mention of the object, overt mention of the result is infelicitous. Alternatively, in each of Naigles & Terrazas’s scenarios, the subject is clearly acting as an Agent. From an event structure perspective, Agents are arguments of DO and begin their syntactic lives in the specifier of v. Having no need for a small clause, a manner interpretation emerges as the preferred interpretative choice.

Because events are linguistically decomposed, elements like verbs and their surrounding functional structure have to be carefully considered when determining what their individual contribution to the event is. Verbal roots can occupy one of two structural positions in event structure, taking on manner or result meanings. The complexity of these meanings is thus structurally determined, triggering costs during real-time sentence processing and guiding verbal meaning, including the acquisition of new verbal concepts.

5.4 THE TEMPORAL BOUNDARIES OF EVENTS AND THE ENCODING OF VERB PHRASES

Properties of event structure extend beyond how verbs encode properties like manner and change-of-state. Since Vendler (1957), aspectual properties that describe the temporal contours of events have been relevant to the grammar and processing of event structure, including dynamicity, durativity, and telicity. Telicity has by far received the most attention in the literature.¹⁰ At first blush, telicity relates to whether an event has a natural temporal endpoint or culmination.¹¹ Parsons (1990) introduced a CULMINATE predicate to the theory of event structure to capture this meaning. Kratzer (2005) offers (21) as a denotation, where R takes the relation between an individual and an event and asserts that the event culminates given this individual.¹²

Theories of telicity have proposed how verbs and their arguments combine together in a VP to determine whether an event description reflects an unbounded (atelic) or bounded (telic) aspectual interpretation, often revealed by the VP’s compatibility with terminative modifiers like in ten minutes that measure the temporal endpoint of the event.¹³ Verbs themselves, for instance, can contribute directly to telicity. Win is compatible with a terminative modifier but swim is not.

(22)a. John won in ten minutes.

b. John swam *in ten minutes.

Aspectual interpretations can also be detected by durative modifiers which act as natural temporal measures of unbounded events. In (23b), the swimming event lasts for at least ten minutes. Durative modifiers are also compatible with bounded events, but only if such events can be coerced into an unbounded temporal frame, a phenomenon called aspectual coercion. Under a durative modifier, win is coerced to require multiple events of winning.¹⁴
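The interaction between event boundedness and temporal modifiers described above can be summarized as a small decision table. The Python sketch below is only a schematic restatement of that description: it assumes that boundedness is a binary property of the event description and that coercion always succeeds when a bounded event meets a durative modifier, which the text notes is not guaranteed. The names Modifier, Outcome, and combine are hypothetical.

```python
from enum import Enum


class Modifier(Enum):
    TERMINATIVE = "terminative"  # e.g. "in ten minutes"
    DURATIVE = "durative"        # e.g. "for hours"


class Outcome(Enum):
    BOUNDED = "bounded reading"
    UNBOUNDED = "unbounded reading"
    COERCED = "coerced (iterated) reading"
    UNACCEPTABLE = "unacceptable"


def combine(event_is_bounded: bool, modifier: Modifier) -> Outcome:
    """Combine an event description with a temporal modifier."""
    if modifier is Modifier.TERMINATIVE:
        # Terminative modifiers measure an endpoint, so they need a bounded event.
        return Outcome.BOUNDED if event_is_bounded else Outcome.UNACCEPTABLE
    # Durative modifiers measure unbounded events; a bounded event must be
    # repaired by aspectual coercion (assumed here to be always available).
    return Outcome.COERCED if event_is_bounded else Outcome.UNBOUNDED


print(combine(True, Modifier.TERMINATIVE))   # win + terminative, cf. (22a)
print(combine(False, Modifier.TERMINATIVE))  # swim + terminative, cf. (22b)
print(combine(False, Modifier.DURATIVE))     # swim + durative
print(combine(True, Modifier.DURATIVE))      # win + durative: coercion
```

Only one of the four cells involves a repair, and it is this cell that the processing studies of aspectual coercion target.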

This requirement of coercing a bounded event to receive an unbounded interpretation has been the focus of research interested in how aspectual interpretations unfold in real time. Early work by Piñango et al. (1999) investigated the verbal contributions to telicity by focusing on the processing of durative modifiers (e.g. until, for a long time, for hours) when they were preceded by unbounded verbs like glide or bounded verbs like hop¹⁵ using a cross-modal lexical decision task. In this task, participants were asked to listen to a sentence like (24) while watching a blank screen in front of them. At some point during the sentence, a letter string appeared and participants indicated whether the letter string formed a word of English or not. The logic of this task is that making a lexical decision will be more difficult if the participant is currently devoting more processing resources to understanding the sentence. Assuming that the initial mismatch plus repair by aspectual coercion between a bounded event and a durative modifier is a more costly process than a match (and thus no repair) between an unbounded event and a durative modifier, they predicted more difficulty with the lexical decision task when aspectual coercion was required. Sentences like (24) were presented auditorily with letter string probes presented 250ms after the offset of the durative modifier. Lexical decisions were an average of 39ms slower when aspectual coercion was required than when it was not. These results were replicated and extended in Piñango et al. (2006) which again found slower lexical decision times during aspectual coercion 250ms after the offset of the durative modifier, but not when the letter string probes were presented immediately after the offset of the durative modifier, suggesting that aspectual coercion may be a late repair process triggered by an initial meaning mismatch.

(24) The insect glided/hopped effortlessly until it reached the far end of the garden that was hidden in the shade.

Brennan & Pylkkänen (2008) reversed the order of the temporal adverbial and verb found in (24) to focus on the aspectual contribution of bounded verbs to aspectual interpretation. They manipulated whether the modifier preceding the bounded verb was durative (throughout the day) or non-durative (after twenty minutes), as shown in (25). Participants performed a word-by-word self-paced reading task, and made a sensicality judgement at the end of the sentence. They found a 37ms slowdown on the reading times of bounded verbs that were preceded by durative modifiers compared to non-durative modifiers, consistent with the idea that aspectual coercion is costly.

(25)a. Throughout the day the student sneezed in the back of the …

b. After twenty minutes the student sneezed in the back of the …

Following this, Brennan & Pylkkänen investigated the underlying neural sources of this processing cost with a similar design using magnetoencephalography (MEG), a non-invasive neuroimaging technique that measures magnetic fields arising from neural activity. The same sentences were presented to participants word-by-word for a set amount of time (300ms, with 300ms blank screen between words) while neuromagnetic fields were recorded over the whole head. They found that the cost of aspectual coercion on the bounded verb was reflected in greater neuromagnetic activity at two different neural sources. The first of these gave rise to a peak in activity occurring 340–380ms after the onset of the verb in the right frontal and anterior temporal lobes, and appeared similar to the N400, an electrophysiological component thought to reflect lexical semantic access and integration (Lau, Phillips, & Poeppel, 2008; Kutas & Federmeier, 2011). The second source, giving rise to activity 440–460ms after the onset of the verb in the anterior midline region, was argued to be the same source that is responsible for repair of the meaning mismatch in complement coercion.¹⁶ Brennan & Pylkkänen argued that these two neural sources reflected two component processes: first, the composition of an anomalous meaning, and second, the aspectual coercion repair process. This is consistent with Piñango et al.’s (2006) finding that aspectual coercion is a late repair process triggered by an initial meaning mismatch.

Paczynski et al. (2014) also investigated the neural sources of aspectual coercion, extending Brennan & Pylkkänen (2008), by using both bounded (pounce, 26) and unbounded (prowl, 27) verbs in an event-related potential (ERP) study using electroencephalography (EEG). This technique is similar to MEG, but measures electrical instead of neuromagnetic activity. Bounded verbs elicited an anteriorly distributed sustained negativity beginning 500ms after verb onset when preceded by durative modifiers (26b) compared to non-durative modifiers (26a). This sustained negativity ERP was distinct from the N400 response, but similar to ERPs elicited by complement coercion (Baggio et al., 2010; but cf. Kuperberg et al., 2010). Unbounded verbs did not elicit any differential ERP response to durative/non-durative modifiers, which was as expected since no coercion process was necessary.

(26)a. After several minutes, the cat pounced on the rubber mouse.

b. For several minutes the cat pounced on the rubber mouse.

(27)a. After several minutes, the cat prowled about the backyard.

b. For several minutes the cat prowled about the backyard.

Given that aspectual coercion is often interpreted as iterating a single bounded event, Paczynski et al. (2014) also included frequentative modifiers as in (28), which also give rise to a multiple event interpretation, but through quantification instead of coercion.

(28) Several times, the cat pounced/prowled…

Bounded verbs preceded by durative modifiers elicited a similar frontal sustained negativity (beginning 800ms after verb onset) when compared to frequentative modifiers. Paczynski et al. (2014) argued that this similar ERP effect suggests that the cost of aspectual coercion is not driven by event iteration but instead by the insertion of an aspectual operator that repairs the meaning mismatch. This operator then generates a multiple event interpretation not reflected by differential ERPs.

Taken together, these studies suggest that bounded verbs contribute immediately to the unfolding aspectual interpretation.¹⁷ Mismatches between bounded verbs and durative modifiers trigger a repair process that gives rise to an iterative interpretation of the event.

Event structure is, however, more than just the meaning of the verb in the sentence. Verkuyl (1972) established that telicity actually ranges over larger structural units. In particular, the countability of the internal argument of a VP plays an important role in determining the telicity of an event, suggesting that telicity is actually calculated over the VP. For a verb like eat, a quantized (i.e. count) internal argument licenses a bounded interpretation while a homogeneous (i.e. non-count) internal argument licenses an unbounded interpretation.

(29)a. John ate the apple in ten minutes.

b. John ate apples *in ten minutes.

The processing cost found for aspectual coercion of events that are bounded by their verb in the aforementioned studies has also been found for events that are bounded by their internal argument. In a stops-making-sense reading task, Todorova et al. (2000) investigated this compositional component of aspectual interpretation by manipulating whether the internal argument was an indefinite singular (i.e. quantized; 30b) or a bare plural (i.e. homogeneous; 30a). If participants rapidly composed the quantity of the internal argument with the verb to calculate aspectual interpretation, we would expect to find the processing costs of aspectual coercion on durative modifiers. Todorova et al. found longer reading times on durative modifiers that followed VPs with quantized internal arguments compared to those with homogeneous ones, evidence that aspectual interpretation takes into account both verbal and non-verbal sources of information during real-time language processing.

Interestingly, the properties of internal arguments do not always appear to contribute to aspectual interpretation. Some verbs appear to block the contribution of their internal argument. VPs with bounded verbs like find seem to always receive a bounded interpretation regardless of the countability of their internal argument. Similarly, those with unbounded verbs like push appear to always carry an unbounded interpretation.

(31)a. John found (the) books in ten minutes.

b. John pushed (the) carts *in ten minutes.
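The descriptive generalization behind (29) and (31) can be stated as a simple composition rule. The Python sketch below is a hedged illustration only: it assumes a three-way classification of verbs (inherently bounded, inherently unbounded, and unspecified) and a binary quantized/homogeneous distinction for the internal argument. The inherently unbounded class in particular is adopted here purely for descriptive purposes and is itself a matter of debate. The names VerbClass and vp_is_bounded are hypothetical.

```python
from enum import Enum


class VerbClass(Enum):
    BOUNDED = "inherently bounded"      # e.g. find
    UNBOUNDED = "inherently unbounded"  # e.g. push (assumed class, contested)
    UNSPECIFIED = "unspecified"         # e.g. eat


def vp_is_bounded(verb_class: VerbClass, object_is_quantized: bool) -> bool:
    """Compute VP-level boundedness from the verb and its internal argument."""
    if verb_class is VerbClass.BOUNDED:
        return True    # the verb overrides its argument, cf. (31a)
    if verb_class is VerbClass.UNBOUNDED:
        return False   # the verb overrides its argument, cf. (31b)
    return object_is_quantized  # the argument decides, cf. (29)


print(vp_is_bounded(VerbClass.UNSPECIFIED, True))   # ate the apple
print(vp_is_bounded(VerbClass.UNSPECIFIED, False))  # ate apples
print(vp_is_bounded(VerbClass.BOUNDED, False))      # found books
print(vp_is_bounded(VerbClass.UNBOUNDED, True))     # pushed the carts
```

If the inherently unbounded class turned out not to exist, the second branch of vp_is_bounded would be removed and verbs like push would be handled by the final line, with any remaining differences attributed to factors outside this rule.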

The evidence from (31a) and (31b) suggests a complex relationship between verbs and their arguments in forming aspectual interpretations, with some verbs being inherently specified for telicity and others not. In general, research in linguistics (Mittwoch, 1991; Rothstein, 2004; Rappaport Hovav, 2008) and in the psycholinguistics reviewed earlier has supported the idea of inherently bounded verbs.¹⁸ The status of inherently unbounded verbs has, however, proved to be more contentious, with some researchers supporting a class of inherently unbounded verbs (Dowty, 1991; Krifka, 1992a, 1998; Verkuyl, 1993; Tenny, 1994; Kennedy & Levin, 2008) and others arguing against an inherently unbounded class of verbs (Schein, 2002; Borer, 2005b).

Stockall & Husband (2014) took an experimental approach to this question. They varied the three possible verb classes (inherently bounded, inherently unbounded, unspecified) with a quantized or homogeneous plural direct object in two word-by-word self-paced reading studies. In the first study pairing inherently bounded verbs (lost) with unspecified verbs (read) as in (32), they found a slowdown in reading times on the first word after the direct object noun when an unspecified verb was followed by a bare plural compared to a definite plural. This slowdown was not present when the verb was inherently bounded. These findings suggested that participants combined unspecified verbs with the homogeneous property of bare plurals to arrive at an unbounded interpretation, slowing down processing perhaps because unbounded interpretations are more semantically complex than bounded ones (Husband & Stockall, 2014). The other three conditions did not differ in reading times since all three yielded bounded interpretations. In particular, inherently bounded verbs ignored the homogeneous property of bare plurals, behaving like the other bounded interpretations.

(32) The expert physicist lost/read (the) files on the formation of black holes.

In the second study pairing inherently unbounded verbs (roam) with unspecified verbs (inspect) as in (33), they found a slowdown in reading times on the first word after the direct object noun when either unspecified or unbounded verbs were followed by a bare plural. The same slowdown was not, however, present for unbounded verbs followed by a definite plural. Instead, reading times were similar to those of unspecified verbs. Given that unbounded and unspecified verbs patterned together, Stockall & Husband (2014) argued against a distinction between inherently unbounded verbs and unspecified verbs, suggesting that they, in fact, form a single class of unspecified verbs since both cases showed sensitivity to the countability of their internal argument during real-time processing.

(33) The local horticulturalist roamed/inspected (the) gardens in the neighborhood.

Such a finding is in line with Borer (2005b), who also argued against a class of inherently unbounded verbs. She noted that the classically cited cases for inherently unbounded verbs like push act like bounded events in sentences where world knowledge permits a bounded interpretation. As shown in (34), pushing a cart, for instance, has no natural endpoint in the real world and behaves like an unbounded event with respect to durative modifiers, whereas pushing a button does have a bounded sense in the real world and thus behaves like a bounded event with respect to durative modifiers. She argued that without real-world support for a bounded interpretation, only an unbounded interpretation is possible.

Properties of events such as telicity further demonstrate how the relationship between verbs and their surrounding elements must be pulled together to properly interpret the event. Telicity is sensitive to temporal adverbs that measure the duration and endpoints of events and that, when possible, coerce the underlying event to achieve a coherent aspectual interpretation. Arriving at an aspectual interpretation is not without cost, and this cost is detectable with a variety of experimental techniques that shed light on its time course. Such techniques can also give us a handle on the aspectual properties of verbs. Through their similar processing behaviour in terms of their relationship to their internal argument, inherently unbounded verbs and unspecified verbs appear to form a natural class, albeit one where the ultimate interpretation of the event is still beholden to our knowledge of the real world.
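The generalization that emerges from Stockall & Husband's two studies can be restated as a small decision rule. The following is only a minimal sketch under stated assumptions, not a model proposed in the original work: the names `VerbClass`, `ObjectType`, `is_bounded`, and `predicts_slowdown` are hypothetical, inherently unbounded verbs are collapsed into the unspecified class as the authors suggest, and the role of world knowledge noted by Borer (2005b) is deliberately left out.

```python
from enum import Enum


class VerbClass(Enum):
    # Hypothetical labels for the two classes that remain once
    # inherently unbounded verbs are merged with unspecified verbs.
    BOUNDED = "inherently bounded"    # e.g. lose
    UNSPECIFIED = "unspecified"       # e.g. read, inspect, roam


class ObjectType(Enum):
    QUANTIZED = "definite plural"     # e.g. the files
    HOMOGENEOUS = "bare plural"       # e.g. files


def is_bounded(verb: VerbClass, obj: ObjectType) -> bool:
    """Predicted aspectual interpretation of verb + direct object."""
    if verb is VerbClass.BOUNDED:
        # Inherently bounded verbs ignore the homogeneity of a bare plural.
        return True
    # Unspecified verbs inherit boundedness from their internal argument.
    return obj is ObjectType.QUANTIZED


def predicts_slowdown(verb: VerbClass, obj: ObjectType) -> bool:
    """Assumption from the text: unbounded readings are the costly ones."""
    return not is_bounded(verb, obj)
```

On this sketch, only the combination of an unspecified verb with a bare plural is predicted to slow reading after the direct object, which matches the pattern described for (32) and (33).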

5.5 PARTICIPANTS IN EVENTS AND THE (IMPLICIT) ENCODING OF ARGUMENTS

The arguments that enter into events do more than establish their temporal contours; they also find interpretations of their own within event structure.¹⁹ Any given event description has a multitude of associated and entailed components. Events are identified in general by their time and location, they often but not always involve some objects or persons, and they can be highly idiosyncratic from verb to verb; for example, every event of dribbling in basketball involves a player, a ball, and at least one bounce. Natural language predicates, however, encode only some of these components as grammatically privileged.²⁰ These privileged components are known as event participants. One common theoretical device used to capture different event participants is the notion of thematic roles, an intuitive set of roles that identify the relationship of an event participant to the event (Fillmore, 1968). For example, Agents perform the action of the event, Patients undergo this action, and Instruments are used to assist with the action. These thematic roles are encoded to some extent, in poorly understood ways, by different verbal concepts. A verb like jab, for instance, seems to encode an Agent, Patient, and Instrument while other roles such as Location and Time are not explicitly encoded though they may still be expressed. Similar verbs like attack appear to encode fewer roles, only Agent and Patient.

(35) [The pygmies]_AGENT jabbed/attacked [the lion]_PATIENT [with a spear]_INSTRUMENT [in the field]_LOCATION [yesterday]_TIME.

Researchers since Jackendoff (1976) and Dowty (1979) have observed that thematic roles are intimately bound up in the representation of event structure, suggesting that they may derive from event structure itself. The notions of Agent and Patient, for instance, might be more elegantly captured by appealing to the roles they play in event structure, that is, as the causer of a sub-event in (36a) or the undergoer of a change-of-state in (36b) (Levin & Rappaport Hovav, 1995; Van Valin, 2004; Schein, 2012).²¹ This implies that the elements that realize event structure are also implicated in the way that event participants themselves are linguistically encoded.

(36)a. λxλe[ Agent(e,x) ] =_def λxλe₂[ ∃e₁ [ DO(e₁,x) & CAUSE(e₁,e₂) ] ]

b. λxλe[ Patient(e,x) ] =_def λxλe₁[ ∃e₂ [ BECOME(e₁,e₂) & root(e₂,x) ] ]
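The definitions in (36) can be read as queries over a decomposed event. The sketch below is illustrative only and rests on several assumptions: an event structure is stored as sets of relation pairs, and the names `EventStructure`, `is_agent`, and `is_patient`, along with the sub-event labels `e1`, `e2`, and `s`, are hypothetical. Only the predicates DO, CAUSE, BECOME, and root come from (36).

```python
from dataclasses import dataclass, field


@dataclass
class EventStructure:
    """A decomposed event stored as sets of relation pairs (an assumption)."""
    do: set = field(default_factory=set)      # (event, individual)
    cause: set = field(default_factory=set)   # (causing event, caused event)
    become: set = field(default_factory=set)  # (process event, result state)
    root: set = field(default_factory=set)    # (state, individual)


def is_agent(es: EventStructure, e: str, x: str) -> bool:
    """(36a): x is Agent of e iff some e1 has DO(e1, x) and CAUSE(e1, e)."""
    return any((e1, x) in es.do for (e1, e2) in es.cause if e2 == e)


def is_patient(es: EventStructure, e: str, x: str) -> bool:
    """(36b): x is Patient of e iff some e2 has BECOME(e, e2) and root(e2, x)."""
    return any((e2, x) in es.root for (e1, e2) in es.become if e1 == e)


# Illustrative structure for a causative such as "John broke a window".
causative = EventStructure(
    do={("e1", "John")},
    cause={("e1", "e2")},
    become={("e2", "s")},
    root={("s", "window")},
)

# Illustrative structure for an inchoative such as "A window broke":
# the DO and CAUSE sub-events are absent, so no Agent can be derived.
inchoative = EventStructure(
    become={("e2", "s")},
    root={("s", "window")},
)
```

Here `is_agent(causative, "e2", "John")` and `is_patient(causative, "e2", "window")` both hold, whereas `is_agent(inchoative, "e2", "John")` does not, since the role is derived from the presence of the relevant sub-events rather than listed as a primitive.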

Agent participants have been the focus of much of the experimental work on event structure. Many events are initiated and controlled by Agents, but Agents are not always overtly expressed in a sentence. For instance, every selling event has a seller, but sentences describing a selling event need not explicitly say who that seller was. Passives are one linguistic device used to demote an Agent. In their full form, like (38a), the Agent is still overtly expressed in a by-clause, but passives can omit this by-clause, leaving the Agent unexpressed, as in (38b). These short passives appear to be superficially similar to an intransitive clause like (38c), which also lacks an overt Agent.

Although it is tempting to treat short passives and intransitives in the same way, there are formal differences between them that correlate with the encoding of an Agent. To probe for the presence of an Agent, we can take advantage of rationale clauses, which are only acceptable if an Agent is present, whether explicit or implicit.²² Rationale clauses distinguish short passives from intransitives, suggesting that only short passives encode Agents implicitly.

(39)a. The vase was sold immediately to raise money for charity.

b. #The vase sold immediately to raise money for charity.

These different formal cues are rapidly used during real-time processing when encoding Agents. Following initial research in Mauner et al. (1995) that used this property of rationale clauses to detect the presence of unexpressed Agents, Mauner & Koenig (2000) investigated the processing of implicit agents using a stops-making-sense judgement task. In their second experiment, they presented participants with either short passive or intransitive clauses followed by a rationale clause as in (40). In both conditions, an Agent of some kind was implicated. The main clause (including the adverb) was presented in its entirety, followed by the rationale clause read word-by-word.

Mauner & Koenig reported disruption in reading for the intransitive compared to the short passive condition. Participants made significantly more ‘no’ judgements starting on the verb of the rationale clause (17% vs. 5%), and these judgements rapidly accumulated over each subsequent word. By the final word of the rationale clause, intransitives had elicited a cumulative 49.4 per cent ‘no’ responses compared to 16 per cent for short passives. This disruption was also reflected in ‘yes’ response times. Although some participants had not yet indicated that the sentence had stopped making sense, ‘yes’ response times were significantly slower at the initial to of the rationale clause for intransitives compared to short passives, a trend that continued throughout the clause.

In their third experiment, Mauner & Koenig reversed the order of rationale clause and main clause as in (41). Participants again performed a stops-making-sense task, reading these sentences word-by-word.

Reading disruption was again found in the intransitive condition compared to the short passive, with significantly more ‘no’ judgements elicited at the main verb (25.8% vs. 19%). Reading times of ‘yes’ responses also slowed somewhat, eliciting an 86ms difference at the postverbal position and 622ms difference on the final word. Mauner & Koenig reasoned that readers rapidly encoded an implicit agent in short passives, enabling them to license the rationale clause early in processing. This effect appeared immediately on the verb in their third experiment, suggesting that participants made immediate use of the structural cues for an implicit Agent. Given that the events encoded by their verbs implicated an Agent participant, their findings suggest that linguistically relevant event structures are responsible for these effects, and not some extra-linguistic conceptual or real-world knowledge.

Some initial evidence that Patients are also represented as event participants by some verbs was reported in Carlson & Tanenhaus (1988). They took advantage of the requirement that definites need a discourse antecedent for felicitous use or must otherwise be accommodated into the discourse. Assuming that accommodation is a costly process (Haviland & Clark, 1974), they hypothesized that a context that provides a suitable unfilled event participant role would allow for a more rapid accommodation of a definite than one that merely set up a likely scenario for the definite. They asked participants to read one of two context sentences which either introduced an implicit Patient (42a) or did not (42a’) and then judge whether a following sentence that began with an unfamiliar definite made sense in the given context.

Judgements for the target definite sentences were 219ms faster in implicit Patient contexts compared to no implicit Patient contexts. Participants also rejected target definite sentences less often in implicit Patient contexts, with 97 per cent judged to make sense, while only 84 per cent were judged to make sense in no implicit Patient contexts. This result suggested that implicit event participants can guide definiteness accommodation, allowing for a more rapid integration of discourse that maintains coherence between sentences.

Sluicing, an ellipsis construction that involves an overt wh-phrase, offers another way to probe for the presence of event participants. Frazier & Clifton (1998) compared encoded event participants to those that are left as simply entailed by the verb. An event of writing, for instance, encodes whatever is written as a Patient, but also entails something used to write with as an Instrument. Frazier & Clifton (1998) manipulated the wh-phrase that introduced the elided material of the sluice to target either the Patient (43) or an unencoded role (including Instrument, Goal, Location, or other oblique; 44). The antecedent was either overtly mentioned with an indefinite headed by some (43a, 44a) or left unexpressed (43b, 44b).

(43)a. Kathy wrote | something | but nobody | seems | to remember what.

b. Kathy wrote | but nobody | seems | to remember what.

(44)a. Kathy wrote with something | but nobody | seems | to remember with what.

b. Kathy wrote | but nobody | seems | to remember with what.

They found that overt antecedents were read faster than implicit antecedents by a length-corrected average of 149ms. Additionally, encoded patient sluices were faster than unencoded argument sluices, but this difference was not statistically reliable (~41 length corrected ms; F₁ = 1.79, F₂ = 1.91). These results were replicated by Dickey & Bunger (2011), who reported a statistically reliable difference in Patients compared to unencoded arguments (~98ms).²³ Together, these studies suggest that the processing of sluicing is sensitive to event participant status, even in a technique like self-paced reading which does not require participants to make sensicality decisions at each word.
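The effects above are reported in length-corrected milliseconds. The source does not spell out how the correction was computed; one common approach in self-paced reading research, assumed here, is to regress raw reading times on region length in characters and analyse the residuals. The sketch below illustrates that idea only: the function names `fit_line` and `residual_reading_times` and the toy numbers are hypothetical and are not data from any of the studies discussed.

```python
def fit_line(lengths, times):
    """Ordinary least-squares fit of reading time on region length."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_reading_times(lengths, times):
    """Observed minus length-predicted reading time for each region."""
    slope, intercept = fit_line(lengths, times)
    return [t - (intercept + slope * x) for x, t in zip(lengths, times)]


# Toy illustration (made-up values): region lengths in characters
# and raw reading times in milliseconds for one reader.
toy_lengths = [4, 8, 12]
toy_times = [310, 390, 530]
toy_residuals = residual_reading_times(toy_lengths, toy_times)
```

A positive residual marks a region read more slowly than its length alone would predict, so conditions with regions of different lengths, such as what versus with what in (43) and (44), can be compared on a common scale. In practice the regression is often fitted separately for each participant.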

Finally, the integration of arguments within a sentence is guided by their event participant status. Koenig et al. (2003) reported a study that examined whether Instrument participants were more easily integrated into sentences with verbs that encode Instruments as event participants. Events of beheading require an Instrument to be used whereas events of killing are more general (and may require no Instrument at all). Koenig et al. (2003) hypothesized that events requiring Instruments like (45a, 46a) would more quickly integrate an Instrument into a gap than those that do not (45b, 46b).

(45)a. Which sword | did the rebels | behead | the traitor king with | during the rebellion?

b. Which sword | did the rebels | kill | the traitor king with | during the rebellion?

(46)a. With which sword | did the rebels | behead | the traitor king | during the rebellion?

b. With which sword | did the rebels | kill | the traitor king | during the rebellion?

For both wh-NP (45) and with-wh-NP (46) sentences, they found that reading times on the direct object (+with) were faster with instrument-encoding verbs compared to no-instrument-encoding verbs (NP filler: ~70ms; PP filler: ~113ms), suggesting that verbs that encode Instruments as event participants guide the rapid integration of displaced Instrument arguments during real-time sentence processing.²⁴

Arguments and their interpretations are tied up with event structure. Even when unexpressed, the cues to event structure, either formal or via verbal concepts, can lead to an implicit encoding of event participants. These implicit participants are accessed and used in real-time sentence processing to guide interpretative processes that put an event together incrementally.

5.6 SUMMARY

In this chapter, we have looked at the consequences that different aspects of event structure have for real-time sentence processing. Because events are decomposed and articulated in constituents across a sentence, cues to their disparate components must be recognized and put back together to construct a complete and coherent representation of the event under discussion. This is made all the more complex during real-time processing as these components arrive one after another in quick succession, and yet studies show that speakers are highly sensitive to the right cues, able to rapidly extract them from individual units, and use them to guide interpretation in a highly incremental fashion. For verbal roots, event structure constrains their interpretation in terms of expressing manner or result. The complexity of result meanings involves additional processing compared to a manner meaning, but is structurally conditioned, sensitive to formal cues and cross-linguistic differences. Verbal roots also participate in establishing the temporal contours of an event structure in accordance with their inherent aspectual properties, if they have any, and properties of their arguments. This initial aspectual interpretation may go on to enter into further aspectual elaboration, at times requiring a resolution of mismatched aspectual meanings by costly coercion processes reflected in behavioural and neural measures. Finally, verbal roots and the formal structure surrounding them license the interpretation of different participants in event structure. Idiosyncratic encodings by verbal roots combine with formal cues to mark the presence or absence of implicit event participants, including Agents, Patients, and Instruments, regardless of whether they are implicated by the event or not. Encoding of these implicit event participants is triggered immediately by these verbal and structural cues, and such encoded participant roles go on to license rationale clauses and speed the processing of unfamiliar or displaced arguments.

Throughout our discussion of various experiments we have noted differences between more passive implicit tasks, like self-paced reading and electrophysiology, and those like stops-making-sense reading tasks, which require more active decision-making on the part of experimental participants but prove to be rather robust techniques for detecting incremental interpretative processes. Whether the differences between these techniques affect the underlying conclusions of any one study is always up for debate; however, convergent evidence from multiple techniques resulting in relatable findings puts us on a surer footing. As a matter of general experimental strategy, initial processing cost results in self-paced reading and other behavioural tasks warrant further investigation with more fine-grained but experimentally complex techniques such as eye-tracking and electrophysiology, which may tap into the time course of sentence processing in ways that behavioural techniques find difficult to discern. Given our overview, further experimental research on different aspects of event structure holds much promise.

Linguistic theories have laid a solid foundation for understanding how event structure is decomposed and represented in language. Building upon this, experimental research has endeavoured to bring to light the consequences of this decomposition during real-time sentence processing. Going forward, these techniques offer potentially new and interesting ways to investigate how these structures are put together in real time, and perhaps even help to decide between competing theoretical perspectives by investigating shared processing behaviours. Such approaches hold promise for establishing a deeper connection between grammatically determined aspects of sentence meaning and the interpretative processes that guide its use online.

ACKNOWLEDGEMENTS

Many thanks to Lisa Levinson and those who attended the LSA 2017 Summer Institute course ‘Experimental Approaches to Verb Meaning’ for their thoughtful discussion on many of the issues raised in this chapter. We also thank the editors for their helpful feedback on previous drafts.

¹ Events are not the only way researchers have tried to execute Davidson’s programme, though the relationship events encode may be indispensable. Bayer (1996) argues that events themselves are a distinct type from possible worlds/situations. He notes that worlds (or slices of worlds) are unable to distinguish buying and selling, but these cases are linguistically distinct as shown with modification. (3a) seems true given the context, whereas (3b) is not even though the same transaction is taking place.

(3) Mary’s religion has a tradition where wedding guests get a flower from a local vendor to symbolize their respect for the impending union.

a. John bought a flower from a street vendor in honor of Mary’s wedding.

b. A street vendor sold John a flower in honor of Mary’s wedding.

² We will set aside questions concerning whether CAUSE is the right event relationship here (Parsons, 1990; for arguments against CAUSE, see Fodor, 1970, Lombard, 1985, Pietroski, 2005, Williams, 2015).

³ Such tests may seem trivial, but they can reveal rather surprising behaviour, for instance, with degree achievements as in (i).

(i) The physicists cooled the plasma (for 2 hours), but it didn’t become cool.

⁴ Manner/result complementarity is not without controversies. Beavers & Koontz-Garboden (2012) argue that manner of death verbs, which include behead, crucify, hang, drown, and electrocute, lexicalize both a manner (i.e. the way the death occurred) and a result (i.e. the death itself). This argument relies heavily on a claim about what constitutes a ‘lexicalized’ meaning. According to Dowty (1991), lexicalized components are meanings that are entailed regardless of the context. Husband (2011) argues that to the extent both meaning components are encoded, the manner component of manner of death verbs is entailed while the result component is actually presupposed. As a presupposition, the result meaning projects out of negation, as shown in (i).

(i) Socrates was not decapitated.

a. He poisoned himself.

b. #He didn’t die.

This suggests that the manner and result components of manner of death verbs reside on different levels of meaning and that manner/result complementarity may capture something about how much of a verb’s meaning can be asserted.

⁵ In previous work, McKoon & Ratcliff (2008) compared change of location verbs (e.g. descend) with manner of motion verbs (e.g. drift) and found similar results to those reported here. This is predicted given that change of location is a type of result under the wider umbrella of change-of-state (Levin & Rappaport Hovav, 1992).

⁶ McKoon & Macfarland (2002) argue that whole-sentence reading times are more sensitive than self-paced reading times to differences of this type when asking participants for acceptability judgements.

⁷ The v’s in (14a) and (14b) may be of distinct flavours, giving rise to DO and BECOME predicates respectively (Folli & Harley, 2007).

⁸ Of course, if a known verbal root has no possible result meaning but finds itself in a result construction then unacceptability should arise; and similarly if it has no manner meaning but is in a manner construction. Such an account may be appropriate for the unacceptability of *Mary danced her legs and of examples in (5) involving break where both verbs carry only a manner or result meaning, respectively.

⁹ Given that verbs like climb are ambiguous between these manner and result interpretations, there is a straightforward question about how this ambiguity is resolved during real-time processing. One might expect an initial preference for manner interpretations over result interpretations, either because the manner schema is simpler than the result template (McKoon & Love, 2011) or because the manner structure is simpler than the result structure (following minimal attachment; Frazier, 1978). To our knowledge, a study testing this ambiguity has not yet been conducted.

¹⁰ On the real-time processing of dynamicity and durativity, see Gennari & Poeppel (2003); Yap et al. (2009); Bott (2010); and Coll-Florit & Gennari (2011).

¹¹ Building on Jackendoff (1990), Gawron (2006) argues that spatial aspect shares properties with temporal aspect, suggesting that the notion of endpoint/culmination encoded by telicity is wider than what is being discussed here. See Grigoroglou & Papafragou (Chapter 7 in this volume) for further discussion on spatial uses of language.

¹² Kratzer’s (2005) final denotation for CULMINATE in (i) spells out this condition by pairing sub-parts of the object (under the appropriate measure) with sub-parts of the event. Taking eat the apple as an example, to culminate is to have a sub-event of eating a sub-part of the apple for all sub-parts of the apple:

(i) ⟦ [telic] ⟧ =_def λRλxλe [ R(e,x) & ∃f [ measure(f) & ∀x’ [ x’ ≤ f(x) → ∃e’ [ e’ ≤ e & R(x’)(e’) ] ] ] ]

¹³ Results themselves can be distinguished from the temporal boundedness of event telicity (Levin & Rappaport Hovav, 1995). Some result verbs require telic interpretations (arrive, die, find), but others do not (cool, descend, fall). And non-result verbs can occur in telic predicates (read a book).

¹⁴ Bott (2010) reports processing costs for the addition of a process to an achievement, suggesting that durativity can also be aspectually coerced.

¹⁵ Events like hop and sneeze are typically classified as semelfactives, momentary or punctive events (Moens & Steedman, 1988; Smith, 1991). This can be compared to achievements like win or reach the summit which appear to denote the boundaries of events (Piñon, 1997).

¹⁶ Complement coercion (e.g. John began the book) is another well-studied case of meaning mismatch repair. In complement coercion, the mismatch between an event-selecting verb (begin) and entity-denoting complement (the book) is repaired by coercing an event from the entity-denoting complement. Like aspectual coercion, this process is costly (Traxler et al., 2002) and elicits greater neuromagnetic activity in the anterior midline region (Pylkkänen & McElree, 2007).

¹⁷ The literature has not been univocal in reporting costs of aspectual coercion. Pickering et al. (2006) attempted to replicate and extend Piñango et al. (1999) and Todorova et al. (2000) using eye-tracking, but found no differences in eye movement measures. Townsend (2013), however, reported costs for aspectual coercion in first-pass reading times. Husband & Stockall (2015) suggested that this difference is due to the use of frequentive instead of durative modifiers in Pickering et al. (2006), since frequentive modifiers do not trigger aspectual coercion (following Rothstein’s (1995) event quantification analysis).

¹⁸ Verkuyl (1989) stands out as one case of an argument specifically against inherently bounded verbs, suggesting that those that appear to be inherently bounded are actually those that world knowledge specifies as having punctual/momentary durations.

¹⁹ Beyond what is discussed here, another important aspect of the interpretation of the arguments of an event is that they may be interpreted collectively or distributively, which has further consequences for real-time processing. See in particular Syrett (Chapter 9 in this volume) for the effects of distributive items such as each on the interpretation of event structures.

²⁰ Williams (2015) proposes that privileged event participants form a representation he calls a sketch. We might think of a sketch as those participants of an event that are given by default in cognition. Such a representation appears to underlie behaviour in studies on event perception. Wellwood et al. (2015) used a similarity judgement task in which participants viewed two videos depicting different events described in (i,ii) and rated them on how similar they were to one another. The first video showed three participants, the second either the two core participants or three participants where the third was merely associated to the event. Wellwood et al. reported that giving vs. hugging (i), where giving requires three arguments while hugging only requires two, showed significant differences in their response times. The other conditions which could be described with either a three-participant (jimmy, steal, bean) or two-participant (open, pick-up, hit) verbal concept also showed differences in either response times (iia,b) or similarity ratings (iib,c), suggesting that participants that are engaged in the event are reflexively highlighted in a sketch-like representation, even if they are not frequently expressed with an overt constituent.

(i) Anne gave a teddy bear to Beth vs. Anne hugged Beth (while holding a teddy bear).

(ii) a. Anne jimmied the box (with a screwdriver) vs. Anne opened the box (while holding a screwdriver).

b. Anne stole the box (from Beth) vs. Anne picked up the box (while Beth stood by).

c. Anne beaned Beth (with a ball) vs. Anne hit Beth (while holding a ball).

²¹ Instruments are also linked to event structure. Pylkkänen (2008) notes that Instruments (as a type of High Applicative in her theory of argument structure) are related to CAUSE. Instrumentals are, for instance, prevented from occurring with unaccusatives like (ic) which lack a CAUSE in their representation.

(i) a. John broke a window with a stone.

b. A window was broken with a stone.

c. *A window broke with a stone.

²² McCourt et al. (2015) argue that anaphora in rationale clauses may be pragmatically mediated, with its reference being sensitive to a notion of responsibility, a concept that is often related to that of an Agent. For the following studies, the particular licensing conditions of rationale clauses are not directly relevant, so we will set these issues aside.

²³ Dickey & Bunger (2011) also compared sluiced sentences to ellipsis like (i) and found similar effects of event participant status on reading times in the final region (~66ms).

(i) a. Kathy wrote (with) something | but nobody | seems | to remember (with) what she wrote.

b. Kathy wrote | but nobody | seems | to remember (with) what she wrote.

²⁴ Of course, other extra-linguistic sources of information may be driving Koenig et al.’s (2003) effects. Stowe et al. (1991) reported faster self-paced reading times for sentences with plausible wh-NPs compared to implausible wh-NPs at the verb (~85ms).

(i) a. The teacher wondered which book the students read during their class.

b. The teacher wondered which song the students read during their class.

Reading times, therefore, can be influenced by factors other than those that are linguistic in nature. It could be that expectations of or co-occurrences between the wh-NP and the verb sped processing through prediction or priming. As with simple acceptability judgements, careful manipulation and control of the stimuli are necessary to understand the source of any processing cost.
