22
What can a corpus tell us about creativity?

Thuc Anh Vo and Ronald Carter

1. Concept of creativity in linguistics

The history of creativity study cannot be described as long, time-wise. Indeed, its first official appearance as an academic subject was in the 1920s (Pope 2005: 19) and it has been actively pursued only since the latter half of the twentieth century. However, achievement-wise, the field is not devoid of valuable findings. During these past fifty years or so, research into creativity within psychological, socio-cultural and linguistic paradigms has been significantly fruitful, offering an ever-growing understanding of the phenomenon.

The study of creativity in linguistics can be seen as having started, albeit indirectly, with studies of ‘literariness’ in poetry and literature in the early twentieth century, considering that creativity is a companion that is ‘not easily separated from the nature of literariness in language’ (Carter 2004: 81). Early inherency models of literariness posit a distinction between ‘literary’, ‘poetic’ language and ‘ordinary language’ whereby the differentia specifica of poetic language in relation to ordinary language is believed to be identifiable in formal characteristics of the verbal sign itself (e.g. metre, formulas), hence the theories of ‘defamiliarisation’ (Shklovsky 1989 [1917]) or ‘deviation’ (Jakobson 1960). In other words, literary language essentially shows formal departures from ‘normal’ rules or patterns of language.

Social influences are seen to have brought ‘a cultural politics to literary studies’ (Rice and Waugh 1989: 4), forming a body of post-structural works that moved away from the speaker and the text while bringing the reader and social ideology to the fore. For instance, taking the role of the addressee into account, the ‘reader response theory’ claims ‘it is possible to determine [a work’s] artistic nature by the nature and degree of its effect on a given audience’ (Jauss 1970). An attempt to marry up deviation theory and reader-oriented schema theory was found in ‘discourse deviation’ theory by Cook (1994: 182), who maintains that literary texts typically carry out the function of challenging/altering existing schemata in the reader, which is possible via deviation at the linguistic-structural level.

Further developments in pragmatic approaches to creativity are built on the foundation of Austin’s (1962) and Searle’s (1969) Speech Act Theory in interaction with reader-response theory. Ohmann (1971) calls literary speech acts ‘quasi-speech acts’ because, he points out, they do not exist outside literary contexts. The criterion for literary discourse is therefore argued by Ohmann to be ‘a discourse whose sentences lack the illocutionary forces that would normally attach to them’ (Ohmann 1971: 14). Pratt (1977), however, argues that non-literary categories such as hypothesis, pretence, fantasy, joking, etc., can also display this characteristic, effectively refuting Ohmann’s arguments. Pratt herself tries to analyse literary language in terms of its violation and flouting of Austin’s and Searle’s felicity conditions and Grice’s (1975) conversational maxims.

Increasingly, it has been noticed that many of the criteria set out exclusively for literary language can be applied to ordinary language as well. Creativity/literariness is not exclusive to literary texts; it also exists outside of the literature realm. The poetic/ordinary language distinction, therefore, is brought into question and is acknowledged to be unhelpful (Carter 2004). Indeed, ‘creative’ aspects of language are referred to in Chomsky’s (1965) model as the language user’s ability to generate an infinite number of sentences from a finite set of rules. Chomsky’s ‘invisible’ creativity, therefore, stands in stark contrast with the more ‘visible’ and highly valued literary creativity.

Interest in creativity in everyday, non-literary texts grew stronger in the latter half of the twentieth century and the beginning of the twenty-first century, particularly since the birth of electronic corpora. Findings from corpus creativity studies have further validated earlier suggestions: that is, many of the criteria set out exclusively for literary language are found to be applicable to ordinary language as well. It is argued that creativity is neither trivial, all-inclusive (as in Chomsky’s theory) nor literature-exclusive (as in earlier models of literary language); creativity is inherent in everyday speech but is still special in certain ways. As Carter and McCarthy (2004) put it, creativity is ‘not simply a capacity of special people but a special capacity of all people’ (p. 83). Carter (2004) proposes a theory of creativity in all common talk with two components, including pattern-reforming, i.e. creativity by displacement of fixedness, reforming and reshaping patterns of language, and pattern-forming, i.e. creativity via conformity to language rules rather than breaking them, creating convergence, symmetry and greater mutuality between interlocutors.

In summary, theories of creativity are far from exhaustive and even further from being unanimous. Due to creativity’s complicated nature and intertwining relationships with various surrounding entities, researchers often wonder ‘if we’ll ever reach a consensus about creativity’ (Sawyer 2006: 20). Fortunately, as Jakobson (1960) remarks, in scholarly discussions ‘disagreement generally proves to be more productive than agreement’ (p. 350). This inconclusiveness and diversity should, therefore, be seen as presenting opportunities instead of problems. More research in the field is expected if the big picture is to be completed.

2. Corpora and creativity: contradiction or continuity?

Defined as ‘a large and principled collection of natural texts’ (Biber et al. 1998: 12), ‘to represent, as far as possible, a language or language variety as a source of data for linguistic research’ (Sinclair 2004 – emphasis added), a corpus is characterised by its truthful representation of naturally occurring discourse as it is produced in everyday life activities. It entails a generally accepted fact that at the heart of corpus research lie patterns of language and recurrences of linguistic items. Hoey (2005) admits:

corpus linguists … have typically seen their goal as the uncovering of recurrent patterns in the language … with probability of co-occurrence … with fluency in language rather than creativity.

(Hoey 2005: 152)

This bias towards patterns in corpus linguistics can be attributed to the popular belief that uniqueness ‘cannot be observed with certainty in a corpus, because uniqueness in a corpus does not entail uniqueness in a language’ (Sinclair 2004). Since creativity is essentially characterised by ‘newness’ and ‘unexpectedness’ or, in other words, ‘uniqueness’, it appears on the surface that corpora and creativity stand in contradiction to each other and cannot work together.

However, if we look deeper into the developments of creativity studies in the last five decades or so, the relationship between creativity and corpora begins to reveal itself more fully. First of all, it is important to bear in mind that not all aspects of creativity involve newness or uniqueness. Repetitions or figures of speech, for instance, are established phenomena in language and can be analysed using corpus data and techniques. Other aspects, which are indeed characterised by newness and uniqueness, such as new word formations or novel idiomatic exploitations, can only acquire their statuses of being new and creative if a comparative background can be established for newness and creativity to be measured against:

individual texts can be explained only against a background of what is normal and expected in general language use, and this is precisely the comparative information that quantitative corpus data can provide. An understanding of the background of the usual and everyday – what happens millions of times – is necessary in order to understand the unique.

(Stubbs 2005: 5, emphasis added)

Although some researchers might choose to use intuition or the existing set of prescriptive rules of English as their baselines against which new constructions are compared and analysed, others prefer to use the evidence of language norms generated by large corpora analyses. The advantages of this latter approach stem from the fact that many creative uses of language do not necessarily violate any rules of English – they are perfectly grammatical in every sense. The creative nature of such uses might reside in an aspect (e.g. semantic priming, semantic prosody, collocation or colligation) that is generally neither available in grammar books nor assessable by intuitions, particularly of non-native speakers, but can only be detected via concordances and/or other corpus analyses. That is not to mention the fact that the English language is changing rapidly in parallel with social changes. Such items as netizen (citizen of the internet community) or TTFN (ta-ta for now, meaning ‘goodbye’) probably will not appear immediately in any dictionary or English grammar book, but a corpus of electronic communication messages will already be updated with these new terms. Corpora, if carefully compiled (or properly chosen), will be far more representative of the norms of contemporary English than any existing set of prescriptive rules, providing much more accurate backgrounds for analyses of creativity/uniqueness.

It is also important to remember that corpora are invaluable sources of data for our quest into linguistic creativity in spoken genres, an achievement that would otherwise be impossible due to the lack of material. Spoken language, which is an unplanned, unscripted activity during which speakers create ‘a lot of the performance on the fly’ (Sawyer 2006: 16), is ephemeral in nature; there would have been no product that remains for analysis afterwards had it not been for modern recording technology. Indeed, the linguistic community has continuously been recording spoken language in different contexts and genres, compiling both general reference and specialist spoken corpora with the hope that they will help shed more light on this under-exploited part of discourse. Multimodal corpora with video recordings of language events are also underway and offer more insights into non-verbal communication (see Thompson, this volume).

Admittedly, corpora do not have all the answers to creativity. Socio-cultural or cognitive information about creativity cannot usually be directly elicited from corpora or by corpus techniques. Hence, such aspects as the functions of creative language, its effects on the reader/listener, or the cognitive processes involved in the production and comprehension of creative language, need relevant experiments, possibly interdisciplinary research designs, to be effectively resolved. Corpora and corpus analyses to date confine themselves to being large electronic databases which offer invaluable statistical information about co-occurrences, trends, tendencies, frequency and distributions with accompanying software to allow a large number of searches and analyses to be carried out faster than any other manual method. However, the speed and the level of sophistication with which corpus annotation is evolving today mean that more layers of social and cultural information are being added, which will arguably bridge the gaps in corpus linguistic creativity studies in the near future.

3. What can corpora reveal about creativity in discourse?

Since the emergence of electronic corpora, pieces of the puzzle of creativity have continuously been added to the big picture. This section reviews prominent developments in corpus-based and corpus-driven creativity studies, illustrating the kinds of information corpora have been able to provide us with about creativity.

Creativity through departure from patterns

While searching their corpora for patterns, many linguists notice a common trend in which some alterations to ‘prepatterns’ are made, hence the term ‘pattern-reforming’ creativity suggested by Carter (2004). This category of creativity is essentially seen as the result of the ‘flouting of expectations of conventional regularities in language but depends on an intimate familiarity with those conventions’ (Prodromou 2007: 17). Aspects of this type of creativity, including the coinage of new words and novel expressions, creative collocations, creative idiomaticity and punning, have gradually been unravelled using information offered by corpora of different types.

Novel word formation

‘Real creativity’, as Lamb (1998: 205) contends, is when we ‘invent new lexemes for new or old concepts; when we build a new concept, especially one that integrates ideas in our conceptual systems that have not been previously connected’. What he highlights (i.e. novel lexical formations) are indeed phenomena easily identified in texts due to their ‘newness,’ and innovativeness, diachronically speaking.

Searching a corpus of English electronic communications (SMS, e-magazine or web pages), Rua (2007) identifies the prominent trends in e-communication as full of creative lexical manipulations executed via blending, compounding or affixation. Rua, however, emphasises that instances of word formations with new meanings created (e.g. the word friennaissance for ‘friendship’ + ‘renaissance’ from the TV show Friends) are not frequent. In many cases, creative forms are produced for convenience of communication via letter reduction, clipping, initialising, phonetic respelling or using number and letter homophones. (See also Munat 2007 for similar analyses but with data from her corpus of children’s literature and science fiction narratives.) The following text message illustrates ‘reduction techniques’ quite clearly:

Extract 1

Happy new yr 2 u. Yeh I had a gd nyte … R we out 2moro nyte then? If so wat sort of time, was thinkin id get bus wime mum at about 7:20, will be in town 4about 7:45 x

[Meaning: Happy new year to you. Yeah I had a good night … Are we out tomorrow night then? If so what sort of time, was thinking I’d get bus with my mum at about 7:20, will be in town for about 7:45]

With similar interest in creative lexical formations, Renouf (2007), however, places the emphasis on the relationship between these creative forms and the historical contexts, thereby suggesting possible links between social events and developments in linguistic creativity. She draws primarily from a large newspaper corpus of over 700 million words, and examines lexical creativity in a diachronic manner between 1989 and 2005. The study makes use of various filters as well as regular concordancing and word-processing software to detect novel words, word formations and productive inflections. It was found that creative formations are fertile in the corpus. Significant collocates of target words and frequency charts are used to illustrate ‘low frequency’ and the ‘peak’ period as well as the popular trends in word combinations over different periods of time. Political and social events during these periods are examined to find underlying influences on these terms and their collocations.
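The general logic of diachronic detection, in which later texts are filtered against an earlier baseline and the surviving forms are then charted by year, can be illustrated with a minimal sketch. It is not a reconstruction of Renouf's filters or software: the function names, the choice of baseline year and the three toy 'newspaper' sentences are all invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

def candidate_neologisms(texts_by_year, baseline_years):
    """Return {word: (first_year, {year: frequency})} for every word
    that never occurs in the baseline years (hypothetical helper)."""
    known = set()
    for year in baseline_years:
        for text in texts_by_year[year]:
            known.update(tokenize(text))

    first_seen, freq = {}, {}
    for year in sorted(texts_by_year):
        if year in baseline_years:
            continue
        counts = Counter(
            tok for text in texts_by_year[year] for tok in tokenize(text)
        )
        for word, n in counts.items():
            if word in known:
                continue
            first_seen.setdefault(word, year)      # year of first attestation
            freq.setdefault(word, {})[year] = n    # yearly frequency profile
    return {w: (first_seen[w], freq[w]) for w in first_seen}

# Invented toy data standing in for a dated newspaper corpus.
toy = {
    1989: ["The citizen wrote to the paper about the election."],
    1995: ["Every netizen wrote about the election."],
    1996: ["A netizen is a citizen of the internet; netizen numbers grew."],
}
result = candidate_neologisms(toy, baseline_years={1989})
print(result["netizen"])
```

With so small a baseline, ordinary words such as every are flagged alongside netizen, which suggests why a study of this kind needs a very large reference period and additional filters before any form can be treated as new.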

Creative collocations

Linguistic creativity on the basis of the violation of semantic prosodies has been one of the prominent topics of interest among linguists. The concept of ‘semantic prosody’, first used by Louw (1993), is to some extent comparable with Hoey’s concept of lexical priming, i.e. the process whereby:

a word … becomes cumulatively loaded with the contexts and co-texts in which it is encountered, and our knowledge of it includes the fact that it co-occurs with certain other words in certain kinds of context

(Hoey 2005: 8)

This priming effect, however, as Hoey emphasises, is a matter of weighting, not a matter of requirement. As a result, creativity is still possible through resistance to rules of priming by selective overriding of the primings (see also Hoey 2007). For example, the habitual collocates of break out, as found in the British National Corpus (BNC) of more than 100 million words, include unpleasant things and events, a tendency called ‘bad, unpleasant or negative’ semantic prosodies (Louw 1993: 160; Stubbs 1995: 246). Such co-occurrences dictate that any ‘pleasant’ collocates of this phrasal verb are to be considered departures from recurrent patterns (Sinclair 1991). As a result, when freedom (a desirable state of affairs) is coupled with break out as in ‘freedom was breaking out everywhere’ (BNC), the sentence is considered creative, unusual and intended to emphasise and draw attention to the statement. Looking further into this type of creativity, Hori (2004) identifies eight categories of creative collocations in the Dickens corpus (a 4.5-million-word corpus of Charles Dickens’ work searchable online), including metaphorical, transferred, oxymoronic, disparate, unconventional, modified idiomatic, parodied and relexicalised. More importantly, the coinage of these unusual, unfamiliar collocations shows an important aspect of the author’s literary creativity (Hori 2004: 113).
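A rough sketch of how such a departure from prosody might be flagged automatically is given below. It assumes a hand-made list of 'pleasant' words (POSITIVE) and a fixed window of three tokens either side of the verb; both are illustrative assumptions rather than an established method, and apart from the BNC example quoted above the sentences are invented.

```python
# Inflected forms of the verb in the phrasal verb 'break out'.
BREAK_FORMS = {"break", "breaks", "broke", "broken", "breaking"}

# Hypothetical mini-lexicon of 'pleasant' words; a real study would derive
# expected collocates from corpus frequencies rather than from a list.
POSITIVE = {"freedom", "peace", "joy"}

def flag_prosody_clashes(sentences, span=3):
    """Return sentences in which a form of 'break out' has a word from
    POSITIVE within `span` tokens to its left or right."""
    flagged = []
    for sent in sentences:
        toks = [t.strip(".,;!?").lower() for t in sent.split()]
        for i in range(len(toks) - 1):
            if toks[i] in BREAK_FORMS and toks[i + 1] == "out":
                window = toks[max(0, i - span):i] + toks[i + 2:i + 2 + span]
                if any(w in POSITIVE for w in window):
                    flagged.append(sent)
                    break
    return flagged

sentences = [
    "War broke out in the region.",          # invented
    "A fire broke out overnight.",           # invented
    "Fighting broke out again.",             # invented
    "freedom was breaking out everywhere",   # BNC example cited in the text
]
flagged = flag_prosody_clashes(sentences)
print(flagged)
```

The sketch only marks candidates for inspection; whether a flagged collocation is actually creative remains a matter for qualitative reading of the concordance line.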

Creative idiomaticity

Creative idiomatic expressions, as a subcategory of creative collocation, have, with the availability of corpora, grown substantially into a separate area of research. Fixed expressions such as proverbs, sayings and idioms, contrary to the traditional views as being fixed, have been found to allow variability to different extents. Fernando (1996), consulting the Birmingham Collection of English Texts and her own corpus of newspaper articles, literary, academic and personal correspondence and conversations, identifies four ways idioms can be manipulated, namely replacement/substitution, additions, permutations (i.e. rearrangements/conversion) and deletions. The following variants of the idiom open a can of worms found in BNC are perfect examples of these four categories:

Substitution: advertising? Ah, well now that’s a different erm barrel of worms isn’t it really?

Addition: would open up an entire can of constitutional worms that the monarch could do without

Deletion: a can of worms or a heap of possibilities

Rearrangement: in total ignorance of the can of worms she was about to open

Partington (1998) notices a special type of idiomatic creativity in which the fixed expressions are not formally altered but ambiguity is still possible through ‘the process of replacing or coupling an idiomatic sense with a concrete one’ (Partington 1998: 134). The process is called ‘demetaphorisation’, as illustrated in this extract from a conversation between two friends where kicking can be understood both metaphorically (meaning ‘well’) and literally:

Extract 2

A: Are you alright Simon?

B: Yeah, you know, still alive and kicking. Just trying to decide whose ass to kick next

Similarly, Moon (1998), with the help of the Oxford Hector Pilot Corpus and the Bank of English Corpus, also comes up with a classification of idiomatic creativity that more or less includes the same categories as Fernando’s and Partington’s. In addition to the valuable evidence she found of one-off creative language uses (which she calls ‘exploitations’), Moon also discovers a vast number of systematic variations that certain idioms can have due to their inherent open nature, i.e. including open slots or having the same ‘schemas’.

The important message conveyed is that there is ‘massive evidence of the instability of the forms of fixed expressions and idioms’ (Moon 1998: 309) as revealed by different corpus studies. Moon, as a result, argues for more corpus studies to classify and correlate the different kinds of lexical and syntactic variations of idioms. So significant and ubiquitous is idiomatic creativity that it is suggested that it be incorporated into classroom activities to raise students’ awareness and appreciation of this phenomenon. For example, Stewart (2005) suggests using a list of creative newspaper headlines which contain aspects of idiomaticity or culture-specific references not readily accessible to non-native speakers. Students are then required to use the BNC to work out the wordplay based on any departure from usual patterns shown in concordances. (See also Cook 2000; O’Keeffe et al. 2007.)

Punning

While the preceding discussions focus on the formal, structural and semantic play of words and phrases, this subsection focuses on the phonetic and phonological pole. Puns through rhyme and assonance such as I say these are magic, they leave no stern untoned (from the TV show How To Look Good Naked) or No turn unstoned for pens that are mightier than the sword (headline from the Yorkshire Post 11 June 2007) are found to constitute an important category of creative manipulations of fixed expressions and idioms. Studying a corpus of cartoon language from the work of Australian cartoonist Cathy Wilcox, Kuiper (2007) identifies two types of ‘phonological deformation’, including exchanges (in which the ‘exchange source’ is the original phrase) and addition/substitution.

Admittedly, studies of the creative manipulations of form, structure and meaning have prevailed in the literature over creative sounds and rhythms, partly due to the fact that corpora of spoken language have been less available and more difficult to compile. However, with the rapid development of spoken corpora it is expected that creativity at the phonetic and phonological levels will catch up with its formal and semantic counterparts (for more on available spoken corpora, see Lee, this volume).

Creativity through patterning

Repetition and prepatterning as characteristics of the poetics of talk are observed by Tannen (1989) to be as pervasive in casual conversations as in public speeches or poetry/drama discourse with the prevailing functions of creating meaning, coherence in discourse and interpersonal involvement in the linguistic events. The uses of repetitions, paraphrases and echoes (or shadows) to draw attention and serve special purposes in conversations as such are indeed considered another aspect of linguistic creativity. They are essentially different from the type of purposeless ‘negative repetition’ as a result of limited linguistic competency that Tannen (1989: 53) mentions. Carter (2004) calls this type of creativity pattern-forming creativity, which is found to be ubiquitous in the five-million-word Cambridge and Nottingham Corpus of Discourse in English (CANCODE). Pattern-forming creativity is underscored as an important aspect of spoken discourse in comparison to written discourse, precisely because of the interactional nature of conversations in which interlocutors co-produce the messages and because of the need to create convergence while taking part in conversations as well (Carter 2004: 111). Repetition and patterns, as a result, are not an impediment to creativity; they should be seen as ‘a limitless resource for individual creativity and interpersonal involvement’ (Tannen 1989: 97).

Corpora and creativity, as we can see, are not so contradictory after all. It can be seen that corpora and corpus techniques have proven useful in both literary studies (prose style, authorial styles, stylistics teaching) and literariness in non-literary everyday discourse as well (see chapters by Amador-Moreno and McIntyre and Walker, this volume). It is believed that corpora are a significant way forward for creativity studies.

4. Spoken and written creativity

Despite the strong interest in creativity in spoken discourse since the latter half of the twentieth century, research into creativity in written discourse, e.g. poems, prose, printed advertisements, book titles, newspaper articles and columns, graffiti and the like, has admittedly advanced much further (Carter 2004). The unsettling nature of spoken language, the unavailability of necessary equipment and the labour-intensity of the whole process of recording and transcribing data have delayed studies in spoken discourse in general (McCarthy 1998) and spoken creativity in particular. The lack of research into spoken creativity inevitably makes it difficult to draw any concrete conclusion about the similarities and differences between the two modes. To make matters worse, there has not been much comparative research into spoken and written creativity either. Existing literature on the subject consists of a large number of studies whose data are a mixture of written and spoken, literary and non-literary genres treated as part of a common linguistic heritage without any discrimination (e.g. Tannen 1989; Fernando 1996; Moon 1998). A small number of studies, although separating data in writing and speech, do not provide comparative analyses, so their findings cannot be guaranteed to be exclusive to either mode. Claims about the similarities and/or differences between written and spoken creativity, therefore, can only be tentative and urgently require further investigations.

By and large, it is generally acknowledged that creativity is more likely to be found in one mode than the other (Partington 1998: 122). To be specific, many spoken genres such as daily conversations, it is argued, trail in the creativity department in comparison to written genres because of the practical limitations of both the online processing time to interpret novel collocations and the cognitive effort required to produce them. Such written genres as poetry, humorous writings, advertising, newspaper headlines are often found to be particularly rich in language play. Some authors, however, argue for the ubiquity of creativity in common talk (Tannen 1989; Cook 2000). It has been observed that not only the frequencies but also the manifestations of creativity in speech and writing show significant overlaps as well. Indeed, a lot of the features traditionally considered written literature specific (new word formations, figures of speech, creative collocations and idiomaticity, etc.) are frequently encountered in spoken genres. The boundary between literary creativity and everyday spoken creativity is thus very blurred, and it is suggested that it is best conceived as a gradation instead of an absolute division (Carter 2004: 81). Carter also maintains that contextual elements such as nature of transaction and relationship between speakers have the greatest impact on the degree of creativity. Conversations of an informal and intimate nature, for instance, are more prone to creativity, while more formal transactional exchanges are less so. In other words, the degree of creativity in different texts varies according to contextual aspects of discourse instead of mode of communication and should be viewed as such.

Data used in creativity studies in speech and writing can range from small, self-collected and self-annotated, specialised corpora to large commercial general reference corpora, depending on the purposes and scale of each study rather than on the mode of discourse. However, through the limited availability of large commercial spoken corpora, the majority of studies into spoken creativity have shown a tendency to consult small, self-collected corpora, whereas those exploring written creativity have had a wider range of choices and have taken more advantage of various large corpora of millions of written words. In terms of methodology, creativity studies in both speech and writing do not differ significantly, following one of the two main approaches, i.e. corpus-based or corpus-driven. In the former approach, i.e. corpus-based investigations, corpora are used as sources of empirical data (linguistic, socio-cultural, textual) against which intuitions about creativity are tested or preliminary findings from smaller data sets are validated. In the latter, i.e. corpus-driven research, corpora themselves are the data from which creative language uses are uncovered. At the time of writing, the identification and systematic extraction of instances of linguistic creativity in both spoken and written corpora has proved to be the can of worms in the corpus linguistics–creativity nexus. As Carter (2004) admits, purely quantitative methods of retrieving creative language are not yet available as a result of the limitations of software development. Essentially, the lexical and syntax-based search functions available in corpora software and packages to date help us find only the exact phrases/words we want to search, not their creative variants. Most searches for variants are therefore qualitative in such a manner that ‘the corpus is “read” like a transcribed, living soap opera’ (Carter 2004: 150) or chanced upon as a matter of good fortune (Moon 1998: 51). Others try to overcome this by carrying out repeated searches of a combination of key words, cross-examination of concordances of each individual content word, or searches of syntactic frames using wildcards (Cignoni and Coffey 1998; Francis 1993; Philip 2000). However, the large number of searches to be carried out and the hit-or-miss nature of these searches render the whole process laborious, time-consuming and not always sufficiently reliable.
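To make the idea of a wildcard frame concrete, the sketch below searches for the frame 'X of (up to two words) worms' in the four BNC variants of open a can of worms quoted earlier in this chapter. The regular expression and the function name are illustrative assumptions, not the queries used in the studies cited.

```python
import re

# Frame: any word, then 'of', then up to two optional inserted words, then 'worms'.
FRAME = re.compile(r"\b(\w+)\s+of\s+((?:\w+\s+){0,2})worms\b", re.IGNORECASE)

def find_variants(lines):
    """Return (container word, inserted words, matched string) for each hit."""
    hits = []
    for line in lines:
        for m in FRAME.finditer(line):
            hits.append((m.group(1).lower(), m.group(2).split(), m.group(0)))
    return hits

# The four variants quoted in the section on creative idiomaticity.
lines = [
    "advertising? Ah, well now that's a different erm barrel of worms isn't it really?",
    "would open up an entire can of constitutional worms that the monarch could do without",
    "a can of worms or a heap of possibilities",
    "in total ignorance of the can of worms she was about to open",
]
variants = find_variants(lines)
for container, inserted, matched in variants:
    print(container, inserted, matched)
```

The frame retrieves the substitution (barrel) and the addition (constitutional), but only because worms is held constant; a variant that replaced worms itself would be missed, which illustrates the hit-or-miss character of such searches.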

5. What else is needed of a corpus to facilitate creativity studies?

One of the biggest problems with identifying variants of fixed expressions and idioms in corpora is that we often do not even know what we are looking for. Based on findings so far made available about the nature of creative language and its relationships with surrounding contexts and co-texts, some suggestions are made in this section in the hope that the extraction of linguistic creativity from corpora will be less painstaking, more straightforward and more productive.

Humour, creativity and corpora

First of all, humour, wordplay and creativity are found to be closely related, to the extent that wit and humour are the most frequent functions of creativity in spoken language (Chiaro 1992; Carter 2004). As a result, during the process of tagging a corpus, it is important that laughter (in a spoken corpus) or humour (in a written corpus) is carefully coded in a systematic manner to facilitate searches for creative language. For example, using WordSmith Tools (Scott 2004) to concordance ‘laughs’ in a small corpus of four tourist commentaries recorded on four different London tour buses, twenty-six concordances were retrieved. A sample is presented in Figure 22.1.

Figure 22.1 Two lines from a concordance search of laughs.

The phrases London burger market and Viagra memorial monument are instances of creative collocations due to their unusual combination (neither one of these can be found in the BNC of 100 million words) and humorous effects. Admittedly, in many cases a creative language use is not readily spotted within the span of concordances around ‘laughs’, but if we go to the source to consider the extended co-texts, creative language can be found:

Extract 3

George III, Mad King George we call him … They thought he was mad because they found him in his garden, dressed in his night shirt at five o’clock in the mor-eve-morning talking to a tree. That’s no reason to call a man mad. After all, we have a prime minister who keeps talking to a bush and nobody seems to think he’s crazy < laughs >

Extract 4

Mick Jagger came here but he was confused, he didn’t want LSE at all. Turned out he wanted an LSD. So he left here and went off to join the Rolling Stones < laughs >

In the first extract, there is a pun played by hyponyms (tree–bush) and homonyms bush–Bush (plantation vs proper name, US president) in the political context of UK former Prime Minister Tony Blair’s conversations with US former President Bush. Similarly, the pun in the second extract is based on the phonological similarity between LSE (the initials for London School of Economics) and LSD (Lysergic Acid Diethylamide, a hallucinogenic drug) in the context of rock-singer Mick Jagger’s alleged association with this substance at the time.

As this brief illustration shows, careful tagging of humour in corpora provides useful pointers to creativity in texts.
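The search procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reconstruction of WordSmith Tools: the function name `concordance`, the `width` parameter and the assumed tag format (`<laughs>`, optionally padded with spaces) are all hypothetical. It shows why a narrow concordance span may miss the creative item while the extended co-text reveals it.

```python
import re

# Assumed tag format: laughter transcribed as "<laughs>", optionally padded with spaces.
LAUGH_TAG = re.compile(r"<\s*laughs\s*>")

def concordance(text, pattern, width=60):
    """Return (left co-text, match, right co-text) for every match of pattern."""
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].strip()
        right = text[m.end():m.end() + width].strip()
        hits.append((left, m.group(), right))
    return hits

# Extract 4 from the tour-bus data, used here as a one-text corpus.
commentary = (
    "Mick Jagger came here but he was confused, he didn't want LSE at all. "
    "Turned out he wanted an LSD. So he left here and went off to join "
    "the Rolling Stones < laughs >"
)

# A narrow span shows only the end of the utterance ...
narrow = concordance(commentary, LAUGH_TAG, width=30)
# ... while a wider span brings the LSE/LSD pun into view.
wide = concordance(commentary, LAUGH_TAG, width=200)

for left, tag, right in narrow + wide:
    print(f"{left} {tag} {right}".strip())
```

In practice the span would be measured in words or utterances rather than characters; characters are used here only to keep the sketch short.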

Semantic annotation and creative idiomaticity

Various studies in corpus linguistics, supported by other research in the field of cognitive semantics, suggest that the fixedness of idioms is actually conceptual instead of lexical (Moon 1998; Langacker 1987). For example, alongside the canonical form eat humble pie, the following variants were found in the BNC among the concordances for the phrase humble pie:

were swallowing large slices of humble pie after the

reformed for ever now began to chew humble pie and were drawn to

Yes, I tasted the sourness of humble pie </p><p> ‘So do you

He found the taste of humble pie just a little too much to stomach

Although the actual word eat was replaced in these examples, the concept is still there, albeit with additional connotations contributed by each substitute: swallow, chew, taste, taste the sourness of, or stomach. It is suggested that, first of all, corpora need to be semantically annotated, with words tagged into semantic categories on the basis of their senses being related to each other at some level, including synonyms, antonyms, hypernyms and hyponyms (the same principles are used in the electronic lexical database WordNet; see Fellbaum 1998). In effect, such a task (i.e. semantic annotation of texts) has been made possible by corpus tools such as the UCREL Semantic Analysis System (USAS), developed at Lancaster University. The software offers automatic semantic annotation of English texts whereby each content word in the text is assigned a value within twenty-one primary semantic fields, which are then further subdivided into 232 categories (see Wilson and Thomas 1997; Rayson et al. 2004). F1, for example, is the category of FOOD. The level of sophistication of these categories might still need further evaluation, but the principles can be applied to any corpus so that each word can be tagged with semantic as well as grammatical information.

The second, and important, step therefore is to develop more sophisticated corpus tools/software which allow all members of selected semantic categories to be included in the query. In the case of swallow/chew/taste/stomach humble pie above, for instance, if all the synonyms of ‘eat’ could be considered and incorporated into the concordances, the probability of identifying creative variants of the idiom would significantly increase while easing the laborious process of performing repeated individual searches for each entry.
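A category-based query of this kind might look roughly as follows. The sketch is hypothetical throughout: `SEMANTIC_LEXICON` is a hand-made stand-in for real semantic annotation (assigning these verb forms to F1 is an assumption for illustration, not actual USAS output), and `find_variants` and its `window` parameter are invented names. The idea is simply that one query retrieves any word from the chosen category occurring shortly before the fixed part of the idiom.

```python
import re

# Hypothetical miniature lexicon: word form -> semantic tag.
# "F1" is the FOOD category; these assignments are assumed for illustration only.
SEMANTIC_LEXICON = {
    "eat": "F1", "eating": "F1", "ate": "F1",
    "swallow": "F1", "swallowing": "F1",
    "chew": "F1", "taste": "F1", "tasted": "F1",
}

# The four BNC concordance lines for "humble pie" quoted above.
BNC_LINES = [
    "were swallowing large slices of humble pie after the",
    "reformed for ever now began to chew humble pie and were drawn to",
    "Yes, I tasted the sourness of humble pie",
    "He found the taste of humble pie just a little too much to stomach",
]

def find_variants(lines, category, anchor, lexicon, window=6):
    """Return (category word, line) pairs where a word tagged `category`
    occurs within `window` tokens before the fixed phrase `anchor`."""
    anchor_tokens = anchor.lower().split()
    results = []
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i in range(len(tokens) - len(anchor_tokens) + 1):
            if tokens[i:i + len(anchor_tokens)] == anchor_tokens:
                start = max(0, i - window)
                hits = [t for t in tokens[start:i] if lexicon.get(t) == category]
                if hits:
                    # Keep the category word closest to the anchor.
                    results.append((hits[-1], line))
                break
    return results

for word, line in find_variants(BNC_LINES, "F1", "humble pie", SEMANTIC_LEXICON):
    print(f"{word:>12} | {line}")
```

A single call thus replaces the repeated individual searches for swallow, chew and taste, although the quality of the results depends entirely on the coverage and accuracy of the underlying semantic tagging.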

Punning and corpora

Instead of semantic grouping, we need a phonological annotation program to cater to such puns as:

Gossip girls live to diss and tell (advertisement for the E4 show Gossip Girls)

Hollywood veteran kiss and sell (headline in the Observer, 31 August 2008)

By the same principles as semantic annotation, with these puns there is a need for phonologically related items such as diss, sell to be included in the search results of the phrase kiss and tell. For such a search function to be at all possible, corpora first of all need to be annotated with phonemic information, so that homophones, rhymes, assonance, alliteration, consonance, etc., could be grouped into a coherent system. For instance, such words as day, pain, whey, rein will be grouped together under the [eɪ] sound, and so on, using the IPA for English as the tagset. In effect, automatic phonemic annotation of words has been attempted using different grapheme-to-phoneme conversion techniques (see Bosch and Daelemans 1993; Divay and Vitale 1997; Auran et al. 2004). As a result, the task of tagging corpora with phonemic information is feasible. The task that actually requires attention is the construction of a search engine which allows all members of a category of similar phonemic quality to be included in the result list, e.g. in the case of kiss/diss and tell/sell, assonance.
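A phonemically informed search of this kind could be sketched as below. Everything here is an assumption made for illustration: the tiny pronunciation table `PRON` is hand-written rather than produced by a grapheme-to-phoneme system, only monosyllabic words are handled, and the function names are hypothetical. Words are grouped either by vowel nucleus (as with day, pain, whey, rein) or by nucleus plus coda, and a query phrase is expanded so that each word may be replaced by a sound-alike.

```python
import re

# Hypothetical hand-made pronunciation table (simplified IPA, monosyllables only).
PRON = {
    "kiss": ["k", "ɪ", "s"], "diss": ["d", "ɪ", "s"],
    "tell": ["t", "e", "l"], "sell": ["s", "e", "l"],
    "day": ["d", "eɪ"], "pain": ["p", "eɪ", "n"],
    "whey": ["w", "eɪ"], "rein": ["r", "eɪ", "n"],
}
VOWELS = {"ɪ", "e", "eɪ"}

def nucleus(word):
    """Vowel of a monosyllabic word."""
    return next(p for p in PRON[word] if p in VOWELS)

def rhyme_key(word):
    """Vowel plus any following consonants."""
    phones = PRON[word]
    start = next(i for i, p in enumerate(phones) if p in VOWELS)
    return tuple(phones[start:])

def same_nucleus(word):
    return {w for w in PRON if nucleus(w) == nucleus(word)}

def sound_alikes(word):
    return {w for w in PRON if rhyme_key(w) == rhyme_key(word)}

def phrase_pattern(phrase):
    """Build a regex in which every listed word may be replaced by a sound-alike."""
    parts = []
    for w in phrase.lower().split():
        alts = sorted(sound_alikes(w)) if w in PRON else [w]
        parts.append("(?:" + "|".join(re.escape(a) for a in alts) + ")")
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

texts = [
    "Gossip girls live to diss and tell",
    "Hollywood veteran kiss and sell",
]
pattern = phrase_pattern("kiss and tell")
for t in texts:
    m = pattern.search(t)
    print(t, "->", m.group() if m else None)
```

A single query for kiss and tell thus retrieves both puns, while `same_nucleus("day")` returns the looser grouping by vowel sound alone.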

Creative structures and corpora

The available lexical and syntax-based search functions in commercial corpora will not allow for such an inversion and insertion as in fairly thin ice on which to skate one’s credibility (BNC) to be identified if the search string were entered as be skating on thin ice. Since all the key words are still present, albeit in a different order from the original, it would be very helpful if the corpus search engine were amended to allow flexible structures built on key words (without word adjacency) to be retrievable. If this function could then be incorporated into the search engine with other functions (semantic and phonemic relatedness), the researcher would be equipped with better tools to extract creative language from corpora.
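One possible shape for such a key-word search, without any requirement of adjacency or order, is sketched below. It is an assumption-laden illustration rather than a description of any existing corpus tool: `crude_stem` is a deliberately rough placeholder for a proper lemmatiser, `keyword_match` and `max_span` are hypothetical names, and the choice of skating, thin and ice as the key words of the idiom is ours.

```python
import re

def crude_stem(token):
    """Rough suffix stripping so that 'skating' and 'skate' compare equal.
    A placeholder for a real lemmatiser."""
    for suffix in ("ing", "es", "ed", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def keyword_match(line, keywords, max_span=10):
    """True if all keywords (compared by stem) occur in `line`,
    in any order, within a window of `max_span` tokens."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z']+", line.lower())]
    targets = {crude_stem(k) for k in keywords}
    for i in range(len(tokens)):
        if targets <= set(tokens[i:i + max_span]):
            return True
    return False

# Key words taken from the canonical form "be skating on thin ice".
KEYWORDS = ["skating", "thin", "ice"]

variant = "fairly thin ice on which to skate ones credibility"
print(keyword_match(variant, KEYWORDS))             # variant is retrieved
print(keyword_match("the ice was thin", KEYWORDS))  # one key word missing
```

The `max_span` limit keeps the search from matching key words scattered across unrelated parts of a long text; a suitable value would have to be established empirically.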

In summary, corpora have played a huge part in creativity studies and have undeniably pushed the field forward. Different aspects of creativity have been unravelled using corpus data and analyses, strengthening our understanding of the subject matter. That creative language in the form of everyday metaphors, puns, riddles or verbal duelling, and the like, is ubiquitous in everyday conversations has led many authors to argue that creativity and literariness are not exclusive to literature. Comparative analyses, however, are needed to illuminate the written–spoken creativity relationship. Adjustments and additions to current corpus annotation systems and software are proposed to help increase the probability of finding creative language in corpora while reducing the time and effort put into the task. However, the suggestions are far from exhaustive; there will still be creative uses of language that escape even the most tightly woven net thanks to the limitless capacity for creativity of the human brain. Rather, the suggestions are part of a bigger picture, offering alternative approaches to certain areas of creativity and giving some directions for further research in the fascinating and fast-growing field of corpus creativity studies.

Further reading

Carter, R. A. (2004) Language and Creativity: The Art of Common Talk. London: Routledge. (This book provides a detailed description of creativity in spoken language with plenty of examples from CANCODE and presents a proposal for a new theory of everyday spoken creativity.)

Garside, R., Leech, G. and McEnery, T. (1997) (eds) Corpus Annotation: Linguistic Information from Computer Text Corpora. London: Longman. (This book gives a picture of the area of corpus annotation at different levels (semantics, syntactics, etc.) and of different techniques, software developments and applications.)

Moon, R. (1998) Fixed Expressions and Idioms in English: A Corpus-Based Approach. New York: Oxford University Press, Ch. 6 ‘Variation’. (The chapter offers a detailed analysis of idiomatic creativity using corpus data and proposes using ‘idiom schemas’ to explain systematic variability in fixed expressions and idioms.)

Munat, J. (2007) (ed.) Lexical Creativity, Texts and Contexts. Amsterdam and Philadelphia, PA: John Benjamins. (This volume includes a wide range of subtopics within the field of lexical creativity using corpora of different types and sizes.)

References

Auran, C., Bouzon, C. and Hirst, D. (2004) ‘The Aix-MARSEC Project: An Evolutive Database of Spoken British English’, in B. Bel and I. Marlien (eds) Proceedings of the Second International Conference on Speech Prosody. Nara, Japan, pp. 561–64.

Austin, J. L. (1962) How to Do Things with Words. Oxford: Oxford University Press.

Biber, D., Conrad, S. and Reppen, R. (1998) Corpus Linguistics: Investigating Language Structure and Use (Cambridge Approaches to Linguistics). Cambridge: Cambridge University Press.

Bosch, A. V. D. and Daelemans, W. (1993) ‘Data-oriented Methods for Grapheme-to-Phoneme Conversion’, Proceedings of the 6th Conference of the EACL, pp. 45–53.

Carter, R. A. (2004) Language and Creativity: The Art of Common Talk. London: Routledge.

Carter, R. A. and McCarthy, M. J. (2004) ‘Creating, Interacting: Creative Language, Dialogue and Social Context’, Applied Linguistics 25: 162–88.

Chiaro, D. (1992) The Language of Jokes: Analysing Verbal Play. London: Routledge.

Chomsky, N. (1965) Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Cignoni, L. and Coffey, S. (1998) ‘A Corpus-based Study of Italian Idiomatic Phrases: From Citation Forms to “Real-life” Occurrences’, Euralex98 Proceedings, pp. 291–300.

Cook, G. (1994) Discourse and Literature: The Interplay of Form and Mind. Oxford: Oxford University Press.

——(2000) Language Play, Language Learning. Oxford: Oxford University Press.

Divay, M. and Vitale, A. J. (1997) ‘Algorithms for Grapheme-phoneme Translation for English and French: Applications for Database Searches and Speech Synthesis’, Computational Linguistics 23(4): 495–523.

Fellbaum, C. (ed.) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Fernando, C. (1996) Idioms and Idiomaticity. Oxford: Oxford University Press.

Francis, G. (1993) ‘A Corpus-driven Approach to Grammar – Principles, Methods and Examples’, in M. Baker, G. Francis and E. Tognini Bonelli (eds) Text and Technology: In Honour of John Sinclair. Amsterdam: John Benjamins, pp. 137–54.

Grice, H. P. (1975) ‘Logic and Conversation’, in P. Cole and J. Morgan (eds) Syntax and Semantics, Volume 3. New York: Academic Press, pp. 41–58.

Hoey, M. (2005) Lexical Priming: A New Theory of Words and Language. London: Routledge.

——(2007) ‘Lexical Priming and Literary Creativity’, in M. Hoey, M. Mahlberg, M. Stubbs and W. Teubert, Text, Discourse and Corpora: Theory and Analysis. London: Continuum, pp. 7–29.

Hohenhaus, P. (2007) ‘How to Do (Even More) Things with Nonce Words (Other Than Naming)’, in J. Munat (ed.) Lexical Creativity, Texts and Contexts. Amsterdam and Philadelphia, PA: John Benjamins, pp. 15–38.

Hori, M. (2004) Investigating Dickens’ Style: A Collocational Analysis. London: Palgrave Macmillan.

Jakobson, R. (1960) ‘Closing Statement: Linguistics and Poetics’, in T. Sebeok (ed.) Style in Language. Cambridge, MA: MIT Press, pp. 350–77.

Jauss, H.-R. (1970) ‘Literary History as a Challenge to Literary Theory’, trans. Elizabeth Benzinger, New Literary History 2: 7–37.

Kuiper, K. (2007) ‘Cathy Wilcox Meets the Phrasal Lexicon: Creative Deformation of Phrasal Lexical Items for Humorous Effects’, in J. Munat (ed.) Lexical Creativity, Texts and Contexts. Amsterdam and Philadelphia, PA: John Benjamins, pp. 93–114.

Lamb, S. M. (1998) Pathways of the Brain: The Neurocognitive Basis of Language (Current Issues in Linguistic Theory 170). Amsterdam: John Benjamins.

Langacker, R. W. (1987) Foundations of Cognitive Grammar, Vol. 1. Stanford, CA: Stanford University Press.

Langlotz, A. (2006) Idiomatic Creativity: A Cognitive-linguistic Model of Idiom-representation and Idiom-variation in English. Philadelphia, PA: John Benjamins.

Louw, B. (1993) ‘Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of Semantic Prosodies’, in M. Baker, G. Francis and E. Tognini Bonelli (eds) Text and Technology: In Honour of John Sinclair. Philadelphia, PA and Amsterdam: John Benjamins, pp. 152–76.

McCarthy, M. (1998) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.

Moon, R. (1996) ‘Data, Description, and Idioms in Corpus Lexicography’, Euralex96 Proceedings, pp. 245–56.

——(1998) Fixed Expressions and Idioms in English: A Corpus-Based Approach. New York: Oxford University Press.

Munat, J. (2007) ‘Lexical Creativity as a Marker of Style in Science Fiction and Children’s Literature’, in J. Munat (ed.) Lexical Creativity, Texts and Contexts. Amsterdam and Philadelphia, PA: John Benjamins, pp. 163–87.

Ohmann, R. (1971) ‘Speech Acts and the Definition of Literature’, Philosophy and Rhetoric 4: 1–19.

O’Keeffe, A., McCarthy, M. J. and Carter, R. A. (2007) From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.

Partington, A. (1998) Patterns and Meanings. Philadelphia, PA: John Benjamins.

Philip, G. (2000) ‘An Idiomatic Theme and Variations’, in C. Heffer and H. Sauntson (eds) Words in Context: A Tribute to John Sinclair on His Retirement (ELR Monograph 18). Birmingham: University of Birmingham, pp. 221–33.

Pope, R. (2005) Creativity: Theory, History, Practice. London: Routledge.

Pratt, M. L. (1977) Toward a Speech Act Theory of Literary Discourse. Bloomington, IN: Indiana University Press.

Prodromou, L. (2007) ‘Bumping into Creative Idiomaticity’, English Today 89(3): 14–25.

Rayson, P., Archer, D., Piao, S. L. and McEnery, T. (2004) ‘The UCREL Semantic Analysis System’, Proceedings of the Workshop on Beyond Named Entity Recognition: Semantic Labelling for NLP Tasks, in Association with 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, pp. 7–12.

Renouf, A. (2007) ‘Tracing Lexical Productivity and Creativity in the British Media: “The Chavs and the Chav-Nots”’, in J. Munat (ed.) Lexical Creativity, Texts and Contexts. Amsterdam and Philadelphia, PA: John Benjamins, pp. 61–90.

Rice, P. and Waugh, P. (eds) (1989) Modern Literary Theory: A Reader. New York: Routledge.

Rua, L. (2007) ‘Keeping up with the Times: Lexical Creativity in Electronic Communication’, in J. Munat (ed.) Lexical Creativity, Texts and Contexts. Amsterdam and Philadelphia, PA: John Benjamins, pp. 137–62.

Sawyer, R. K. (2006) Explaining Creativity: The Science of Human Innovation. New York: Oxford University Press.

Scott, M. (2004) WordSmith Tools Version 4. Oxford: Oxford University Press.

Searle, J. R. (1969) Speech Acts. An Essay in the Philosophy of Language. Cambridge: Cambridge University Press.

Shklovsky, V. (1989 [1917]) ‘Art as Technique’, reprinted in P. Rice and P. Waugh (eds) Modern Literary Theory: A Reader. New York: Routledge.

Sinclair, J. M. (1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.

——(2004) ‘Developing Linguistic Corpora: A Guide to Good Practice Corpus and Text – Basic Principles’, Tuscan Word Centre, available at http://ahds.ac.uk/creating/guides/linguistic-corpora/chapter1.htm (accessed 17 July 2007).

Stewart, D. (2005) ‘Hidden Culture: Using the British National Corpus with Language Learners to Investigate Collocational Behaviour, Wordplay and Culture-Specific References’, in G. Barnbrook, P. Danielsson and M. Mahlberg (eds) Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. London: Continuum, pp. 83–95.

Stubbs, M. (1995) ‘Corpus Evidence for Norms of Lexical Collocation’, in G. Cook and B. Seidlhofer (eds) Principle and Practice in Applied Linguistics: Studies in Honour of H G Widdowson. Oxford: Oxford University Press, pp. 243–56.

——(2005) ‘Conrad in the Computer: Examples of Quantitative Stylistic Methods’, Language and Literature 14(1): 5–24.

Tannen, D. (1989) Talking Voices: Repetition, Dialogue, and Imagery in Conversational Discourse. New York: Cambridge University Press.

Wilson, A. and Thomas, J. (1997) ‘Semantic Annotation’, in R. Garside, G. Leech and T. McEnery (eds) Corpus Annotation: Linguistic Information from Computer Text Corpora. London: Longman, pp. 53–65.