Angela Chambers
As the developments described in the previous chapters have shown, the analysis of large collections of naturally occurring discourse, both written and spoken, came to play a central role in linguistic research in the twentieth century. The application of this research in language learning and teaching, however, was a slow process, which has been referred to by metaphors such as percolation (McEnery and Wilson 1997: 5) or trickle down (Leech 1997: 2). The fact that developments did take place is easy to explain when one considers the context of research and practice both in language learning and teaching and in applied linguistics. First, the communicative approach emphasised the use of authentic texts, although for several decades accompanying exercises tended to be based on invented examples, and such examples can still be found. Second, research by McCarthy (1998: 18) and others revealed that the language of course books continued to differ significantly from actual language use, particularly in relation to the spoken language. The production of corpus-based dictionaries and grammars as well as course books now gives learners access to actual language use, but it is nonetheless important to note that a data-driven learning (DDL) approach implies a level of active participation in the learning process which such resources cannot provide in the same way as learner (and teacher) interaction with the corpus itself.
DDL takes the developments listed above a step further, in that it not only uses corpus data in the preparation of language-learning materials, but gives learners access to more substantial amounts of corpus data than can be found in a dictionary, grammar or course book, either indirectly by allowing them to learn about language use by studying concordances prepared in advance by the teacher, or directly by allowing them access to corpora and concordancing software to carry out their own searches. The increasing availability of computers from the 1980s onwards and also of concordancers, either commercially or freely via the web, has made it possible for teachers to have access to – and even create – corpora, and for learners to study the patterns of language use in a corpus, mostly through observing concordances, and work out for themselves how a word or a phrase is used. This process of inductive learning, in which the learner plays an active part in the learning process, is the essence of DDL. It also corresponds closely to current thinking in educational research in general, and in language-learning pedagogy in particular, providing a way ‘for students to take more active, reflective and autonomous roles in their learning’ (Hyland 2002: 120). As researchers/teachers with an interest in the use of corpus data with language learners experimented with different ways of using the data with their learners, and with using different types of corpus (for example large reference corpora or small genre-specific corpora, monolingual or parallel corpora), a substantial body of research publications appeared reporting on the success of this new approach and the obstacles which the learners noted in their explorations of the corpus data.
This chapter will first provide a brief overview of the history of DDL from the late 1960s to the present, including information on the types of corpora used with learners and on the successes and problems reported by researchers in areas such as the learning of vocabulary, grammar and Languages for Specific Purposes. In addition to an account of the research carried out by those interested in using corpus data with learners, the views of those less convinced of its merits will be noted. This will be followed by an investigation of the type of data which can be obtained from written and spoken corpora in a DDL approach; only easily available corpora will be included. The relationship between DDL and second language acquisition theory will then be explored, asking questions such as: Are concordances comprehensible input? Does DDL facilitate the application of the idiom principle (Sinclair 1991: 110) in language learning? How important is frequency? Finally, the changes in language pedagogy which a DDL approach implies will be discussed, including the changing roles of teacher and learner, native-speaker intuition and the role of the non-native-speaker teacher, and variation and nonstandard language. In a brief look to the future, the potential for a wider adoption of a data-driven learning approach will be assessed.
The first publications disseminating information internationally on developments in the use of corpus data with language learners appeared in the 1980s, notably those by Tim Johns (1986, 1988, 1991), who coined the term DDL. Johns initially used the concordancing software MicroConcord as a tool for learners, although he also recognised its usefulness for the teacher and linguistic researcher (1986: 158). He introduced concordancing in his work teaching English for Specific Purposes to non-native speakers of English, and also situated it in the context of similar developments by a number of researchers, citing experimentation by his then colleague in Birmingham, Antoinette Renouf, and by Ahmad, Corbett and Rogers (1985) in Surrey (Johns 1986: 159). McEnery and Wilson (1997: 12) date the first attested use of concordances in language teaching as early as 1969, by Peter Roe at Aston University, Birmingham. It was not until the late 1980s and the early 1990s, however, that the practice was brought to the attention of the discourse community of researchers in applied linguistics by the work of Johns and also of Tribble and Jones (1990). Johns used the simile of the language learner as researcher (1986: 160) and the Sherlock Holmes metaphor (1997: 101) to highlight the more active role of the learner in this approach, and described the computer and the concordancer as a research tool for both learner and teacher (1986: 151). His learners considered working with a concordance printout to be a much more effective way of studying the use of common prepositions, finding an exercise such as underlining the headword colligating with the preposition on (‘depending on’, ‘on demand’) more helpful than a gap-filling exercise involving filling in the prepositions (1986: 160). The concordancer also served as a research tool for Johns himself in his role as teacher (1986: 159).
It is important that teachers themselves should have experience in using concordance output if they expect their students to make use of it. In my own case, examining output has often proved chastening: for example a concordance of ‘if’ showed how often in scientific and technical texts it is followed by the bare adjective or past participle e.g. ‘if available’, ‘if known’–a usage I found I had neglected in my materials on conditional constructions in English.
As we shall see, this early experimentation by Johns and others was to develop into a substantial research area from the 1990s onwards, although the researchers involved still recognise that the adoption of this new approach to language learning tends to be restricted to a limited number of researchers and enthusiasts (see, for example, Conrad 2000: 556).
Johns and others, notably Joseph Rézeau, developed websites which provided a bibliography of articles on this emerging area and examples of exercises. From 1990 onwards more sophisticated concordancers, such as WordSmith Tools (Scott 2004) and MonoConc (Barlow 2000), became available commercially at a relatively modest cost, and more recently free concordancers that can be downloaded from the web, such as AntConc, have also become available. In addition to these concordancers, which allow users to load and consult the corpus of their choice, very large corpora may be consulted online. The Cobuild Concordance and Collocations Sampler, for example, allows the user to enter a search word and have instant access to forty examples of its use. Similarly, an online search of the British National Corpus returns fifty occurrences. Thus a teacher or learner wanting examples of the phrasal verb ‘end up’ could quickly have access to the following examples from Cobuild.
If you drink with other people who regularly buy rounds for each other, it’s easy to end up drinking more than you want.
if you forget to spray it with simazine every March you end up with a lot of extra weeding.
It’s true you do get stared at in clubs, but you know, I am fat, I do live in the real world, and I don’t want to end up some kind of fat separatist.
was a very tough little man, a very hard little man who knew what he wanted, where he was going and where he was going to end up.
As a result, the child may end up in a distress-provoking, or even physically dangerous, situation.
Many politicians end up simply hating the press.
We’re gonna end up living in a broom cupboard.
the kids end up you know homeless and uneducated at sixteen.
Tony Galluci visited Italy for the first time and almost ended up in the army.
Those who have tried to be honest have ended up at the bottom of the ladder.
This has the benefit of providing teachers with examples of actual language use to support their teaching, although the easy access to the full text which a concordancer such as WordSmith Tools (Scott 2004) or MonoConc (Barlow 2000) provides is not possible here. The Business Letter Corpus (Someya 1999) also has the advantage of giving a teacher or learner access to concordance data with no training whatsoever, although here again, access to the full text is not available for copyright reasons. While the above examples involve only English, multilingual resources are also available which allow corpus consultation without the training necessary to use the commercially and freely available concordancers. The Lextutor website provides easy access to a variety of corpora in French and English, while the SACODEYL project allows the user to consult interviews with teenagers in a variety of European languages (see Thompson, this volume).
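The concordancers mentioned here are complete programs, but the key-word-in-context (KWIC) display at the heart of all of them can be illustrated in a few lines. The following is only a minimal sketch under stated assumptions, not the behaviour of WordSmith Tools, MonoConc or any other named tool: the function name `kwic`, the fixed 30-character context window and the sample sentences are illustrative choices. Each hit also records its character offset in the source text, which is the kind of information that lets a desktop concordancer take the user back to the full text.

```python
import re
from typing import List, Tuple


def kwic(text: str, node: str, width: int = 30) -> List[Tuple[int, str]]:
    """Build key-word-in-context (KWIC) lines for every occurrence of `node`.

    Hypothetical helper. Each result pairs the character offset of the hit
    in `text` (so the caller can go back to the full text) with a display
    line: up to `width` characters of left context, right-aligned, then the
    node, then up to `width` characters of right context.
    """
    # Whole-word, case-insensitive match; re.escape makes the node literal.
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    results = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        results.append((m.start(), f"{left:>{width}} {m.group(0)} {right}"))
    return results


if __name__ == "__main__":
    sample = ("Many politicians end up simply hating the press. "
              "We're gonna end up living in a broom cupboard.")
    for offset, line in kwic(sample, "end up"):
        print(offset, line)
```

Because the left context is padded to a fixed width, every node word lands in the same column, which is what allows the vertical reading of concordance lines.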
While these easily available and, more importantly, easy-to-use resources may well have the greatest potential for popularising the use of corpus data by language learners and teachers, the substantial body of publications reporting on the use of corpora by learners tends to concentrate on the use of commercial concordancers by either the teacher or the learners themselves, using either large corpora such as the British National Corpus (Bernardini 2000) or small corpora created specifically to meet the learners’ needs. These corpora were used to investigate a variety of aspects of language learning, particularly the acquisition of vocabulary, grammar and specialised language use. Stevens (1991), for example, created a small corpus based on his students’ physics textbook to investigate whether concordance-based exercises were better than gap-fill exercises for learning vocabulary, and concluded that they were. The small corpus of 180,000 words created by O’Sullivan (O’Sullivan and Chambers 2006: 53) included texts on the history of the French language recommended to the students. The results of this study revealed the strengths of the small specialised corpus, in that learners could often find multiple relevant examples. Small corpora also have their limits, however, in that they may contain few or no occurrences of certain items. While Cobb (1997: 304) used a corpus of just 10,000 words assembled from his students’ reading materials to investigate the use of corpus data for vocabulary learning, Gaskell and Cobb (2004: 316), who investigated the use of concordances to correct writing errors, concluded that a corpus of more than one million words would be more likely to produce a substantial number of occurrences of the words and expressions which the learners wished to study. In addition, these small corpora created specifically to meet learners’ needs, what Willis (1998: 46) calls pedagogic corpora, are often not publicly available.
The results of these and other empirical studies were largely positive. The quantitative studies (Stevens 1991; Cobb 1997; Gaskell and Cobb 2004; Yoon and Hirvela 2004) strongly suggest that learners benefit from corpus consultation in learning vocabulary and grammar, and in improving writing skills. A much larger number of qualitative studies support this view, with the additional benefit of giving the learners themselves a voice in the debate. The learners appreciated having access to a large number of genuine examples of the aspect of language use which they were studying (Cheng et al. 2003: 181; Yoon and Hirvela 2004: 275; Chambers 2005: 117). They also enjoyed the exploratory nature of the activity, which is what Johns had in mind with the well-known phrase ‘Every learner a Sherlock Holmes’ (Johns 1997: 101). As one learner commented on the activity of direct corpus consultation: ‘I discovered that achieving results from my concordance was a highly motivating and enriching experience. I’ve never encountered such an experience from a textbook’ (Chambers 2005: 120).
Negative reactions to corpus consultation by learners come from two sources. First, in the empirical studies even the learners whose reactions were generally very positive found the activity of analysing the corpus data time-consuming, laborious and tedious (Cheng et al. 2003: 182–3; Yoon and Hirvela 2004: 274; Chambers 2005: 120). Possible solutions to this include integrating the consultation of corpus data with the study of individual texts, giving learners access to simple resources such as the Cobuild Concordance and Collocations Sampler or the SACODEYL corpora, or using concordances prepared in advance by the teacher. One could envisage a gradual process over several years, where learners are first introduced to concordance printouts prepared by the teacher so that they become familiar with learning inductively from concordances, moving on to consultation of easy-to-use online resources, and then finally to independent corpus consultation analysing data from the corpus of their choice.
The second source of negative reactions involves the issue of context. For Widdowson (2000: 7), corpora contain ‘decontextualised language’. Charles (2007: 298) counters, however, that a corpus arguably contains more context than much classroom material, which consists of extracts, since a learner consulting a corpus of complete texts has access to the full text corresponding to each concordance line. In other publications reporting on the use of corpora with learners, the authors also stress that teachers preparing materials from concordances should ensure that the context is meaningful, that is, that it contains ‘somewhere in it some clue, however small, to assist students in placing the target word in that context’ (Stevens 1991: 51). In the following occurrences of ‘I am/I’m afraid’ in the Business Letter Corpus, for example, it is clear that the expression is used to apologise or to turn down a request, and also to fulfil other roles, such as threatening legal action.
Enclosed are some photos, which I am afraid did not come out very well.
much as I would like to help, I’m afraid I could not consider it.
I am afraid I will be unable to attend your party.
I’m afraid I’m not the appropriate person for you
I am afraid my correspondence has fallen behind.
I am afraid that we cannot help you, unfortunately
I am afraid we cannot agree to your request.
Under the circumstances, I’m afraid we can’t bear the cost of any repairs.
party is on the extravagant side, I’m afraid we have to limit tickets to staff only.
I’m afraid we will be forced to take legal action
Unfortunately, however, I’m afraid we’ll have to postpone it.
Like Stevens, Willis (1998: 46) recommends creating what she terms a pedagogic corpus to meet the specific needs of students (see also Braun 2005). These small genre-specific corpora help to minimise the problems learners could encounter when faced with large numbers of occurrences from a variety of genres. At the same time other researchers (Bernardini 2000; Cheng et al. 2003) prefer to give their – usually advanced – students access to the large corpora originally created for the purpose of linguistic research. At present all these trends coexist, with large and small corpora increasingly available to learners and teachers, and an increasing number of researchers experimenting with ways to make DDL a reality for the majority of language learners.
In this section we shall see how small, easily and freely available written and spoken corpora can provide data which can be used by the teacher or learner in a DDL approach. As written corpora can be created more easily than their spoken counterparts, it is not surprising that they are more readily available. While the concordances quoted earlier in this chapter can be retrieved from the corpus websites with no training, the lack of access to the full context is clearly a limitation. Using a concordancer such as WordSmith Tools (Scott 2004) or MonoConc (Barlow 2000), a teacher or learner can consult a number of freely available corpora. The examples below are taken from the Chambers–Le Baron Corpus of Research Articles in French (Chambers and Le Baron 2007), a pedagogic corpus of approximately one million words created to be of use to learners when writing essays in French, and available via the Oxford Text Archive. The corpus contains 159 articles taken from twenty journals. The articles, published between 1998 and 2006, belong to one of ten categories: media/culture, literature, linguistics and language learning, social anthropology, law, economics, sociology and social sciences, philosophy, history and communication.
The use of the first person plural in research articles in French has given rise to a substantial amount of phraseology related to the metalanguage of the article, the language used by the author to signal to the reader how the article is organised. The equivalent phrases for ‘as we have seen’ and ‘as we shall see’, for example, include an apparently redundant object pronoun (l’ or le) and may refer to the article in spatial rather than temporal terms: ‘plus haut’ (literally ‘higher up’) and ‘plus loin’ (‘further on’) rather than ‘earlier’ and ‘later’. I had been correcting learner errors by writing in the phrases ‘comme nous l’avons vu plus haut’ and ‘comme nous le verrons plus loin’. I discovered from studying the seventy-four occurrences of ‘comme nous’ in the corpus, however, that there is a much wider variety of phrases than this, with only fifteen occurrences of the verb ‘voir’ (to see), and verbs meaning ‘underline’, ‘indicate’, ‘mention’, ‘observe’, ‘show’, ‘note’ and ‘remind’ commonly occurring.
Comme nous l’avons souligné précédemment
comme nous l’avons déjà souligné auparavant.
comme nous venons de le décrire.
Comme nous l’avons indiqué plus haut,
Comme nous l’avons déjà indiqué,
Comme nous l’avons mentionné plus haut
Premièrement, comme nous l’avons constaté dans le cadre de nos recherches sur l’IRC,
comme nous avons pu le constater à plusieurs reprises lors de l’analyse
D’autre part, comme nous l’observerons en seconde partie de cet article,
Comme nous allons tenter de le montrer dans la suite de ce texte,
comme nous le montrerons dans les pages qui suivent.
Comme nous l’avons expliqué dans le paragraphe précédent,
comme nous l’avons exposé dans notre préambule méthodologique
comme nous l’avons rappelé,
Comme nous en avons fait état plus haut,
comme nous allons maintenant le noter,
Alongside ‘voir’ one also finds verbs and expressions such as ‘souligner’ (to underline; five occurrences), ‘constater’ (to observe; three occurrences) and ‘tenté de le montrer’ (tried to show; three occurrences), among many others. A variety of adverbial expressions other than ‘plus haut’ and ‘plus loin’ is also used. In addition, the subject and verb are sometimes inverted in these phrases, with ‘nous’ then functioning as an object rather than a subject.
Comme nous l’a suggéré un rapporteur anonyme.
Comme nous l’apprend son dossier de faillite,
Comme nous le dit cet interviewé, ‘la communauté ne nous appartient pas’.
comme nous l’a précédemment montré la scène de leur premier face à face
comme nous le laisse entendre la section Religion de sa Phénoménologie
Like Tim Johns with the use of ‘if’ described above, I found consulting the corpus a useful way of getting immediate information on the practices of a substantial number of native speakers, which prompted me to include in my teaching features that I had hitherto neglected.
In addition to formulaic phrases such as these, using the first person plural pronoun ‘nous’ as a search word can provide information on other aspects of the metalanguage of native-speaker authors of research articles. Expressions used to describe the plan of an essay, for example, are easy to identify by searching for the different words for first, second, then, etc., such as ‘premièrement’, ‘deuxièmement’, ‘ensuite’ and ‘dans un premier/deuxième temps’. Many other common expressions can also be discovered by examining the occurrences of the first person plural. The verb ‘permettre’ (to allow, to permit), for example, is commonly used in academic writing in French, with twenty-three occurrences of ‘permett*’. Examples are listed below.
Les statistiques dont nous disposons ne nous permettent pas de distinguer explicitement
Ce questionnement nous permettra de mettre à jour le degré de compatibilité
L’analyse de ces traces nous permettra de répondre à notre interrogation initiale, à savoir :
Cela nous permettra de mesurer combien la culture politique corse est étrangère à
L’examen des caractéristiques de la domiciliation nous permettra de réfléchir, en creux, aux limites de son usage
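The search term ‘permett*’ uses the asterisk as a wildcard, so that a single query retrieves forms such as ‘permettent’ and ‘permettra’. Concordancers differ in their wildcard syntax, so the sketch below simply assumes the common convention that * stands for any run of word characters, and translates the query into a Python regular expression. The function names are hypothetical, and the example is illustrative only; applied to the five lines quoted above, it shows which inflected forms a single wildcard query gathers together.

```python
import re
from collections import Counter


def wildcard_to_regex(query: str) -> re.Pattern:
    """Translate a simple wildcard query into a whole-word regular expression.

    Assumed convention: '*' stands for zero or more word characters
    (letters, including accented ones, digits or underscore); everything
    else in the query is matched literally.
    """
    body = re.escape(query).replace(r"\*", r"\w*")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def wildcard_counts(text: str, query: str) -> Counter:
    """Count each distinct (lower-cased) word form retrieved by the query."""
    return Counter(m.group(0).lower() for m in wildcard_to_regex(query).finditer(text))
```

For the five quoted lines, `wildcard_counts` would group four occurrences of ‘permettra’ and one of ‘permettent’ under the single query, which is the kind of overview a learner needs before reading the lines themselves.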
Thus, while a detailed study of the first person plural in this corpus would be well beyond the scope of this chapter, it is clear that even a small corpus of one million words can provide a rich source of attested examples in support of learner writing. Data such as this can be exploited in a number of ways in a data-driven learning approach, ranging from concordances prepared by the teacher from which learners inductively learn how native speakers use the language in specific contexts, to direct consultation by learners to find solutions to problems which they encounter.
Although freely available spoken corpora are less common than their written counterparts, they are slowly becoming more accessible. A number of recent projects have created spoken pedagogic corpora which learners and teachers can easily access via the web. SACODEYL, for example, focuses on teen talk, including video-recorded interviews of approximately ten minutes each, together with their transcripts, conducted with twenty to twenty-five teenagers in each of the following languages: English, French, German, Italian, Lithuanian, Romanian and Spanish. The interview transcripts are available as online corpora (hosted by the University of Murcia). According to the SACODEYL website at the time of writing, they are ‘pedagogically annotated and enriched for language learning and teaching purposes’ (SACODEYL website, 2009). The interviews cover the following topics: personal information, home and family, present and past living routines, hobbies and interests, holidays, school and education, job experiences, plans for the future, and open discussion topics. The SACODEYL site illustrates how the problem of decontextualised concordances can be overcome relatively easily: because each interview is available in full, the concordance can simply serve to produce supplementary examples and to check whether an aspect of language use which is evident in one speaker is used by the others as well. In the French sub-corpus of SACODEYL, for example, one of the speakers, Margaux, uses ‘sinon’ repeatedly, not necessarily in the sense of ‘otherwise’, but simply to link the items in a list. A concordance of the sixty occurrences of ‘sinon’ reveals that it is commonly used in two ways. In the concordance lines below, ‘sinon’, sometimes preceded by ‘ou’ (or), indicates that what precedes and what follows the conjunction are alternatives.
ce serait des études de droit comme mon père. Ou sinon, devenir ingénieur dans le domaine scientifique,
quand je décide quelque chose, j’aime bien y parvenir sinon, c’est la catastrophe.
cinéma ou on va au bowling aussi. Ça dépend. Ou sinon, on fait des fêtes chez l’un ou chez l’autre
In the first example the pupil intends either to study law or to become an engineer, which are mutually exclusive alternatives. Similarly, in the second concordance line the pupil likes to succeed, otherwise ‘c’est la catastrophe’ (it’s a disaster). In the third example a pastime can be cinema, bowling or a party at someone’s house. In the examples below, ‘et’ (and) is sometimes used rather than ‘ou’ (or) before ‘sinon’ to list various pastimes.
deux heures et demie d’anglais par semaine et sinon, je fais aussi une option facultative LV3 italien
j’aime beaucoup de films différents, et sinon, j’aime aussi j’aime aussi dessiner
Mais aussi, je fais du badminton au lycée le lundi sinon, j’aime beaucoup lire
Au lycée, je fais du badminton. Et sinon, je pratique la guitare depuis dix ans.
Interestingly, the third of the first set of examples (cinema, bowling or party) is similar in content to the three examples in the second set. The concordance reveals that these uses of ‘sinon’ are common among the teenagers, thus making it possible for learners at secondary level who do not have regular interaction with native speakers to encounter multiple examples of the everyday language use of their counterparts in the target language. With its combination of videos, transcripts and concordances, SACODEYL provides a rich learning environment for teenagers, integrating the concordances with more traditional language-learning resources such as text and video.
As empirical studies have revealed generally positive reactions by learners to the consultation of concordances, the question arises as to how this new approach relates to theories of Second Language Acquisition. DDL could be said to constitute a form of comprehensible input (Krashen 1988), particularly when the content of the corpus is carefully chosen to be familiar to the learners (Allan 2009). It does, however, differ from Krashen’s scenario in one important way. The simplified language or caretaker talk which Krashen describes as helpful to the learner (1988: 136) is absent here. Although the content may be familiar, the language in a native-speaker monolingual corpus consists of attested examples of actual language use. The multiple contexts do, however, enable the learner to observe patterns. Thus, in the concordance lines with ‘end up’ listed above, the learner can observe that this phrasal verb is followed by the -ing form of the verb (‘end up hating’, ‘end up living’), by a preposition (‘end up in the army’), by an adjective (‘end up homeless and uneducated’), or by a noun phrase (‘end up some kind of fat separatist’). It cannot be guaranteed, of course, that a learner looking at that concordance will learn all these uses. As no one advocates data-driven learning as the main component in an approach to language learning, but rather as an enhancement of text-based work, the learner may be using the concordance to check whether one particular use is correct, and the concordance could confirm that and reinforce the learning process. By providing a rich variety of examples, it might also help the learner to notice – in the sense in which Schmidt (1990) uses the term – the other uses.
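Concordancers commonly let the user sort lines alphabetically by the word immediately to the right of the node (often labelled R1), which groups similar continuations together and makes patterns such as those just described easier to spot. The sketch below illustrates the idea under simple assumptions: each concordance line is a plain string, the node is any inflected form of ‘end’ followed by ‘up’, and the helper names `r1` and `right_sort` are hypothetical rather than taken from any particular tool.

```python
import re
from typing import Iterable, List

# Assumed node: any inflected form of 'end' followed by 'up'.
NODE = re.compile(r"\b(?:end|ends|ended|ending)\s+up\b", re.IGNORECASE)


def r1(line: str) -> str:
    """Return the first word to the right of the node (R1), lower-cased, or ''."""
    m = NODE.search(line)
    if m is None:
        return ""
    words = re.findall(r"[\w’'-]+", line[m.end():])
    return words[0].lower() if words else ""


def right_sort(lines: Iterable[str]) -> List[str]:
    """Keep only lines containing the node and sort them alphabetically by R1."""
    return sorted((line for line in lines if NODE.search(line)), key=r1)
```

Sorting groups lines mechanically rather than interpreting them: ‘end up simply hating’ is filed under ‘simply’, not with the other -ing forms. This is one reason why the learner still has to read the lines and draw conclusions.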
The emphasis on grammatical structures in the above example, while justified, does not fully convey the essential characteristics of data-driven learning. While traditional language-learning resources and methods, and even research, tend to separate the learning of grammar and lexis, in DDL they are seen as fully integrated. Distinguishing between the open-choice principle, according to which the language user has freedom to slot different parts of speech together to form utterances, and the ‘idiom principle’, Sinclair (1991: 110) defines the latter as follows: ‘The principle of idiom is that a user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments.’
For proponents of this approach to language, concordances facilitate the application of the idiom principle in language learning, giving teachers and learners easy access to large numbers of attested examples of use so that the patterns can become clear and the learners can creatively integrate them in their own language use (see Hunston, this volume). We have seen how these patterns can be illustrated not only in the highly formulaic context of academic writing, but also in casual teenage conversation. Giving learners access to multiple examples of common patterns could help to overcome what Debrock et al. (1999: 46) call ‘le manque de naturel [the lack of naturalness]’ in learner language. If generally applied in a way that was easily accessible to the learner and teacher, access to multiple examples from appropriate corpus data could thus have a profound effect on language learning and teaching.
The concept of semantic prosody (Louw 1993: 157) takes this patterning a step further, showing how certain forms can be imbued with ‘a consistent aura of meaning’ by their collocates. Xiao and McEnery (2006: 106) provide a useful table of semantic prosodies which have been observed by various researchers, showing how ‘happen’, ‘cause’, ‘end up’, ‘a recipe for’, and ‘signs of’ have negative connotations. They also (p. 106) point out that
Semantic prosodies are typically negative, with relatively few of them bearing an affectively positive meaning. However, a speaker/writer can also violate a semantic prosody condition to achieve some effect in the hearer – for example irony, insincerity, or humour can be explained by identifying violations of semantic prosody (Louw 1993: 173).
(Xiao and McEnery 2006: 106)
In French, for example, the verb se jouer (roughly, ‘to be played out’ or ‘to be decided’) is normally associated with major events with genuinely or potentially disastrous consequences. In the following concordance lines from a million-word corpus of journalistic writing in French (Chambers and Rostand 2005), the potentially negative connotations are clear. (The corpus contains fifty-four occurrences of this pronominal verb, of which sixteen are literal: eleven referring to sporting events and five to theatrical performances. The remaining thirty-eight are used in the figurative sense discussed here.)
les chaises renversées, les traces de sang sur les murs disent le drame qui s’est joué ici, où trois kamikaze ont opéré. ‘J’ai entendu deux explosions
Dans le box des accusés, celui-ci est tendu. Il sait que son sort se joue peut-être dans cette audience.
est devenue politique, au point que l’on peut supposer que tout, en réalité, va se jouer, maintenant, entre l’Elysée et la Maison Blanche.
avec Clemenceau et de Gaulle, parmi ces grands irréguliers dont la destinée s’est jouée sur un moment crucial où leur singularité l’a emporté
‘Notre avenir se jouera demain matin, au tribunal de commerce de Béthunes.
This strong semantic prosody is, however, violated in the reference to ‘L’acteur principal du petit drame qui s’est joué à l’arrivée, au télésiège’ (the main actor in the little drama played out at the finish, at the chairlift), which conveys the journalist’s critical reaction to the minor drama of the cancellation of a skiing competition. Consulting a concordance is thus a useful way for a learner at advanced level to discover not only instances of semantic prosody but also the contexts in which it can be violated, thus providing examples of language creativity (see the chapter on creativity by Vo and Carter, this volume).
The examples of semantic prosody illustrate another aspect of second language acquisition theory which is of particular relevance to the use of corpora, namely the importance of frequency. The term is used in different ways by corpus linguists and by those researching the language-learning process. For the corpus linguist, frequency refers to the fact that by analysing a corpus one can discover what words, expressions and collocations occur very frequently, information which is important both for researchers in linguistics and also for language-learning professionals so that they can ensure that they are emphasising the most frequently occurring aspects of language use in their classes, course books or grammars, or at the very least not omitting them. In the language-learning context, frequency refers rather to the number of times a learner has to encounter an aspect of language use to be aware of it – what Schmidt (1990) terms ‘noticing’ – and to be able to use it. Data-driven learning brings these two uses of the term together, by allowing the learners to have access to corpora so that they can discover frequent patterns and, perhaps more importantly, observe a large variety of examples of their use. These examples could include evidence of semantic prosody as well as a small number of violations of that prosody. Thus data-driven learning, while encouraging learners to take an active part in their learning, is also contributing to a revival of interest in frequency, not the mindless repetition of behaviourism, but rather what Ellis (2002: 177) calls ‘mindful repetition in an engaging communicative context by motivated learners’.
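In the corpus linguist’s sense of frequency, the basic operation is a frequency list: count every word form, or every sequence of n words (an n-gram), and rank the results. The sketch below is a simplified illustration under stated assumptions, not the procedure of any particular tool: tokens are taken to be runs of letters, so that apostrophes split ‘l’avons’ into ‘l’ and ‘avons’, and the function name `ngram_frequencies` is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def ngram_frequencies(text: str, n: int = 1, top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most frequent n-grams of lower-cased word tokens.

    Assumed tokenisation: a token is a run of letters (Unicode-aware, so
    accented letters are kept); digits, underscores, punctuation and
    apostrophes act as separators.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams).most_common(top)
```

With n=2 on a few lines of French academic prose, a recurrent frame such as ‘comme nous’ rises to the top of the list; reading the concordance lines for that frame is then what gives the learner the varied, repeated encounters that the language-learning sense of frequency refers to.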
We have already seen that the learner plays a more active role in DDL, whether analysing a concordance prepared by the teacher or consulting a corpus directly to find an answer to a specific problem or to study the language use of a particular group of native speakers, such as teenagers or academic writers. The role of the teacher also changes fundamentally, as s/he is no longer the sole source of knowledge about the target language, but rather a facilitator of the learning process, helping the learners to interpret the data and advising them on how best to search the corpus and analyse their search results. Kennedy and Miceli (2001: 82) propose a four-stage search strategy for learners:
1. Formulate the question.
2. Devise a search strategy.
3. Observe the examples and select relevant ones.
4. Draw conclusions.
Analysing the results of their students’ exploration of a corpus, Kennedy and Miceli develop a series of tips to help learners get the most benefit from the data (for more on strategies for preparing learners to use corpora, see the chapter by Sripicharn, this volume). This new learning environment thus demands new skills from teachers: knowledge of which corpora are available, skills in corpus consultation and analysis, and the ability to decide how best to present the data to the learners, either as pre-prepared concordances or as a resource for the learners to explore directly. In addition, teachers and learners now engage in a new way of ‘reading’ a text – not just from left to right, but from the centre outwards and vertically up and down, something which most of them will not have done before. In an environment where web literacy is the norm, however, the novelty of this practice may pose less of a challenge, particularly for the learners.
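Reading ‘from the centre outwards’ follows from the key-word-in-context (KWIC) layout used by concordancers such as MonoConc Pro or WordSmith Tools, in which the search word is aligned in a central column. The minimal Python sketch below reproduces that layout. Whitespace tokenisation, the ten-token context window, the `width` parameter and the optional sort on the right-hand context are simplifying assumptions for illustration, not the behaviour of any particular concordancer, and the function name `kwic` is hypothetical.

```python
# Punctuation ignored when matching the node word (an assumption).
PUNCT = ".,;:!?()'\"‘’"


def kwic(text: str, node: str, width: int = 30, sort_right: bool = False) -> list[str]:
    """Return KWIC lines for every token matching `node` (case-insensitive).

    The left context is right-aligned and truncated to `width` characters,
    so the node word falls in the same column on every line.
    """
    tokens = text.split()  # naive whitespace tokenisation
    hits = []
    for i, token in enumerate(tokens):
        if token.strip(PUNCT).lower() != node.lower():
            continue
        # Up to ten tokens either side, then trimmed to the display width.
        left = " ".join(tokens[max(0, i - 10):i])[-width:]
        right = " ".join(tokens[i + 1:i + 11])[:width]
        hits.append((left, token, right))
    if sort_right:
        # Sorting on the right-hand context groups recurring patterns vertically.
        hits.sort(key=lambda hit: hit[2].lower())
    return [f"{left:>{width}}  {word}  {right}" for left, word, right in hits]
```

Sorting on the right-hand context brings together recurring patterns, for example *majorité des* followed by a noun, so that they can be scanned vertically, which is the kind of reading the paragraph above describes.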
In addition to the challenge of acquiring these new skills, consulting corpus data changes the relationship between the teacher and the target language. When faced with examples of variation in a corpus, a teacher is encouraged – or even obliged – to adopt a descriptive rather than a prescriptive approach to target language use. After studying the 236 occurrences of majorité in the Chambers–Rostand Corpus of Journalistic French in relation to singular and plural verb forms, one is forced to rethink one’s practice of correcting students’ use of the plural. Examples from the corpus are given below.
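D’ailleurs Jacques Chirac lui-même l’a dit : ‘Je sais que la grande majorité des Corses veulent rester français. [plural] (Moreover, Jacques Chirac himself said so: I know that the great majority of Corsicans want to remain French.)
Par contre, une écrasante majorité (93%) craint une nouvelle catastrophe sur nos côtes. [singular] (On the other hand, an overwhelming majority (93%) fear a new disaster on our coasts.)
‘Comme je vous l’ai dit, l’écrasante majorité des étudiants est favorable au système LMD (licence-mastère-doctorat). [singular] (As I told you, the overwhelming majority of students are in favour of the LMD (bachelor-master-doctorate) system.)
Pas plus d’un Français sur cinq répond ‘oui’. En revanche, une très forte majorité (63%) pensent qu’il vaudrait mieux, pour atteindre cet objectif [plural] (No more than one French person in five answers ‘yes’. On the other hand, a very large majority (63%) think that it would be better, in order to achieve this objective ...)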
It is nonetheless tempting at times to edit a pre-prepared concordance, not out of a desire to be prescriptive but to avoid confusing the learners, particularly when looking for examples of an aspect of language use which is problematic for them. A more difficult challenge for teachers is the presence of taboo language in corpora of spontaneous conversation, such as the Corpus of London Teenagers or the Limerick Corpus of Irish English (Farr et al. 2004). The teacher wishing to include corpus data thus faces a number of challenges, particularly as the integration of corpora in language learning is not yet included in the majority of language teacher education courses (McCarthy 2008).
A number of researchers in corpora and language learning (Conrad 2000: 556; Farr 2008: 40) recommend including corpora in language teacher education as the way to overcome the barrier which currently prevents corpora from joining the resources commonly available to learners, namely the grammar, the dictionary and the course book. In addition, Chambers and Wynne (2008) suggest a framework for the development of web-based resources for higher education, inspired by Fligelstone’s scenario, in which a teacher can advise a learner to ‘go to any of the labs, hit the icon which says “corpus” and follow the instructions on the screen’ (Fligelstone 1993: 101). This scenario, however, is still a long way from being realised, although the slow development described by Leech (1997: 2) as ‘trickle down’ and by McEnery and Wilson (1997: 5) as ‘percolation’ is continuing. Besides the inclusion of corpus consultation skills in language teacher education courses, the growing number of easily available corpus resources which require no training beyond basic web skills may well contribute to the realisation of Fligelstone’s vision. A final question is whether the web itself, searched with a conventional search engine or with WebCorp, can replace the corpus, drawing on electronic literacy skills which the vast majority of learners already possess (see Lee, this volume). Rundell (2000) notes on the positive side that no corpus will ever be as up to date as the web, but also warns that the web is ‘not a corpus at all according to any standard definitions’, but rather ‘a huge ragbag of digital text whose content and balance are largely unknown’.
In the current situation, where it seems unlikely that most language teachers will have easy access to the corpora they need, and the skills to use them, in the immediate or near future, it is difficult to predict the future role of the web as a corpus or rather, taking account of Rundell’s comment, as a source of data for language learning and teaching. If corpora become easily available and widely known to teachers and learners, this may no longer be an issue, as they will then have ready access to attested examples of language use in relevant registers.
Sinclair, J. McH. (ed.) (2004) How to Use Corpora in Language Teaching. Amsterdam and Philadelphia, PA: John Benjamins. (This book contains useful chapters on a variety of ways in which corpora can be used, with a good combination of pedagogy and practice.)
Wichmann, A., Fligelstone, S., McEnery, T. and Knowles, G. (eds) (1997) Teaching and Language Corpora. London; New York: Longman. (Although it is not a recent publication, this edited volume from one of the series of conferences on Teaching and Language Corpora provides a wealth of information on aspects of the integration of corpus data in language learning.)
Ahmad, K., Corbett, G. and Rogers, M. (1985) ‘Using Computers with Advanced Language Learners: An Example’, The Language Teacher 9(3): 4–7.
Allan, R. (2009) ‘Can a Graded Reader Corpus Provide “Authentic” Input?’ ELT Journal 63(1): 23–32.
Barlow, M. (2000) MonoConc Pro. Houston, TX: Athelstan.
Bernardini, S. (2000) ‘Systematising Serendipity: Proposals for Concordancing Large Corpora with Language Learners’, in L. Burnard and T. McEnery (eds) Rethinking Language Pedagogy from a Corpus Perspective. Frankfurt: Peter Lang, pp. 225–34.
Braun, S. (2005) ‘From Pedagogically Relevant Corpora to Authentic Language Learning Contents’, ReCALL 17(1): 47–64.
Chambers, A. (2005) ‘Integrating Corpus Consultation in Language Studies’, Language Learning and Technology 9(2): 111–25.
Chambers, A. and Le Baron, F. (eds) (2007) The Chambers–Le Baron Corpus of Research Articles in French/Le Corpus Chambers–Le Baron d’articles de recherche en français. Oxford: Oxford Text Archive, available at http://ota.ahds.ac.uk/headers/2527.xml (accessed 2 June 2009).
Chambers, A. and Rostand, S. (eds) (2005) The Chambers–Rostand Corpus of Journalistic French/Le Corpus Chambers–Rostand de français journalistique. Oxford, University of Oxford: Oxford Text Archive, available at http://ota.ahds.ac.uk/headers/2491.xml (accessed 2 June 2009).
Chambers, A. and Wynne, M. (2008) ‘Sharing Corpus Resources in Language Learning’, in F. Zhang and B. Barber (eds) Handbook of Research on Computer-Enhanced Language Acquisition and Learning. Hershey, PA: IGI Global, pp. 438–51.
Charles, M. (2007) ‘Reconciling Top-down and Bottom-up Approaches to Graduate Writing: Using a Corpus to Teach Rhetorical Functions’, Journal of English for Academic Purposes 6(4): 289–302.
Cheng, W., Warren, M. and Xun-feng, X. (2003) ‘The Language Learner as Language Researcher: Putting Corpus Linguistics on the Timetable’, System 31(2): 173–86.
Cobb, T. (1997) ‘Is There Any Measurable Learning from Hands-on Concordancing?’ System 25(3): 301–15.
Conrad, S. (2000) ‘Will Corpus Linguistics Revolutionize Grammar Teaching in the Twenty-first Century?’ TESOL Quarterly 34(3): 548–60.
Debrock, M., Flament-Boistrancourt, D. and Gevaert, R. (1999) ‘Le manque de “naturel” des interactions verbales du non-francophone en français. Analyses de quelques aspects à partir du corpus LANCOM’, Faits de Langue 13 (Oral-Ecrit: Formes et théories): 46–56.
Ellis, N. (2002) ‘Frequency Effects in Language Processing. A Review with Implications for Theories of Implicit and Explicit Language Acquisition’, Studies in Second Language Acquisition 24: 143–88.
Farr, F. (2008) ‘Evaluating the Use of Corpus-based Instruction in a Language Teacher Education Context: Perspectives from the Users’, Language Awareness 17(1): 25–43.
Farr, F., Murphy, B. and O’Keeffe, A. (2004) ‘The Limerick Corpus of Irish English: Design, Description and Application’, Teanga 21: 5–29.
Fligelstone, S. (1993) ‘Some Reflections on the Question of Teaching, from a Corpus Linguistics Perspective’, ICAME 17: 97–109.
Gaskell, D. and Cobb, T. (2004) ‘Can Learners Use Concordance Feedback for Writing Errors?’ System 32(3): 301–19.
Hyland, K. (2002) Teaching and Researching Writing. London: Pearson Education.
Johns, T. (1986) ‘Micro-Concord: A Language Learner’s Research Tool’, System 14(2): 151–62.
——(1988) ‘Whence and Whither Classroom Concordancing’, in T. Bongaerts, P. De Haan, S. Lobbe and H. Wekker (eds) Computer Applications in Language Learning. Dordrecht: Foris, pp. 9–27.
——(1991) ‘Should You Be Persuaded: Two Examples of Data-driven Learning’, in T. Johns and P. King (eds) Classroom Concordancing, ELR Journal 4. Birmingham: Centre for English Language Studies, University of Birmingham, pp. 1–16.
——(1997) ‘Contexts: The Background, Development, and Trialling of a Concordance-based CALL Program’, in A. Wichmann, S. Fligelstone, T. McEnery and G. Knowles (eds) Teaching and Language Corpora. London: Longman, pp. 100–15.
Kennedy, C. and Miceli, T. (2001) ‘An Evaluation of Intermediate Students’ Approaches to Corpus Investigation’, Language Learning and Technology 5(3): 77–90.
Krashen, S. (1988) Second Language Acquisition and Second Language Learning. London: Prentice-Hall.
Leech, G. (1997) ‘Teaching and Language Corpora: A Convergence’, in A. Wichmann, S. Fligelstone, T. McEnery and G. Knowles (eds) Teaching and Language Corpora. London: Longman, pp. 1–23.
Louw, B. (1993) ‘Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of Semantic Prosodies’, in M. Baker, G. Francis and E. Tognini Bonelli (eds) Text and Technology: In Honour of John Sinclair. Amsterdam: John Benjamins, pp. 157–76.
McCarthy, M. (1998) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.
——(2008) ‘Accessing and Interpreting Corpus Information in the Teacher Education Context’, Language Teaching 41(4): 563–74.
McEnery, T. and Wilson, A. (1997) ‘Teaching and Language Corpora’, ReCALL 9(1): 5–14.
O’Sullivan, Í. and Chambers, A. (2006) ‘Learners’ Writing Skills in French: Corpus Consultation and Learner Evaluation’, Journal of Second Language Writing 15: 49–68.
Rundell, M. (2000) ‘The Biggest Corpus of All’, Humanising Language Teaching 2(3), at http://www.hltmag.co.uk/may00/idea.htm (accessed 31 May 2009).
SACODEYL website, at www.um.es/sacodeyl/ (accessed 31 May 2009).
Schmidt, R. (1990) ‘The Role of Consciousness in Second Language Learning’, Applied Linguistics 11(2): 129–58.
Scott, M. (2004) WordSmith Tools Version 4.0. Oxford: Oxford University Press.
Sinclair, J. (1991) Corpus, Concordance, Collocation. London: Longman.
Someya, Y. (1999) ‘A Corpus-based Study of Lexical and Grammatical Features of Written Business English’, MA thesis, University of Tokyo.
Stevens, V. (1991) ‘Concordance-based Vocabulary Exercises: A Viable Alternative to Gap-fillers’, in T. Johns and P. King (eds) Classroom Concordancing: English Language Research Journal 4. Birmingham: Centre for English Language Studies, University of Birmingham, pp. 47–63.
Tribble, C. and Jones, G. (1990) Concordances in the Classroom: A Resource Book for Teachers. Harlow: Longman.
Wichmann, A., Fligelstone, S., McEnery, T. and Knowles, G. (eds) (1997) Teaching and Language Corpora. London; New York: Longman.
Widdowson, H. G. (2000) ‘On the Limitations of Linguistics Applied’, Applied Linguistics 21(1): 3–25.
Willis, J. (1998) ‘Concordances in the Classroom without a Computer: Assembling and Exploiting Concordances of Common Words’, in B. Tomlinson (ed.) Materials Development in Language Teaching. Cambridge: Cambridge University Press, pp. 44–66.
Xiao, R. and McEnery, T. (2006) ‘Collocation, Semantic Prosody and Near Synonymy: A Cross-linguistic Perspective’, Applied Linguistics 27(1): 103–29.
Yoon, H. and Hirvela, A. (2004) ‘ESL Student Attitudes Towards Corpus Use in L2 Writing’, Journal of Second Language Writing 13: 257–83.