Winnie Cheng
Corpus-based language teaching is advocated by Sinclair (2004) as representing a new revolution in language teaching. Corpora, corpus-analytic tools and corpus evidence have been increasingly used in English language teaching and learning for the last two decades (see, for example, Sinclair 1987, 1991, 2004). Applications derived from corpus investigation are found in a number of different areas, for example lexicography, translation, stylistics, grammar, gender studies, forensic linguistics, computational linguistics and, equally importantly, in language teaching (Tognini Bonelli 2001).
Fligelstone (1993) describes three aims of corpus-based linguistics in teaching: teaching about (the principles and theory behind the use of corpora), teaching to exploit (the practical, methodological aspects of corpus-based work), and exploiting to teach (using corpora to derive or drive teaching materials). Renouf (1997) adds a fourth aim which is teaching to establish resources; and this then involves the learners in data collection, corpus design and corpus compilation (Lee and Swales 2006: 70). In a similar vein, Leech (1997) describes three relations between corpora and teaching: teach about corpora, exploit corpora to teach, and teach to exploit corpora. In recent years, increasingly, the focus on corpora and teaching has shifted to corpora and learning. The selection of papers at the 5th Teaching and Language Corpora (TaLC) conference in 2002, for example, is organised into three sections: corpora by learners, corpora for learners, and corpora with learners (Aston et al. 2004). In fact, research into learner corpora, that is corpora comprised of texts spoken or written by novices rather than experts, brings about ‘exciting pedagogical perspectives in a wide range of areas of English language teaching (ELT) pedagogy: materials design, syllabus design, language testing, and classroom methodology’ (Granger 2003: 542).
The advances in the direct access to corpora by language teachers and learners have created the need to research into a number of pedagogic issues, including ‘the types of corpora to be consulted, large or small, general or domain-specific, tagged or untagged’; the kinds of learning strategies to benefit from direct corpus consultation; and the means by which direct access to corpora can be integrated into the language learning context (Chambers 2005: 111).
Language corpora provide systematic access to naturally occurring language, and corpus-linguistic methods support exploratory and discovery learning (Bernardini 2004), which encourages autonomous learning and teaching (Braun 2005). Many empirical studies in corpus linguistics have contributed to the understanding of the value and benefits language corpora can bring to language pedagogy, particularly with respect to whether corpora can capture reality, and whether corpora can provide valid models for learners (Gavioli and Aston 2001). Even if corpora are too small to capture the full range of linguistic experience, they are useful to test claims based purely on intuition, and to motivate the decisions for teaching particular linguistic features (Gavioli and Aston 2001). Even if a corpus rarely reflects ‘typical’ usage in every aspect (Sinclair and Kirby 1990), and hence does not provide valid models for learners, corpus data are a useful means to engage learners in the interpretive process to create models of their own (Leech 1986). The subjective interpretation of corpus data is best seen in the light of Widdowson’s (1978) distinction between ‘genuineness’ and ‘authenticity’; and his claim that learners are often unable to authenticate genuine texts as they do not belong to the community for which the texts are created, and so they are unqualified to participate in the discourse process and interpretation (Widdowson 1998). An alternative way for learners to authenticate discourse is for them to adopt the role of a discourse observer, who views the interaction critically and analytically to understand interactional strategies (Aston 1988). Indeed, learners can alternate between the roles of discourse observer and discourse participant, and the latter role allows learners to test interactional strategies (Aston 1988; Gavioli and Aston 2001).
The view that language learners can be simultaneously active learners and language researchers accessing corpus data directly is advocated in Johns’ (1991) ‘data-driven learning’ or DDL (see also chapters by Chambers, Gilquin and Granger, and Sripicharn, this volume):
What distinguishes the DDL approach is the attempt to cut out the middleman as much as possible and give direct access to the data so that the learner can take part in building his or her own profiles of meanings and uses. The assumption that underlines this approach is that effective language learning is itself a form of linguistic research, and that the concordance printout offers a unique resource for the stimulation of inductive learning strategies – in particular, the strategies of perceiving similarities and differences and of hypothesis formation and testing.
(Johns 1991: 30)
DDL has been found to be a useful language learning methodology, and there is evidence that learners can indeed benefit from being both language learners and language researchers (see, for example, Kennedy and Miceli 2001; Cheng et al. 2003; Chambers and O’Sullivan 2004; Lee and Swales 2006).
The relative importance of implicit knowledge and explicit knowledge in language learning is often discussed (Kennedy 2003) and is particularly relevant to DDL-related activities. Implicit knowledge, which is especially acquired from meaning-focused interaction, is learning without awareness, whereas learning explicit knowledge through explicit instruction is learning with awareness. This explicit instruction can speed up implicit learning processes when supported with ‘language items, patterns, and rules’ (p. 482). The use of language corpora as a resource in language teaching and learning has been shown to contribute to the acquisition of both implicit and explicit knowledge. While some studies recommend the use of small corpora tailored to the learners’ needs (Aston 1997; Roe 2000), Johns (1997) recommends mediation by the teacher through the preparation of corpus-based materials as a first stage, or as Widdowson (2003) puts it, the use of a ‘pedagogic mediation of corpora’. Indeed, corpora are no longer used solely by teachers and material writers, and are increasingly used by learners as resources both inside and outside the classroom to ‘problematize language, to explore texts, and to authenticate discourse independently and collectively, adding to the reality of the corpus the reality of their own experience of it’ (Gavioli and Aston 2001: 244). Language corpora act as tools in the hands of learners ‘to observe and participate in real discourse for themselves’ (p. 238). As Gavioli and Aston state: ‘the question is not whether corpora represent reality but rather, whether their use can create conditions that will enable learners to engage in real discourse, authenticating it on their terms’ (p. 240). 
For example, St John (2001) describes the great benefits in the case of a student of German who was asked to find satisfactory answers to unknown vocabulary and formulate appropriate grammar rules for himself using a parallel corpus and concordance software as the only tools in an unsupervised environment. The conclusion echoes Johns’ (1991: 2) notion of DDL, as teachers ‘simply provide the evidence needed to answer the learner’s questions, and rely on the learner’s intelligence to find answers’.
In another study of students’ behaviour when using a corpus, and their perceptions of the strengths and weaknesses of corpora as a second language writing tool, Yoon and Hirvela (2004) find that the students generally perceived the corpus approach to be beneficial for the development of L2 writing skills and increased their confidence in L2 writing. In addition, they highlight the importance of adjusting and staging corpus input to meet the specific level of the learners’ proficiency (Allan 2009), and emphasise the need for clear guiding principles to enable learners to work within as they are learning the new approach to acquiring lexical and grammatical input. Yoon and Hirvela (2004) also warn against making the assumption that more advanced students embrace corpora readily and that they learn effectively working on their own. They suggest that teachers establish students’ learning preferences in this regard, and that teachers should be prudent in their evaluation of learners’ needs and the application of corpus-based activities to ensure more effective learning and teaching.
Based on observations of corpus linguistics in the 1980s, Sinclair (1991) expounds the theory of ‘units of meaning’ which states that ‘in all cases so far examined, each meaning can be associated with a distinct formal patterning … There is ultimately no distinction between form and meaning’ (Sinclair 1991: 496). The meaning of language is primarily expressed by linguistic units called ‘the lexical item’ (Sinclair 1996), which is a unit in the lexical structure to be selected independently and which then selects lexical or grammatical patterns for its expression. Many corpus studies have examined the lexical item and its forms of co-selection. Cheng (2006), for instance, examined the extended meanings of lexical cohesion in a corpus of SARS (Severe Acute Respiratory Syndrome) spoken discourse. She finds that the item ‘SARS’ does not display any strong collocates; it collocates with ‘epidemic’ in 15.7 per cent of its occurrences and ‘outbreak’ in 7.8 per cent of its occurrences. Regarding colligation, ‘SARS’ is preceded by the definite article (23.5 per cent of occurrences) or a preposition (29.4 per cent). The semantic preference is typically (76.5 per cent) to do with the impact of SARS on Hong Kong and those directly affected by it, with words and phrases such as ‘turmoil’, ‘damage’, ‘disease’, ‘crisis’, ‘infected area’, ‘patient’, ‘epidemic’, ‘outbreak’ and ‘morbidity’. In certain word co-selections, the semantic preference of ‘the impact of SARS’ fuses with the semantic prosody of a negative sense of ‘embattled/besieged’. However, in the post-SARS texts, a different semantic prosody is observed through the speakers’ choice of such phrases and grammatical constructions as ‘successfully brought (SARS epidemic) under control’, ‘positive outcome from the (SARS situation)’ and ‘in spite of the (turmoil that SARS has caused)’ which can be characterised as ‘positive assessment’ in 21.6 per cent of instances.
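Collocation percentages of the kind reported above can be obtained by counting, for each occurrence of a node word, which words fall within a fixed span on either side of it. The following is a minimal sketch in Python under stated assumptions: the crude tokenizer, the function names `tokenize` and `collocate_shares`, the default span of four words and the sample sentences are all illustrative, and none of them is taken from Cheng (2006).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def collocate_shares(tokens, node, span=4):
    """Map each word found within `span` tokens of `node` to the percentage
    of node occurrences that have that word in their window."""
    hits = Counter()
    node_count = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        node_count += 1
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        # Count each collocate at most once per node occurrence.
        hits.update(set(window))
    if node_count == 0:
        return {}
    return {word: 100.0 * count / node_count for word, count in hits.items()}


# Invented sample text, used only to exercise the functions.
sample = ("The SARS epidemic hit the city hard. After the SARS outbreak, "
          "hospitals changed. People feared SARS. The SARS epidemic ended.")
shares = collocate_shares(tokenize(sample), "sars", span=1)
print(sorted(shares.items(), key=lambda kv: -kv[1]))
```

A percentage computed in this way depends on the span chosen, so figures are only comparable across studies when the span is the same.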
Kennedy (2003) examines the collocations of the twenty-four adverbs of degree in the 100-million-word BNC to show how English is structured, and to demonstrate the nature, extent and importance of collocation in language learning. He finds that each modifier collocates ‘most strongly with particular words having particular grammatical and semantic characteristics’ (p. 467). For instance, when boosters such as ‘very’, ‘really’ and ‘terribly’ occur before ‘useful’ and ‘interesting’, they appear to be synonymous and interchangeable, but this does not apply to other boosters such as ‘clearly’ and ‘badly’ (p. 467). In addition, some amplifiers are not found to collocate with particular adjectives, for example, ‘completely easier, fully classical, badly dead and heavily unique’ (p. 474).
In Osborne’s (2004) study which exploits learner and native-speaker corpora to provide material for language awareness exercises, what are termed the ‘top-down’ and ‘bottom-up’ approaches are employed. The top-down approach refers to ‘drawing data from a native-speaker corpus to provide evidence of target usage to increase learners’ awareness of the language’, and the bottom-up approach refers to ‘drawing data from a learner corpus and using the learners’ own productions as a starting point for error correction and gradual enrichment’ (Osborne 2004: 251). Osborne’s study identifies a few features used by non-native French speakers of English in lexical overuse and grammatical anomalies. For example, compared to native speakers, non-native speakers are found to overuse the lexical word ‘interesting’ by a factor of four in writing, and have three characteristic uses which are almost never found in native-speaker writing; namely ‘it is interesting to notice (or note), used with intensifiers (very, particularly more, etc. interesting), and coordinated adjectives (relevant and interesting, etc.)’ (p. 254). With regard to grammatical anomalies, the use of the present perfect tense, non-count nouns and connectors are examined. Non-native speakers are found to use the present perfect tense with a similar frequency to native speakers, but they have over-interpreted the function of the tense, and so use it in inappropriate contexts. The major errors in the use of non-count nouns by French-speaking learners of English are the anomalous countable uses of ‘informations’ and ‘a(an) information’. The use of connectors by non-native speakers, particularly ‘In fact’, ‘As a matter of fact’ and ‘Anyway’, is found to be 20–30 per cent higher. All of these findings have pedagogical implications and can be usefully described and discussed by learners and teachers.
In the past twenty years, empirical analyses of corpora have contributed to the description of the actual patterns of language use in English (Biber and Reppen 2002). Braun (2005) describes both indirect and direct uses of corpora. Regarding indirect use, corpus-based analyses of English have influenced syllabus design, the methods and materials for language teaching and learning, test design, feedback and evaluation references, and the contents of reference works and grammars. Regarding direct use, large and small language corpora can be exploited in various ways by learners themselves. A corpus is now perceived as a primary contributor of teaching and learning resources as ‘empirical analyses of representative corpora provide a much more solid foundation for descriptions of language use’ (Biber 2001: 101) through ‘its potential to make explicit the more common patterns of language use’ (Tao 2001: 116). Corpus-based descriptions of language provide realistic, rich, illustrative and up-to-date data as a resource for the creation of interesting teaching materials (Braun 2005).
New accounts of the English language which challenge traditional views have been provided by corpus linguists, for example, the large academic grammars at the turn of the millennium. One example is the Longman Grammar of Spoken and Written English (LGSWE) (Biber et al. 1999) which incorporates the latest achievements of linguistics and offers statistical data on the use of English in both its written and its spoken form, and describes grammatical features for both structural characteristics and discourse patterns of use. Another major reference grammar is the Cambridge Grammar of English (Carter and McCarthy 2006) which has English as it is spoken today as its starting point with the necessary reference to English as it is written today.
Biber and Reppen (2002) examine actual language use based on empirical studies and develop ESL–EFL materials for grammar instruction. They contrast the presentation of information in six ESL grammar textbooks, and consider three aspects of materials development for grammar instruction; namely ‘the grammatical features to be included, the order of grammatical topics, and the vocabulary used to illustrate these topics’ (p. 199). They focus on noun premodifiers when considering which grammatical features to include or exclude, progressive and simple present tense when considering the order of grammatical topics, and the verbs used in the discussion of present progressive and simple present tense when considering specific words to include when illustrating a grammatical feature (p. 201). Biber and Reppen (2002) highlight sharp discrepancies between the information found in grammar materials and the real-life language use that learners encounter. Based on their findings, they argue that ‘frequency should play a key role in the development of materials and in the choices that teachers make in language classrooms’ (pp. 206–7). They suggest using the frequency studies and register differences described in the LGSWE (Biber et al. 1999) to facilitate the learning process and so to better integrate pedagogy and research.
Nonetheless, while resources are available and plenty of relevant applied corpus linguistics research exists, there is still a need for more resources of the kind developed by Sinclair (2003). The textbook by Sinclair provides a very thorough treatment of how to interrogate a corpus in an efficient and effective manner. Sinclair tackles the problems faced by newcomers to corpus-driven language study and learning, such as handling large amounts of data and systematically interpreting the evidence in the concordance lines. He proposes a seven-step procedure (2003: xvi–xvii) which he then puts into practice across a range of corpus-driven activities, each of which exemplifies a particular theme (e.g. ‘meaning distinction’, ‘co-selection’, ‘grammar and lexis’ and ‘semantic prosody’) by means of close analysis of a word/phrase.
Sinclair (2003) describes seven procedural steps to analyse concordance lines, namely initiate, interpret, consolidate, report, recycle, result and repeat. These steps are useful to both researchers and language learners alike. The book convincingly makes the case that concordance lines can be analysed using the above approach to reveal new evidence about language use. The same case is increasingly addressed through other textbooks (see, for example, O’Keeffe et al. 2007) which cover aspects such as corpus building, corpus linguistic techniques and the many insights into language use which corpus linguistics is in a unique position to provide.
Research into the grammatical and pragmatic characteristics of spoken English has found that the ability to do inexplicitness, among others such as indirectness and vagueness, is a key component in the repertoire of all competent discoursers, particularly in conversations (Cheng and Warren 1999, 2003). Examination of the Cambridge and Nottingham Corpus of Discourse in English (CANCODE) conducted by McCarthy and Carter (2004) and elsewhere (McCarthy and Carter 1994, 1995) has found that ‘ellipsis is a category of grammar that varies markedly according to context, allowing speakers considerable choices in the expression of interpersonal meanings’ (McCarthy and Carter 2004: 337). They advise that when selecting a particular structure from a corpus to teach, teachers should consider not only quantitative analysis, i.e. the frequency of occurrence, but also qualitative analysis of the structure in the corpus.
Based on corpus findings relating to amplifier-adjective collocates, Kennedy (2003: 283–4) suggests that the most frequently occurring collocates, e.g. ‘very good’ and ‘really good’, should be explicitly taught as units of language in order for learners to internalise them, and that less frequent collocates, e.g. ‘completely clear’, ‘highly skilled’ or ‘clearly visible’, are best left for implicit learning.
Braun (2005) describes the construction of the ELISA corpus which serves as pedagogic mediation between corpus materials and the corpus users. The ELISA corpus is a collection of fifteen video recordings of interviews with English native speakers from the US, Britain, Australia and Ireland who are professionals in banking, local politics, tourism, the media, agriculture, environmental technology, and so on. The materials in the ELISA corpus are said to ‘support culture-embedded language learning’, and fulfil the criteria for communicatively relevant contents as learning a language is ‘often done to acquire professional, vocational or somewhat “technical” language’ (Braun 2005: 56) and needs to appeal to learners from different backgrounds. The corpus is analysed ‘thematically, linguistically and functionally’ (p. 56) to generate enrichment materials, namely audiovisual materials, information and explanation, tasks and exercises, and study aids and didactic hints (pp. 57–8). As a result of the potential multidimensional access to the materials, ‘different user perspectives, interests and profiles’ can be accounted for (p. 61).
In Languages for Specific Purposes (LSP) teaching and learning, the use of specialised corpora for linguistic evidence, input and insights has been gaining in importance (Hewings 2005). Compared to using a general language corpus, a specialised corpus tends to have a greater concentration of vocabulary (Sinclair 2005) and can be particularly useful for those learning and teaching languages in LSP contexts. Bowker and Pearson (2002), for example, discuss how small specialised corpora that contain texts of a particular genre can be extremely useful for language teachers and learners, including identifying specialised terms and detecting collocations in the specialised target language for glossary compilation, term extraction and writing.
Corpus data have helped researchers to identify patterning that differs from traditional models of the English language. Recent studies comparing English presented in ELT textbooks and English used in natural communicative situations outside of the classroom have, however, found that textbook accounts of language use are often decontextualised and lack an empirical basis (Römer 2005; Cheng and Warren 2005). For example, in their studies of the speech acts of disagreement, and giving an opinion, Cheng and Warren (2005, 2006) conclude that English textbook writers need to incorporate a wider range of and more accurate forms into their materials in order to better reflect the realities of actual language use. Another example is provided by Römer (2006) who examines the use of English conditionals (i.e. if-clauses) in the BNC and compares them with the norms presented in EFL teaching materials in Germany. She finds mismatches between English in use and English in the textbook. She argues that this misrepresentation of English conditionals, coupled with inconsistencies across individual textbooks, then compounds the difficulties encountered by German learners of English.
Osborne’s (2004) study which identifies discrepancies between native and non-native choices in lexis, grammar and rhetoric has applications in teaching. A number of exercises have been designed, based on the corpus data, which aim to develop ‘critical linguistic distance, and to increase overall sensitivity to the characteristics of native and non-native writing’ (p. 260). Other exercises include comparisons, lexical enrichment, collocations, concordance tasks, completion exercises and proof-reading/revision activities. All of these aim to raise learners’ awareness of aspects of the target language which are in need of attention.
While corpora of learner language use are a relatively new phenomenon (see, for example, Granger 2003), they all have very clear pedagogical aims (see also Gilquin and Granger, this volume). The International Corpus of Learner English (ICLE) (Granger 1998), for example, is a two-million-word corpus of written essays by students of English representing approximately twenty different first languages. The corpus has been error-coded, which enables users to identify patterns of errors as well as the relative over- or under-use of certain forms and items. This corpus has a spoken equivalent, the Louvain International Database of Spoken English Interlanguage (LINDSEI), which can provide similar useful information about learners’ spoken language. Such corpora, of course, have been specifically designed to be an invaluable resource for learners and teachers of the English language and can be replicated for other languages. They enable teachers and learners to identify and prioritise very specific and general patterns of use which may need to be addressed.
The very successful Touchstone series (see, for example, McCarthy et al. 2005) is another example of teaching materials based on corpus evidence and demonstrates how everything from syllabi to textbook examples can be informed entirely by corpus data (see also McCarten, this volume). In another study which explores how to better exploit corpus linguistics in the classroom, O’Keeffe et al. (2007: 246–8) outline ways to facilitate synergy between corpus linguistics and language teaching. They make the important point that the links need to be in both directions and that for corpus linguistics to best inform language teaching, ‘teachers need to inform corpus linguistics’ (p. 246). For this to happen, language teachers need to engage with corpus linguistics as an integral part of their professional training and development. They argue that more of the research questions in corpus linguistics need to be language teacher-driven because they ‘arise out of practice’ and they are the best ‘mediators between corpus findings and practice’ (p. 246). They point out the lack of research into feedback on the use of corpora in language teaching which could usefully inform future corpus linguistics research (p. 247). All this takes time, and they remind us that just as teachers and learners have embraced dictionaries and grammars based on corpus evidence, so they can become comfortable with using corpora themselves in their teaching. They propose some improvements to help with this last point: wider availability of corpora and corpus tools online, and a greater diversity of corpora, especially more non-English corpora, more spoken corpora, more non-native user corpora, and more specialised corpora (p. 247). In line with others, they advocate the need to move away from native versus non-native distinctions and argue for corpora which reflect the discourse of ‘expert users’ (p. 248).
Another development advocated by O’Keeffe et al. (2007) is the compilation of multimedia corpora which currently is rarely undertaken. The most notable exception to this observation is the multi-modal corpus currently being compiled by Professor Gu Yueguo of the Chinese Academy of Social Sciences and Beijing Foreign Studies University. This corpus is based on video data and requires eight layers of analysis (e.g. orthographic, phonetic, prosodic, speech act, discourse function and various forms of non-verbal communication), and promises to provide the most complete resource to date for the study of face-to-face communication with huge implications for language teaching (Gu 2008; see also chapters by Adolphs and Knight, and Thompson, this volume).
Studies have mainly reported on ‘what can be done in language learning’ but ‘relatively little on how learners actually go about investigations’ (Kennedy and Miceli 2001: 77). For those who adopt ‘exploration learning’, ‘discovery learning’ (Bernardini 2004), ‘data-driven learning’ or DDL (Johns 1991; Tribble and Jones 1997; Turnbull and Burston 1998), concordance-based activities, for example, are designed to familiarise learners with various types of investigations and to stimulate the development of appropriate learning strategies through practice. These language learning approaches concur with the contemporary task-based language learning approach (Willis and Willis 2007) which emphasises the development of tasks and activities to engage learners in using the language. These are adopted to exploit the pedagogic context by focusing on both the authenticity of the source text and ‘its authenticity by the learner, which arises out of the involvement of the learner with the material, via the task’ (Mishan 2004: 219). In other words, the process by which learners ‘authenticate’ the corpus data as they engage in a ‘data-driven learning-cum-research exercise’ is both motivational and effective (Lee and Swales 2006: 71).
Kennedy and Miceli (2001: 82), for instance, design activities that engage learners in a four-step approach to corpus investigation: (1) formulate the question, (2) devise a search strategy, (3) observe the examples and select relevant ones, and (4) draw conclusions. Adopting a similar approach, Gavioli and Aston (2001) describe tasks which involve the learner in playing the roles of both the discourse observer and discourse participant. The learner interacts critically with both corpus and dictionary data, identifies patterns, and adapts the patterns in the revision of the essay in which s/he is the discourse participant. In so doing, the learner constructs a model from the data which s/he can authenticate in her/his own discourse. In a similar attempt to train language learners to become language researchers, Cheng et al. (2003) report that over 80 per cent of their undergraduate students found the corpus-driven and data-driven projects useful. In Chambers and O’Sullivan’s (2004) study, eight MA students of French write a short text, then receive training in corpus consultation skills, and finally try to improve the text by consulting a corpus containing similar texts with the help of concordancing software. The findings show that the learners make changes in (in descending order of frequency) ‘grammatical errors (gender and agreement, prepositions, verb forms/mood, negation and syntax); misspellings, accents and hyphens; lexico-grammatical patterning (native language interference, choice of verb and inappropriate vocabulary); and capitalisation’ (p. 158). The tools to interrogate corpora, particularly the interactive concordancer, constitute potential learning resources, as they can be used by both the teacher and the learner to investigate lexical, grammatical, semantic and pragmatic features (Zanettin 1994; Sinclair 1996, 2004; St John 2001).
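The concordancing software referred to in these studies presents each occurrence of a search word in the centre of a line, with a stretch of co-text on either side. The following is a minimal key-word-in-context (KWIC) sketch in Python, offered only as an illustration of the principle: the function name `kwic`, the context width and the sample sentences are assumptions, and the sketch does not reproduce any particular concordancer used in the studies cited.

```python
import re


def kwic(text, node, width=30):
    """Return one key-word-in-context line for every occurrence of `node`."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so that the node words line up vertically.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample text built around idiomatic uses of 'hand'.
sample = ("On the one hand, the task was easy. Things soon got out of hand. "
          "Please give a big hand to the speaker. Keep a dictionary to hand.")
for line in kwic(sample, "hand", width=25):
    print(line)
```

A learner following the four-step approach would formulate a question, run a search of this kind, select the relevant lines and then draw conclusions from the patterns around the node word.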
Another study (Kaltenbock and Mehlmauer-Larcher 2005) describes language learning tasks that enable learners to authenticate corpus data. The tasks involve learners as observers and discourse participants respectively. As observers, learners explore and analyse texts to identify patterns of language use and then, as discourse participants, they read corpus texts and carry out tasks which involve exchanges with other learners who read the same text or a similar one (Kaltenbock and Mehlmauer-Larcher 2005: 79). Other tasks adopting such an approach include learners working in groups on deducing the meanings of phrasal verbs based on one lexical word, e.g. ‘get on, get off, get by, get through’, or idioms and idiomatic use, e.g. ‘“on the one hand”, “give a big hand to”, “out of hand” and “to hand”’ (Mishan 2004: 224).
Recent studies using the search engine ConcGram (Greaves 2009; Cheng 2007; Greaves and Warren 2007) have highlighted the importance of introducing phraseology in EFL curricula, and how corpus tools may be implemented in CALL environments to help students gain knowledge of phraseological items. In their study of verb-particle combinations (VPCs), Campoy-Cubillo and Silvestre López (2008) analyse the relational unit ‘out’ as the origin of several ConcGram searches to describe its prototypical and derived phraseological uses in a corpus of spoken academic English (MICASE). Campoy-Cubillo and Silvestre López describe tasks for learners to conduct similar ConcGram searches. For instance, the tasks involve using the constituency configuration lists in ConcGram to analyse internal patterns and to present VPC complementation in the classroom, arranged by order of importance in the texts analysed; working on a specific pattern ‘V + it + out’ that is meaningful in the two-word ConcGram search and working from the most literal to the most idiomatic combinations; and designing activities that focus on the uses of the particle, e.g. general (VPCs and Particle) and specific (specific contribution of particle, ‘verb’ vs. ‘verb + out’).
Greaves and Warren (2007) outline replicable language learning activities that raise learners’ awareness of the prevalence and importance of phraseology, and help to develop in learners the computational and analytical skills needed to conduct an initial study of the phraseological profile of a text. The specifics of the activities are detailed below.
• Learners work with two texts: Policy Addresses given by the Chief Executive of Hong Kong in October 2006 and October 2005.
• Compile a list of the ten most frequent words in each text.
• Compile a list of the twenty most frequent phrases in each text.
• Monitor and record the frequency with which the most frequent words and phrases found in the 2005 Policy Address occur in the 2006 Policy Address and vice versa.
• Discuss the findings from the two texts.
• Throughout the analysis, the differing lengths of the two Policy Addresses are noted, and so any direct comparison of frequencies needs to take this into account.
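The frequency steps in these activities can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used by Greaves and Warren (2007), who worked with ConcGram: here a ‘phrase’ is approximated by a contiguous sequence of words, the tokenisation rule is deliberately simple, and all function names are hypothetical. Raw counts are converted to rates per 1,000 running words so that texts of different lengths can be compared.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngram_counts(tokens, size=1):
    """Count contiguous sequences of `size` tokens; size=1 gives single words."""
    grams = zip(*(tokens[i:] for i in range(size)))
    return Counter(" ".join(gram) for gram in grams)


def per_thousand(count, total_tokens):
    """Normalise a raw count to a rate per 1,000 running words."""
    return 1000 * count / total_tokens if total_tokens else 0.0


def compare(source_text, target_text, size=1, top=10):
    """For the `top` most frequent items in source_text, return
    (item, rate in source_text, rate in target_text), both per 1,000 words."""
    src, tgt = tokenize(source_text), tokenize(target_text)
    src_counts, tgt_counts = ngram_counts(src, size), ngram_counts(tgt, size)
    return [
        (item, per_thousand(n, len(src)), per_thousand(tgt_counts[item], len(tgt)))
        for item, n in src_counts.most_common(top)
    ]
```

With `address_2005` and `address_2006` as hypothetical strings holding the two texts, `compare(address_2005, address_2006, size=1, top=10)` covers the ten most frequent words and `compare(address_2005, address_2006, size=2, top=20)` the twenty most frequent two-word sequences; swapping the first two arguments gives the ‘vice versa’ comparison.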
Corpus linguistics and DDL have a relatively long history in the field, with many of the publications deriving from the COBUILD project specifically aimed at learners appearing in the late 1980s and 1990s. Nevertheless, as Römer (2006) notes, corpus linguistics and its applications have yet to become mainstream in language teacher education programmes or in language teaching. She cites a study by Breyer (2005), who found that corpus linguistics and corpus applications in language learning and teaching receive little attention in teacher training programmes in Germany. Römer (2006: 122) concludes that there is ‘strong resistance towards corpora from students, teachers and materials writers’.
In Hong Kong, similar observations have been made by Cheng and Warren (2007), who find mismatches between naturally occurring English and the English that is taught to learners as a model, pointing to the urgent need to improve learning and teaching materials in terms of the language forms and functions introduced to language learners. They suggest that writers of instructional materials should draw on the findings of corpus researchers, in the form of research papers, dictionaries, reference grammars and other resources, when they write and revise materials, tasks and activities. There is still a serious disconnect between abundant corpus evidence on the one hand and, on the other, the standard traditional language descriptions to be found in most language textbooks still widely used around the world.
Further evidence for Römer’s (2006) and Cheng and Warren’s (2007) conclusions regarding the current status of corpus linguistics and the use of corpora in language teaching is provided by a corpora-list subscriber who, in April 2008, posted a question asking other list members why there were fewer DDL resources available than one might think, and requested information on any published or online materials that adopt a DDL approach. A number of colleagues responded to the question, and their responses were summarised by the sender of the enquiry message. The responses included the following comments.
• Concordances are too difficult for learners. After twenty years, DDL remains a tiny minority interest.
• Individual teachers use DDL in class to meet specific needs, but do not publish their work or record it in any permanent form because it is not easy to get resources into a web format.
• People might not want to publish resources that they see as imperfect in some way.
• DDL is inaccessible to many teachers for lack of know-how or resources.
• DDL is inaccessible to many learners, as it presupposes an introspective, reflective approach to study, using new and relatively difficult technology, by learners and teachers who may not be highly motivated.
Adam Kilgarriff’s response to the comment that concordances are too difficult for learners is to select corpus sentences according to readability; the beta version of his Sketch Engine has an option to sort concordances ‘best first’ from a learner’s point of view (Kilgarriff et al. 2004). Work is also under way to enable corpora to be used in language learning in such a way that users are shown only sentences which learners are likely to be able to read and understand.
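The idea of ordering concordance lines by readability can be illustrated with a minimal Python sketch. It does not reproduce the Sketch Engine’s ‘best first’ option, whose criteria are not described here; it simply assumes, for illustration, that a sentence is easier when a larger share of its words appears in a list of words the learner is expected to know, with shorter sentences preferred when that share is equal. The function names and the `common_words` list are hypothetical.

```python
import re


def readability_key(sentence, common_words):
    """Return a sort key: higher share of known words first, then fewer words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return (0.0, 0)
    known_share = sum(1 for t in tokens if t in common_words) / len(tokens)
    # Negate the share so that an ascending sort puts the most readable first.
    return (-known_share, len(tokens))


def sort_best_first(sentences, common_words):
    """Order sentences from (assumed) easiest to hardest for a learner."""
    return sorted(sentences, key=lambda s: readability_key(s, common_words))
```

A teacher might pass in the concordance lines for a search word together with a set built from a graded word list; sentences made up mostly of unfamiliar vocabulary then fall to the end of the display.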
In another response from colleagues responsible for MICASE (the online version of the Michigan Corpus of Academic Spoken English), it is stated that there are a number of MICASE-derived teaching materials accessible through their project website on some common topics (and problem areas) in EAP and ESL teaching (e.g. hedging or say–talk–tell). There are also two interactive MICASE-based lessons for self-study: one on spoken academic English formulas and one on clarifying and confirming. In addition, there is the MICASE Kibbitzer page that covers a number of language problems that students (and teachers) can examine further such as ‘Less and fewer’ and ‘End up’.
In Hong Kong, corpus linguistics in language learning has been advocated, and the development of corpus-based learning has gone one step further than the developments proposed by Römer (2006). The Research Centre of Professional Communication in English (RCPCE) at the Hong Kong Polytechnic University has worked very closely with professional associations such as the Hong Kong Institute of Certified Public Accountants (HKICPA) and the Hong Kong Institution of Engineers (HKIE) to compile specialised corpora (publicly available online) based on the spoken and written discourse of those professions. These professional associations have served as consultants, advising on the texts and discourses that are representative of their respective professions. Under the auspices of the RCPCE, a series of seminars has been held jointly with these professional bodies for their members to promote the use of corpora as an invaluable resource to enhance professional communication. The findings of corpus-based studies should be explored with respect to their practical implications for curriculum design, materials writing, and learning and teaching, not only for general English, ESP and LSP, but also for informing professional communicative practices.
The work described in this paper was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. PolyU 5459/08H, B-Q11N).
Hunston, S. (2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press. (This book contains useful chapters on corpus data interpretation with plenty of examples.)
O’Keeffe, A. M., McCarthy, M. J. and Carter, R. A. (2007) From Corpus to Classroom. Cambridge: Cambridge University Press. (This book provides theoretical points and a large number of practical tasks for learners and teachers.)
Sinclair, J. McH. (2003) Reading Concordances. London: Longman. (This book contains eighteen tasks of various types of corpus interrogation arranged at four levels of task complexity, with a large amount of theoretical input presented mainly in the keys to the tasks.)
Allan, R. (2009) ‘Can a Graded Reader Corpus Provide “Authentic” Input?’ ELT Journal 63: 23–32.
Aston, G. (1988) Learning Comity: An Approach to the Description and Pedagogy of Interactional Speech. Bologna: Cooperativa Libraria Universitaria Editrice.
——(1997) ‘Small and Large Corpora in Language Learning’, in B. Lewandowska-Tomaszczyk and J. P. Melia (eds) Practical Applications in Language Corpora. Lodz, Poland: Lodz University Press, pp. 51–62.
Aston, G., Bernardini, S. and Stewart, D. (eds) (2004) Corpora and Language Learners. Amsterdam and Philadelphia, PA: John Benjamins.
Bernardini, S. (2004) ‘Corpora in the Classroom: An Overview and Some Reflections on Future Developments’, in J. Sinclair (ed.) How to Use Corpora in Language Teaching. Amsterdam and Philadelphia, PA: John Benjamins, pp. 15–36.
Biber, D. (2001) ‘Using Corpus-based Methods to Investigate Grammar and Use: Some Case Studies on the Use of Verbs in English’, in R. Simpson and J. Swales (eds) Corpus Linguistics in North America. Michigan: University of Michigan Press, pp. 101–15.
Biber, D. and Reppen, R. (2002) ‘What Does Frequency Have to Do with Grammar Teaching?’ Studies in Second Language Acquisition 24: 199–208.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) Longman Grammar of Spoken and Written English. Harlow: Longman.
Bowker, L. and Pearson, J. (2002) Working with Specialized Language: A Practical Guide to Using Corpora. London: Routledge.
Braun, S. (2005) ‘From Pedagogically Relevant Corpora to Authentic Language Learning Contents’, ReCALL 17(1): 47–64.
Breyer, Y. (2005) ‘Love’s Labour’s Lost: The Troublesome Relationship between Corpus Linguistics Research and Its Application in EFL Teacher Training in Germany’, paper presented at Corpus Linguistics 2005, University of Birmingham, UK, July.
Campoy-Cubillo, M. C. and Silvestre López, A. J. (2008) ‘Congraming “Out” in a Spoken Academic Corpus’, paper presented in Department of English, the Hong Kong Polytechnic University, 12 December.
Carter, R. and McCarthy, M. (2001) ‘Size Isn’t Everything: Spoken English, Corpus, and the Classroom’, TESOL Quarterly 35: 337–40.
——(2006) Cambridge Grammar of English: A Comprehensive Guide: Spoken and Written English: Grammar and Usage. Cambridge: Cambridge University Press.
Chambers, A. (2005) ‘Integrating Corpus Consultation in Language Studies’, Language Learning and Technology 9(2): 111–25.
Chambers, A. and O’Sullivan, I. (2004) ‘Corpus Consultation and Advanced Learners’ Writing Skills in French’, ReCALL 16(1): 158–72.
Cheng, W. (2006) ‘Describing the Extended Meanings of Lexical Cohesion in a Corpus of SARS Spoken Discourse’, in J. Flowerdew and M. Mahlberg (eds) ‘Corpus Linguistics and Lexical Cohesion’, Special Issue of International Journal of Corpus Linguistics 11(3): 325–44.
——(2007) ‘Concgramming: A Corpus-driven Approach to Learning the Phraseology of Discipline-specific Texts’, CORELL: Computer Resources for Language Learning 1: 22–35.
Cheng, W. and Warren, M. (1999) ‘Inexplicitness: What Is It and Should We Be Teaching It?’, Applied Linguistics 20(3): 293–315.
——(2003) ‘Indirectness, Inexplicitness and Vagueness Made Clearer’, Pragmatics 13(3/4): 381–400.
——(2005) ‘// well I have a DIFferent // THINking you know //: A Corpus-driven Study of Disagreement in Hong Kong Business Discourse’, in M. Gotti and F. Bargiela (eds) Asian Business Discourse(s). Frankfurt am Main: Peter Lang, pp. 241–70.
——(2006) ‘“I would say be very careful of …”: Opine Markers in an Intercultural Business Corpus of Spoken English’, in J. Bamford and M. Bondi (eds) Managing Interaction in Professional Discourse. Intercultural and Interdiscoursal Perspectives. Rome: Officina Edizioni, pp. 46–58.
——(2007) ‘Checking Understandings in an Intercultural Corpus of Spoken English’, in A. O’Keeffe and S. Walsh (eds) ‘Corpus-based Studies of Language Awareness’, Special Issue of Language Awareness 16(3): 190–207.
Cheng, W., Warren, M. and Xu, X. (2003) ‘The Language Learner as Language Researcher: Corpus Linguistics on the Timetable’, System 31(2): 173–86.
Cheng, W., Greaves, C. and Warren, M. (2006) ‘From N-gram to Skipgram to Concgram’, International Journal of Corpus Linguistics 11(4): 411–33.
Fligelstone, S. (1993) ‘Some Reflections on the Question of Teaching, from a Corpus Linguistics Perspective’, ICAME Journal 17: 97–110.
Gavioli, L. and Aston, G. (2001) ‘Enriching Reality: Language Corpora in Language Pedagogy’, ELT Journal 55(3): 238–46.
Granger, S. (ed.) (1998) Learner English on Computer. Austin, TX: Addison Wesley Longman.
——(2003) ‘The International Corpus of Learner English: A New Resource for Foreign Language Learning and Teaching and Second Language Acquisition Research’, TESOL Quarterly 37(3): 538–46.
Greaves, C. (2009) ConcGram 1.0. Amsterdam: Benjamins.
Greaves, C. and Warren, M. (2007) ‘Concgramming: A Computer-driven Approach to Learning the Phraseology of English’, ReCALL Journal 17(3): 287–306.
Gu, Y. G. (2008) ‘Come to Grips with Video Data: Introducing Techniques in Agent-oriented Modeling and Video Data-mining’, plenary paper presented at Partnerships in Action: Research, Practice and Training Inaugural Conference of the Asia-Pacific Rim LSP and Professional Communication Association, City University of Hong Kong and the Hong Kong Polytechnic University, 8–10 December.
Hewings, M. (2005) Advanced Grammar in Use, second edition. Cambridge: Cambridge University Press.
Johns, T. (1991) ‘From Printout to Handout: Grammar and Vocabulary Teaching in the Context of Data-driven Learning’, English Language Research Journal 4: 27–45.
——(1997) ‘Contexts: The Background, Development, and Trialling of a Concordance-based CALL Program’, in A. Wichmann, S. Fligelstone, T. McEnery and G. Knowles (eds) Teaching and Language Corpora. London: Longman, pp. 100–15.
Kaltenbock, G. and Mehlmauer-Larcher, B. (2005) ‘Computer Corpora and the Language Classroom: On the Potential and Limitations of Computer Corpora in Language Teaching’, ReCALL 17(1): 65–84.
Kennedy, C. and Miceli, T. (2001) ‘An Evaluation of Intermediate Students’ Approaches To Corpus Investigation’, Language Learning and Technology 5(3): 77–90.
Kennedy, G. (2003) ‘Amplifier Collocations in the British National Corpus: Implications for English Language Teaching’, TESOL Quarterly 37(3): 467–87.
Kilgarriff, A., Rychly, P., Smrz, P. and Tugwell, D. (2004) ‘The Sketch Engine’, Proc. Euralex (Lorient, France, July): 105–16.
Lee, D. and Swales, J. (2006) ‘A Corpus-based EAP Course for NNS Doctoral Students: Moving from Available Specialized Corpora to Self-compiled Corpora’, English for Specific Purposes 25: 56–75.
Leech, G. (1986) ‘Automatic Grammatical Analysis and Its Educational Applications’, in G. Leech and C. Candlin (eds) Computers in English Language Teaching and Research. London: Longman, pp. 205–14.
——(1997) ‘Teaching and Language Corpora: A Convergence’, in A. Wichmann, S. Fligelstone, A. M. McEnery and G. Knowles (eds) Teaching and Language Corpora. London: Addison Wesley Longman, pp. 1–23.
McCarthy, M. and Carter, R. (1994) Language as Discourse: Perspectives for Language Teaching. London: Longman.
——(1995) ‘Spoken Grammar: What Is It and How Do We Teach It?’ ELT Journal 49: 207–18.
——(2004) ‘Size Isn’t Everything: Spoken English, Corpus, and the Classroom’, TESOL Quarterly 35(2): 337–40.
McCarthy, M., McCarten, J. and Sandiford, H. (2005) Touchstone Student’s Book 2A with Audio CD/CD-ROM. Cambridge: Cambridge University Press.
Mishan, F. (2004) ‘Authenticating Corpora for Language Learning: A Problem and Its Resolution’, ELT Journal 58(3): 219–27.
O’Keeffe, A., McCarthy, M. J. and Carter, R. A. (2007) From Corpus to Classroom. Cambridge: Cambridge University Press.
Osborne, J. (2004) ‘Top-down and Bottom-up Approaches to Corpora in Language Teaching’, in U. Connor and T. A. Upton (eds) Applied Corpus Linguistics. A Multidimensional Perspective. London: Rodopi, pp. 251–65.
Policy Address (2006/7) www.policyaddress.gov.hk/06-07/eng/pdf/speech.pdf
Renouf, A. (1997) ‘Teaching Corpus Linguistics to Teachers of English’, in A. Wichmann, S. Fligelstone, T. McEnery and G. Knowles (eds) Teaching and Language Corpora. London: Longman, pp. 255–66.
Roe, P. (2000) ‘The ASTCOVEA German Grammar in conText Project’, in B. Dodd (ed.) Working with German Corpora. Birmingham: University of Birmingham Press, pp. 199–216.
Römer, U. (2005) Progressives, Patterns, Pedagogy: A Corpus-driven Approach to English Progressive Forms, Functions, Contexts, and Didactics. Amsterdam: John Benjamins.
——(2006) ‘Pedagogical Applications of Corpora: Some Reflections on the Current Scope and a Wish List for Future Developments’, Zeitschrift für Anglistik und Amerikanistik 54(2): 121–34.
Sinclair, J. McH. (1987) ‘The Nature of the Evidence’, in J. McH. Sinclair (ed.) Looking Up: An Account of the COBUILD Project in Lexical Computing. London: Collins, pp. 150–9.
——(1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.
——(1996) ‘The Search for Units of Meaning’, Textus 9(1): 75–106.
——(2003) Reading Concordances. London: Longman.
——(2004) ‘Introduction’, in J. M. Sinclair (ed.) How to Use Corpora in Language Teaching. Amsterdam and Philadelphia: John Benjamins, pp. 1–13.
——(2005) ‘Corpus and Text. Basic Principles’, in Martin Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books, pp. 1–16.
Sinclair, J. McH. and Kirby, D. M. (1990) ‘Progress in English Computational Lexicography’, World Englishes 9: 21–36.
Sinclair, J. McH. and Mauranen, A. (2006) Linear Unit Grammar. Amsterdam: John Benjamins.
St John, E. (2001) ‘A Case for Using a Parallel Corpus and Concordance for Beginners of a Foreign Language’, Language Learning and Technology 5(3): 185–203.
Tao, H. (2001) ‘Discovering the Usual with Corpora: The Case of remember’, in R. Simpson and J. Swales (eds) Corpus Linguistics in North America: Selections from the 1999 Symposium. Ann Arbor, MI: University of Michigan Press, pp. 116–44.
Teubert, W. (2005) ‘Evaluation and Its Discontents’, in E. Tognini Bonelli and G. Del Lungo Camiciotti (eds) Strategies in Academic Discourse. Amsterdam: John Benjamins, pp. 185–204.
Tognini Bonelli, E. (2001) Corpus Linguistics at Work. Amsterdam: John Benjamins.
Tribble, C. and Jones, G. (1997) Concordances in the Classroom: Using Corpora in Language Education. Houston, TX: Athelstan.
Turnbull, J. and Burston, J. (1998) ‘Towards Independent Concordance Work for Students: Lessons from a Case Study’, ON-CALL 12(2): 10–21.
Widdowson, H. G. (1978) Teaching Language as Communication. Oxford: Oxford University Press.
——(1991) ‘Context, Community and Authentic Language’, TESOL Quarterly 32(4): 705–15.
——(1998) ‘EIL: Squaring the Circles. A Reply’, World Englishes 17(3): 397–401.
——(2003) Defining Issues in English Language Teaching. Oxford: Oxford University Press.
Willis, D. and Willis, J. (2007) Doing Task-based Teaching. Oxford: Oxford University Press.
Yoon, H. and Hirvela, A. (2004) ‘ESL Student Attitudes toward Corpus Use in L2 Writing’, Journal of Second Language Writing 13: 257–83.
Zanettin, F. (1994) ‘Parallel Words: Designing a Bilingual Database for Translation Activities’, in A. Wilson and T. McEnery (eds) Corpora in Language Education and Research: A Selection of Papers from TALC94 (Technical Papers) 4. Lancaster, UK: UCREL, pp. 99–111.