How can corpora be used in language testing?

Fiona Barker

1. What is language testing?

This chapter considers how corpora can be used in the field of language testing. Referred to as Language Testing and Assessment (LTA), this field is concerned with measuring the language proficiency of individuals in a variety of contexts and for a range of purposes, assessing language knowledge, performance or application. The general aim of language testing is to measure a latent trait in order to make inferences about an individual’s language ability. Language tests allow us to observe behaviours which can be evaluated by attaching test scores which provide evidence for an individual’s ability in a specific skill or their overall language competence. For an introduction to LTA and its terminology see ALTE Members (1998), McNamara (2000) and Hughes (2003).

One useful way of thinking about testing language proficiency is to explore the essential principles, or test qualities, that underlie good assessment. There are various elaborations of such principles in the LTA literature, following Bachman’s ‘essential measurement qualities’ of Validity and Reliability (Bachman 1990: 24–6) and the addition of Impact and Practicality (Cambridge ESOL 2008). Validity means that a test measures what it sets out to measure, so that a test score reflects an underlying ability accurately (there are many types of Validity; see Weir 2005). Reliability means that a test’s results are accurate, consistent and dependable. Impact is the effect, preferably positive, which a test has on test takers and other stakeholders, including society. A test score needs to be meaningful to those who base decisions on it for purposes such as admission to university or access to professional training. The fourth principle, Practicality, means that a test is practicable in terms of the resources needed to write, deliver and mark the test. In order for language proficiency to be assessed successfully, the purpose of testing must be clearly defined and the approach taken must be ‘fit for purpose’. Key considerations in designing and administering tests that will be explored further below include: establishing the purpose for a test; deciding what to test and how to test this; and working out how to score and report a result. We shall explore these areas of language testing before suggesting how corpora relate to these different stages in the lifecycle of a test (Saville 2003).
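
To make one of these qualities concrete: Reliability is typically reported as a coefficient estimated from item-level scores. The sketch below, which assumes a small invented matrix of dichotomously scored items, computes Cronbach’s alpha, one standard internal-consistency estimate; it is purely illustrative and is not drawn from any of the sources cited here. Values closer to 1 indicate that the items measure the underlying trait more consistently.

```python
# Minimal sketch: estimating internal-consistency reliability (Cronbach's alpha)
# from item-level scores. The data below are invented for illustration only.

def cronbach_alpha(item_scores):
    """item_scores: list of test takers, each a list of item scores of equal length."""
    n_items = len(item_scores[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    # Variance of each item across test takers
    item_vars = [variance([person[i] for person in item_scores]) for i in range(n_items)]
    # Variance of total scores
    totals = [sum(person) for person in item_scores]
    total_var = variance(totals)
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Five test takers answering four dichotomously scored items (1 = correct)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```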

The purpose of a language test may arise from the need to assess students’ current level of general language proficiency for placement into a suitable language learning class, or their suitability to enter a particular course or to study abroad. A test can also be used to measure students’ progress through or achievement at the end of a language learning course, or to give them a widely recognised qualification that allows access to educational, professional or other opportunities. Market forces also drive the provision of new tests, as do governments, education ministries, companies and professional bodies that require tests for specific groups, e.g. school leavers, or for specific purposes, e.g. the accreditation of overseas medical staff. Alongside a well-defined test purpose, we need to understand the historical and social context of any language test. In the USA, for example, the psychometric tradition (viewing language proficiency as a psychological trait that can be measured along a single dimension; see ALTE Members 1998: 158, 168) has favoured ‘objective’ test formats such as multiple-choice questions, which are better suited to testing discrete aspects of language, for example specific grammatical points, whereas the testing tradition in Western Europe, and in the UK in particular, has maintained its interface with pedagogy and its focus on task-based testing. This legacy is reflected in the history of the oldest English language exam in existence, Cambridge ESOL’s Certificate of Proficiency in English, which celebrates its centenary in 2013 (see Weir 2003 for its development; see Spolsky 1995 for the US tradition and its influence on the TOEFL, the Test of English as a Foreign Language).

The next consideration is deciding what to test, i.e. the abstract construct that is to be realised in an assessment instrument. This construct incorporates the tester’s view of how to approach the assessment of language proficiency, for example by testing individual skills which are commonly grouped into receptive (reading and listening), productive (writing and speaking) and enabling skills (including grammatical, vocabulary and pragmatic knowledge). Sometimes two skills are assessed together by means of an integrative task where, for example, a test taker has to combine spoken and written input with their own ideas to produce a written response (see McNamara 2000: 14). All language testers need to establish and explain their own approach to assessing language, for example in their publicity and support materials.

The test construct informs the specification of how each skill is operationalised in specific task formats and how each skill should be scored or rated. The test designer needs to work out how to test the construct, i.e. what task types to use to get the test taker to recognise, produce or manipulate the required linguistic phenomenon (for example, the most appropriate word to fill a gap). Part of this process involves identifying the functions and facets of the skill that are to be tested, which are normally stated in test specifications documentation available to test writers; a simpler version may also be available for test takers, teachers and other stakeholders. For example, when testing reading at an elementary level, level A2 on the Common European Framework of Reference (CEFR, Council of Europe 2001), Cambridge ESOL tests ‘a range of reading skills with a variety of texts, ranging from very short notices to longer continuous texts’ with assessment focusing on the ‘ability to understand the meaning of written English at word, phrase, sentence, paragraph and whole text level’ (UCLES 2008: 2). The tasks used to assess language proficiency at any level should match the cognitive and psychological ability of the test taker as well as their linguistic knowledge; these are some of the test taker characteristics that influence how an individual performs on any test and which form part of a socio-cognitive framework for test development (see Weir 2005). There are various task formats, some used for one skill or level, others being used for many skills and across levels. Common task types for assessing receptive skills (reading and listening) include gapfill, multiple matching and multiple-choice questions. A gapfill (or cloze) task contains a prose passage or set of sentences with gaps where the correct word or phrase needs to be inserted (open cloze) or selected (multiple-choice cloze). Multiple matching involves matching paragraphs, sentences or words with the correct picture, definition, etc., or re-arranging them in the most logical order. Multiple-choice questions include a question and a set of answers, one or more of which are correct. Tasks that test productive skills tend to include a rubric (the instructions the test taker reads or hears) and there may also be some written or visual input (the prompt), for example a letter or a diagram. There are also word, phrase or sentence formation tasks and error identification or correction tasks that may be used for either receptive or productive skills.
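
As an illustration of one of these formats, the sketch below generates a very crude open-cloze task from a short passage by blanking out a hypothetical set of function-word targets; real item writing involves far more judgement about which gaps are fair and answerable, so this is a toy example only.

```python
# Minimal sketch: turning a short passage into an open-cloze task by blanking
# out a hypothetical set of target words (e.g. prepositions and articles an
# item writer has chosen). Illustrative only.

import re

TARGETS = {"the", "a", "an", "in", "on", "at", "of", "to"}  # assumed target set

def make_open_cloze(passage, max_gaps=5):
    tokens = re.findall(r"\w+|[^\w\s]", passage)
    gapped, key, n = [], [], 0
    for tok in tokens:
        if tok.lower() in TARGETS and n < max_gaps:
            n += 1
            gapped.append(f"({n}) ______")
            key.append((n, tok))
        else:
            gapped.append(tok)
    text = " ".join(gapped)
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # reattach punctuation
    return text, key

passage = "The museum is in the centre of the town and opens at nine."
task, answer_key = make_open_cloze(passage)
print(task)
print(answer_key)
```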

A final aspect of task design concerns the authenticity of the format and content, which relates to the real-worldness of the activity given to the test taker. In the Western European tradition, the testing of writing and speaking is as realistic as possible, with test takers interacting with each other and examiners in speaking tests, and writing tasks requiring them to undertake life-like communicative activities, so that they can demonstrate aspects of their Communicative Language Ability (Bachman 1990: 84; also see Purpura 2008). Inferences based on test scores are made by test users (teachers, employers, etc.), so it is essential that language testers ensure that what is being tested is relevant and appropriate to what is required by these users; i.e. that a language test actually assesses what test takers need to be able to demonstrate for a specific purpose or situation.

Corpora, as generalist or specialist collections of texts, clearly have a role to play in helping language testers to decide on the constructs that they intend to test, by providing evidence of what is involved in expert, or native-like, texts of various types. In relation to test design, corpora have a role in helping language testers to write more realistic tasks, by basing them on or by taking inspiration from real-life texts. Additionally, evidence from corpora, particularly collocational or colligational information, is extremely useful in informing, ratifying or refuting test designers’ intuitions about what should or should not be tested in a specific area, for example lexico-grammatical knowledge.
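
A minimal illustration of how such collocational evidence might be consulted is sketched below: candidate verb + noun pairings are counted in a tiny, invented tokenised corpus to see which combinations proficient users actually produce. The corpus, the verbs checked and the window size are assumptions made purely for the example.

```python
# Minimal sketch: checking an item writer's intuition about a collocation
# against counts in a tiny, invented tokenised reference corpus.

corpus_tokens = ("we need to make a decision soon . she made a decision quickly . "
                 "they will take a decision at the meeting .").split()

def collocation_count(verb, noun, window=3):
    """Count occurrences of `verb` followed by `noun` within `window` tokens."""
    count = 0
    for i, tok in enumerate(corpus_tokens):
        if tok == verb and noun in corpus_tokens[i + 1:i + 1 + window]:
            count += 1
    return count

# Which verbs do proficient users actually pair with 'decision'?
for verb in ("make", "made", "take", "do"):
    print(verb, "+ decision:", collocation_count(verb, "decision"))
```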

The final consideration in test design is working out how to rate or score a test. This includes deciding whether to mark according to a description of what a satisfactory performance is on a particular task (criterion referenced), or comparing an individual’s response to those of other test takers (norm referenced), or to use a combination of both approaches. This decision is linked to how the test results will be used, whether to measure what a learner can already do in a specific or general area, or whether to suggest the next steps to be taken in learning a particular language (see McNamara 2000: 62–6). Corpora also have a role in this process, as evidence of what both learners and proficient users of a language or particular variety can do leads to the use of such data to describe typical performances at various levels. The Association of Language Testers in Europe (ALTE) has developed a set of ‘Can-do’ statements aligned to the CEFR which illustrate general ability at each of the six CEFR levels and typical ability in three skill areas of Social and Tourist, Work, and Study, such as ‘CAN write letters on any subject and full notes of meetings or seminars with good expression and accuracy’ (C2 level writing overall general ability; see ALTE Members 2008).
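
The difference between the two approaches can be illustrated very simply: the sketch below interprets the same invented raw score once against a fixed cut score (criterion referenced) and once as a percentile rank within a cohort (norm referenced). The cut score and cohort scores are invented for illustration and do not come from any test cited here.

```python
# Minimal sketch contrasting criterion-referenced and norm-referenced
# interpretations of the same raw score. All values are invented.

def criterion_referenced(score, cut_score=24):
    """Pass/fail against a fixed description of satisfactory performance."""
    return "pass" if score >= cut_score else "fail"

def norm_referenced(score, cohort_scores):
    """Percentile rank: proportion of the cohort scoring at or below this score."""
    at_or_below = sum(1 for s in cohort_scores if s <= score)
    return 100 * at_or_below / len(cohort_scores)

cohort = [12, 18, 21, 23, 24, 26, 27, 29, 31, 35]
my_score = 26
print(criterion_referenced(my_score))                 # 'pass' against the cut score
print(f"{norm_referenced(my_score, cohort):.0f}th percentile in this cohort")
```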

Beyond establishing the purpose of a language test, deciding on the construct and the means of testing and rating this, there are wider societal aspects of language testing, such as the moral and ethical dimension which includes maintaining fairness and equality of access for all test takers (see McNamara 2000: 67–77). Clearly language testing and assessment involve many inter-related aspects and take place in a variety of contexts, for various purposes, involving both experts and non-experts who may have to design or administer tests or make decisions based on test results. There are formal, external, potentially life-changing assessments of language proficiency (high stakes tests) that contrast with lower stakes tests which are important in a localised context but have fewer life-changing implications for the test taker and are therefore less well-researched and reported in the LTA literature.

Language testing occurs in a range of local, regional, national and international contexts; language tests should take into account both the social context and the cognitive make-up of test takers, and they have varying degrees of impact on them. Within this complex picture, where do, and should, corpora fit? It is clear from this volume that Corpus Linguistics (CL) has contributed to a range of pure and applied fields of study for many years. Corpora of various types are currently used for theoretical and applied research in various fields associated with LTA, including the development of teaching materials and publishing. They also have both practical and theoretical uses in large-scale language testing carried out by awarding institutions as well as in smaller-scale, more localised learning and teaching contexts (see Taylor and Barker 2008 for an overview). The following sections outline the development and current use of corpora in language testing and look to the future of corpus-informed language testing.

2. The development of corpus use in language testing

One of the first direct involvements of corpora in the field of language testing occurred with the development of electronic collections of learner data by examination boards, publishers and academic institutions. In the UK, the Cambridge Learner Corpus (hereafter CLC) was set up by the EFL Division of the University of Cambridge Local Examinations Syndicate (UCLES EFL) and Cambridge University Press in the early 1990s as an archive of general English examination scripts together with accompanying demographic and score data (see the CLC website for information and corpus-informed publications). The CLC was designed as a unique collection of learner writing at various proficiency levels; it has expanded to include other domains and proficiency levels of English beyond the initial three levels of general English exams at its inception (the First Certificate in English, Certificate in Advanced English and Certificate of Proficiency in English). This corpus is used for exam validation and related research, as described below.

Another key learner corpus initiated around this time is the International Corpus of Learner English (ICLE), developed at the Centre for English Corpus Linguistics (CECL) in Belgium (Granger et al. 2002). The CECL team have continued to develop a suite of comparative native and learner corpora to support their Contrastive Interlanguage Analysis approach (Granger 1996; also see Gilquin and Granger, this volume). ICLE consists of argumentative essays and literature papers collected by research teams worldwide; contributors are required to provide writing from advanced students, defined as ‘university students of English in their 3rd or 4th year of study’, with any ‘doubtful cases’ being checked by the corpus developers (a detailed bibliography is available online; see Centre for English Corpus Linguistics 2008). Establishing the nature of language proficiency at different levels is vital for language testers seeking to design tests that either aim to assess candidates at a particular proficiency level or report results across part of or the whole proficiency scale. Language testers therefore need to use corpus data to identify the linguistic exponents of a particular proficiency level, which can only be done reliably if a learner’s level is correctly identified and recorded in a corpus.

Other corpora have been developed specifically to inform tests of English or other languages. For reasons including data protection and commercial sensitivity, some of these corpora are only accessible outside the contributing institutions under certain conditions, including through funded research awards (see the IELTS or TOEFL websites). An International English Language Testing System (IELTS) funded study using exam scripts is reported in Kennedy and Thorp (2007). This research highlights key features of L2 writing performance at different levels of proficiency, which led to a reformulation of the band descriptors used to assess IELTS writing. A domain-specific corpus that informs language test development is the TOEFL 2000 Spoken and Written Academic Language Corpus (T2K-SWAL) built by Educational Testing Service (ETS; Biber et al. 2004). This contains spoken and written texts that typify American academic discourses, and it has been used to design receptive components of the TOEFL 2000 exam and also to enhance research into US academic registers. Biber et al. (2004) identified representative patterns of language use in each register, followed by the development of a suite of tools to compare reading and listening tasks from the TOEFL 2000 test with the corpus data. Other corpora of academic English include MICASE for American English and BASE and BAWE for British English (see their websites).

It is clear that language testers started both to develop and to find ways of using learner and native corpora in the 1990s. However, there was little discussion of corpora in the LTA literature until Alderson (1996) outlined various applications of corpora in language testing, including writing test items and scoring tests. A decade later, Alderson investigated native speaker judgements of frequency for languages without large corpora; the results indicate a surprising lack of agreement among the expert raters, which suggests that corpora are indeed the best way to obtain reliable word frequency measures (Alderson 2007: 407). The benefits of reference or domain-specific native corpora for language testing are now established, and for a number of years language testers have been taking into account native and learner evidence, in the same way that publishers, authors and teachers have been doing since the 1980s.

In the lengthy and complex process of designing and administering language tests, corpora can be used in various ways. On a practical level, native corpora provide evidence that test writers use alongside their intuitions and experience to decide whether a particular structure or phrase is sufficiently common in a particular language variety for inclusion in a test (either as input material or as an item to be tested). For example, Cambridge ESOL has used native reference corpora such as the British National Corpus (BNC; see website) to inform the test writing process for a number of years (see Saville 2003). Barker (2004) describes how native and learner evidence informs the development of word and structure lists used by test writers to produce question papers for specific levels. Learner corpora allow comparisons of task performance over time or between proficiency levels, and analysing learner errors can suggest what could be tested to distinguish between particular levels. Hargreaves (2000) describes how native reference corpora were used alongside a corpus of candidates’ scripts to identify collocations to feed into a new task type within an advanced general English test. On a more theoretical level, corpora are used to develop rating scales and linguistic descriptions of proficiency levels of learner writing and speech (see Hawkey and Barker 2004; Hendriks 2008).
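
A very simple form of such frequency evidence is a per-million-token count for a candidate word or phrase; the sketch below computes this on a tiny invented corpus, whereas in practice test writers would consult large reference corpora such as the BNC and apply much richer criteria than raw frequency.

```python
# Minimal sketch: using frequency evidence from a (tiny, invented) reference
# corpus to judge whether candidate words are common enough to test at a
# given level. Data and candidate words are illustrative assumptions.

from collections import Counter

reference_tokens = ("the students read the book and the teacher asked questions "
                    "about the book while the students took notes").lower().split()

freq = Counter(reference_tokens)
total = sum(freq.values())

def per_million(word):
    return 1_000_000 * freq[word.lower()] / total

# Candidate items an item writer is considering for a lower-level paper
for word in ("book", "notes", "notwithstanding"):
    print(f"{word}: {per_million(word):.0f} per million tokens")
```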

This review suggests that the use of corpora in language testing is fairly recent compared to their use in related fields such as lexicography or language teaching (see chapters by Coxhead, Flowerdew, Jones and Durrant, and Walter, this volume). Nevertheless, language testers develop specialist corpora of learner output and test materials, and use both native and learner corpora to inform the development and validation of their examinations in the ways outlined above. Essentially, corpus data reveal what people actually do with language in real life, learning and testing contexts. Corpora therefore can be used to inform our understanding of learner and expert discourse in various domains, as well as suggesting suitable aspects of language to test or to avoid and, additionally, to show what differentiates learners of different proficiency levels from each other and from an expert user of the language. We will now consider in more depth the ways in which learner corpora can be used to inform language testing and assessment.

3. How can we use a learner corpus to inform language testing?

This section describes some examples of the use of learner corpora in LTA. Learner corpora tend to be designed for a specific purpose from the outset, enhancing their applicability in certain areas, for example the study of academic discourse, while limiting their generalisability beyond their specific focus, meaning that they would be inappropriate for lexicographic work requiring a large general language sample. We therefore start with some recent learner corpus developments which could have direct applications for LTA, before turning to specific ways in which language testers use learner corpora.

The Varieties of English for Specific Purposes Database (VESPA), initiated in 2008, contains written ESP texts from various L1 groups spanning academic disciplines, text types and writer expertise (see Centre for English Corpus Linguistics website). Contributors are encouraged to undertake collaborative research, one of the strengths of multinational and inter-disciplinary corpus-building enterprises, although the contributor guidelines stress that only academic research will be permissible, which could be viewed as a missed opportunity for improving the formal assessment of language for specific purposes. Another CECL resource under development is the LONGDALE Project (Longitudinal Database of Learner English) which will include written and spoken longitudinal data collected by European, Asian and American researchers (see Centre for English Corpus Linguistics website). Alongside learners’ production, demographic details and task information are being collected and, importantly for ascertaining learner levels, all students are required to take two language tests whenever they contribute to the corpus in order to obtain an objective measure of their proficiency level (see Centre for English Corpus Linguistics website). This approach is similar to that being taken in collection of new corpora for the English Profile Programme (see Alexopoulou 2008). It is not yet clear whether any commercial research will be permissible using this resource, but academic research will surely lead to improvements in our understanding of SLA and teaching practices.

Learner corpora of other languages under development include a spoken corpus of L2 Italian, the Lexicon of Spoken Italian by Foreigners (LIPS), which was developed in Siena (Barni and Gallina 2009). This corpus contains 1,500 speaking tests from the Certificate of Italian as a Foreign Language (CILS) and researchers have used this corpus to investigate lexical acquisition and development, and to develop word lists for comparing vocabulary size and range with native data (Gallina 2008). The LIPS corpus has various planned applications in publishing, teaching and assessment: it will inform a dictionary for L2 Italian; it will be used to develop curricula and teaching materials, to inform the selection of input texts for future versions of the CILS, and to explore learners’ proficiency in writing and speaking. Future research could compare the same learners’ spoken data with written data, something also being planned for English Profile and the LONGDALE Project. In a comparative study, Mendikoetxea (2006) reports on the WOSLAC project, which uses two written learner corpora, WriCLE (L1 Spanish–L2 English) and CEDEL2 (L1 English–L2 Spanish), to investigate the properties which influence word order in the interlanguage of L2 learners of these languages. This has clear applications for learning, teaching and assessing such structures.

There are many ways in which learner corpora can inform various stages in the lifecycle of a language test: we shall concentrate on their applications in defining user needs and test purpose; in test design; and in task rating. In relation to user needs and test purpose, learner corpora show us what learners of a language can do at certain levels of proficiency, which can inform what is tested at a particular proficiency level, whether overall or at task or item level (see ALTE Members 2008). This adds a qualitative dimension to the language tester’s traditionally quantitative approach to analysing test data, and informs the writing of test materials and how they are rated (see Cambridge ESOL 2008). Hawkey and Barker (2004) used learner corpus data and CL analysis techniques to support tabula rasa manual analysis in the development of a Common Scale for assessing writing across a wide range of proficiency levels and types of English. Another exam board, the Testing and Certification Division of the University of Michigan English Language Institute (ELI-UM) in the USA, used a corpus of test-taker writing and speech to inform changes to their scoring criteria for the Examination for the Certificate of Competency in English (ECCE), a B2-level general English exam. The test takers’ output was analysed and the criterion-referenced rating scales revised to include five levels on four criteria (content and development; organisation and connection of ideas; linguistic range and control; and communicative effect), to better reflect the linguistic features of learners’ output (see ELI-UM Testing and Certification Division website).

In relation to test design, learner corpora can reveal much about the influence of demographic variables, test mode (paper-based or computer-based) and learning environment on learners’ output. Both learner and native corpora form part of an interdisciplinary research programme, English Profile, whose primary aim is to develop Reference Level Descriptors (RLDs) for English, a project registered with the Council of Europe (see the English Profile website). English Profile uses the thirty-million-word CLC as its starting point and is developing both written and spoken corpora to complement this resource (see Alexopoulou 2008). Data collection involves international teams submitting sets of learner data (written responses, spoken data and background information, plus self-, teacher and external assessments of proficiency level) from various educational contexts, with the aim of balancing the existing range of mother tongues, text types and proficiency levels in the CLC, as well as exploring hypotheses about specific features that seem to be criterial for identifying a specific proficiency level or L1 influence. See the English Profile website for a summary of English Profile activities, including the tagging and parsing of the CLC and analysis of learner errors and of discourse- and sentence-level features at different proficiency levels and by L1.

Learner corpora also help language testers to more accurately describe various linguistic domains. Horner and Strutt (2004) analysed business English vocabulary using a word list derived from a learner corpus; they applied four categories to the list, then asked native and non-native informants to apply these to a subset of 600 words. While both groups had difficulty in applying the categories consistently when asked to identify core and noncore vocabulary, the study identified ways of classifying vocabulary using meaning-based categories (Horner and Strutt 2004: 8).

Test writers use learner corpora to explore the collocational patterning of learners’ written or spoken production at various levels. Such analyses can show which patterns are more or less frequent at certain levels, guiding their inclusion in tests, and which errors or misuses of specific collocational pairings are most frequent, thereby suggesting suitable distractors for multiple-choice options. Learner corpora are also used to support or refute item writers’ intuitions about what learners can be expected to know at a certain level, based on evidence of what they can already produce. Furthermore, analysing learners’ most frequent errors (by first language or proficiency level) will suggest to the language tester what could be tested to distinguish between candidates at a particular level and could also inform teaching materials and practices (see Section IV in this volume on using a corpus for language pedagogy and methodology).
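
The sketch below illustrates the distractor idea on a handful of invented learner sentences: errors are marked with a hypothetical <err corr="..."> scheme (not the annotation used in any real corpus such as the CLC), and the most frequent confusions are counted as candidate multiple-choice distractors.

```python
# Minimal sketch: counting attested learner errors in a hypothetically
# annotated corpus to suggest plausible multiple-choice distractors.
# The <err corr="..."> mark-up is an assumption, not a real corpus scheme.

import re
from collections import Counter

learner_sentences = [
    'We must <err corr="make">do</err> a decision today.',
    'She <err corr="made">did</err> a decision without thinking.',
    'They <err corr="take">make</err> responsibility for the project.',
    'I want to <err corr="make">do</err> a decision about my studies.',
]

errors = Counter()
for sent in learner_sentences:
    for corr, wrong in re.findall(r'<err corr="([^"]+)">([^<]+)</err>', sent):
        errors[(wrong, corr)] += 1

# The most frequent confusions suggest distractors for a multiple-choice item
for (wrong, corr), n in errors.most_common():
    print(f"learners wrote '{wrong}' for '{corr}' ({n}x)")
```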

In relation to task rating, various researchers are exploring ways of automatically scoring or evaluating writing and speaking, which links to research to detect errors in learners’ output. In the USA, ETS started using corpora and various CL and Natural Language Processing (NLP) techniques to develop automated systems for assessing writing in the late 1990s (see Burstein et al. 2004). Tetreault and Chodorow (2008) present a system for error detection in non-native essays, focusing on preposition errors; they also evaluate other systems that use non-English corpora. In another study, Deane and Gurevich (2008) describe the use of a corpus of both native and non-native speakers responding to the same TOEFL writing test prompt, in order to identify similarities and differences between the phrasing and content of both groups’ responses. This type of research has implications for the automatic rating of writing and for systems which provide formative feedback for learners. Exam boards are starting to offer online evaluations of learners’ written production whereby learners, or their teachers, upload written text and receive back personalised feedback. This process can contribute to corpus development if the evaluation system captures data usage permissions and background information alongside learners’ language samples.
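
As a toy illustration of the general idea behind such error-detection research, and not of the ETS system itself, the sketch below flags a possibly wrong preposition by comparing it with preposition counts after the same head word in a small, invented reference corpus.

```python
# Toy sketch (not the ETS system): flagging a possibly wrong preposition by
# comparing it with preposition counts after the same head word in a small,
# invented reference corpus.

from collections import Counter, defaultdict

PREPOSITIONS = {"in", "on", "at", "of", "to", "for", "with", "about"}

reference = ("she is interested in music . he is interested in history . "
             "we are interested in the results . they depend on funding .").split()

# head word -> Counter of prepositions that follow it
prep_after = defaultdict(Counter)
for head, nxt in zip(reference, reference[1:]):
    if nxt in PREPOSITIONS:
        prep_after[head][nxt] += 1

def check(head, prep):
    counts = prep_after[head]
    if counts and counts[prep] == 0:
        best = counts.most_common(1)[0][0]
        return f"'{head} {prep}' not attested; reference prefers '{head} {best}'"
    return f"'{head} {prep}' looks fine"

print(check("interested", "about"))   # likely flagged
print(check("interested", "in"))      # attested in the reference data
```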

Language testers vary in their views on, and use of, automatic rating software, and it is worth noting that impact and practicality often drive its use. Generally speaking, the use of technology to rate learner output does not exclude human involvement in the assessment process, as no software can rate a learner’s production in exactly the same way as a human can, i.e. according to the successful completion of a communicative task. Computer software can only rate what it is trained to recognise, such as specific measurable linguistic features (vocabulary, collocations, word stress, etc.), usually by comparing a learner’s text with a standardised dataset that includes examples of performance at given levels. While it does not tire or err on the side of caution, rating software can be tricked into giving a high rating that would not be given by a human rater. Technology certainly has a role to play in rating language performance and providing evaluative feedback, but as an accompaniment to human raters, rather than as a replacement for them. An example of a package for classroom-based assessment is the CALPER GOLD (graphic online diagnostic) software, which enables teachers to enter their students’ data directly into an online tool so that they can analyse emergent structures over time, with graphical displays rather than statistical tables, thus informing teaching practices and guiding students’ assessment (McCarthy 2008: 571).
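
A minimal sketch of this feature-comparison idea appears below: two easily measurable features (type-token ratio and mean sentence length) are extracted from a short invented essay and set against invented benchmark values for two CEFR levels. Real rating systems use far richer feature sets and trained models; this is an illustration of the principle only.

```python
# Minimal sketch: comparing two simple, measurable features of a learner text
# (type-token ratio and mean sentence length) with invented benchmark values.
# Real rating systems use far richer features and trained models.

import re

BENCHMARKS = {               # assumed, purely illustrative values
    "B1": {"ttr": 0.45, "mean_sent_len": 11},
    "B2": {"ttr": 0.55, "mean_sent_len": 15},
}

def features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    ttr = len(set(tokens)) / len(tokens)            # type-token ratio
    mean_len = len(tokens) / len(sentences)         # words per sentence
    return {"ttr": round(ttr, 2), "mean_sent_len": round(mean_len, 1)}

essay = ("I think the internet is very useful for students. "
         "It helps us to find information quickly and to contact our friends. "
         "However, some people spend too much time online.")

print("observed:", features(essay))
for level, bench in BENCHMARKS.items():
    print(level, "benchmark:", bench)
```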

Having explored how learner corpora are being developed and used to inform LTA, we shall now consider how native speaker corpora are being used in the same field before looking to the future of this alliance between fields and methodologies. We should bear in mind, however, that many of the applications of learner corpora outlined above also apply to the use of native corpora so we will only outline specific additional uses of native corpora below.

4. How can we use a native speaker corpus to inform language testing?

In relation to user needs and test purpose, native corpora are of particular relevance to testing language for specific purposes; they are needed to support the recent growth in language tests for areas beyond the established academic and business domains (for example, financial and legal domains). For academic English, a key resource is MICASE, a spoken corpus of American university speech (native and non-native) (Simpson et al. 2002; see MICASE website for related studies). The ELI-UM Testing and Certification Division has used this corpus to develop and validate various examinations; for example, word frequencies have informed new listening test items for a high-level test, the Examination for the Certificate of Proficiency in English (ECPE). The analysis of candidates’ responses to a listening test revealed that listening items ‘containing MICASE phraseology’ successfully discriminated between high- and low-scoring test takers (see MICASE website). Furthermore, the test specifications used by test writers have been updated to include an indication of the range of different speech events a word occurs in as well as their frequency. Future work will compile academic frequency word lists, concentrating on lexical bundles, and core and specialised vocabulary, as Horner and Strutt (2004) attempted to do for the business domain. The ELI-UM Testing and Certification Division intend to use MICASE to obtain information about realistic speech rates and other aspects of spoken English in the academic US setting, presumably to better inform their test writers. In the UK, Brooks (2001) developed a checklist to identify communicative functions used in academic IELTS speaking tests; the resulting checklist is used to validate speaking tasks for various domains.
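
A first step towards the lexical-bundle lists mentioned above can be sketched very simply: the code below counts recurrent four-word sequences in a tiny invented fragment of academic speech. Genuine bundle studies apply frequency and dispersion thresholds to corpora of millions of words, so this is only a schematic illustration.

```python
# Minimal sketch: extracting recurrent four-word lexical bundles from a tiny,
# invented sample of academic speech. Real studies use corpora of millions of
# words and frequency/dispersion thresholds.

from collections import Counter

transcript = ("at the end of the day what we need to look at is the data "
              "and at the end of the lecture we need to look at the results").split()

bundles = Counter(zip(transcript, transcript[1:], transcript[2:], transcript[3:]))

for bundle, n in bundles.most_common():
    if n > 1:                      # keep only recurrent bundles
        print(" ".join(bundle), "-", n)
```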

In test design, native corpora are used by language testers to ensure that all parts of a test are valid and reliable. Native corpora either provide authentic texts to be used ‘as is’ or simplified, or model texts that test writers can base their own texts on. Hughes (2008) investigates the impact that the editing of authentic texts has on the language within thirteen First Certificate in English (FCE) reading passages, comparing original with edited versions and additionally comparing the lexis from the FCE reading texts with native corpus frequencies. Hughes (2008) established that these reading tasks are fair to test takers in that they provide evidence for their ability to decode English by using phraseologies they would normally expect to meet in everyday language, lending further support to the use of corpora to ensure that a test’s content is directly relevant to the world beyond the testing situation. Crossley et al. (2007) analysed eighty-four simplified and twenty-one authentic reading passages from beginner-level English textbooks, grammar books and basic-level readers, in order to describe the linguistic structures of both types of reading passages and to assess the implications for language learning. The results indicate that the simplified texts differed from authentic texts in a number of ways but there were no significant differences in the abstractness and ambiguity between the two groups of texts (Crossley et al. 2007: 1, 27). This research will be of use, therefore, to those involved in writing, or selecting, prose passages for use in examinations.
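
One simple way of comparing a passage’s lexis with reference corpus frequencies, in the spirit of the Hughes (2008) study above, is to measure how much of the passage is covered by a high-frequency word list, as sketched below; the word list here is a tiny invented stand-in for a real frequency list derived from a reference corpus.

```python
# Minimal sketch: estimating how much of a reading passage is covered by a
# high-frequency word list. The word list is a tiny invented stand-in for a
# real frequency list derived from a reference corpus.

import re

HIGH_FREQUENCY = {"the", "a", "is", "in", "of", "and", "to", "was", "were",
                  "museum", "open", "people", "city", "visit", "it"}

def coverage(passage):
    tokens = re.findall(r"[a-zA-Z']+", passage.lower())
    covered = sum(1 for t in tokens if t in HIGH_FREQUENCY)
    return covered / len(tokens)

passage = ("The museum in the city was open to people, and many chose to visit it "
           "despite the inclement weather.")
print(f"{coverage(passage):.0%} of tokens are in the high-frequency list")
```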

With regard to establishing text readability, researchers at the University of Memphis have developed Coh-Metrix, an online tool that assesses a text’s coherence and cohesion on over 600 measures, linking approaches from computational and psycho-linguistics (see Coh-Metrix website). This tool is increasingly used for L2 testing applications such as measuring L2 lexical proficiency, distinguishing between high- and low-proficiency essays (Crossley et al. 2008), and developing reading competency profiles based on reading passages in the SAT Reasoning Test, a US college entry test (VanderVeen et al. 2007).
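
As a much simpler stand-in for the hundreds of indices reported by tools such as Coh-Metrix, the sketch below computes the classic Flesch Reading Ease score with a crude vowel-group syllable counter; it is illustrative only and makes no claim to reproduce Coh-Metrix measures.

```python
# Much simpler stand-in for tools such as Coh-Metrix: the classic Flesch
# Reading Ease score, with a crude vowel-group syllable counter.

import re

def count_syllables(word):
    # Approximate: count groups of vowels (at least one per word)
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

text = ("The cat sat on the mat. "
        "Comprehensive institutional accountability necessitates considerable deliberation.")
print(f"Flesch Reading Ease: {flesch_reading_ease(text):.1f}")
```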

Most test writers know of, and some frequently use, native reference corpora and corpus-informed reference works, and some now also turn to the web for evidence (see Lee, this volume). Alongside the use of native corpora for large-scale language assessment, there is growing recognition of their use in language teaching. Shortall (2007), for example, investigates whether frequency data derived from the Bank of English (see website) provides a more realistic representation of the present perfect tense than current textbooks. What he says in relation to writing and using textbooks also applies to assessing language proficiency: ‘textbook language would more truly reflect the cross-structural hybrids commonly found in authentic language … there is clearly a role for both pedagogic necessity and frequency of occurrence in natural language in the devising of learner materials’ (Shortall 2007: 178–9). This mirrors how language testers triangulate corpus and other research methodologies to inform test validation procedures (Hawkey and Barker 2004).

A final use of CL techniques within LTA that relies on both native and learner corpora is the identification of malpractice in language tests, for example the growing use of plagiarism detection software in higher education and other contexts (also see Cotterill, this volume). Having summarised key aspects of using corpora for language testing, we now look to the future in the final section.

5. Looking to the future of corpus-informed language testing

The use of corpora in LTA has been established over the past twenty years and surely has a promising future as new corpora are developed and innovative ways of using these resources are found. The application of corpora and related analytical techniques seems poised to grow further, used alongside the expertise of the teams of professionals who develop, administer and mark language tests. Why does this seem to be the case?

The growing popularity of computer-based tests of language proficiency, whether for general language or for other domains such as business, legal or aviation language, will enable domain-specific corpora to be developed more easily than has been possible before, partly thanks to the automated collection of test takers’ demographic and other information, their language sample and their scores. Similarly, improvements in the digital recording and storing of soundfiles will make the collection of spoken corpora more straightforward, thus increasing their availability for language testers. Alongside newer types of corpora, including the web as corpus, multi-modal corpora and corpora of new language varieties, there will remain a place for regular collections of native and learner writing and speech in LTA. Increasingly, corpus development will involve individuals and teachers uploading work directly to web portals and getting back a personalised evaluation.
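
A minimal sketch of the kind of record such automated collection might capture is shown below; the field names are invented for illustration, and the point is simply that the response, background data, score and a data-use permission can be stored together for later corpus building.

```python
# Minimal sketch: a record combining a test taker's background data, response
# and score, as might be captured by a computer-based test and later pooled
# into a learner corpus. Field names are invented for illustration.

from dataclasses import dataclass, asdict
import json

@dataclass
class TestRecord:
    candidate_id: str
    first_language: str
    age: int
    task_id: str
    response_text: str
    score: float
    data_use_consent: bool     # permission to include the response in a corpus

record = TestRecord(
    candidate_id="C0001",
    first_language="Spanish",
    age=21,
    task_id="WRITING-PART2",
    response_text="Dear Sir or Madam, I am writing to ...",
    score=3.5,
    data_use_consent=True,
)
print(json.dumps(asdict(record), indent=2))
```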

The field of corpus-informed language testing is growing rapidly, aided by theoretical, technological and methodological advances in the fields of language testing and assessment and corpus linguistics. More work clearly remains to be done in this area, and this will benefit from the increasing number of corpus resources being built and shared worldwide between the corpus linguistics and language testing and assessment communities.

Further reading

ALTE Members (1998) Multilingual Glossary of Language Testing Terms (Studies in Language Testing vol. 6). Cambridge: UCLES and Cambridge University Press. (This multilingual glossary provides full definitions of technical terms in LTA in ten European languages, allowing readers to explore the LTA literature through their own language.)

Centre for English Corpus Linguistics (2008) Learner Corpus Bibliography, available at http://cecl.fltr.ucl.ac.be/learner%20corpus%20bibliography.html (accessed 23 December 2008). (This bibliography of works relating to learner corpora is relevant for those interested in using or developing learner corpora, particularly in relation to pedagogical uses and contrastive analysis.)

Taylor, L. and Barker, F. (2008) ‘Using Corpora for Language Assessment’, in E. Shohamy and N. H. Hornberger (eds) Encyclopedia of Language and Education, second edition, vol. 7, Language Testing and Assessment. New York: Springer, pp. 241–54. (This chapter summarises the development and use of corpora in language testing, focusing on high-stakes uses. It provides a summary of the development of computerised corpora since the 1960s and links to the LTA literature.)

References

Alderson, J. C. (1996) ‘Do Corpora Have a Role in Language Assessment?’ in J. A. Thomas and M. H. Short (eds) Using Corpora for Language Research. London: Longman, pp. 248–59.

——(2007) ‘Judging the Frequency of English Words’, Applied Linguistics 28(3): 383–409.

Alexopoulou, T. (2008) ‘Building New Corpora for English Profile’, Research Notes 33: 15–19, Cambridge: UCLES.

ALTE Members (1998) Multilingual Glossary of Language Testing Terms (Studies in Language Testing vol. 6). Cambridge: UCLES/Cambridge University Press.

——(2008) The Can-do Statements, available at www.alte.org/cando/index.php (accessed 28 April 2009).

Bachman, L. F. (1990) Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Barker, F. (2004) ‘Corpora and Language Assessment: Trends and Prospects’, Research Notes 26: 2–4, Cambridge: UCLES.

Barni, M. and Gallina, F. (2009) ‘Il corpus LIPS (Lessico dell’italiano parlato da stranieri): problemi di trattamento delle forme e di lemmatizzazione’, in Atti del Convegno ‘Corpora di italiano L2: tecnologie, metodi, spunti teorici’, Pavia, November. Perugia: Guerra Edizioni, pp. 139–51.

Biber, D., Conrad, S., Reppen, R., Byrd, P., Helt, M., Clark, V., Cortes, V., Csomay, E. and Urzua, A. (2004) Representing Language Use in the University: Analysis of the TOEFL 2000 Spoken and Written Academic Language Corpus (Report Number: RM-04-03, Supplemental Report Number: TOEFL-MS-25). Princeton, NJ: Educational Testing Service.

Brooks, L. (2001) ‘Converting an Observation Checklist for Use with the IELTS Speaking Test,’ Research Notes 11: 20–1, Cambridge: UCLES.

Burstein, J., Chodorow, M. and Leacock, C. (2004) ‘Automated Essay Evaluation: The Criterion Online Writing Evaluation Service’, AI Magazine 25(3): 27–36.

Cambridge ESOL (2008) Research: At the Heart of What We Do, available at www.cambridgeesol.org/what-we-do/research/index.html (accessed 21 December 2008).

Centre for English Corpus Linguistics (2008) Learner Corpus Bibliography, available at http://cecl.fltr.ucl.ac.be/learner%20corpus%20bibliography.html (accessed 23 December 2008).

Council of Europe (2001) Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.

Crossley, S. A., Louwerse, M., McCarthy, P. M. and McNamara, D. S. (2007) ‘A Linguistic Analysis of Simplified and Authentic Texts’, Modern Language Journal 91: 15–30.

Crossley, S. A., Miller, N. C., McCarthy, P. M. and McNamara, D. S. (2008) ‘Distinguishing between Low and High Proficiency Essays Using Cognitively-inspired Computational Indices’, paper presented at the 38th Annual Meeting of the Society for Computers in Psychology, Chicago, November.

Deane, P. and Gurevich, O. (2008) Applying Content Similarity Metrics to Corpus Data: Differences between Native and Non-native Speaker Responses to a TOEFL Integrated Writing Prompt (Report Number: RR-08-51). Princeton, NJ: Educational Testing Service.

Gallina, F. (2008) ‘The LIPS Corpus (Lexicon of Spoken Italian by Foreigners)’, paper presented at the international seminar New Trends in Corpus Linguistics for Language Teaching and Translation Studies. In Honour of John Sinclair, Granada, September.

Granger, S. (1996) ‘From CA to CIA and Back: An Integrated Approach to Computerized Bilingual and Learner Corpora’, in K. Aijmer, B. Altenberg and M. Johansson (eds) Languages in Contrast. Papers from a Symposium on Text-based Cross-linguistic Studies, Lund 4–5 March 1994 (Lund Studies in English 88). Lund: Lund University Press.

Granger, S., Dagneaux, E. and Meunier, F. (eds) (2002) The International Corpus of Learner English. Handbook and CD-ROM. Louvain-la-Neuve: Presses Universitaires de Louvain.

Hargreaves, P. (2000) ‘How Important is Collocation in Testing the Learner’s Language Proficiency?’ in M. Lewis (ed.) Teaching Collocation: Further Developments in the Lexical Approach. Hove: Language Teaching Publications, pp. 205–23.

Hawkey, R. and Barker, F. (2004) ‘Developing a Common Scale for the Assessment of Writing’, Assessing Writing 9: 122–59.

Hendriks, H. (2008) ‘Presenting the English Profile Programme: In Search of Criterial Features’, Research Notes 33: 7–10, Cambridge: UCLES.

Horner, D. and Strutt, P. (2004) ‘Analysing Domain-specific Lexical Categories: Evidence from the BEC Written Corpus’, Research Notes 15: 6–8, Cambridge: UCLES.

Hughes, A. (2003) Testing for Language Teachers, second edition. Cambridge: Cambridge University Press.

Hughes, G. (2008) ‘Text Organisation Features in an FCE Reading Gapped Sentence Task’, Research Notes 31: 26–31, Cambridge: UCLES.

Kennedy, C. and Thorp, D. (2007) ‘A Corpus-based Investigation of Linguistic Responses to an IELTS Academic Writing Task’, in L. Taylor and P. Falvey (eds) IELTS Collected Papers: Research in Speaking and Writing Assessment (Studies in Language Testing vol. 19). Cambridge: UCLES and Cambridge University Press, pp. 316–77.

McCarthy, M. (2008) ‘Assessing and Interpreting Corpus Information in the Teacher Education Context’, Language Teaching 41(4): 563–74.

McNamara, T. (2000) Language Testing. Oxford: Oxford University Press.

Mendikoetxea, A. (2006) ‘Exploring Word Order in Learner Corpora: The Woslac Project’, unpublished presentation to Corpus Research Group, 20 November, Lancaster, available at http://eprints.lancs.ac.uk/285/ (accessed 23 December 2008).

Purpura, J. (2008) ‘Assessing Communicative Language Ability: Models and Components’, in N. Hornberger and E. Shohamy (eds) Encyclopedia of Language and Education, second edition, vol. 7, Language Testing and Assessment. New York: Springer, pp. 53–68.

Saville, N. (2003) ‘The Process of Test Development and Revision within UCLES EFL’, in C. J. Weir and M. Milanovic (eds) Continuity and Innovation: Revising the Cambridge Proficiency in English Examination 1913–2002 (Studies in Language Testing vol. 15). Cambridge: UCLES and Cambridge University Press, pp. 57–120.

Shortall, T. (2007) ‘The L2 Syllabus: Corpus or Contrivance?’ Corpora 2(2): 157–85.

Simpson, R. C., Briggs, S. L., Ovens, J. and Swales, J. M. (2002) The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.

Spolsky, B. (1995) Measured Words: The Development of Objective Language Testing. Oxford: Oxford University Press.

Taylor, L. and Barker, F. (2008) ‘Using Corpora for Language Assessment’, in E. Shohamy and N. H. Hornberger (eds) Encyclopedia of Language and Education, second edition, vol. 7, Language Testing and Assessment. New York: Springer, pp. 241–54.

Tetreault, J. and Chodorow, M. (2008) ‘Native Judgments of Non-native Usage: Experiments in Preposition Error Detection’, paper presented at COLING Workshop on Human Judgments in Computational Linguistics, Manchester, UK, August.

UCLES (2008) KET Handbook for Teachers, available at www.cambridgeesol.org/assets/pdf/resources/teacher/ket_handbook.pdf (accessed 21 December 2008).

VanderVeen, A., Huff, K., Gierl, M., McNamara, D. S., Louwerse, M. and Graesser, A. C. (2007) ‘Developing and Validating Instructionally Relevant Reading Competency Profiles Measured by the Critical Reading Section of the SAT Reasoning Test’, in D. S. McNamara (ed.) Reading Comprehension Strategies: Theories, Interventions, and Technologies. Mahwah, NJ: Lawrence Erlbaum, pp. 137–72.

Weir, C. J. (2003) Continuity and Innovation: Revising the Cambridge Proficiency in English Examination 1913–2002 (Studies in Language Testing vol. 15). Cambridge: UCLES and Cambridge University Press.

——(2005) Language Testing and Validation: An Evidence-Based Approach. Houndmills: Palgrave Macmillan.