Xiaofei Lu
Language development refers to the process by which the language faculty develops in a human being. First language development is concerned with how children acquire the ability to use their native language, while second language development is concerned with how children and adults acquire the ability to use a second language.
Theories of first language development generally need to address at least the following three questions: what children bring to the language learning task, what mechanisms drive language acquisition, and what types of input support the language-learning system (Pence and Justice 2008). Psychologists have taken drastically different approaches to answering these questions, among which the rationalist, empiricist and pragmatist paradigms have been the most influential (Russell 2004). The rationalist approach, inspired by Chomskyan linguistics, takes the view that the language faculty does not depend on external sources for its content, but is internal to each individual. For rationalists, children are born with innate formal knowledge of a universal grammar, and they bring this domain-specific knowledge to the task of acquiring the I-language (i.e. the internal and individual language) of their native tongue. Language input is used to discover the parameters their native language uses to satisfy the universal grammar. The empiricist approach, upheld by connectionists, believes that the content of the language faculty is not innate, but is derived from perceptual experience. For empiricists, children employ domain-general mechanisms of associative learning to acquire the rules and representations of their native language through experience with sufficient speech input. The pragmatist or socio-cognitivist approach advocates that children recruit their sociocognitive capacity to actively construct their language faculty. Within this paradigm, language is viewed as a socio-cultural action, and the language development process is viewed as involving children constructing a series of models or working theories of their mother tongue from the evidence that is available to them.
Theories of second language development generally seek to address a different set of questions, including the nature of second language knowledge, the nature of interlanguage, the contributions of knowledge of the first language, the contributions of the linguistic environment, and the role of instruction (Ortega 2007). Nine contemporary theories of second language development are presented in VanPatten and Williams (2007), some of which are rooted in theories of child language development. These theories take different stances with respect to the various aspects of second language development. For example, concerning the nature of second language knowledge, the Chomskyan universal grammar theory, which is committed to nativism, argues that second language learners cannot obtain knowledge of ungrammaticality and ambiguity from linguistic input, but possess pre-existing knowledge of the grammar that constrains their learning task (White 2007). In contrast, the skill acquisition theory, which is committed to conscious processing, claims that development happens from initial representation of knowledge through proceduralisation of knowledge to eventual automatisation of knowledge (DeKeyser 2007). Still different is the Vygotskian socio-cultural theory, which views human cognition as a social faculty and posits that second language development ‘takes place through participation in cultural, linguistic, and historically formed settings’ (Lantolf and Thorne 2007: 201) and through interaction within social and material environments.
In addition to the theoretical question of how language development takes place, another important and more practical question that is of interest to teachers, researchers, parents and/or clinicians is what stage of language development a particular child or second language learner is in, or, in other words, how much a child or second language learner knows about the language system and its use at a particular point. Measurement of language development is especially important for children suffering any delay or disorder in their language development. There are a number of different ways to answer this question, including naturalistic observation; production, comprehension, and judgment tasks; formal testing; and language sample analysis, among others. In this section, we focus on how language development can be measured through analysing spoken or written language samples produced by a child or second language learner.
A number of measures of language development have been proposed and explored in the child language development literature. Some measures are based on verbal output, e.g. mean length of utterance (MLU) (Brown 1973) and number of different words (NDWs), while others are based on structural analysis, e.g. Developmental Sentence Scoring (DSS) (Lee 1974) and the Index of Productive Syntax (IPSyn) (Scarborough 1990). Both DSS and IPSyn were developed to evaluate children’s grammatical development, although they work in different ways. The DSS metric assigns a score to each sentence. It considers eight different types of grammatical forms, including indefinite pronouns, personal pronouns, main verbs, secondary or embedded verbs, conjunctions, negatives and two types of questions. Variants of the same type of grammatical form are scored differently based on the order in which children develop the ability to use them. The score of a sentence is the sum of the points for each type plus one point if the sentence is fully grammatical. The average DSS of a speaker can be computed using a representative language sample. The IPSyn metric does not apply to individual sentences, but examines the number of times fifty-six target grammatical structures are used in a sample produced by a speaker. These include various types of noun phrases, verb phrases, questions and some specific sentence structures. Each occurrence of any of the target grammatical structures in the language sample receives one point. However, a maximum of two occurrences of each structure are counted, and the maximum score a language sample can receive is therefore 112.
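The arithmetic behind the two scores can be sketched as follows. This is a minimal illustration of the scoring rules as summarised above, not a reimplementation of either published procedure: it assumes that the grammatical structures and the points for each form have already been identified by an analyst, and the function names and structure labels are hypothetical.

```python
from collections import Counter

MAX_PER_STRUCTURE = 2   # at most two occurrences of each structure are credited
N_STRUCTURES = 56       # number of target structures, as described above


def ipsyn_total(observed_structures):
    """Return the IPSyn-style total for a sample.

    observed_structures holds one (hypothetical) label per occurrence of a
    target structure; occurrences beyond the second earn no further points.
    """
    counts = Counter(observed_structures)
    return sum(min(n, MAX_PER_STRUCTURE) for n in counts.values())


def dss_sentence_score(form_points, fully_grammatical):
    """Sum the points of the scored forms, plus one sentence point if earned."""
    return sum(form_points) + (1 if fully_grammatical else 0)


def mean_dss(sentences):
    """Average sentence score over (form_points, fully_grammatical) pairs."""
    scores = [dss_sentence_score(points, ok) for points, ok in sentences]
    return sum(scores) / len(scores)
```

Under these assumptions the ceiling of 112 follows directly from `MAX_PER_STRUCTURE * N_STRUCTURES`, and a speaker's DSS is simply the mean of the sentence scores in the sample.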
In the second language development literature, a number of developmental index studies have attempted to identify objective measures of fluency, accuracy and complexity of production that can be used to index the learner’s level of development or overall proficiency in the target language. This is generally achieved by assessing the development of second language learners at known proficiency levels in the target language using various measures. Developmental measures identified in such a way allow teachers and researchers to evaluate and describe the learner’s developmental level in a more precise way. In addition, they can also be used to examine the effect of a particular pedagogical treatment on language use. Wolfe-Quintero et al. (1998) provided a comprehensive review of the measures explored in thirty-nine second and foreign language writing studies and recommended several measures that were consistently linear and significantly related to programme or school levels as the best measures of development or error. These include three measures of fluency, i.e. mean length of T-unit, where a T-unit is a main clause plus any subordinate clauses (Hunt 1965), mean length of clause, and mean length of error-free T-unit; two measures of accuracy, i.e. error-free T-units per T-unit, and errors per T-unit; two measures of grammatical complexity, i.e. clauses per T-unit, and dependent clauses per clause; and two measures of lexical complexity, i.e. total number of word types divided by the square root of two times the total number of word tokens, and total number of sophisticated word types divided by total number of word types.
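Once the relevant units have been counted, each of the recommended measures is a simple ratio. The sketch below assumes that the counts (words, T-units, clauses, errors and so on) come from prior manual or automatic annotation; the function and parameter names are illustrative, and the number of word tokens is assumed to equal the total number of words in the sample.

```python
import math


def developmental_indices(words, t_units, clauses, dependent_clauses,
                          error_free_t_units, words_in_error_free_t_units,
                          errors, word_types, sophisticated_types):
    """Compute the nine ratio measures listed above from raw counts."""
    return {
        # fluency
        "mean_length_of_t_unit": words / t_units,
        "mean_length_of_clause": words / clauses,
        "mean_length_of_error_free_t_unit":
            words_in_error_free_t_units / error_free_t_units,
        # accuracy
        "error_free_t_units_per_t_unit": error_free_t_units / t_units,
        "errors_per_t_unit": errors / t_units,
        # grammatical complexity
        "clauses_per_t_unit": clauses / t_units,
        "dependent_clauses_per_clause": dependent_clauses / clauses,
        # lexical complexity
        "types_per_sqrt_two_tokens": word_types / math.sqrt(2 * words),
        "sophisticated_types_per_type": sophisticated_types / word_types,
    }
```

For instance, a 200-word essay with 10 T-units and 100 word types would have a mean T-unit length of 20 words and a type-based lexical complexity of 100 / √400 = 5.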
In this section, we discuss several ways in which a corpus of child language development data may be used to find out more about first language development. Some of these will be illustrated using the following corpora and corpus analysis software: the Child Language Data Exchange System (CHILDES) database, the Computerized Language Analysis (CLAN) program (MacWhinney 2000), and Computerized Profiling (Long et al. 2008). We briefly introduce each of these first.
The CHILDES database contains transcripts and media data collected from conversations between young children of different ages and their parents, playmates and caretakers. These data are contributed by researchers from many different countries, following the same data collection and transcription standards. Each file in the database contains a transcript of a conversation and includes a header that encodes information about the target child or children (e.g. age, native language, whether the child is normal in terms of language development, etc.), other participants, the location and situation of the conversation, the activities that are going on during the conversation, and the researchers and coders collecting and transcribing the data. The conversation is transcribed in a one-utterance-per-line format, with the producer of each utterance clearly marked in a prefix. Each utterance is followed by another line that consists of a morphological analysis of the utterance. Any physical actions accompanying the utterance are also provided in a separate line. The CLAN program is a collection of computational tools designed to automatically analyse data transcribed in the CHILDES format. Some of the automatic analyses that the program can run on one or more files in the CHILDES database include word frequency, type/token ratio, a measure of vocabulary diversity called D (Durán et al. 2004), mean length of turn, mean length of utterance, DSS score, among others.
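To make the transcript layout concrete, the sketch below reads a small invented transcript in a simplified CHILDES-style format and derives a few of the output-based statistics mentioned above. It is not CLAN and makes strong simplifying assumptions: header lines start with `@`, utterance lines with `*` plus a speaker code, and dependent lines (morphology, actions) with `%`; continuation lines and special transcription codes are ignored; and utterance length is counted in words rather than morphemes. The sample data and function names are hypothetical.

```python
SAMPLE = "\n".join([
    "@Begin",
    "@Participants:\tCHI Target_Child, MOT Mother",
    "*MOT:\twhat is that ?",
    "*CHI:\tdoggie .",
    "%mor:\tn|doggie .",
    "*CHI:\tdoggie go .",
    "%act:\tpoints to the dog",
    "*CHI:\tmore juice .",
    "@End",
])


def utterances_by(transcript, speaker="CHI"):
    """Return one list of lower-cased words per utterance of the speaker."""
    prefix = f"*{speaker}:"
    result = []
    for line in transcript.splitlines():
        if line.startswith(prefix):
            text = line[len(prefix):]
            # keep word-like tokens only; this drops utterance terminators
            result.append([w.lower() for w in text.split() if w[0].isalpha()])
    return result


def lexical_summary(utterances):
    """Mean utterance length in words, number of different words, and TTR."""
    tokens = [w for utt in utterances for w in utt]
    types = set(tokens)
    return {
        "utterances": len(utterances),
        "mlu_words": len(tokens) / len(utterances),
        "ndw": len(types),
        "ttr": len(types) / len(tokens),
    }
```

Calling `lexical_summary(utterances_by(SAMPLE))` on the invented sample yields three child utterances, four different words and a type/token ratio of 0.8.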
Computerized Profiling is a set of programs designed to analyse both written language samples and phonetically transcribed spoken language samples. Linguistic analysis at a range of different levels can be performed, including simple corpus statistics, semantics, grammar, phonology, pragmatics and narratives. For example, at the grammar level, the following four procedures can be run: IPSyn, DSS, Black English Sentence Scoring (BESS) (Nelson 1998), which is an adaptation of DSS for use with speakers of African American Vernacular English, and the Language Assessment, Remediation, and Screening Procedure (LARSP) (Crystal et al. 1989), a system for profiling the syntactic and discourse development of children that is related to both age and stage.
First of all, a corpus can be used to describe the characteristics of language produced by children in different age groups or different stages of development. Children may exhibit a considerable amount of variability in terms of language development. However, it is useful to understand the average capability as well as the range of capabilities exhibited by children within the same age group. Researchers generally agree that there are certain milestones in child language development, or approximate ages at which specific language capabilities usually emerge or mature. For example, at approximately twelve months of age, words start to emerge; at approximately twenty-four months of age, children possess more than fifty vocabulary items and begin to spontaneously join these items into self-created two-word phrases; and at approximately thirty months of age, children produce utterances with at least two words, and many with three or even five words (Lenneberg 1967). A large corpus consisting of language samples produced by children of different age groups can be used to complement or confirm naturalistic observation for establishing or revisiting such milestones. The CHILDES database constitutes a good example of such a corpus. Given a set of data that consists of transcripts of conversations involving target children in a particular age group, e.g. eighteen months, it is fairly straightforward to use CLAN to find out the average as well as the range of vocabulary size, mean length of turn, mean length of utterance, lexical diversity, DSS score, etc., exhibited by all the children in the group.
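The kind of group summary described here amounts to computing an average and a range per age group. The following sketch assumes that a measure such as MLU has already been computed for each child and stored in a list of records; the field names and the figures in the usage note are invented for illustration and are not drawn from CHILDES.

```python
from statistics import mean


def summarise_by_age(records, measure):
    """Group records by age in months and report mean, range and group size."""
    groups = {}
    for record in records:
        groups.setdefault(record["age_months"], []).append(record[measure])
    return {
        age: {"mean": mean(values), "min": min(values),
              "max": max(values), "n": len(values)}
        for age, values in sorted(groups.items())
    }
```

For example, given two hypothetical eighteen-month-olds with MLUs of 1.0 and 2.0, `summarise_by_age(records, "mlu")` reports a mean of 1.5 and a range of 1.0 to 2.0 for that group.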
Second, a corpus can be used to investigate the sequence or order in which children acquire different aspects of the grammar of their native language as well as to track the development of individual children over time. This type of investigation necessitates a corpus of longitudinal data, i.e. data collected from the same child or group of children over an extended period of time, e.g. one to five years. One example of this type of research is Ramer (1976), who conducted a longitudinal study to investigate the developmental sequence of syntactic acquisition in seven children. Specifically, she aimed to find out whether there is ‘a universal sequence of emergence of grammatical relations leading up to the production of S + V + O constructions’ (p. 144). She analysed her corpus data using a hypothesised simplicity–complexity dimension based on the number of grammatical relations produced and their expansions. She reported that the sequence of acquisition specified in the hypothesised dimension was observed in the data from all seven children.
Third, a corpus can be used to assess the validity and adequacy of the various metrics proposed for measuring child language development. This is an important enterprise as such measures are often used for evaluating the level of language development of children with developmental delays or disorders. One of the ways to approach this problem is closely related to the descriptive and longitudinal research discussed above. Since these measures were proposed to measure language development, many of them were based on observation of child language acquisition. Given a particular measure, it is sensible to evaluate whether it reflects the development sequence or significantly differentiates the developmental levels of children in different age groups. A second way to approach this problem is to examine whether a proposed measure significantly differentiates between the developmental levels of children with and without developmental disorders within the same age group. A good example of this line of research is Hewitt et al. (2005). They compared scores of kindergarten children with a mean age of six years with and without specific language impairment (SLI) on three commonly used measures, i.e. MLU in morphemes, IPSyn, and NDWs. They found that children with SLI showed significantly lower mean scores for all of the three measures, except for some subtests of the IPSyn. In relation to this line of research, a corpus can also be used to provide normative information for valid and adequate measures. To improve the feasibility of applying these measures in practical situations and to enable researchers and clinicians to make sense of the analytical results using these measures, it is necessary to have normative information for different age groups for benchmarking purposes. The CHILDES database could again be used for providing such normative information.
Finally, a corpus can also be used to gain in-depth understanding of language development disorders. Through comprehensive contrastive analyses, it is possible to qualitatively and quantitatively describe the developmental differences between children with and without language disorders, e.g. in terms of vocabulary size and range of syntactic structures. In addition, longitudinal data can also be used to investigate the effect of a particular therapeutic intervention. Early interventions play a critical role in optimising the developmental trajectory of children with language disorders during the best window of opportunity (Pence and Justice 2008). By analysing language samples produced before and after a particular intervention, it is possible to evaluate whether targeted changes have systematically occurred in a statistically significant way.
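One way to check whether a before/after difference is statistically reliable is sketched below. The choice of test is ours, not one prescribed in the literature cited here: an exact sign-flip permutation test on paired scores, which is practical only for small groups because it enumerates every possible assignment of signs. It assumes one score per child before the intervention and one after, on the same measure.

```python
from itertools import product


def paired_permutation_p(before, after):
    """Two-sided exact p-value for the mean paired difference.

    Under the null hypothesis each child's change is equally likely to be
    positive or negative, so every pattern of sign flips is enumerated.
    """
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # small tolerance guards against floating-point rounding
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With only three children who each improve by one point, the smallest attainable p-value is 2/8 = 0.25, which illustrates why such evaluations need samples of adequate size.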
In this section, we discuss a number of ways that a corpus of learner language can be used to find out more about second language development. The International Corpus of Learner English (ICLE) (Granger et al. 2009) constitutes an excellent example of such a corpus. The ICLE corpus consists of sixteen subcorpora, each of which contains 200,000 words of academic essays, mostly argumentative, by intermediate to advanced learners of English, mostly university students, representing a different mother tongue background. The texts across the subcorpora are similar in terms of mode (written), genre (academic essay), field (general) and length (500 to 1,000 words). The following learner variables are recorded: age, learning context, proficiency level, gender, mother tongue, region, knowledge of other foreign languages and L2 exposure. As is discussed below, many of these variables may affect the learner’s L2 development.
Various corpus processing tools can be used to analyse learner corpora in the different ways to be discussed below. The Graphic Online Language Diagnostic (GOLD) system that is being developed at the Center for Advanced Language Proficiency Education and Research (CALPER) at the Pennsylvania State University is especially designed for tracking learner development as it is happening. The system allows the users to compile, upload and update their own learner corpora, to share corpora with each other, and to analyse any subset of data defined by the value or values of one or more variables (e.g. the learner’s gender, native language, programme level, standardised test score, L2 exposure, etc.) within one corpus or across multiple corpora. Users who are experienced with XML may compile a corpus by creating their own XML file following the required format, and those who are not may use the guided XML creation interface in the system. Users may choose to make their own corpora accessible to themselves only, but they may also give selected or all other users the right to view or modify their corpora. The system is able to perform detailed frequency analysis, lexical analysis and concordance and collocation analysis. Provided that detailed metadata information, i.e. information about the students and the language samples, is encoded in the corpora, the system allows one to easily compare different subsets of data from one or more corpora in various different ways. For example, one may compare data from the same learner or group of learners over time to track their language development, or one may compare data from different groups of learners exposed to different instruction to examine the effect of instruction on the learner’s development. This functionality makes GOLD an especially useful interface for analysing learner corpora in second language development research.
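Selecting a subset of a corpus by metadata values can be sketched as follows. The XML layout, attribute names and sample sentences are invented for illustration and do not reproduce the format required by GOLD; the point is only that, once metadata is encoded consistently, a subset is defined by matching attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus file: one <text> element per language sample.
SAMPLE_XML = """<corpus>
  <text learner="s01" l1="Chinese" level="1" session="1"><body>I very like this book .</body></text>
  <text learner="s01" l1="Chinese" level="1" session="2"><body>I like this book very much .</body></text>
  <text learner="s02" l1="Spanish" level="2" session="1"><body>She has twenty years .</body></text>
</corpus>"""


def select_texts(xml_string, **criteria):
    """Return the body of every text whose attributes match all criteria."""
    root = ET.fromstring(xml_string)
    return [
        text.findtext("body")
        for text in root.iter("text")
        if all(text.get(key) == value for key, value in criteria.items())
    ]
```

Under this layout, `select_texts(SAMPLE_XML, learner="s01")` returns the two samples from the same learner in session order, which is the kind of subset one would compare to track development over time.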
The first way a corpus can be used to reveal second language development is as a database for describing the characteristics of the interlanguage of learners at known proficiency levels. To this end, it is necessary to have a learner corpus that encodes information about the learner’s proficiency level. Proficiency level can be conceptualised in a number of different ways, e.g. classroom grades, holistic ratings, programme levels, school levels and standardised test scores, among others (Wolfe-Quintero et al. 1998). One may choose to focus on a particular aspect of the interlanguage, for example the degree to which informal, colloquial patterns or styles are used in formal, written language. One may also attempt to provide a comprehensive description of the lexico-grammatical system of the interlanguage.
This type of descriptive study can benefit both from error analysis and from contrastive analysis of learner data and native speaker data. To conduct an error analysis, it is necessary to first design an error annotation scheme, which should be consistently followed in identifying and annotating errors in learner text. A good example of an error annotation scheme can be found in Granger (2003), which assigns each error first to one of the following nine major domains: form, morphology, grammar, lexis, syntax, register, style, punctuation and typo, and then to a specific category within the domain. For example, the syntax domain consists of the following four categories: word order, word missing, word redundant and cohesion. Each domain and category is labelled by a unique tag, e.g. < X > for the syntax domain and < ORD > for the word order category. An error-annotated learner corpus enables one to easily identify the common errors that learners at a given proficiency level tend to make.
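Once errors have been annotated, identifying the most common error types is a matter of counting tags and normalising by text length. The sketch below assumes that each annotated error has already been extracted as a (domain, category) pair; the tags X and ORD follow the example above, while MIS is a hypothetical placeholder for another category, and the function name is our own.

```python
from collections import Counter


def error_profile(annotations, n_words, per=100):
    """Return errors per `per` words for each tag pair, most frequent first."""
    counts = Counter(annotations)
    return {tag: per * n / n_words for tag, n in counts.most_common()}
```

For a 200-word text with two word order errors and one other syntax error, this gives 1.0 and 0.5 errors per 100 words respectively, figures that can then be compared across proficiency levels.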
A contrastive study of learner data and native speaker data helps us to look at the characteristics of the interlanguage from a different perspective, in particular, how it converges to or deviates from native speaker usage. For example, one may assess whether learners tend to overuse or underuse certain words, phrases, collocations, grammatical constructions, speech acts, etc., relative to native speakers (Granger 1998). It is important, however, to ensure that the learner data and the native speaker data are of comparable nature in terms of mode, genre and field, etc. One excellent resource for this type of contrastive study is the Michigan Corpus of Academic Spoken English (MICASE). This corpus is a collection of transcripts of academic speech events recorded at the University of Michigan. The online interface allows one to search these transcripts and specify desired speaker attributes and transcript attributes. Speaker attributes include academic position or role of the speaker (e.g. junior undergraduate, senior undergraduate, etc.), native speaker status (e.g. non-native speaker, native speaker of American English, native speaker of British English, etc.) and first language. Transcript attributes include speech event type (e.g. advising session, dissertation defence, etc.), academic division (e.g. humanities and arts, social sciences and education, etc.), academic discipline (e.g. American culture, business administration, etc.), participant level (e.g. junior undergraduates, senior graduate students, etc.), and interactivity rating (e.g. highly interactive, highly monologic, etc.). The structure of the corpus and the functions of the online search interface make it possible for one to conduct a contrastive analysis of comparable non-native and native speaker data.
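Overuse and underuse are usually judged on normalised rather than raw frequencies, since learner and native speaker corpora rarely have the same size. The sketch below compares the rate of an item per 1,000 words in the two corpora and adds the log-likelihood statistic, a measure commonly used in corpus linguistics for frequency comparison; the statistic is our choice for illustration and is not prescribed by the studies cited here. Function names and the figures in the usage note are hypothetical.

```python
import math


def log_likelihood(freq_learner, size_learner, freq_native, size_native):
    """Log-likelihood for one item's frequency in two corpora."""
    combined = freq_learner + freq_native
    total = size_learner + size_native
    expected_learner = size_learner * combined / total
    expected_native = size_native * combined / total
    ll = 0.0
    for observed, expected in ((freq_learner, expected_learner),
                               (freq_native, expected_native)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def compare_use(freq_learner, size_learner, freq_native, size_native, per=1000):
    """Return direction of difference, both normalised rates, and the statistic."""
    rate_learner = per * freq_learner / size_learner
    rate_native = per * freq_native / size_native
    if rate_learner > rate_native:
        direction = "overuse"
    elif rate_learner < rate_native:
        direction = "underuse"
    else:
        direction = "same"
    return direction, rate_learner, rate_native, log_likelihood(
        freq_learner, size_learner, freq_native, size_native)
```

An item occurring 20 times in 1,000 learner words and 10 times in 1,000 native speaker words is used twice as often by the learners, yet its log-likelihood of about 3.4 falls just short of 3.84, the conventional threshold for significance at the 0.05 level with one degree of freedom.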
Second, a corpus can be employed in developmental sequence studies to examine the order in which morphosyntactic structures of the target language are acquired. This generally necessitates the analysis of learner errors and performance using longitudinal data. For example, studies of the order of morpheme acquisition or the stages involved in the development of certain grammatical structures, e.g. relative clause or negation, may examine the frequency and accuracy at which learners use different morphemes or different realisations of the target grammatical structure at different time points during the developmental process.
Third, a corpus may be used in developmental index studies to identify objective measures of accuracy, fluency and complexity that can be used to index levels of second language development or the learner’s overall language proficiency. As summarised in Wolfe-Quintero et al. (1998), a number of cross-sectional studies have investigated the differences in syntactic complexity of second language writing between different proficiency levels. However, there is substantial variability among these studies in terms of choice and definition of measures, writing task used, sample size, corpus length, timing condition, etc. This variability makes it challenging to compare the results reported in different studies. To eliminate such inconsistency and variability, it is desirable to evaluate the full set of measures that have been in use in developmental index studies using one large corpus of learner data. Lu (2009) constitutes an effort in this direction. A computational tool was designed to automatically compute the syntactic complexity of college-level ESL writing samples using fourteen different measures, including, for example, mean length of clause, number of complex T-units per T-unit and number of complex nominals per T-unit. This tool was then used to analyse large-scale ESL data from the Written English Corpus of Chinese Learners (WECCL) (Wen et al. 2005). The corpus is a collection of over 3,000 essays written by English majors in nine different colleges in China. Each essay in the corpus is annotated with a header that includes the following information: mode (written or spoken), genre (argumentation, narration or exposition), school level (first, second, third or fourth year in college), year of admission (2000, 2001, 2002 or 2003), timing condition (timed with a forty-minute limit or untimed), institution (a two- to four-letter code), and length (number of words in the essay).
Students in the same school level within the same institution wrote on the same topics, but topics varied from institution to institution. Given the information that is available in the corpus, proficiency level is conceptualised using school level. Through the analysis, this study provided useful insights on how different syntactic complexity measures perform as indices of college-level ESL writers’ language development, how they relate to each other, and how their performances are affected by external factors.
Fourth, a corpus can also be used to examine the contributions of knowledge of the first language as well as the effect of L1 transfer. On the one hand, knowledge of the first language may prove helpful in learning certain aspects of the L2, and learners with different L1 backgrounds may show strengths in learning different aspects of the L2. On the other hand, the intrusion of L1 may result in difficulty in acquiring certain lexico-grammatical aspects of the L2 and prevalence of certain forms or grammatical patterns that deviate from the target language in the interlanguage. Consequently, the interlanguages of learners at the same proficiency level but with different L1 backgrounds may demonstrate some significantly different characteristics. A contrastive study of such interlanguages may provide evidence of L1 influence, either positive or negative, on learner output. The ICLE corpus constitutes an excellent source of data for this type of research, as students with diverse L1 backgrounds are represented. A contrastive study of a learner’s L1 and interlanguage will provide further evidence on the L1 influence.
Finally, a corpus may be used to examine the role of instruction or the effect of a particular pedagogical intervention on language development. For example, by examining corpus data of different groups of learners at the same school level or programme level that are exposed to different types of instruction method, material or linguistic environment, we may better understand whether differences in instruction result in differences in L2 development. In addition, by comparing the learner’s production prior to and after a period of targeted pedagogical intervention, we may assess whether the intervention is effective in helping the learner acquire particular aspects of the L2.
As a field, corpus-based language development research will benefit tremendously from the following future developments. First, language samples produced by children and second language learners often contain many errors and as such present a challenge to natural language processing (NLP) technology, especially when it comes to measures that involve syntactic, semantic and discourse analysis. Therefore, continued enhancement of existing NLP technology and development of robust new NLP technology will facilitate more accurate and reliable automatic analysis of language samples using more diversified measures. A second avenue for future development in the field lies in the systematic collection and sharing of large-scale child and second language development data that encodes richer information about the children or learners producing the data. For child language development research, large-scale longitudinal data and data of children with language disorders are particularly valuable. For second language development research, systematic annotation of the learner’s proficiency level using as many conceptualisations as possible will prove especially useful. These include school levels, programme levels, standardised test scores, holistic ratings, classroom grades, etc. Large-scale data with richer information will make it easier to draw more reliable conclusions for many of the types of research discussed above. Finally, analysis of second language development data will benefit from the development of consistent, standardised error annotation schemes as well as improved automatic error detection techniques. Second language development researchers have often devised their own annotation schemes for error analysis, which makes comparison and sharing of research results problematic. The field in general will benefit from a more consistent annotation scheme.
There has also been an increasing stream of research in automatic error detection and correction (Heift and Schulze 2003). The maturity of such techniques will facilitate automatic error analysis of large-scale second language development data and enable researchers to gain more reliable insights into second language use.
The contents of this publication were developed under a grant from the US Department of Education (CFDA 84.229, P229A020010) to the Center for Advanced Language Proficiency Education and Research at the Pennsylvania State University. However, the contents do not necessarily represent the policy of the Department of Education, and one should not assume endorsement by the Federal Government.
Granger, S., Dagneaux, E., Meunier, F. and Paquot, M. (2009) International Corpus of Learner English. Version 2. Louvain-la-Neuve: Presses Universitaires de Louvain. (This describes the design and structure of the International Corpus of Learner English and discusses how it may be used and analysed in corpus-based second language research.)
MacWhinney, B. (2000) The CHILDES Project: Tools for Analyzing Talk, third edition. Mahwah, NJ: Lawrence Erlbaum Associates. (This provides hands-on instruction on how to transcribe naturalistic child language development data following the CHILDES format and automatically analyse such data using CLAN. Readers are introduced to a set of computational tools designed to improve the readability of transcripts, to automate the data analysis process, and to facilitate the sharing of transcribed data.)
Pence, L. K. and Justice, L. M. (2008) Language Development from Theory to Practice. Upper Saddle River, NJ: Pearson. (This provides an extremely accessible introduction to the theory and practice of child language development. The material presented in the book is also highly relevant to clinical, educational and research settings.)
VanPatten, B. and Williams, J. (eds) (2007) Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates. (This is a collection of papers that present a comprehensive introduction to early and contemporary theories in second language acquisition. It provides an excellent overview of each of these compelling theories.)
Brown, R. (1973) A First Language. Cambridge, MA: Harvard University Press.
Crystal, D., Fletcher, P. and Garman, M. (1989) Grammatical Analysis of Language Disability, second edition. London: Cole & Whurr.
DeKeyser, R. (2007) ‘Skill Acquisition Theory’, in B. VanPatten and J. Williams (eds) Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 97–114.
Durán, P., Malvern, D., Richards, B. and Chipere, N. (2004) ‘Developmental Trends in Lexical Diversity’, Applied Linguistics 25(2): 220–42.
Granger, S. (ed.) (1998) Learner English on Computer. Austin, TX: Addison Wesley Longman.
——(2003) ‘Error-tagged Learner Corpora and CALL: A Promising Synergy’, CALICO Journal 20(3): 465–80.
Granger, S., Dagneaux, E., Meunier, F. and Paquot, M. (2009) International Corpus of Learner English. Version 2. Louvain-la-Neuve: Presses Universitaires de Louvain.
Heift, T. and Schulze, M. (2003) ‘Error Diagnosis and Error Correction’, CALICO Journal 20(3): 433–46.
Hewitt, L. E., Scheffner, H. C., Yont, K. M. and Tomblin, J. B. (2005) ‘Language Sampling for Kindergarten Children with and without SLI: Mean Length of Utterance, IPSYN, and NDW’, Journal of Communication Disorders 38(3): 197–213.
Hunt, K. W. (1965) Grammatical Structures Written at Three Grade Levels (Research Report No. 3). Champaign, IL: National Council of Teachers of English.
Lantolf, J. P. and Thorne, S. L. (2007) ‘Sociocultural Theory and Second Language Learning’, in B. VanPatten and J. Williams (eds) Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 201–24.
Lee, L. (1974) Developmental Sentence Analysis. Evanston, IL: Northwestern University Press.
Lenneberg, E. H. (1967) Biological Foundations of Language. New York: John Wiley.
Long, S. H., Fey, M. E. and Channell, R. W. (2008) Computerized Profiling (Version 9.7.0). Cleveland, OH: Case Western Reserve University.
Lu, X. (2009) ‘A Corpus-based Evaluation of Syntactic Complexity Measures as Indices of College-level ESL Writers’ Language Proficiency’, paper presented at the American Association for Applied Linguistics 2009 Conference, Denver, March.
MacWhinney, B. (2000) The CHILDES Project: Tools for Analyzing Talk, third edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Nelson, N. W. (1998) Childhood Language Disorders in Context: Infancy Through Adolescence, second edition. Boston, MA: Allyn & Bacon.
Ortega, L. (2007) ‘Second Language Learning Explained? SLA across Nine Contemporary Theories’, in B. VanPatten and J. Williams (eds) Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 225–50.
Pence, L. K. and Justice, L. M. (2008) Language Development from Theory to Practice. Upper Saddle River, NJ: Pearson.
Ramer, A. L. H. (1976) ‘The Development of Syntactic Complexity’, Journal of Psycholinguistic Research 6(2): 145–61.
Russell, J. (2004) What is Language Development? Rationalist, Empiricist, and Pragmatist Approaches to the Acquisition of Syntax. Oxford: Oxford University Press.
Scarborough, H. S. (1990) ‘Index of Productive Syntax’, Applied Psycholinguistics 11(1): 1–22.
VanPatten, B. and Williams, J. (eds) (2007) Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates.
Wen, Q., Wang, L. and Liang, M. (2005) Spoken and Written English Corpus of Chinese Learners. Beijing: Foreign Language Teaching and Research Press.
White, L. (2007) ‘Linguistic Theory, Universal Grammar, and Second Language Acquisition’, in B. VanPatten and J. Williams (eds) Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 37–56.
Wolfe-Quintero, K., Inagaki, S. and Kim, H.-Y. (1998) Second Language Development in Writing: Measures of Fluency, Accuracy, and Complexity (Technical Report No. 17). Honolulu, HI: University of Hawai’i, Second Language Teaching and Curriculum Center.