Illustrations

Figures

 

4.1 

An example of transcribed speech, taken from the NMMC 45

4.2 

A column-based transcript 46

4.3 

Line-aligned transcription in a musical score type format 47

11.1 

Captcha (www.captcha.net) 140

11.2 

Information Paradise? 141

11.3 

Plain text with curly quotes 143

11.4 

Plain text with different quote characters 143

11.5 

Saving as UTF-8 in Word 2007 143

11.6 

Plain UTF-8 text with some multi-byte characters 143

11.7 

UTF-8-inspired oddities in WS3 word list 145

11.8 

Three-word-cluster word list 148

11.9 

KW clusters in Hamlet 149

11.10 

KW plot of clusters in Hamlet 150

13.1 

KWIC concordance 168

13.2 

WORD in Becket’s concordance 168

13.3 

Sentence concordance – Alice in Wonderland 170

13.4 

KWIC concordance for rabbit 171

13.5 

KWIC concordance + provenance data 172

13.6 

Sentence concordance 172

13.7 

Concordance sample for cat/cats in Alice in Wonderland 173

13.8 

Phrase search 174

13.9 

Wild-card phrase search 174

13.10 

Wild-card multiword phrase search 175

13.11 

Look* (by the node word) 175

13.12 

Left sort 176

13.13 

Right sort 176

13.14 

Concordance lines of into from BNC Baby Corpus 177

13.15 

‘the * of’ 178

13.16 

‘the * * of’ 178

13.17 

Citation concordance 179

16.1 

Sample concordance lines for ‘expenditure/reduce’ 219

16.2 

Sample concordance lines for the collocational framework the … of the 219

16.3 

Sample concordance lines for the meaning shift unit ‘play/role’ 220

16.4 

Sample concordance lines for the organisational framework I think … because 220

17.1 

Conditions associated with omission of that in that-clauses 235

19.1 

Summary of Koester’s (2006) generic framework 263

20.1 

Dispersion plot for sudden* in the Cringe Text Corpus 278

20.2 

Dispersion plot for embarrassed in the Cringe Text Corpus 280

22.1 

Two lines from a concordance search of laughs 311

28.1 

Concordances of average from a corpus of scientific and engineering research articles 397

30.1 

Extract from Touchstone Student Book 3 (McCarthy et al. 2006a: 112) 421

30.2 

Extract from Touchstone Student Book 1 (McCarthy et al. 2005a: 103) 421

30.3 

Extract from Touchstone Student Book 1 (McCarthy et al. 2005a: 39) 422

30.4 

Extract from Touchstone Student Book 4 (McCarthy et al. 2006b: 45) 422

31.1 

Word Sketch for ‘compromise’ 436

34.1 

Dimensions of teacher language 479

36.1 

Parallel concordance (English–French) for lax attitude(s) (ParaConc) 510

37.1 

Concordance of PEOPLE: MALE (male speech) 526

37.2 

Concordance of SPEECH ACT (male speech) 527

37.3 

Concordance of SPEECH ACT (female speech) 527

37.4 

Concordance of RELATIONSHIP: INTIMACY AND SEX (female speech) 528

37.5 

Concordance of IN POWER (male speech) 528

38.1 

Concordance lines for like from CIDN corpus 538

39.1 

Screenshot of ELAN (the Eudico Linguistic Annotator) 550

39.2 

The BNC web-user interface 552

39.3 

Interface of the NoTa speech corpus 553

40.1 

Concordance lines for ‘east*’ from The Sun 26,000-word corpus 568

Tables

 

2.1 

A qualitative comparison of a text versus a corpus 19

5.1 

Writing about business 59

5.2 

Writing to do business 59

6.1 

VOICE Transcription Conventions 2.1 from VOICE website 73

6.2 

Total number of occurrences of modals of obligation in each macro-genre 76

7.1 

The situational characteristics of family discourse 90

9.1 

Comparable and parallel multilingual corpora 119

10.1 

Frequency list (rank order) 124

10.2 

Frequency list (alphabetical) 124

10.3 

Comparison of rank frequency 126

10.4 

Positive key words in sociology and history texts 127

10.5 

Negative keywords in sociology and history essays 128

12.1 

Collocates of ‘distinguish between’ divided according to meaning 164

13.1 

Basic index 170

13.2 

One-word context concordance 171

13.3 

Windows wild-cards 174

13.4 

Verb forms 180

13.5 

Hypothesis 1 181

13.6 

Extended patterns 181

13.7 

Hypotheses 1 + 2 181

15.1 

Distribution of lemmas in BoE 198

17.1 

Lexico-grammatical associations of verbs and tenses 230

19.1 

Tribble’s analytical framework 261

20.1 

Linkers with a frequency of 4 in the Cringe Text Corpus 277

20.2 

Key content words in the Cringe Text Corpus 279

20.3 

Four-word clusters with a frequency of 5 in the Cringe Text Corpus 280

20.4 

Frequency of some key phrases in the Cringe Text Corpus 281

24.1 

She is Mike’s sister 342

27.1 

Areas of corpus investigation and sample questions 377

27.2 

A summary of dos and don’ts when reading and interpreting concordance lines 380

28.1 

List of the fifty most frequent keywords in a corpus of sample in-house EAP materials 393

28.2 

List of the most frequent keywords in the science and engineering corpus 394

28.3 

List of fifty words selected from the science and engineering corpus for further examination 395

30.1 

Expressions from frequency lists in the Cambridge International Corpus, North American Conversation 419

31.1 

Word frequency data 429

31.2 

Left-sorted BNC concordance lines 432

31.3 

Right-sorted BNC concordance lines 432

31.4 

Concordance lines for ‘production’ 434

31.5 

Concordance lines for ‘have’ 435

31.6 

Distribution of ‘sidewalk’ in the Cambridge International Corpus across four corpora 437

32.1 

Citation worksheet 446

32.2 

Anticipated criticism and writer’s defence 447

32.3 

Concordance task for it seems … in published articles and student dissertations 448

32.4 

Concordance lines for ‘delimiting the case under consideration’ 450

37.1 

Key domain results for SoI and SoE 518

37.2 

Words assigned to HAPPY, VIOLENT/ANGRY and FEAR/SHOCK domains 518

37.3 

SPEECH ACTS in SoI vs SoE 519

37.4 

Key word results 519

37.5 

Concordances of thee 520

37.6 

Mark-up used in the Blockbuster Corpus 523

37.7 

Positive key words for male speech 524

37.8 

Positive key words for female speech 524

37.9 

Positive key domains for male speech 525

37.10 

Positive key domains for female speech 526

37.11 

5-grams (male speech) 528

37.12 

4-grams (female speech) 529

38.1 

The top twenty most frequently used words in the Limerick Corpus of Irish English 537

38.2 

The twelve most frequent words in LCIE and CIDN 538

38.3 

Sentences containing be + after + V-ing 540

38.4 

Sentences containing subordinating and 540

39.1 

Types of spoken discourse in Spoken Dutch Corpus 550

39.2 

Search result ([xx0] do [pni]) extracted from the Brigham Young University’s web-based interface to the BNC 554

40.1 

The highest CCSK values for The Sun corpus 570

40.2 

The highest CCSK dispersion plot values for The Sun corpus 571

41.1 

Comparative analysis of ‘then’ 582

41.2 

Summary of comparison of three sets of documents 583

41.3 

Twelve search items in Unabomber case web corpus investigation 585

41.4 

Coulthard’s (2004) findings for the string ‘I picked something up like an ornament’ 586

41.5 

Coulthard’s (2004) findings for the string ‘I asked her if I could carry her bags’ 586

43.1 

Keyword categorisation of health themes in AHEC 609

43.2 

Frequencies of sexually transmitted infections and conditions in the AHEC Corpus 611

44.1 

Concordance of teacher in the SACODEYL corpus 627