Kieran O’Halloran
One part of what is known as Critical Discourse Analysis (CDA) is the study, using tools of linguistic analysis, of how texts, particularly media texts, frame the events or issues they describe. This branch of linguistics sheds light on how such framings can constrain appreciation of what is being reported. Before corpus linguistics became mainstream, CDA examined such framings in single texts at a particular point in time, or over a very short period. One advantage of the abundance of media texts available electronically on the world-wide web is the ease with which corpora can be assembled to reveal how media texts might repeatedly frame issues or events that are reported over a significant period of time. This is a real advance for Critical Discourse Analysis: using corpus investigation, critical discourse analysts can now gain insight into the kinds of cultural and ideological meanings that are being circulated regularly and are potentially being reproduced by readers. Increasingly, critical discourse analysts employ corpora in their investigations of media discourse.
This chapter outlines Critical Discourse Analysis (Section 2) and discusses the corpus-based approach to CDA, including its further advantages (Section 3). The substantive section of the chapter is a case study which demonstrates the value of corpus-based Critical Discourse Analysis in revealing how media texts frame events and issues over a significant period of time (Section 4). Finally, the chapter discusses some ways in which the results of media corpus investigation can be disseminated using relatively recent web-based technologies.
CDA investigates how language use reproduces the perspectives, values and ways of talking of the powerful, which may not be in the interests of the less powerful. It thus focuses on the relationship between language, power and ideology. Ideologies are representations of aspects of the world which contribute to establishing and maintaining social relations of domination, inequality and exploitation, relations which CDA views as problematic and in need of being addressed. Employing tools of linguistic analysis, critical discourse analysts seek to unpick these problematic relations in order to illuminate how language use contributes to the domination and misrepresentation of certain social groups:
Analysis, description and theory formation play a role especially in as far as they allow better understanding and critique of social inequality, based on gender, ethnicity, class, origin, religion, language, sexual orientation and other criteria that define differences between people. Their ultimate goal is not only scientific, but also social and political, namely change. In that case, social discourse analysis takes the form of a critical discourse analysis.
(van Dijk 1997: 22–3)
CDA is thus a form of social critique. It encourages reflection on social and cultural processes and their relationship with language use, since it is assumed in CDA that this relationship is ‘dialectical’ or bi-directional. In other words, reproduction of unequal language use (e.g. ‘man and wife’ as opposed to ‘husband and wife’) can help to reproduce unequal cultural processes and vice versa. CDA is a committed form of discourse analysis since analysts are involved in contesting the phenomena they study. Analysts often make their social and political position (usually left-liberal) explicit in revealing and challenging dominance. With such a focus, CDA is drawn to texts in which the marginal, and thus relatively powerless, feature, e.g. ethnic immigrants. Through the use of linguistic analysis, CDA ‘diagnoses’ a text’s problems, particularly with regard to (mis)representation of the marginal. While one does not need to be a critical discourse analyst to be critical of language and power abuse, a critical discourse analysis would differ from a ‘lay’ critique by having ‘systematic approaches to inherent meanings’, relying on ‘scientific procedures’ and necessarily requiring the ‘self-reflection of the researchers themselves’ (Fairclough and Wodak 1997: 279). Among the principal architects of CDA are Paul Chilton, Norman Fairclough, Teun van Dijk and Ruth Wodak (see, for example, van Dijk 1991; Wodak et al. 1999; Fairclough 2001; Chilton 2004).
CDA has not escaped criticism, largely on methodological grounds. Widdowson (2004) accuses CDA of being over-subjective in its analysis of texts because that analysis is directed by political commitment. The charge of subjectivity can be awkward to refute when CD analysts are not part of the target audience of the texts they analyse. This is because the CD analyst may describe aspects of a text that he or she objects to and go on to interpret them in a way that the target audience may not actually arrive at. In the absence of any empirically based investigation which can shed light on audience response, or on the facets of a text that the audience is likely to notice, CDA has been open to two charges: (i) arbitrariness of analysis; (ii) circularity from analysis to interpretation and back to analysis, since there is nothing to determine which facets of a text to focus on other than what chimes with the political commitment of the analyst.
With a few examples in the 1990s, but with growing momentum in the first decade of the twenty-first century, CDA has been drawing on corpus linguistic modes of investigation. This has helped to improve methodological rigour and, in turn, to blunt attacks from critics. For example, using a large reference corpus for comparison with the text(s) under investigation reveals salient linguistic features in that text. In this way, arbitrariness, and thus analyst subjectivity, is reduced, since it is the software and not the analyst which reveals salience. Another advantage of corpus-based CDA is that analysts can go beyond single texts and conveniently explore quantitative patterns of ideological meaning in a large number of texts. This can be done either synchronically, i.e. in a set of texts produced on the same day, or diachronically, i.e. looking at linguistic patterns over a period of time. The following are examples of work in corpus-based CDA where media discourse examination features: Stubbs (1996, 2001); Charteris-Black (2004); Adolphs (2006); Baker (2006); O’Keeffe (2006); Mautner (2007); O’Halloran (2007); Baker et al. (2008); Koller and Davidson (2008); Hidalgo Tenorio (2009). Studying quantitative patterns can be interesting in itself. But, since CDA is concerned with examining ideological meaning, quantitative analysis needs to be combined with qualitative analysis (Fairclough 2003: 6). The case study in Section 4 will show how quantitative data mined from a corpus can usefully ground qualitative analysis and in so doing help to enhance methodological rigour.
One important inspiration for corpus-based CDA has been the work of the corpus linguist Michael Stubbs (notably Stubbs 1996, 2001), and in particular the way he builds, from a corpus-based perspective, on Raymond Williams’ ‘cultural keyword’ analysis. Corpus-based CDA has also drawn on another notion of keyword, such as that found in Scott (2008). The following section outlines these different notions of ‘keyword’, notions which are drawn upon in the case study in Section 4.
For Raymond Williams, the Marxist thinker, ‘key’ in ‘keyword’ indicates that a particular concept is salient across a culture. So, for example, ‘democracy’ and ‘revolution’ are keywords for Williams. Williams (1983) is a socio-historical, diachronic dictionary of keywords where their semantic development over centuries is traced and interrelationships explored. For this work, Williams used the complete Oxford English Dictionary (OED), which runs to several volumes. To help with discriminating different types of keyword, this chapter refers to Williams’ notion of keyword as a cultural keyword (as, indeed, does Stubbs 1996, 2001).
Chapters in Stubbs (1996, 2001) also examine cultural keywords. The difference from Williams is that Stubbs’ investigation of cultural keywords is done mainly synchronically and is informed by corpus-based methods. ‘Standard’ is one of the cultural keywords which Williams (1983) investigates using the OED. Stubbs (2001) uses a 200-million-word corpus of contemporary English, which consists mainly of newspaper and magazine media texts, in order to highlight the most common collocates of the word ‘standard’: ‘living’, ‘high’, etc. Collocates are words which commonly accompany other words over short word spans: that is, they form a collocation such as ‘living standards’. This method is more rigorous than Williams’ treatment of contemporary usage, since it provides objective quantitative support for the extent to which cultural keywords are being used, and for the lexical company they keep. It thus provides a measure of what meanings are being culturally reproduced. We might refer to the cultural keywords Stubbs looks at as corpus-based cultural keywords.
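The notion of collocates as words that accompany a node word within a short span can be illustrated with a minimal Python sketch. It is not Stubbs’ procedure: the tokeniser, the one-word span and the toy sentences are assumptions made for illustration, and corpus studies normally rank collocates by an association measure rather than by raw co-occurrence counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Lower-cased word tokens; hyphenated and apostrophe forms are kept whole
    # (an assumption of this sketch, not a standard).
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def collocates(tokens, node, span=4):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Toy text, invented for illustration.
tokens = tokenize("Living standards fell. High standards matter. Living standards rose again.")
result = collocates(tokens, "standards", span=1)
print(result.most_common(1))  # [('living', 2)]
```

With a span of one word, ‘living’ is the most frequent immediate neighbour of ‘standards’ in this toy text; widening the span admits more distant and usually less informative neighbours.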
Stubbs (1996, 2001) uses corpus-based methods to examine cultural keywords related to the education policy of the UK Conservative Party during the 1980s and 1990s (e.g. correctness, grammar, standards). Here is Stubbs (2001: 158):
Keywords often inter-collocate, and ideas gain stability when they fit into a frame. Many everyday ideas about language fit very firmly into a frame which contains terms such as:
• standard, standards, accurate, correct, grammar, proper, precise
For linguists, the same terms mean something quite different because they fit into an entirely different lexical field, which contains terms such as:
• dialect, language planning, high prestige language, social variation
These fields are systems of meaning, which use particular vocabulary, take particular things for granted, appeal to different states of knowledge (for example, lay and professional), and therefore allow only particular argumentative moves.
In highlighting how cultural keywords inter-collocate in frames, Stubbs provides a very useful insight.
A large corpus is not only used to provide quantitative support for cultural keywords. It can also be used to find a different type of keyword, one which WordSmith Tools (Scott 2008) was designed, in part, to investigate. This type of keyword is defined as a word which is statistically more salient in a text or set of texts than in a large reference corpus. ‘Keyness’ here is established through statistical measures such as the log likelihood value (see Dunning 1993; see also chapters by Evison, Scott and Tribble, this volume). Relatively high log likelihoods indicate keywords. In order to distinguish this type of keyword from the ones discussed so far, this chapter uses the expression corpus-comparative statistical keyword.
Corpus-comparative statistical keywords can be both lexical and grammatical words. Imagine a comparison of a corpus of several thousand mobile phone text messages in English with a large reference corpus of English consisting of millions of words from a variety of different genres. It is likely that the grammatical word ‘da’ (a non-standard spelling of the definite article) would have relatively high keyness, since it is much more likely to feature in text messages than in most other genres. It would then be a corpus-comparative statistical keyword. One important function of the corpus investigation software WordSmith Tools (Scott 2008) is to reveal these keywords in a text or a corpus in comparison with a much larger reference corpus.
Corpus-comparative statistical keywords may or may not coincide with (corpus-based) cultural keywords. This is because the former may be: (i) any lexical word, and thus part of an extremely large set, or (ii) any grammatical word. In contrast, (corpus-based) cultural keywords are all lexical words from a much smaller set. Although corpus-based cultural keywords have the advantage that their collocation patterns are based on quantitative evidence, this is nonetheless quantitative evidence around cultural keywords which have been pre-established by a human interpreter. This may be uncontroversial, as in the case of ‘democracy’ or ‘immigration’. Alternatively, the choice of cultural keyword may overly reflect the cultural and political sympathies of the human interpreter. For example, Williams (1983) includes ‘peasant’ as a cultural keyword. In contrast, an advantage of corpus-comparative statistical keywords is that they are established objectively by software rather than intuited by human researchers, and generating them in this way reduces arbitrariness in their identification. However, it must be stressed that although corpus-comparative statistical keywords are generated objectively by software, this objectivity is always relative to a particular reference corpus.
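Log-likelihood keyness can be computed from four numbers: the word’s frequency in the study corpus and in the reference corpus, and the sizes of the two corpora. The sketch below follows the widely used two-corpus form of Dunning’s statistic; the function name, and the convention of returning a negative value when a word is relatively less frequent in the study corpus, are assumptions rather than WordSmith’s documented internals.

```python
import math


def log_likelihood(a, b, c, d):
    """Signed log-likelihood keyness of a word (after Dunning 1993).

    a: frequency in the study corpus      c: size of the study corpus (tokens)
    b: frequency in the reference corpus  d: size of the reference corpus (tokens)
    Negative results mark words that are relatively rarer in the study corpus
    (an assumed sign convention for this sketch).
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus

    def term(observed, expected):
        # 0 * ln(0) is taken as 0, so words absent from one corpus are allowed.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    ll = 2 * (term(a, e1) + term(b, e2))
    return ll if a / c >= b / d else -ll


# Three occurrences in 26,350 words, none in an assumed 4,000,000-word reference.
print(round(log_likelihood(3, 0, 26_350, 4_000_000), 1))
```

With the Sun corpus size of 26,350 words and an assumed reference size of exactly four million words, a word occurring three times in the study corpus and never in the reference corpus scores about 30.2, close to the 30.04 the chapter reports for ‘Trajce’; the remaining difference presumably reflects the token counts the software actually used.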
Having given an account of different notions of keyword, this chapter now turns to a case study. This includes a corpus-based critical discourse analysis of a corpus of media texts in which these different notions of keyword are important for analysing and interpreting the ideological meanings that a regular target readership would have been exposed to, and may potentially be reproducing.
This case study focuses on a set of texts over a six-week period in the British popular tabloid newspaper The Sun on the topic of the European Union (EU) expansion on 1 May 2004. On this date, ten new countries joined the EU: Cyprus, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Malta, Poland, Slovakia and Slovenia. Eight of these (all except Cyprus and Malta) are located in the east of Europe. It was six weeks prior to 1 May that The Sun began producing news texts regularly on the subject of the EU’s imminent expansion. In cursory readings of some of the texts, there seemed to be some kind of quasi-campaign being run, involving a series of predictions about imminent immigration from Eastern European countries. On such a reading, there appeared to be a purpose to persuade readers of a set of possibilities (e.g. detrimental effects on social services from huge immigration). But how and where should one begin a proper systematic analysis of the texts to establish the kinds of ideological meanings that regular target readers would have been exposed to, and might potentially be reproducing? One could go through the texts and perform a qualitative analysis, but this would run the risk of arbitrariness and thus of circularity from interpretation to analysis to interpretation. As will be shown, the advantage of a quantitative corpus-based analysis using software such as WordSmith Tools is that we can first establish quantitative patterns of lexis and grammar within which we can ground qualitative interpretation of semantic patterns. This, as will be demonstrated, reduces circularity and arbitrariness of interpretation.
The corpus I investigate consists of all texts in the six weeks prior to 1 May which contain the cultural keywords: ‘(im)migration’, ‘(im)migrant(s)’, ‘EU’ and ‘European’. The corpus consists of seventy-six texts, a total of 26,350 words, and is in chronological order from 20 March to 30 April 2004.
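The selection criteria just described amount to a date filter combined with a search for the cultural keywords. A minimal sketch, assuming a hypothetical list of (date, text) records and treating a text as relevant if it contains any one of the search terms; the regular expression is one possible rendering of those terms, and retrieving the articles from a newspaper archive is outside its scope.

```python
import re
from datetime import date

# One possible rendering of the search terms '(im)migration', '(im)migrant(s)',
# 'EU' and 'European' (hypothetical; case-insensitive whole-word matching).
SEARCH = re.compile(
    r"\b(?:im)?migration\b|\b(?:im)?migrants?\b|\bEU\b|\bEuropean\b",
    re.IGNORECASE,
)


def select_texts(articles, start, end):
    """Keep (date, text) records dated within [start, end] that contain a search term,
    returned in chronological order."""
    hits = [(d, t) for d, t in articles if start <= d <= end and SEARCH.search(t)]
    return sorted(hits, key=lambda record: record[0])


# Invented records for illustration.
articles = [
    (date(2004, 5, 3), "European leaders met after the expansion."),
    (date(2004, 4, 2), "A football result from last night."),
    (date(2004, 3, 25), "Ministers defended the immigration figures."),
]
selected = select_texts(articles, date(2004, 3, 20), date(2004, 4, 30))
print(len(selected))
```

Only the 25 March record survives: the 2 April record contains no search term and the 3 May record falls outside the six-week window.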
The corpus investigation focuses on how Eastern European migrants are realised linguistically. As the first type of quantitative pattern to investigate in the corpus, let me explore the collocates of ‘east*’, where the asterisk is a wildcard capturing both ‘east’ and ‘eastern’ (as in ‘East(ern) Europe(an(s))’). There are thirty-seven concordance lines for ‘east*’ (Figure 40.1):
Figure 40.1 Concordance lines for ‘east*’ from The Sun 26,000-word corpus
The concordancer function of WordSmith Tools is useful because it can reveal patterns, and thus a sense of what regular readers would be exposed to (see Tribble, this volume). Collocationally, a dominant pattern, with seventeen instances, is the numbers of Eastern European migrants who are likely to arrive after 1 May 2004: e.g. ‘coachloads’, ‘many’, ‘millions’. So we know that regular readers are exposed to the idea of numbers. It is tempting to read more significance than this into the corpus, e.g. that the use of numbers signals alarmist and, at worst, racist overtones; in other words, that The Sun is communicating that the UK will be ‘swamped’ with foreign nationals. However, numbers on their own do not necessarily carry such significance. For example, migration is likely to be commonly talked about numerically, and in a neutral sense, by government bodies, and the numbers referred to in the concordance lines could be reports of government figures. As such, one should be careful not to over-interpret concordance line data. Having said this, although it is difficult to see from concordance lines the significance of the dominant pattern of numbers and migrants, we can take this as evidence of a frame in Stubbs’ terms, given the regularity of collocation here. That is, from the evidence above, regular readers of The Sun over the six-week period would understand Eastern European migration in association with large numbers.
What kind of migrants are projected to come? One collocation pattern relates to poverty: ‘high-unemployment’, ‘impoverished’, ‘poor’ (in italic in Figure 40.1). But, with only three instances, this can hardly be said to be a major pattern. Moreover, it is hard to know without more co-text whether these are negative judgements of migrants or perhaps expressions of empathy. Meanings around migrants which are more obviously negative are as follows: ‘underqualified doctors and nurses from Eastern Europe’; ‘vice-girls’; ‘new rules are dodged or challenged by Eastern European migrants’; ‘bogus applications from eastern European’; ‘The men and women – all from eastern Europe – were arrested’; ‘false passports’; ‘suspected visa scam’; ‘criminal scam’ (in bold in Figure 40.1).
I have indicated that the significance of what is regularly repeated cannot always be shown from concordance line data. To achieve this fully, especially in relation to ideological reproduction, we may need to go beyond collocation in concordance lines and take account of how clauses, and indeed sentences, relate to one another. In any case, since ideological reproduction through language ultimately requires an inferential contribution, clause/sentence structure may well have a significant effect on the types of inference readers and listeners generate around cultural keywords. The critical discourse analyst Fairclough (1992: 84) gives the following useful example (which includes the word ‘job’, identified as a cultural keyword in Stubbs 1996):
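A concordancer of the kind used to produce Figure 40.1 can be approximated as key word in context (KWIC) output: each match of the search term is shown centred, with a fixed number of characters on either side. The sketch below is hypothetical and far simpler than WordSmith Tools; it only mimics the use of ‘*’ as a wildcard for any run of letters, and the example sentence is invented.

```python
import re


def kwic(text, pattern, width=30):
    """Return key-word-in-context lines for every match of a wildcard pattern.

    '*' in `pattern` stands for any run of letters, so 'east*' matches
    'east', 'East' and 'Eastern' (matching is case-insensitive).
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    body = re.escape(pattern).replace(r"\*", r"[A-Za-z]*")
    regex = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
    lines = []
    for m in regex.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} {m.group()} {right}")
    return lines


sample = "Coachloads of migrants from Eastern Europe are expected. Many from the East will come."
lines = kwic(sample, "east*")
for line in lines:
    print(line)
```

Right-aligning the left context keeps the search term in a fixed column, which is what makes repeated collocates such as number words easy to spot by eye.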
what establishes the coherent link between the two sentences ‘She’s giving up her job next Wednesday. She’s pregnant’ is the assumption that women cease to work when they have children. In so far as interpreters take up these positions and automatically make these connections, they are being subjected by and to the text, and this is an important part of the ideological ‘work’ of texts and discourse in ‘interpellating’ subjects.
Ideological reproduction would not take place through ‘job’ or indeed through ‘pregnant’, but through the coherence inference linking these words, i.e. coherence is set up through text structure. Where corpus-based collocation analysis is useful is in showing what meanings readers are regularly exposed to, and thus the frames they may possess. Where it is often limited is in providing evidence for the kinds of inferences readers are likely to make, as a result of this exposure, in accordance with text structure (which may stretch over sentences and thus not be captured in a single concordance line). In other words, it is limited in showing the significance of frames in actual discourse.
In the rest of this case study, I will show the usefulness of corpus investigation in identifying regular text structural patterns – that is, going beyond collocation – which in turn can indicate the kind of ideologically laden inferences that regular target readers could make. To start with, in the next section, I use WordSmith Tools 5.0 to generate corpus-comparative statistical keywords (henceforth CCSKs). I then examine the extent to which CCSKs repeatedly:
1 co-occur across stretches of text larger than collocational spans in concordance lines;
2 provide text structure.
Then, in the following section, I explore whether CCSKs are potentially providing text structure for a set of semantic patterns which, in turn, can indicate the kinds of ideologically laden inferences readers might make in subsequent reading of related texts in relation to migration.
Using WordSmith Tools, the first procedure is finding the CCSKs of The Sun corpus. This is done by comparing The Sun corpus with a reference corpus. The reference corpus used is BNC-baby, an approximately four-million-word sample of the British National Corpus, which consists of around one million words each of academic prose, conversation, fiction and newspaper text. This reference corpus was chosen since its genres are mainstream ones and, as a whole, it can be taken as a ‘snapshot’ of mainstream use of English. To ascertain CCSKs, wordlists for both corpora first need to be generated; a wordlist provides word frequencies in a corpus (see chapters by Evison and Scott, this volume). These wordlists are then compared by the software to establish keyness values. Table 40.1 shows the ten highest keyness values. As can be seen, the cultural keyword ‘immigration’ has high keyness, as does ‘EU’. This means that ‘immigration’ and ‘EU’ occur with greater statistical salience in The Sun corpus than in the reference corpus. Since the corpus texts were selected partly on the basis of these search words, it is not surprising that they should feature as CCSKs.
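The two-step procedure (build a wordlist for each corpus, then compare them word by word) can be sketched as follows. The toy texts are invented and the function names are hypothetical; the log-likelihood helper is repeated so that the block stands alone, and real keyword tools typically also apply a minimum frequency and a significance threshold, which are omitted here.

```python
import math
import re
from collections import Counter


def wordlist(text):
    # Case-sensitive, so 'But' and 'but' are counted separately; splitting on
    # non-letters also yields 's' and 't' as separate words.
    return Counter(re.findall(r"[A-Za-z]+", text))


def log_likelihood(a, b, c, d):
    # Signed log-likelihood: a, b = word frequencies; c, d = corpus sizes.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)
    return ll if a / c >= b / d else -ll


def keywords(study_text, reference_text, top=10):
    """Rank study-corpus words by keyness against the reference corpus."""
    study, ref = wordlist(study_text), wordlist(reference_text)
    c, d = sum(study.values()), sum(ref.values())
    scores = {w: log_likelihood(f, ref[w], c, d) for w, f in study.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


study = "But the EU will grow. But immigration will rise."
reference = "the cat sat on the mat and the dog sat too but it was fine"
for word, score in keywords(study, reference, top=3):
    print(word, round(score, 2))
```

Keeping case distinctions lets ‘But’ and ‘but’ receive separate keyness values, and splitting on non-letters happens to reproduce the treatment of ‘s’ and ‘t’ as whole words described for WordSmith Tools in the next paragraph.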
CCSKs provide a rough snapshot of salient topics in a corpus. CCSKs with relatively high values point to the UK politicians recurrently referred to over this period, namely Blair (Prime Minister), Blunkett (Home Secretary) and Hughes (Home Office Minister for Citizenship and Immigration), and also to other stories connected with immigration, immigration officialdom, etc., which are not necessarily connected with Eastern Europeans. (The software treats ‘s’, as in ‘Britain’s’ or ‘he’s’, and ‘t’, as in ‘don’t’, as whole words.)
For a word to be a CCSK, it does not necessarily need to occur very frequently. For example, ‘Trajce’ (a town in Poland) has keyness of 30.04; it occurs only three times in The Sun corpus but does not occur at all in the reference corpus. Because I am interested in seeing whether repeated semantic patterns in The Sun corpus can be identified around CCSKs, keyness is not the only criterion I need to take into account. Another is frequency. However, it must be borne in mind that, out of the seventy-six texts, some CCSKs may occur very frequently in just a handful of texts. Because my focus is on identifying regularly recurrent semantic patterns over the whole of the corpus, CCSKs which occur in a concentrated burst are of less relevance to me. I need, then, to find CCSKs which are not only frequent but also well dispersed across the six weeks’ worth of texts. The dispersion plot facility of WordSmith Tools is useful for my purposes. Table 40.2 shows the ten highest dispersion plot values for CCSKs, organised in descending order. The dispersion value is a number between 0 and 1; CCSKs with a value close to 1 are the most evenly dispersed. (Plot dispersion uses the first of the three formulae supplied in Oakes 1998: 190–1.)
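The dispersion idea can be made concrete by cutting the corpus into equal segments and asking how evenly a word’s occurrences fall across them. The chapter specifies only that WordSmith Tools uses the first formula in Oakes (1998: 190–1), so this sketch uses Juilland’s D, a standard dispersion measure that also runs from 0 (all occurrences in one segment) to 1 (perfectly even spread), as an assumed stand-in; the choice of eight segments is likewise an assumption.

```python
from statistics import mean, pstdev


def juilland_d(positions, corpus_length, n_parts=8):
    """Juilland's D for a word, given its token positions in the corpus.

    The corpus is cut into `n_parts` equal segments and
    D = 1 - V / sqrt(n_parts - 1), where V is the coefficient of variation
    (population standard deviation / mean) of the per-segment frequencies.
    """
    counts = [0] * n_parts
    for p in positions:
        # Map each token position to its segment; clamp the final position.
        counts[min(p * n_parts // corpus_length, n_parts - 1)] += 1
    m = mean(counts)
    if m == 0:
        return 0.0  # the word does not occur at all
    v = pstdev(counts) / m
    return 1 - v / (n_parts - 1) ** 0.5


# A word spread evenly through an 800-token corpus versus one confined to its start.
even = juilland_d([50 + 100 * i for i in range(8)], 800)
burst = juilland_d([10, 20, 30], 800)
print(round(even, 3), round(burst, 3))
```

A keyword that appears in only a handful of adjacent texts, the ‘concentrated burst’ described above, scores near 0, while one recurring throughout the six weeks scores near 1.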
Interestingly, the grammatical word ‘But’ (i.e. with a capital ‘B’) has the third highest dispersion value (0.915), and it also has keyness (28.93). This is interesting for my focus since, as a grammatical word, it signals a contrastive relation across sentences and thus creates text structure. Moreover, ‘but’ (i.e. with a lower-case ‘b’) does not have keyness, since its keyness value is negative (-30.53). There are eighty-five instances of ‘But’ and forty-nine instances of ‘but’ in The Sun corpus (0.31 per cent and 0.18 per cent of the total number of words respectively). However, in BNC-baby there are 6,594 instances of ‘But’ and 14,619 instances of ‘but’ (0.16 per cent and 0.36 per cent of the total number of words respectively). The ratio of ‘But’ to ‘but’ in The Sun corpus is thus almost the inverse of that in BNC-baby. So, on this comparison, the keyness of ‘But’ is even more significant. To sum up: in The Sun corpus ‘But’ has significant keyness, frequency and dispersion.
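The comparison of the ‘But’/‘but’ ratios in the two corpora can be checked directly from the counts given above; the variable and function names are hypothetical.

```python
# Counts as reported in the chapter.
sun = {"But": 85, "but": 49}
bnc_baby = {"But": 6_594, "but": 14_619}


def upper_to_lower(counts):
    """Ratio of capitalised 'But' to lower-case 'but'."""
    return counts["But"] / counts["but"]


print(round(upper_to_lower(sun), 2))       # 1.73
print(round(upper_to_lower(bnc_baby), 2))  # 0.45
```

The Sun thus has roughly 1.7 capitalised ‘But’ for every lower-case ‘but’, whereas BNC-baby has roughly 2.2 lower-case instances for every capitalised one.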
The results for ‘But’ are interesting especially since, in starting a sentence, ‘But’ has prominence. This is even more the case if it begins a sentence which initiates a paragraph. And indeed, it is common in The Sun for paragraphs to be one sentence in length. So, if ‘But’ is helping to provide text structure for semantic patterns on a regular basis, in turn the semantic patterns will have prominence. ‘But’, as a grammatical word, will of course not provide lexical content. So, to ascertain if ‘But’ is providing structure for semantic patterns (i.e. patterns of meaning which in stretching over clauses go beyond collocation), I need to discern whether there are associations of ‘But’ with lexical CCSKs. If I can find regular semantic patterns around such associations, then I will have reduced the prospect that I am identifying these arbitrarily. Finding such associations can be done by employing another function of WordSmith Tools – keyword links.
The ‘keyword links’ function of WordSmith Tools shows the number of CCSKs which co-occur with a particular CCSK within a designated word span. The largest span possible is twenty-five words to both the left and the right of a search term. I chose this span so as to capture the maximum possible co-occurrences with ‘But’. For this span (which crosses sentence and paragraph boundaries), there are nine CCSK links for ‘But’:
Britain, he, home, immigration, Mr, UK, vote, will, yesterday.
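The ‘keyword links’ idea, counting which other keywords fall within a window around a node keyword, can be sketched as below. This is an illustrative assumption about the computation, not WordSmith’s implementation: here every occurrence of a keyword in the window is counted, windows ignore sentence and paragraph boundaries, and the token list and keyword set are invented.

```python
from collections import Counter


def keyword_links(tokens, node, keywords, span=25):
    """Count keywords occurring within `span` tokens either side of each `node`.

    Windows deliberately cross sentence and paragraph boundaries.
    """
    links = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            links.update(w for w in window if w in keywords and w != node)
    return links


tokens = ("The Home Office says few will come . "
          "But others say thousands will arrive in Britain").split()
kws = {"Home", "will", "Britain", "But"}
narrow = keyword_links(tokens, "But", kws, span=5)
wide = keyword_links(tokens, "But", kws, span=25)
print(dict(narrow))
print(dict(wide))
```

Widening the span from five to twenty-five tokens brings ‘Home’ and ‘Britain’ into the window, which is why a maximal span captures the most co-occurrences with ‘But’.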
The next stage is to search through the entire Sun corpus and highlight all instances of ‘But’, as well as the nine CCSK links to ‘But’, and then inspect whether regular semantic patterns can be identified around any of these ‘But CCSK links’ in relation to Eastern European immigration. (The CCSK link span was restricted to twenty-five words either side of ‘But’, although this highlighting may extend beyond that span.)
In exploring the text fragments with the keyword links, I identified five different semantic pattern types associated with ‘But’ and the CCSK links. For the semantic patterns identified, the CCSKs which are most repeatedly linked to ‘But’ are ‘will’, ‘UK’, ‘Britain’, ‘immigration’ and ‘home’ (in ‘Home Office’, the UK government department which deals with immigration). Below, I indicate the five semantic pattern types in the corpus via examples with the CCSKs in italics. In brackets next to each pattern is the number of times each pattern occurred in the corpus.
Semantic Pattern 1 (x 18): for (Eastern European) immigration, there is UK government or official agency incompetence
With this semantic pattern, there is often a reporting of government figures for immigration from new EU countries, the actions of immigration officials, etc. Then ‘But’ begins a sentence or paragraph which contrasts negatively with the UK government perspective and which sets up negative evaluation, either explicit or implicit, that government, officials, etc., are incompetent, e.g.:
26 April The Home Office estimates that no more than 13,000 workers will come each year.
But others put the figure as high as 54,000 due to high unemployment in the East.
Semantic Pattern 2 (x 8): for (Eastern European) immigration, (fear, worry, challenge from) large numbers predicted to arrive in the UK
In this semantic pattern, ‘But’ is used to provide negative contrast in relation to the UK facing huge (Eastern European) immigration which is a challenge to the government, or a fear/worry for various reasons, e.g.:
27 April Stay Strong
THE SUN welcomes Tony Blair’s assurance that he will keep a tight grip on immigration from Eastern Europe.
He is right to recognise that people are worried about what will happen after May 1.
But the PM faces a huge challenge. Thousands are heading for Britain in search of a better life.
Semantic Pattern 3 (x 4): for (Eastern European) immigration, there will be strain on social services
With this semantic pattern, there is a projection of numbers of East European ‘migrants’ arriving in the UK. Then, the semantic pattern via ‘But’ indicates, through negative contrast, that this will lead to overstretched UK social services, e.g.:
26 April THOUSANDS of plane, train and bus seats had been snapped up last night by poor East Europeans seeking a better life in Britain.
They are free to come here when ten new countries join the EU this Saturday, May 1.
But it is feared that in some areas overstretched UK services like schools and hospitals will be unable to cope with the influx.
Semantic Pattern 4 (x 4): for (Eastern European) immigration, there will be some illegal (EU) status
Semantic Pattern 4 relates to stories about illegal immigrants in the UK, or to predictions that immigrants will arrive in the UK from Eastern Europe with illegally obtained EU citizenship (in the example below, by posing as Polish), e.g.:
2 April Lieutenant Miroslaw Szacillo, 46, of the Polish Border Guards, assured The Sun: ‘Only the most serious criminals with big money can afford to buy false documents in Poland. We make stringent checks on our borders and we are getting more equipment to detect fake documents.’
But Poles who do get in and find work in Britain will qualify for a range of benefits including free healthcare, child tax credit, child benefit, working tax credit, housing benefit and council tax benefit.
Notice also above that ‘But’ is used, in negative contrast, in predicting that people claiming to be Polish will be able to claim benefits.
Semantic Pattern 5 (x 2): for (Eastern European) immigration, there will be some criminality
In Semantic Pattern 5, ‘But’ is used to signal, through negative contrast, the prospect of serious criminality characterising some immigration from Eastern Europe, e.g.:
31 March Many new arrivals will be good news for Britain.
But some will be up to no good. Gun-happy crime syndicates have already set up vicious vice and drug rackets.
Others want to do our nation harm. The Wall Street Journal says Islamic fanatics are using immigration as a ‘Trojan horse to expand jihad, or holy war’.
‘But’ is never used in the corpus to indicate a positive evaluation of migrants in contrast to a previously stated negative one (e.g. ‘some migrants from Eastern Europe may be criminals. But the majority will be good for the economy’). ‘But’ in the thirty-six semantic patterns signals, either explicitly or implicitly, negative contrast. Thus, the corpus investigation provides evidence that ‘But’ has potentially been ‘primed’ for regular Sun readers to indicate negative evaluative contrast, specifically in relation to predictions around future migration to the UK. However, the priming of ‘But’ also relates to the grammatical word ‘will’ and to lexical words such as ‘immigration’. The priming is actually, then, lexicogrammatical (on ‘priming’, see Hoey 2005). So, regular readers of The Sun during the six-week period could well expect ‘But’ to preface a negative prediction (most likely around ‘will’) about Eastern European migration to the UK. In turn, because of these repeated lexicogrammatical associations, regular readers have been positioned to make negative contrasts with information on Eastern European immigration in subsequent related texts. In other words, they have been positioned to make negative contrastive inferences through repeated exposure to the five semantic patterns.
The most common semantic pattern is Semantic Pattern 1, at eighteen instances; the next is Semantic Pattern 2, at eight instances. Individually, Semantic Patterns 3, 4 and 5 are fewer in number than Semantic Patterns 1 and 2. However, they are different in kind from Semantic Patterns 1 and 2, since they are predicted negative consequences of projected Eastern European immigration; taken together, Semantic Patterns 3, 4 and 5 amount to ten instances.
Interestingly, while the patterns are semantically distinct in the texts, they are often in close proximity to one another, linked around the ‘But CCSK links’. Consider, for example:
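The pattern totals cited in the last two paragraphs can be tallied from the per-pattern counts given with each pattern heading:

```python
# Instances of each semantic pattern, as counted in the chapter.
patterns = {1: 18, 2: 8, 3: 4, 4: 4, 5: 2}

total = sum(patterns.values())                      # all 'But' semantic patterns
consequences = sum(patterns[k] for k in (3, 4, 5))  # predicted negative consequences
print(total, consequences)  # 36 10
```

The total of thirty-six matches the number of semantic patterns in which ‘But’ signals negative contrast, and the three consequence patterns together amount to ten instances.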
26 April DON’T blame the people of Eastern Europe for heading for Britain. (SEMANTIC PATTERN 2).
The Government has put out the welcome mat. (SEMANTIC PATTERN 1)
And to hard-up foreigners this looks the land of milk and honey.
But most will end up in unskilled low-paid jobs in the South East.
How will they afford to live? And how can schools and hospitals which are already at breaking point find room for them? (SEMANTIC PATTERN 3)
(my italics and bold)
Semantic Pattern 1, seemingly so different from Semantic Patterns 2–5, links in the data to other semantic patterns, as can be seen in the example above. Because semantic patterns regularly interrelate around the ‘But CCSK links’, the potential has been created for the reader to deduce one or more of the five semantic patterns when subsequently reading other semantic patterns over the six weeks prior to 1 May. In this way, ideological reproduction would take place. In other words, a discourse is created around new Eastern European migrants in which they are not treated with equity.
It is also worth noting that the preponderance of Semantic Pattern 1 (eighteen instances) across the corpus might seem to protect The Sun from accusations of bias against immigrants, i.e. it seems to focus much more on government incompetence than on the predicted consequences of migration. However, the interrelatedness of Semantic Pattern 1, along the ‘But CCSK links’, with other semantic patterns, and the potential for deductions from Semantic Pattern 1 means that in practice it would be disingenuous for The Sun to claim that it is always only criticising the government. Such interrelatedness, especially with Semantic Pattern 3, means that seeming expressions of empathy such as ‘high unemployment’, ‘impoverished’, ‘poor’ (see the examples in Figure 40.1) are, in fact, likely to imply migrants will be a drain on social services. Finally, it is also notable that a concordance line for ‘Eastern Europe’ in the examples in Figure 40.1 cannot show the cohesive pattern of, for example, ‘Eastern Europe’ to ‘hard-up foreigners’ to ‘most’ to ‘they’ to ‘them’ in the 26 April fragment above (see my bold). The concordance lines, in turn, are limited in showing the interrelatedness of semantic patterns, and thus the kinds of dynamic meanings regular target readers could be generating.
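The limitation of concordance lines described above can be illustrated with a minimal Python sketch of a keyword-in-context (KWIC) concordance, applied to the 26 April fragment quoted earlier. The 40-character window and the function names are assumptions made for illustration; concordancing software typically lets the analyst widen the span, but any fixed window cuts the text somewhere.

```python
import re

# The 26 April fragment quoted above, joined into one string (illustrative input).
FRAGMENT = (
    "DON'T blame the people of Eastern Europe for heading for Britain. "
    "The Government has put out the welcome mat. "
    "And to hard-up foreigners this looks the land of milk and honey. "
    "But most will end up in unskilled low-paid jobs in the South East. "
    "How will they afford to live? And how can schools and hospitals which are "
    "already at breaking point find room for them?"
)


def kwic_windows(text, node, width=40):
    """Return (left, node, right) tuples with `width` characters either side of each hit."""
    windows = []
    for m in re.finditer(re.escape(node), text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        windows.append((left, m.group(0), right))
    return windows


def format_kwic(window, width=40):
    """Lay out one concordance line with the node word centred."""
    left, node, right = window
    return f"{left:>{width}} [{node}] {right}"


def chain_links_visible(window, links):
    """Report which cohesive links (whole words or phrases) occur inside the window."""
    span = "".join(window)
    return {link: re.search(rf"\b{re.escape(link)}\b", span) is not None for link in links}


windows = kwic_windows(FRAGMENT, "Eastern Europe")
for w in windows:
    print(format_kwic(w))

chain = ["hard-up foreigners", "most", "they", "them"]
print(chain_links_visible(windows[0], chain))
```

With the default window, none of the later links in the cohesive chain appears on the concordance line. They only become visible if the window is widened to the whole fragment, at which point the output is no longer a concordance line in any useful sense.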
In the ways I have indicated, ideological reproduction can potentially take place in reading. I have thus shown the value of a corpus-based CDA: it grounds qualitative patterns in quantitative patterns and, in so doing, can help to reveal patterns of ideological meaning. Furthermore, I have shown the value of grounding the interpretation of semantic patterns in corpus-generated data, since arbitrariness of identification, and circularity from analysis to interpretation and back to analysis, have been considerably reduced.
This chapter began by extolling the advantage of having media texts in electronic form on the world-wide web. With the ever-developing world-wide web and the boom in technological tools which facilitate the production and distribution of content, ‘media’ now means a multitude of things. It can include not only electronic web versions of print-based media, and internet television and radio, but also ‘citizen media’. There are many forms of citizen-produced media, including blogs, vlogs, podcasts, digital storytelling and participatory video (hence the term ‘prosumer’, from producer–consumer). Some forms of citizen-produced media are ‘parasitical’ on the official media, such as readers’ electronic comments on a piece of on-line journalism, or contributions to newspaper web discussion forums. The world-wide web has thus facilitated media prosumption: citizens can both consume the media and produce it (Tapscott and Williams 2008).
Analysing a media corpus for patterns of ideological meaning can become a form of prosumption if the critical discourse analyst feeds back the results of his or her analysis in a vehicle such as a blog, or even engages with actual readers in a discussion forum in the on-line version of the newspaper investigated. With such contributions, one needs to tread carefully, given copyright issues and the legal restrictions on the purposes for which one can compile web-based corpora (e.g. commercial gain is usually prohibited). Analysts would also need to act quickly in compiling their corpora, since the topic under investigation will not necessarily remain a hot one. But, done sensitively and carefully, there is the prospect of a new form of critical discourse engagement: one where target readers can become more aware of the regularity of what they consume and, potentially, of the kinds of ideologically laden inferences they may be making when reading media texts.
Adolphs, S. (2006) Introducing Electronic Text Analysis. London: Routledge. (An accessible discussion of the underlying principles and concepts relevant to electronic text analysis; provides an overview of different types of spoken and written corpora.)
Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum. (A lucid examination and evaluation of a variety of corpus-based concepts and methods including: collocation, keyness, concordances, dispersion plots, as well as building and annotating corpora.)
Bednarek, M. (2006) Evaluation in Media Discourse: Analysis of a Newspaper Corpus. London: Continuum. (Presents the first book-length corpus-based account of evaluation, using a 70,000-word comparable corpus of 100 newspaper articles drawn from both tabloid and broadsheet media.)
O’Keeffe, A. (2006) Investigating Media Discourse. London: Routledge. (Usefully shows how combining the quantitative methods of corpus linguistics with the qualitative methods of discourse analysis and conversation analysis can lead to multiple insights.)
Stubbs, M. (1996) Text and Corpus Analysis: Computer-assisted Studies of Language and Culture. Oxford: Blackwell. (A pioneering work in corpus-assisted text analysis; particularly useful for corpus-assisted CDA in showing how software can be used to reveal culturally significant patterns of language use.)
Adolphs, S. (2006) Introducing Electronic Text Analysis. London: Routledge.
Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyzanowski, M., McEnery, T. and Wodak, R. (2008) ‘A Useful Methodological Synergy? Combining Critical Discourse Analysis and Corpus Linguistics to Examine Discourses of Refugees and Asylum Seekers in the UK Press’, Discourse and Society 19(3): 273–306.
Charteris-Black, J. (2004) Corpus Approaches to Critical Metaphor Analysis. Basingstoke: Palgrave Macmillan.
Chilton, P. (2004) Analysing Political Discourse. London: Routledge.
Dunning, T. (1993) ‘Accurate Methods for the Statistics of Surprise and Coincidence’, Computational Linguistics 19(1): 61–74.
Fairclough, N. (1992) Discourse and Social Change. Cambridge: Polity.
——(2001) Language and Power, second edition. London: Longman.
——(2003) Analysing Discourse: Textual Analysis for Social Research. London: Routledge.
Fairclough, N. and Wodak, R. (1997) ‘Critical Discourse Analysis’, in T. van Dijk (ed.) Discourse as Social Interaction. London: Sage, pp. 258–84.
Hidalgo Tenorio, E. (2009) ‘The Metaphorical Construction of Ireland’, in K. Ahrens (ed.) Politics, Gender and Conceptual Metaphors. Houndmills and New York: Palgrave Macmillan, pp. 112–36.
Hoey, M. (2005) Lexical Priming. London: Routledge.
Koller, V. and Davidson, P. (2008) ‘Social Exclusion as Conceptual and Grammatical Metaphor: A Cross-genre Study of British Policy-making’, Discourse and Society 19(3): 307–31.
Mautner, G. (2007) ‘Mining Large Corpora for Social Information: The Case of “Elderly”’, Language in Society 36(1): 51–72.
Oakes, M. (1998) Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
O’Halloran, K. A. (2007) ‘Critical Discourse Analysis and the Corpus-informed Interpretation of Metaphor at the Register Level’, Applied Linguistics 28(1): 1–24.
O’Keeffe, A. (2006) Investigating Media Discourse. London: Routledge.
Scott, M. (2008) WordSmith Tools Version 5.0. Oxford: Oxford University Press.
Stubbs, M. (1996) Text and Corpus Analysis: Computer-assisted Studies of Language and Culture. Oxford: Blackwell.
——(2001) Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell.
Tapscott, D. and Williams, A. (2008) Wikinomics: How Mass Collaboration Changes Everything. London: Atlantic Books.
Van Dijk, T. (1991) Racism and the Press. London: Routledge.
——(1997) ‘The Story of Discourse’, in T. van Dijk, Discourse as Structure and Process. London: Sage.
Widdowson, H. G. (2004) Text, Context, Pretext: Critical Issues in Discourse Analysis. Oxford: Blackwell.
Williams, R. (1983) Keywords, second edition. London: Flamingo.
Wodak, R., de Cillia, R., Reisigl, M. and Liebhart, K. (1999) The Discursive Construction of National Identity. Edinburgh: Edinburgh University Press.