6 From Self-Identity to Genotype
The Past, Present, and Future of Ethnic Categories in Postgenomic Science
In 2004, we were funded by the Wellcome Trust—the world’s largest medical charity—through its Biomedical Ethics Programme to conduct a three-year project to investigate how genetic scientists working in biomedical research understood and operationalized race and ethnicity as categories in their research.1 There had been limited research on these questions in the U.K. context, so this was to be a significant undertaking. We focused on research studies examining genetic and environmental factors in the development of common, complex diseases and those exploring the genetic basis of differential drug response. These studies were chosen because both have engaged with well-documented disparities in health among racial and ethnic groups, and because there have been high expectations that the application of advances in genetic technology in both these areas will have a broad impact on future health care practice and policy (Bell 1998; Khoury, Burke, and Thomson 2000).
One of our interests was in the classificatory practices of scientists with respect to race and ethnicity. In British, American, and many other societies today, we have seen the institutionalization of the principle of racial and ethnic self-identification as an established (but not uncontested) practice in various domains of public life, including social statistics (Morning 2008). There have also been debates among geneticists and epidemiologists about the scientific merits of using self-identification for the purposes of studies into disease etiology (Bamshad et al. 2004; Burchard et al. 2003; “Census, Race, and Science” 2000; Royal and Dunston 2004; Risch et al. 2002). This debate takes place against the backdrop of the emergence of programs for genetic analysis, which aim to assign individuals to populations on the basis of their genotypes (e.g., Wilson et al. 2001).
At the policy level, legislation has been passed in the United States to mandate the greater inclusion of women and individuals from racial and ethnic minorities in clinical and biomedical studies (Epstein 2007; Friedman et al. 2000). While there is no similar regulatory requirement for U.K.-based studies to include specific numbers of subjects from ethnic, age, or other minority groups, there are informal expectations that research should be more inclusive of participants from minority ethnic groups (Mehta 2006; Tutton 2008), and the 2000 Race Relations (Amendment) Act obliges researchers to ensure that they address the issue of diversity. Epstein’s (2007, 91) analysis of policy developments in the United States suggests that there has been a “categorical alignment” whereby “the categories of identity politics, the biological categories of biomedical research, and the social classifications of state bureaucrats [became] one and the same system of categorization.” In this way, the practices of self-identification have become a central part of the conduct and governance of biomedical and pharmaceutical science, becoming aligned with knowledge about biological and genetic variation and related characteristics.
Given these debates, we were interested in considering how the practices of self-identification are incorporated into the design and operationalization of biomedical research. How do scientists consider the uses and limitations of self-identification for scientific fields, such as genetic epidemiology and pharmacogenetics, that are aiming to build on the Human Genome Project and to produce new or improved diagnostics and therapies for human diseases? How does this compare with new, emerging techniques for classification based on genotyping? This chapter draws on interviews that we conducted with a number of scientists working in the United Kingdom in the fields of genetic epidemiology and pharmacogenetics about their use of self-identification to record the race and/or ethnicity of research subjects in their studies, in order to determine what this might reveal about both current practices and expectations of future practices. To preface this discussion of our research, we begin by considering the conceptual and practical basis of self-identification and its rise to dominance over observer-assigned approaches to racial and ethnic classification in the domain of social statistics. This will highlight a number of issues that we will pick up on in the discussion of our interviews.
The Rise of Self-Identification as a Classification Practice
It is possible to broadly delineate the social practices involved in classifying race and/or ethnicity as requiring one person to make judgments about another person (or people) or requiring a person to make judgments about himself or herself. Thus race and/or ethnicity can be ascribed by the person collecting the information (an observer-assigned classification), or it can be self-identified. At present, it is the latter approach that dominates in social science and public policy, to the extent that Peter Aspinall (2001, 839) notes that “observer-assigned ethnicity is no longer regarded as an acceptable method of assignment.”2 His phrase “no longer regarded as… acceptable” indicates that the practice in question has been subjected to pressures from norms and values that have changed over time. However, both observer-assigned and self-identification practices share many things in common. In both practices, the person doing the classifying uses either his or her own words (a free text response) and/or chooses from lists of predetermined responses to assign race and/or ethnicity or to respond to questions about specific criteria for classifying race and/or ethnicity.3 The key facet of practices of self-identification is thus that the responsibility for classification rests with the person for whom the classification is being made. We might ask, therefore, why is this currently deemed the most acceptable practice?
The acceptability of self-identification reflects a mix of political, conceptual, and practical concerns that can (partially at least) be understood as a response to the scientific and political debates about the status of race as a category and the history of observer-assigned racial classifications.4 For example, in the realm of state bureaucracy within the United States and United Kingdom, it has become widespread practice to ask people to self-identify from a prespecified classification scheme (e.g., in censuses and in a range of settings over which the state has oversight).5 Importantly, the U.S. Office of Management and Budget (OMB) has, over time, moved to a position that explicitly states that the racial and ethnic categories that it prespecifies are “social-political” (rather than valid anthropological or scientific) constructs and, as such, seems to reject the idea that the state or scientific experts should classify individuals into different racial or ethnic groups. This position marks a departure from the assumption that races are natural types or can be derived on the basis of ostensibly objective, biological characteristics such as skull morphology or blood quanta (Kertzer and Arel 2001). Furthermore, once race and/or ethnicity are conceptualized as sociopolitical constructs, then self-identification becomes a practical approach for classifying people’s experience of social identity that has face validity.6 If race and ethnicity are viewed as social identities and not objective types, the person best placed to decide in what category a person belongs is clearly that person himself or herself. These developments in thinking can also be seen in the United Kingdom, where observer assignment continued to be used during the 1970s and 1980s in some administrative settings (Booth 1985). The U.K. Office of Population Censuses and Surveys (the predecessor of the present U.K. Office for National Statistics, ONS) was, however, anxious that the introduction of the ethnic question in the national censuses should not lend credence to commonly held (but nonetheless discredited) ideas about racial types (Sillitoe and White 1992). Like the OMB, the ONS now favors self-identification using a range of what it calls “ethnic groups.” This practice also reflects the dominance of ethnicity over race in the United Kingdom, at least in formal public discourse (where ethnicity is conceptualized as being primarily about self-association with sociocultural groupings).
Nevertheless, in some circumstances, observer assignment continues to be practiced by state agencies. For example, U.K. police categorize those suspected of, and arrested for, recordable crimes using a series of “ethnic appearance categories” (Hansard 2008).7 This aberration from the self-identification norm aptly illustrates how the classification of race and ethnicity in any specific context depends not only on how they are conceptualized and operationalized, but also on the circumstances in which classification takes place. Questions about the appropriateness, reliability, validity, and implications of different classification practices and circumstances have been widely explored. In the context of health and biomedicine, questions have included, for example, issues such as the proper alignment of classificatory criteria with different conceptualizations, research questions, and research contexts, and the degree of correspondence between people’s self-identification and observer assignment (e.g., Aspinall 2001; Bradby 2003; Nazroo 1998; Senior and Bhopal 1994; Smaje 1996). Without rehearsing these arguments in full here, a common lesson that has emerged from these debates is that the practices and decisions surrounding classification should aim to ensure that measurements correspond to the required uses of the measures (i.e., they should be fit for purpose).
If we accept this premise about fitness for purpose, it is necessary to recognize that the prevailing practice of self-identifying race and ethnicity suffers from a number of limitations. For example, data that are reliant on practices of self-identification might, in some instances, be problematic because self-assessment tends to be highly contingent, unstable, and out of the control of researchers (Senior and Bhopal 1994). And while self-identified race and ethnicity has face validity (as a measure of a person’s experience of his or her social identity), it does not necessarily have content validity (i.e., it may not fully represent all of what a person perceives his or her race and/or ethnicity to be)8 or external validity (i.e., it may not be generalizable to someone else in the same position). Moreover, in health research, self-identified ethnic identity might be of limited use if the research questions are focused on issues related to ancestry, descent, or ethnic origins (Bradby 2003). Self-identification is also a potentially poor marker of what other people think your race and/or ethnicity might be—something that can be important in questions about how people are perceived and treated by society, and how specific authorities (such as scientists and health care professionals) might enact conceptions about race and/or ethnicity in their routine practice. And of course, there are some circumstances in which people are unable or unwilling to self-identify (or ask other people to self-identify), which make a self-identified classification impossible, or very difficult, to attain.
At the same time, it is necessary to critically challenge an idealized notion that self-identification represents the triumph of self-determination over the authority of science and the state. It has, for example, been argued that the shift to self-identification is part of a tendency to count identities—to measure how people think about themselves in relation to a range of specified and socially meaningful categories (see Skerry 2007; Kertzer and Arel 2001). From this perspective, practices of self-identification are framed in terms of empowerment, freedom of choice, and respect for individual dignity (Skerry 2000). However, Ann Morning’s (2008, 248) cross-national analysis of how different national censuses frame their questions about race, ethnicity, and nationality found that only twelve of the eighty-seven countries that include such questions “treat it as a subjective facet of identity by asking respondents what they ‘think,’ ‘consider,’ or otherwise believe themselves to be.” In many other countries, Morning noted, race and ethnicity are treated almost as objective or essential features of an individual, which the individual, in turn, should know, recognize, and be able to assign to himself or herself for the purposes of a census return. Of course, the state is the final arbiter of both the range of categories (and their associated nomenclature) from which people are able to choose and the taxonomic relationship between them. As Yanow (2003, 94) argues, “within this seeming possibility of self-identification… the range of options is created and circumscribed by the state through its agencies.” Moreover, it is clear that the labels, nomenclatures, and classification schemes from which people are asked to choose have tended to evolve from previous naturalized biogeographic group categories. Thus the U.S. and U.K. censuses both have categories rooted in skin color and geographical origins—two of the biogeographical characteristics used in traditional racial classifications.9 This process of constructing predetermined categories from which people are then able to choose (or into which free-form self-identifications are subsequently reduced) is inevitably tainted by past conditioning of what race and ethnicity mean and how these terms have been used in discriminatory policies and practices. Similar conditioning will also affect the choices individuals make when using these categories (and related classification systems) to self-identify their race and/or ethnicity.
Nevertheless, it is widely recognized that state classification schemes for race and ethnicity are not straightforward political or scientific impositions. The emergence and acceptance of the categories has been, and continues to be, a matter of social and political debate and contest (see Cornell and Hartmann 2007, 176–178; Fenton 2003, 30–42, for examples from U.S. and U.K. contexts). Furthermore, as both Cornell and Hartmann (2007) and Fenton (2003) record, the categories that emerge for use by the state can also feed back into civic society. Even if the group categories sanctioned by state bureaucracies have little meaning to start with, their saliency is constituted by virtue of their being used in everyday social and sociopolitical practice. Therefore state-legitimized categorizations such as those used in censuses are sites at which individual and collective meanings are mutually constitutive, and individuals come to reconcile their own internal senses of self with the administrative categories available to them on census forms and the like (cf. Hacking 1981). These categories can also, however, be used to challenge hegemonic constructions of social reality (Urla 1993).
In summary, the ascendancy of racial and ethnic self-identification as a classification practice in state bureaucracy is framed by the history of the race concept and accompanying practices of observer assignment. This conceptual shift in classification practice accompanied the reorientation of “race” as a sociopolitical construct and, within the United Kingdom, its partial replacement by the notion of ethnicity (which is, in large part, an overtly sociocultural concept). Over time, observer assignment has become perceived as not only conceptually inappropriate, but also as not fit for purpose in relation to monitoring the social groups of which people perceived themselves to be a part. However, even self-identification of race and ethnicity does not easily overcome the long history of difficulties associated with the race concept and associated practices of observer assignment. It does not necessarily distinguish ethnicity from race, nor does it free either of these terms from their naturalistic, biogeographical conceptualizations or from the associated practices of oppression and discrimination that have tainted the authority of both science and the state. Practices of classification involving the self-identification of race and ethnicity are not free floating, but rather, operate within certain limits and relate to categories whose constitution and relevance are determined by a range of social actors with differential power to define, challenge, and reshape the boundaries of social groupings. The ascendancy and social acceptability of self-identification does not strictly rule out alternative approaches to classification, but it does cast them in a negative light as being potentially improper or reprehensible.
This analysis of the history and practice of self-identification has raised a number of issues that we develop now in relation to our research on the classificatory practices of U.K. scientists working in the fields of genetic epidemiology and pharmacogenetics.
Race, Ethnicity, and Genomics in the United Kingdom
Our first set of interviews was conducted with principal research staff of ten biobanks and cohort studies investigating the environmental and genetic factors involved in health and illness in a range of populations, including children and adult volunteers. These biobanks recruited volunteers at single or multiple sites, and four of them sought to exclude minority ethnic groups so that only “white” British volunteers were studied. The justification given for this exclusivity stemmed from the view that there was significant genetic variation between ethnic groups that might complicate the interpretation of any gene-disease associations identified among ethnically diverse study populations as a result of gene-gene interactions at ethnic-specific polymorphic loci. Therefore, to investigate the association between genetic polymorphisms and disease, these respondents felt that the greater the (perceived) genetic homogeneity of the cohort, the stronger the internal validity of any such association would be. Most of the remaining respondents worked either on studies that had adopted a race/ethnicity-blind approach to recruitment (which tended to result in the inclusion of only small numbers of nonwhite British participants) or on studies that actively sought to recruit disproportionate numbers of minority groups, ostensibly to specifically study the consequences of genetic variation in such groups on gene-disease associations.
The second set of interviewees comprised a group of academic researchers working on nine pharmacogenetic studies located in university research centers based in the United Kingdom. These researchers were predominantly investigating variation in drug-metabolizing enzymes (DMEs) across a range of pharmaceuticals developed to treat a variety of disease conditions. As before, many of these nine studies focused exclusively on “white” British participants (for very similar reasons to those described earlier for the four biobanks focusing on white British participants), although the remainder (n = 4) adopted what constituted a race/ethnicity-blind approach.
A key theme explored in the interviews concerned the classificatory practices followed by the scientists involved in the biobanks and pharmacogenetic projects examined. In particular, we investigated the extent to which contemporary practices of self-identification were incorporated into the design and operationalization of these studies. Fourteen of the nineteen studies explicitly used self-identification, and of that fourteen, twelve did so using the “standardized” ethnic group classification scheme produced by the ONS for the purposes of the national censuses.10 For four of the other studies, we were unable to determine the exact procedure used for classifying race and/or ethnicity,11 whereas for the remaining study, the scientific team involved had rejected self-identification altogether—an exception we discuss in more detail subsequently.
In what follows, we discuss the accounts provided by the scientists we interviewed concerning their studies’ use of self-identification to record race and/or ethnicity. We focus on three interrelated strands in these accounts: the scientists’ experiences of, and justifications for, adopting or rejecting self-identification in their research designs and practices; their views concerning the strengths and limitations of using self-identification in their scientific work; and their insights into future developments in which self-identification might be superseded by alternative approaches.
Self-Identification as a Practical Tool
Some of the scientists we interviewed considered self-identification in predominantly pragmatic terms. For example, the lead researcher on a regionally based biobank in England noted,
As far as I can understand in the literature in the area… people having gone round the houses looking at alternative ways to subclassify different racial groups or different ethnicities, they’ve come back to the understanding that, or the agreement that, this straightforward self-reported classification into one of these major groups is as good a tool as any in the context of genetic association studies.
This respondent justifies using self-reporting “into one of these major groups” on pragmatic grounds, this approach being “as good as any” within a context of long-standing, and seemingly intractable, debates about classifying racial groups or ethnicities. Furthermore, this view is grounded in practical issues and describes self-identification as straightforward and thereby uncomplicated. However, part of this argument concerning practicability relates to the fact that self-identification, as practiced here, is a structured process—one that is ordered by a predetermined set of categories that have some broader recognition and relevance elsewhere. Another scientist, this time working on a similar national study, broadly shared this view:
Asking people to assign themselves in ethnicity is not perfect but we can’t—we don’t have any better way of doing it at the moment.
Again, we can see that self-identification is justified on pragmatic grounds, and although this interviewee acknowledges that it is “not perfect,” it is upheld as the best available approach compared with currently available alternatives.
Self-Identification as Standardization
When it came to putting self-identification into practice, most of the scientists we interviewed whose studies used self-identification to record race and/or ethnicity did so using the standardized ethnic group classificatory scheme developed by the ONS. These scientists justified their use of the ONS ethnic group categories on the basis that these were an accepted standard used by many other agencies and many other scientific studies. Using the ONS scheme, they argued, not only meant that they were adopting accepted classificatory practices, but also that they were generating data that would improve the comparability, generalizability, and applicability of their findings. For example, a pharmacogeneticist, reflecting on his team’s choice of the census classification, said,
I feel that standardisation for certain things like ethnicity, within the terms of genetic testing in this way is probably very useful. Now that’s not to say that, you know, people shouldn’t feel comfortable to classify themselves however they want to—but to have a sense of when you’re trying to develop a test that is useful at a sort of population level or over, you know, significant groupings of people, I think some form of standardisation probably is very useful to make sure that you are capturing the appropriate information and therefore offering testing that is useful and appropriate across the board.
While this scientist acknowledged the key benefits of standardized self-identification in terms of the comparability of research findings “across the board,” the interviewee clearly perceived a tension between permitting people to freely self-identify (i.e., in any way they might choose) and the need to generate standardized data that could be generalized beyond the confines of a given study’s sample population. As such, the ONS classificatory scheme was felt to offer a set of standardized categories that permitted the production of generalizable knowledge.
This rationale was further emphasized by a lead researcher on a large longitudinal study, which had previously classified its population cohort using researcher-assigned categories when it first began in the 1980s. The lead researcher discussed how the study had subsequently adopted the self-identification practices developed by the ONS to conform to perceived standards—what the researcher described as “the U.K. population reference method.” However, the researcher also felt that this change in classificatory practice had scientific merit because the researcher considered that individual participants were a more reliable source of knowledge about their own ethnicity than researchers (who might otherwise identify and categorize people on the basis of potentially misleading criteria such as place of birth or appearance). Nevertheless, the use of classifications derived from the arena of public policy in biomedical science raised questions for this researcher about the validity of racial and ethnic categories from both a sociopolitical and scientific perspective:
How does one, how does one define the validity of an ethnicity classification? Well I mean there are formal definitions of validity, one is theoretical coherence, another one would be the word that you used which was reliability or reproducibility. But then I guess if you went outside the realms of the technical definition of validity then you know issues of eugenics or political classifications raise their head. And that’s the thing… about the Pacific Islander classification in the States.… There was a political economic reason for creating a category of Pacific Islanders because it meant that resource would be available to that group… if it became officially accepted as a distinct category, distinct ethnic or race category. So I think validity is a very thorny subject in this context.
In short, the researcher recognized that racial and ethnic categories are the product of social, political, historical, and scientific discourses, and that the concept of what might constitute a valid classification (and, by implication, a valid classificatory scheme) would vary according to the context in which the classification was used. In particular, the researcher notes that political negotiations might generate new categories (such as Pacific Islander) in the context of social statistics, which has potential implications for the way in which scientists design and conduct studies to address related policy objectives or capitalize on the availability of new categories.
Elsewhere, there was substantive ambivalence concerning the nature of the racial and ethnic categories used in biomedical research. This was evident in an interview with another researcher, who talked about what the researcher perceived to be a gap between policy and practice regarding the use of self-identification in the researcher’s study. This researcher worked at a biobank that aimed to study the development and incidence of disease in two ethnic populations it called “Asians” and “whites.” This study had adopted the ONS classificatory scheme as part of its formal research protocol, but when commenting on what had actually happened in practice, this researcher described how the research nurses employed to recruit healthy volunteers into the study approached the task of classification in a very different way:
I know what they should do, which is they should give them a list of the [ONS] census definitions. I don’t think they do; I think they ask them where they’re from, I think they ask them their religion, I think they ask them where their grandparents are from and they classify them according to that. And actually it’s more useful, in some ways I think what the nurses are doing; they’re assigning in a way that we all understand and whilst I’m entirely sympathetic, I mean you know I could be in a bad mood and say I’m black, you know, West Indian and you’d have to accept that and that would be a nonsense, that’s more, that’s going to more, make my study much more problematic. So I don’t think they’re doing it along traditional or PC [politically correct] lines for the classification of ethnicity.
This account indicated that while a standardized approach to self-identified race and/or ethnicity might be part of the formal procedure for this and other projects, in practice, those charged with conducting recruitment might adopt a range of different approaches such as asking volunteers about their religion or grandparental ancestry. Although this is essentially a departure from the stated protocol of the study, the scientist seemed content that the nurses were adopting a commonsense approach using criteria that “we all understand” and, as such, were able to address some of the potential pitfalls of self-identification (such as deliberate misidentification by grumpy volunteers). As such, they felt that these kinds of questions might actually elicit more useful information for the study. Nonetheless, they accepted that self-identification was the “politically correct” approach to the classification of ethnicity and, as such, implied that this was the approach that needed to be formally adopted when conducting ethnic classification.
Concerns about the scientific validity of self-identified ethnicity were underlined by one of the pharmacogeneticists we interviewed, who questioned whether the way people self-identified always matched what scientists thought they would find at the genetic level. In this context, ethnicity was interpreted as a proxy for ancestry and related to a widespread belief among the geneticists we interviewed that ethnic ancestry, even at the level of parental and grandparental geographical origins, captured a significant component of clinically relevant genetic variation:
So I suppose self-reporting of your ethnicity is presumably slightly inaccurate, maybe…, you know, at grand paternal level… but self-reported ethnicity is slightly inaccurate probably in large conurbations.
This brief quote encompasses two distinct points concerning the perceived limitations of self-identified ethnicity for genetic research. The first of these relates to the recognition that self-identified ethnicity might not accurately reflect the biogeographical origins or ancestry of individuals, where their knowledge of family history, beyond perhaps their parents or grandparents, is limited or incomplete. The second point concerns the impact of admixture within “large conurbations” on knowledge about, or clarity of, ancestry. This interviewee went on to illustrate these concerns by describing a previous study that had sought to exclude “non-Caucasians” in the belief that genetic variation between ethnic groups might complicate the interpretation of any gene-disease associations identified among ethnically diverse study populations as a result of gene-gene interactions at ethnic-specific polymorphic loci. Despite this study’s focus on the recruitment of “Caucasians,” the use of self-identification to select volunteers into the study failed to achieve the study’s aims:
Interestingly enough we did a project where we wanted to just to have Caucasian, only South Londoners, just for ease of the genetic analysis, and found through stratification analysis that they weren’t ethnically homogenous. So presumably, you know, the population of South Londoners has some mixed ethnicity, even though in the tick box way they are white Caucasians, but they obviously have some other influences coming into their ethnicity.
In this account, the scientist juxtaposed what he called the “tick box” approach to the self-identification of ethnicity (i.e., a system similar to that developed by the ONS classificatory scheme) with evidence from subsequent genetic analyses, which found higher than expected levels of genetic variation. In this instance, the researcher interpreted these findings as evidence of mixed ethnicity and thereby an example of the inherent limitations of self-identification for generating ethnically (and thereby genetically) homogeneous study populations. Similar genetic analyses had been used by a number of the other interviewees in their studies. These analyses helped to ascertain “whether their [the study participants’] report on ethnicity is similar to their genetic ethnicity” and indicate that geneticists view the “genetic identity” of groups and individuals as something that can be somewhat different to self-identified ethnicity. Ostensibly, the notion of genetic ethnicity conflicts with definitions of ethnicity that emphasize sociocultural practice, although it should be noted that shared ancestry and genealogy are prominent in sociological discussions of the concept (Fenton 2003). However, it seems clear that genetic ethnicity is not invoked by geneticists as a conceptual challenge to ethnicity as a sociocultural and political concept per se, but rather as a semantic sleight of hand, which reflects a shared belief among geneticists that some genetic polymorphisms are more commonly found in populations identified as ethnic groups in sociopolitical discourse.
Along similar lines to the approach adopted by the interviewee who described the study of Caucasian South Londoners, many of the pharmacogenetics studies sought to focus exclusively on participants described, variously, as white British, European, or Caucasian. These studies prioritized measures of ancestry over self-identification on the basis that ancestry provided a better marker of genetic heritage. To this end, researchers in these studies described how they would ask potential participants about their family histories when deciding whether to include or exclude them during the recruitment process. However, in practice, the use of these sorts of questions would only come into play if the physical appearance of volunteers led the researchers to conclude that, in terms of what one described as their “basic genetics,” they were not of white British, European, or Caucasian descent.
One particular study went further by rejecting the use of ethnicity as a consideration when recruiting participants. This project was engaged in building a case-control panel of genetic samples to assist in the identification of genetic variants associated with common complex diseases. The lead researcher involved explained that they had decided to exclude ethnicity as a marker for genetic ancestry because they felt that people generally have little knowledge of their own ethnic origins:
If you… take people in this country, they have no idea what their ethnic origin is, I mean the people in Orkney think they’re all Vikings.… There’s no useful information that really is other than geographic and it’s the geographic information that actually to a large extent does define the population. So, roughly speaking, you know… the Celtic precursors of people, who now would be considered the Celtic fringe, were the population of Britain before the agricultural revolution came, which was about five or six thousand years ago, and gradually invaders and others pushed them to the fringe and you get other communities, you’ve got invading communities around the coast. And those are largely geographic phenomena, so I think that’s the only objective way to go about defining what the population of this country is.
Rather than relying on self-identification, which the researcher felt had little value as a marker of the genetic makeup of Britain’s different populations, the study sought information on people’s family history and, in particular, selected volunteers with four grandparents who had originated from a specific locality. Armed with this information, the study categorized samples on the basis of geographical location, using this as an alternative proxy for genetic ancestry.
However, this study was exceptional, and most of the scientists we interviewed described how their studies had practiced self-identification using the ONS classificatory scheme (or very similar classifications) for two principal reasons: it offered a standardized approach to the practice of ethnic classification that was socially and politically acceptable, and it could facilitate the comparability and generalizability of findings across different studies. For some, relying on self-identification using sociopolitical categories developed for implementing and monitoring social policy was potentially problematic within the context of science because the categories available could change in response to political imperatives, and the categories chosen were subject to the whims of potential participants. There was also a tension between the potential benefits of allowing people to freely self-identify and the need to generate reproducible scientific variables using predetermined categories (whether standardized or not). This was most apparent when concerns were expressed about whether self-identified ethnicity was an accurate measure of underlying ancestry-related genetic variation.
For most genetic analyses, the primary interest in ethnicity is as a proxy for ancestry and its presumed association with geographical or sociocultural patterns of genetic variation. Indeed, despite the fact that all the studies examined in our analyses shared an explicit interest in genetic and environmental contributions to disease and/or pharmacological efficacy, none of the researchers we interviewed emphasized the potential insights into environmental contributions that might be provided by data on self-identified race or ethnicity (and its related sociocultural and socioeconomic correlates). Instead, the studies we examined sought to ask volunteers questions about ethnicity, family background, place of birth, and so on, as an indication of their ancestry, or adopted post hoc genetic analyses to check whether these criteria had correctly identified populations believed to be genetically distinct and homogeneous. Indeed, given their concerns about the genetic validity of self-identified ethnicity (and related markers of ancestry), it is surprising that only one of the studies we examined had abandoned the use of self-identified ethnicity, although more had attempted to improve the utility of self-identification using specific questions about familial geographical origins and related criteria.
From Self-Assigned to Genotype Assigned?
Some of the scientists we talked to—especially in the pharmacogenetics field—anticipated that emergent technologies would transform practices of racial and ethnic classification (Tutton et al. 2008). Some enthusiastically imagined a future in which self-identification would ultimately be supplanted by the widespread use of genotyping samples. As one key scientist with a national biobank suggested,
Ultimately the reason for being concerned about ethnicity from the science perspective is because we think that the gene sets are going to be different and therefore if you can actually get the information at the genetic level, which we theoretically can ultimately, then that’s obviously going to be more accurate and more useful for what we actually want to do. Now that wouldn’t be true for example if you’re doing an anthropological study then self-assignment is more important than genetic ethnicity but from the point of view of a genetic study then the genetic assignment would be better. But it is dependent on being able to do lots and lots of genotyping, which currently is still too expensive but will subsequently not be, and also that there will be available marker sets that we know define the different ethnicities, so there are things for the future.
This account explicitly articulates the principal rationale of geneticists for studying different racial and ethnic groups: the notion that these groups are genetically different. From this perspective, self-identified ethnicity offers a useful proxy for certain patterns of genetic variation and will, in the future, be superseded by more accurate categories derived from genomic data. However, if this respondent’s vision of the future is to be realized, it is contingent on substantial investment to produce the genetic data required to determine and define the genetics of different racial and ethnic groups (including new groupings defined from genetic data, rather than sociopolitical categories). It is also dependent on the costs of genotyping falling so that the techniques involved become more widely available to researchers. This optimistic scenario was underscored by other interviewees. One, in particular, imagined a future in which not only would ethnic self-identification be superseded by genetic data, but the utility of ethnicity itself would be rendered irrelevant within biomedical research and practice by the advent of individualized medicine based on molecular characteristics:
I think ethnicity will eventually pale into insignificance because you’ll know what the allelic variation there is across the whole population and you’ll just test for a particular allele. It doesn’t matter what the ethnic background is, which will tell you what the genotype or sensitivity or whatever of that person is to that drug.
In this scenario, ethnicity, as a biomedical variable, will be superseded by genetic information about allelic variations that will instead place people into somewhat different categories such as fast or slow metabolizers of certain classes of drugs (see Wilson et al. 2001). This interviewee did acknowledge that to achieve this future vision, scientists and science funders would need to invest heavily in what would be a complex and expensive process.
However, one of the scientists we interviewed did strike a more ambivalent tone:
I think when certain people go through, as people increasingly are doing, you know, their genealogies and finding out about their family trees, finding that they have origins that are different from their expectations is really quite profound for some people. And I think that genetic tests in that way could also have a similar sort of influence and might throw up things that might not necessarily be helpful. That needs to be counterbalanced, I think, by whether… getting that information would be truly of benefit as opposed to the self-reporting, whether you know that molecular level of classification of people’s ethnicity, if there was some real benefit to that compared to self-reporting then it might be worth considering. But I can’t really envisage that at the moment. I can see that technically it could happen but I can’t, you know on the spot, think of an immediate application where I think that that would be particularly of benefit, you know, in a clinical setting or a day-to-day setting.
He conceded the technical possibility, but was less certain as to what benefits would arise from genotyping compared with the current use of self-identification. He also raised what might be seen as an ethical question regarding how the disclosure of findings of genomic analyses might conflict with people’s own self-identifications. However, his was the lone dissenting voice in this (somewhat modest) sample of scientists interviewed.
In summary, given the doubts about the match between self-identified ethnicity and the underlying genetic variation that might be associated with ethnic origins, some of the scientists we interviewed saw the prospects of large-scale genotyping as offering the opportunity for ethnic group categories to become refashioned as genomic categories or superseded altogether in a new regime of individualized medicine.
Conclusions
In conclusion to this chapter, we reflect on the following question: what is the importance of self-identified race and ethnicity in social and political life, and to what extent do the perspectives and practices evident from our research threaten their continued use in the context of biomedicine? Earlier we argued that self-identification has emerged as a vital governmental practice in many contemporary societies in relation to how states and other agencies collect information about racial and ethnic groups. As Hacking (1981) has argued, social statistics largely defined biopolitics from the nineteenth century onward: as a technology of power, social statistics was the primary means by which the state intervened in the social body of the population. Alongside statistics related to morbidity, mortality, living conditions, and economic productivity, contemporary biopolitics also encompasses questions about who people think they are. As such, counting identities has become a notable feature of contemporary social statistics.
Crucially, self-identification is seen to mark a key departure from previous practices of racializing people and groups (in this instance, practices conducted by politically and economically privileged elites who claimed special scientific knowledge to classify others primarily on the basis of physical and behavioral traits and/or biogeographical origins). Self-identification, by contrast, conveys with it the idea that race and ethnicity are not objective, biologically derived categories that can, or should, be conferred on others, but instead, are a matter of self-perception and self-assertion. However, as we have argued, self-identification does not necessarily succeed in distinguishing race from ethnicity, nor does it free either of these concepts from naturalistic or essentialistic approaches to the framing of differences. In any case, self-identification is now an established and even a taken-for-granted aspect of what Epstein (2007) calls “biopolitical citizenship.” In this, through a burgeoning array of bureaucratic interactions with employers, health care providers, welfare agencies, educators, and so on, individuals are expected to self-identify (albeit using preselected categories, in most instances). When becoming a subject of, or participant in, biomedical research, people also find themselves asked to self-identify by the scientific investigators involved both for scientific reasons and to conform to policy requirements established to ensure that greater numbers of individuals from racial and ethnic minorities are included in biomedical research. Such policies have not, however, gone uncontested. Population geneticists, among others, have argued about the extent to which social categories of race and ethnicity might act as good proxies for identifying significant patterns of genetic variation among populations.
The perspectives and practices uncovered by our research in the United Kingdom indicate that, for some researchers, asking study participants to self-identify their ethnicity is a pragmatic solution for a practical task. They have sought recourse to what might be thought of as two standards: the principle of self-identification and the way in which (in most of the studies) this was standardized in practice using the ONS census ethnic group classification. As we noted earlier, using this ONS classification, researchers are able to realize the benefits of standardization, including comparability, generalizability, applicability, and social acceptance (see Smart et al. 2008). This approach would seem to reinforce the value of standardized self-identification practices because it both meets the social expectation (i.e., what is considered acceptable) that individuals should be the ones to assign themselves to racial or ethnic categories and serves the interests of the scientists concerned by supporting their aim to produce generalizable results from their studies. However, the principal interest of most scientists in these fields of study lies in capturing information on genetic ancestry. The worry, therefore, is that they have effectively adopted self-identified ethnicity as a proxy for genetic ancestry, an approach that has a number of scientific (classificatory and genetic) and social consequences (Ellison and Jones 2002; Foster and Sharp 2002; Juengst 1998).
As we have documented here, some scientists expressed a concern that there would be a gap between people’s perceptions and their self-reporting of ethnicity based on a set of categories that reflect political and social imperatives, as opposed to scientific ones (and the underlying biological reality of people’s genetic ancestry). Given this, researchers adopted various strategies to reveal this reality such as using alternative (or additional) questions, inventing study designs that were deliberately exclusive, creating alternative conceptions of population groups, and/or conducting post hoc genetic tests on participants selected for inclusion on the basis of self-identified ethnicity. Some imagine a future in which genotyping techniques would supersede reliance on practices of self-identification and would, in turn, either redefine or supersede prevailing categories of (predominantly self-identified) race and/or ethnicity as used in biomedical research and practice. Self-identification, therefore, is a practice of what anthropologist Mike Fortun (2007) calls the “meantime”—the “making do” with present contingencies and compromises while the future takes shape—a future in which it is expected (by some) that genotyping will eventually supersede the use of self-identification in the design of biomedical studies and in the analysis of their results (thereby leading to new sets of standards). This future vision would seem to challenge the current status of self-identification and herald a shift from self-assigned to a form of observer-assigned categorization shaped and mediated by genetic technologies and the data these generate.
But there are two quite different future visions articulated in the interviews: one being that existing group categories might be defined and understood genetically, the other that existing group categories will become irrelevant and thus replaced by new types of categories based on genotypic information. The question of whether, or which of, these imagined futures will materialize cannot be answered here, but it is evident from our interviews that such expectations are shaping current practices associated with the use of self-identification. Indeed, the former tends to lend support to the continued use of the self-identified census categories, while the latter suggests that these categories are only of value to biomedical research and practice in the interim, that is, until they are replaced by other, yet-to-be-determined categories with provenance in science, rather than politics.
In either case, it is evident that for most of the scientists we interviewed, genotypic information has greater authority than self-identity in research that focuses on the genetic determinants of health, presumably because the former is seen to be based on scientific fact, rather than on subjective and contextually contingent social and political factors. Many of the scientists we interviewed felt that either the redefinition of existing categories using genotypic data or the emergence of new categories using these data would introduce greater certainty to scientific practices focused on the genetic characteristics of individuals and groups. At present, the potential impacts of introducing new genotypic categories and their possible resonance and uptake within society remain uncertain.
In the meantime, however, as our research demonstrates, the process of racial and ethnic categorization is co-constructed from practices and resources drawn from social statistics and from the statistical and genomic techniques that have been developed to control for the uncertainties associated with self-identification. This strikes a balance between realizing the benefits of standardization and conforming to social and political classificatory norms, while preserving scientific interest in genetic ancestry, as opposed to ethnic group identity. This is then bracketed by expectations surrounding the potential for large-scale genotyping to play a more significant role in future scientific practices and to have a transformative effect on classificatory practices and, indeed, on scientific understandings of the genetics of disease.
Finally, analysts like Epstein (2007) and Nelson (2008) encourage us to think in terms of categorical alignments between the social and the biological ordering of categories (and how these are made in the arenas of new biomedical and genomic knowledge by various actors). The alignment of governmental and biomedical categories is, as Epstein has demonstrated, a central part of the current configuration of biopolitical citizenship in the United States and elsewhere, with significant traction in government policy as well as support among many important political actors. However, perhaps the futures envisaged by some of the scientists we interviewed suggest that disalignment might also be possible. The potential for such a disalignment exists where new socially meaningful genotypic categories are created or where genotypic information dynamically intersects with existing socially meaningful categories in ways that reshape or reform them. Furthermore, disalignment may also exist in terms of the practices that underlie the categories, namely the potential shift from self-identification to genotype that we have reported here. As we have documented, the practices of self-identification carry a significant normative load, and self-identified race and ethnicity using the census categories will continue to have relevance for health research. Nevertheless, there is an evolving relationship between different methods of categorization in genetics research, which is indicative of how tensions between the science and politics of racial and ethnic classification continue to be negotiated.
Notes
1. This project was called “Race/Ethnicity and Genetics in Science and Health” and was funded by the Wellcome Trust Biomedical Ethics Programme, 2004–2007. The project team comprised Paul Martin (principal investigator, University of Nottingham), Richard Ashcroft (Queen Mary, University of London), George Ellison (London Metropolitan University), Andrew Smart (Bath Spa University), and Richard Tutton (Lancaster University).
2. Aspinall’s (2001) sole focus on ethnicity reflects the U.K. context, where, in many aspects of official public discourse, the language of ethnicity is often favored over that of race. However, as ethnicity is usually conceptualized as primarily being about self-association with sociocultural groupings, observer assignment of this kind of self-identity would raise methodological criticisms.
3. The labels, nomenclature, and classification schemes from which people are asked to choose are variable (although they are also subject to standardization, particularly within nation-states), and there are a wide variety of criteria that can be used for classifying individuals into racial and/or ethnic groups on the basis of “what you look like; what you do; and where you come from.” A nonexhaustive list might include an individual’s physical traits (such as skin color, hair texture, or facial features); his or her nationality; his or her birthplace and that of the individual’s parents and/or grandparents; the individual’s cultural or religious affiliations, beliefs, and/or practices (including lifestyle and diet); his or her name or what language the individual speaks; the individual’s experiences of racism, stigmatization, discrimination, or exclusion; and his or her experiences of migration.
4. Pfeffer (1998) claims that political mobilizations played an instrumental role in this shift to self-identification.
5. Ostensibly, the purpose of collecting these data is to monitor the number and sizes of different racial and/or ethnic groups in a national population and support legislation designed to outlaw discrimination and variation in the need for, access to, and uptake of public services.
6. I.e., “where an indicator ‘makes sense’ as a measure of a construct” (Neuman 2006, 192): self-identification of race and/or ethnicity measures self-identification to a sociopolitical construct and/or sociocultural grouping.
7. A House of Commons written response to a question about the National DNA Database included the information that “ethnic appearance is based on the judgement of the police officer taking the sample as to which of six broad ethnic appearance categories the person is considered to belong. ‘Unknown’ means that no ethnic appearance information was recorded by the officer taking the sample” (Hansard 2008). It is also notable that the six “categories” (black; Middle Eastern; Asian; white southern European; white northern European; and Chinese, Japanese, or Southeast Asian) do not directly match the ones developed for the 2001 censuses. This observer assignment process is markedly different to the one described by Delsol and Shiner (2006) in their account of the police protocols that are intended to govern stop-and-search practices.
8. What people self-identify on a census form may not be what they really think, or indeed what they might think at different moments in time. Nor is this necessarily as complex or as free formed as what they might, under other circumstances, report.
9. Indeed, in popular and political discourse within the United Kingdom, the term ethnicity sometimes appears to function as a surrogate for race (Mason 2000, 14, citing Saggar), perhaps because it is seen as “politically correct.”
10. This is standardized in the sense that it not only specifies the classification system, associated categories, and nomenclature used, but also the practices by which classification is conducted (i.e., by self-identification to a fixed number of predetermined categories).
11. In some cases, the interviewee did not know the processes involved in classifying study participants in his or her research study.
References
Aspinall, P. 2001. Operationalising the collection of ethnicity data in studies of the sociology of health and illness. Sociology of Health & Illness 23:830–862.
Ballard, R. 1996. Negotiating race and ethnicity: Exploring the implications of the 1991 census. Patterns of Prejudice 30:3–33.
Bamshad, M., S. Wooding, B. A. Salisbury, and J. C. Stephens. 2004. Deconstructing the relationship between genetics and race. Nature Reviews Genetics 5:598–609.
Bell, J. 1998. The new genetics in clinical practice. British Medical Journal 316:618–620.
Bhopal, R., J. Rankin, and T. Bennett. 2000. Editorial role in promoting valid use of concepts and terminology in race and ethnicity research. Science Editor 23:75–80.
Bonham, V. L., E. Warshauer-Baker, and F. S. Collins. 2005. Race and ethnicity in the genome era: The complexity of the constructs. American Psychologist 60:9–15.
Booth, H. 1985. Which “ethnic question”? The development of questions identifying ethnic origin in official statistics. Sociological Review 33:254–275.
Bradby, H. 1996. Genetics and racism. In The troubled helix: Social and psychological implications of the new human genetics, ed. T. Marteau and M. Richards. Cambridge: Cambridge University Press, 295–316.
Bulmer, M. 1980. On the feasibility of identifying “race” and “ethnicity” in censuses and surveys. New Community 8:3–16.
Burchard, E., E. Ziv, N. Coyle, S. L. Gomez, H. Tang, A. Karter, J. Mountain, E. Perez-Stable, D. Sheppard, and N. Risch. 2003. The importance of race and ethnic background in biomedical research and clinical practice. New England Journal of Medicine 348:1170–1175.
Census, race, and science. 2000. Editorial. Nature Genetics 24:97–98.
Cornell, S., and D. Hartmann. 2007. Ethnicity and race: Making identities in a changing world. 2nd ed. London: Pine Forge.
Delsol, R., and M. Shiner. 2006. Regulating stop and search: A challenge for police and community relations in England and Wales. Critical Criminology 14:241–263.
Ellison, G. T. H., and I. R. Jones. 2002. Social identities and the “new genetics”: Scientific and social consequences. Critical Public Health 12:265–282.
Epstein, S. 2007. Inclusion: The politics of difference in medical research. Chicago: University of Chicago Press.
Fenton, S. 2003. Ethnicity. Cambridge: Polity.
Fortun, M. 2007. Race in the meantime: The “care of the data” for complex conditions. Paper presented at the Business of Race, Massachusetts Institute of Technology, Cambridge.
Foster, M. W., and R. R. Sharp. 2002. Race, ethnicity, and genomics: Social classifications as proxies of biological heterogeneity. Genome Research 12:844–850.
Friedman, D. J., B. B. Cohen, A. R. Averbach, and J. M. Norton. 2000. Race/ethnicity and OMB Directive 15: Implications for state public health practice. American Journal of Public Health 90:1714–1719.
Hacking, I. 1981. How should we do the history of statistics? Ideology and Consciousness 8:15–26.
Hansard, H. C. 2008. Cols 798–802W. http://www.publications.parliament.uk/pa/cm200708/cmhansrd/cm081110/text/81110w0007.htm.
Juengst, E. T. 1998. Group identity and human diversity: Keeping biology straight from culture. American Journal of Human Genetics 63:673–677.
Kaufman, J. S., and R. S. Cooper. 2001. Commentary: Considerations for use of racial/ethnic classification in etiologic research. American Journal of Epidemiology 154:291–298.
Kertzer, D. I., and D. Arel. 2001. Censuses, identity formation, and the struggle for political power. In Politics of race, ethnicity and language in national censuses, ed. D. Kertzer. Cambridge: Cambridge University Press, 1–42.
Khoury, M. J., W. Burke, and E. J. Thomson, eds. 2000. Genetics and public health in the 21st century. Oxford: Oxford University Press.
Mason, D. 2000. Race and ethnicity in modern Britain. 2nd ed. Oxford: Oxford University Press.
Mehta, P. 2006. Promoting equality and diversity in UK biomedical and clinical research. Nature Reviews Genetics 7:668.
Morning, A. 2008. Ethnic classification in global perspective: A cross-national survey of the 2000 census round. Population Research and Policy Review 27:239–272.
Nazroo, J. Y. 1998. Genetic, cultural or socio-economic vulnerability? Explaining ethnic inequalities in health. Sociology of Health & Illness 20:710–730.
Nelson, A. 2008. Bio science: Genetic genealogy testing and the pursuit of African ancestry. Social Studies of Science 38:809–833.
Neuman, W. L. 2006. Social research methods: Qualitative and quantitative approaches. 6th ed. London: Pearson.
Pfeffer, N. 1998. Theories of race, ethnicity, and culture. British Medical Journal 317:1381–1384.
Risch, N., E. Burchard, E. Ziv, and H. Tang. 2002. Categorization of humans in biomedical research: Genes, race, and disease. Genome Biology 3:1–12.
Royal, C. D. M., and G. M. Dunston. 2004. Changing the paradigm from “race” to human genome variation. Nature Genetics 36(11 Suppl.):5–7.
Senior, P. A., and R. Bhopal. 1994. Ethnicity as a variable in epidemiological research. British Medical Journal 309:327–330.
Sillitoe, K., and P. H. White. 1992. Ethnic group and the British census: The search for a question. Journal of the Royal Statistical Society, Series A 155:141–163.
Skerry, P. 2007. Counting on the census: Race, group identity, and the evasion of politics. Washington, DC: Brookings Institution Press.
Smaje, C. 1996. The ethnic patterning of health: New directions for theory and research. Sociology of Health & Illness 18:139–171.
Smart, A., R. Tutton, P. Martin, G. T. H. Ellison, and R. Ashcroft. 2008. The standardization of race and ethnicity in biomedical science: Editorials and UK biobanks. Social Studies of Science 38:407–423.
Tutton, R. 2008. Biobanks and the biopolitics of inclusion and representation. In Monitoring bodies: The new politics of biobanks, ed. H. Gottweis and A. Petersen. London: Routledge, 159–176.
Tutton, R., A. Smart, P. Martin, R. Ashcroft, and G. T. H. Ellison. 2008. Genotyping the future: Scientists’ expectations of race/ethnicity and genetics after BiDil®. Journal of Law, Medicine, and Ethics 36:464–470.
Urla, J. 1993. Cultural politics in an age of statistics: Numbers, nations, and the making of Basque identity. American Ethnologist 20:818–843.
Wilson, J. F., M. E. Weale, A. C. Smith, F. Gratrix, B. Fletcher, M. G. Thomas, N. Bradman, and D. B. Goldstein. 2001. Population genetic structure of variable drug response. Nature Genetics 29:265–269.
Yanow, D. 2003. Constructing “race” and “ethnicity” in America: Category-making in public policy and administration. Armonk, NY: M. E. Sharpe.