8  Race and Ancestry

Operationalizing Populations in Human Genetic Variation Studies

Joan H. Fujimura, Ramya Rajagopalan, Pilar N. Ossorio, and Kjell A. Doksum

Across the social sciences, there has been broad acceptance that race and race categories are sociohistorical constructs that are relational, processual, and dynamic, changing over time and locale. Relatedly, some scholars have noted that there were attempts to move away from race categories as organizing principles in the biological sciences, particularly around the United Nations Educational, Scientific, and Cultural Organization statements in the middle of the twentieth century (Haraway 1989; Reardon 2005; Brattain 2007). In population genetics, some scholars argued that between-race differences are fewer than within-race differences (Lewontin 1974); others championed the mid-century idea of a single “family of man” as they simultaneously promoted efforts to construct and catalog genetic differences among humans (the Human Genome Diversity Project or HGDP; Cavalli-Sforza et al. 1991). Even so, biologists never abandoned the notion of races as biologically important categories (Mueller-Wille 2005; Gissis 2008), and the HGDP met with critiques of sanitizing and depoliticizing race concepts by using populationist terminology and frames (Reardon 2005).

At the end of the twentieth century, race appeared to have (re)gained prominence in the realms of medicine through, for example, transplantation research (Haraway 1997), although many physicians may never have stopped using race categories in their practice of medicine (Satel 2002). In the last fifteen years, new technologies have facilitated the investigation of genetic differences between individuals and groups, which have in turn fueled new debates about the role of race in the biomedical sciences (reviewed in Fujimura, Duster, and Rajagopalan 2008; Koenig, Lee, and Richardson 2008). Proponents of using race as an organizing concept for genetic studies argue that biological variation correlates well enough with race and ethnicity to be useful, and such variation can help account for differences in disease susceptibility, occurrence, etiology, and treatment response (e.g., Risch et al. 2002; Burchard et al. 2003). While some social scientists have discussed situations where there might be some propinquity between biological and race categories (e.g., Duster 2003; Rebbeck and Sankar 2005), they also point out that recent findings in genome variation science reveal that race is often a poor proxy for genetic ancestry, and they warn that the search for such genetic differences not only stigmatizes certain groups and reifies race as biological, but also diverts resources away from the investigation of much more significant factors in disease such as socioeconomic status and its related stratified access to quality healthcare, exercise, diet, and environments (e.g., Krieger and Fee 1994; Ossorio and Duster 2005; Kahn 2005; Montoya 2007; Fausto-Sterling 2008).

In this chapter, we focus on within-group differences among geneticists and the different concepts they devise to deal with “population differences.” We examine the technologies that some geneticists have used to construct different populations for the purposes of finding genetic markers for disease. We especially focus on recent efforts by medical geneticists to conceptualize and measure population differences without using race or ethnic categories. In this process, medical and population geneticists emphasize that they are examining population differences due to different ancestries, rather than assessing differences among racial groups. We examine the relationships between the notions of ancestry and race by examining the theories and practices of medical geneticists and population geneticists who have joined forces to search for genetic markers for common complex diseases. A key point here is that these common complex diseases are thought to unite humans precisely because these diseases are not specific to any one group or groups.

Although the connections between race and population (and the use of either term in biomedical research) have been contested across the social sciences and the biological sciences, the notion of ancestry has received much less attention. Some have argued that assaying ancestry through genetics might be a more appropriate variable to use in biomedical research than race (Shields et al. 2005), but the production of ancestry concepts and their operationalization in scientific practice have been largely untreated.1 Our point here is that just as race is a socially constructed set of categories, so ancestry is a constructed concept. In this chapter, we examine how ancestry is conceived and constructed as well as its connections to population or race concepts in biomedical research.

Our questions include the following: How do concepts of race and genetic ancestry operate in contemporary biomedical studies of human genetic variation? How do they interact to produce notions of population? To discuss these related notions, we will describe our ethnographic research2 on two different kinds of biomedical genomic studies in which researchers seek disease-associated markers in human populations. The first are admixture mapping studies, and the second are genome-wide association studies (GWAS). In both approaches, the concept of population is key. We will examine how particular population constructs become entangled with research on disease, and we will analyze how scientists simultaneously produce different kinds of populations and population differences.

Populations and the Search for Genetic Contributions to Common Complex Diseases

Until this decade, medical geneticists focused largely on diseases with “simple” genotype-phenotype associations such as sickle-cell anemia, phenylketonuria (PKU), and so on. Many of these diseases are monogenic, meaning that almost all cases are caused by defects in a single gene, and many are quite penetrant, meaning that a person who has a defective copy of the gene will almost certainly show symptoms of the disease, though when and to what degree will vary. However, as a result of the development of genomics technologies, medical geneticists have increasingly turned to the challenges posed by common complex diseases such as diabetes, heart disease, and cancers. The etiology of these and other complex diseases is thought to involve combined contributions from environmental factors as well as from genetic factors distributed across the genome. Many factors and pathways, both outside and within bodies, are involved in the production of symptoms of these diseases. Some patients affected by a complex disease may have only a small contributing genetic component, while in other cases genetics may be more significant.

It is estimated that there are between twenty thousand and twenty-five thousand genes in a human genome, although many thousands of these genes remain a mystery in terms of their function. When searching for genetic associations to a complex disease, which may involve the action of tens or even hundreds of genes in several biological pathways, geneticists cannot use the same techniques that have worked in the past for monogenic diseases such as examining one gene at a time via candidate gene approaches. Instead, they conduct studies that examine many hundreds or thousands of regions of the genome simultaneously.

Since humans are estimated to be about 99.5 percent similar at the level of their DNA (Levy et al. 2007), it is believed that genetic contributions to disease are to be found in the 0.5 percent of DNA that varies across individuals.3 In the wake of human genome sequencing, the International Haplotype Map (HapMap) project has attempted to catalog this genetic variation across several groups (four in the first stage of HapMap, with more groups added in HapMap2 and HapMap3). The HapMap project has focused on genetic markers known as single nucleotide polymorphisms, or SNPs. A SNP is a single base of DNA whose nucleotide identity (A, T, G, or C) varies among people. To home in on disease-associated regions of the genome, medical geneticists who do genome-wide studies initially examine SNP markers for their association with disease. They rely heavily on statistical assessments of what counts as a “significant” association between a SNP and the disease being studied. Finding significant SNPs requires that geneticists genotype and examine hundreds or thousands of SNP markers in hundreds or thousands of people.

One of the big challenges in such studies is minimizing both the chance of finding an incorrect association (a false-positive association) and the chance of not finding a correct association (a false-negative association). Practitioners of admixture mapping and GWAS approaches have developed different techniques for dealing with these problems, and these techniques are at the heart of how they construct their populations for analysis. The concept of ancestry plays a central role in the design and deployment of these techniques.

Admixture Mapping Studies

Admixture mapping is a genome-wide approach that uses concepts of race and ethnicity to construct populations and analyze associations between genetics and disease. Racial and ethnic as well as geographical and historical descriptors and narratives inform how admixture mapping scientists conceive of ancestry. We examine the generation of the tools to do admixture mapping and the use of these tools to scan for disease associations by a research team that used admixture mapping to identify genetic variants for prostate cancer in African American men.

The theory underlying the admixture mapping approach interweaves ideas from population genetics, demographic data, and racial and cultural histories. Admixture mapping geneticists focus on the study of complex diseases in groups that they believe are recently admixed. Admixed means that their recent ancestors came from geographically distinct parts of the world, that these ancestral groups had distinguishable patterns of genetic variation, and that evidence of these distinctive patterns is still observable in the study population. The notion of a recently admixed population is connected to three main constraints that researchers use to construct study populations. First, the disease under study is chosen to be one for which epidemiological data suggest an elevated incidence in the admixed population, relative to the general population. (We note here that many epidemiological figures are calculated based on race and ethnicity.) Second, the admixed population should be recently admixed, which typically means fewer than twenty generations since admixture, for the statistical genetics to “work.” Finally, the admixture should have involved at least two ancestral populations that were geographically separate from each other and had little interbreeding until the admixture occurred.

Admixture mapping researchers genotype DNA from individuals who have the disease under study and who self-identify as belonging to those groups researchers believe to be admixed. Unlike case-control studies of disease, admixture mapping studies typically have no control groups (that is, groups of individuals who do not have the disease). Admixture scans aim to identify disease susceptibility loci by zeroing in on large chunks of DNA that have undergone little recombination over the small number of generations since the presumed admixture events. Researchers believe that such chunks are likely to be more significantly associated with one or the other of the two ancestries than the rest of the individuals’ genomes. They analyze these chunks for markers that are statistically associated with the disease in the cases they are studying.

The acceptable definitions of study populations in admixture mapping are highly dependent on practices and assumptions in statistical genetics. Researchers view ancestral DNA chunks as belonging to one of the assumed ancestries that contributed to the admixture, and they have developed group definitions and statistical tools that they believe allow them to trace the inheritance of these chunks. For example, population geneticists have over time developed the restriction that the admixture event(s) should have occurred within the last twenty generations. They believe that this allows them to detect ancestral contributions even with the “scrambling” of DNA that occurs through recombination in every generation. As one respondent explained, twenty generations was an arbitrary cutoff established to facilitate the genetic analysis. Thus population geneticists view an admixed group to be a mix of the genetic material of the ancestral populations, while retaining discernible and separable traces of each of the ancestries—a mix that is, in some sense, a quantifiable sum of its imputed parts.

The restrictions that admixture mapping practitioners have built into their work, then, have direct implications for how populations are conceived and constructed in this research. Admixture mapping geneticists frequently seek out African Americans and Hispanic Americans as desirable populations to be studied in the U.S. context because they conceive of these groups, by way of history, evolutionary biology, and anthropology, to be the products of admixing.4 For example, they assume that people who self-identify today as African Americans possess both African and European ancestry due to historical encounters with Europeans that brought people of African ancestry to the Americas. They assume that the genomes of people who self-identify as Latinos or Hispanic Americans are admixtures of DNA from native and indigenous groups living in what is now the Americas, and DNA from European colonizers. Thus practitioners of admixture mapping in the United States borrow heavily from stories of human isolation, migration, and mixing to determine groups they regard as suitable for this analysis.

The scientist respondents in our study described what they are doing as “tracking continental origin of lines of descent,” under the belief that the ancestral populations were, according to one respondent, “very different … at a population level, genetically.” Our respondents were able to conceive of admixture and admixed populations because they felt comfortable distinguishing ancestral populations from each other, along continental lines. As one respondent noted, “After all, the available admixed populations are fairly limited…. I mean, there are other possibilities, but by and large, right now, we’re restricted to Hispanics or African Americans.” This limitation may explain why admixture researchers are often interpreted as doing race-based science (Fullwiley 2008).

In admixture mapping, population geneticists draw boundaries between ancestral groups along continental lines. They differentiate two continental lines of descent by genotyping a predetermined set of a few thousand SNP markers that can, according to researchers, reliably distinguish between the two ancestral populations due to differences in variant frequencies in each ancestral group. These geneticists refer to such markers as ancestry-informative markers (AIMs), and refer to a set of AIMs designed for a particular group as an admixture map. The methods these scientists use to make admixture maps illuminate the contingent ways in which ancestral and admixed populations are conceived and operationalized. For example, the geneticists have to estimate marker frequencies in ancestral populations because no individuals from these ancestral groups exist today from whom samples could be drawn. Geneticists estimate these frequencies by assessing frequencies among purported contemporary representatives of these groups. For example, they often use European American samples such as the Centre d’Étude du Polymorphisme Humain samples of white Americans from Utah, or samples of white Americans from urban centers like Baltimore or Chicago, to stand in for the ancestral European population. They use marker frequencies measured in contemporary West African or sub-Saharan samples to stand in for frequency estimates for the African ancestors of African Americans.
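The logic of an ancestry-informative marker can be illustrated with a minimal sketch. The marker names, the frequencies, and the 0.4 cutoff below are hypothetical values chosen only for illustration; they are not figures from the studies we observed. The sketch simply ranks markers by the absolute difference in variant frequency between two reference panels that stand in for the ancestral populations.

```python
# Hypothetical variant frequencies in two contemporary reference panels
# that stand in for the ancestral populations (illustrative values only).
reference_freqs = {
    "snp_a": {"panel_1": 0.90, "panel_2": 0.10},
    "snp_b": {"panel_1": 0.55, "panel_2": 0.45},
    "snp_c": {"panel_1": 0.20, "panel_2": 0.75},
    "snp_d": {"panel_1": 0.30, "panel_2": 0.35},
}


def frequency_difference(freqs):
    """Absolute difference in variant frequency between the two panels."""
    return abs(freqs["panel_1"] - freqs["panel_2"])


def select_informative_markers(freq_table, min_difference=0.4):
    """Return names of markers whose frequency difference meets the
    (assumed) cutoff, ordered from most to least differentiated."""
    scored = [(frequency_difference(f), name) for name, f in freq_table.items()]
    kept = [(diff, name) for diff, name in scored if diff >= min_difference]
    kept.sort(reverse=True)
    return [name for _, name in kept]


aims = select_informative_markers(reference_freqs)
print(aims)
```

Because the frequencies come from the reference panels, which markers count as "informative" in this sketch depends entirely on which contemporary samples are chosen to represent the ancestral groups.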

In admixture studies, geneticists deploy particular genealogical stories in the ancestral origins labels they attach to chunks of DNA. They often conceptualize these ancestral origins in terms of the major continents, which they and others read as continental race categories (Fullwiley 2008). Admixture studies are therefore often viewed as using racial and ethnic categories, both in the collection of samples and in the analysis of data. Indeed, a few of our geneticist respondents who use admixture mapping approaches straightforwardly said that they do not find race to be a troubling concept in their research.

These geneticists view individuals as carrying in their genomes segments of DNA from their various ancestors. For individuals in groups they view as racially admixed, they believe that these ancestral contributions can be separated from each other. They base this belief on theories of genetic linkage and theories of human migration patterns, which themselves are based on historical records as well as physical anthropological, archeological, and linguistic research, which in turn are based on other theoretical and historical ideas and materials. We acknowledge the depth and breadth of their scientific work in the production of measurements of admixture; however, we also want to clarify that all these forms of evidence are both products of and productive of sociocultural frames and understandings. Furthermore, the inextricability of concepts of population, ancestry, race, and genetic difference that circulate in this science does the work of reinforcing and legitimating the collective, continued use in future studies of the very connections between genetic markers and sociocultural histories that we are problematizing here.

Genome-wide Association Studies: Genomics without Race?

In contrast to admixture mapping, the GWAS researchers we studied attempted to specify populations based on concepts of genetic ancestry. They argue that genetic ancestry produces a finer resolution of populations than would sociocultural categories of race. GWAS has been made possible by several recent advances in genomic technologies (particularly faster and cheaper genotyping technologies, the International HapMap, population genetics techniques, and new analytic tools). These infrastructures are critical tools in GWAS. GWAS have been widely touted by hopeful analysts in the genetics field and in the press; for example, the December 2007 issue of the journal Science named human genetic variation studies, especially GWAS, its “breakthrough of the year” (Pennisi 2007).

GWAS are large-scale, high-throughput studies, much more so than admixture mapping. Like admixture mapping researchers, GWAS researchers search for markers in genomes that may be causally linked to common complex diseases, such as heart disease, type II diabetes, and cancer, under the assumption that these diseases are likely to involve small, combinatorial influences from many genes distributed throughout the genome. They focus on the genetic elements of causation, although they acknowledge that many other social, environmental and behavioral factors and processes are involved in the etiology of common complex diseases.5

In contrast to admixture mapping, GWAS are typically case-control studies. They involve statistical analyses of differences in the frequencies of SNP markers between cases (those diagnosed with the disease) and controls (those who do not have the disease). Researchers genotype and evaluate hundreds of thousands of genetic markers per individual for association to disease, in thousands of individuals, all in a single study. With large numbers of cases, controls, and markers, GWAS researchers statistically examine potential relationships of certain markers to health outcomes.
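A single-marker comparison of this kind can be sketched as follows. The allele counts are hypothetical, and the Pearson chi-square test on a two-by-two table is used here only as one common way of comparing frequencies between cases and controls; we do not claim it is the specific test used by the researchers we studied.

```python
def allele_chi_square(case_variant, case_other, control_variant, control_other):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table
    of allele counts: rows are cases/controls, columns are variant/other."""
    observed = [[case_variant, case_other], [control_variant, control_other]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were unrelated to disease status.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
stat = allele_chi_square(case_variant=700, case_other=1300,
                         control_variant=600, control_other=1400)
print(round(stat, 3))
```

A genome-wide study repeats a comparison like this for every genotyped marker, which is why what counts as a "significant" association is judged far more strictly than it would be for a single test.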

Population Substructure: Genetic Similarity Scores Adjust for Population Differences

To find statistically significant SNPs that indicate disease risk, GWAS methods include efforts to avoid spurious associations due to systematic patterns of difference among the sampled genomes. To reduce the chance of false-positive associations, researchers need to account for genetic differences between cases and controls that may have nothing to do with the disease. They call this “adjusting for population substructure” or “adjusting for population stratification.”
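A small worked example, using hypothetical allele counts, shows how such a spurious association can arise. In the sketch below, the marker is unrelated to disease within each of two subgroups, but the subgroups differ both in variant frequency and in their share of cases, so pooling them produces an apparent difference between cases and controls.

```python
# Hypothetical allele counts (variant, other) within two subgroups.
# Inside each subgroup, cases and controls carry the variant at the same
# frequency, so the marker is unrelated to disease in this toy example.
subgroups = {
    "subgroup_1": {"cases": (640, 160), "controls": (160, 40)},   # frequency 0.8
    "subgroup_2": {"cases": (40, 160), "controls": (160, 640)},   # frequency 0.2
}


def variant_frequency(counts):
    """Fraction of alleles that are the variant."""
    variant, other = counts
    return variant / (variant + other)


def pooled(status):
    """Sum allele counts for 'cases' or 'controls' across both subgroups."""
    variant = sum(group[status][0] for group in subgroups.values())
    other = sum(group[status][1] for group in subgroups.values())
    return variant, other


for name, group in subgroups.items():
    print(name, variant_frequency(group["cases"]), variant_frequency(group["controls"]))

# Pooled, cases come mostly from subgroup_1 and controls from subgroup_2,
# so the variant looks associated with disease even though it is not.
print("pooled", variant_frequency(pooled("cases")), variant_frequency(pooled("controls")))
```

In this toy example the pooled variant frequency is 0.68 in cases and 0.32 in controls, although within each subgroup the two frequencies are identical.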

We describe our ethnographic research in a medical genetics lab where researchers were doing GWAS analysis of several common complex diseases. The researchers in this lab were very interested in avoiding the use of race groups as organizing categories for the design of their research. They were medical geneticists who stated that their projects were not about race—whether defined by ideas about phenotypic characteristics like skin color or by how people label themselves (called self-report). They believed that race does not refer to a genetic set of categories, and therefore it is the wrong concept to use in genomics research.

In making the adjustments for population stratification, the primary tool these medical geneticists used is a statistical software package, Eigensoft, designed by population geneticists. Software programs in Eigensoft have been designed to generate scores for the SNP variation in each individual DNA sample relative to the other samples. These SNP variation scores are generated through a modified version of a classical statistical approach called principal components analysis (PCA). The scores are subsequently used as input for a program called Eigenstrat (included in Eigensoft), which has been designed to carry out the statistical regression analysis to adjust for population stratification and compute potential associations to disease.
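The general idea of a PCA-derived score can be sketched in a few lines. The sketch below is classical PCA computed by power iteration on a tiny, hypothetical genotype table (rows are individuals, columns are markers coded 0, 1, or 2); it does not reproduce Eigensoft's modified procedure, its normalization, or its use of multiple components, and all function names are our own illustrative choices.

```python
def center_columns(genotypes):
    """Subtract each marker's mean so that every column has mean zero."""
    n = len(genotypes)
    n_markers = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(n_markers)]
    return [[row[j] - means[j] for j in range(n_markers)] for row in genotypes]


def first_component_scores(genotypes, iterations=200):
    """Score each individual on the first principal component, found by
    power iteration (repeatedly applying X^T X to a weight vector)."""
    x = center_columns(genotypes)
    n_markers = len(x[0])
    weights = [1.0] + [0.0] * (n_markers - 1)  # arbitrary starting direction
    for _ in range(iterations):
        scores = [sum(row[j] * weights[j] for j in range(n_markers)) for row in x]
        weights = [sum(x[i][j] * scores[i] for i in range(len(x)))
                   for j in range(n_markers)]
        norm = sum(w * w for w in weights) ** 0.5
        weights = [w / norm for w in weights]
    return [sum(row[j] * weights[j] for j in range(n_markers)) for row in x]


# Hypothetical genotypes: the first three and last three individuals differ
# systematically at the first three markers; the fourth marker is noise.
genotypes = [
    [2, 2, 0, 1],
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 1],
    [0, 1, 2, 0],
    [1, 0, 1, 1],
]

scores = first_component_scores(genotypes)
print([round(s, 2) for s in scores])
```

Each individual receives a number that reflects only the pattern of SNP variation relative to the other samples; no group label is supplied to the computation. In an adjustment for stratification, scores of this kind would then enter the regression as covariates.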

The researchers we interviewed were enthusiastic about the Eigenstrat program as their tool of choice because they regarded it as a method that allowed them to measure genetic similarities directly instead of using race as a proxy in their subsequent disease SNP searches.

Genetic Similarity and Ancestry

The researchers regarded the previously described process of correcting for population stratification as key to a proper GWAS analysis, often without a great deal of concern about any underlying population concept or definition.6 When asked explicitly about what this process meant for definitions of population, some medical geneticists insisted that they use a technical definition of populations, and some of them emphasized that their software affords them a “genetic standpoint” from which to infer populations and population history. In making these assertions, they invoked the concept of ancestry. Some of the medical geneticists were comfortable using terms like genetic history and ancestry, in part because they have adopted the language and ideas used by the population geneticists who designed the statistical technologies they use to correct for population substructure. Many of the researchers called the process of adjusting for substructure using Eigenstrat “adjusting for ancestry.” They viewed adjusting for ancestry as a required practice for reducing the chance that the results they obtain through genetic association studies are spurious because of differences between cases and controls due to ancestry and not to disease risk.

The practice of adjusting for ancestry depends critically on which markers in the genome get genotyped. In any GWAS, a predetermined set of SNPs is genotyped using SNP microarrays. Scientists who design the SNP genotyping microarrays for GWAS select which SNPs to include, relying heavily on data in the HapMap to select a set of SNPs with reasonable coverage of the genetic variation estimated in the HapMap sample groups. They take into account many population genetic considerations, such as the number and genomic distribution of selected SNPs, which tells them how useful the chips will be for assessing variation in different HapMap groups. They also include technical considerations, such as which SNPs will perform the best in the genotyping experiments. The building of the microarray, the selection of markers, and the genotyping itself are central to GWAS.

As one medical geneticist said, “Ancestry [is] shared allele frequencies due to shared ancestors.” But this is an imprecise definition; like the notion of population, ancestry is a fluid term and has different interpretations and meanings in different contexts and usages. We focus on how the scientists we studied assessed or constructed ancestry. While the American Society for Human Genetics (ASHG), a leading human genetics association in the United States, notes that ancestry determination depends on “how underlying patterns of human genetic diversity are distributed among populations” (American Society for Human Genetics 2008, 4), we argue that these underlying patterns are known only through the data and data-producing technologies and practices of the geneticists.

For example, the precision of ancestry determinations depends on how far back in time or how many generations are considered by the assessment. Family histories can provide somewhat reliable information on very recent ancestors within a few generations. Anthropologists assess the earliest hominids from which modern humans evolved through archeological evidence. But intervening levels of ancestry, particularly during the early millennia of modern humans and their global migrations, are more difficult to define and interpret, and geneticists trying to assess these levels of ancestry resort to methods of inference with significant levels of uncertainty. It is precisely these levels of ancestry that are of interest to biomedical geneticists (American Society for Human Genetics 2008), particularly those studying common genetic variation. GWAS scientists believe that common variation may illuminate genetic factors involved in complex diseases. Such common variation is expected to occur in all groups, but at different frequencies, depending on the histories and migrations of ancestral peoples, who passed on their particular SNP variants to descendants in different parts of the world. Thus intervening levels of ancestry, if it were possible to assess them with greater certainty, would help illuminate the differential patterns of SNPs in different groups in which individuals share ancestors.

Translation of SNP Similarity to Relatedness

Some of the researchers took the notion of ancestry a step further. For example, one of the population geneticists interpreted individuals with similar SNP variation scores as having “shared ancestry,” a belief based on established ideas about human genetic evolutionary relationships in his field of expertise. The population geneticists we interviewed worked under the theory that modern humans originated relatively recently in Africa and then spread, through different waves of migration, to other continents.7 Although there are varying ideas about the degree of difference among populations that developed on separate continents, the general belief expressed by our respondents is that the differences are recent (in terms of human history) and useful for distinguishing portions of the genome from different continents.

This translation, or slippage, from SNP similarity to genetic history, to ancestry, to shared ancestry indicates a view that is shared by some researchers we interviewed, but not by all. That is, the software program can do the work of adjusting for population stratification “to genetically match cases and controls” without inferring shared ancestry for individuals with similar patterns of SNP variation and without assigning a particular ancestry label to such individuals. Nevertheless, some made the leap of reading shared ancestry and its implied relatedness onto individuals with similar SNP patterns.

Furthermore, some geneticists among the GWAS researchers attributed geographic meaning to differences in SNP frequencies, and they interpreted these geographic differences to mean different ancestries. This practice is a key point at which populations and population differences are produced.

In contrast, other geneticists we interviewed said that their GWAS research did not use race or ethnic categories in the genetic analysis because the Eigenstrat program automatically adjusted for population substructure or ancestry using PCA scores in the regression analysis. They said that there was no need to use descriptors of race, ancestry, or geographic origins when using Eigenstrat in biomedical applications of GWAS. Even when the samples are collected using self-reported racial or ethnic categories, as they are in some cohort collections, the analysts do not use that information for genetic analysis. And although they believe that adjusting for ancestry is necessary in these studies, they are looking at common variation and therefore do not believe that the alleles they find will be ancestry-specific; rather, they expect that SNP variants associated with disease will be present in all population or ancestry groups. Even the most cited “race-specific” genetic alleles, such as the Duffy null allele, which is much more common in people of African descent, are uncommon, but not absent, in other groups. Researchers are aware of this; still, their reporting of alleles that are prominent in certain groups and rare in others tends to elide the distinction between absent and rare.

Conclusion: Race versus Ancestry—Two Notions That Operationalize Human Populations

In this chapter, we have examined genetic epidemiological research conducted by medical geneticists; some use race explicitly, and others attempt to avoid the use of race categories in their research design and implementation. In admixture mapping studies searching for disease-related genetic markers, geneticists deploy particular genealogical stories in the ancestral origins labels they attach to chunks of DNA. These ancestral origins are often conceptualized in terms of the major continents. In contrast, some GWAS studies attempt to avoid race and ancestral origin labels and their attendant meanings8 but, nevertheless, sometimes end up being interpreted as using notions of shared ancestry.

We show that in contrast to recent studies of the use of race in genetic, medical, and pharmaceutical research, there are researchers who view race as an incorrect categorization scheme for operationalizing their search for disease-associated genetic markers. These researchers have devised or adopted new tools, partly in response to critiques about the use of race in genomics. Their new methods thus allow them to distinguish themselves and their work from the generation of admixture mapping genetics, which explicitly uses race groups. These GWAS researchers have developed their alternative notions of population and their alternative technologies to produce what they call a “genetic standpoint” from which to assess and adjust for ancestry differences.

Still, there remains a complex interplay among ancestry and race concepts. Although many GWAS researchers attempt to sidestep the discourse of ancestral origins of the bits of DNA they study, they nevertheless sometimes invoke notions of shared ancestry, which often are interpreted as common geographic origin and, by others, as race categories. Because the medical geneticists deploy methods devised by population geneticists, they also sometimes inherit the notions of populations used by those population geneticists; that is, the designers of GWAS technologies that address population stratification interpret SNP variation groups as human groups that have related genetic histories or shared ancestry. While some geneticists specifically police their language and disavow notions of race or shared ancestry in their work, others do not.

Furthermore, policing their own language does not prevent audiences from reading race into the work of these practitioners. Although ancestry, as used in GWAS, is not equivalent to race, shared ancestry could be interpreted as race, especially when ancestry is traced back to the major continents of Africa, Asia, Europe, and the Americas. It is these continental geographies that lend GWAS analysis to racial interpretations. For example, New York Times science writer Nicholas Wade (2007) read race into the results of one of the first GWAS studies, conducted by the Wellcome Trust in 2007 on several thousand British patients. Thus, despite the fact that some of the researchers we studied who use the notion of ancestry believe that race categories are sociohistorical concepts and that race is an incorrect concept for use in genetics, the notion of shared ancestry is often read as race by the media, the public, or other researchers.

The relationship between race and ancestry, then, is intricate and difficult to disentangle, which also helps to explain the difficulties of separating the reading practices of consumers from the production practices of scientists. Although one could regard ancestry as a concept produced using population genetics tools and race as a sociocultural set of understandings, the two are not so clearly separated in scientific or popular cultural deployments; that is, science and culture are not separate discourses.

Concepts of race and ancestry inform how populations are conceived and constructed in contemporary genomics methodologies, but in different ways. Admixture research uses race and ethnic categories to designate admixed and so-called ancestral populations. Both race and ethnicity are complex social concepts with blurry edges. Nevertheless, researchers use them to posit easily definable, isolated ancestral populations. For example, admixture researchers use race and ethnic categories to determine which individuals and groups are sampled and included in the study as well as to guide the analysis via the construction of ancestral frequencies and the inferred histories of groups. In contrast to this use of race and ethnicity, the notion of ancestry enters GWAS studies at the analysis stage, and while some GWAS geneticists infer shared ancestry and attach ancestry labels to individuals with similar SNP variation scores, others do not. Thus, with caution and care, GWAS technologies can provide alternative means to conduct research on genetic markers associated with disease, without using race or ethnic categories. The construction of populations in these two very different approaches illustrates that not all human genomics research is the same. It points to a diversity of methodologies and choices available to biomedical researchers, suggesting that genomics research into human genetic variation is varied and contingent. Indeed, there are even variations and differences of opinion among researchers at a single laboratory site, as we have described.

Notes

1. One exception here is Fullwiley’s (2008) study of the use of a particular set of ancestry-informative markers in a laboratory using admixture mapping to study asthma. Others (Nelson 2008; Bolnick 2007) have examined the potential of consumer genetic ancestry testing to reinscribe race as biological.

2. Fieldwork described in this chapter (including interviews with researchers, observation in labs and other group meetings, and observation of some work practices) was conducted at three U.S. research sites between January 2007 and April 2009. The research sites consisted of large teams of researchers working on biomedical genetics projects. The research spanned many disciplines, including medical genetics, population genetics, epidemiology, statistics, bioinformatics, and medicine. Some of the projects also involved collaborations across laboratories, institutions, and even nations.

3. Researchers had earlier estimated that any two individuals were 99.9% similar in their DNA sequence, suggesting very little variability between human genomes. This most recent estimate suggests much more variability than previously thought because it includes measurements of structural variation between genomes, including copy number variations.

4. Given that these groups overlap with traditionally disadvantaged groups in the United States, some practitioners of admixture mapping argue that such groups have received insufficient attention from the medical research establishment and view their work as potentially mitigating this gap (Risch et al. 2002; Burchard et al. 2003).

5. We point out that some geneticists have begun to caution that GWAS will not be able to unlock all the genetic contributions to common complex disease. Some are circumspect about the potential significance of any variants uncovered in the future by GWAS, arguing that the most highly significant variants should already have been identified for diseases that have been studied (Goldstein 2009), while others remind readers of the limited levels of risk that GWAS can explain (Kraft and Hunter 2009). A related critique is that too much infrastructural capital is being spent on finding genetic contributions to disease through GWAS, rather than examining and remediating the potentially far more significant contributions of diet, smoking, lack of exercise, stress, inadequate health care, racism, toxic waste or chemical exposure, living conditions, and various combinations of these that epidemiological studies can reveal (e.g., Duster 2003; Krieger and Fee 1994; Montoya 2007). In light of these cautions, the rise of direct-to-consumer genetic testing based on GWAS SNP findings raises concerns. Despite the uncertainties around SNP associations, whose causal relationships to disease remain unestablished, commercial genetic risk testing prematurely applies GWAS findings to individual diagnosis. “Home brew” tests produced by companies like 23andMe, Navigenics, and deCODEme fly under the radar of the Food and Drug Administration. Their availability, alongside media readings of race based on GWAS findings, makes it likely that certain groups or individuals will feel more vulnerable or at risk than others and may be overrepresented in the consumer base of these companies. What are the ramifications of using genetic findings to make broad claims about an individual’s risk for certain diseases, when there are thus far only associations with almost no causal links to genetic markers, let alone to disease genes, and scant knowledge about the other factors outside of the genome, both inside the body and outside, involved in producing disease phenotypes?

6. Most medical geneticists we interviewed used stratification correction tools without thinking about their definitions of population concepts. However, some have thought and continue to think seriously about this issue and have adopted, at least in part, the social science view that races are socially constructed and have to do with much more than ancestry or phenotype.

7. This is the theory often mentioned by our respondents. The evolution of humans is a highly contentious topic, and we are not espousing any particular view here. We instead want to indicate that medical geneticists use some of the ideas and theories from population genetics, but they do not generate those ideas and theories.

8. Why do some researchers choose admixture mapping over GWAS? The rationale for this choice is that admixture mapping is much cheaper in terms of the technological tools required. Through admixture mapping techniques, researchers can achieve reasonable statistical power with fewer DNA samples and by genotyping about three thousand markers in each sample, rather than hundreds of thousands. Thus some researchers believe that certain diseases may be studied more inexpensively using an admixture mapping approach. However, as one respondent noted, GWAS can be used to generate the same findings as admixture mapping without being restricted to admixed populations, and the admixture mapping approach is thus falling out of favor at some labs, even as it remains in use at others.

References

American Society of Human Genetics. 2008. The American Society of Human Genetics ancestry testing statement. http://www.ashg.org/pdf/ASHGAncestryTestingStatement_FINAL.pdf.

Bolnick, D. A. 2007. Individual ancestry inference and the reification of race as a biological phenomenon. In Revisiting race in a genomic age, ed. B. A. Koenig, S. S.-J. Lee, and S. S. Richardson. Piscataway, NJ: Rutgers University Press, 70–88.

Brattain, M. 2007. Race, racism and antiracism: UNESCO and the politics of presenting science to the postwar public. American Historical Review 112:1386–1413.

Burchard, E. G., E. Ziv, N. Coyle, S. L. Gomez, H. Tang, A. J. Karter, J. L. Mountain, E. J. Perez-Stable, D. Sheppard, and N. Risch. 2003. The importance of race and ethnic background in biomedical research and clinical practice. New England Journal of Medicine 348:1170–1175.

Cavalli-Sforza, L. L., A. C. Wilson, C. R. Cantor, R. M. Cook-Deegan, and M. C. King. 1991. Call for a worldwide survey of human genetic diversity: A vanishing opportunity for the Human Genome Project. Genomics 11:490–491.

Duster, T. 2003. Buried alive: The concept of race in science. In Genetic nature/culture: Anthropology and science beyond the two culture divide, ed. A. Goodman, D. Heath, and M. S. Lindee. Berkeley: University of California Press, 258–277.

Fausto-Sterling, A. 2008. The bare bones of race. Social Studies of Science 38:657–694.

Fujimura, J. H., T. Duster, and R. Rajagopalan. 2008. Race, genetics, and disease: Questions of evidence, matters of consequence. Social Studies of Science 38:643–656.

Fullwiley, D. 2008. The biologistical construction of race: “Admixture” technology and the new genetic medicine. Social Studies of Science 38:695–735.

Gissis, S. B. 2008. When is “race” a race? 1946–2003. Studies in History and Philosophy of Biological and Biomedical Sciences 39:437–450.

Goldstein, D. B. 2009. Common genetic variation and human traits. New England Journal of Medicine 360:1696–1698.

Haraway, D. 1989. Primate visions: Gender, race, and nature in the world of modern science. New York: Routledge.

Haraway, D. 1997. Modest_Witness@Second_Millennium.FemaleMan©_Meets_OncoMouse™. New York: Routledge.

Kahn, J. 2005. From disparity to difference: How race-specific medicines may undermine policies to address inequalities in health care. Southern California Interdisciplinary Law Journal 15:105–130.

Koenig, B. A., S. S.-J. Lee, and S. S. Richardson, eds. 2008. Revisiting race in a genomic age. Piscataway, NJ: Rutgers University Press.

Kraft, P., and D. J. Hunter. 2009. Genetic risk prediction—are we there yet? New England Journal of Medicine 360:1701–1703.

Krieger, N., and E. Fee. 1994. Man-made medicine and women’s health: The biopolitics of sex/gender and race/ethnicity. International Journal of Health Services 24:265–283.

Levy, S., G. Sutton, P. C. Ng, L. Feuk, A. L. Halpern, B. P. Walenz, N. Axelrod, et al. 2007. The diploid genome sequence of an individual human. PLoS Biology 5:2113–2144.

Lewontin, R. C. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Montoya, M. 2007. Bioethnic conscription: Genes, race, and Mexicana/o ethnicity in diabetes research. Cultural Anthropology 22:94–128.

Mueller-Wille, S. 2005. Race and ethnicity: Human diversity and the UNESCO statement on race (1950–51). In Sixty years of UNESCO’s history: Proceedings of the International Symposium in Paris. Paris: UNESCO, 211–220.

Nelson, A. 2008. Bio science: Genetic genealogy testing and the pursuit of African ancestry. Social Studies of Science 38:759–783.

Ossorio, P., and T. Duster. 2005. Race and genetics: Controversies in biomedical, behavioral and forensic sciences. American Psychologist 60:115–128.

Pennisi, E. 2007. Breakthrough of the year: Human genetic variation. Science 318:1842–1843.

Reardon, J. 2005. Race to the finish: Identity and governance in an age of genomics. Princeton, NJ: Princeton University Press.

Rebbeck, T. R., and P. Sankar. 2005. Ethnicity, ancestry, and race in molecular epidemiologic research. Cancer Epidemiology, Biomarkers and Prevention 14:2467–2471.

Risch, N., E. Burchard, E. Ziv, and H. Tang. 2002. Categorization of humans in biomedical research: Genes, race and disease. Genome Biology. http://genomebiology.com/2002/3/7/comment/2007.

Satel, S. L. 2002. I am a racially profiling doctor. New York Times Magazine, May 5.

Shields, A., M. Fortun, E. M. Hammonds, P. A. King, C. Lerman, R. Rapp, and P. F. Sullivan. 2005. The use of race variables in genetic studies of complex traits and the goal of reducing health disparities. American Psychologist 60:77–103.

Wade, N. 2007. Researchers detect variations in DNA that underlie seven common diseases. New York Times, June 7.