CHAPTER SIX
A CONCEPTUAL FRAMEWORK FOR MEASURING SEGREGATION AND ITS ASSOCIATION WITH POPULATION OUTCOMES

Sean F. Reardon

One of the central sets of questions in both epidemiological and sociological research concerns the patterns of association between where individuals live and their health and social outcomes. Interest in such “neighborhood effects” research has grown dramatically in recent decades, in part due to theoretical and methodological advances that have helped illuminate the associations among neighborhood characteristics and individual outcomes. An important subset of such research is particularly concerned with whether aggregate differences in health and social outcomes among population subgroups (especially groups defined by race, ethnicity, or socioeconomic characteristics) are attributable to—or at least associated with—patterns of racial, ethnic, or socioeconomic residential segregation. In short, this research investigates whether racial/ethnic and socioeconomic differences in health and social outcomes are due to the fact that different subgroups live in different social, physical, and institutional environments. More bluntly, it asks “does segregation contribute to racial/ethnic inequalities?”

To ask whether segregation is associated with subgroup differences in health or social outcomes requires, first, a clear definition of what we mean by “segregation”; second, a strategy for measuring segregation; and third, a methodology for inferring descriptive and causal associations between measured segregation and patterns of subgroup differences. The chapter is organized in three sections, which discuss these three requirements in turn. First, I review definitions of “segregation,” pointing out the different ways that the term is used and suggesting a general framework for conceptualizing segregation. The second section briefly reviews the methodological literature on the measurement of segregation, describing several of the most useful segregation measures. The third section discusses analytic strategies for estimating the association between segregation and individual and subgroup outcome patterns.

What Is Segregation?

What does it mean to say a neighborhood or city is segregated? Consider the following three uses of the term “segregated”:

  1. Chicago is a segregated city.
  2. Most black children live in segregated neighborhoods.
  3. Some housing projects were deliberately segregated.

In the first sentence, “segregated” is an adjective describing the uneven distribution of (racial) groups across the city. In this case, segregation is seen as a characteristic of a region, and describes the extent to which population subgroups are (un)evenly distributed throughout that region. By this usage, a city with 90% black residents would not be segregated, so long as those black residents were evenly distributed throughout the city. Conversely, a city with equal proportions of white, black, and Latino residents, but where each of the three groups occupied distinct areas of the city, would be highly segregated.

In the second sentence, “segregated” is an adjective describing individual neighborhoods. In this case, segregation is a characteristic of a local neighborhood, and is used essentially as shorthand to describe the (racial) composition of that particular neighborhood—so a neighborhood is segregated if it has a high proportion of minority residents. By this usage, a 90% black neighborhood might be described as highly segregated, even if all other neighborhoods in the city were also 90% black. This leads to a possible contradiction between the first and second uses of the term—a city where every neighborhood was 90% black would be completely unsegregated by the first definition, but each neighborhood would be highly segregated by the second definition. The contradiction occurs because the second usage describes local racial composition, while the first describes regional unevenness.

In the third sentence, “segregated” is a verb describing deliberate action on the part of some legislative, judicial, or administrative body to ensure that members of different (racial) groups live in different neighborhoods. In this case, the term “segregated” is used to indicate not only racial differences in housing patterns, but also the presence of some active policy mechanism that produced these differences. In the case of legislatively explicit segregation policies, as with school segregation laws in the South prior to 1954, the use of “segregate” as an active verb is relatively transparent, since both the agent (state and local governments) and the mechanisms (Jim Crow segregation laws) are apparent. In the case of current housing patterns, however, the causes of racial unevenness in residential location are less clear. Thus, the use of “segregated” in this sense depends on some inference about the causes of residential patterns.

While each of these three usages is meaningful within some contexts, I will generally use the term “segregation” in this chapter only in the first sense. By this definition, a region is segregated to the extent that individuals of different groups live in different neighborhoods within the region. The term does not apply to individual neighborhoods, but only to larger regions. In describing individual neighborhoods, then, it is more useful to refer to the (racial or socioeconomic) composition of the neighborhood; that term is more precise and better disentangles local composition from regional demographics. Moreover, to say that a region is segregated in this sense is merely to describe existing racial or socioeconomic residential patterns; no assumption about the causes of these patterns should be inferred.

Why Does Segregation Matter?

The conceptual distinctions among the uses of the term segregation arise, in part, because of different implicit theories about why and how segregation patterns might matter for health and social outcomes. To see this, consider why it might matter where people live.

In part, it matters where people live because residential location influences individuals' proximity to important resources and shapes the possibilities for intergroup contact. From a social inequality perspective, segregation matters because it may be (and generally is) related to the differential proximity of groups to important resources. Such resources may be both institutional (e.g., schools, health clinics and hospitals, child care facilities, and labor markets and employment opportunities) and social (e.g., access to social networks and other forms of social capital). In addition, segregation matters because it may be related to the differential proximity of groups to a variety of potential hazards, including environmental hazards (e.g., poor air or water quality, substandard housing, exposure to lead) and social hazards such as exposure to crime and violence (see, for example, Acevedo-Garcia 2001; Acevedo-Garcia et al. 2003; Downey 2003; Lopez 2002; Massey and Denton 1993; Wilson 1987).

From the social inequality perspective, it might not matter if different population groups were residentially separated from one another, so long as both groups had equal proximity to all social resources and hazards. In a society where all social goods (including institutional, social, and environmental resources) were evenly distributed throughout residential space, where one lived would be, in principle, unrelated to one's access to these social goods. In this case, segregation might not matter for health or social outcomes.

From a social interaction perspective, in contrast, segregation matters because it affects the potential for intergroup contact among members of different social groups. If intergroup contact leads to better social relations among groups, and if groups have, on average, different levels of social resources (wealth and access to social networks and social and cultural capital), then proximity to other groups would provide greater potential for the distribution of social resources through intergroup contact. In this perspective, segregation might matter even if institutional and environmental goods were evenly distributed throughout social space, because some social groups might have greater access to forms of capital that provide advantages for health and social outcomes.

Conceptual and Methodological Issues in the Measurement of Segregation

In order to measure the segregation of a region, several methodological and conceptual issues must be addressed. First, because segregation indicates the extent to which individuals of different groups live in different neighborhoods, we must clarify the meaning of “neighborhood”; to do this, we must make some determination about the proximity of residential locations to one another within a region. Second, we must decide on a conceptual definition of segregation—Massey and Denton (1988) describe five different dimensions of segregation, which they term evenness, exposure, clustering, concentration, and centralization; strategies for measuring segregation will depend on which of these aspects of segregation we are particularly interested in. Third, we must define the population dimension along which we wish to measure segregation—measuring segregation among two or more distinct, unordered groups requires a different set of measurement tools than does measuring segregation along some ordered or continuous dimension, such as educational attainment or family income. Moreover, even if we are interested in measuring segregation among some set of distinct, categorical groups (e.g., race/ethnic groups), measuring segregation among more than two population subgroups requires different measures than if we are measuring segregation between only two groups.

Spatial and Aspatial Measures of Segregation

Segregation can be thought of as the extent to which individuals of different groups occupy or experience different social environments. A measure of segregation, then, implicitly requires that we define the social environment of each individual. Most traditional measures of segregation implicitly define an individual's social environment as equivalent to some organizational or areal unit (e.g., a school or census tract), without regard for the patterning of these units in social space. Such measures are described as aspatial, because they do not account for the arrangement of these units in social space. All individuals in a given census tract, for example, are defined as occupying the same social environment, whose composition is independent of the makeup of nearby tracts.

Aspatial segregation measures have often been criticized in the residential segregation context for their failure to account for the spatial patterning of census tracts (Grannis 2002; Lee et al. 2008; Massey and Denton 1988; Morrill 1991; Reardon and O'Sullivan 2004; Wong 1993, 2002). In particular, aspatial measures are criticized for their sensitivity to the “checkerboard problem” (Morrill 1991; White 1983) and the “modifiable areal unit problem” (Openshaw and Taylor 1979; Wong 1997). Each of these can be seen as a critique of the definition of the social environment implicit in the traditional segregation measures.

The “checkerboard problem” stems from the fact that aspatial segregation measures ignore the spatial proximity of neighborhoods and focus instead only on the racial composition of neighborhoods. To visualize the problem, imagine a checkerboard where each square represents an exclusively black or exclusively white neighborhood (Figure 6.1). If all the black squares were moved to one side of the board and all white squares to the other, we would expect a measure of segregation to register this change as an increase in segregation, since not only would each neighborhood be racially homogeneous but most neighborhoods would now be surrounded by similarly homogeneous neighborhoods. Aspatial measures of segregation, however, do not distinguish between the first and second patterns, since in each case the racial compositions of individual neighborhoods are the same (White 1983).


FIGURE 6.1 THE CHECKERBOARD PROBLEM
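The checkerboard problem can be made concrete with a short calculation. The Python sketch below computes the two-group dissimilarity index $D$ (Equation 4 later in this chapter) for a checkerboard and for the same squares sorted into two homogeneous halves. The 8-by-8 board, the 100 residents per square, and the function names are illustrative assumptions, not part of the original figure.

```python
# Illustrative sketch (hypothetical data): the aspatial dissimilarity index
# cannot tell a checkerboard from a board sorted into two homogeneous halves.


def dissimilarity(black, white):
    """Two-group dissimilarity index D from per-subarea counts."""
    totals = [b + w for b, w in zip(black, white)]
    T = sum(totals)
    pi = sum(black) / T  # regional proportion black
    num = sum(t * abs(b / t - pi) for t, b in zip(totals, black) if t > 0)
    return num / (2 * T * pi * (1 - pi))


def make_board(is_black, size=8, pop=100):
    """Return per-square black and white counts; each square is one 'tract'."""
    black, white = [], []
    for row in range(size):
        for col in range(size):
            black_square = is_black(row, col, size)
            black.append(pop if black_square else 0)
            white.append(0 if black_square else pop)
    return black, white


checkerboard = make_board(lambda r, c, n: (r + c) % 2 == 0)
halves = make_board(lambda r, c, n: c < n // 2)  # black squares on the left

print(dissimilarity(*checkerboard))  # 1.0
print(dissimilarity(*halves))        # 1.0
```

Because every square is racially homogeneous in both layouts, each tract contributes identically to $D$, so the index reports complete segregation in both cases.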

The “modifiable areal unit problem” (MAUP) arises in residential segregation measurement because residential population data are typically collected, aggregated, and reported for spatial units (such as census tracts) that have no necessary correspondence with meaningful social/spatial divisions. This data collection scheme implicitly assumes that individuals living near one another (perhaps even across the street from one another) but in separate spatial units are more distant from one another than are two individuals living relatively far from one another but within the same spatial unit. As a result—unless spatial subarea boundaries correspond to meaningful social boundaries—all measures of spatial and aspatial segregation that rely on population counts aggregated within subareas are sensitive to the definitions of the boundaries of these spatial subareas. Figure 6.2 illustrates two aspects of the MAUP: aggregation effects, which result in differences in measured segregation if different-sized subareas are used to compute it; and zoning effects, which result in differences in measured segregation if the subarea boundaries are shifted, even if the number and size of the subareas remain fixed (Openshaw and Taylor 1979; Wong 1997, 1999).


FIGURE 6.2 THE MODIFIABLE AREAL UNIT PROBLEM
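Both effects can be illustrated with a toy one-dimensional "city" of eight residential blocks of 10 residents each. In the Python sketch below (hypothetical counts; the strip is treated as circular so that shifted zones keep the same size), the same residents produce a dissimilarity index of 1 or 0 depending only on how blocks are grouped into tracts.

```python
# Illustrative sketch (hypothetical data): the modifiable areal unit problem.


def dissimilarity(black, white):
    """Two-group dissimilarity index D from per-subarea counts."""
    totals = [b + w for b, w in zip(black, white)]
    T = sum(totals)
    pi = sum(black) / T
    num = sum(t * abs(b / t - pi) for t, b in zip(totals, black) if t > 0)
    return num / (2 * T * pi * (1 - pi))


def aggregate(counts, size, offset=0):
    """Group consecutive blocks into zones of `size` blocks, starting at `offset`.

    The strip is treated as circular (an assumption) so that every zone has
    the same number of blocks; len(counts) must be divisible by `size`.
    """
    shifted = counts[offset:] + counts[:offset]
    return [sum(shifted[i:i + size]) for i in range(0, len(shifted), size)]


# Eight blocks of 10 residents each, alternating in pairs.
black = [10, 10, 0, 0, 10, 10, 0, 0]
white = [0, 0, 10, 10, 0, 0, 10, 10]

d_blocks = dissimilarity(black, white)                                     # 1.0
d_pairs = dissimilarity(aggregate(black, 2), aggregate(white, 2))          # 1.0
d_shifted = dissimilarity(aggregate(black, 2, 1), aggregate(white, 2, 1))  # 0.0 (zoning)
d_fours = dissimilarity(aggregate(black, 4), aggregate(white, 4))          # 0.0 (aggregation)
print(d_blocks, d_pairs, d_shifted, d_fours)
```

Shifting the pairing by one block is a pure zoning change (same number and size of tracts), while grouping blocks in fours is an aggregation change; in this deliberately extreme example, either change alone moves $D$ from 1 to 0.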

In some cases, as when measuring school segregation, it is possible to define organizational units (e.g., schools) that meaningfully delimit social interactions and among which spatial proximity is irrelevant (i.e., schools meaningfully bound students' “school environments”); in such cases, aspatial measures of segregation are perfectly appropriate. In other cases, such as the measurement of residential segregation, the checkerboard problem and the MAUP pose conceptual difficulties for the measurement of segregation. Reardon and O'Sullivan (2004), however, argue that the “checkerboard problem” and the “modifiable areal unit problem” are both artifacts of a reliance on subarea (e.g., tract) boundaries in the computation of segregation measures. In principle, segregation measures that use information on the exact locations of individuals and their proximities to one another in residential space would eliminate both problems entirely from the measurement of residential segregation. I discuss measures that explicitly account for the spatial patterning of residential locations (so-called spatial measures of segregation) later in this chapter.

The Dimensions of Segregation

The reliance on census tract or other administrative boundaries for the computation of segregation measures has led to some conceptual confusion in the segregation measurement literature. In an oft-cited article, Massey and Denton (1988) describe five conceptually distinct “dimensions” of residential segregation, which they term evenness, exposure, clustering, centralization, and concentration. In their formulation, evenness and exposure are aspatial dimensions (because they ignore the spatial patterning of census tracts and so are subject to the checkerboard problem), while clustering, concentration, and centralization are explicitly spatial dimensions of segregation and require information on the locations and areas of census tracts to compute.

Reardon and O'Sullivan (2004) argue that the distinction between aspatial “evenness” and spatial “clustering,” however, is an artifact of the reliance on spatial subareas (e.g., census tracts) at some chosen geographical scale of aggregation. Evenness, in Massey and Denton's formulation, refers to the degree to which members of different groups are over- and underrepresented in different subareas relative to their overall proportions in the population. Clustering refers to the proximity of subareas with similar group proportions to one another. However, evenness at one level of aggregation (say census tracts) is clearly strongly related to clustering at a lower level of aggregation (say block groups), since tracts where a minority group is overrepresented will tend to be “clusters” of block groups where the minority population is overrepresented. Unless subarea boundaries correspond to meaningful social boundaries, the distinction between “evenness” and “clustering” is arbitrary.

As a result of this insight, Reardon and O'Sullivan (2004) suggest an alternative to the Massey and Denton (1988) dimensions of residential segregation, arguing instead for two primary conceptual dimensions to spatial residential segregation—spatial exposure (or spatial isolation) and spatial evenness (or spatial clustering). Spatial exposure refers to the extent that members of one group encounter members of another group (or their own group, in the case of spatial isolation) in their local spatial environments. Spatial evenness, or clustering, refers to the extent to which groups are similarly distributed in residential space. Spatial exposure, like aspatial exposure, is a measure of the typical environment experienced by individuals; it depends in part on the overall racial composition of the population in the region under investigation. Spatial evenness, in contrast, is independent of the population composition. In this framework, Massey and Denton's evenness and clustering dimensions are collapsed into a single dimension. Their exposure dimension remains intact, but is now conceptualized as explicitly spatial. Their centralization and concentration dimensions can be seen as specific subcategories of spatial unevenness.

Measuring Segregation Among Different Population Dimensions

Because sociologists initially developed and used segregation indices primarily to study a particular set of social concerns (black/white school and residential segregation during the civil rights era, from the 1950s through the 1970s, and occupational sex segregation during the 1970s), most segregation indices are designed to measure segregation between two discrete population groups. The world, however, is not dichotomous. Social classifications and markers such as race, ethnicity, religion, political affiliation, and occupation encompass multiple distinct categories. Moreover, as US society becomes increasingly racially diverse, two-group measures of racial/ethnic segregation are increasingly inadequate for describing complex patterns of racial segregation and integration.

In addition, given the theoretical importance of income segregation and inequality in sociology, epidemiology, geography, economics, and public policy, measures of income segregation are particularly important. Unless household income is dichotomized, however, traditional categorical measures of segregation are not useful for describing segregation along an income dimension. Thus, in defining and measuring segregation, it is important to choose indices that appropriately measure segregation along the population dimension of interest.

Although most traditional measures of segregation measure segregation between two discrete groups, some measures have been developed to measure segregation among multiple groups (Reardon and Firebaugh 2002), among ordered categories (Reardon 2009), and along a continuous dimension, such as income (Jargowsky 1996; Jargowsky and Kim 2005; Reardon and Bischoff 2011; Watson 2009). I describe such measures later in this chapter.

Measures of Residential Segregation

A large number of indices of residential segregation have been proposed, evaluated, and used in social science research on segregation and its consequences. To the casual reader, the literature on segregation measurement offers a sometimes bewildering array of proposed indices, and an equally extensive literature criticizing these indices (for reviews and evaluations of many such indices, see James and Taeuber 1985; Massey and Denton 1988; Reardon and Firebaugh 2002; Reardon and O'Sullivan 2004; Zoloth 1976). In the following section, I briefly describe the most useful and the most commonly used (not always the same ones) segregation indices, with particular attention to the circumstances under which each might be used.

As noted above, several considerations are important in choosing a segregation index. First, the choice of index will depend on the dimension of segregation to be measured (e.g., exposure, evenness). Second, the choice of index will depend on the definition of the population dimension along which segregation is to be measured. This dimension may be defined by a binary variable (e.g., white/non-white; male/female), a multigroup categorical variable (e.g., white/black/Hispanic/Asian/other), an ordered categorical variable (e.g., those without a high school diploma, those with a high school diploma but without a college degree, those with a bachelor's degree but no advanced degree, and those with an advanced degree), or a continuous variable (e.g., income). Third, the choice of a measure will depend on the definition of the subareas among which population groups are distributed (e.g., census tracts, schools) and on the extent to which it is important to account for the spatial or social proximity of these subareas to one another. In the following section, I describe segregation measures used to measure exposure and evenness. Among those that measure evenness, I describe two-group, multigroup, and continuous-variable measures of segregation. Finally, I describe measures that can take into account the spatial or social patterning of population distributions.

Notation

Throughout this chapter, I use the following notation: consider a spatial region $R$ populated by $M$ mutually exclusive population subgroups (e.g., racial groups), indexed by $m$. Let $p$ and $q$ index points within the region $R$, and let $r$ and $s$ index subareas of the region $R$ (e.g., census tracts). Let $\tau$ denote population density and $\pi$ denote population proportion. Thus we have

  1. $t_r$ = population count of subarea $r$
  2. $t_{rm}$ = population count of group $m$ in subarea $r$ (note that $\sum_m t_{rm} = t_r$)
  3. $\tau_p$ = population density at point $p$
  4. $\tau_{pm}$ = population density of group $m$ at point $p$
  5. $T$ = total population in $R$ (note that $T = \sum_r t_r$ and $T = \int_{p \in R} \tau_p \, dp$)
  6. $\pi_m$ = proportion in group $m$ of total population (e.g., proportion black)
  7. $\pi_{rm}$ = proportion in group $m$ in subarea $r$ (defined as $\pi_{rm} = t_{rm}/t_r$)

In addition, I use a superpositioned tilde (as in $\tilde{x}$) to indicate that a parameter describes the spatial environment of a given point, rather than the point itself (e.g., $\tilde{\pi}_{pm}$ denotes the proportion in group $m$ in the local environment of point $p$).

Measures of “Exposure”

Exposure-based indices of segregation measure the average exposure of members of one group (group $m$) to another group (group $n$), where “exposure” is understood to be the proportion of group $n$ in the local environment of a member of group $m$. A region is highly segregated, from the exposure perspective, if members of group $m$, on average, inhabit local environments containing few members of group $n$. In the aspatial case, where each individual's local environment is defined by the subarea (e.g., census tract) that she or he inhabits, the exposure index (Bell 1954; Lieberson and Carter 1982a, 1982b) for the exposure of group $m$ to group $n$ (denoted ${}_{m}P^{*}_{n}$) is formally defined as

$$ {}_{m}P^{*}_{n} = \sum_{r} \frac{t_{rm}}{\pi_m T}\,\pi_{rn} \tag{1} $$

In concrete terms, ${}_{m}P^{*}_{n}$ is simply the average proportion of group $n$ in the subareas of members of group $m$. Note that the $P^*$ index is not symmetric: the exposure of group $m$ to group $n$ is not in general equal to the exposure of group $n$ to group $m$.
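A minimal Python sketch of Equation (1), using two hypothetical tracts, computes ${}_{m}P^{*}_{n}$ directly from subarea counts and shows the asymmetry. The function name and data are illustrative assumptions.

```python
# Sketch of the aspatial exposure index mP*n (Equation 1); data are hypothetical.


def exposure(group_m, group_n, totals):
    """Average proportion of group n in the subareas of group-m members.

    group_m[r], group_n[r]: counts of each group in subarea r;
    totals[r]: total population of subarea r (may include other groups).
    """
    total_m = sum(group_m)  # equals pi_m * T
    return sum(
        (t_rm / total_m) * (t_rn / t_r)
        for t_rm, t_rn, t_r in zip(group_m, group_n, totals)
        if t_r > 0
    )


white = [90, 10]
black = [10, 40]
totals = [w + b for w, b in zip(white, black)]  # [100, 50]

print(exposure(white, black, totals))  # about 0.17: exposure of whites to blacks
print(exposure(black, white, totals))  # about 0.34: exposure of blacks to whites
print(exposure(black, black, totals))  # about 0.66: isolation of blacks
```

Here whites outnumber blacks two to one, so the two directions differ. The two values are linked, however: weighting each by its group's regional proportion gives the same number, because ${}_{m}P^{*}_{n}\,\pi_m = {}_{n}P^{*}_{m}\,\pi_n$ (both equal $\sum_r t_{rm} t_{rn} / (t_r T)$).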

More generally, exposure-based measures might be thought of as measuring the average exposure of a population (or subpopulation) to some environmental characteristic. If $X_r$ measures some characteristic of a local environment $r$ (air quality, percentage of low-income residents, toxic waste facilities, etc.), then the average exposure of members of group $m$ to $X$ will be given by

$$ {}_{m}P^{*}_{X} = \sum_{r} \frac{t_{rm}}{\pi_m T}\,X_r \tag{2} $$

Equation (1) is simply a special case of (2), where the proportion of group $n$ is the environmental characteristic of interest (that is, $X_r = \pi_{rn}$). In concrete terms, ${}_{m}P^{*}_{X}$ measures the average value of characteristic $X$ in the subareas where members of group $m$ are located.
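The generalization in Equation (2) needs only a subarea-level value of $X$. In the Python sketch below, the pollution index values and the function name are hypothetical; passing the proportion black in each subarea as $X$ recovers the exposure index of Equation (1).

```python
# Sketch of Equation (2): average exposure of group m to a subarea
# characteristic X. The pollution values are hypothetical.


def exposure_to(group_m, x):
    """Group-m-weighted mean of x[r] over the subareas where group m lives."""
    total_m = sum(group_m)
    return sum((t_rm / total_m) * x_r for t_rm, x_r in zip(group_m, x))


white = [90, 10]
black = [10, 40]
pollution = [12.0, 30.0]  # hypothetical index value in each subarea

print(exposure_to(white, pollution))  # about 13.8
print(exposure_to(black, pollution))  # about 26.4

# Equation (1) as a special case: X is the proportion black in each subarea.
prop_black = [b / (w + b) for w, b in zip(white, black)]
print(exposure_to(white, prop_black))  # about 0.17, the exposure of whites to blacks
```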

Reardon and O'Sullivan (2004) suggest a spatial version of (2), the spatial exposure index, which indicates the average exposure of members of group m to some aspect X of their local environment (for details, see Reardon and O'Sullivan 2004):

$$ {}_{m}\tilde{P}^{*}_{X} = \int_{p \in R} \frac{\tau_{pm}}{\pi_m T}\,\tilde{X}_p \, dp \tag{3} $$

Because an exposure index measures some characteristic of the average environment of a group, it depends on the overall prevalence of that characteristic in the region of interest. As the black proportion of the population grows, for example, the exposure of whites to blacks may increase even if the spatial distributions of the black and white populations across the region remain the same.
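Equation (3) integrates over continuous space. One rough way to approximate it, sketched below in Python, is to treat each grid cell as a point, use its resident count in place of the density $\tau_{pm}$, define its local environment as the cells within a square window, and replace the integral with a sum. The grid, the window radius, and the equal weighting of cells in the window are assumptions for illustration; Reardon and O'Sullivan (2004) allow more general definitions of the local environment. The two hypothetical boards are the checkerboard and sorted layouts described in the discussion of the checkerboard problem.

```python
# Rough grid approximation of the spatial exposure index (Equation 3).
# Assumptions: each cell is a "point"; its local environment is the square
# window of cells within `radius` (clipped at the edges), equally weighted.


def window_sum(grid, row, col, radius):
    """Sum of grid values in the square window centred on (row, col)."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    )


def spatial_exposure(grid_m, grid_n, grid_total, radius=1):
    """Average local proportion of group n around members of group m."""
    total_m = sum(sum(row) for row in grid_m)
    result = 0.0
    for r, row in enumerate(grid_m):
        for c, count_m in enumerate(row):
            if count_m == 0:
                continue
            local_n = window_sum(grid_n, r, c, radius)
            local_total = window_sum(grid_total, r, c, radius)
            result += (count_m / total_m) * (local_n / local_total)
    return result


def make_grids(is_black, size=8, pop=100):
    """Black, white, and total count grids for a board of homogeneous squares."""
    black = [[pop if is_black(r, c, size) else 0 for c in range(size)]
             for r in range(size)]
    white = [[pop - b for b in row] for row in black]
    total = [[pop] * size for _ in range(size)]
    return black, white, total


checker = make_grids(lambda r, c, n: (r + c) % 2 == 0)
halves = make_grids(lambda r, c, n: c < n // 2)

print(spatial_exposure(*checker))  # about 0.469: exposure of blacks to whites
print(spatial_exposure(*halves))   # about 0.083
```

Unlike the aspatial index, this approximation separates the two layouts: black residents of the checkerboard have many white neighbors within the window, while those on the sorted board encounter white residents only along the dividing line.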

Measures of “Evenness”

In contrast to exposure measures, “evenness” measures of segregation measure the extent to which population groups are evenly distributed (relative to one another or to some environmental characteristic) across a region. A region is highly segregated, from the evenness perspective, if members of group $m$ are distributed very differently throughout the region than are members of group $n$. In this case, members of group $m$ will inhabit local environments where group $n$ is underrepresented relative to its share of the regional population. Evenness measures, unlike exposure measures, are not sensitive to the overall proportions of groups in the population; rather, they measure the extent to which groups are differentially distributed throughout a region, regardless of their overall shares of the population. Thus, a region can, for example, exhibit high exposure of group $m$ to group $n$ while also being characterized by perfect evenness, if group $n$ makes up a large share of the population and groups $m$ and $n$ are identically distributed throughout the region.
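The last point can be checked numerically. In the hypothetical region below, every tract has the same composition (10 percent group $m$, 80 percent group $n$, 10 percent other), so the dissimilarity between $m$ and $n$ is zero while the exposure of $m$ to $n$ is 0.8. The Python sketch assumes that the two-group index is computed from the $m$ and $n$ counts alone.

```python
# Hypothetical region: high exposure of group m to group n, perfect evenness.


def dissimilarity(group_a, group_b):
    """Two-group dissimilarity index D computed from the counts of a and b only."""
    totals = [a + b for a, b in zip(group_a, group_b)]
    T = sum(totals)
    pi = sum(group_a) / T
    num = sum(t * abs(a / t - pi) for t, a in zip(totals, group_a) if t > 0)
    return num / (2 * T * pi * (1 - pi))


def exposure(group_m, group_n, totals):
    """Aspatial exposure index mP*n: average proportion of n around members of m."""
    total_m = sum(group_m)
    return sum((t_rm / total_m) * (t_rn / t_r)
               for t_rm, t_rn, t_r in zip(group_m, group_n, totals) if t_r > 0)


group_m = [10, 20, 30]
group_n = [80, 160, 240]
other = [10, 20, 30]
totals = [m + n + o for m, n, o in zip(group_m, group_n, other)]  # [100, 200, 300]

print(dissimilarity(group_m, group_n))     # 0.0: perfectly even
print(exposure(group_m, group_n, totals))  # about 0.8: high exposure
```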

There are a number of segregation measures designed to measure “evenness.” The most commonly used is the dissimilarity index (denoted $D$), though $D$ has been criticized for possessing a number of mathematical properties that are inconsistent with intuitive notions of segregation (James and Taeuber 1985; Reardon and Firebaugh 2002; Winship 1978). In particular, $D$ does not appropriately register some changes in the population distribution that should, in principle, change segregation levels. If, for example, a black family moves from a neighborhood that is disproportionately black to a less-black neighborhood, $D$ does not necessarily indicate that the latter configuration of households is less segregated than the former.

Other useful measures of evenness are the information theory index ($H$), the Gini index ($G$), and the variance ratio index ($V$).1 More detail on the definitions, interpretations, and properties of these indices can be found in Zoloth (1976), James and Taeuber (1985), White (1986), Massey and Denton (1988), and Reardon and Firebaugh (2002).

The Dissimilarity Index

Formally, the dissimilarity index (Taeuber and Taeuber 1965) can be written as

$$ D = \sum_{r} \frac{t_r\,\lvert \pi_{rm} - \pi_m \rvert}{2T\pi_m(1-\pi_m)} \tag{4} $$

The dissimilarity index can be interpreted as the percentage of all individuals who would have to transfer among units in order to equalize the group proportions across units, divided by the percentage who would have to transfer if the system started in a state of complete segregation.
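A minimal Python sketch of Equation (4) follows, using three hypothetical tracts. It also reproduces the insensitivity described earlier: moving five black residents from an 80 percent black tract to a 60 percent black tract (both above the regional proportion of 50 percent) leaves $D$ unchanged, even though the move intuitively reduces segregation.

```python
# Sketch of the two-group dissimilarity index (Equation 4); hypothetical data.


def dissimilarity(black, white):
    """D = sum_r t_r |pi_rm - pi_m| / (2 T pi_m (1 - pi_m)), with m = black."""
    totals = [b + w for b, w in zip(black, white)]
    T = sum(totals)
    pi = sum(black) / T
    num = sum(t * abs(b / t - pi) for t, b in zip(totals, black) if t > 0)
    return num / (2 * T * pi * (1 - pi))


# Tracts A, B, C: 80%, 60%, and 10% black; the region is 50% black.
black = [80, 60, 10]
white = [20, 40, 90]
d_before = dissimilarity(black, white)

# Five black residents move from tract A to the less-black tract B.
black_after = [75, 65, 10]
d_after = dissimilarity(black_after, white)

print(d_before, d_after)  # both about 0.533 (= 80/150)
```

As long as both tracts stay above the regional proportion, their combined contribution to the numerator depends only on their combined black and white counts, so the exchange cannot register in $D$.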

The Gini Index

The Gini segregation index (James and Taeuber 1985) is

$$ G = \sum_{r}\sum_{s} \frac{t_r t_s\,\lvert \pi_{rm} - \pi_{sm} \rvert}{2T^2\pi_m(1-\pi_m)} \tag{5} $$

where $\pi_m$, $\pi_{rm}$, and $\pi_{sm}$ are the proportions of group $m$ in the population and in subareas $r$ and $s$, respectively (the index is symmetric with respect to the two groups, so it does not matter whether we use group $m$ or group $n$ in the calculation). The Gini index can be interpreted as the population-weighted average absolute difference in group proportions across all pairs of subareas, divided by the maximum possible value of this average (obtained if the system were in a state of complete segregation). Note that the Gini segregation index is related to, but distinct from, the more familiar Gini index of inequality, a common measure of income inequality (see, for example, Schwartz and Winship 1980). Like the dissimilarity index, the Gini index of segregation exhibits several undesirable properties (James and Taeuber 1985; Reardon and Firebaugh 2002), though perhaps the primary reason it has been less commonly used is that it is more computationally demanding than $D$ and other indices, since it requires comparing every pair of subareas.
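A direct Python sketch of Equation (5), on the same three hypothetical tracts used for $D$, makes the computational point visible: the double sum compares every ordered pair of subareas, so the work grows with the square of the number of subareas.

```python
# Sketch of the two-group Gini segregation index (Equation 5); hypothetical data.


def gini_segregation(black, white):
    """G: population-weighted mean absolute difference in group proportions
    over all ordered pairs of subareas, scaled by its maximum."""
    totals = [b + w for b, w in zip(black, white)]
    T = sum(totals)
    pi = sum(black) / T
    props = [b / t for b, t in zip(black, totals)]  # assumes no empty subareas
    num = sum(
        t_r * t_s * abs(p_r - p_s)
        for t_r, p_r in zip(totals, props)
        for t_s, p_s in zip(totals, props)
    )  # O(k^2) work for k subareas
    return num / (2 * T ** 2 * pi * (1 - pi))


black = [80, 60, 10]
white = [20, 40, 90]
print(gini_segregation(black, white))  # about 0.622 (= 28/45)
```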

The Variance Ratio Index

The variance ratio index (James and Taeuber 1985) is defined as

$$ V = \sum_{r} \frac{t_r\,(\pi_{rm} - \pi_m)^2}{T\pi_m(1-\pi_m)} \tag{6} $$

where $\pi_m$ and $\pi_{rm}$ are the group $m$ proportions in the total population and in subarea $r$, respectively (again, the index is symmetric with respect to groups $m$ and $n$, so it does not matter which group is used in the calculation). The variance ratio index can be interpreted as the proportion of the variance in group membership that is accounted for by between-subarea differences in group proportions.2
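The variance interpretation can be verified numerically. The Python sketch below computes $V$ from Equation (6) and, separately, codes every resident of three hypothetical tracts as 1 (black) or 0 (white) and divides the variance of residents' subarea means by the total variance of the indicator; the two numbers agree.

```python
# Sketch of the variance ratio index (Equation 6), checked against its
# interpretation as a share of variance; data are hypothetical.
from statistics import pvariance


def variance_ratio(black, white):
    """V = sum_r t_r (pi_rm - pi_m)^2 / (T pi_m (1 - pi_m)), with m = black."""
    totals = [b + w for b, w in zip(black, white)]
    T = sum(totals)
    pi = sum(black) / T
    num = sum(t * (b / t - pi) ** 2 for t, b in zip(totals, black) if t > 0)
    return num / (T * pi * (1 - pi))


black = [80, 60, 10]
white = [20, 40, 90]

# Individual-level check: code each person 1 (black) or 0 (white), then
# compare the variance of subarea means with the total variance.
people = [[1] * b + [0] * w for b, w in zip(black, white)]
everyone = [x for tract in people for x in tract]
subarea_means = [sum(tract) / len(tract) for tract in people]
between = pvariance([mean for mean, tract in zip(subarea_means, people) for _ in tract])
share = between / pvariance(everyone)

print(variance_ratio(black, white), share)  # both about 0.347 (= 26/75)
```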

The Information Theory Index

The information theory index (also called the Theil index, after its originator) measures the variation in diversity across subareas, where the diversity of a population is defined as the entropy ($E$) of the population:

$$ E = \sum_{m=1}^{M} \pi_m \ln\frac{1}{\pi_m} \tag{7} $$

where there are $M$ groups in the population.3 The entropy takes on a value of 0 if and only if the population is made up of a single group, and reaches its maximum when all $M$ groups are equally represented in the population. The information theory index (Theil 1972; Theil and Finizza 1971) is then defined as

$$ H = \sum_{r} \frac{t_r\,(E - E_r)}{T E} \tag{8} $$

where $E_r$ is the entropy in subarea $r$. Note that $H$, unlike $D$, $G$, and $V$, is implicitly defined as a measure of segregation among multiple population groups, since the entropy is defined for any number of groups $M$.
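The Python sketch below implements Equations (7) and (8) for any number of groups. It assumes natural logarithms (the ratio in Equation 8 does not depend on the base) and the usual convention that a group with zero proportion contributes nothing to the entropy. The tract counts are hypothetical.

```python
# Sketch of the entropy (Equation 7) and Theil's information theory index H
# (Equation 8) for M >= 2 groups; data are hypothetical.
import math


def entropy(proportions):
    """E = sum_m pi_m ln(1/pi_m), treating 0 * ln(1/0) as 0."""
    return sum(p * math.log(1 / p) for p in proportions if p > 0)


def information_theory_index(counts):
    """counts[r][m] is the number of group-m members in subarea r."""
    group_totals = [sum(column) for column in zip(*counts)]
    T = sum(group_totals)
    E = entropy([g / T for g in group_totals])  # must be > 0: at least two groups present
    h = 0.0
    for row in counts:
        t_r = sum(row)
        if t_r == 0:
            continue
        E_r = entropy([c / t_r for c in row])
        h += t_r * (E - E_r) / (T * E)
    return h


two_groups = [[80, 20], [60, 40], [10, 90]]             # black, white per tract
three_groups = [[50, 30, 20], [50, 30, 20], [5, 3, 2]]  # identical compositions
apart = [[50, 0, 0], [0, 30, 0], [0, 0, 20]]            # one group per tract

print(information_theory_index(two_groups))    # about 0.279
print(information_theory_index(three_groups))  # 0 (up to rounding)
print(information_theory_index(apart))         # 1 (up to rounding)
```

The three calls show an intermediate case, the minimum (every tract has the regional composition), and the maximum (each tract contains a single group).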

These are not the only measures of “evenness” that have been proposed and used in research on segregation, but they are the most commonly used. Each index has a minimum value of 0, obtained if and only if every subarea has the same group composition, and a maximum value of 1, obtained if and only if every subarea contains members of only a single group.

Measures of Multigroup Segregation

The dichotomous indices of segregation that measure evenness ($D$, $G$, and $V$) each have multigroup analogs (Reardon and Firebaugh 2002), while $H$ is implicitly defined as a multigroup measure (Theil 1972; Theil and Finizza 1971). Each of these multigroup measures of evenness describes the extent to which $M$ population groups are similarly distributed among subareas. Reardon and Firebaugh (2002) provide an extensive review of these multigroup indices and their mathematical properties, concluding that the information theory index ($H$) is the most flexible and conceptually appropriate multigroup measure of evenness.
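For illustration, the Python sketch below assumes a multigroup dissimilarity index of the form $D = \sum_m \sum_r t_r \lvert \pi_{rm} - \pi_m \rvert / (2TI)$, where $I = \sum_m \pi_m (1 - \pi_m)$; this form should be checked against Reardon and Firebaugh (2002) before use. The sketch confirms that, for two groups, it reproduces the two-group $D$ of Equation (4), and that it equals 1 when each subarea contains a single group. Data are hypothetical.

```python
# Sketch of a multigroup dissimilarity index (assumed form, see lead-in);
# data are hypothetical.


def multigroup_dissimilarity(counts):
    """counts[r][m]: members of group m in subarea r (subareas non-empty)."""
    group_totals = [sum(column) for column in zip(*counts)]
    T = sum(group_totals)
    pis = [g / T for g in group_totals]
    interaction = sum(p * (1 - p) for p in pis)  # I = sum_m pi_m (1 - pi_m)
    num = 0.0
    for row in counts:
        t_r = sum(row)
        num += sum(t_r * abs(c / t_r - p) for c, p in zip(row, pis))
    return num / (2 * T * interaction)


def dissimilarity(black, white):
    """Two-group D (Equation 4), for comparison."""
    totals = [b + w for b, w in zip(black, white)]
    T = sum(totals)
    pi = sum(black) / T
    num = sum(t * abs(b / t - pi) for t, b in zip(totals, black))
    return num / (2 * T * pi * (1 - pi))


two_groups = [[80, 20], [60, 40], [10, 90]]
apart = [[50, 0, 0], [0, 30, 0], [0, 0, 20]]

print(multigroup_dissimilarity(two_groups))       # about 0.533
print(dissimilarity([80, 60, 10], [20, 40, 90]))  # about 0.533, the same value
print(multigroup_dissimilarity(apart))            # about 1.0
```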

The choice of whether to use a two-group or multigroup index of segregation depends on the specific question of interest. In a region where the population is composed of three groups (white non-Hispanic, black non-Hispanic, and Hispanic, for example), we may be interested in the segregation between two specific groups (e.g., how segregated are white from black residents?); or we may be interested in the segregation among all three groups (e.g., how segregated are white, black, and Hispanic residents from one another?). In the first case, any of the two-group indices would be appropriate; in the second case, a multigroup index is required.

A special class of multigroup segregation measures consists of those that measure segregation among individuals classified into a set of ordered categories. For example, one might wish to compute segregation among groups classified by the highest educational degree earned or by some ordinal measure of occupational status. In such cases, the unordered multigroup measures are not appropriate, because they ignore the ordering of the categories. Reardon (2009) describes a set of measures of ordinal segregation that can be used in such cases.

Measures of Income Segregation

Indices for measuring income segregation (or, more generally, segregation along any continuous variable) have been developed only recently. One commonly used measure of income segregation is the neighborhood sorting index (NSI), which is based on the proportion of income variance that lies between subareas (Jargowsky 1996, 1997). If $y_i$ is a measure of income for person $i$, then the NSI is defined as

$$NSI = \sqrt{\frac{\sum_{r} t_r (\bar{y}_r - \bar{y})^2}{\sum_{i} (y_i - \bar{y})^2}} \qquad (9)$$

where $t_r$ is the population of subarea $r$, $\bar{y}_r$ is the mean income in subarea $r$, and $\bar{y}$ is the mean income in the region. The NSI can thus be interpreted as the ratio of the standard deviation of subarea mean incomes (weighted by subarea population) to the standard deviation of income in the regional population; its square is the proportion of income variance lying between subareas.

A drawback of the NSI, however, is that it is sensitive to changes in the shape of the income distribution, even when no household moves. If income inequality grew (if high-income households' incomes doubled, for example, while low-income households' incomes remained flat) and each household remained in the same neighborhood, the NSI would generally register a change in segregation (often an increase). An alternative approach is to rank all households by income and then to measure the extent to which households are segregated by income rank. This is the approach taken by several new measures of income segregation, in particular the centile gap index (CGI) (Watson 2009) and the rank-order information theory index ($H^R$) (Reardon 2011; Reardon and Bischoff 2011). Both measures focus on the sorting of households by income rank rather than on the sorting of income itself. As a result, the CGI and $H^R$ are both insensitive to changes in the shape of the income distribution that leave households' ranks unchanged. The choice between a measure like the NSI and one like the CGI or $H^R$ therefore depends on which is conceptually closer to the social phenomenon of interest.
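The NSI's sensitivity to the shape of the income distribution can be checked numerically. This sketch assumes person-level incomes with tract identifiers; `nsi`, `ranks`, and the eight-person, two-tract example are hypothetical. The top incomes are doubled while nobody moves: the NSI changes, but the income ranks (the basis of rank-order measures such as the CGI and $H^R$) do not.

```python
import numpy as np

def nsi(incomes, tract_ids):
    """Neighborhood sorting index, Equation (9): sqrt(between-tract SS / total SS)."""
    y = np.asarray(incomes, dtype=float)
    ids = np.asarray(tract_ids)
    y_bar = y.mean()
    between = sum(
        (ids == r).sum() * (y[ids == r].mean() - y_bar) ** 2   # t_r * (ybar_r - ybar)^2
        for r in np.unique(ids)
    )
    total = ((y - y_bar) ** 2).sum()
    return np.sqrt(between / total)

def ranks(x):
    """Income ranks, with ties broken by position."""
    return np.argsort(np.argsort(x, kind="stable"), kind="stable")

tracts = np.array([0, 0, 0, 0, 1, 1, 1, 1])
before = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0])   # hypothetical incomes
after = np.where(before == 3.0, 2 * before, before)           # top incomes double; nobody moves

print(nsi(before, tracts), nsi(after, tracts))
print(np.array_equal(ranks(before), ranks(after)))
```

Here the NSI rises from about 0.90 to about 0.99 while the ranks are identical. With other toy data the NSI can instead fall; either way, it changes although no household has moved.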

Measures of Spatial Segregation

All of the segregation indices described above are aspatial, meaning that they treat each census tract as an isolated neighborhood and do not account for the spatial patterning of tracts. While such indices have been commonly used in many studies of residential segregation, they suffer from the checkerboard problem and the modifiable areal unit problem (MAUP), as described above. A class of spatial racial/ethnic segregation indices developed in recent decades, however, is designed to better account for the spatial patterning of residential locations (see, for example, Frank 2003; Grannis 2002; Lee et al. 2008; Morgan 1983a, 1983b; Morrill 1991; O'Sullivan and Wong 2004; Reardon et al. 2008; Reardon and O'Sullivan 2004; White 1983; Wong 1993, 1998, 1999; Wu and Sui 2001). While there are many such spatial indices, Reardon and O'Sullivan (2004) show that most fail to meet a set of criteria that ensure they adequately address MAUP issues and match theoretically meaningful conceptions of segregation.

Reardon and O'Sullivan (2004) propose a conceptually straightforward and general approach to measuring two-group and multigroup segregation in a way that accounts for spatial patterns. They suggest that a segregation index should measure the extent to which the local environments of individuals differ in their racial or socioeconomic composition (or, more generally, in any population or environmental trait), where each individual inhabits a “local environment” whose population is made up of the spatially weighted average of the populations (or other characteristics) at each point in the region of interest. Typically, the population at nearby locations will contribute more to the local environment of an individual than will more distant locations (a “distance-decay” effect). Given a particular spatial weighting function and data on the residential location of households, it is straightforward to compute the spatially weighted racial (or socioeconomic) composition of the local environment of each location (or person) in the study region. Given this, spatial exposure is measured by computing the average composition of the local environments of members of each group. Spatial evenness is measured by examining how similar, on average, the racial (or socioeconomic) compositions of all individuals' local environments are to the overall composition of the study region. If each person's local environment is relatively similar in composition to the overall population, there is little spatial unevenness; conversely, if there is considerable deviation from the overall composition, there is high spatial segregation (unevenness).

For example, to compute a spatial version of the information theory index, Reardon and O'Sullivan first define the spatially weighted entropy at each point $p$ as

$$\tilde{E}_p = \sum_{m=1}^{M} \tilde{\pi}_{pm} \log \frac{1}{\tilde{\pi}_{pm}} \qquad (10)$$

This is the entropy of the local environment of point $p$, where $\tilde{\pi}_{pm}$ is the proportion of group $m$ in that local environment. It is analogous to the entropy of an individual tract, $E_r$, used in the computation of the aspatial segregation index $H$ (if we define the local environment of $p$ to be tract $r$, then $\tilde{E}_p = E_r$), except that $\tilde{E}_p$ may incorporate (proximity-weighted) information on the racial composition at all points in $R$, not just the racial composition of the tract where $p$ is located. The spatial information theory segregation index, $\tilde{H}$, is then defined as

$$\tilde{H} = 1 - \frac{1}{T E} \int_{p \in R} \tau_p \tilde{E}_p \, dp \qquad (11)$$

where $\tau_p$ is the population density at point $p$, $T$ is the total population, and $E$ is the overall regional entropy, defined as in Equation (7). The spatial information theory index $\tilde{H}$ is the spatial analog of the usual information theory index $H$: a measure of how much less diverse individuals' local environments are, on average, than the total population of region $R$.

One advantage of the spatial segregation measures is that they enable researchers to measure segregation at different geographic scales. Reardon and colleagues compute $\tilde{H}$ using a range of radii to define individuals' local environments, and then draw a “spatial segregation profile” that describes the levels of segregation computed over a range of geographic scales, providing a more nuanced view of segregation patterns than could be obtained from aspatial measures or from spatial measures using a single scale (Lee et al. 2008; Reardon and Bischoff 2011; Reardon et al. 2008, 2009).
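The following is a discrete sketch of the local-environment approach, not Reardon and O'Sullivan's implementation. It assumes each location's population is concentrated at a point (for example, a block centroid) and that every location has positive population; it replaces the integral in Equation (11) with a population-weighted sum over locations and uses a biweight distance-decay kernel, which is one common choice and an assumption here. Evaluating `spatial_H` over several radii gives a rudimentary segregation profile. All function names and the toy data are hypothetical.

```python
import numpy as np

def entropy(props):
    """Row-wise entropy (natural log); 0 * log(1/0) is taken as 0."""
    p = np.asarray(props, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p > 0, -p * np.log(p), 0.0).sum(axis=-1)

def biweight(d, radius):
    """Distance-decay weight: (1 - (d/radius)^2)^2 within the radius, 0 beyond it."""
    u = d / radius
    return np.where(u < 1, (1 - u**2) ** 2, 0.0)

def local_environments(coords, counts, radius):
    """Proximity-weighted group composition of each location's local environment."""
    coords = np.asarray(coords, dtype=float)
    counts = np.asarray(counts, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    local = biweight(d, radius) @ counts        # weighted group counts around each location
    return local / local.sum(axis=1, keepdims=True)

def spatial_H(coords, counts, radius):
    """Discrete approximation of the spatial information theory index."""
    counts = np.asarray(counts, dtype=float)
    pop = counts.sum(axis=1)                    # population at each location
    T = pop.sum()
    E = entropy(counts.sum(axis=0) / T)         # regional entropy
    E_p = entropy(local_environments(coords, counts, radius))
    return 1 - (pop * E_p).sum() / (T * E)

def spatial_exposure(coords, counts, radius):
    """P[m, n]: average share of group n in the local environments of group m members."""
    counts = np.asarray(counts, dtype=float)
    props = local_environments(coords, counts, radius)
    return (counts.T @ props) / counts.sum(axis=0)[:, None]

# Ten hypothetical locations on a line: group 0 on the left, group 1 on the right.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
counts = np.array([[10, 0]] * 5 + [[0, 10]] * 5)
profile = {r: spatial_H(coords, counts, r) for r in [0.5, 2.0, 5.0, 100.0]}
print(profile)
```

At a radius smaller than the spacing between locations, each local environment is just its own location, so the spatial index equals the aspatial $H$ computed over locations; as the radius grows, local environments blend toward the regional composition and the index falls toward 0.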

Choosing Appropriate Segregation Indices

In any given study, the choice of a segregation index depends on (1) the dimension of segregation of interest (exposure or evenness); (2) the population dimension of interest (which may be indicated by a binary variable, such as gender; a multigroup categorical variable, such as race; an ordinal variable, such as educational attainment; or a continuous variable, such as income); and (3) the extent to which it is important to account for the spatial proximity of locations. Table 6.1 summarizes the indices described above with regard to these three aspects. In addition, Table 6.1 indicates whether each index is decomposable in two different ways (Reardon and Firebaugh 2002; Reardon and O'Sullivan 2004; Reardon, Yun, and Eitle 2000). Here, organizational decomposability indicates that an index can be decomposed into components of segregation attributable to between- and within-subregion segregation (e.g., into segregation between a city and its suburbs and segregation within the city and within the suburbs separately); grouping decomposability indicates that a multigroup index can be decomposed into components of segregation attributable to segregation between and within different clusters of population subgroups (e.g., into segregation between white and minority families and segregation among different minority subgroups).

TABLE 6.1 PROPERTIES OF SEGREGATION MEASURES

Property              P*    D    G    V    H    NSI
Dimension
    Exposure
    Evenness
Variable types
    Two-group
    Multigroup
    Ordinal
    Continuous
Spatial treatment
    Aspatial
    Spatial
Decomposability
    Organizational
    Grouping

If a measure of exposure is required, then a version of the P* exposure index must be used; both spatial and aspatial versions are available. More indices are available to measure evenness for two-group and multigroup population dimensions, each of which (except the Gini index) has both spatial and aspatial versions. Reardon and Firebaugh (2002) and Reardon and O'Sullivan (2004), however, note that the information theory index ($H$) and the variance ratio index ($V$) have more attractive mathematical properties than the others, in part because they are decomposable and in part because they register changes in segregation resulting from household mobility more appropriately. If segregation is to be measured along some ordinal or continuous population dimension, such as income, then both $H$ and $V$ can be adapted to measure both aspatial and spatial segregation. Finally, the NSI is available in both spatial and aspatial versions.
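Organizational decomposability can be verified directly for $H$. The sketch below, with hypothetical helper names and made-up counts for a city and its suburbs, uses the decomposition $H = H_B + \sum_k \frac{T_k E_k}{T E} H_k$, where $H_B$ treats each subregion $k$ as a single unit and $H_k$, $T_k$, and $E_k$ are the within-subregion index, population, and entropy. The identity follows algebraically from Equation (8); the code simply checks it numerically.

```python
import numpy as np

def entropy(props):
    """Entropy along the last axis (natural log); 0 * log(1/0) is taken as 0."""
    p = np.asarray(props, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p > 0, -p * np.log(p), 0.0).sum(axis=-1)

def theil_H(counts):
    """Aspatial H (Equation (8)) from a (subareas x groups) array of counts."""
    counts = np.asarray(counts, dtype=float)
    t = counts.sum(axis=1)
    T = t.sum()
    E = entropy(counts.sum(axis=0) / T)
    E_r = entropy(counts / t[:, None])
    return (t * (E - E_r)).sum() / (T * E)

def decompose_H(counts, subregion):
    """Return (between-subregion H, weighted sum of within-subregion H's)."""
    counts = np.asarray(counts, dtype=float)
    subregion = np.asarray(subregion)
    T = counts.sum()
    E = entropy(counts.sum(axis=0) / T)
    labels = np.unique(subregion)
    # Between term: treat each subregion (e.g., city, suburbs) as a single unit.
    sub_totals = np.array([counts[subregion == k].sum(axis=0) for k in labels])
    between = theil_H(sub_totals)
    within = 0.0
    for k, c_k in zip(labels, sub_totals):
        T_k = c_k.sum()
        E_k = entropy(c_k / T_k)
        if E_k > 0:  # a one-group subregion has weight 0 and contributes nothing
            within += (T_k * E_k) / (T * E) * theil_H(counts[subregion == k])
    return between, within

counts = [[80, 20], [30, 70], [90, 10], [60, 40]]   # hypothetical tract counts, two groups
subregion = ["city", "city", "suburb", "suburb"]
between, within = decompose_H(counts, subregion)
print(between, within, between + within, theil_H(counts))
```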

Computing Segregation Indices

The formulas for computing aspatial segregation indices generally require summing (or double-summing, in the case of the Gini index) over all subareas in a region. Most software packages do not have built-in routines for computing these indices, though the formulas are relatively easy to program and require only counts of each racial/ethnic (or other) group in each tract or subarea.4 The spatial measures of segregation, however, are more complicated to compute, and typically require geographic information systems (GIS) software and data on the spatial patterning of census tracts.5
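To illustrate how easily these formulas can be programmed, the sketch below computes the two-group dissimilarity index (a single sum) and Gini index (a double sum) from tract counts. It assumes the standard forms $D = \sum_r t_r|\pi_r - \pi| / (2T\pi(1-\pi))$ and $G = \sum_r \sum_s t_r t_s |\pi_r - \pi_s| / (2T^2\pi(1-\pi))$. The helper names and tract counts are hypothetical.

```python
import numpy as np

def _props(group_counts, totals):
    """Return subarea populations t_r, total T, subarea proportions pi_r, regional pi."""
    g = np.asarray(group_counts, dtype=float)
    t = np.asarray(totals, dtype=float)
    T = t.sum()
    return t, T, g / t, g.sum() / T

def dissimilarity(group_counts, totals):
    """Two-group dissimilarity index D: a single sum over subareas."""
    t, T, pi_r, pi = _props(group_counts, totals)
    return (t * np.abs(pi_r - pi)).sum() / (2 * T * pi * (1 - pi))

def gini(group_counts, totals):
    """Two-group Gini index G: a double sum over all pairs of subareas."""
    t, T, pi_r, pi = _props(group_counts, totals)
    pairs = np.outer(t, t) * np.abs(pi_r[:, None] - pi_r[None, :])
    return pairs.sum() / (2 * T**2 * pi * (1 - pi))

# Three hypothetical tracts of 100 residents each, with 90, 50, and 10 group members.
group, total = [90, 50, 10], [100, 100, 100]
print(dissimilarity(group, total), gini(group, total))
```

For these three tracts, D = 8/15 (about 0.53) and G = 32/45 (about 0.71). The pairwise array built in `gini` grows with the square of the number of subareas.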

The Association of Segregation with Population Outcomes

One of the primary reasons for measuring segregation is to assess the association between segregation patterns and group differences in some health or social outcome. In education, for example, we are interested in knowing whether racial segregation among schools is associated with racial achievement gaps. Likewise, in public health, we are interested in whether racial or socioeconomic segregation is associated with racial or socioeconomic differences in disease rates or mortality.

In this final section, I give a brief introduction to methods of assessing the association between segregation and health and social outcomes. We begin by considering three simple models for the ways that segregation may be associated with some health or social outcome $Y$ (e.g., asthma). First, suppose that residents of neighborhoods with higher proportions of black residents have higher rates of asthma, regardless of their individual race (this might happen, for example, if air quality were negatively correlated with the proportion of black residents in a neighborhood and if air quality affected black and white residents similarly). Formally, this would imply that (where $G$ indicates racial group membership, $i$ indexes individuals, and $k$ indexes neighborhoods)

(12) equation

In this first model, neighborhood racial composition is associated with asthma, resulting in an observed racial difference in asthma rates across a region, even if there is no difference in asthma rates between black and white residents living in the same neighborhoods.

A second model would suggest that the segregation level of a region might be associated with average outcomes of all groups within a region. It might not be far-fetched to imagine that mortality rates are higher for all racial groups in more segregated cities, if segregation leads to increased stress, conflict, and violence among groups. If this were true, we would expect to observe

(13) equation

where $j$ indexes cities. In this model, segregation is associated with mean outcomes for all racial groups equally.

Perhaps more likely, segregation is associated with unequal outcomes among segregated groups. Returning to the asthma example above, if air quality were correlated with neighborhood racial composition, then we would expect to observe larger racial differences in asthma rates in more segregated regions, since the racial differences in average exposure to poor air quality will be larger in more segregated regions. Formally, if $G = 1$ and $G = 0$ indicate two distinct groups in region $j$, we would expect to observe

(14) equation

Such a model implicitly assumes that one of the two groups tends to benefit more, on average, from segregation patterns.

We can test association hypotheses of these types using regression techniques. I first illustrate the models using individual-level data, but then show that they can be estimated using only group-specific aggregate data on an outcome $Y$. To test model 1, suppose we collect data on some outcome of interest for a sample of members of groups $G = 1$ and $G = 0$ in some region. We can estimate the average difference in outcomes between the groups by fitting the simple regression model:

$$Y_i = \delta_0 + \delta_1 G_i + e_i \qquad (15)$$

From this model, we obtain $\hat{\delta}_1$, an estimate of the average difference in $Y$ between the two groups. If we let $k$ index neighborhoods (often operationalized as census tracts or blocks, but not necessarily) and define $\bar{G}_k$ as the average value of $G$ in neighborhood $k$ (this will be the proportion of the population in neighborhood $k$ who are members of the group indicated by $G = 1$), we can then estimate the following regression model:

$$Y_{ik} = \beta_0 + \beta_1 G_{ik} + \beta_2 \bar{G}_k + e_{ik} \qquad (16)$$

Fitting this model yields $\hat{\beta}_1$, an estimate of the within-neighborhood association between $G$ and $Y$, and $\hat{\beta}_2$, an estimate of the between-neighborhood association between $\bar{G}_k$ and $Y$ after controlling for individuals' group membership ($\beta_2$ is often termed the neighborhood compositional effect of group membership, though the term “effect” here should be understood as describing an association, not a causal process). Note that $\beta_2$ is the parameter of interest in model 1, as it describes the association between neighborhood racial composition and $Y$, holding individual race constant. A useful relationship between Equations (15) and (16) is that

$$\delta_1 = \beta_1 + \beta_2 V \qquad (17)$$

where $V$ is the variance ratio index of segregation between the two groups defined by the dichotomous variable $G$ (see Equation (6)); for OLS estimates fitted to the same data, this identity holds exactly. This result allows us to decompose the average difference in outcomes into a within-neighborhood component ($\beta_1$) and a between-neighborhood component: the product of the association between neighborhood composition and the outcome ($\beta_2$) and the level of segregation in the region ($V$).
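Equation (17) can be checked on simulated data. In this sketch the neighborhood shares, the coefficients 0.3 and 2.0, and the noise are all made up; `delta`, `b1`, and `b2` stand for the gap coefficient in (15) and the two group-membership coefficients in (16). Because the regression of neighborhood composition on individual group membership has slope exactly $V$, the identity holds to rounding error for OLS estimates, whatever the data-generating model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nbhd, n_per = 40, 50
nbhd = np.repeat(np.arange(n_nbhd), n_per)                  # neighborhood of each person
share = rng.uniform(0.05, 0.95, n_nbhd)                     # hypothetical group-1 share by neighborhood
G = (rng.uniform(size=nbhd.size) < share[nbhd]).astype(float)
G_bar = (np.bincount(nbhd, weights=G) / np.bincount(nbhd))[nbhd]   # neighborhood mean of G, per person
Y = 1.0 + 0.3 * G + 2.0 * G_bar + rng.normal(size=G.size)   # made-up outcome model

def ols(y, *columns):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

delta = ols(Y, G)[1]              # total between-group gap, as in (15)
_, b1, b2 = ols(Y, G, G_bar)      # within- and between-neighborhood terms, as in (16)
pi = G.mean()
V = ((G_bar - pi) ** 2).mean() / (pi * (1 - pi))   # variance ratio index
print(delta, b1 + b2 * V)
```

Setting the coefficient on `G` to 0 in the simulated outcome mimics model 1: a between-group gap still appears, produced entirely by `b2 * V`.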

If the within-neighborhood component is zero, there is, on average, no difference in outcomes between members of different groups residing in the same neighborhoods, and so the entire difference in outcomes between groups is associated with neighborhood segregation. If the between-neighborhood component is zero, in contrast, then segregation cannot be responsible for the difference between groups (assuming the model specified is correct), regardless of how high segregation levels are. Conversely, if segregation levels are very low, then even a strong association between neighborhood composition and the outcome will not produce a large outcome gap.

Next we consider models 2 and 3. In these models, we are interested in how segregation is associated with the observed outcomes (on some variable $Y$) of two groups (denoted by $G = 1$ and $G = 0$). If we have data on a number of regions (indexed by $j$) and we measure segregation using some segregation index $S$, we can write a multilevel linear model (Raudenbush and Bryk 2002) describing the association among $Y$, $G$, and $S$:

equation

where

(18) equation

In this model, c06-math-185 indicates the average outcome Y in a region with an average level of segregation S; likewise c06-math-186 indicates the average within-region between-group difference in outcomes Y in a region with an average level of segregation S. The coefficient c06-math-187 indicates the association between segregation levels and the average value of Y; the coefficient c06-math-188 indicates the association between segregation levels and the within-region between-group difference in average values of Y.

Note that we do not need individual-level data to estimate this model. If we average Equation (18) over all individuals within each region j, we obtain

(19) equation

We can estimate c06-math-190 and c06-math-191 directly from aggregated data on Y and measured levels of segregation S. The parameter c06-math-192 is the parameter of interest in model 2, as it describes the association of regional segregation with Y for all individuals.

Likewise, if we average Equation (18) over all individuals of groups G = 1 and G = 0 separately within each region j, we obtain

(20) equation

and

(21) equation

Subtracting (21) from (20), we obtain the between-group gap in Y in region j (denoted by δ1j):

(22) equation

Thus, we can estimate c06-math-196 and c06-math-197 directly from the observed within-region average between-group differences in Y and measured levels of segregation S. In model 3, c06-math-198 is the parameter of interest, as it describes the association between segregation and the size of the between-group difference in Y.

If segregation S is measured with the variance ratio index V, then from (22) and (17), we have

(23) equation

This implies

(24) equation

Thus, we can estimate $\beta_1$ and $\beta_2$ in Equation (17) for each region j from aggregated data alone, so long as we have data from multiple regions (and we assume model (18) is correct). With only a measure of the between-group gap in Y for each region and the computed variance ratio index for each region, we can estimate the average within-neighborhood gap for each region and the association between neighborhood composition and Y, net of group membership.
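Under the simplifying assumption (ours, for illustration) that $\beta_1$ and $\beta_2$ are the same in every region, Equation (17) implies that regional gaps lie on a line in $V_j$, so a regression of observed gaps on $V_j$ recovers both parameters from aggregate data alone. The sketch simulates 60 hypothetical regions; all values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 60                                            # number of hypothetical regions
V = rng.uniform(0.05, 0.6, J)                     # variance ratio index in each region
beta1, beta2 = 0.2, 1.5                           # assumed identical in every region
gap = beta1 + beta2 * V + rng.normal(scale=0.05, size=J)   # observed between-group gap per region

# Regress regional gaps on V: intercept estimates beta1 (within-neighborhood gap),
# slope estimates beta2 (association of neighborhood composition with Y).
slope, intercept = np.polyfit(V, gap, 1)
print(intercept, slope)
```

If the within-neighborhood gap or the compositional association varies across regions, the fitted intercept and slope are better read as averages across regions.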

Finally, note that we can add to any of these models a vector of variables representing mechanisms through which segregation may be related to the outcomes. In the asthma example, we might add measures of air quality to Equations (16) and (22); if the inclusion of this variable reduced the coefficient c06-math-203 to 0, this would suggest that the association between segregation and asthma was explained by between-neighborhood differences in air quality. It is important to note, however, that including potential mediator or confounder variables is not necessarily straightforward in contextual effect models, owing to selection mechanisms, endogeneity, and cross-level interactions (for more discussion of the complexities of making causal inferences from contextual effect models, see Morgenstern 1995 and Oakes 2004).

Summary

In this chapter, I have addressed three central issues involved in answering the question of whether segregation is associated with subgroup differences in health or social outcomes. Careful analyses of segregation and health and social outcomes require, first, a clear conceptualization of what we mean by “segregation.” Theory and prior research are typically useful for determining what conceptual definition of segregation is most appropriate in a given context. Second, segregation research requires a measure or measures of segregation appropriate to the conceptual framework and hypothesized mechanisms. This chapter has briefly reviewed this literature; the interested researcher, however, should consult some of the articles cited here for further detail. Finally, I have described some simple statistical models for inferring descriptive associations between measured segregation and individual and subgroup outcome patterns. The models described in this section are intended as outlines only. In particular, models of the sort described here may be appropriate for estimating patterns of association between segregation and outcomes, but they do not necessarily produce unbiased estimates of causal associations between segregation and observed health and social outcomes. As in all statistical analyses, the old caveat applies: correlation does not imply causation. In fact, designing studies and analytic strategies for inferring the effects of segregation on health and social outcomes is an area of research where much work remains to be done, both methodologically and substantively. This is a rapidly developing field where social epidemiologists might make important contributions.

References

  1. Acevedo-Garcia, D. (2001) Zip code-level risk factors for tuberculosis: neighborhood environment and residential segregation in New Jersey, 1985–1992. American Journal of Public Health, 91, 734–741.
  2. Acevedo-Garcia, D., Lochner, K.A., Osypuk, T.L., and Subramanian, S.V. (2003) Future directions in residential segregation and health research: a multilevel approach. American Journal of Public Health, 93, 215–220.
  3. Bell, W. (1954) A probability model for the measurement of ecological segregation. Social Forces, 32, 357–364.
  4. Clotfelter, C.T. (1999) Public school segregation in metropolitan areas. Land Economics, 75, 487–504.
  5. Coleman, J.S., Hoffer, T., and Kilgore, S. (1982) High School Achievement: Public, Catholic, and Private Schools Compared, Basic Books, New York.
  6. Downey, L. (2003) Spatial measurement, geography, and urban racial inequality. Social Forces, 81, 937–952.
  7. Duncan, O.D. and Duncan, D. (1955) A methodological analysis of segregation indexes. American Sociological Review, 20 (2), 210–217.
  8. Frank, A.I. (2003) Using measures of spatial autocorrelation to describe socio-economic and racial residential patterns in US urban areas, in Socio-economic Applications of Geographic Information Science, Innovations in GIS (eds D. Kidner, G. Higgs, and S. White), Taylor & Francis, London, pp. 147–162.
  9. Grannis, R. (1998) The importance of trivial streets: residential streets and residential segregation. American Journal of Sociology, 103 (6), 1530–1564.
  10. Grannis, R. (2002) Discussion: segregation indices and their functional inputs. Sociological Methodology, 32, 69–84.
  11. James, D.R. and Taeuber, K.E. (1985) Measures of segregation. Sociological Methodology, 14, 1–32.
  12. James, F.J. (1986) A new generalized “exposure-based” segregation index: demonstration in Denver and Houston. Sociological Methods & Research, 14 (3), 301–316.
  13. Jargowsky, P.A. (1996) Take the money and run: economic segregation in U.S. metropolitan areas. American Sociological Review, 61, 984–998.
  14. Jargowsky, P.A. (1997) Poverty and Place: Ghettos, Barrios, and the American City, Russell Sage Foundation, New York.
  15. Jargowsky, P.A. and Kim, J. (2005) A measure of spatial segregation: the generalized neighborhood sorting index. National Poverty Center, University of Michigan, Ann Arbor.
  16. Lee, B.A., Reardon, S.F., Firebaugh, G., et al. (2008) Beyond the census tract: patterns and determinants of racial residential segregation at multiple geographic scales. American Sociological Review, 73, 766–791.
  17. Lieberson, S. and Carter, D.K. (1982a) A model for inferring the voluntary and involuntary causes of residential segregation. Demography, 19, 511–526.
  18. Lieberson, S. and Carter, D.K. (1982b) Temporal changes and urban differences in residential segregation: a reconsideration. American Journal of Sociology, 88, 296–310.
  19. Lopez, R. (2002) Segregation and black–white differences in exposure to air toxics in 1990. Environmental Health Perspectives, 110, 289–295.
  20. Massey, D.S. and Denton, N.A. (1988) The dimensions of residential segregation. Social Forces, 67, 281–315.
  21. Massey, D.S. and Denton, N.A. (1993) American Apartheid: Segregation and the Making of the Underclass, Harvard University Press, Cambridge, MA.
  22. Morgan, B.S. (1983a) An alternate approach to the development of a distance-based measure of racial segregation. American Journal of Sociology, 88, 1237–1249.
  23. Morgan, B.S. (1983b) A distance–decay interaction index to measure residential segregation. Area, 15, 211–216.
  24. Morgenstern, H. (1995) Ecologic studies in epidemiology: concepts, principles, and methods. Annual Review of Public Health, 16, 61–81.
  25. Morrill, R.L. (1991) On the measure of spatial segregation. Geography Research Forum, 11, 25–36.
  26. Oakes, J.M. (2004) The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology. Social Science and Medicine, 58, 1929–1952.
  27. Openshaw, S. and Taylor, P. (1979) A million or so correlation coefficients: three experiments on the modifiable areal unit problem, in Statistical Applications in the Spatial Sciences (ed. N. Wrigley), Pion, London, pp. 127–144.
  28. O'Sullivan, D. and Wong, D.W.S. (2004) A density surface-based approach to measuring spatial segregation. Paper presented at the Annual Meeting of the Association of American Geographers, Philadelphia, March.
  29. Raudenbush, S.W. and Bryk, A.S. (2002) Hierarchical Linear Models: Applications and Data Analysis Methods, Sage Publications, Thousand Oaks, CA.
  30. Reardon, S.F. (2002) SEG: Stata Module for Computing Multiple-Group Diversity and Segregation Indices, StataCorp LP, College Station, TX.
  31. Reardon, S.F. (2009) Measures of ordinal segregation, in Occupational and Residential Segregation, vol. 17 (eds Y. Flückiger, S.F. Reardon, and J. Silber), Emerald, Bingley, pp. 129–155.
  32. Reardon, S.F. (2011) Measures of income segregation, in CEPA Working Papers, Stanford Center for Education Policy Analysis, Stanford, CA.
  33. Reardon, S.F. and Bischoff, K. (2011) Income inequality and income segregation. American Journal of Sociology, 116, 1092–1153.
  34. Reardon, S.F., Farrell, C.R., Matthews, S., et al. (2009) Race and space in the 1990s: changes in the geographic scale of racial residential segregation, 1990–2000. Social Science Research, 38, 55–70.
  35. Reardon, S.F. and Firebaugh, G. (2002) Measures of multi-group segregation. Sociological Methodology, 32, 33–67.
  36. Reardon, S.F., Matthews, S.A., O'Sullivan, D., et al. (2008) The geographic scale of metropolitan racial segregation. Demography, 45, 489–514.
  37. Reardon, S.F. and O'Sullivan, D. (2004) Measures of spatial segregation. Sociological Methodology, 34, 121–162.
  38. Reardon, S.F. and Yun, J.T. (2002) Private school racial enrollments and segregation. Civil Rights Project, Harvard University, Cambridge, MA.
  39. Reardon, S.F., Yun, J.T., and McNulty Eitle, T. (2000) The changing structure of school segregation: measurement and evidence of multi-racial metropolitan area school segregation, 1989–1995. Demography, 37, 351–364.
  40. Schwartz, J. and Winship, C. (1980) The welfare approach to measuring inequality. Sociological Methodology, 9, 1–36.
  41. Stata Corporation (2003) Stata, Stata Corporation, College Station, TX.
  42. Stearns, L.B. and Logan, J.R. (1986) Measuring trends in segregation: three dimensions, three measures. Urban Affairs Review, 22 (1), 124–150.
  43. Taeuber, K.E. and Taeuber, A.F. (1965) Negroes in Cities: Residential Segregation and Neighborhood Change, Aldine Publishing Co., Chicago, IL.
  44. Theil, H. (1972) Statistical Decomposition Analysis, vol. 14 (ed. H. Theil), North-Holland Publishing Company, Amsterdam.
  45. Theil, H. and Finizza, A.J. (1971) A note on the measurement of racial integration of schools by means of informational concepts. Journal of Mathematical Sociology, 1, 187–194.
  46. Watson, T. (2009) Inequality and the measurement of residential segregation by income. Review of Income and Wealth, 55, 820–844.
  47. White, M.J. (1983) The measurement of spatial segregation. American Journal of Sociology, 88, 1008–1018.
  48. White, M.J. (1986) Segregation and diversity measures in population distribution. Population Index, 52, 198–221.
  49. Wilson, W.J. (1987) The Truly Disadvantaged: The Inner City, the Underclass, and Public Policy, University of Chicago Press, Chicago, IL.
  50. Winship, C. (1978) The desirability of using the index of dissimilarity or any adjustment of it for measuring segregation: reply to Falk, Cortese, and Cohen. Social Forces, 57, 717–720.
  51. Wong, D.W.S. (1993) Spatial indices of segregation. Urban Studies, 30, 559–572.
  52. Wong, D.W.S. (1997) Spatial dependency of segregation indices. The Canadian Geographer, 41, 128–136.
  53. Wong, D.W.S. (1998) Measuring multiethnic spatial segregation. Urban Geography, 19, 77–87.
  54. Wong, D.W.S. (1999) Geostatistics as measures of spatial segregation. Urban Geography, 20, 635–647.
  55. Wong, D.W.S. (2002) Spatial measures of segregation and GIS. Urban Geography, 23, 85–92.
  56. Wu, X.B. and Sui, D.Z. (2001) An initial exploration of a lacunarity-based segregation measure. Environment and Planning B: Planning and Design, 28, 433–446.
  57. Zoloth, B.S. (1976) Alternative measures of school segregation. Land Economics, 52, 278–298.