Sean F. Reardon
One of the central sets of questions in both epidemiological and sociological research concerns the patterns of association between where individuals live and their health and social outcomes. Interest in such “neighborhood effects” research has grown dramatically in recent decades, in part due to theoretical and methodological advances that have helped illuminate the associations among neighborhood characteristics and individual outcomes. An important subset of such research is particularly concerned with whether aggregate differences in health and social outcomes among population subgroups (especially groups defined by race, ethnicity, or socioeconomic characteristics) are attributable to—or at least associated with—patterns of racial, ethnic, or socioeconomic residential segregation. In short, this research investigates whether racial/ethnic and socioeconomic differences in health and social outcomes are due to the fact that different subgroups live in different social, physical, and institutional environments. More bluntly, it asks “does segregation contribute to racial/ethnic inequalities?”
To ask whether segregation is associated with subgroup differences in health or social outcomes requires, first, a clear definition of what we mean by “segregation”; second, a strategy for measuring segregation; and third, a methodology for inferring descriptive and causal associations between measured segregation and patterns of subgroup differences. The chapter is organized in three sections, which discuss these three requirements in turn. First, I review definitions of “segregation,” pointing out the different ways that the term is used and suggesting a general framework for conceptualizing segregation. The second section briefly reviews the methodological literature on the measurement of segregation, describing several of the most useful segregation measures. The third section discusses analytic strategies for estimating the association between segregation and individual and subgroup outcome patterns.
What does it mean to say a neighborhood or city is segregated? Consider the following three uses of the term “segregated”:
In the first sentence, “segregated” is an adjective describing the uneven distribution of (racial) groups across the city. In this case, segregation is seen as a characteristic of a region, and describes the extent to which population subgroups are (un)evenly distributed throughout that region. By this usage, a city with 90% black residents would not be segregated, so long as those black residents were evenly distributed throughout the city. Conversely, a city with equal proportions of white, black, and Latino residents, but where each of the three groups occupied distinct areas of the city, would be highly segregated.
In the second sentence, “segregated” is an adjective describing individual neighborhoods. In this case, segregation is a characteristic of a local neighborhood, and is used essentially as shorthand to describe the (racial) composition of that particular neighborhood—so a neighborhood is segregated if it has a high proportion of minority residents. By this usage, a 90% black neighborhood might be described as highly segregated, even if all other neighborhoods in the city were also 90% black. This leads to a possible contradiction between the first and second uses of the term—a city where every neighborhood was 90% black would be completely unsegregated by the first definition, but each neighborhood would be highly segregated by the second definition. The contradiction occurs because the second usage describes local racial composition, while the first describes regional unevenness.
In the third sentence, “segregated” is a verb describing deliberate action on the part of some legislative, judicial, or administrative body to ensure that members of different (racial) groups live in different neighborhoods. In this case, the term “segregated” is used to indicate not only racial differences in housing patterns, but also the presence of some active policy mechanism that produced these differences. In the case of legislatively explicit segregation policies, as with school segregation laws in the South prior to 1954, the use of “segregate” as an active verb is relatively transparent, since both the agent (state and local governments) and the mechanisms (Jim Crow segregation laws) are apparent. In the case of current housing patterns, however, the causes of racial unevenness in residential location are less clear. Thus, the use of “segregated” in this sense depends on some inference about the causes of residential patterns.
While each of these three usages is meaningful within some contexts, I will generally use the term “segregation” in this chapter in only the first sense. A region is segregated, by this definition, to the extent that individuals of different groups live in different neighborhoods within the region. The term segregation does not apply to individual neighborhoods, but only to larger regions. In describing individual neighborhoods, then, it will be more useful to describe the (racial or socioeconomic) composition of the neighborhood; “composition” is the more precise term and better disentangles local composition from regional demographics. Moreover, to say that a region is segregated in this sense is merely to describe existing racial or socioeconomic residential patterns; no assumption about the cause of these patterns should be inferred.
The conceptual distinctions among the uses of the term segregation arise, in part, because of different implicit theories about why and how segregation patterns might matter for health and social outcomes. To see this, consider why it might matter where people live.
In part, it matters where people live because residential location influences individuals' proximity to important resources and shapes the possibilities for intergroup contact. From a social inequality perspective, segregation matters because it may be (and generally is) related to the differential proximity of groups to important resources. Such resources may be both institutional (e.g., schools, health clinics and hospitals, child care facilities, and labor markets and employment opportunities) and social (e.g., access to social networks and other forms of social capital). In addition, segregation matters because it may be related to the differential proximity of groups to a variety of potential hazards, including environmental hazards (e.g., poor air or water quality, substandard housing, exposure to lead, etc.) and social hazards such as exposure to crime and violence (see, for example, Acevedo-Garcia 2001; Acevedo-Garcia et al. 2003; Downey 2003; Lopez 2002; Massey and Denton 1993; Wilson 1987).
From the social inequality perspective, it might not matter if different population groups were residentially separated from one another, so long as both groups had equal proximity to all social resources and hazards. In a society where all social goods (including institutional, social, and environmental resources) were evenly distributed throughout residential space, where one lived would be, in principle, unrelated to one's access to these social goods. In this case, segregation might not matter for health or social outcomes.
From a social interaction perspective, in contrast, segregation matters because it affects the potential for intergroup contact among members of different social groups. If intergroup contact leads to better social relations among groups, and if groups have, on average, different levels of social resources (wealth and access to social networks and social and cultural capital), then proximity to other groups would provide greater potential for the distribution of social resources through intergroup contact. In this perspective, segregation might matter even if institutional and environmental goods were evenly distributed throughout social space, because some social groups might have greater access to forms of capital that provide advantages for health and social outcomes.
In order to measure the segregation of a region, several methodological and conceptual issues must be addressed. First, because segregation indicates the extent to which individuals of different groups live in different neighborhoods, we must clarify the meaning of “neighborhood”; to do this, we must make some determination about the proximity of residential locations to one another within a region. Second, we must decide on a conceptual definition of segregation—Massey and Denton (1988) describe five different dimensions of segregation, which they term evenness, exposure, clustering, concentration, and centralization; strategies for measuring segregation will depend on which of these aspects of segregation we are particularly interested in. Third, we must define the population dimension along which we wish to measure segregation—measuring segregation among two or more distinct, unordered groups requires a different set of measurement tools than does measuring segregation along some ordered or continuous dimension, such as educational attainment or family income. Moreover, even if we are interested in measuring segregation among some set of distinct, categorical groups (e.g., race/ethnic groups), measuring segregation among more than two population subgroups requires different measures than if we are measuring segregation between only two groups.
Segregation can be thought of as the extent to which individuals of different groups occupy or experience different social environments. A measure of segregation, then, implicitly requires that we define the social environment of each individual. Most traditional measures of segregation implicitly define an individual's social environment as equivalent to some organizational or areal unit (e.g., a school or census tract), without regard for the patterning of these units in social space. Such measures are described as aspatial, because they do not account for the arrangement of these units in social space. All individuals in a given census tract, for example, are defined as occupying the same social environment, whose composition is independent of the makeup of nearby tracts.
Aspatial segregation measures have often been criticized in the residential segregation context for their failure to account for the spatial patterning of census tracts (Grannis 2002; Lee et al. 2008; Massey and Denton 1988; Morrill 1991; Reardon and O'Sullivan 2004; Wong 1993, 2002). In particular, aspatial measures are criticized for their sensitivity to the “checkerboard problem” (Morrill 1991; White 1983) and the “modifiable areal unit problem” (Openshaw and Taylor 1979; Wong 1997). Each of these can be seen as critiques of the definition of the social environment implicit in the traditional segregation measures.
The “checkerboard problem” stems from the fact that aspatial segregation measures ignore the spatial proximity of neighborhoods and focus instead only on the racial composition of neighborhoods. To visualize the problem, imagine a checkerboard where each square represents an exclusively black or exclusively white neighborhood (Figure 6.1). If all the black squares were moved to one side of the board and all white squares to the other, we would expect a measure of segregation to register this change as an increase in segregation, since not only would each neighborhood be racially homogeneous but most neighborhoods would now be surrounded by similarly homogeneous neighborhoods. Aspatial measures of segregation, however, do not distinguish between the first and second patterns, since in each case the racial compositions of individual neighborhoods are the same (White 1983).
FIGURE 6.1 THE CHECKERBOARD PROBLEM
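The checkerboard problem can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical 4-by-4 grid of homogeneous squares with 100 residents each; the function names and the simple "same neighbor share" statistic are illustrative assumptions, not measures proposed in this chapter. It computes an aspatial index (the two-group dissimilarity index, in its common half-sum form) for both arrangements, alongside a crude indicator of how often adjacent squares share the same composition.

```python
def dissimilarity(units):
    """Two-group dissimilarity index from (group_a, group_b) counts per unit."""
    total_a = sum(a for a, _ in units)
    total_b = sum(b for _, b in units)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in units)


def same_neighbor_share(grid):
    """Share of edge-adjacent square pairs that have identical composition."""
    rows, cols = len(grid), len(grid[0])
    same = pairs = 0
    for i in range(rows):
        for j in range(cols):
            for ni, nj in ((i, j + 1), (i + 1, j)):  # right and lower neighbor
                if ni < rows and nj < cols:
                    pairs += 1
                    same += grid[i][j] == grid[ni][nj]
    return same / pairs


# Hypothetical (black, white) counts for each homogeneous square.
BLACK, WHITE = (100, 0), (0, 100)
checkerboard = [[BLACK if (i + j) % 2 == 0 else WHITE for j in range(4)]
                for i in range(4)]
clustered = [[BLACK if j < 2 else WHITE for j in range(4)] for i in range(4)]

for name, grid in (("checkerboard", checkerboard), ("clustered", clustered)):
    squares = [square for row in grid for square in row]
    print(name, dissimilarity(squares), round(same_neighbor_share(grid), 3))
```

In this toy example the aspatial index is identical for the two boards, while the adjacency statistic differs sharply, which is the pattern the checkerboard critique describes.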
The “modifiable areal unit problem” (MAUP) arises in residential segregation measurement because residential population data are typically collected, aggregated, and reported for spatial units (such as census tracts) that have no necessary correspondence with meaningful social/spatial divisions. This data collection scheme implicitly assumes that individuals living near one another (perhaps even across the street from one another) but in separate spatial units are more distant from one another than are two individuals living relatively far from one another but within the same spatial unit. As a result—unless spatial subarea boundaries correspond to meaningful social boundaries—all measures of spatial and aspatial segregation that rely on population counts aggregated within subareas are sensitive to the definitions of the boundaries of these spatial subareas. Figure 6.2 illustrates two aspects of the MAUP: aggregation effects, which result in differences in measured segregation if different-sized subareas are used to compute it; and zoning effects, which result in differences in measured segregation if the subarea boundaries are shifted, even if the number and size of the subareas remain fixed (Openshaw and Taylor 1979; Wong 1997, 1999).
FIGURE 6.2 THE MODIFIABLE AREAL UNIT PROBLEM
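Both aspects of the MAUP can be reproduced with a small sketch. The grid, the counts, and the helper names below are hypothetical, and the dissimilarity index is used only as a convenient aspatial measure. The same population is aggregated into subareas of different sizes (aggregation effect) and into equally sized subareas with different boundaries (zoning effect).

```python
def dissimilarity(units):
    """Two-group dissimilarity index from (group_a, group_b) counts per unit."""
    total_a = sum(a for a, _ in units)
    total_b = sum(b for _, b in units)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in units)


def aggregate(grid, height, width):
    """Sum (group_a, group_b) counts within height-by-width rectangular subareas."""
    units = []
    for top in range(0, len(grid), height):
        for left in range(0, len(grid[0]), width):
            cells = [grid[i][j]
                     for i in range(top, top + height)
                     for j in range(left, left + width)]
            units.append((sum(a for a, _ in cells), sum(b for _, b in cells)))
    return units


A, B = (100, 0), (0, 100)  # hypothetical homogeneous cells
checkerboard = [[A if (i + j) % 2 == 0 else B for j in range(4)] for i in range(4)]
clustered = [[A if j < 2 else B for j in range(4)] for i in range(4)]

# Aggregation effect: same checkerboard population, different subarea sizes.
d_fine = dissimilarity(aggregate(checkerboard, 1, 1))    # 16 one-cell subareas
d_coarse = dissimilarity(aggregate(checkerboard, 2, 2))  # 4 two-by-two subareas

# Zoning effect: same clustered population, four 4-cell subareas either way.
d_columns = dissimilarity(aggregate(clustered, 4, 1))  # vertical strips
d_rows = dissimilarity(aggregate(clustered, 1, 4))     # horizontal strips

print(d_fine, d_coarse, d_columns, d_rows)
```

Nothing about where anyone lives changes between the paired calculations; only the subarea definitions do.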
In some cases, as when measuring school segregation, it is possible to define organizational units (e.g., schools) that meaningfully delimit social interactions and among which spatial proximity is irrelevant (i.e., schools meaningfully bound students' “school environments”); in such cases, aspatial measures of segregation are perfectly appropriate. In other cases, such as when measuring residential segregation, the checkerboard problem and MAUP pose conceptual difficulties for the measurement of segregation. Reardon and O'Sullivan (2004), however, argue that the “checkerboard problem” and the “modifiable areal unit problem” are both artifacts of a reliance on subarea (e.g., tract) boundaries in the computation of segregation measures. In principle, segregation measures that use information on the exact locations of individuals and their proximities to one another in residential space would eliminate the “checkerboard problem” and MAUP issues entirely from the measurement of residential segregation. I discuss measures that explicitly account for the spatial patterning of residential locations (so-called spatial measures of segregation) later in this chapter.
The reliance on census tract or other administrative boundaries for the computation of segregation measures has led to some conceptual confusion in the segregation measurement literature. In an oft-cited article, Massey and Denton (1988) describe five conceptually distinct “dimensions” of residential segregation, which they term evenness, exposure, clustering, centralization, and concentration. In their formulation, evenness and exposure are aspatial dimensions (because they ignore the spatial patterning of census tracts and so are subject to the checkerboard problem), while clustering, concentration, and centralization are explicitly spatial dimensions of segregation and require information on the locations and areas of census tracts to compute.
Reardon and O'Sullivan (2004) argue that the distinction between aspatial “evenness” and spatial “clustering,” however, is an artifact of the reliance on spatial subareas (e.g., census tracts) at some chosen geographical scale of aggregation. Evenness, in Massey and Denton's formulation, refers to the degree to which members of different groups are over- and underrepresented in different subareas relative to their overall proportions in the population. Clustering refers to the proximity of subareas with similar group proportions to one another. However, evenness at one level of aggregation (say census tracts) is clearly strongly related to clustering at a lower level of aggregation (say block groups), since tracts where a minority group is overrepresented will tend to be “clusters” of block groups where the minority population is overrepresented. Unless subarea boundaries correspond to meaningful social boundaries, the distinction between “evenness” and “clustering” is arbitrary.
As a result of this insight, Reardon and O'Sullivan (2004) suggest an alternative to the Massey and Denton (1988) dimensions of residential segregation, arguing instead for two primary conceptual dimensions to spatial residential segregation—spatial exposure (or spatial isolation) and spatial evenness (or spatial clustering). Spatial exposure refers to the extent that members of one group encounter members of another group (or their own group, in the case of spatial isolation) in their local spatial environments. Spatial evenness, or clustering, refers to the extent to which groups are similarly distributed in residential space. Spatial exposure, like aspatial exposure, is a measure of the typical environment experienced by individuals; it depends in part on the overall racial composition of the population in the region under investigation. Spatial evenness, in contrast, is independent of the population composition. In this framework, Massey and Denton's evenness and clustering dimensions are collapsed into a single dimension. Their exposure dimension remains intact, but is now conceptualized as explicitly spatial. Their centralization and concentration dimensions can be seen as specific subcategories of spatial unevenness.
Because sociologists initially developed and used segregation indices primarily to study a particular set of social concerns—black/white school and residential segregation during the civil rights era from the 1950s through the 1970s and occupational sex segregation during the 1970s—most segregation indices are designed to measure segregation between two discrete population groups. However, the world is not dichotomous, of course. Social classifications and markers such as race, ethnicity, religion, political affiliation, and occupation encompass multiple distinct categories. Moreover, as US society becomes increasingly racially diverse, two-group measures of racial/ethnic segregation are increasingly inadequate for describing complex patterns of racial segregation and integration.
In addition, given the theoretical importance of income segregation and inequality in sociology, epidemiology, geography, economics, and public policy, measures of income segregation are particularly important. Unless household income is dichotomized, however, traditional categorical measures of segregation are not useful for describing segregation along an income dimension. Thus, in defining and measuring segregation, it is important to choose indices that appropriately measure segregation along the population dimension of interest.
Although most traditional measures of segregation measure segregation among two discrete groups, some measures have been developed to measure segregation among multiple groups (Reardon and Firebaugh 2002), among ordered categories (Reardon 2009), and along a continuous dimension, such as income (Jargowsky 1996; Jargowsky and Kim 2005; Reardon and Bischoff 2011; Watson 2009). I describe such measures later in this chapter.
A large number of indices of residential segregation have been proposed, evaluated, and used in social science research on segregation and its consequences. To the casual reader, the literature on segregation measurement offers a sometimes bewildering array of proposed indices, and an equally extensive literature criticizing these indices (for reviews and evaluations of many such indices, see James and Taeuber 1985; Massey and Denton 1988; Reardon and Firebaugh 2002; Reardon and O'Sullivan 2004; Zoloth 1976). In the following section, I briefly describe the most useful and most commonly used (not always the same ones) segregation indices, with particular attention to the circumstances under which each might be used.
As noted above, in choosing a segregation index, several considerations are important. First, the choice of index will depend on the dimension of segregation to be measured (e.g., exposure, evenness). Second, the choice of index will depend on the definition of the population dimension along which segregation is to be measured. This dimension may be defined by a binary variable (e.g., white/non-white; male/female), a multigroup categorical variable (e.g., white/black/Hispanic/Asian/other), an ordered categorical variable (e.g., those without a high school diploma, those with an HS diploma but without a college degree, those with a bachelor's degree but no advanced degree, and those with an advanced degree), or a continuous variable (e.g., income). Third, the choice of a measure will depend on a definition of subareas among which population groups are distributed (e.g., census tracts, schools) and the extent to which it is important to account for the spatial or social proximity of these subareas to one another. In the following section, I describe segregation measures used to measure exposure and evenness. Among those that measure evenness, I describe two-group, multigroup, and continuous-variable measures of segregation. Finally, I describe measures that can take into account the spatial or social patterning of population distributions.
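Throughout this chapter, I use the following notation: consider a spatial region $R$ populated by $M$ mutually exclusive population subgroups (e.g., racial groups), indexed by $m$. Let $p$ and $q$ index points within the region $R$ and let $r$ and $s$ index subareas of the region $R$ (e.g., census tracts). Let $\tau$ denote population density and $\pi$ denote population proportion. Thus we have
$\tau_p$ = population density at point $p$
$\tau_{pm}$ = population density of group $m$ at point $p$
$t_r$ = population count in subarea $r$
$t_{rm}$ = population count of group $m$ in subarea $r$
$T$ = total population in $R$
$T_m$ = total population of group $m$ in $R$
$\pi_m$ = proportion of the population in group $m$ (i.e., $T_m/T$)
$\pi_{rm}$ = proportion of the population of subarea $r$ in group $m$ (i.e., $t_{rm}/t_r$)
$\pi_{pm}$ = proportion of the population at point $p$ in group $m$ (i.e., $\tau_{pm}/\tau_p$)
In addition, I use a superpositioned tilde ($\sim$) to indicate that a parameter describes the spatial environment of a given point, rather than the point itself (e.g., $\tilde{\pi}_{pm}$ denotes the proportion in group $m$ in the local environment of point $p$).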
Exposure-based indices of segregation measure the average exposure of members of one group (group $m$) to another group (group $n$), where “exposure” is understood to be the proportion of group $n$ in the local environment of a member of group $m$. A region is highly segregated, from the exposure perspective, if members of group $m$, on average, inhabit local environments containing few members of group $n$. In the aspatial case, where each individual's local environment is defined by the subarea (e.g., census tract) that she/he inhabits, the exposure index (Bell 1954; Lieberson and Carter 1982a, 1982b) for the exposure of group $m$ to group $n$ (denoted ${}_mP^*_n$) is formally defined as
$$ {}_mP^*_n = \sum_r \frac{t_{rm}}{T_m}\,\pi_{rn}, \tag{1} $$
where $t_{rm}$ is the number of members of group $m$ in subarea $r$, $T_m$ is the number of members of group $m$ in the region, and $\pi_{rn}$ is the proportion of group $n$ in subarea $r$.
In concrete terms, ${}_mP^*_n$ is simply the average proportion of group $n$ in the subareas of members of group $m$. Note that the $P^*$ index is not symmetric: the exposure of group $m$ to group $n$ is not in general equal to the exposure of group $n$ to group $m$.
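The asymmetry of the exposure index is easy to see in a small calculation. The sketch below assumes three hypothetical tracts with made-up counts, and the function name is illustrative; it computes the average proportion of one group in the tracts where members of the other group live, as described above.

```python
def exposure(tracts, m, n):
    """Average proportion of group n in the tracts where members of group m live."""
    total_m = sum(tract[m] for tract in tracts)
    return sum(
        tract[m] / total_m * tract[n] / sum(tract.values())
        for tract in tracts
    )


# Hypothetical counts: 180 white and 120 black residents in three tracts.
tracts = [
    {"white": 80, "black": 20},
    {"white": 80, "black": 20},
    {"white": 20, "black": 80},
]

white_to_black = exposure(tracts, "white", "black")
black_to_white = exposure(tracts, "black", "white")
print(round(white_to_black, 3), round(black_to_white, 3))
```

Because the two groups differ in size, the two exposures differ; in the two-group case they are linked by the group totals, since both sums count the same cross-group co-residence.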
More generally, exposure-based measures might be thought of as measuring the average exposure of a population (or subpopulation) to some environmental characteristic. If $X_r$ measures some characteristic of a local environment $r$ (air quality, percentage of low-income residents, toxic waste facilities, etc.), then the average exposure of members of group $m$ to $X$ will be given by
$$ {}_mP^*_X = \sum_r \frac{t_{rm}}{T_m}\,X_r, \tag{2} $$
where $t_{rm}$ is the number of members of group $m$ in subarea $r$ and $T_m$ is their total number in the region.
Equation (1) is simply a special case of (2), where the proportion of group $n$ is the environmental characteristic of interest. In concrete terms, ${}_mP^*_X$ measures the average value of characteristic $X$ in the subareas where members of group $m$ are located.
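The generalized exposure measure can be sketched in a few lines. The two tracts and the `pm25` air-quality values below are invented for illustration, and the function name is an assumption; the calculation is simply a group-weighted average of a tract-level characteristic.

```python
def exposure_to_characteristic(tracts, group, characteristic):
    """Average value of a tract characteristic, weighted by where `group` lives."""
    total = sum(tract[group] for tract in tracts)
    return sum(tract[group] / total * tract[characteristic] for tract in tracts)


# Hypothetical tracts with a made-up fine-particulate reading per tract.
tracts = [
    {"black": 80, "white": 20, "pm25": 12.0},
    {"black": 20, "white": 80, "pm25": 8.0},
]

black_exposure = exposure_to_characteristic(tracts, "black", "pm25")
white_exposure = exposure_to_characteristic(tracts, "white", "pm25")
print(round(black_exposure, 2), round(white_exposure, 2))
```

Passing a tract's proportion of some other group as the characteristic would reproduce the group-to-group exposure index as a special case.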
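Reardon and O'Sullivan (2004) suggest a spatial version of (2), the spatial exposure index, which indicates the average exposure of members of group $m$ to some aspect $X$ of their local environment (for details, see Reardon and O'Sullivan 2004):
$$ {}_m\tilde{P}^*_X = \int_{p \in R} \frac{\tau_{pm}}{T_m}\,\tilde{X}_p\,dp, \tag{3} $$
where $\tau_{pm}$ is the population density of group $m$ at point $p$, $T_m$ is the total population of group $m$ in the region $R$, and $\tilde{X}_p$ is the value of $X$ in the local environment of point $p$.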
Because an exposure index measures some characteristic of the average environment of a group, it is dependent on the overall prevalence of that characteristic in the region of interest. As the black proportion of the population grows, for example, the exposure of whites to blacks may increase even if the spatial distribution of the black and white populations across a region remains the same.
In contrast to exposure measures, “evenness” measures of segregation measure the extent to which population groups are evenly distributed (relative to one another or to some environmental characteristic) across a region. A region is highly segregated, from the evenness perspective, if members of group $m$ are distributed very differently throughout a region than are members of group $n$. In this case, members of group $m$ will inhabit local environments where group $n$ is disproportionately underrepresented relative to its share of the regional population. Evenness measures, unlike exposure measures, are not sensitive to the overall proportions of groups in the population, but rather measure the extent to which groups are differentially distributed throughout a region, regardless of their overall share of the population. Thus, a region can, for example, exhibit high exposure of group $m$ to $n$ while also being characterized by perfect evenness, if group $n$ makes up a large share of the population and groups $m$ and $n$ are identically distributed throughout the region.
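The contrast between exposure and evenness can be checked with hypothetical numbers. In the sketch below (made-up counts; the helper names and the use of the half-sum form of the dissimilarity index are assumptions of this illustration), the first region is perfectly even yet shows high exposure, and doubling one group's count in every tract of a second region changes exposure while leaving the evenness measure unchanged.

```python
def dissimilarity(tracts, m, n):
    """Half the summed absolute difference between the two groups' tract shares."""
    total_m = sum(t[m] for t in tracts)
    total_n = sum(t[n] for t in tracts)
    return 0.5 * sum(abs(t[m] / total_m - t[n] / total_n) for t in tracts)


def exposure(tracts, m, n):
    """Average proportion of group n in the tracts where members of group m live."""
    total_m = sum(t[m] for t in tracts)
    return sum(t[m] / total_m * t[n] / (t[m] + t[n]) for t in tracts)


# Perfectly even region in which group "black" is 90% of every tract.
even = [{"white": 10, "black": 90} for _ in range(3)]
print(dissimilarity(even, "white", "black"), exposure(even, "white", "black"))

# Same spatial distribution, but the black population doubles everywhere.
base = [{"white": 80, "black": 20}, {"white": 80, "black": 20},
        {"white": 20, "black": 80}]
grown = [{"white": t["white"], "black": 2 * t["black"]} for t in base]

d_base, d_grown = (dissimilarity(x, "white", "black") for x in (base, grown))
e_base, e_grown = (exposure(x, "white", "black") for x in (base, grown))
print(round(d_base, 3), round(d_grown, 3), round(e_base, 3), round(e_grown, 3))
```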
There are a number of segregation measures designed to measure “evenness.” Most commonly used is the dissimilarity index (denoted $D$), though $D$ has been criticized for possessing a number of mathematical properties that are inconsistent with intuitive notions of segregation (James and Taeuber 1985; Reardon and Firebaugh 2002; Winship 1978). In particular, $D$ does not appropriately register changes in the population distribution that should, in principle, change segregation levels: if, for example, a black family moves from a neighborhood that is disproportionately black to a less-black neighborhood, $D$ does not necessarily indicate that the latter configuration of households is less segregated than the former.
Other useful measures of evenness are the information theory index ($H$), the Gini index ($G$), and the variance ratio index ($V$).¹ More detail on the definitions, interpretations, and properties of these indices can be found in Zoloth (1976), James and Taeuber (1985), White (1986), Massey and Denton (1988), and Reardon and Firebaugh (2002).
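Formally, the dissimilarity index (Taeuber and Taeuber 1965) can be written as
$$ D = \sum_r \frac{t_r\,|\pi_{rm} - \pi_m|}{2T\pi_m(1 - \pi_m)}, \tag{4} $$
where $t_r$ and $T$ are the populations of subarea $r$ and of the region, and $\pi_{rm}$ and $\pi_m$ are the proportions of group $m$ in subarea $r$ and in the region, respectively.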
The dissimilarity index can be interpreted as the percentage of all individuals who would have to transfer among units in order to equalize the group proportions across units, divided by the percentage who would have to transfer if the system started in a state of complete segregation.
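A short sketch makes both the interpretation and the criticism of the dissimilarity index concrete. It assumes the standard two-group form (population-weighted absolute deviations of subarea proportions from the regional proportion, divided by their maximum) and three hypothetical tracts; the counts and names are illustrative only. Ten black residents move from a 90% black tract to a 60% black tract, both of which remain above the regional black share of 50%.

```python
def dissimilarity(tracts):
    """D from (group_m, group_n) counts: weighted mean absolute deviation of
    subarea proportions from the regional proportion, scaled to [0, 1]."""
    total = sum(m + n for m, n in tracts)
    pi_m = sum(m for m, _ in tracts) / total
    deviation = sum((m + n) * abs(m / (m + n) - pi_m)
                    for m, n in tracts if m + n > 0)
    return deviation / (2 * total * pi_m * (1 - pi_m))


# Hypothetical (black, white) counts per tract; regional black share is 0.5.
before = [(90, 10), (60, 40), (0, 100)]
# Ten black residents move from the first tract to the second.
after = [(80, 10), (70, 40), (0, 100)]

d_before = dissimilarity(before)
d_after = dissimilarity(after)
print(round(d_before, 4), round(d_after, 4))
```

In this example the move reduces the concentration of black residents in the most heavily black tract, yet the index does not change, because both tracts stay on the same side of the regional proportion.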
The Gini segregation index (James and Taeuber 1985) is
$$ G = \sum_r \sum_s \frac{t_r t_s\,|\pi_{rm} - \pi_{sm}|}{2T^2\pi_m(1 - \pi_m)}, \tag{5} $$
where $\pi_m$, $\pi_{rm}$, and $\pi_{sm}$ are the proportions of group $m$ in the population and in subareas $r$ and $s$, respectively (the index is symmetric with respect to the two groups, so it does not matter whether we use group $m$ or $n$ in the calculation), $t_r$ and $t_s$ are the populations of subareas $r$ and $s$, and $T$ is the population of the region. The Gini index can be interpreted as the sum of the weighted average absolute difference in group proportions between all possible pairs of subareas, divided by the maximum possible value of this sum (obtained if the system were in a state of complete segregation). Note that the Gini segregation index is related to, but distinct from, the more familiar Gini index of inequality, which is a common measure of income inequality (see, for example, Schwartz and Winship 1980). Like the dissimilarity index, the Gini index of segregation exhibits several undesirable properties (James and Taeuber 1985; Reardon and Firebaugh 2002), though perhaps the primary reason it has been less commonly used is that it is more demanding to compute than $D$ and other indices.
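The pairwise structure of the Gini segregation index, and the reason it is costlier to compute, can be seen in a direct implementation. This is a minimal sketch under the standard two-group definition (weighted absolute differences in group proportions over all ordered pairs of subareas, divided by the maximum of that sum); the tract counts and function name are hypothetical.

```python
def gini_segregation(tracts):
    """Gini segregation index from (group_m, group_n) counts per subarea."""
    total = sum(m + n for m, n in tracts)
    pi_m = sum(m for m, _ in tracts) / total
    # (population, proportion of group m) for every non-empty subarea
    shares = [(m + n, m / (m + n)) for m, n in tracts if m + n > 0]
    pairwise = sum(t_r * t_s * abs(p_r - p_s)
                   for t_r, p_r in shares
                   for t_s, p_s in shares)  # loops over all ordered pairs
    return pairwise / (2 * total ** 2 * pi_m * (1 - pi_m))


# Hypothetical (black, white) counts per tract.
before = [(90, 10), (60, 40), (0, 100)]
# Ten black residents move from the 90% black tract to the 60% black tract.
after = [(80, 10), (70, 40), (0, 100)]

g_before = gini_segregation(before)
g_after = gini_segregation(after)
print(round(g_before, 4), round(g_after, 4))
```

The double loop over subareas is the source of the extra computational cost: the work grows with the square of the number of subareas. In this particular example the index is lower after the move.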
The variance ratio index (James and Taeuber 1985) is defined as
$$ V = \sum_r \frac{t_r(\pi_{rm} - \pi_m)^2}{T\pi_m(1 - \pi_m)}, \tag{6} $$
where $\pi_m$ and $\pi_{rm}$ are the group $m$ proportions in the total population and in subarea $r$, respectively (again, the index is symmetric with respect to groups $m$ and $n$, so it does not matter which group is used in the calculation), and $t_r$ and $T$ are the populations of subarea $r$ and of the region. The variance ratio index can be interpreted as the proportion of the variance in group membership that is accounted for by between-subarea differences in group proportions.²
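The variance interpretation can be verified directly. The sketch below uses hypothetical tract counts and an illustrative function name; it computes the between-subarea variance in the group proportion and divides it by the total variance of a binary group-membership indicator, which is the regional proportion times one minus that proportion.

```python
def variance_ratio(tracts):
    """Variance ratio index from (group_m, group_n) counts per subarea."""
    total = sum(m + n for m, n in tracts)
    pi_m = sum(m for m, _ in tracts) / total
    # Population-weighted squared deviations of subarea proportions
    between = sum((m + n) * (m / (m + n) - pi_m) ** 2
                  for m, n in tracts if m + n > 0)
    # total * pi_m * (1 - pi_m) is the total variation of a 0/1 indicator
    return between / (total * pi_m * (1 - pi_m))


# Hypothetical (black, white) counts per tract.
tracts = [(90, 10), (60, 40), (0, 100)]
v = variance_ratio(tracts)
print(round(v, 4))
```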
The information theory index (also called the Theil index, after its originator) measures the variation in diversity across subareas, where the diversity of a population is defined as the entropy ($E$) of the population:
$$ E = \sum_{m=1}^{M} \pi_m \ln\frac{1}{\pi_m}, \tag{7} $$
where there are $M$ groups in the population.³ The entropy takes on a value of 0 if and only if the population is made up of a single group, and has its maximum if each of the $M$ groups is equally represented in the population. The information theory index (Theil 1972; Theil and Finezza 1971) is then defined as
$$ H = 1 - \sum_r \frac{t_r E_r}{T E}, \tag{8} $$
where $E_r$ is the entropy in subarea $r$, and $t_r$ and $T$ are the populations of subarea $r$ and of the region. Note that $H$, unlike $D$, $G$, and $V$, is implicitly defined as a measure of segregation among multiple population groups, since the entropy is defined for any $M$.
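Because the entropy is defined for any number of groups, the information theory index can be computed for two or more groups with the same code. The following is a sketch with hypothetical three-group tract counts and illustrative function names; it assumes the usual convention that a group with zero proportion contributes zero to the entropy, and it uses natural logarithms (the choice of base cancels in the ratio).

```python
import math


def entropy(proportions):
    """Entropy of a set of group proportions (zero proportions contribute 0)."""
    return sum(p * math.log(1 / p) for p in proportions if p > 0)


def information_theory_index(tracts):
    """One minus the population-weighted mean subarea entropy over regional entropy."""
    total = sum(sum(tract) for tract in tracts)
    n_groups = len(tracts[0])
    regional = [sum(tract[g] for tract in tracts) / total for g in range(n_groups)]
    weighted = sum(
        sum(tract) * entropy([count / sum(tract) for count in tract])
        for tract in tracts if sum(tract) > 0
    )
    return 1 - weighted / (total * entropy(regional))


# Hypothetical (white, black, Hispanic) counts per tract.
segregated = [(100, 0, 0), (0, 100, 0), (0, 0, 100)]
even = [(30, 30, 40), (30, 30, 40), (30, 30, 40)]
mixed = [(70, 20, 10), (20, 60, 20), (10, 20, 70)]

for name, tracts in (("segregated", segregated), ("even", even), ("mixed", mixed)):
    print(name, round(information_theory_index(tracts), 4))
```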
These are not the only measures of “evenness” that have been proposed and used in research on segregation, but they are the most commonly used. In each case, the index has a minimum value of 0, obtained if and only if each subarea has the same group composition, and a maximum value of 1, obtained if and only if each subarea is composed of a single group.
The dichotomous indices of segregation that measure evenness ($D$, $G$, and $V$) each have multigroup analogs (Reardon and Firebaugh 2002), while $H$ is implicitly defined as a multigroup measure (Theil 1972; Theil and Finezza 1971). Each of these multigroup measures of evenness describes the extent to which $M$ population groups are similarly distributed among subareas. Reardon and Firebaugh (2002) provide an extensive review of these multigroup indices and their mathematical properties, concluding that the information theory index ($H$) is the most flexible and conceptually appropriate multigroup measure of evenness.
The choice of whether to use a two-group or multigroup index of segregation depends on the specific question of interest. In a region where the population is composed of three groups (white non-Hispanic, black non-Hispanic, and Hispanic, for example), we may be interested in the segregation between two specific groups (e.g., how segregated are white from black residents?); or we may be interested in the segregation among all three groups (e.g., how segregated are white, black, and Hispanic residents from one another?). In the first case, any of the two-group indices would be appropriate; in the second case, a multigroup index is required.
A special class of multigroup segregation measures are those that measure segregation among individuals described by a set of ordered categories. For example, one might wish to compute segregation among groups classified by the highest educational degree earned or by some ordinal measure of their occupational status. In such cases, the multigroup measures are not appropriate, because they are blind to the ordered nature of the categories. Reardon (2009) describes a set of measures of ordinal segregation that can be used in such cases.
Indices that measure income segregation (or, more generally, segregation along any continuous variable) have only recently been developed. One commonly used measure of income segregation is the neighborhood sorting index (NSI), which measures the proportion of income variation that lies between subareas (Jargowsky 1996, 1997). If $y_i$ is a measure of income for person $i$, then the NSI is defined as
$$ \mathrm{NSI} = \sqrt{\frac{\sum_r t_r(\bar{y}_r - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}}, \tag{9} $$
where $t_r$ is the number of persons in subarea $r$, $\bar{y}_r$ is the mean income in subarea $r$, and $\bar{y}$ is the mean income in the region. The NSI can be interpreted as the ratio of the standard deviation of subarea mean incomes (weighted by subarea population) to the standard deviation of income in the regional population.
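The ratio-of-standard-deviations interpretation translates directly into code. This is a minimal sketch with invented incomes (in thousands) for two hypothetical subareas; the function name is illustrative.

```python
import math


def nsi(tracts):
    """Neighborhood sorting index: SD of subarea mean incomes (population
    weighted) divided by the SD of income across all persons."""
    incomes = [y for tract in tracts for y in tract]
    mean = sum(incomes) / len(incomes)
    between = sum(len(tract) * (sum(tract) / len(tract) - mean) ** 2
                  for tract in tracts if tract)
    overall = sum((y - mean) ** 2 for y in incomes)
    # The common 1/N factor in both variances cancels in the ratio.
    return math.sqrt(between / overall)


# Hypothetical incomes for persons living in two subareas.
tracts = [[20, 40], [60, 80]]
print(round(nsi(tracts), 4))
```

If every subarea had the same mean income the index would be 0, and if everyone within a subarea had the same income it would be 1.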
A drawback of the NSI, however, is that it is sensitive to changes in the distribution of income. If income inequality grew (if high-income households' incomes doubled, for example, while low-income households' incomes remained flat) and each household remained in the same neighborhood, the NSI would register an increase in segregation. An alternate approach is to rank all households by their income level and then to measure the extent to which households are segregated by income rank. This is the approach taken by several new measures of income segregation, in particular the centile gap index (CGI) (Watson 2009) and the rank-order information theory index ($H^R$) (Reardon 2011; Reardon and Bischoff 2011). Both of these measures focus on the sorting of households by income rank rather than focusing on the sorting of income itself. As a result, the CGI and $H^R$ are both insensitive to changes in the shape of the income distribution. The choice between a measure like the NSI and one like the CGI or $H^R$ therefore depends on which is conceptually closer to the social phenomenon of interest.
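The contrast between income-based and rank-based measures can be illustrated with a small sketch. Note the assumption: the rank-based statistic below is neither the CGI nor the rank-order information theory index; it simply applies the NSI calculation to percentile ranks, as a simplified stand-in for the rank-based idea. The household incomes are hypothetical.

```python
from math import sqrt

def nsi(values_by_subarea):
    """Square root of the between-subarea share of the variance of the values."""
    pooled = [v for subarea in values_by_subarea for v in subarea]
    grand_mean = sum(pooled) / len(pooled)
    between_ss = sum(
        len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in values_by_subarea
    )
    total_ss = sum((v - grand_mean) ** 2 for v in pooled)
    return sqrt(between_ss / total_ss)

def to_percentile_ranks(values_by_subarea):
    """Replace each income with its rank in the regional distribution, scaled to (0, 1]."""
    pooled = sorted(v for subarea in values_by_subarea for v in subarea)
    rank_of = {v: (k + 1) / len(pooled) for k, v in enumerate(pooled)}  # assumes no ties
    return [[rank_of[v] for v in subarea] for subarea in values_by_subarea]

before = [[10, 20, 30, 40], [15, 25, 100, 110]]
# The two highest incomes double; no household moves.
after = [[10, 20, 30, 40], [15, 25, 200, 220]]

print(nsi(before), nsi(after))  # income-based: changes
print(nsi(to_percentile_ranks(before)), nsi(to_percentile_ranks(after)))  # rank-based: unchanged
```

Because doubling the top incomes leaves every household's rank unchanged, any statistic computed from ranks alone is unaffected, while the income-based index rises even though nobody has moved.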
All of the segregation indices described above are aspatial, meaning that they treat each census tract as an isolated neighborhood and do not account for the spatial patterning of tracts. While such indices have been commonly used in many studies of residential segregation, they suffer from the checkerboard problem and MAUP issues, as described above. A class of spatial racial/ethnic segregation indices developed in recent decades, however, is designed to better account for spatial patterns of residential locations (see, for example, Frank 2003; Grannis 2002; Lee et al. 2008; Morgan 1983a, 1983b; Morrill 1991; O'Sullivan and Wong 2004; Reardon et al. 2008; Reardon and O'Sullivan 2004; White 1983; Wong 1993, 1998, 1999; Wu and Sui 2001). While there are many such spatial indices, Reardon and O'Sullivan (2004) show that most fail to meet a set of criteria that ensure they adequately address MAUP issues and match theoretically meaningful conceptions of segregation.
Reardon and O'Sullivan (2004) propose a conceptually straightforward and general approach to measuring two-group and multigroup segregation in a way that accounts for spatial patterns. They suggest that a segregation index should measure the extent to which the local environments of individuals differ in their racial or socioeconomic composition (or, more generally, in any population or environmental trait), where each individual inhabits a “local environment” whose population is made up of the spatially weighted average of the populations (or other characteristics) at each point in the region of interest. Typically, the population at nearby locations will contribute more to the local environment of an individual than will more distant locations (a “distance-decay” effect). Given a particular spatial weighting function and data on the residential location of households, it is straightforward to compute the spatially weighted racial (or socioeconomic) composition of the local environment of each location (or person) in the study region. Given this, spatial exposure is measured by computing the average composition of the local environments of members of each group. Spatial evenness is measured by examining how similar, on average, the racial (or socioeconomic) compositions of all individuals' local environments are to the overall composition of the study region. If each person's local environment is relatively similar in composition to the overall population, there is little spatial unevenness; conversely, if there is considerable deviation from the overall composition, there is high spatial segregation (unevenness).
For example, to compute a spatial version of the information theory index, Reardon and O'Sullivan first define the spatially weighted entropy at each point $p$ as

$$\tilde{E}_p = -\sum_{m=1}^{M} \tilde{\pi}_{pm} \log_M \tilde{\pi}_{pm}$$

where $\tilde{\pi}_{pm}$ is the proportion of group $m$ in the local environment of $p$ and $M$ is the number of groups. This is the entropy of the local environment of point $p$. It is analogous to the entropy of an individual tract, $E_j$, used in the computation of the aspatial segregation index $H$ (if we define the local environment of $p$ to be tract $j$, then $\tilde{E}_p = E_j$), except that $\tilde{E}_p$ may incorporate (proximity-weighted) information on the racial composition at all points in $R$, not just the racial composition of the tract where $p$ is located. The spatial information theory segregation index, $\tilde{H}$, is then defined as

$$\tilde{H} = 1 - \frac{1}{TE} \int_{p \in R} \tau_p \tilde{E}_p \, dp$$

where $T$ is the total population, $\tau_p$ is the population density at point $p$, and $E$ is the overall regional entropy as in Equation (8). The spatial information theory index $\tilde{H}$ is the spatial analog to the usual information theory index $H$, a measure of how much less diverse individuals' local environments are, on average, than is the total population of region $R$.
One advantage of the spatial segregation measures is that they enable researchers to measure segregation at different geographic scales. Reardon and colleagues compute $\tilde{H}$ using a range of radii to define individuals' local environments, and then draw a “spatial segregation profile” that describes the levels of segregation computed over a range of geographic scales, providing a more nuanced view of segregation patterns than could be obtained from aspatial measures or from spatial measures using a single scale (Lee et al. 2008; Reardon and Bischoff 2011; Reardon et al. 2008, 2009).
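A rough sketch of the spatial information theory index and a segregation profile follows. Several choices here are assumptions made for illustration and are not taken from the cited work: the region is approximated by a finite set of points (so the integral becomes a sum), proximity is weighted with a Gaussian kernel whose `bandwidth` plays the role of the radius, and the 4 x 4 grid of populations is hypothetical.

```python
from math import exp, log

def entropy(proportions):
    """Entropy with log base M (the number of groups); 0 * log(0) is treated as 0."""
    m = len(proportions)
    return -sum(p * log(p, m) for p in proportions if p > 0)

def spatial_h(points, bandwidth):
    """points: list of (x, y, counts), where counts holds each group's population there."""
    n_groups = len(points[0][2])
    total = sum(sum(c) for _, _, c in points)
    regional = [sum(c[m] for _, _, c in points) / total for m in range(n_groups)]
    weighted_local_entropy = 0.0
    for xp, yp, counts_p in points:
        # Proximity-weighted group counts in the local environment of point p.
        local = [0.0] * n_groups
        for xq, yq, counts_q in points:
            w = exp(-((xp - xq) ** 2 + (yp - yq) ** 2) / (2 * bandwidth ** 2))
            for m in range(n_groups):
                local[m] += w * counts_q[m]
        local_total = sum(local)
        local_props = [v / local_total for v in local]
        # Weight each local entropy by the population living at p.
        weighted_local_entropy += sum(counts_p) * entropy(local_props)
    return 1 - weighted_local_entropy / (total * entropy(regional))

# Hypothetical grid: one group occupies the left half, the other the right half.
points = [(x, y, (10, 0) if x < 2 else (0, 10)) for x in range(4) for y in range(4)]

# A small segregation profile: the index computed at several spatial scales.
profile = {r: spatial_h(points, r) for r in (0.25, 1.0, 4.0)}
for r, h in profile.items():
    print(r, round(h, 3))
```

With a very small bandwidth each local environment is essentially a single point and measured segregation is near its maximum; as the bandwidth grows, local environments come to resemble the region as a whole and the index falls toward zero.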
In any given study, the choice of a segregation index depends on (1) the dimension (exposure or evenness) of segregation of interest; (2) the population dimension of interest (which may be indicated by a binary variable, such as gender, a multigroup categorical variable, such as race, an ordinal variable, such as educational attainment, or a continuous variable, such as income); and (3) the extent to which it is important to account for the spatial proximity of locations. Table 6.1 summarizes the indices described above with regard to these three aspects. In addition, Table 6.1 indicates whether each index is decomposable in two different ways (Reardon and Firebaugh 2002; Reardon and O'Sullivan 2004; Reardon, Yun, and Eitle 2000). Here, organizational decomposability indicates that an index can be decomposed into components of segregation attributable to between- and within-subregion segregation (e.g., into segregation between a city and suburbs and within cities and suburbs separately); grouping decomposability indicates that a multigroup index can be decomposed into components of segregation attributable to segregation between and within different clusters of population subgroups (e.g., into segregation between white and minority families, and segregation among different minority subgroups).
TABLE 6.1 PROPERTIES OF SEGREGATION MEASURES
| Property | P* | D | G | V | H | NSI |
| Dimension | | | | | | |
| Exposure | ✓ | | | | | |
| Evenness | | ✓ | ✓ | ✓ | ✓ | ✓ |
| Variable types | | | | | | |
| Two-group | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Multigroup | | ✓ | ✓ | ✓ | ✓ | |
| Ordinal | | | | ✓ | ✓ | |
| Continuous | | | | ✓ | ✓ | ✓ |
| Spatial | | | | | | |
| Aspatial | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Spatial | ✓ | ✓ | | ✓ | ✓ | ✓ |
| Decomposability | | | | | | |
| Organizational | ✓ | | | ✓ | ✓ | ✓ |
| Grouping | | | | | ✓ | |
If a measure of exposure is required, then a version of the P* exposure index must be used; both spatial and aspatial versions are available. More indices are available for measuring evenness along two-group and multigroup population dimensions, each of which (except the Gini index) has both spatial and aspatial versions. Reardon and Firebaugh (2002) and Reardon and O'Sullivan (2004), however, note that the information theory index ($H$) and the variance ratio index ($V$) have more attractive mathematical properties than the others, in part because they are decomposable and in part because they register changes in segregation more appropriately in response to household mobility. If segregation is to be measured along some ordinal or continuous population dimension, such as income, then both $H$ and $V$ can be adapted to measure both aspatial and spatial segregation. Finally, the NSI is available in both spatial and aspatial versions.
The formulas for computing aspatial segregation indices generally require summing (or double-summing, in the case of the Gini index) over all subareas in a region. Most software packages do not have built-in routines for computing these indices, though the formulas are relatively easily programmed and require only data on race/ethnic (or other group) counts for each tract or subarea.4 The spatial measures of segregation, however, are more complicated to compute, and typically require geographic information systems (GIS) software and data on the spatial patterning of census tracts to compute.5
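As an illustration of how little is needed to program the aspatial indices, the sketch below computes two-group versions of the dissimilarity index (D), the variance ratio index (V), and the information theory index (H) from tract counts alone. It uses the standard forms of these indices, which I assume match the chapter's equations up to notation; the function name and tract counts are hypothetical, and every tract is assumed to have a nonzero population.

```python
from math import log

def two_group_indices(tracts):
    """tracts: list of (group_1_count, group_2_count) per tract. Returns (D, V, H)."""
    total = sum(a + b for a, b in tracts)
    pi = sum(a for a, _ in tracts) / total  # regional share of group 1

    def entropy(p):
        # Two-group entropy with log base 2; 0 * log(0) is treated as 0.
        return -sum(q * log(q, 2) for q in (p, 1 - p) if q > 0)

    d = v = h = 0.0
    for a, b in tracts:
        t_j = a + b        # tract population
        pi_j = a / t_j     # tract share of group 1
        d += t_j * abs(pi_j - pi) / (2 * total * pi * (1 - pi))
        v += t_j * (pi_j - pi) ** 2 / (total * pi * (1 - pi))
        h += t_j * (entropy(pi) - entropy(pi_j)) / (total * entropy(pi))
    return d, v, h

print(two_group_indices([(50, 50), (50, 50)]))   # identical tracts: no segregation
print(two_group_indices([(80, 20), (20, 80)]))   # partial segregation
print(two_group_indices([(100, 0), (0, 100)]))   # complete segregation
```

All three indices equal zero when every tract mirrors the regional composition and one when the groups share no tracts, but they differ in between, which is one reason the choice of index matters.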
One of the primary reasons for measuring segregation is to assess the association between segregation patterns and group differences in some health or social outcome. In education, for example, we are interested in knowing whether racial segregation among schools is associated with racial achievement gaps. Likewise, in public health, we are interested in whether racial or socioeconomic segregation is associated with racial or socioeconomic differences in disease rates or mortality.
In this final section, I give a brief introduction to methods of assessing the association between segregation and health and social outcomes. We begin by considering three simple models for the ways that segregation may be associated with some health or social outcome (e.g., asthma). First, suppose that residents of neighborhoods with higher proportions of black residents have higher rates of asthma, regardless of their individual race (this might happen, for example, if air quality were negatively correlated with the proportion of black residents in a neighborhood and if air quality affects black and white residents similarly). Formally, this would imply that (where $Y$ is the outcome, $G$ indicates racial group membership, $\bar{G}_j$ is the proportion of residents of neighborhood $j$ who belong to the group indicated by $G$, $i$ indexes individuals, and $j$ indexes neighborhoods)

$$\frac{\partial E[Y_{ij} \mid G_{ij}]}{\partial \bar{G}_j} \neq 0$$

In this first model, neighborhood racial composition is associated with asthma, resulting in an observed racial difference in asthma rates across a region, even if there is no difference in asthma rates between black and white residents living in the same neighborhoods.
A second model would suggest that the segregation level of a region might be associated with average outcomes of all groups within a region. It might not be far-fetched to imagine that mortality rates are higher for all racial groups in more segregated cities, if segregation leads to increased stress, conflict, and violence among groups. If this were true, we would expect to observe

$$\frac{\partial E[Y_{ik}]}{\partial S_k} \neq 0$$

where $S_k$ is the level of segregation in city $k$ and $k$ indexes cities. In this model, segregation is associated with mean outcomes for all racial groups equally.
More likely, segregation is associated with unequal outcomes among segregated groups. Returning to the asthma example above, if air quality were correlated with neighborhood racial composition, then we would expect to observe larger racial differences in asthma rates in more segregated regions, since the racial differences in average exposure to poor air quality will be larger in more segregated regions. Formally, if $G = 1$ and $G = 0$ indicate distinct groups in region $k$, and $S_k$ is the level of segregation in region $k$, we would expect to observe

$$\frac{\partial \left( E[Y_{ik} \mid G_{ik} = 1] - E[Y_{ik} \mid G_{ik} = 0] \right)}{\partial S_k} \neq 0$$

Such a model implicitly assumes that either group $G = 1$ or group $G = 0$ tends to benefit more, on average, from segregation patterns.
We can test association hypotheses of these types using regression techniques. I first illustrate the models using individual-level data, and then show that we can estimate them using only group-specific aggregate data on an outcome $Y$. To test model 1, suppose we collect data on some outcome of interest for a sample of members of groups $G = 1$ and $G = 0$ in some region. We can estimate the average difference in outcomes between the groups by fitting the simple regression model:

$$Y_i = \delta_0 + \delta_1 G_i + e_i \tag{15}$$

From this model, we obtain $\hat{\delta}_1$, an estimate of the average difference in $Y$ between the two groups. If we let $j$ index neighborhoods (often operationalized as census tracts or blocks, but not necessarily) and define $\bar{G}_j$ as the average value of $G$ in neighborhood $j$ (this will be the proportion of the population in neighborhood $j$ who are members of the group indicated by $G$), we can then estimate the following regression model:

$$Y_{ij} = \beta_0 + \beta_1 G_{ij} + \beta_2 \bar{G}_j + e_{ij} \tag{16}$$
Fitting this model yields $\hat{\beta}_1$, an estimate of the within-neighborhood association between $G$ and $Y$, and $\hat{\beta}_2$, an estimate of the between-neighborhood association between $\bar{G}_j$ and $Y$ after controlling for individuals' group membership ($\beta_2$ is often termed the neighborhood compositional effect of group membership, though the term “effect” here should be understood as describing an association, not a causal process). Note that $\beta_2$ is the parameter of interest in model 1, as it describes the association between neighborhood racial composition and $Y$, holding individual race constant. A useful relationship between Equations (15) and (16) is that

$$\delta_1 = \beta_1 + \beta_2 V \tag{17}$$

where $\delta_1$ is the average between-group difference in $Y$ from Equation (15) and $V$ is the variance ratio index segregation measure between the two groups defined by the dichotomous variable $G$ (see Equation (6)). This result allows us to decompose the average difference in outcomes into a within-neighborhood component ($\beta_1$) and a between-neighborhood component (the product of the association between neighborhood composition and the outcome ($\beta_2$) and the level of segregation in the region ($V$)).
If the within-neighborhood effect is zero, this means that there is, on average, no difference in outcomes between members of different groups residing in the same neighborhoods, and so the total difference in outcomes between groups is associated with neighborhood segregation. If the between-neighborhood effect is zero, in contrast, then segregation cannot be responsible for the difference between groups (assuming the model specified is correct), regardless of how high segregation levels are. Conversely, if segregation levels are very low, then even a strong association between neighborhood composition and the outcome will not produce a large outcome gap.
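The decomposition in Equation (17) can be checked numerically. The sketch below simulates a hypothetical region (the number of neighborhoods, the coefficients, and the noise level are arbitrary choices for illustration), fits the two regressions by ordinary least squares using only the standard library, and compares the total gap with the sum of the within- and between-neighborhood components.

```python
import random

random.seed(0)

# Simulated (hypothetical) region: 40 neighborhoods of 50 residents each.
rows = []  # each row is (y, g, gbar_j)
for j in range(40):
    share = random.random()  # neighborhood-specific probability that G = 1
    g_values = [1 if random.random() < share else 0 for _ in range(50)]
    gbar_j = sum(g_values) / len(g_values)
    for g_i in g_values:
        y_i = 1.0 + 0.5 * g_i + 2.0 * gbar_j + random.gauss(0, 1)
        rows.append((y_i, g_i, gbar_j))

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)

y = [r[0] for r in rows]
g = [r[1] for r in rows]
gbar = [r[2] for r in rows]

# Equation (15): slope from regressing Y on G alone (the total between-group gap).
delta_1 = cov(y, g) / cov(g, g)

# Equation (16): OLS of Y on G and the neighborhood mean of G,
# solved from the 2x2 normal equations.
det = cov(g, g) * cov(gbar, gbar) - cov(g, gbar) ** 2
beta_1 = (cov(y, g) * cov(gbar, gbar) - cov(y, gbar) * cov(g, gbar)) / det
beta_2 = (cov(y, gbar) * cov(g, g) - cov(y, g) * cov(g, gbar)) / det

# Variance ratio index: between-neighborhood share of the variance of G.
v = cov(gbar, gbar) / cov(g, g)

print(delta_1, beta_1 + beta_2 * v)
```

The two printed values agree up to floating-point error, because the identity holds exactly for least-squares estimates when the neighborhood means are computed from the same sample.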
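Next we consider models 2 and 3. In these models, we are interested in how segregation is associated with the observed outcomes (on some variable $Y$) of two groups (denoted by $G = 1$ and $G = 0$). If we have data on a number of regions (indexed by $j$) and we measure segregation using some segregation index $S$, we can write a multilevel linear model (Raudenbush and Bryk 2002) describing the association between $Y$, $G$, and $S$:

$$Y_{ij} = \delta_{0j} + \delta_{1j}(G_{ij} - \bar{G}_j) + e_{ij} \tag{18}$$

where

$$\delta_{0j} = \gamma_{00} + \gamma_{01}(S_j - \bar{S}) + u_{0j}$$

$$\delta_{1j} = \gamma_{10} + \gamma_{11}(S_j - \bar{S}) + u_{1j}$$

Here $\bar{G}_j$ is the proportion of the population of region $j$ in group $G = 1$ and $\bar{S}$ is the average level of segregation across regions. In this model, $\gamma_{00}$ indicates the average outcome $Y$ in a region with an average level of segregation $S$; likewise $\gamma_{10}$ indicates the average within-region between-group difference in outcomes $Y$ in a region with an average level of segregation $S$. The coefficient $\gamma_{01}$ indicates the association between segregation levels and the average value of $Y$; the coefficient $\gamma_{11}$ indicates the association between segregation levels and the within-region between-group difference in average values of $Y$.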
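Note that we do not need individual-level data to estimate this model. If we average Equation (18) over all individuals within each region $j$, we obtain

$$\bar{Y}_j = \gamma_{00} + \gamma_{01}(S_j - \bar{S}) + u_{0j} \tag{19}$$

where $\bar{Y}_j$ is the mean of $Y$ in region $j$ and the individual-level errors are taken to average to zero within regions. We can estimate $\gamma_{00}$ and $\gamma_{01}$ directly from aggregated data on $Y$ and measured levels of segregation $S$. The parameter $\gamma_{01}$ is the parameter of interest in model 2, as it describes the association of regional segregation with $Y$ for all individuals.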
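Likewise, if we average Equation (18) over all individuals of groups $G = 1$ and $G = 0$ separately within each region $j$, we obtain

$$\bar{Y}_{1j} = \delta_{0j} + \delta_{1j}(1 - \bar{G}_j) \tag{20}$$

and

$$\bar{Y}_{0j} = \delta_{0j} - \delta_{1j}\bar{G}_j \tag{21}$$

where $\bar{Y}_{1j}$ and $\bar{Y}_{0j}$ are the mean outcomes of the two groups in region $j$ and $\bar{G}_j$ is the proportion of the region's population in group $G = 1$. Subtracting (21) from (20), we obtain the between-group gap in $Y$ in region $j$ (denoted by $\delta_{1j}$):

$$\bar{Y}_{1j} - \bar{Y}_{0j} = \delta_{1j} = \gamma_{10} + \gamma_{11}(S_j - \bar{S}) + u_{1j} \tag{22}$$

Thus, we can estimate $\gamma_{10}$ and $\gamma_{11}$ directly from the observed within-region average between-group differences in $Y$ and measured levels of segregation $S$. In model 3, $\gamma_{11}$ is the parameter of interest, as it describes the association between segregation and the size of the between-group difference in $Y$.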
If segregation $S$ is measured with the variance ratio index $V$, then from (22) and (17), we have

$$\delta_{1j} = \beta_{1j} + \beta_{2j} V_j = \gamma_{10} + \gamma_{11}(V_j - \bar{V}) + u_{1j}$$

This implies

$$\beta_{1j} = \gamma_{10} - \gamma_{11}\bar{V} + u_{1j}, \qquad \beta_{2j} = \gamma_{11}$$

Thus, we can estimate $\beta_{1j}$ and $\beta_{2j}$ in Equation (17) for each region $j$ from aggregated data alone so long as we have data from multiple regions (and we assume model (18) is correct). With only a measure of the between-group gap in $Y$ for each region and the computed variance ratio index for each region, we can estimate the average within-neighborhood gap for each region, and the association between the neighborhood composition and $Y$, net of group membership.
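The aggregate-data strategy can be sketched as follows. Everything in the example is simulated and hypothetical: the number of regions, the coefficient values, and the noise level are arbitrary, and the sketch assumes (as the argument above requires) that the association between neighborhood composition and the outcome is the same in every region. The gap in each region is regressed on the centered variance ratio index, and the fitted slope is then used to split each region's gap into the two components of Equation (17).

```python
import random

random.seed(1)

# Simulated (hypothetical) aggregate data for 200 regions.
true_gamma_10, true_gamma_11 = 0.4, 1.5   # values chosen only for illustration
v = [random.uniform(0.1, 0.8) for _ in range(200)]   # variance ratio index per region
v_bar = sum(v) / len(v)
gap = [
    true_gamma_10 + true_gamma_11 * (v_j - v_bar) + random.gauss(0, 0.1)
    for v_j in v
]  # observed between-group gap in each region

# Simple regression of the observed gap on centered segregation, as in Equation (22).
gap_bar = sum(gap) / len(gap)
sxy = sum((v_j - v_bar) * (d_j - gap_bar) for v_j, d_j in zip(v, gap))
sxx = sum((v_j - v_bar) ** 2 for v_j in v)
gamma_11_hat = sxy / sxx
gamma_10_hat = gap_bar  # the intercept equals the mean gap because V is centered

# Implied components of Equation (17) for each region.
beta_2_hat = gamma_11_hat                                        # common slope on composition
beta_1_hat = [d_j - beta_2_hat * v_j for d_j, v_j in zip(gap, v)]  # within-neighborhood gap

print(gamma_10_hat, gamma_11_hat)
print(beta_1_hat[:3])
```

By construction, each region's estimated within-neighborhood gap plus the product of the common slope and that region's segregation level reproduces the observed gap, so the decomposition is only as credible as the assumed model.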
Finally, note that we can add to any of these models a vector of variables representing mechanisms through which segregation may be related to the outcomes. In the asthma example, we might add measures of air quality to Equations (16) and (22); if the inclusion of these variables reduced the coefficient of interest ($\beta_2$ in Equation (16) or $\gamma_{11}$ in Equation (22)) to 0, this would suggest that the association between segregation and asthma was explained by between-neighborhood differences in air quality. It is important to note, however, that the inclusion of potential mediator and/or confounder variables is not necessarily straightforward in contextual effect models, due to selection mechanisms, endogeneity, and cross-level interactions (for more discussion of the complexities in making causal inferences from contextual effect models, see Morgenstern 1995 and Oakes 2004).
In this chapter, I have addressed three central issues involved in answering the question of whether segregation is associated with subgroup differences in health or social outcomes. Careful analysis of segregation and health and social outcomes requires, first, a clear conceptualization of what we mean by “segregation.” Theory and prior research are typically useful for determining what conceptual definition of segregation is most appropriate in a given context. Second, segregation research requires a measure or measures of segregation appropriate to the conceptual framework and hypothesized mechanisms. This chapter has briefly reviewed the measurement literature; the interested researcher, however, should consult some of the articles cited here for further detail. Finally, I have described some simple statistical models for inferring descriptive associations between measured segregation and individual and subgroup outcome patterns. These models are intended as outlines only. In particular, models of the sort described here may be appropriate for estimating patterns of association between segregation and outcomes, but they do not necessarily produce unbiased estimates of causal associations between segregation and observed health and social outcomes. As in all statistical analyses, the old caveat applies: correlation does not imply causation. In fact, designing studies and analytic strategies for inferring the effects of segregation on health and social outcomes is an area of research where much work remains to be done, both methodologically and substantively. This is a rapidly developing field where social epidemiologists might make important contributions.