CHAPTER 8
Whither the Community?
IN 2010, the Obama administration and the Department of Education announced the Promise Neighborhoods grant initiative. The goal was to support the establishment of “great schools and strong systems of family and community support” within disadvantaged communities, enabling youths there “to attain an excellent education and successfully transition to college and a career.” 1 The program’s vision was modeled on Geoffrey Canada’s Harlem Children’s Zone, a hybrid school and community center that had successfully integrated the various institutions responsible for youth services with each other and with the surrounding community. The Harlem Children’s Zone has been lauded as an archetype for transforming disadvantaged communities and creating the “promise” that youths growing up there might not otherwise experience, and applicants to the grant program were encouraged to find ways to similarly break down silos between services and to make their relationships with the community more seamless. 2 Since the program’s inception, the Department of Education has issued over $100 million in awards to projects spanning the demographic and geographic range of the country. These have included such diverse examples as Knox County, Kentucky, an impoverished, almost exclusively white rural community; the predominantly Latino community in the Los Angeles Promise Zone; the largely black West Philadelphia neighborhood; and the Paskenta-Nomlaki Indians in Corning, California.
Promise Neighborhoods was not a research grant program. It was explicitly intended to fund the planning, organization, and implementation of integrated, community-facing youth services. Nevertheless, recipients have ended up doing plenty of empirical research. In fact, it was required of them. One of the main thrusts in the program description was “learning about the overall impact of the Promise Neighborhoods program and about the relationship between particular strategies in Promise Neighborhoods and student outcomes, including through a rigorous evaluation.” 3 The application materials stipulated that a “rigorous evaluation” would have to assess 10 key results, including “children enter kindergarten ready to learn,” “youth graduate from high school,” and “students live in stable communities.” A successful applicant would have to present a clear plan for collecting and analyzing relevant data and would then have to demonstrate success via these metrics to be considered eligible for further funding.
The logic of making funding for social services contingent on an evidence-based approach is sound in theory, but it creates a practical conundrum. Few organizations are equipped with the expertise both to innovate in youth services and to conduct a rigorous, end-to-end evaluation of those services. The most obvious solution to this problem was for a community to partner with a local academic institution that would execute the methodological side of the project; for example, the Knox County and West Philadelphia examples were led in part by Berea College and Drexel University, respectively. This was not a feasible solution for all communities. As we saw in Chapter 7, not every city government has a university partner at the ready, and the challenge is probably even greater for individual neighborhoods, which may not have the institutional connections to construct such a partnership. The result was a vacuum that a handful of nonprofits and academic consortia sought to fill, including the Promise Neighborhood Institute, an effort of PolicyLink, and the Center for the Study of Social Policy in Washington, D.C. I had the privilege to be part of one such group, the Promise Neighborhood Research Consortium, an interdisciplinary team of researchers from across the country. Central to our mission was the construction of methodological tools and guidelines to support Promise Neighborhood grant recipients in their program evaluations.
The challenge presented by the evaluation requirements of the Promise Neighborhoods program reflects the second digital divide facing the field of urban informatics. There is an increasing expectation on the part of funders that community organizations keep up with societal trends by conducting rigorous, data-driven program evaluations, and yet they are severely limited in their ability to do so. That said, the problem goes beyond the challenge of reporting program outcomes. Much has been made of the value that modern digital data, especially open data, offer the public, but who is prepared to execute on that potential? Community organizations are uniquely positioned to identify local needs, and if they were skilled with modern digital data, they could utilize them toward these ends in a way that others cannot. While groups like the MetroLab Network wrestle with the digital divide between cities, far less formal attention is being paid to the digital divide emerging within cities, between those professional institutions that are equipped to work with data and the community members and organizations that are not. Put another way, there is no city-university-community model for urban informatics.
This chapter takes up the role of infomediaries, or institutions that can translate data into products that hold public value, in closing the digital divide between data-savvy institutions and the rest of society. Though most work to date has cast infomediaries as consumers of raw data who repackage them in more interpretable forms, here I concentrate on the role that community organizations can play in using the data to inform and support programs and services. For this to be possible, however, they need support from other institutions that are skilled in data science. This chapter describes a handful of projects that pursue this ideal, including the Boston Area Research Initiative’s program that trains community organizations in the use of its Boston Data Portal. I summarize a survey of training attendees, offering a more direct window into their current utilization of data and attitudes toward “big data.” From these results, I seek deeper insights on what an effective city-university-community model might look like, reaching some conclusions but also uncovering a set of unanswered questions.
Big Data and the Public
Returning to a major theme from Part I of this book, the novel data resources that have catalyzed the field of urban informatics, including administrative records, social media posts, and sensor readings, can be a mixed blessing. On the one hand, they create many opportunities for a deeper understanding of the city. On the other hand, the conceptual and methodological challenges they present can be overwhelming. The result is a “data deluge” that has left us awash in information that we do not yet fully understand how to navigate. Writers on the subject, myself included, often characterize the problem as one facing the public agencies, private corporations, and academics who work with data professionally. Less time is spent considering the public—community members, leaders of local organizations and nonprofits, and their associates. The data describe the streets that community members walk, the neighborhoods where they live, work, and play, and even their actions and interactions, and yet such resources are not typically accessible to them. Even when the data are made available, most members of the general public do not have the skills to properly analyze them. They would indeed be lost at sea in the data deluge.
As with other digital divides, we might begin to examine this disparity between data-savvy professional institutions and the public with the question of material access: Could a member of the public access these data and, if so, at what cost of effort and resources? Let us take the most available of the newer forms of digital data, municipal administrative records. Traditionally, access to such data has required a formal request under the Freedom of Information Act (FOIA) and sometimes a charge for the expense of organizing and delivering the data. Under FOIA, the government only has to release data in response to requests, placing the burden of access on the public’s time, energy, and ability to identify and request the desired data.
In recent years, the reactive approach of FOIA has given way to the proactivity of open data. This has been instigated in part by groups like the Sunlight Foundation, which have successfully advocated for transparency in government records. It has grown even more as governments and corporations have recognized the potential value of sharing their data with others; researchers and “hackers” are likely to leverage the opportunity to conduct new analyses and build applications that benefit the original data holder, creating something analogous to a low-cost, adjunct R&D team. Consequently, dozens of municipal governments have constructed data portals through which they publish nonsensitive government data, including 311 requests, tax assessments, city-curated mapping files, crime reports, and the like. This has been most prominent in big cities like New York, Chicago, Boston, and San Francisco but has also gained traction in smaller cities. In the greater Boston area alone, the cities of Cambridge and Somerville, both with populations of ∼100,000 people, have also constructed open data portals. In parallel, national governments, including those of the United States and Canada, have formed their own open data initiatives and portals, further accelerating the trend.
Proponents of open data have hailed the trend as providing the public with “greater control over their lives and improv[ing] both their material and social conditions.” 4 Though this perspective is inspiring in principle, we arrive at a recurring theme in the story of the digital divide. Open data attends to the literal problem of material access but reveals a disparity in skills. Data portals publish large spreadsheets, often with limited documentation, meaning that only those familiar with the tools of data science are equipped to work with them. This is a vanishingly small proportion of the population, an issue that has been noted even by those advocating for open data. A 2014 survey found that 70 percent of the United Kingdom’s Open Government Data community (i.e., people involved in the production or use of open data) believed that members of the public lacked the necessary skills to effectively use open data. 5 Similar concerns arose in a survey of Canadians regarding that country’s Open Data Initiative. 6 These impressions highlight the fact that solving the problem of material access will not on its own enable the public to make use of municipal data.
The concern over whether members of the public have the skills to work with and leverage data has not been isolated to the open data movement. As data have permeated all aspects of our daily lives, some have argued that “data literacy” is a critical capacity for people in modern society. The term refers to an individual’s ability to understand and interpret data and the tools used to organize, analyze, and visually represent information. Even with uniform access, differences in data literacy will result in disparities in the benefits that one might gain from modern digital data. Unfortunately, the problem of data literacy is not a simple one, in part because it encompasses a wide-ranging set of competencies, including “the ability to: formulate and answer questions using data as part of evidence-based thinking; use appropriate data, tools, and representations to support this thinking; interpret information from data; develop and evaluate data-based inferences and explanations; and use data to solve real problems and communicate their solutions.” 7 This definition is so broad that it borders on the nebulous, but that in itself is instructive. Data literacy does not comprise a small set of discrete skills but is instead an overarching capacity to reason about and with data. This has led educators to assert that the most effective approach for achieving universal data literacy begins with its incorporation into primary and secondary school mathematics curricula. Some have even experimented with interdisciplinary collaborations that place these forms of reasoning within applied contexts, such as social studies. 8 All of this is to say that data literacy cannot be built quickly, leaving a persistent limitation to the narrowing of this particular digital divide.
Data Infomediaries: Lowering the Demands of Data Literacy
When concerns about the digital divide in internet usage transitioned from disparities in material access to disparities in skills a decade ago, the worry was whether older, less affluent, and less educated individuals with broadband access would be able to effectively utilize their newfound online resources. Similarly, low levels of data literacy across society mean that open data alone is not sufficient to guarantee that the broader public will gain value from these resources. A major difference between these two stories, however, is the level of effort necessary to even out disparities in skills. Whereas a few hours of classes can meaningfully advance an individual’s ability to access internet resources, the utilization of modern digital data requires extensive training. Many of the data sets made available through open data portals are foreign even to most academics, meaning the goal of educating large swaths of the population to make effective use of a data portal’s contents is simply not feasible. Instead, there is a need to lower that threshold for engagement so that a greater proportion of the population is able to utilize modern data.
A potential solution to the high level of skills required for the use of modern digital data is the infomediary, or an entity that translates raw data into value for a particular audience. This value might arise from a new product or service, or even just a more interpretable form of the underlying information. Infomediaries have become increasingly common in recent years. For example, projects such as PolicyMap, 9 Data USA, 10 and the Racial Dot Map 11 have downloaded and reorganized census data into maps, infographics, reports, and other products that are more immediately useful. Many city data portals have also incorporated interactive maps, including Detroit’s tax parcel map, 12 CrimeMapping.com’s multicity map of crime events, 13 Boston’s map of open 311 requests, 14 and, possibly the most impressive in its comprehensiveness, Chicago’s Open Grid, which allows visitors to map any public city data set with geographical references. 15 These projects have been a good first step, making information more accessible to those with limited data skills, but they have their weaknesses. First, they tend to isolate a single topic or data source, making them unidimensional. Second, and more importantly, they rarely transform the data in ways that facilitate interpretations that go beyond a summary of records. As noted repeatedly in this book and elsewhere, a list of records or events without context can be misleading. Third, they are often one-way; what the infomediary offers may be based largely or entirely on its own internal priorities. As a result, it is not always clear how often or to how many people such portals are useful.
Some have taken a more interactive, community-based approach to the role of infomediary. One such technique builds on Kevin Lynch’s seminal efforts to have urbanites draw the map of the city as they perceive it. 16 This has grown into the broader subfield of participatory GIS, with an expansive set of methods for capturing community perceptions of local geography and dynamics graphically. 17 Some have also worked directly with the community to identify what the data imply for local residents and to represent those implications in the community’s own terms. For example, Rahul Bhargava and his team at the MIT Media Lab have led workshops and built tools for “data therapy,” techniques that enable those who are not data scientists to engage meaningfully with data. 18 One particularly interesting product of this work has been a series of “data murals,” artistic representations of data and their meaning to a community. In contrast to the portal-based infomediaries, these sorts of projects are better aligned with localized needs and interests. The downside of their specificity, however, is the inability to create a generalizable product.
Each of the two approaches to infomediaries described here—one general and institution driven, the other localized in its motivation and impact—holds value, but there is a middle ground that is largely empty. How does an institution lower the barriers of access to the content of modern digital data in a general way and empower local communities to take advantage of them for their own purposes? Such a model has been hard to come by. One example that comes close is a project from Georgia Tech, which worked with the Westside Community Alliance in Atlanta, Georgia, to build a map that describes public safety in the neighborhood. 19 The team leveraged not only official records but also interviews with the public on the perceived distribution of crime. Though the project was limited to a single neighborhood, the sophisticated technical infrastructure and use of official records offer a potentially scalable model. In contrast, the Boston Area Research Initiative’s (BARI) Boston Data Portal (BDP) moves in the other direction, attempting to bring the contents of a generalized portal closer to community members. It publishes its data through the BDP, which has two components: the Data Library, which houses modified, research-ready versions of data from various sources, including administrative records provided by the city of Boston, and BostonMap, where users can explore data visually in conjunction with other tools, including Google StreetView.
The organization and tools of the BDP offer public access to modern digital data in a way that accommodates multiple levels of data literacy. The curated record-level data sets are research-ready resources for data scientists. The ecometric data sets, which distill more complex record-level data into neighborhood-level measures (see Chapter 2), are more immediately useful to those who conduct traditional urban research, and the interactive maps are an important resource for those who are not skilled in data analysis and visualization but would still benefit from knowing about the events, conditions, and dynamics of their community. Our hope was that this structure would not only facilitate collaboration on research and policy but also support community utilization of the information contained in modern digital data. With support from the Herman and Frieda L. Miller Foundation, we have sought to realize this promise through a series of community-based trainings and conversations about the utility of the BDP and its contents. Much of the work to organize and construct this project was conducted at first by Chelsea Farrell, a PhD student in the School of Criminology and Criminal Justice at Northeastern University, and has since been led by Samantha Levy, BARI’s program coordinator. As the subsequent sections describe, we have experienced some success but, more than anything, have learned a lot about the challenges that communities and community organizations face in translating data into public value.
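The ecometric idea, distilling record-level data into interpretable neighborhood-level measures, can be illustrated with a minimal sketch. Everything below is hypothetical: the neighborhoods, records, population figures, and the requests-per-10,000-residents metric are invented for illustration and do not reflect BARI's actual data or methods.

```python
# Illustrative sketch: turning record-level 311-style data into a simple
# neighborhood-level measure. All values here are hypothetical.
from collections import Counter

# Each record is (neighborhood, request_type), a minimal stand-in for a
# row in a 311 requests data set.
records = [
    ("Dorchester", "streetlight out"),
    ("Dorchester", "pothole"),
    ("Roxbury", "graffiti"),
    ("Dorchester", "graffiti"),
    ("Roxbury", "pothole"),
]

# Hypothetical residential populations, used to normalize raw counts.
population = {"Dorchester": 120_000, "Roxbury": 55_000}

# Count records per neighborhood.
counts = Counter(nbhd for nbhd, _ in records)

# The neighborhood-level measure: requests per 10,000 residents.
rate_per_10k = {
    nbhd: round(counts[nbhd] / population[nbhd] * 10_000, 2)
    for nbhd in population
}
print(rate_per_10k)  # {'Dorchester': 0.25, 'Roxbury': 0.36}
```

The normalization step is the point of the exercise: a raw count would simply tell us that the larger neighborhood generates more requests, whereas a per-capita rate supports the kind of between-neighborhood comparison that a community organization might actually act on.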
Training Community Organizations in the Boston Data Portal
BARI formally began offering community-based training in the BDP in January 2016. It had previously held annual sessions on the BDP for audiences comprising a mix of faculty and students, public officials, and community leaders, but the new program featured a curriculum that catered to the needs and skill levels of community organizations. The sessions now also allotted time for an open discussion of the public value these tools and content might hold and how we might make them even more useful. The original plan was to have the training follow a two-stage model. First, we would host a training for representatives from a variety of community organizations, whom we would recruit in collaboration with partners who maintained such connections, such as Microsoft’s Office for Technology and Civic Engagement and NU Crossing, a unit at Northeastern University that provides programming for the local community. Our vision was that those community organizations that found the BDP particularly useful would help us to establish a direct link to the public by cohosting similar training for their constituents with us.
Our plan turned out to be somewhat naive. Although the community organizations themselves were often enthusiastic about using the BDP in their work, they did not see their constituencies as having the same motivation. Indeed, data could be useful for those trying to better understand, serve, and advocate for a community but would be of less interest to community members themselves. In retrospect, this seemed perfectly obvious. The BDP translates available data into something interpretable, but even then they are still data, and data are traditionally the province of “nerds.” Nerdiness has shed much of its stigma and has even become trendy in recent years, but, simply put, not everyone is motivated to analyze data. Put in terms of the digital divide literature, open data creates universal material access, and the BDP lowers the skill level required to utilize it, but the necessary attitudes among the public are often lacking.
The realization that everyday Bostonians have limited interest in utilizing the BDP was not so much a setback to the community-based training as a signal that we would need to reconfigure it. It suggested that our focus should be to empower community organizations to act as a different kind of infomediary: rather than creating new data products, they were uniquely positioned to translate the contents of the BDP into public value, provided they were given the resources and skills necessary to do so. My colleague Michael Johnson from URBAN.Boston has studied the ways that data might support and advance the work of community organizations and has identified three areas of activity. 20 First, an organization might more effectively pursue funding if it has more detailed information about the need it is trying to satisfy or the problem it is trying to solve. Second, accountability is critical to any community organization and can be greatly facilitated by data. Data might be leveraged to assess internal performance, as many public agencies and private corporations already do, or to rigorously evaluate the external impact of programs. As the story of Promise Neighborhoods at the beginning of this chapter illustrated, funders are increasingly requiring such sophistication of community organizations. Third, data can strengthen advocacy efforts by providing clear evidence of need when approaching public officials. Organizations can also use data to advocate locally, communicating to community members how they might understand or grapple with their challenges. Of course, the specific data needs of each community organization will depend on its nature and mission. For example, a community development corporation might focus more on new investment in a neighborhood, whereas a service provider will be concerned with resident outcomes. That said, with the right data sources, any organization would likely be able to benefit in each of these three areas.
Recognizing the potential value of data for community organizations, Johnson conducted in-depth interviews with a set of representatives from such organizations in greater Boston to discover how they do or do not utilize data in their work. 21 His overarching finding was that the organizations were aware of what they needed in terms of information but lacked the capacity to realize those goals. It was apparent that limited material resources within an institution translated into a lack of skills. The average community organization has a staff of only five people, and this modest staff must successfully raise funds and implement, evaluate, and advocate for programming before there is even a need for data. 22 With this in mind, it is unclear where such activity would fit in the budget. Indeed, the average community organization spends only 2 percent of its budget on IT infrastructure, and only 36 percent of organizations include it in their budget at all. 23 As a result, Johnson found that internal expertise was so limited that, even if the requisite resources were available, community organizations did not know what steps they would need to follow to effectively leverage data.
Community organizations represent a crucial but largely ignored player in the civic data ecosystem of a city. They are uniquely positioned to translate information into public value but are severely limited in their ability to do so. Thus, the contribution that organizations like BARI can make is to provide resources and training that lower the skill level necessary to leverage data. This has become increasingly important as data have increased in size and complexity and therefore require considerable processing before they are informative. It is with this mission that we run our community-based training sessions. In order to construct a curriculum that matches this goal, we have conducted a survey of participating community organizations. Beyond informing our pedagogical goals, the responses also provide an additional window into the attitudes that community organizations have toward data, the areas in which they might build their capacity to utilize data, and how they view the societal shift toward “big data.”
Survey of Community Organization Representatives
In advance of each community-based training in the BDP in 2016, we invited attendees to complete an online survey regarding their organization’s current attitudes and usage of data, including their definition of “big data.” The intent was twofold: to enable us to better target the curriculum to the needs and interests of our audience and to learn in a more general sense how such organizations interact with data. The response rate was moderate, with 10 of 21 participating organizations responding to the survey. To be clear, this is by no means a random sample. The organizations attending were already sufficiently inclined toward the use of data to seek out the training, and those who completed the survey were likely even more motivated or capable. It might be safe to assume, then, that those represented in the sample are on average more data savvy than the population of community organizations as a whole. Nonetheless, their responses have the potential to reveal certain patterns and dynamics that contribute to or hinder the effective usage of data by such groups.
The majority of respondents to the survey were place-based in that their mission was to serve the residents of a specific geographic area (e.g., a local community development corporation, or CDC). There were also a handful of groups that advocated and intervened on behalf of certain vulnerable populations, such as welfare recipients, or focused on particular societal challenges, such as access to nutrition. One outlying response was from a school of public health at a local university, which has hundreds of employees and a clear understanding of data. For our purposes here, this response does not qualify as a community organization but does offer insights on how community organizations partnering with this university might operate. The nine other community organizations varied broadly in their size, from an all-volunteer staff with no formal employees to 88 employees. The median was four full-time employees.
What follows is a descriptive analysis of a series of Yes-No questions and quotations from the corresponding open-ended responses. In it, I maintain some of the grammatical and syntactical errors made in context. The one that will be observed most often is the treatment of “data” as singular, which goes against the convention within this book to treat “data” as plural.
Survey Responses
Of the nine community organizations, seven reported consciously collecting data on their own services, predominantly on the characteristics of service recipients. Though this generally involved information that would be entered during enrollment, making it a community organization’s version of administrative data, three organizations also indicated that they conducted surveys of program users. Data sets that required more effort and sophistication, however, were less commonly used. Five of the nine organizations reported using census data to augment their own data collection; two reported using some form of data visualization technology, such as ESRI’s online GIS tools or Tableau; and only one reported using any of the local portals that publish data, including the city of Boston’s Open Data Portal or CityScore platform, the Metropolitan Area Planning Council’s MetroBoston DataCommon, the Massachusetts Budget and Policy Center’s Budget Browser, or BARI’s Boston Data Portal.
Though these organizations used data from a limited number of sources, they tended to do so purposefully. Six indicated that they used data for funding purposes, and five used data for evaluating effectiveness. Far fewer (three of nine) used data for advocacy. Most stated that they used data primarily to justify the value of programs to foundations and government agencies. In fact, it appeared that many of the respondents conflated the use of data for “funding” and “evaluation” into a single, largely formulaic response to the requirements of funders. One organization did stand out as being more proactive, however, saying that it used data to determine priority areas for which it would then pursue funding. The same organization also generated internal quarterly reports on program activities that were the basis for discussions on how to continue to improve services. This outlier organization, which was rather well staffed, utilized data in a manner similar to that of the municipalities that had seen greater success with Commonwealth Connect in Chapter 7 .
In general, the organizations had very little idea of what “big data” was. A few explicitly said they were “unsure” when asked to define it, and others wrote things that had little bearing on the subject. Some of the responses did, however, capture one or more aspects of the sources, content, or implications of novel digital data. One organization, which had a computer scientist on its staff, had a sophisticated definition of “structured, semi-structured, [or] unstructured data that has the potential to be mined for information.” Another described big data as “millions of data points, but I’m not sure what it is. I hear Google and Facebook are collecting big data about their billions of users and can predict what we are interested in and who our ‘friends’ are or will be.” This equating of big data with internet technology was echoed in other responses. One organization did seem to grasp the value of the composite nature of measurement I described in Chapter 1 by noting the value of aggregating record data to achieve fuller insights.
Although the organizations did not fully understand big data, they still believed it to have potential for their own work. Some were optimistic (“I’m sure I could think of applications when I know what it is”), while others were a bit more cautious (“If everyone else is using it, we at least need to know what it is”). All but one of the organizations stated that big data could be more useful than they currently are, the exception being an organization that does not currently utilize data in any form. The general consensus was that the value offered by these new informational resources simply was not within reach. One of the more advanced organizations astutely noted that using big data would “[require] a different skill set” than more traditional data resources. The representative from the local school of public health echoed this sentiment, expressing “suspicion that MOST … will NOT have the in-house resources or ability (and often not even the understanding to frame the questions).” Another organization went further, identifying the additional hurdle that even when the resources are available “gaining full buy-in from staff continues to be a challenge.” This issue may have been obscured by the fact that most organizations do not yet have such conversations in much depth. A proposed solution was that “foundations should get more educated in this field and be open to fund data capacity at the operational level.”
Summary: The Limited Capacity for Using Data in Community Organizations
Unsurprisingly, the survey found that community organizations do make use of data, but in a manner limited in both content and range of application. Nonetheless, there were some noteworthy lessons. First, respondents by and large were not clear on the definition of “big data.” This stands to reason in retrospect, but it means that the hurdles to utilization are greater than anticipated. Whereas Johnson noted that community organizations lacked a basic understanding of the skills necessary to make use of data, it turns out that they do not even know what the resources themselves are. 24 Notably, not a single respondent mentioned administrative data, nor did any indicate using the city of Boston’s Open Data portal. Those that referred to social media and internet data did so only in the broadest strokes and with no sense of how they might be informative. Second, there is an initial hint that the same institutional hurdles we saw at the municipal level in Chapter 7 may be lurking in community organizations. Many organizations currently see data and analysis as an unwelcome requirement of pursuing funding, or at best a matter of indifference. At the moment, most are still scrambling to gain the capacity to comply with these requirements, but the few that are already there are starting to see resistance to the investments and reorientation necessary for the proactive use of data.
Research Centers and Community Organizations: Complementary Infomediaries
This chapter began with the question of how to place modern digital data in the hands of the broader public. Per some of the early arguments for Open Data, this would empower everyday people to leverage these new resources in ways that address local needs and interests. Though logical enough, it appears that this direct approach, in which the general public accesses data and takes action of its own accord, may not be realistic. The initial problem is that the vast majority of people are not data scientists of any sort and are therefore unequipped to grapple with the complexity of modern digital data resources. This would call for raising data literacy across society, a daunting but not entirely impossible task. However, there is a second, possibly more trenchant issue: very few members of the general public actually want to play with data. This situation may change as data become increasingly visible in society and efforts to introduce data literacy during primary and secondary education take hold, but right now few people are motivated to use public data resources.
Instead, the promise that open data holds for public value rests with infomediaries. Though the term is often used as a general umbrella for institutions that translate data into products that hold public value, I propose a specific solution that requires two types of infomediaries operating in sequence: research-oriented institutions that translate raw data into more accessible forms, and organizations that can identify uses of these resources that reveal and attend to the needs and interests of local communities. This model is distinct from earlier theorizing on infomediaries. First, it outlines a complementarity between two types of infomediaries and argues that coordination between them is necessary to fully unlock the potential of modern digital data. Second, it moves beyond an almost singular emphasis on infomediaries that repackage and publish information for public use. Because this on its own is not sufficient for empowering the general public, it is also necessary to recognize as infomediaries those organizations that work with communities to identify and implement ways to pursue public value through the use of data.
Data to Public Value: A Two-Layer Pipeline
Converting modern digital data directly into public value requires two steps: specifying the needs and interests of the public, and identifying, accessing, and analyzing the data that will attend to those needs and interests. The institutions best positioned to do the first are nonprofit community-based organizations, but, as others have shown before and the survey here only served to reinforce, they are unequipped to complete the second. The average community organization has a very small staff, almost none of whom are dedicated to data analysis. When it does use data, it tends to do so in a formulaic way intended to attract or satisfy funders. Furthermore, our survey suggests that such organizations have very little understanding of the rapidly growing set of data resources that are available and that they still rely almost entirely on a mixture of internal data and census indicators. Few respondents were able to define “big data” or its value, and none referenced the city of Boston’s administrative data as being useful. There is also the concern that, without the complete set of skills necessary, they could end up generating conclusions that are not entirely robust but instead conveniently resonate with their goals or impressions.
Even if the limitations in skills were surmounted, there may be additional roadblocks on the horizon for the use of data within community organizations. Just as we saw with municipalities in Chapter 7, the transition to greater use of technology within an institution is not automatic. Bureaucracies vary in their receptivity, with some welcoming the opportunity and others being more intransigent, creating barriers to implementation. This can be reinforced by a perception of data-based work as a burden imposed by funders. Most community organizations are motivated by the potential to serve their constituency and to effect social change. They may then see the increased demands of data analysis and reporting as siphoning time and resources away from the organization’s “true” mission. Others may simply be resistant to a move toward unfamiliar processes or standards for planning and evaluating programming. In any case, the challenge of incorporating data more fully into the daily work of community organizations is not a unidimensional one.
The context here is analogous to that of the Promise Neighborhoods story that began this chapter. Community organizations are aware of a societal push toward data and of the dangers of being left on the wrong side of the emerging institutional digital divide, but the most efficient solution is not to bring data science skills in-house. Instead, they will need to rely on research centers and consortia that can make such data more accessible to them at their current level of data literacy. The overall process of empowering communities through data depends heavily on each of these two types of infomediaries: without the research center, the community organization would be ill prepared to leverage modern digital data, and without the community organization, the research center would lack the knowledge necessary to identify and address local concerns.
As is often the case, this two-stage model for infomediaries is far from the clean and tidy solution it might appear to be. It is particularly complicated by the fact that research centers have to provide two different forms of expertise that often sit in different corners of academia. The first is to release data in forms that are interpretable and that can be incorporated into the community organization’s work, which is the purview of data scientists, who typically have very little experience talking to community groups. The second is to explore with the organization how such resources might be of value, to educate them in the tools that might be brought to bear on those goals, and to jointly execute the resultant research project. This latter approach is essentially the stated mission of community-based participatory research (CBPR), which is more often based in public health and social scientific disciplines. More pertinent to the point here, CBPR, because of its focus on the interests and needs of localized communities, predominantly works with smaller, more targeted data sets; in many cases, the work is qualitative rather than quantitative.
There are many research centers skilled in data science or in community engagement, but very few are skilled in both. Nevertheless, this combination is necessary if academia is going to partner effectively with community organizations in the localized use of modern digital data. As I have argued repeatedly, ecometrics can be seen as new-age indicators that distill the complex content of modern digital data into accessible information that can be as important as census variables. That said, we saw through the survey that even these rather simple data forms—which amount to a series of interpretable variables for a few dozen recognizable “neighborhoods”—are more than most community organizations can handle on their own. They need thought partners who can help them think through the possible value of these data and offer analytic support. To put a finer point on it, they deserve this. Throughout this book, I have discussed partnerships between researchers and policymakers where the former offer data science skills so that the two can learn novel things about the city together. There is no reason why community organizations should not expect the same level of partnership.
I can foresee two ways that projects might bridge the divide between data science and community engagement on the academic side in order to empower community organizations to use modern digital data effectively. The first and simplest is the combination of public data and training. This is what BARI has done with the Boston Data Portal. This past year, we have expanded the program to include a graduate student who serves as an on-call data consultant for community organizations. Another success in this vein comes from Michael Gurstein, who recounts a case study in which the people of Zanesville, Ohio, used data skills learned in training by the UCLA Center for Health Policy Research to demonstrate the public health risks of a proposed truck stop. 25 The evidence they presented was sufficiently compelling to halt the construction. In this approach, academia aims to provide community organizations with as many skills as possible, thereby empowering them to pursue projects on their own.
A second model is more comprehensive and might be thought of as “big data–driven CBPR.” Alex Taylor and his colleagues, for example, conducted a year-long, in-depth data project on a single road in Cambridge, United Kingdom, with an approach they called “data-in-place.” 26 Through conversations with the community, they identified traffic as a major issue of interest, proceeded to harvest and access sensor data describing traffic volume and air quality, collected new data on the attitudes of residents regarding local traffic, and discussed the results with the community. Returning to Boston, Sandeep Jani, a graduate student under Michael Johnson at the University of Massachusetts Boston (whose work on community-based organizations was summarized earlier in this chapter), collaborated with groups of local businesses that had received “Main Streets grants” to revitalize the commercial areas of residential Boston neighborhoods. 27 Recipients of these grants must report program outcomes, including indicators of economic vitality, a requirement they largely resent. In an effort to reset this negative relationship, Jani worked with them to identify the kinds of things they would actually want to know and how they might access them through data. What is notable about Johnson and Jani’s work is that they offered BARI’s ecometrics as a new resource but also found it necessary to consider customized indicators that did not yet exist. These sorts of projects are still few and far between, but they highlight the value that community organizations can derive from modern digital data when they are invited to the table as equals.
For either of the two models I have described to be realized more consistently, especially the second, there will need to be institutional changes within academia. There are limited incentives for data scientists to participate in highly localized studies, and most proponents of CBPR have limited training in cutting-edge quantitative techniques. These divisions will have to be bridged. I will close by noting, however, that a new institution is being piloted that will likely become important in this space. As luck would have it, it also hails from Boston, supported by the Boston Civic Media Consortium, a group based at Emerson College that I have mentioned previously. They are establishing a community-based institutional review board (IRB). Just as the IRB at a university evaluates the merits and potential risks of a given research study, this body would examine the public value that a research project might provide to the community being studied. It is also intended to act as an arbiter between the community and the researcher, enabling them to construct a project that both view as mutually beneficial. Whereas the previous examples describe the processes and goals associated with individual projects, the community IRB is distinctive in that it offers a new institution for negotiating these agreements.
Conclusion: A More Inclusive Civic Data Ecosystem
The earliest proposed solution to the digital divide was for everyone to have access to modern digital resources. Each individual and institution would then make use of them according to their needs and interests. As we have seen, this vision is not only implausible; even where access is achieved, it serves mainly to reveal disparities in the skills people have and the choices they make. In Chapters 7 and 8, we have seen how the two digital divides of urban informatics clearly embody these lessons. Even when a program like Commonwealth Connect makes a technological policy innovation universally accessible, internal obstacles posed by local bureaucracies can lead to failures in implementation. When it comes to the public and the community organizations that represent it, the level and penetration of data literacy are sufficiently low that universal utilization is an unreasonable expectation at this time. In each case, the role of institutions is critical: in the former, the realities of institutional change within government agencies are front and center; in the latter, we must consider how multiple institutions with distinct areas of expertise can combine to create public value.
The emphasis on institutional roles brings us back to one of the themes of urban informatics from Chapter 1: the civic data ecosystem, or the network of data sharing and collaboration. If we evaluate a given region’s civic data ecosystem on its successes in furthering the understanding of the city and in improving the programs and services that manage that city, then community organizations, which provide many of those programs and services, need to be as active and influential as city agencies. This goal depends on a well-crafted taxonomy of institutions, each with its own role in generating public value from data. In such a context, community organizations would not need to hire data scientists but would simply need to capitalize on the resources and potential partners within this institutional landscape. My proposal of a two-layer infomediary pipeline that combines the skills of research centers and community organizations is far from comprehensive, but it is a start. The exact form of these institutions will undoubtedly evolve in the coming years. As effective models emerge, they will in turn create a more robust civic data ecosystem that might eventually realize the vision of a city-university-community model of urban informatics.