Beth Simone Noveck
INTRODUCTION: THE COLLABORATIVE POLITICAL ECONOMY OF OPEN DATA
For fifty years, the Freedom of Information Act (FOIA)1 has been the legal bedrock of the public’s right to know about the workings of the U.S. government. At the same time, FOIA’s delays in responses and redactions frustrate information seekers, while the volume of requests, in particular commercially and politically motivated requests, bedevils government agencies.2 With more than 700,000 FOIA requests filed each year and a lack of funding to process them, the federal government faces the costs of a mounting backlog.3 Arguably “flawed beyond repair,”4 as David Pozen writes, FOIA may foster litigation without better government to show as a result.
In recent years, however, an entirely different approach to government transparency in line with the era of big data has emerged: open government data. Open government data—generally shortened to open data—has the potential to complement and overcome some of FOIA’s worst flaws.
Open data has several definitions but is generally understood to be publicly available government information that can be universally and readily accessed, used, and redistributed free of charge in digital and machine-readable form.5 Open data policy is a direct response to what technology makes possible. Data that are digitized and machine readable can be ingested and processed by analytics or visualization software, enabling in turn the application of new computer-aided statistical methods, often referred to as data science. When we can search, sort, compare, aggregate, visualize, and track a vast storehouse of public (and private) data sets, we can generate insights that help us understand more about ourselves, our communities, and our environment.
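To make the point about machine readability concrete, the following is a minimal sketch in Python of what it means to sort and aggregate a data set once it is published in a structured format. The CSV extract, its column names, and the helper functions are all hypothetical and invented for illustration; they do not come from any actual government release.

```python
import csv
import io
from collections import defaultdict

# Hypothetical machine-readable extract; every row is invented for illustration.
SAMPLE_CSV = """neighborhood,service,requests
North,pothole repair,12
North,tree trimming,5
South,pothole repair,7
South,graffiti removal,9
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting counts to integers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["requests"] = int(row["requests"])
    return rows

def total_by(rows, key):
    """Aggregate request counts by the given column name."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["requests"]
    return dict(totals)

rows = load_rows(SAMPLE_CSV)
by_neighborhood = total_by(rows, "neighborhood")  # compare across places
by_service = total_by(rows, "service")            # compare across services
busiest = sorted(rows, key=lambda r: r["requests"], reverse=True)[0]

print(by_neighborhood)
print(by_service)
print(busiest["neighborhood"], busiest["service"])
```

The same operations on a scanned PDF would first require manual transcription, which is the practical difference that machine-readable publication is meant to remove.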
Open data represents a major governing innovation in the twenty-first century. When data are legally and technically accessible, those with know-how, whether they own the data or not, can create sophisticated and useful tools, models, and analyses across data sets to enable empirically based problem solving and advance social justice. For example, the city of Chicago is using its own data to offer the public an information-rich tool called Open Grid for exploring services and activities in one’s neighborhood, while inviting private developers to collaborate on improving that tool.
Open data is not limited to statistics but also includes the text of the Federal Register, the daily newspaper of the U.S. government, which was released as open data in bulk form in 2010 and quickly redesigned by three independent software developers working in a café. The National Archives adopted their work, and now the Register is no longer a hard-to-read PDF (and the butt of legalese jokes) but a searchable and graphical online magazine.6
These examples show how, in enabling the co-creation by public institutions and private participants of solutions to problems, the open data policy framework could profoundly shift the relationship between citizen and state around questions of transparency from adversarial to collaborative. Because in an open data regime government must proactively publish its information with the intent that people use it, the normative essence of open data is participation rather than litigation. By catalyzing civic engagement—both scrutiny of data by the public and collaboration with the public in building new analytical tools and websites—open data is grounded in a very different conception of transparency than traditional freedom of information laws. Rather than focusing on prying secrets out of a distrusted government, open data emphasizes empirical decision making and practical problem solving, embodying a primarily utilitarian rather than a deontological theory of transparency. For example, when the start-up Panjiva.com uses open government data to help businesses find overseas suppliers and enable global trade, this reflects a new normative view of the goals of transparency—open data in service of innovation and entrepreneurship rather than accountability per se.7
Open data and FOIA bear many similarities, but open data places innovation at the center of addressing public challenges and engaging with citizens. Whereas FOIA ideally promotes reasoned and deliberative discourse about what government did, open data anticipates what public institutions and citizens could do together to create value of different kinds, especially to advance evidence-based policy making. Because open data emphasizes collaboration as the form of participation,8 it may point the way to preserving democratic values in administrative governance.
However, open data is not a panacea for all social challenges, and not all government information is or should be shared as open data. Furthermore, and most critically, open data relies on the willingness of the data owner to publish the data. Open data policies generally do not have the “teeth” to compel disclosure when the information holders are reluctant to do so. In such situations, freedom of information laws can help citizens seek information that has not yet been made publicly available in the appropriate form.
As we shall explore, each approach offers crucial benefits that can bolster the legitimacy of public institutions. To realize robust transparency, open data’s collaborative tactics will need to be blended with FOIA’s adversarial right of enforcement, at least for the foreseeable future. The goal of this chapter is to chart a path toward a twenty-first-century transparency regime that takes advantage of the strengths of both FOIA and the open data model.
To that end, I first expand upon the ways open data differs from FOIA. Second, I track the evolution of the open data movement and examine the hallmarks of open data policy and legislation. Third, I look at the challenges and weaknesses of each regime for advancing its respective aims. The conclusion offers recommendations for how to blend the best of both approaches to promote evidence-based and more effective governing.
HOW FOIA AND OPEN DATA DIFFER: TIMING, INFORMATION TYPE, AND AUDIENCE
Open data differs from FOIA in three ways: it changes the timing of disclosure, focuses on different types of information, and addresses a broader audience.
First, open data shifts the default time of disclosure. FOIA institutionalizes ex post disclosure pursuant to a specific demand by an individual requester. Open data thrives on ex ante, proactive publication of whole classes of information publicly and online, often in a centralized repository such as data.gov. Under the Obama administration’s 2009 Open Government Directive, federal agencies are required to identify “high-value information” not yet available and establish a time line for publication of these data sets.
As this directive indicates, such high-value information might encompass any information that “can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; [or] create economic opportunity.”9 For example, under the Texas open data law, which imitates the federal regime, the state’s attorney general has proactively made key criminal justice data sets available for download, such as the lists of parole absconders, attorney general opinions, child support evaders, and custodial death reports, which may then be analyzed for purposes of empirical criminal justice reform.10
Second, open data emphasizes different classes of information than FOIA does. Like FOIA, open data laws and policies include data created by the government about the workings of government—what Cass Sunstein refers to in chapter 9 of this volume as “input transparency”—but they go further than FOIA in providing greater coverage of data collected by the government about the economy, environment, and society.11 Of course, FOIA requests, too, could and often do involve the latter sort of information. But FOI laws (and their many exceptions) focus on laying bare information about the way government works.
Transparency in the open data context goes well beyond the deliberations and decisions of government, the schedules of parliamentarians and ministers, the spending of treasuries, and the like. “High-value data” includes data that public institutions collect in their role as regulators (for example, workplace safety and injury records, airplane flight on-time logs, and doctors’ prescriptions), as well as information gathered in their capacities as scientific research organizations (such as weather data and information about the human genome). Although a term like “high-value” data suffers from inherent ambiguity (high value to whom?), the Open Government Directive’s definition has ushered in a movement toward more proactive disclosure of more kinds of information across all levels of government.
Third, open data assumes a broader audience than FOIA. FOIA was written, to a significant extent, with journalists in mind. Yet corporations—knowing what to look for, knowing where to look, and having the resources to navigate the complex process of filing requests—quickly became primary users of the act.12 In a further departure from its journalistic origins, many current FOIA requesters attempt to hobble the machinery of the administrative state through adversarial requests of various kinds.13
But open data anticipates, and thus far has attracted, a diverse and less consistently corporate audience. Unlike responses to FOIA requests, open data is directed to a wider public and is published for all—not just an individual requester—to reuse. Beneficiaries of this public character include computer programmers and data scientists with the skills to draw insight from the data; academic users seeking information as the basis for original research, especially empirical social science about policy making; and commercial users looking to create new products and services. This public direction and benefit of open data can be seen in practice. For example, the New York City Mayor’s Office is using municipal open data to stimulate entrepreneurship. Through its Business Atlas, the city provides small enterprises with the business intelligence they need to know where to open their restaurant or shop.14
Precisely because realizing the value from open data depends on collaboration with those willing to add value to it, the open data ecosystem is populated by actors with different incentives from those of corporate FOIA users. In some cases, for-profit companies are working side by side with nonprofits to use the data as a core asset to create data-driven products and services. One example of this is BrightScope, which worked with previously “locked up” Department of Labor Form 5500 retirement plan data to offer better decision-making tools to investors. As the founders describe it:
While BrightScope started with DOL data, as we have grown we have gathered data and information from a variety of public sources, including the Securities and Exchange Commission (SEC), the Census Bureau, and the Financial Industry Regulatory Authority (FINRA). Through the process of identifying high-value datasets and integrating them into our databases, we have encountered all different types of public disclosure.15
In another departure from FOIA, one of the biggest users of open data has been government itself, including officials wishing to make use of their own data to improve how they deliver services and make policy. For example, the Centers for Medicare and Medicaid Services (CMS) uses its own billing and payment data to improve service delivery and reduce costs.16 In addition, Chicago’s city government used its data on restaurant inspections to create an algorithm to predict food-safety violations. This project increased the effectiveness of its inspections by 25 percent.17 By giving government actors access to more and better data, and especially by giving state and local government access to the same data that federal officials have, the open data movement allows comparisons across jurisdictions and unlocks new, more innovative regulatory approaches. When the federal government ceases to have a monopoly on the data, it calls into question who is in the best and most informed position to regulate and opens opportunities for decentralized regulation.
A BRIEF HISTORY OF THE U.S. OPEN GOVERNMENT DATA MOVEMENT
On his first day in office in 2009, and fulfilling a campaign promise made in 2007, President Obama signed a Memorandum on Transparency and Open Government, declaring that “information maintained by the Federal Government is a national asset” and calling for the use of “new technologies to put information about [agency] operations and decisions online and [to make it] readily available to the public.”18
When data.gov launched in May 2009, it made forty-seven data sets searchable, turning the principles of the memorandum into initial practice by creating a tangible and central place for agencies to list and the public to find government data.19 Later the same year, as already noted, the Office of Management and Budget (OMB) directed federal agencies to release not just data about the workings of government but also “high-value” information.20 This instruction effectively broadened FOIA’s understanding of and goals for government transparency, responding to what the technologies of big data and the technologies of collaboration make possible today.
The Obama White House open data policy was part of a broader set of open government mandates. These mandates called for agencies to inventory the information they have collected, and to move—although with no definitive deadlines for completion—toward the proactive publication of certain classes of information in their entirety, such as air and water quality measures, safety records, and visitor logs.21
In 2013, the federal government recommitted to its open data policy by issuing an Executive Order on “Making Open and Machine Readable the New Default for Government Information” to advance and accelerate open data implementation in federal agencies. The order reiterated the utilitarian and instrumentalist underpinnings of the earlier policies by stating explicitly that “openness in government strengthens our democracy, promotes the delivery of efficient and effective services, and contributes to economic growth.” The order cites as examples the government’s release of both weather data and geo-locational data, which enabled weather apps and GPS devices, respectively. Entrepreneurship and innovation—rather than accountability—are emphasized: “As one vital benefit of open government, making information resources easy to find, accessible, and usable can fuel entrepreneurship, innovation, and scientific discovery that improves Americans’ lives and contributes significantly to job creation.”22
Further laws have followed, broadening the scope of data covered under open data statutes and policies. The Digital Accountability and Transparency Act, signed into law in 2014, calls for publishing all federal government spending data as open data in standardized formats by 2017.23 In late 2016, the Senate unanimously passed the Open, Public, Electronic, and Necessary Government Data Act, or the OPEN Government Data Act, which calls for inventorying and publishing all government information as open data.24 The Congressional Budget Office scored the cost of the legislation as “negligible,”25 and in March 2017 supporters reintroduced the bill, which passed the House in November 2017 as part of the Foundations for Evidence-Based Policymaking Act.26
The reintroduction of this bill speaks to the persistent popularity of open data as a new tool of policy making. In addition to a supply-side push, increasing demand for data to support efficient, evidence-based practices in government has spurred interest in open data. The authors of Moneyball for Government describe this trend as follows:
Building evidence about the practices, policies and programs that will achieve the most effective and efficient results so that policymakers can make better decisions; investing limited taxpayer dollars in practices, policies and programs that use data, evidence and evaluation to demonstrate they work; and directing funds away from practices, policies, and programs that consistently fail to achieve measurable outcomes.27
In the United States, this agenda appeals to both the right and center-left politically;28 presumably, the former sees open data as a pathway to smaller, more efficient, and less wasteful government and the latter uses open data as a tool to pursue more evidence-based social programs. The bipartisan interest in data-driven approaches to governing has been fueling demand for more access to administrative information (both personally identifiable information about individuals and anonymized, population-level open data), including the data that agencies collect about companies, workplaces, the environment, and the world beyond government.29
In parallel with the adoption of open data policy in the United States, seventy countries have signed onto the Open Government Partnership Declaration since 2011. The declaration, which copies the U.S. framework, calls for governments to commit to “pro-actively provide high-value information, including raw data, in a timely manner, in formats that the public can easily locate, understand and use, and in formats that facilitate reuse.”30 Fifteen countries have adopted the International Open Data Charter, which goes further by calling for making government data open in digital formats by default and for investing in the creation of a culture of openness.31 Separately, more than 436 partners from national governments and from nongovernmental, international, and private sector organizations have agreed to a joint Statement of Purpose on using open data to solve long-standing problems and to benefit farmers and the health of consumers.32
OPEN DATA AND FOIA: COMPETITORS OR PARTNERS?
The explosion of open data, coupled with the development of technologies to disseminate and understand it, is cause for optimism. That said, open data cannot be the sole tool for catalyzing government information-sharing. Although substantial, the current inventory of open data sets still represents only a small fraction of important government data. For instance, researchers at the World Wide Web Foundation recently found that across the globe, less than 10 percent of the data in key government data sets is fully open.33
Of course, open data’s limitations extend beyond the slow pace of implementation (at least relative to the amount of data the government possesses). Gaps exist in both the open data regime and the FOIA regime. In the next sections, I turn to a discussion of some of open data’s most pressing shortcomings before proposing practical steps to blend both regimes.
POLITICAL COMMITMENT
First, open data success depends on political commitment to transparency and collaboration. Governments of all stripes refuse to disclose data even when they should. There is a looming risk that governments will only post what is expedient and uncontroversial and seek recognition for their proactive disclosure—a practice increasingly referred to as “open-washing.”34
Especially as presidential administrations change, there is a risk that, for example, an administration that has publicly declared itself to be hostile to the census, the long-form American Community Survey, and climate change will fail to collect and publish important data on these subjects.35 These practices will be subject to the vagaries of politics. President Donald Trump has already revised, among other things, the Obama administration policy of disclosing who visits the White House. In the run-up to his inauguration, many groups raced to back up open data, such as environmental data, lest it be taken down.36
A RIGHT OF ACTION
It is important to dispel any techno-utopian strain in an open data narrative suggesting that, given enough data, all problems are solvable.37 Without a legal right of action or other robust enforcement mechanisms, there will never be enough data in the right formats. Although newly proposed federal open data legislation would compel the inventorying and publication in machine-readable format of all government data (the legislation is silent on any exceptions), the legislation speaks to how data should be disclosed without adequate assurances that the data will in fact be disclosed.38
FOIA’s legal right of action to sue (or threaten to sue) for information disclosure when records are withheld remains essential for ensuring access to data of public import until other mechanisms are put in place to mandate open data disclosures. For example, in 2013, transparency activist Carl Malamud had to use FOIA to request nine nonprofit tax returns from the Internal Revenue Service (IRS) because the agency would not make the returns available in digital form. Although disclosure of nonprofit returns is required by law and the filers submitted those returns electronically, the IRS wanted to send Malamud image files of the returns. The IRS typically took electronically filed returns, printed them out, scanned them back in, and sold DVDs with the image files.39 Malamud sued for the digital originals and won.40 The significance of the decision is that the IRS now makes all electronically filed nonprofit tax returns digitally downloadable as open data.41 Despite considerable pressure, the IRS chose not to invest in publishing the nonprofit tax returns as downloadable data until Malamud filed suit and a judge compelled the agency. When collaboration fails, litigation can sometimes create the impetus needed to overcome resistance and support reformers inside and outside of government.
PRINCIPLES FOR OPEN DATA DESIGN
Open data, at least in theory, requires inventorying all data and creates an opportunity for reasoned debate between the public and the agency about what part of that corpus to publish, with what frequency, and in what formats. But because open data is often the creature of executive action, not legislation, open data is not always a systematic process with a clear definition of high-value information.
Indeed, there often seems to be no rhyme or reason behind what is defined as “high-value” information and therefore what gets published.42 The lack of a clear or sensible publication process can frustrate and hinder the uptake of open data. Agencies post a lot of data on data.gov that no one especially wants, knows exists, or uses.43
As civil servants learn about the benefits of open data, a sense of priorities may emerge. Eventually, disclosure prioritization might be mandated through legislation. However, this hope is not certain to become reality.
What sorts of discourses and politics will be produced by a transparency regime that supports proactive disclosure but has no lodestar? If open data is oriented toward publication of easily understood and quantifiable information that technical people can turn into consumer tools (such as transit data that becomes a “when is my bus coming” app), will other kinds of substantive information be neglected in favor of easy disclosures that lead to headline-grabbing consumer tools such as the College Scorecard? What kind of information ecology results if complex information that forms the basis of government decision making, such as budget models, is neglected in favor of data that feeds consumer apps?
Drawing on the lessons of FOIA, which emphasizes disclosure of data by government about government, open data policy, too, should evolve to articulate normative guidance to agencies about what should be published online and when and how to make use of it. Although there is a certain appealing optimism to the early organic and ad hoc evolution of open data practices, there is a need to evolve beyond the unsystematic apps-over-substance nature of open data policy to focus on disclosures and their uses that lead to positive social change, advance democratic values, and lead to measurable progress, not in terms of the number of data sets released but in terms of the downstream impact on people’s lives.
Because open data enables the publication of ever-larger data sets that can then be analyzed using algorithms, it lends itself to projects that benefit from comparisons at scale (such as macro-analyses of the efficiency, effectiveness, or disparate impact of how policies and services are delivered) rather than to insights derived from a smoking gun hidden in a single “FOIA’d” document. For example, Argentina, Lithuania, and Slovakia have launched judicial open data projects that publish average caseload and court budgets to improve equity and reduce corruption in their court systems.44 In another example of open data being used to mitigate inequitable distribution, Transparency International and the Web Foundation have established an ongoing effort to help civil society and governments use open data to identify and fight corruption, especially in procurement.45 Opening the entire corpus of data about food-borne illnesses, to take one final example, provides a supply of information to match the demand for better algorithms that help Chicago allocate its restaurant inspection and enforcement resources more efficiently.46
Moving toward a more principled approach to open data also demands focusing on outcomes rather than on inputs. In the first generation, we celebrated the act of publishing data sets—transparency for its own sake—regardless of who (if anyone) used them and to what end. To strengthen the normative underpinnings of open data, however, it is important to start with a clear definition of the problem to be solved, be it corruption or human rights abuses or agricultural productivity, and use open data as the means to the end rather than as an end unto itself. Therefore, efforts to publicize the calendars of cabinet secretaries or the salaries of government officials may be weak candidates for open data efforts because, absent reason to suspect serious malfeasance, such disclosures will not drive changes in how government operates and may indeed sap the political will for furthering open data. Open data priorities should retain their broader orientation toward high-value problem-solving.
In addition to publishing clean, comprehensive, and timely data, an open data regime should invest in and prioritize coalition-building among those interested in using the data to tackle a well-defined problem. Open data encourages efficient outcomes in part because the proactive disclosure is often, although not always, accompanied by a plan for how to use the data and how to cultivate citizen engagement among those interested in applying the data for social good. These areas of alignment in which government and civil society or industry are prepared to collaborate to address a challenge are, obviously, excellent, although not exclusive, opportunities to realize value from open data.
HARMONIZING OPEN DATA AND FOIA: FIVE RECOMMENDATIONS
FOIA and open data both emphasize disclosure to the public of information created or collected by the government, but the normative underpinnings and the mechanics differ dramatically. As we have seen, open data is rooted in a theory about government legitimacy stemming from outcome-oriented effectiveness, in contrast to FOIA’s focus on honoring the public’s “right to know” what the government does. Open data replaces a justification for transparency based on moral obligation with a utilitarian rationale (evidence-based decision making).
The two approaches must be self-consciously blended in order to avoid the pitfalls of FOIA’s mechanics, which can hobble some of the functioning of government, while retaining the legal right of redress that FOIA affords to get at government secrets and taking advantage of open data’s collaborative and participatory dynamics. There are already developments toward harmonization, but more could and should be done.
First, the federal government should establish a single website for information requests, whether pursuant to FOIA or open data policy, and for posting information in response to those requests. All agencies should be required to participate in and use the new portal. The process should take advantage of the existing technology and infrastructure of data.gov, housed by the General Services Administration, and the transition should be managed by the Office of Government Information Services (OGIS), the so-called FOIA Ombudsperson, housed at the National Archives and Records Administration.
The FOIA Improvement Act passed in 2016 calls for setting up a single electronic portal for FOIA requests, and although it is silent as to where the responses to those requests will be posted, it would make sense to use data.gov (with a pointer from FOIA.gov) as a one-stop shop for requesting and searching for information.47 Data.gov is already set up to act as a clearinghouse to point to data housed across the federal government, and it has an underlying information architecture that makes it a good, easy-to-remember place from which to make data searchable. There is also a well-established process for making agency information searchable via data.gov, although it is not yet a comprehensive or robust search engine for government data. A single, unified database would make it possible for more people to find and use more information quickly. Managing that transition from agency FOIA websites to using data.gov, however, requires knowledge of the FOIA process and its personnel. Hence, OGIS should play a key role in stewarding the changeover, working closely with FOIA officers and the Department of Justice’s Office of Information Policy, which oversees FOIA policy within the executive branch.
Second, all information requested pursuant to FOIA or open data policy should be published in machine-readable formats.48 Although the FOIA Improvement Act calls for releasing information electronically, it is otherwise silent as to the format. The Office of Information Policy, in collaboration with OGIS and OMB, should issue guidance defining the electronic format required under the law as the same format as open data—namely, machine-readable formats—and should post pointers back to the data published online on data.gov. By shifting to a release-to-one, release-to-all strategy and posting information online (perhaps with a short delay for information obtained through FOIA to maintain media incentives), requesters can search for desired information prior to filing a new request. This should cut down on FOIA requests and processing times for noncontentious information and enable greater innovation by the public, such as data analysis and visualization, using the published information. Also, releasing to all, instead of just to one, may help to cut down on the politically or economically motivated nuisance requests.
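The release-to-one, release-to-all idea can be sketched as a simple lookup that a requester (or a portal) runs before a new request is filed. The following Python sketch is illustrative only: the catalog entries, the `ReleasedRecord` type, the function names, and the example.org addresses are hypothetical placeholders, not the design of data.gov or FOIA.gov.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReleasedRecord:
    title: str
    agency: str
    url: str  # placeholder address, not a real endpoint

# Hypothetical catalog of records already released to an earlier requester.
CATALOG = [
    ReleasedRecord("Restaurant inspection results 2016",
                   "Example Health Department",
                   "https://example.org/inspections.csv"),
    ReleasedRecord("Visitor logs, first quarter",
                   "Example Executive Office",
                   "https://example.org/visitors.csv"),
]

def find_released(catalog, query):
    """Return records whose title contains every word of the query."""
    words = query.lower().split()
    return [r for r in catalog if all(w in r.title.lower() for w in words)]

def should_file_request(catalog, query):
    """A new request is only needed when nothing matching is already posted."""
    return not find_released(catalog, query)

print(find_released(CATALOG, "visitor logs"))
print(should_file_request(CATALOG, "budget models"))
```

A production search would need far better matching than whole-word containment in titles, but even this crude check shows how posting responses publicly can divert repeat requests away from the FOIA queue.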
Third, bringing FOIA into the era of big data and achieving the goal of a unified database for requests and publications will be accelerated, as a practical matter, by increased dialogue among the FOIA Officers’ Council, the Chief Information Officers (CIO) Council, and chief data officers.49 The FOIA officials, under the auspices of the Department of Justice, manage the FOIA process together with the agency general counsels, whereas the CIO Council, often in collaboration with the chief technology officer or chief innovation officer or chief data scientist in an agency, has responsibility for posting open data. These communities need to collaborate and agree on data publication standards and more efficient workflow, as well as on strategies for ensuring that data publication helps to achieve the agency’s core mission.
More broadly, there is a need for a more comprehensive perspective on how the government creates and uses information that cuts across the disciplinary boundaries of law, technology, and policy as well as across agency silos. CIOs and FOIA officials currently convene separately as government-wide, interagency communities. They need to talk and work more closely with one another. Chief information officers, chief data officers, chief innovation officers, and others in charge of open data should be meeting to ask and answer how the FOIA process could embrace the collaborative nature of open data practices.
By the same token, open data managers need to understand the audiences for FOIA and their demands in order to develop more responsive approaches. The need for more internal government dialogue is mirrored by the need for more conversation between the external transparency and accountability interest groups (for example, Project on Government Oversight, Cause of Action, National Freedom of Information Coalition), which have traditionally stewarded and watched over the implementation of FOIA and other “sunshine” laws, and the open data groups (for example, Omidyar Network, Center for Open Data Enterprise, Data Coalition, Open Data Institute), which often have a more technological bent and tend to focus on engendering collaboration rather than litigation. Developing a unified legal and policy framework for information collection and publication will benefit lawyers, technologists, data scientists, and policy makers as well as the public.
Fourth, federal agencies should use their own data to improve the FOIA process by using performance analytics. Whereas certain inefficiencies, such as the possibility of partially or wholly overlapping requests, are baked into the design of the FOIA process, accidental inefficiencies might be reduced through scrutiny of the data. For example, comparing processing times across agencies reveals significant disparities. The FOIA Officers’ Council, the Department of Justice, and the White House Counsel’s Office should endeavor to implement performance improvement strategies using the reported data to reduce these variations and help agencies develop more efficient approaches.
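To make the idea of performance analytics concrete, the following is a minimal sketch of how processing times might be compared across agencies. It is illustrative only: the agency names and day counts are invented, the function names are hypothetical, and the rule for flagging an agency (a median more than twice the typical median) is an assumption rather than an established standard.

```python
from statistics import median

# Hypothetical records: (agency, days taken to process one request).
# These figures are invented for illustration, not real FOIA report data.
SAMPLE_REQUESTS = [
    ("Agency A", 12), ("Agency A", 20), ("Agency A", 31),
    ("Agency B", 95), ("Agency B", 140), ("Agency B", 88),
    ("Agency C", 25), ("Agency C", 18), ("Agency C", 40),
]


def median_days_by_agency(records):
    """Group processing times by agency and return each agency's median."""
    grouped = {}
    for agency, days in records:
        grouped.setdefault(agency, []).append(days)
    return {agency: median(days) for agency, days in grouped.items()}


def flag_slow_agencies(medians, ratio=2.0):
    """Return agencies whose median exceeds `ratio` times the median of all agency medians.

    The threshold is an assumed rule of thumb, chosen only for this sketch.
    """
    baseline = median(medians.values())
    return sorted(agency for agency, m in medians.items() if m > ratio * baseline)


medians = median_days_by_agency(SAMPLE_REQUESTS)
slow = flag_slow_agencies(medians)
```

Medians are used rather than averages so that a handful of unusually complex requests does not dominate the comparison. A real analysis would also need to account for differences in request volume and complexity across agencies.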
In this world of big data, enormous quantities of information help to generate value and make it possible to search and sort and compare within and across organizations and over time. Therefore, fifth, governments should work toward creating and storing data digitally in a searchable cloud. The notion of either ferreting out a single document or posting data sets will then become increasingly outdated.
With government data in the cloud, it will be possible to run searches broadly about and across government.50 One could, for example, anonymously search all nonprivileged memoranda to see the topics decision makers discuss over time. Furthermore, when it becomes possible to search all contracts or grant-making data across agencies and across levels of government, it should be possible to obtain a much more accurate picture of what government does and measure its impact, applying algorithms to identify effective practices or spot patterns of fraud, waste, and corruption.
Government information policy has a long way to go to realize this vision. If one wanted to conduct research today on government military spending by monitoring contract solicitations from the Department of Defense, for example, one could do so by reading solicitations and contracts published openly online. But downloading and working with that data is difficult. Analyzing these solicitations and contracts to spot trends and patterns in government spending could require going repeatedly to FBO.gov (a website where federal agencies post procurement opportunities), running a computer program to draw down a massive data file, and then comparing it to yesterday’s or last year’s entries.51 To do an even more detailed analysis—such as how eventual federal outlays map onto the timing of original contract solicitations—could involve an additional layer of merging records across unconnected databases, such as joining unstructured FBO.gov data with the Federal Procurement Data System.
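The two manual steps described above, comparing a fresh download against an earlier one and then merging the result with a second database, can be sketched roughly as follows. This is a simplified illustration under strong assumptions: the field names (solicitation_id, posted, obligated_usd) and the sample rows are hypothetical, and the sketch presumes the two sources share a common identifier, which in practice is often exactly what is missing.

```python
def new_entries(today, yesterday, key="solicitation_id"):
    """Return rows in today's snapshot whose key did not appear yesterday."""
    seen = {row[key] for row in yesterday}
    return [row for row in today if row[key] not in seen]


def join_on(left, right, key):
    """Merge each left row with the right row that shares the same key.

    Left rows without a match on the right are dropped (an inner join).
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical snapshots of solicitation listings on two consecutive days.
yesterday = [{"solicitation_id": "S-001", "posted": "2017-01-02"}]
today = yesterday + [{"solicitation_id": "S-002", "posted": "2017-01-03"}]

# Hypothetical award records from a separate procurement database.
awards = [{"solicitation_id": "S-002", "obligated_usd": 150000}]

fresh = new_entries(today, yesterday)
merged = join_on(fresh, awards, "solicitation_id")
```

Even this toy version shows where the burden falls: the researcher must keep every earlier snapshot, repeat the comparison on each visit, and find or construct a reliable key before any merge is possible.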
In such a case, we would say that the data is technically “open.” However, as a practical matter, answering the question of “who the government is contracting with, when, and to what degree” is very time-consuming and requires considerable computing time, technological expertise, and data storage available only to a select few at universities or companies with the time, talent, and curiosity to do the analysis. Even those inside government, potentially with the greatest need for such analysis, may not have the necessary resources to run such painstaking manual queries. Our public institutions too often lack the infrastructure in legal, technical, and human capacity to support evidence-based policy making.
Even when available, then, supposedly open data is often not open enough to be usable. Open data, as it stands today, is only an interim step on the pathway from a paper-based FOIA world to a future in which comprehensive public information is produced and stored digitally in real time in formats that enable agile and empirical social science. Helping open data realize its potential for collaboration requires a better approach: namely, storing clean and searchable government data in a publicly accessible cloud.
Until we get there, we need to keep both FOIA and open data in our arsenal of transparency tools, relying principally on FOIA to protect against secrecy and corruption and on open data to promote the co-creation by the government and the governed of solutions to public problems. Open data has the side effect of also strengthening active forms of citizenship and engagement.
CONCLUSION: FROM OPEN DATA TO COLLABORATIVE DEMOCRACY
The explosion of newly available data coupled with mounting evidence that data catalyzes productive, problem-solving partnerships between government and civil society suggests that open data, as a tool of governing, will continue to grow. If the trend continues, open data will lead to new empirically informed ways of holding government and others accountable, spurring consumer choice and expanding the range of approaches to tackling societal challenges. In principle, open data promotes broad-scale transparency; simplifies the disclosure process; requires publication in reusable and computable formats; focuses on disclosure of information collected by government as regulator and researcher, not exclusively on data created by government about its own workings; and, above all, gets both more “eyeballs” and machines looking at the data to spot problems, identify patterns, devise solutions, and act.
As promising as open data may be, FOIA indispensably complements it by providing a legal right of action to compel disclosure, by suggesting the kinds of data to prioritize releasing, and by disclosing who is using what data and how. Open data policies still depend on the political will to publish information. They will be strengthened when, like FOIA, they can compel disclosure, especially in reusable formats.
But the most significant impact of open data in the long run may stem less from the immediate problem-solving benefits than from the way open data fosters more active citizenship and more responsive democratic institutions. It transforms transparency policy from a means to monitor government after the fact to a mechanism for getting the public to participate actively in improving societal outcomes. By eschewing the adversarial in favor of a collaborative approach to transparency, open data reflects a radically different transparency narrative and, ultimately, a different theory of democracy whereby citizen-participants collaborate in designing and building solutions to important problems together with public institutions. This collaborative model enables governments to draw directly on the collective expertise of the population in developing creative regulatory approaches and affords the public new opportunities to participate in our democracy.
NOTES
The author is grateful to David Pozen and Michael Schudson of Columbia University for organizing an excellent conference on FOIA’s fiftieth anniversary at which many of the ideas in this chapter were presented.
1. 5 U.S.C. § 552 (2012).
2. David E. Pozen, “Freedom of Information Beyond the Freedom of Information Act,” University of Pennsylvania Law Review 165 (2017): 1097–1158, at 1111–31.
4. Pozen, “Freedom of Information Beyond the Freedom of Information Act,” at 1136.
7. For more on the collaborative theory of participatory democracy, see Beth Simone Noveck, Wiki Government: How Technology Can Make Government Better, Democracy Stronger, and Citizens More Powerful (Washington, D.C.: Brookings Institution, 2010), chap. 2.
11. In the United States alone, forty-eight cities, ten states, and the federal government had enacted open data legislation or policies by 2017. See “A Bird’s Eye View of Open Data Policies,” Sunlight Foundation, 2017, https://sunlightfoundation.com/policy/opendatamap.
12. Pozen, “Freedom of Information Beyond the Freedom of Information Act,” 1103.
13. Pozen, “Freedom of Information Beyond the Freedom of Information Act,” 1104.
17. Beth Simone Noveck, “Five Hacks for Digital Democracy,” Nature 544 (2017): 287–89, at 287, 288, doi:10.1038/544287a.
19. “Data.gov is primarily a federal open government data site…. Data.gov does not host data directly, but rather aggregates metadata about open data resources in one centralized location.” Data.gov, “About,” accessed May 26, 2017, www.data.gov/about.
20. See “Memorandum from Peter R. Orszag,” 7–8 for definition of high-value data sets.
21. See “Memorandum from Peter R. Orszag,” 3, 7–8 for a listing of several presidential open government initiatives.
22. Exec. Order No. 13,642, 3 C.F.R. § 13,642 (2013). For further reading on this topic, see Andrew Young, Christina Rogawski, and Stefaan Verhulst, “United States GPS System: Creating a Global Public Utility,” GovLab & Omidyar Network (2016), http://odimpact.org/static/files/case-studies-gps.pdf.
23. Digital Accountability and Transparency (DATA) Act of 2014, Pub. L. No. 113-101, 128 Stat. 1146.
24. Open Government Data Act, S. 2852, 114th Cong. (2016).
26. H.R. 4174, tit. II, 115th Cong. (2017).
28. The OPEN Government Data Act was, in both the House and the Senate, jointly reintroduced by a Democrat and a Republican. See Congressional Record 163 (March 29, 2017), H2557; Congressional Record 163 (March 29, 2017), S2099.
29. The Evidence-Based Policymaking Commission Act of 2016, Pub. L. No. 114-140, 130 Stat. 317, created a commission to study the use of open government data to conduct program evaluation and was introduced by Republican Paul Ryan in the House and Democrat Patty Murray in the Senate.
35. FiveThirtyEight, “Politics Podcast: Data Under Trump,” January 2, 2017. See also Edward Wong, “Trump Has Called Climate Change a Chinese Hoax. Beijing Says It Is Anything But,” New York Times, November 18, 2016.
36. See Brady Dennis, “Scientists Are Frantically Copying U.S. Climate Data, Fearing It Might Vanish Under Trump,” Washington Post, December 13, 2016.
37. See Evgeny Morozov, “Open and Closed,” New York Times Sunday Review, March 16, 2013.
38. See, for example, Open Government Data Act, S. 2852, 114th Cong. (2016).
40. Public.Resource.org v. U.S. Internal Revenue Serv., 78 F. Supp. 3d 1262 (N.D. Cal. 2015), appeal dismissed June 24, 2015, holding that the IRS must produce digital, not paper-based, copies of electronically filed nonprofit tax returns.
42. The Open Government Directive does provide a broad definition of high-value data sets. See “Memorandum from Peter R. Orszag,” 7–8. However, this has not necessarily provided sufficient direction. See discussion and criticisms of the looseness of high-value data definitions in Section I.B.
43. Even when data are, in theory, posted to data.gov, there is often a lack of investment in making the data truly accessible and usable. Users frequently find just a link to a website describing data but no actual way to download that data.
47. FOIA Improvement Act of 2016, Pub. L. No. 114-185, 130 Stat. 538.
49. The Chief FOIA Officers Council was created by the FOIA Improvement Act of 2016. See 5 U.S.C. § 552(k)(1). The CIO Council was created by Executive Order 13,011 on Federal Information Technology and later codified by the E-Government Act of 2002. The new role of chief data officer has been created at various federal government agencies but has not yet been incorporated into an interagency body.
50. Of course, moving government information to the cloud will require similar attention to that paid today to publishing open data to ensure that personally identifiable information and classified information are not inadvertently disclosed.
51. For an example utilizing such data, see Michael Z. Gill, “The Economic Benefits of Conflict? Estimating Defense Firm Responses to Major Events in U.S. Foreign Policy” (working paper), 2017, http://www.michaelzgill.com/research.