4 Ideas and Technology

Data sharing policy cannot be explained by the actors and their motivations alone. The specific type of data under consideration, along with its security, normative, and economic attributes, has important effects on the incentives and actions of these actors. The technology for collecting, producing, analyzing, and distributing data also shapes the possibilities and preferences for data sharing, and technological change can significantly alter the security, normative, and economic attributes of data.

Data Context: Security and Privacy

Even among advocates of open data, it is generally agreed that access to data should be restricted when there are legitimate security or privacy concerns.1 The challenge, of course, is determining what constitutes a legitimate concern. Many individuals recognize both instrumental arguments for security and privacy—ensuring data isn't used to cause harm—as well as an intrinsic right to privacy, and many nations have laws in place to protect sensitive data. In some cases these laws limit data access to government agency officials for use only in the specific cases for which the data was collected.2 However, as mentioned above, lack of clarity in the interpretation of these laws has the potential to lead to the release of sensitive information or, conversely, to the placement of restrictions on data that could be made available.3

Privacy

Although the government collects a great deal of personal information, this data has generally not been the focus of efforts to increase the availability of government data. This has led some in the open data community to argue that privacy is a “nonissue.”4 Indeed, much of the data held by the government, such as transit schedules, expenditure information, maps, and environmental data, does not contain, and never did contain, personal data, so the release of these datasets generally does not pose a privacy risk. This claim cannot be absolute, however, because nonpersonal information can sometimes be used to learn about an individual. For example, data about average house prices by zip code could be matched with an individual's address to estimate the value of their home.5
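The house-price example can be made concrete with a minimal sketch. The zip codes, prices, and the individual below are invented purely for illustration and do not come from any real dataset.

```python
# Hypothetical aggregate dataset: average house price by zip code.
# On its own it contains no personal information.
avg_price_by_zip = {
    "20001": 650_000,
    "20002": 580_000,
}

# Hypothetical facts someone already knows about an individual.
known_person = {"name": "A. Resident", "zip": "20002"}


def estimate_home_value(person, price_table):
    """Return the area average for the person's zip code, or None if absent."""
    return price_table.get(person["zip"])


estimate = estimate_home_value(known_person, avg_price_by_zip)
print(f"Estimated home value for {known_person['name']}: ${estimate:,}")
```

The estimate is only as precise as the aggregate allows, but the sketch shows how a nonpersonal table becomes informative about a person once it is joined with an address.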

Still, most privacy issues relate to data that is, or once was, directly associated with an individual. Raw personal data, data that includes personally identifiable information such as names or social security numbers, is rarely considered appropriate for release. Exceptions are sometimes made in the case of public officials, where transparency may be viewed as more important than privacy. More often, data must be anonymized—all personally identifiable information has to be removed—before there can be any consideration of its release.6 Access to anonymized personal information has the potential to provide significant benefits: data on travel into and around the United States can be used by the hospitality industry, standardized test scores could allow parents to better understand school performance, and sharing healthcare data could improve medical research.7

However, release of personal data, even after it has been anonymized, is controversial due to the risk of deanonymization, in which individuals can be reidentified, intentionally or unintentionally, through cross-linking of databases.8 Borgesius et al. (2015) argue that irreversible anonymization may be impossible, so even anonymized datasets should not be released openly. One option some nations have used to address this is data licenses that expressly forbid any attempts to reidentify individuals in the dataset. Borgesius et al. suggest “restricted disclosure” or “managed access” as other potential compromises for data access. For example, anonymized medical data may be released to an academic research team for use in an approved research proposal, perhaps even requiring that researchers access the data using secure government systems. Another option would be to create systems that allow third parties to query the dataset, but which only return statistical data, rather than individual information. The appropriateness of access and/or reuse restrictions, or of releasing the data at all, must be determined on a case-by-case basis. Borgesius et al. suggest considering the goal that is being pursued by releasing the data and whether there is another way to achieve that goal. If there is no other way to achieve the goal, the risks of releasing the data should be assessed, and decision-makers should determine how probable the potential harms are and how great the harm would be if they occurred.
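The idea of a query system that returns only statistical data can be sketched as follows. This is an illustrative toy, not a description of any system cited above: the records, field names, and the minimum group size of three are all assumptions.

```python
from statistics import mean

# Hypothetical anonymized records; the field names are illustrative only.
RECORDS = [
    {"region": "north", "age": 34, "cost": 1200.0},
    {"region": "north", "age": 51, "cost": 800.0},
    {"region": "north", "age": 47, "cost": 1000.0},
    {"region": "south", "age": 29, "cost": 500.0},
]

MIN_GROUP_SIZE = 3  # assumed threshold; a real system would set this by policy


def average_cost(region):
    """Return the mean cost for a region, or None if the group is too small."""
    matches = [r["cost"] for r in RECORDS if r["region"] == region]
    if len(matches) < MIN_GROUP_SIZE:
        # Refuse to answer: the result could reveal an individual's value.
        return None
    return mean(matches)


print(average_cost("north"))  # aggregate over three records
print(average_cost("south"))  # suppressed, only one record matches
```

A size threshold alone does not rule out every inference, since answers to several overlapping queries can sometimes be combined, so a sketch like this should be read as one layer of protection rather than a complete safeguard.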

Security

Just as confidential data is restricted, data that has the potential to pose a national security risk is generally not considered eligible for sharing outside the government. This includes classified information, as well as data that is unclassified but sensitive. At one extreme, some might argue for restriction of any data that could be used for nefarious purposes. However, almost no data would be released under this rubric. For example, one could argue that access to public transit schedules should be restricted, because terrorists could use them to plan and coordinate attacks. While such use is possible, restricting the schedules would clearly be an overreaction. Transit schedules provide a great deal of benefit to the general public, and terrorists would have other ways of gathering this information if the timetables were no longer published.9 Florini (2004) argues that officials must consider not only the costs of openness, but also the costs of secrecy.10

Focusing specifically on geospatial information systems, Salkin argues that a lack of access to environmental and public health data will harm public safety, rather than help it. Like Borgesius et al., Salkin argues that release of these datasets should be considered on a case-by-case basis, pointing to guidelines developed in a report by the RAND Corporation, as well as guidelines from the US Federal Geographic Data Committee (FGDC), to direct evaluation. These guidelines focus on usefulness, uniqueness, and costs and benefits. They suggest that data holders examine (1) whether the data is useful for selecting potential targets or planning an attack on a target; (2) whether the data is unique: not easily observable and not available from alternative sources; and (3) whether the security costs of sharing the data outweigh the benefits to the public of doing so.11
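The three questions above can be read as a simple screening procedure. The sketch below is one possible encoding, not the RAND or FGDC procedure itself: treating the criteria as a strict conjunction and scoring costs and benefits on a single numeric scale are both assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    useful_for_attack: bool  # criterion 1: aids target selection or attack planning
    unique: bool             # criterion 2: not observable or available elsewhere
    security_cost: float     # criterion 3: analyst's estimate, arbitrary units
    public_benefit: float    # criterion 3: analyst's estimate, same units


def should_consider_restricting(a: Assessment) -> bool:
    """Flag a dataset for safeguards only if all three criteria are met (assumed rule)."""
    return a.useful_for_attack and a.unique and a.security_cost > a.public_benefit


# Transit schedules: arguably useful to an attacker, but easily observed elsewhere.
transit = Assessment(useful_for_attack=True, unique=False,
                     security_cost=1.0, public_benefit=9.0)

# A hypothetical dataset that is useful, unique, and costly to release.
sensitive = Assessment(useful_for_attack=True, unique=True,
                       security_cost=10.0, public_benefit=2.0)

print(should_consider_restricting(transit))    # False
print(should_consider_restricting(sensitive))  # True
```

Under this reading, a dataset that fails the uniqueness test is released regardless of its usefulness to an attacker, which mirrors the transit schedule argument made earlier.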

With respect to geospatial information, the RAND report found that almost none of the publicly accessible datasets they examined met the first two criteria, as alternative information sources exist from nongovernment entities.12 The FGDC guidelines closely match those proposed by RAND, adding that even if the data does need to be safeguarded, additional steps should be taken to develop policies that limit access in intelligent ways. In particular, the FGDC suggests examining options for making changes to the data—deletions, aggregations, or other adjustments—that would increase the safety of release. Limited restrictions—on users eligible for access or on redistribution—should also be considered.13

In some cases, access to data may significantly enhance national security or public safety. Timely information about severe weather developments, for example, can help to save lives and property. Good data about public infrastructure could be critical for first responders or individuals in preparing for, or responding to, a terrorist attack. Sharing information about the environment may allow scientists to improve the understanding of global challenges, allowing governments and individuals to better mitigate or adapt to changes. Sharing crime data may help individuals to make more informed decisions about where to live, work, or travel. Analysis of public health could lead to insights that save lives. As always, the benefits of data sharing must be considered with respect to the risks of doing so.

Data Context: Normative

Normative arguments involve identifying moral or ethical responsibilities related to data sharing. Rather than looking at which policies can be implemented, a normative framework asks which policies should be implemented. This framework helps to explicitly identify the values that underlie the goals these policies should aim to achieve, including those implicit in the economic and security sections in this chapter. Debates surround both the appropriateness of data sharing processes and the importance of the potential outcomes of data access and use. Those interested in normative issues are concerned with equity and the distributional effects of data sharing policies. The applications that are enabled or deterred via data sharing have the potential to impact individuals' lives in important ways, and some argue that particular types of data uses create an imperative for greater data sharing or greater data protection.

National Security, Public Safety, and Privacy

It is a normative judgment that the government ought to consider national security, public safety, and privacy in the development of data sharing policies. There is widespread agreement that protection of national security and public safety is one of the key responsibilities of government and that citizens have a right to privacy. As discussed earlier, pursuing these goals may mean choosing to restrict access to data that could threaten national security or public safety if it were to fall into the wrong hands, or ensuring that data is only released if it has been reliably anonymized to protect privacy. At the same time, sharing information about environmental or other challenges may enhance national security and public safety, advancing scientific understanding of major global challenges or facilitating the development of new safety applications.

Distributional Effects and Equity: Repository or Public Trust

In a classic economic view, the only concern is the maximization of net social benefit, regardless of who actually captures those benefits. In reality, many are concerned with equity and the distributional effects of data sharing policies. Do the benefits accrue to the producer of the data or the consumer? Do the government and general public shoulder the costs of data collection and sharing, or does the private sector share this burden? Are existing companies and noncommercial data users favored over those who have not yet discovered or used the data?

Distributional effects of data sharing policies most commonly manifest in debates about whether it is more appropriate for the government to operate as a repository for data, maintaining data and making it freely available, or as a public trust, restricting access to data and charging for its use. Those who argue for government as a data repository emphasize that citizens have already paid for the data once through taxation and shouldn't be asked to pay again to access the data themselves.14 Advocates argue that imposing fees privileges companies that have the ability to pay, while researchers and others who could develop applications with broad social benefit will have more difficulty accessing the data. When a cost is imposed, access is especially challenging for the poorest individuals, including those in developing countries.15 However, if the argument is that citizens should have free access because they pay taxes, then this argument doesn't actually apply to data sharing outside the boundaries of the district, state, or nation that collected it.16 The argument would, however, apply to commercial users, as corporations do pay taxes.

The traditional argument in favor of government acting as a public trust is based on the idea that valuable public resources should be protected by the government, generally by restricting access, to ensure they're available for future generations. However, this logic does not apply well to data—any amount of data use today will not decrease the amount of data available in the future. If anything, it will have the opposite effect, with data use generating additional knowledge and applications that benefit future generations. In the case of data sharing, the concept of a public trust is sometimes interpreted to mean that the government should protect this valuable public asset by not allowing individuals or companies to derive private benefit, in the form of revenues, at the expense of the general taxpayer.17 Commercial users—corporations and entrepreneurs—are most likely to have the time, expertise, and incentive to analyze large datasets. These users reap disproportionate personal benefits from data, while data collection, maintenance, and distribution are financed by taxpayers. By privileging these groups, open data policies would increase inequality and reinforce the digital divide.18 Charging commercial users a fee would help to offset the costs of the data collection and sharing system to the public. Onsrud argues that the public trust model is more appropriate when the primary uses of the data are commercial, while the repository model is better when the data is useful for scientific research, transparency, or other applications that would primarily benefit the general public.19

Proponents of both models are concerned with the advantages of users with time and expertise, particularly commercial users, over the general public. Supporters of a public trust model and fee-based system argue that this returns at least some of the private gains to the public. Supporters of a repository and free data provision argue that fees would only exacerbate existing commercial advantages. Martin suggests that education and technology could be used to offset these inequalities. Agencies can advertise the availability and value of their datasets to the general public, or adopt visualization software that lowers barriers to use posed by a lack of statistical or analytical skills.20

In general, it is important to understand that any data sharing policy will have winners and losers. A decision to implement a free and open data sharing policy will benefit the value-added commercial sector, which will incorporate the free data into its products and business plans. Data sales will benefit companies that have developed applications that are sufficiently profitable to operate given data costs. Public-private partnerships, including government data purchases, benefit data collection companies that receive contract guarantees and revenues. A policy that provides exclusive use of data to scientists who helped develop a data collection system (i.e., principal investigators) benefits scientists with the resources, reputation, and expertise to apply for and win large grants. Even if these types of distributional issues were not always considered in the initial development of a policy, they often come to the fore when a change in policy is being contemplated. In these cases, existing users have a strong incentive to protect the status quo.21 Hence, this issue will generally enter agency decision-making via inputs from interest groups, rather than as a normative consideration on its own. Just as there is no economic motivation to privilege one group or another, neither existing nor potential users generally have a stronger normative claim to disproportionate benefit from a data sharing policy.

Increasing Government Transparency, Accountability, and the Right to Know

One of the most common arguments in favor of greater access to public information, and open data policies in particular, is the important role data access can play in increasing government transparency. Increased transparency is sometimes seen as an economic benefit, due to its instrumental role in increasing economic growth—discussed in more detail in the economic section below. Normative arguments instead focus on the intrinsic value of transparency—citizens’ right to know what their government is doing or to have access to the information on which government officials are basing decisions.22 Calls for transparency sometimes also reference its instrumental role in increasing government accountability, which is seen as an intrinsic good. Some effects of transparency, such as the enabling of “armchair auditors” to identify government fraud or abuse, have both economic and normative implications.23

Yu and Robinson argue that the term “open government” was originally used to refer to the release of politically sensitive government information. Its origins can be traced back to the freedom of information movement in the 1950s. Now, the term “open government data” is used to describe not only data directly relevant to government transparency and accountability, but more broadly to any government data shared freely over the Internet.24 However, some of the broadening of this definition may be appropriate. Even data that are released with the primary goal of improving government services, rather than providing public accountability, provide insight into what the government is doing. The release of environmental data, which may seem politically innocuous, can provide citizens with an insight into the information on which policy-makers are basing important decisions affecting public health and safety. A number of scholars argue that insight into government decision-making is necessary in a democracy, helping to affirm the legitimacy of the government and maintain the trust of its citizens.25

Some argue that openness can undermine trust, as citizens uncover fraud or other abuse.26 However, this is likely a short-term effect, as in the long run the increased ability to detect fraud would decrease its actual occurrence.27 When information is shared with the public, the government is more accountable for its decisions.28 Others counter that while this is true in theory, empirical evidence for an increase in transparency related to open data is limited. There is a concern among some public officials that trust can be harmed, because data is disproportionately used in negative stories and in “gotcha” journalism.29 Others point again to the advantage of wealthier individuals and corporations, in terms of time and expertise, in accessing and using government data. These individuals and organizations can use the data to track government actions and decision-making and more effectively lobby for their own preferred policy solution, once again furthering inequality.30 Like Martin, above, Halonen argues that even if open data does not empower the average citizen now, this may change as education increases and technology lowers the threshold for using public data.31

Efficiency as a Normative Goal

Before moving on to the next section, it's important to acknowledge that the pursuit of economic efficiency (maximizing net social benefit), which is implicit in the economic section below, is a normative goal. As noted in the next section, the decision to calculate these costs and benefits on a global, rather than a national or agency level, has normative implications as well. Economic efficiency is only one of a number of goals that individuals argue the government should be attempting to achieve, and the relative importance of achieving economic efficiency, compared to security or transparency, for example, is debatable. Furthermore, due to the challenges of directly measuring utility, net social benefit is almost always measured in dollars, giving preference to a narrow focus on maximization of benefits that are easily monetized, rather than on effects that help drive economic growth, but whose exact effect is difficult to measure.

Data Context: Economic

Despite these limitations, economic arguments are at the heart of many debates about data sharing policy development and changes. From an economic perspective, the goal is to determine which data sharing policy will maximize net social benefits: total benefit to society minus total costs. Yet actually carrying out this calculation, and even determining which components should rightfully be included, can be quite complex. Thus, this section goes into significant detail on this important topic, discussing the economic attributes of data, the theoretical effects of data sharing policies, and practical limitations faced in implementation. Although the economic efficiency of a policy depends significantly on the specifics of the type of data and its potential uses, this section also develops a framework to determine the relative economic efficiency of open data sharing given the economic context of the data.

Economic Attributes of Data

Much of the debate regarding the appropriate choice of data sharing policy is related to whether data is most efficiently treated as a public good and shared openly or as a commodity to be sold. To understand this debate, it is important to first understand the economic attributes of data itself. Within economics, goods are typically classified based on two dimensions: excludability and rivalry. The first questions whether users can be excluded from accessing or benefiting from a good, and the second examines whether one person's use of a good reduces the amount of that good available for others.32

A hamburger is a classic example of a pure private good, a good that is both excludable and rivalrous: it is easy to limit someone's access to the hamburger, and when one person eats it, it is no longer available for anyone else. The traditional example of a pure public good is national defense: it is not possible to exclude individuals from the benefits of national defense, and the fact that one person is protected by national defense does not limit the amount of protection available for another citizen.

Data and information as a whole don't fit neatly into these categories. For example, data can sometimes be considered rival. Privileged access to data can give the holder an advantage that is lost once others gain access to that information. This is one reason that firms sometimes choose to keep key technological information secret rather than to pursue a patent. It also explains why scientists who help design a new instrument sometimes receive a period of exclusive use of the data, allowing them the first chance to publish new findings. After corporate secrets are released or initial scientific findings are published, the data is technically still available for use, but its strategic value has decreased. For the most part, however, data can be thought of as nonrival: the fact that one person uses the data does not reduce the amount available for others. Once the data has been collected or produced, it can be used over and over by many users with no additional production costs. In economic terms, the marginal cost of providing the data is zero.

The extent to which data and information are excludable is also debatable. In some cases, once information is made publicly available, it is difficult or impossible to exclude individuals from accessing and using it. This is particularly true for simple facts or single data points, which, once made known, cannot be made private again. However, for more extensive datasets or collections of information, exclusion is possible through the use of legal restrictions in the form of licenses governing access, use, and redistribution of the data. Individuals can also be excluded from accessing or using information if that information is kept in secure systems and not made available outside the originating organization.

Although there are some exceptions, for the most part data can be thought of as nonrival and excludable. These two attributes together suggest that information is best thought of as an impure public good. The fact that data is nonrival—the marginal cost of the data is zero—means that net social benefits are maximized when every user who places a positive value on the data can access and use it. The more a dataset is used, the more knowledge is created or applications are developed, and the more benefit is generated overall. Maximizing use will maximize total social benefits.33
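The two-dimensional classification used in this section can be summarized in a few lines. The function below is a minimal sketch; the function name is hypothetical and the label strings follow the terms used in this chapter rather than any single textbook convention.

```python
def classify_good(excludable: bool, rival: bool) -> str:
    """Classify a good by the two dimensions discussed in the text."""
    if excludable and rival:
        return "pure private good"
    if not excludable and not rival:
        return "pure public good"
    # Mixed cases: the good has one public-good attribute but not the other.
    return "impure public good"


print(classify_good(excludable=True, rival=True))    # hamburger
print(classify_good(excludable=False, rival=False))  # national defense
print(classify_good(excludable=True, rival=False))   # most data, per the text
```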

One way to ensure that everyone who places a positive value on the data can access it is to make the data freely available. Since a commercial entity could not make a profit by giving its products away for free, this would require the government to collect and distribute the data. Because data is excludable, it is also possible to sell data. If a company were to sell data to every user at exactly the amount they were willing to pay, a concept called perfect price discrimination, this would also result in a situation in which everyone placing a positive value on the data would be able to access it. Therefore, in theory, it would be equally efficient to treat data as a public good and provide it for free or treat it as a commodity and sell it using perfect price discrimination. The only difference would be whether the benefits accrue primarily to the data users, in the case of free government distribution, or to the data producer, in the case of commercial data sales. Questions of efficiency come down to the extent to which each of these two methods can be implemented in practice. A review of these practical issues below shows that both policies have drawbacks in practical implementation, but the relatively larger challenges in approximating perfect price discrimination mean that free and open data policies will almost always result in greater social benefit than data sales.
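A small worked example may help make the equivalence concrete. The willingness-to-pay figures are invented, the marginal cost of provision is taken to be zero as in the text, and the single uniform price of 20 is an added comparison case that is not part of the argument above.

```python
# Hypothetical willingness to pay of four users, in arbitrary currency units.
wtp = [5, 10, 20, 50]


def outcome(values, price_for):
    """Return (user_surplus, producer_revenue) under a pricing rule.

    price_for maps a user's willingness to pay to the price that user faces.
    A user obtains the data only if the price does not exceed that value.
    """
    user_surplus = 0
    revenue = 0
    for value in values:
        price = price_for(value)
        if price <= value:
            user_surplus += value - price
            revenue += price
    return user_surplus, revenue


free = outcome(wtp, lambda v: 0)      # free provision: users keep all the benefit
perfect = outcome(wtp, lambda v: v)   # perfect price discrimination: producer keeps it
uniform = outcome(wtp, lambda v: 20)  # one price for everyone (assumed comparison case)

for name, (surplus, rev) in [("free", free), ("perfect", perfect), ("uniform", uniform)]:
    print(f"{name:8s} user surplus={surplus:3d} revenue={rev:3d} total={surplus + rev:3d}")
```

In this toy case free provision and perfect price discrimination both yield a total benefit of 85, differing only in who captures it, while the uniform price excludes the two lowest-value users and the total falls to 70.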

Free Data Provision in Practice

While in theory the marginal cost of providing data to an additional person is zero, in reality the cost of maintaining and sharing data can be significant.34 Because of this, some agencies that claim to have a free data policy do charge a fee for access. While the data itself is free, and users are not responsible for the costs associated with data collection, the user pays the marginal cost of data provision or the cost of fulfilling a user request.

In determining the marginal cost of data provision, some agencies factor in salaries of the personnel needed to process, copy, and distribute data, or the costs of the data storage and processing systems. Initial processing of raw data into datasets that are error free and properly documented with metadata can require hours of expert work. Organizations also require advanced technology for data storage, which requires maintenance and updates over time. Enabling easy access to the data requires development of a user-friendly data portal. If agencies pass on the costs of these efforts to users, the fee for data access can become quite high.

However, in most cases, the inclusion of personnel, infrastructure, and maintenance costs in marginal cost fees is inappropriate. Typically, initial processing, development of metadata, and data storage are required even if government officials only expect to share the data within the agency or maintain the data for their own future use. In these cases, making the data available to additional external users does not require significant investments in personnel, infrastructure, or maintenance.

More easily justifiable are marginal cost fees that include the cost of storage media (paper, DVDs, CDs, or hard drives, for example) and shipping. In 1962, Kenneth Arrow, an expert on the economics of information, noted that the cost of transmitting data is frequently very low.35 In some cases, governments have found that imposing marginal cost pricing results in a net cost to the government due to the need to develop and maintain a system for collecting and processing fees.36 Further, the increasing capabilities and decreasing costs of information technology, particularly the growth of the Internet, have resulted in marginal costs that truly do approach zero. Data can be provided online without any media or shipping costs. Further, if agencies do not restrict redistribution of the data, it can be retransmitted by others at no cost to the agency: truly zero marginal cost, at least from the perspective of the agency.

Perfect Price Discrimination in Practice

Perfect price discrimination is not practical in reality, because it is nearly impossible to determine each user's true willingness to pay. Attempts to negotiate prices on a case-by-case basis would be time consuming and entail transaction costs that on their own would exceed the resources of some users. Instead, perfect price discrimination is typically approximated by prices that differ depending on some attribute of the user or the data. As noted in the discussion of definitions in chapter 2, data policies sometimes differ based on whether the data will be used for commercial, public, research, or educational uses. Access conditions may differ depending on the timeliness, accuracy, or other attribute of the data. The extent to which these policies approximate the benefits of perfect price discrimination depends on how well the tiers align with various groups’ actual willingness to pay. A tiered policy is inefficient to the extent that users who place a positive value on the data cannot, or choose not to, use it because of fees or restrictions. Additional practical issues suggest that the number of users that fall into this group can be quite large.
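The gap between tiered pricing and perfect price discrimination can be illustrated with a short sketch. The tiers, fees, and user valuations below are hypothetical; the point is only to show how value is lost whenever a tier's price exceeds a particular user's willingness to pay.

```python
# Hypothetical fee schedule by type of use, in arbitrary currency units.
TIER_PRICE = {"commercial": 100, "research": 10, "education": 0}

# Hypothetical users: (tier, willingness to pay).
users = [
    ("commercial", 500),
    ("commercial", 40),  # small startup that values the data below the fee
    ("research", 25),
    ("research", 5),     # unfunded researcher
    ("education", 2),
]


def excluded_value(user_list, tier_price):
    """Sum the value lost from users whose tier price exceeds their willingness to pay."""
    return sum(value for tier, value in user_list if tier_price[tier] > value)


def excluded_count(user_list, tier_price):
    """Count users priced out under the given fee schedule."""
    return sum(1 for tier, value in user_list if tier_price[tier] > value)


print(excluded_count(users, TIER_PRICE), excluded_value(users, TIER_PRICE))
```

Here two of the five users are priced out even though both place a positive value on the data, which is the inefficiency the paragraph above describes.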

Appropriability and externalities

One group of users negatively impacted by fees and restrictions are those who do not derive personal financial return from their use of the data. For example, a scientist who uses data to improve understanding of climate change, a government analyst who uses data to improve the efficiency of Medicare, or a nonprofit organization that develops a tool to more easily monitor forest fires are all unlikely to generate any direct revenue from their activities. These users may be unable or unwilling to raise the funds needed to purchase the data, and the social benefits that would have been generated by their activities are lost. Moreover, the benefits of these activities extend well beyond the individuals involved in the particular project, resulting in positive externalities that benefit society in general.

A tiered policy that provides data for free for noncommercial uses may help to alleviate this problem, but data use will still be limited by restrictions on access and reuse needed to protect the ability to sell the data to others, and these restrictions can adversely affect data use.37 This may be due to increased transaction costs—the need to read and sign license agreements, for example. Restrictions can also limit the ability to share data that underlie research, making collaboration and replication of studies difficult. In some cases, restrictions on redistribution prevent open sharing of the research or application produced using the data. It's worth reiterating the observation by Overpeck that about half of international environmental modeling groups were restricted from sharing digital climate model data beyond the research community because of intellectual property rights imposed by governments.38 By contrast, empirical evidence suggests that the returns to science of open data policies are considerable; the accelerated pace of discovery in genomics is often credited to action taken by that community to support the rapid and open sharing of data.39

Evaluation challenges

Another complication is that for some, the value of the data is uncertain, and it is often difficult to evaluate information goods without actually working with them. An entrepreneur developing a new product may not know whether a particular dataset will prove useful in his or her application, or whether the finished product will be successful at all. Such a potential user may not be willing or able to purchase a dataset given this uncertainty. Data sellers may provide limited access to users on a trial basis to help mitigate this issue.

In practice, there are some indications that this negative effect on the value-added sector can be large. A 2000 report by Pira International entitled “Commercial Exploitation of Europe's Public Sector Information” noted that the market for public sector information in Europe, which often charged fees for access to government data, was significantly smaller than the corresponding market in the United States, where data was often provided on an open basis. The report found that cost recovery policies had resulted in a net financial loss for European governments, arguing that revenues generated from licensing fees were smaller than the taxation revenues that would have been generated from the value-added market had the data been given away for free.40

Data Sales, Cost Savings, and Distributional Effects

Given the relatively greater challenges of approximating perfect price discrimination compared to free data provision, and the existence of nonappropriable data uses, externalities, and evaluation challenges, a free and open data policy is much more likely to maximize data use and benefits than one that imposes fees and restrictions.41 Because policies incorporating data sales and restrictions have lower total social benefits compared to free and open data policies, they will only be the more economically efficient option in cases in which they decrease the total social costs associated with data collection and provision. These cost savings can come about in two ways.

First, data sales enable the private sector to get involved in data provision. It is generally accepted that the private sector is more efficient than the government and has incentives to continually lower costs through innovation. If the private sector is able to develop and operate the data collection and distribution systems at a lower cost than the government, then this decrease in system cost would represent a decrease in total social costs, as well. For example, if a satellite system would have cost the government $100 million to build and operate, but a commercial company can do the same job for $90 million, this represents a savings of $10 million for society as a whole.

Second, to the extent that data collection is funded by resources raised through sales to the private sector rather than funds raised through general taxation, it is possible to avoid the deadweight loss associated with taxes, which microeconomic estimates suggest could be up to 30 percent.42 These savings can occur when government agencies purchase data from commercial providers, decreasing the total amount of government funds needed to access data. They can also be secured if government data producers sell their data to private actors and use the funding to offset the cost of activities that would otherwise have been provided through regular budgeting procedures.

When considering this second type of cost savings, it is common for governments and others to conflate true societal cost savings with distributional effects. It is important to remember that only a portion (up to 30 percent) of the decrease in government spending represents true cost savings from a societal perspective. For example, if the government spends $100 million to build its own satellite and raises $10 million in revenue from data sales to commercial entities, this looks like $10 million in savings from the government agency's perspective. However, from a societal perspective, that $10 million cost has simply been transferred to another sector within society—commercial data users. True societal savings in this case would only be $3 million (or less), representing the up to 30 percent savings resulting from efficiencies associated with using funds that were not generated through general taxation.43

This also means that if the government agency in this example were to sell its data to other government agencies, or researchers funded by government grants, there would be no societal cost savings at all: the costs would only have been shifted around within the government and would still have been paid for with funds raised through general taxation. In fact, the need to negotiate the fees and exchanges among agencies would likely have added transaction costs.

If our definition of society includes the global community, this finding holds for data sales to foreign governments, as well: if a government agency sells data to a foreign government agency, costs are simply transferred from one national government to another. They are still dependent on general taxation and subject to the associated deadweight loss. There is no change in total social costs and no true cost savings. This means that data sales to a foreign government are equivalent—from an economic efficiency perspective—to other types of cost sharing arrangements: for example, cofunding the development of a new data collection system.
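The distinction between agency savings and true societal savings can be sketched in a few lines of Python, using the satellite example above. This is only an illustration: the function name, the buyer categories, and the treatment of the 30 percent figure as a flat rate are assumptions introduced here, and the rate is an upper bound rather than a point estimate.

```python
# Illustrative sketch; the names and buyer categories are assumptions made
# for this example and do not come from the analysis itself.

# Buyers whose payments ultimately come from general taxation. Foreign
# governments belong here only if "society" is defined as the global community.
TAX_FUNDED_BUYERS = {"domestic_agency", "grant_funded_researcher", "foreign_government"}


def true_societal_savings(sales_revenue, buyer, deadweight_rate=0.30):
    """Upper-bound estimate of societal savings from data sales revenue.

    deadweight_rate is the assumed deadweight loss per dollar raised through
    general taxation (estimates of up to 30 percent are cited in the text).
    """
    if buyer in TAX_FUNDED_BUYERS:
        # The cost is only shifted between tax-funded budgets: no true savings.
        return 0.0
    # Revenue from private buyers replaces tax funding, so the only real
    # saving is the deadweight loss that is avoided.
    return sales_revenue * deadweight_rate


agency_view = 10_000_000  # the agency counts all sales revenue as savings
society_view = true_societal_savings(10_000_000, "commercial_user")
print(f"Agency perspective:   ${agency_view:,.0f}")
print(f"Societal perspective: ${society_view:,.0f} at most")
```

The sketch ignores the transaction costs of negotiating fees, which would make the result for tax-funded buyers slightly negative rather than zero.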

That is not to say that distributional effects are of no importance. A government agency may have a strong incentive to decrease its own costs, even if those costs are simply shifted to another agency in the same government. Some argue this provides a fairer distribution of costs and a more accurate picture of the distribution of data users. Similarly, a government may see significant benefits in shifting some of the costs of a system to a foreign government. In practice, particularly in the short run, agency budgets are usually relatively fixed. To the extent that cost sharing arrangements allow agencies to engage in the collection of data they would not otherwise be able to procure, both agencies and data users may favor these arrangements, regardless of the lack of net economic impact. However, it is important that decisions about these distributional effects be recognized as normative and practical, not economic, decisions.

Finding a Balance between Open Data and Data Sales

In determining the most economically efficient data sharing policy, a government agency should determine how alternative policy options would affect both the total costs of data collection and provision and the total benefits derived from the data.
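As a rough sketch of this comparison, the Python below ranks policy options by net social benefit (total benefits minus total costs). The class, the function, and all dollar figures are hypothetical and chosen only for illustration; in practice both inputs are hard to estimate.

```python
from dataclasses import dataclass


@dataclass
class PolicyOption:
    name: str
    total_benefits: float  # estimated total social benefits of data use
    total_costs: float     # estimated total social costs of collection and provision

    @property
    def net_benefit(self) -> float:
        return self.total_benefits - self.total_costs


def most_efficient(options):
    """Return the option with the largest net social benefit."""
    return max(options, key=lambda option: option.net_benefit)


# Hypothetical figures in millions of dollars. Data sales lower social costs
# slightly, but fees and restrictions are assumed to reduce data use and
# benefits by more.
options = [
    PolicyOption("open data", total_benefits=150.0, total_costs=100.0),
    PolicyOption("data sales", total_benefits=120.0, total_costs=97.0),
]
best = most_efficient(options)
print(best.name, best.net_benefit)
```

With these assumed numbers the open data option wins. Data sales would come out ahead only if the cost savings exceeded the lost benefits.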

To calculate the potential for cost savings, governments need to estimate the cost of developing a government-owned data collection system as well as the estimated revenues from data sales to commercial entities and/or the cost of purchasing equivalent data from the commercial sector. These can be difficult questions to answer, particularly if the government is developing a new type of data collection system for which costs are not well understood, or if a commercial market for data sales does not already exist in the area of interest. However, as discussed below, the potential for cost savings will typically vary depending on the size of the commercial market for data, thus making it possible to make informed data policy decisions based simply on general estimates about the market.

When calculating the total social benefits of a data policy, the value of data sold at market prices can be estimated by total revenues. However, quantifying the value of data that is sold below market prices or given away for free is more difficult. Even if the data is used to produce a commercial value-added product or service, it is often difficult to determine the proportion of resulting revenue that can be attributed to a particular dataset. Applications developed by government agencies or nonprofits are even more difficult to value. Advancing scientific knowledge, improving the quality of public policy, and improving transparency are all common uses of government data that are associated with economic growth, but whose exact benefits are exceedingly difficult to quantify.44

Further, estimating the relative effects of various levels of fees or types of restrictions is particularly difficult. Agencies must ask: When a price is put on the data, are many potential commercial users left out of the market, or very few? How much of a disincentive is data cost to new entrepreneurs uncertain about the potential of their product? To what extent are scientists and others able to raise funds to support research that requires the purchase of data? If the data is made freely available for noncommercial uses, to what extent do restrictions on access and redistribution limit its benefits? Luckily, as in the case of cost savings, it is possible to relate the relative social benefits of an open data policy to a single key attribute: the extent to which the data has broad, as opposed to narrow, noncommercial uses.

Table 4.1 Scenarios for Relative Economic Efficiency of Open Data vs. Data Sales

Commercial Market | Narrow Noncommercial Uses (less benefit to open data) | Broad Noncommercial Uses (greater benefit to open data)
Nonviable Commercial Market (government funding required; less savings from data sales) | I: Open data policy or tiered data policy | II: Open data policy
Viable Commercial Market (no government funding required; greater savings from data sales) | III: Data sales | IV: Open data policy or tiered data policy

There is no simple formula to determine the most economically efficient data policy, but it is possible to say where free data provision and data sales each have a relative advantage, and which types of policy designs or public-private interactions may prove most economically efficient in various situations.

The first key attribute is the viability of a commercial market for the data. The size of this market determines the extent to which a commercial data collection entity would be likely to be successful, and the extent to which data sales can generate revenue that would decrease the reliance on general taxation, thus decreasing the total social costs of the program. Importantly, the definition of commercially viable here means that a private entity engaging in data collection and distribution is sustainable without requiring funding from the government in the form of investment or data purchases. Since raw or minimally processed data is an intermediate good, this would mean that there were a sufficient number of value-added companies willing and able to purchase the data for use in value-added products or services to allow the data collection company to recoup the costs of data collection and distribution. It's possible, even likely, that a data collection company would also sell some value-added products or services on its own. If it could generate a profit on the sale of these products and services alone, or in combination with raw data sales, this would also be considered a viable commercial market. Commercial viability is presented as a dichotomous variable, but it can also be thought of as a spectrum. To the extent that there is a large commercial market for a particular type of data, even if it is not fully commercially viable, it may present opportunities for economic efficiency similar to those of fully commercially viable situations.

Much of the benefit of free and open data policies comes from the ability for the data to be used by a broad range of users, particularly those producing products and services that provide broad social benefits rather than private financial benefit. (Open data policies also benefit the commercial value-added sector, but data sales represent less of a barrier for these actors.) The second of the two key attributes thus looks at the breadth of noncommercial uses of the data. Is the data useful primarily for fulfilling one narrow government need or addressing one specific scientific question? Or is the data likely to be broadly useful for a number of government agencies, nonprofits, and researchers? This can be a difficult question to answer, and some open data advocates would argue that all data has the potential for broad uses, even if these aren't obvious when data collection and sharing policies are being planned. However, some distinction between these situations is typically possible, at least in the short term, and the concept is useful as a general guide for decision-making.

The quadrant into which a particular type of data falls has implications for the relative economic efficiency of free and open data policies versus data sales. The quadrant also has important implications for the types of public-private partnerships or interactions that are likely to be most successful and generate the greatest economic benefit overall. Below, each of these four situations is examined and discussed.

Nonviable Commercial Market, Broad Noncommercial Uses: Open Data Policy

In cases in which the commercial market for data is not viable, but there are many noncommercial uses for the data—for research or to conduct government activities, for example—then it will be most efficient for the government to collect the data and provide it for free. This is because the small commercial market means that data sales do not offer significant opportunities for cost savings, and the broad noncommercial uses mean that free data provision will be particularly important for ensuring data use and maximizing data benefit.

This doesn't mean that there are no opportunities for interaction with the private sector. Many governments already contract with commercial firms to build data collection systems that will be owned and operated by the government. This allows governments to share the fixed costs of facilities and labor needed to build these assets with other agencies and entities, domestically and internationally. It also results in commercial competition that can drive innovation and advancements in the technology sector as a whole. Other agencies outsource data collection to the private sector, purchasing the data itself, rather than the data collection system. If the data is purchased with an open license, the government would benefit from commercial efficiencies, but still have the same ability to share the data as it would if the data collection system itself was government owned, and it would thus retain all of the benefits of open data sharing.

Nonviable Commercial Market, Narrow Noncommercial Uses: Open Data Policy or Tiered Data Policy

Some data collection programs are designed to meet a narrow government need, and do not have broad commercial or public applications. In these cases, the advantages of adopting an open data policy are relatively low, and data sales are not likely to result in significant savings. In this type of situation, almost any data policy can be justified. The government could collect the data and make it freely available, hoping to generate benefits from at least some additional uses of the data. There is always the possibility that new uses will be identified that weren't foreseen early in planning. Since the effect of restrictions on data use is likely to be low, the government may choose to sell some of the data itself, offsetting just a small portion of its costs. If the commercial market is large enough (but not fully viable), the government may choose to act as an anchor tenant for a commercial firm, again decreasing costs by taking advantage of commercial efficiencies and reducing the amount of government funding required.

Viable Commercial Market, Broad Noncommercial Uses: Open Data Policy or Tiered Data Policy

It's more difficult to determine the optimal data policy in the case of a viable commercial market and broad noncommercial uses. On the one hand, broad noncommercial uses suggest that an open data policy is particularly important for ensuring data use and maximizing benefits. On the other hand, the existence of a viable commercial market means that there is the potential for significant savings through a decreased reliance on funding generated through general taxation and efficiency advantages offered by the commercial sector.

To attempt to balance these two goals, an agency could pursue a tiered policy that maximized data availability for noncommercial uses, while allowing for some data sales by the commercial sector. For example, the government could purchase data from the commercial entity under the condition that the data could be shared freely for all noncommercial purposes. If the agency used persistent identifiers and put the onus for compliance on data users, it would not need to restrict data access or redistribution. Value-added companies that ignored the licensing restrictions and used the data in a commercial product would risk being fined or shut down if this was discovered.

Tiered policies based on some particular attribute of the data, rather than the type of user, could also allow for a balance between data sales and open data. For example, the government could purchase all data that is older than two weeks, or one year, or five years, depending on the relative commercial and noncommercial value of these types of data. The government may be able to purchase data that has been degraded or aggregated to serve noncommercial uses, while the highest-precision or most detailed data is sold to private actors.
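An age-based tier of this kind amounts to a simple embargo rule. The Python sketch below shows one way it might be expressed; the function name and the two-week default are assumptions for illustration, and the appropriate threshold would depend on the relative commercial and noncommercial value of older data.

```python
from datetime import datetime, timedelta


def is_openly_available(collected_at, now, embargo=timedelta(weeks=2)):
    """Return True once the data is older than the embargo period.

    Data newer than the embargo is assumed to remain available only through
    commercial sales; older data is assumed to be released openly.
    """
    return now - collected_at > embargo


now = datetime(2020, 1, 31)
print(is_openly_available(datetime(2020, 1, 1), now))   # 30 days old
print(is_openly_available(datetime(2020, 1, 25), now))  # 6 days old
print(is_openly_available(datetime(2020, 1, 1), now, embargo=timedelta(days=365)))
```

A tier based on precision or aggregation level would follow the same pattern, with the test applied to the resolution of the data rather than its age.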

Viable Commercial Market, Narrow Noncommercial Uses: Commercial Data Policy

When there is a viable commercial market, normal market incentives should result in the creation of private entities to fulfill this demand. If the data is also useful for fulfilling a narrow government need, and does not seem to have broad noncommercial uses, there is not a strong incentive for the government to make the data widely available under an open data policy. Instead, the government can purchase this “commercial off-the-shelf” data and expect to have significant savings compared to a situation in which it built the full data collection system itself.

It's worth reiterating here that there is a good deal of uncertainty involved in estimating whether the noncommercial uses of the data are broad or narrow, and there are benefits even for the commercial value-added sector of free and open data policies. For these reasons, to the extent that the government can negotiate agreements that increase the ability to share data (using the types of arrangements in the viable commercial market, broad noncommercial uses scenario, for example) without significantly increasing cost, this would be a worthwhile option to pursue.
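The four scenarios in Table 4.1 can be summarized as a simple lookup, sketched below in Python. The names are hypothetical, and treating both attributes as yes/no values is a simplification: as noted above, commercial viability is better thought of as a spectrum, and the breadth of noncommercial uses is often uncertain.

```python
# Keys: (viable_commercial_market, broad_noncommercial_uses)
# Values: (quadrant in Table 4.1, policy suggested for that quadrant)
RECOMMENDED_POLICY = {
    (False, False): ("I", "open data policy or tiered data policy"),
    (False, True): ("II", "open data policy"),
    (True, False): ("III", "data sales"),
    (True, True): ("IV", "open data policy or tiered data policy"),
}


def recommend_policy(viable_commercial_market, broad_noncommercial_uses):
    """Look up the quadrant and suggested policy for a type of data."""
    return RECOMMENDED_POLICY[(viable_commercial_market, broad_noncommercial_uses)]


quadrant, policy = recommend_policy(viable_commercial_market=False,
                                    broad_noncommercial_uses=True)
print(f"Quadrant {quadrant}: {policy}")
```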

Data Context: Technical

Technological developments have had, and continue to have, the ability to fundamentally change the landscape in which data sharing decisions are made. Technology affects the types and volume of data that can be collected; it affects how much data can be stored and how easily it can be accessed and distributed. Technology and technical standards affect the extent to which data can be processed, analyzed, linked, and otherwise manipulated to enable the many applications that have been discussed previously. These possibilities in turn affect the security or vulnerability of the data, and they affect the economic costs and benefits associated with sharing. Technological change can create new challenges and new opportunities, and new problems and new solutions.

Large-Scale Technology Developments

It's impossible to explain changes in data sharing policies over time without understanding changes in the underlying technologies that enable data collection and sharing. Many of these technological changes apply broadly across sectors, particularly the transformations in electronics and information technology. Others may be driven by the development of specialized equipment, new techniques, or new algorithms for collecting, analyzing, or using data in a particular field.

Before the advent and spread of computers and the Internet, the volume of data collected was often smaller, and the physical space it occupied was larger. The methods for copying and distributing data could be time consuming and expensive, limiting the incentives to share data on the part of the producer. It was more difficult to discover the existence of new datasets, particularly outside your own field. This decreased the number of requests for data on the part of consumers.

With the advent of computers and their accompanying storage devices, both the physical space required and the cost of distributing data steeply declined, sometimes by orders of magnitude. Evans and Wurster noted that in the late 1990s, printing, binding, and shipping a set of encyclopedias cost about $200, while producing an encyclopedia on CD-ROM cost about $1.50.45 As access to the Internet spread, it became easier to discover what datasets existed and where they could be found. While the creation of a well-designed, user-friendly portal is a nontrivial activity, once it is created, the marginal cost of providing data to an additional user is nearly zero. Data users can search, select, and download data on their own.46
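To make the scale of this change concrete, the short Python sketch below computes the cost ratio from the encyclopedia figures and shows how the average cost per user of a portal falls as use grows. The portal cost and user counts are hypothetical numbers chosen only for illustration.

```python
print_cost = 200.00  # printing, binding, and shipping a set of encyclopedias
cdrom_cost = 1.50    # producing the same encyclopedia on CD-ROM

ratio = print_cost / cdrom_cost
print(f"Print distribution cost roughly {ratio:.0f} times as much as CD-ROM")


def average_cost_per_user(fixed_cost, marginal_cost, users):
    """Average cost per user when a one-time fixed cost is spread over all users."""
    return fixed_cost / users + marginal_cost


# Hypothetical portal: $500,000 to build, near-zero marginal cost per user.
for users in (1_000, 100_000):
    cost = average_cost_per_user(500_000, 0.0, users)
    print(f"{users:>7,} users: ${cost:,.2f} per user")
```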

Emerging technology developments have the potential to create both challenges and opportunities. In some cases, the volume of data being collected exceeds the ability to transmit or process it with existing technologies. Big data, for example, poses new opportunities for analysis, but also challenges to widespread access.47 The emergence of cloud computing is one potential solution to this challenge. In the future, users may not actually copy or move the data to their own systems at all, instead accessing and manipulating it in the cloud. Other technological advances, such as the development of data visualization tools, will also help to improve accessibility of data, continuing the trend of reaching new users without expertise in the given field, or even without any advanced data analysis skills at all.48

Technical Opportunities and Challenges That Aren't Really Technical

A number of scholars list technical issues as one of the key barriers or enablers of data sharing.49 While it is true that developing a robust, user-friendly data portal or data sales interface is a technical challenge, it is one that can be met using existing technology and knowledge.50 Therefore, with regard to policy-making, I argue that the development of such a system is more correctly thought of as an economic or political challenge. The cost of developing a portal should be taken into account in an agency's calculations of net social benefit. The availability of funding to support these activities is an important input from national-level policy-makers. If the agency is interested in developing a data portal and the funds are provided to implement it, then technology will not pose a significant barrier.

Similarly, ensuring high data quality, including relevant metadata, and documenting and sharing algorithms are important tasks that undoubtedly affect the success of data sharing efforts and require technical expertise.51 However, if the funds are available to support the hiring of appropriately trained personnel to complete these tasks, a government agency would not find the technical challenges of this activity to be a major barrier.

Many authors point to the importance of international technical standards, and substantial evidence supports the idea that the existence of standards can greatly enhance effective data sharing.52 However, agreeing on technical standards is often more of an international relations challenge than a technical one, and in this model would be considered in the context of nongovernmental and intergovernmental organizational activities.

Finally, Zuiderwijk and others point to the importance of building feedback mechanisms into data sharing infrastructures to better understand how the data is used and improve user experiences.53 This feedback is important, but, as implied by Zuiderwijk, the primary reasons are related to improved interactions with user groups and a better ability to explain the value of data sharing to national-level policy-makers and others.54 These issues are captured in the external actor sections of this model, focusing on the concept of this interaction as the key factor, rather than the technical means by which this interaction is achieved.

Feedback Loops and Other Dynamics

It is important to note that this model is dynamic, not static. The right choice of data sharing policy at one time may no longer be the best option later on. Almost every element of the model is subject to change over time. Bureaucratic mission or preferences may change, although this process is generally slow. National-level policy-makers, their directives, and budgetary support can change much more rapidly. Nongovernmental and intergovernmental groups may form or disband, increase or decrease their activity level, or even change the nature of their activities.

Changes in the security, economic, normative, or technical attributes of data and data systems can also occur, and in many cases these changes are interrelated. In particular, new technology developments may change not only the technological possibilities, but also the nature of the security or privacy issues faced or the economics of data sharing. New types of security threats may increase concern about existing practices. Commercial innovations can change the economic effects of data sharing. The rise of new norms in society, such as calls for government transparency or concerns about public-private interactions, may raise the profile of normative issues in policy development.

Sometimes these changes come not from external developments, but from improved understanding of the existing situation. Sometimes the answers to the questions raised in this chapter, and posed explicitly in the table below, are partially or completely unknown. As this information becomes available, through study, or through trial and error, data sharing policies can be reevaluated and updated as necessary. For example, a study examining the economic value of downstream applications developed with free government data can provide better insight into whether the data sharing policy is resulting in positive net social value. Experimenting with a data sales model after implementing an open data policy, or vice versa, may provide useful data with regard to the types of users most interested in data use or the willingness of the data users to pay. Actively seeking to answer these questions can improve our understanding of data policy development, or improve development of the policies themselves.

Data Sharing Policy Development Model Framework

Each of the elements of the model presented above presents key questions that have implications for the success of a chosen data sharing policy. The following table presents these key questions along with their implications. This table is designed to be used as a tool to examine and understand an existing policy or as a checklist to guide the successful development of a new data sharing policy.

Table 4.2 Key Questions and Implications in Data Sharing Policy Development

Government Agency
What are the agency's goals? Does the agency view data collection, use, and sharing as central to its activities or as a secondary issue/ byproduct? Achieving the mission or goals of the agency is often a high priority for officials developing a new policy.
What is the dominant professional culture within the agency? (For example, does the agency have a culture of secrecy or of openness?) Agency culture impacts the type of data sharing policy the agency views as appropriate or natural in their circumstances. Agency culture is often affected by the dominant professional culture within the agency and agency history.
National-Level Actors
Are there national-level initiatives or laws supporting particular data sharing policies? Laws or initiatives that encourage or require the release of data generally result in some agency action on the issue, but can often be subverted in implementation (e.g., slow or partial compliance, poor technical infrastructure and/or data quality, little or no user support, etc.).
Are there national-level initiatives or laws forbidding the release of some types of data? Laws forbidding the release of data are generally successful in meeting their objectives. However, if the interpretation is not clear, these laws may dampen data sharing more than intended, as agency officials worried about inadvertently breaking the law tend to err on the side of restricting access.
Is there budgetary support to cover the costs of data collection, maintenance, and distribution? Is there budgetary support to offset losses from data sales, if applicable? Sufficient budgetary support (or lack of support) is a key enabler for agencies wishing to implement their data sharing policies. More open data sharing generally requires more budgetary support.
Is there an understanding among national policy-makers of the potential revenues and/or nonquantifiable benefits of data use? The ability to communicate both the commercial and noncommercial benefits of data use is key to gaining and maintaining political support.
Nongovernmental Actors
Are there existing companies, researchers, or others who benefit from maintaining the current policy? Existing data users will generally have an incentive to lobby to maintain the existing policy.
Are there companies, researchers, or others who would benefit from a change in the data sharing policy? Are these potential users aware of the data's availability and potential benefits? These users have an incentive to lobby for a change in data sharing policy. However, these users may not be aware that the data exists or that is relevant to them.
Are there nongovernmental organizations (e.g., professional associations, nonprofit organizations) active in this area? What activities (if any) have these NGOs taken to enable data sharing and/or raise awareness of data sharing issues and activities? Nongovernmental actors can be instrumental in developing technical standards and building and reinforcing professional norms related to sharing. They can also increase the visibility of existing challenges or opportunities.
Intergovernmental Actors
Are there intergovernmental organizations active in this area? What activities (if any) have these IGOs taken to enable data sharing and/or raise awareness of data sharing issues and activities? Intergovernmental organizations may present a forum for developing technical standards. They can play an important role in building and reinforcing international norms related to sharing. They can also increase the visibility of existing challenges or opportunities.
Security and Privacy Attributes
Are there legitimate privacy, public safety, or national security concerns with respect to sharing the data? Releasing data has the potential to improve or harm privacy, public safety, and national security depending on the circumstances. Careful analysis of the risks and opportunities posed by various data sharing policies is necessary.
Economic Attributes
Are there broad noncommercial uses for the data? To what extent would data use be inhibited under the data policy being considered? Free and open provision of data is particularly important for data that has broad noncommercial uses. Policies that restrict access or redistribution are likely to decrease total societal benefits.
Is there a viable commercial market for the data? To what extent can total costs of the program be decreased through data sales? If there is a viable commercial market for data, or the size of the commercial market is relatively large, there may be opportunities for significant cost savings through data sales to the private sector. If these cost savings are larger than the decrease in total benefits (above), data sales and tiered policies may be efficient.
Normative Attributes
What is the relative value the agency places on public safety, national security, privacy, efficiency, equity, and transparency concerns? Each of these can be a legitimate goal in developing a data sharing policy, and agencies will often have to make trade-offs among them. Their relative importance is often related to agency norms.
Is the data primarily useful for commercial applications? If the data is primarily useful for commercial purposes, data sales may decrease the extent to which the general public pays for a good that results in private gains for commercial users (though some argue that commercial users also pay taxes and thus have a right to benefit from the data without paying for access).
Does the data have the potential to significantly improve government transparency and accountability? Do citizens have a “right to know” this information? Free and open policies are more appropriate when the data provides significant transparency benefits. This may be especially true for politically sensitive data about the government, but also applies to data upon which the government relies to make policy decisions.
To what extent can technology be used to lower the barriers to data access and use for average citizens? Some concerns with regard to equity of access can be alleviated by implementing technologies, such as visualization software, that make the data accessible to those without significant time or advanced statistical skills.
Technical Context
What technological limitations (in data collection, processing, distribution, use, etc.) may impact data sharing in this area? How might these change in the future? Technology can affect the limits of what is possible in data sharing as well as the calculations of security, economic, and normative risks and benefits.
Does the organization have the technology and expertise to implement its data sharing policy (e.g., build a robust data sharing infrastructure) while following technical best practices? This should be seen primarily as a political challenge, requiring adequate political support and funding for these activities (discussed above under national-level actors). Lack of attention to good technical implementation may also reflect agency incentives and priorities related to data sharing (i.e., neglecting technical implementation is one way agencies may subvert national directives).
Are there adequately defined international technical standards, including relevant metadata, in this field? This should be seen as an issue related to nongovernmental and intergovernmental actors. Are there collective action challenges that have limited the development of these standards?
Does the agency have a robust infrastructure to allow user feedback with regard to the data? This should be seen as an issue of agency interaction with nongovernmental, intergovernmental, and national-level actors. A robust technical system is important, but the key challenges are in the engagement with these organizations on a political level.

Notes

1.  Peter Arzberger et al., “Promoting Access to Public Research Data for Scientific, Economic, and Social Development,” Data Science Journal 3 (2004). Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.”

2.  Chris Martin, “Barriers to the Open Government Data Agenda: Taking a Multi-Level Perspective.”

3.  Antti Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.”

4.  Marco Fioretti, “Open Data: Emerging Trends, Issues and Best Practices,” in Open Data, Open Society (Laboratory of Economics and Management of Scuola Superiore Sant’Anna, Pisa, 2011).

5.  Francesco Molinari and Jesse Marsh, “Does Privacy Have to Do with Open Data?” (paper presented at the Conference for E-Democracy and Open Government, 2013). Frederik Zuiderveen Borgesius, Jonathan Gray, and Mireille van Eechoud, “Open Data, Privacy, and Fair Information Principles: Towards a Balancing Framework,” Berkeley Technology Law Journal 30 (2015).

6.  “Open Data, Privacy, and Fair Information Principles: Towards a Balancing Framework.”

7.  Chris Clifton et al., “Privacy-Preserving Data Integration and Sharing” (paper presented at the Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004).

8.  Barry and Bannister, “Barriers to Open Data Release: A View from the Top.”

9.  Harlan Onsrud, “Access to Geographic Information: Openness versus Security,” eds. S Cutter, D Richardson, and T Wilbanks, Geographic Dimensions of Terrorism (Routledge, 2003).

10.  Ann Florini, “Behind Closed Doors,” Harvard International Review 26, no. 1 (2004).

11.  Patricia Salkin, “GIS in an Age of Homeland Security: Accessing Public Information to Ensure a Sustainable Environment,” William & Mary Environmental Law and Policy Review 30 (2005). John C Baker et al., Mapping the Risks: Assessing the Homeland Security Implications of Publicly Available Geospatial Information (Rand Corporation, 2004). Federal Geographic Data Committee, “Guidelines for Providing Appropriate Access to Geospatial Data in Response to Security Concerns,” National Spatial Data Infrastructure (NSDI) (2005).

12.  Baker et al., Mapping the Risks: Assessing the Homeland Security Implications of Publicly Available Geospatial Information.

13.  Committee, “Guidelines for Providing Appropriate Access to Geospatial Data in Response to Security Concerns.”

14.  Geospatial Information & Technology Association, “Free or Fee: The Governmental Data Ownership Debate,” GITA White Paper (2005). Joseph E Stiglitz, “On Liberty, the Right to Know, and Public Discourse: The Role,” ed. Matthew J Gibney, Globalizing Rights: The Oxford Amnesty Lectures 1999 (2003). Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.” Uhlir and Schröder, “Open Data for Global Science.”

15.  Ibid.

16.  Geospatial Information & Technology Association, “Free or Fee: The Governmental Data Ownership Debate,” GITA White Paper (2005).

17.  Harlan J Onsrud, “Tragedy of the Information Commons,” ed. D R F Taylor, Policy Issues in Modern Cartography (Elsevier, 1998).

18.  Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.”

19.  Onsrud, “In Support of Cost Recovery for Publicly Held Geographic Information.”

20.  Martin et al., “Risk Analysis to Overcome Barriers to Open Data.”

21.  Ibid.

22.  Stiglitz, “On Liberty, the Right to Know, and Public Discourse: The Role.” Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.”

23.  Ibid.

24.  Yu and Robinson, “The New Ambiguity Of ‘Open Government’.”

25.  Martin et al., “Risk Analysis to Overcome Barriers to Open Data.” Florini, “Behind Closed Doors.” Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.” Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.”

26.  Janssen, Charalabidis, and Zuiderwijk, ibid.

27.  Martin et al., “Risk Analysis to Overcome Barriers to Open Data.”

28.  Dawes, “Interagency Information Sharing: Expected Benefits, Manageable Risks.” Agrawal, Kettinger, and Zhang, “The Openness Challenge: Why Some Cities Take It on and Others Don't.”

29.  Barry and Bannister, “Barriers to Open Data Release: A View from the Top.” Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.”

30.  Barry and Bannister, “Barriers to Open Data Release: A View from the Top.” Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.”

31.  Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.”

32.  Hess and Ostrom, “Introduction: An Overview of the Knowledge Commons,” eds. Charlotte Hess and Elinor Ostrom, Understanding Knowledge as a Commons: From Theory to Practice (Cambridge, MA: The MIT Press).

33.  Kenneth Arrow, “Economic Welfare and the Allocation of Resources for Invention,” in The Rate and Direction of Inventive Activity: Economic and Social Factors (Princeton University Press, 1962).

34.  Arzberger et al., “Promoting Access to Public Research Data for Scientific, Economic, and Social Development.”

35.  Arrow, “Economic Welfare and the Allocation of Resources for Invention.”

36.  Onsrud, “In Support of Cost Recovery for Publicly Held Geographic Information.”

37.  Peter Weiss and Y Pluijmers, Borders in Cyberspace: Conflicting Public Sector Information Policies and Their Economic Impacts (Edward Elgar Publishing, 2004). Tulloch and Harvey, “When Data Sharing Becomes Institutionalized: Best Practices in Local Government Geographic Information Relationships.”

38.  Jonathan T Overpeck et al., “Climate Data Challenges in the 21st Century,” Science (Washington) 331, no. 6018 (2011).

39.  Kaye et al., “Data Sharing in Genomics—Re-Shaping Scientific Practice.” Henry Rodriguez et al., “Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles,” Journal of Proteome Research 8, no. 7 (2009). Heidi L Williams, “Intellectual Property Rights and Innovation: Evidence from the Human Genome,” The Journal of Political Economy 121, no. 1 (2010).

40.  Pira International and European Commission. Information Society DG., Commercial Exploitation of Europe's Public Sector Information: Executive Summary (Office for Official Publications of the European Communities, 2000).

41.  Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.” Uhlir and Schröder, “Open Data for Global Science.” Weiss and Pluijmers, Borders in Cyberspace: Conflicting Public Sector Information Policies and Their Economic Impacts.

42.  Martin Feldstein, “Tax Avoidance and the Deadweight Loss of the Income Tax,” Review of Economics and Statistics 81, no. 4 (1999). Onsrud, “In Support of Cost Recovery for Publicly Held Geographic Information.”

43.  The situation is analogous for government purchases of data. If the government would have spent $100 million to build its own satellite, but can instead purchase equivalent data from a commercial provider for $90 million, this will be a $10 million savings to the government, but just a $3 million (or less) savings for society as a whole.

44.  Nathan Rosenberg, “Science, Invention and Economic Growth,” The Economic Journal 84, no. 333 (1974). Robert G King and Sergio Rebelo, “Public Policy and Economic Growth: Developing Neoclassical Implications” (National Bureau of Economic Research, 1990). Joseph E Stiglitz, “Capital Market Liberalization, Economic Growth, and Instability,” World Development 28, no. 6 (2000). Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data.” Gregor Eibl and Brigitte Lutz, “Money for Nothing—Data for Free” (paper presented at the Conference for E-Democracy and Open Government, 2013). Martin et al., “Risk Analysis to Overcome Barriers to Open Data.” Fienberg, Martin, and Straf, Sharing Research Data.

45.  Philip B Evans and Thomas S Wurster, “The New Economics of Information,” Harvard Business Review 5 (1997).

46.  Zuiderwijk et al., “Socio-Technical Impediments of Open Data.” Agrawal, Kettinger, and Zhang, “The Openness Challenge: Why Some Cities Take It on and Others Don't.”

47.  John Carlo Bertot et al., “Big Data, Open Government and E-Government: Issues, Policies and Recommendations,” Information Polity 19, nos. 1–2 (2014).

48.  Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.” Uhlir and Schröder, “Open Data for Global Science.”

49.  Dawes, “Interagency Information Sharing: Expected Benefits, Manageable Risks.” Zuiderwijk et al., “Socio-Technical Impediments of Open Data.” Arzberger et al., “Promoting Access to Public Research Data for Scientific, Economic, and Social Development.”

50.  “Promoting Access to Public Research Data for Scientific, Economic, and Social Development.”

51.  Nahon and Peled, “Data Ships: An Empirical Examination of Open (Closed) Government Data.” Arzberger et al., “Promoting Access to Public Research Data for Scientific, Economic, and Social Development.” Agrawal, Kettinger, and Zhang, “The Openness Challenge: Why Some Cities Take It on and Others Don't.” Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.” Martin et al., “Risk Analysis to Overcome Barriers to Open Data.”

52.  Arzberger et al., “Promoting Access to Public Research Data for Scientific, Economic, and Social Development.” Barry and Bannister, “Barriers to Open Data Release: A View from the Top.”

53.  Zuiderwijk et al., “Socio-Technical Impediments of Open Data.”

54.  Ibid. Janssen, Charalabidis, and Zuiderwijk, “Benefits, Adoption Barriers and Myths of Open Data and Open Government.”