2 Defining Data Sharing

To begin, it is important to define the concepts and terms that will be used in this book and that will bound the scope of the model. Two of the most common terms used are data and information. Formally, data can be defined as the “raw bits,” the symbols that represent the properties of objects and events, and information is defined as data that is processed and organized to put it in context or make it more useful, creating value-added products, such as visualizations, specialized applications, or reports.1 However, this distinction is problematic, because it differs significantly from common usage in policy development.

Most of what we think of as “data” has undergone at least minimal processing—translating zeros and ones collected by a sensor into standard units, sanitizing a dataset to remove typos or outliers, or adding metadata to describe what has been collected. Minimally processed data remains an intermediate good: a good used as an input to the production of final goods, rather than a final product itself. It is this type of data to which many government data sharing policies refer. Because it does involve some processing, this type of data is sometimes referred to as information. Following common practice among those developing government data policies, this book will use the two terms interchangeably, both referring to minimally processed data.

This book deals specifically with data collected and/or produced by the government, sometimes referred to as publicly collected data or public sector information (PSI). This can include scientific data, such as that collected by space or environmental agencies, as well as data collected to carry out government functions, such as meteorological data collected to develop weather reports and warnings. Publicly collected data can also be about the government itself, providing politically important information about spending, future plans, or other government activities. It can also include benign information about government services or structures, such as public transit schedules, postal codes, or office hours.2 Recognizing and addressing these distinctions is important, as discussed below.

This book will not deal with data sharing policies governing data collected by nongovernment entities supported by public funding, such as individual researchers supported by government grants. While there are many similarities in the development of data sharing policies for these grant-giving institutions, there are also unique considerations regarding the incentives and barriers for individual researchers that will not be addressed in detail here.3 The book will also not address data sharing policy development within other entities, such as nongovernmental organizations, intergovernmental organizations, or companies, except as the policies of these organizations affect the development of government data sharing policies. Again, there are many similarities, and those studying data sharing policy development in these organizations may find this book informative; however, these organizations face additional incentives and constraints that will not be explicitly discussed here.

With a working definition of data and information in hand, the next important term to define is data sharing policy. Once again, a number of terms are used (often interchangeably) to refer to closely related ideas. Data sharing can refer to interpersonal, intraorganizational, and interorganizational data exchanges. In this usage, the term generally refers to a negotiated agreement between two or more entities for a mutual exchange of data, or to the sharing of data by one organization in exchange for products, services, or funds from another organization.4 The term data exchange is generally used synonymously with this definition of data sharing. These types of negotiated arrangements are not the subject of this book. Instead, this book focuses on data sharing policies that an agency creates on its own, without any official partner entities or formal external negotiation processes; these are sometimes referred to as data access policies. While the agency may consult with other organizations (domestic or international, government or nongovernment), it does not require these organizations to approve the policy.

A data sharing (or data access) policy defines what data will be shared, with whom (and/or for what purpose), at what price, and under what conditions the user can redistribute the data or applications developed using the data. These policies can vary significantly in terms of the overall level of openness and the particular types of restrictions. At one extreme, a data sharing policy covering top-secret information may limit access to only those with special credentials and an officially sanctioned “need to know.” At the other extreme, an open data policy would support completely unrestricted access: anyone is free to use, reuse, and redistribute the data at no cost and for any purpose.5 In this book, the term “data sharing policy” will generally refer to policies in which at least some access by external users is intended.

Table 2.1 Elements of a Data Sharing Policy

What data is being shared?
Restrictions on Access
    With whom/for what purpose?
    At what price?
    Using what process?
Restrictions on Redistribution
    Under what conditions can the data (or derivative products that use the data) be redistributed?

Data sharing policies specify the type of data to be shared and any key attributes of the data. The data may be near-real-time, minimally processed data, or it may be transformed in some way: processed, aggregated, anonymized, truncated, or time delayed, for example. Policies may differentiate between types of users and/or uses: domestic, foreign, education, research, journalism, operational, policy, commercial, or others. Data sharing policies may provide access for free, or they may charge a fee for access to the data. Fees may be set at the marginal cost of sharing the data, recouping only the costs of distribution; at the average cost of collecting and sharing the data, allowing for full cost recovery; or at a market rate based on users’ willingness to pay. The price may be fixed or negotiated on a case-by-case basis, and it may differ depending on, for example, the type of data being shared and the individual or group interested in accessing the data.

Restrictions on access may also include requirements for user registration and/or the submission of an official request. Requests may be brief and approval routine, or they may involve more complex proposals that are reviewed through a lengthy, formal process before approval. Some policies restrict redistribution of the data and/or products derived from the data. As with restrictions on access, these may limit the types of users with whom the data (or products) can be shared, or the types of uses for which they can or cannot be shared (research, commercial, and other uses). Downstream users may be required to return to the original source of the data to request permission to access and use it. Policies often require that proper credit be given to the data provider.

Data sharing policies may be informal and defined via common practice, or they may be carefully delineated in an official document. The choice of license placed on the data, if any, also affects the clarity of the policy and the usability of the data.6 An overview of the many types of potential restrictions is provided in Table 2.1.

Notes

1.  Russell L. Ackoff, “From Data to Wisdom,” Journal of Applied Systems Analysis 16, no. 1 (1989); Charlotte Hess and Elinor Ostrom, “Introduction: An Overview of the Knowledge Commons,” in Understanding Knowledge as a Commons: From Theory to Practice, eds. Charlotte Hess and Elinor Ostrom (Cambridge, MA: The MIT Press, 2009).

2.  Harlan Yu and David G. Robinson, “The New Ambiguity of ‘Open Government’” (2012); Antti Halonen, “Being Open About Data: Analysis of the UK Open Data Policies and Applicability of Open Data,” London: The Finnish Institute in London (2012).

3.  Bryn Nelson, “Data Sharing: Empty Archives,” Nature 461 (2009); Jane Kaye et al., “Data Sharing in Genomics—Re-Shaping Scientific Practice,” Nature Reviews Genetics 10, no. 5 (2009).

4.  Tung-Mou Yang and Terrence A. Maxwell, “Information-Sharing in Public Organizations: A Literature Review of Interpersonal, Intra-Organizational and Inter-Organizational Success Factors,” Government Information Quarterly 28, no. 2 (2011).

5.  Jeni Tennison, “Data Sharing Is Not Open Data,” https://theodi.org/blog/data-sharing-is-not-open-data (2014).

6.  Chris Martin, “Barriers to the Open Government Data Agenda: Taking a Multi-Level Perspective,” Policy & Internet 6, no. 3 (2014); Mireille Van Eechoud and Brenda Van Der Wal, “Creative Commons Licensing for Public Sector Information—Opportunities and Pitfalls,” available at SSRN 1096564 (2008).