2

General Sources

The sources covered in this chapter are general-purpose data sources, containing information across a broad range of subject areas. All contain a mix of demographic, economic, and social data; some also have data on additional topics. In some cases, these are the definitive sources for the fields they cover; in others, they are useful for “one-stop shopping,” but more detailed data is available from a more specific source covered in a later chapter.

Major Sources: United States

Census Bureau (U.S. Department of Commerce)

Most people are familiar with the Census Bureau and the Decennial Census as sources for basic demographic information about the United States.1 But the Census Bureau’s data offerings go far beyond basic demographic data: it is also one of the best sources for many types of economic information about the United States, which makes more sense when you remember that the Census Bureau is a division of the Department of Commerce. The Census Bureau runs many ongoing surveys of businesses, including the Economic Census, County Business Patterns, and the Annual Survey of Manufactures. In collaboration with the Bureau of Labor Statistics, it manages the Current Population Survey, which provides the raw data for U.S. labor force statistics such as the unemployment rate. It is a major source of data on governments as employers and financial actors, through products such as the Annual Survey of State and Local Government Finances, and of data on imports into and exports out of the United States. These economic surveys are covered in more depth in chapters 9 and 16, respectively.

Data from the Census Bureau is available in a variety of ways, but the most useful option for many purposes is American FactFinder (http://factfinder2.census.gov). American FactFinder provides several options for navigating many of the Census Bureau’s data sets, from the simple—browsing by geographic area or being walked step-by-step through a guided search—to power search options appropriate for advanced users. One drawback of American FactFinder is that it is focused on current data; no data from before 2000 is available. The most comprehensive and easy-to-use source for historical data from the U.S. Census Bureau is the National Historical Geographic Information System, a nongovernmental project discussed below.

For more than one hundred years before the launch of American FactFinder, the Statistical Abstract of the United States was the go-to source for statistics about the United States. In this print volume, the Census Bureau brought together hundreds of tables on just about every conceivable subject—from the number of people who participate in various sporting activities to the amount of hazardous waste generated in each state—produced using its own data, data from other government agencies, and, in later years, proprietary data licensed from private organizations. Then, facing budgetary constraints in 2011, the Census Bureau decided to eliminate the Statistical Compendia program, which published the Statistical Abstract and two similar volumes focused on subnational data, the State and Metropolitan Area Data Book and the County and City Data Book. The 2012 edition was the last version of the Statistical Abstract published by the Census Bureau.2 As of this writing, the volumes of the Statistical Abstract published from 1878 through 2012 remained available on the Census Bureau’s website (www.census.gov/compendia/statab/past_years.html).

Data.gov (U.S. General Services Administration)

Data.gov (www.data.gov), the official portal to data produced by the U.S. federal government, is more comprehensive than the Census Bureau’s offerings, since it includes data across a broader range of subject areas. Its tens of thousands of available data sets include data from all fifteen cabinet-level agencies (the departments of Agriculture, Commerce, Defense, Education, Energy, Health and Human Services, Homeland Security, Housing and Urban Development, Interior, Justice, Labor, State, Transportation, Treasury, and Veterans Affairs), as well as data from some of the many non-cabinet-level agencies, including the Federal Communications Commission, the Consumer Product Safety Commission, and the National Aeronautics and Space Administration. Data.gov also contains information about and links to data disseminated by nonfederal organizations, such as state and local government agencies. The sheer variety of types of data available is a great strength for users who know exactly what they are looking for and know how to recognize it when they see it, but it can make the site overwhelming for less knowledgeable users. Different data sets can be downloaded in different formats, from spreadsheets and PDF documents to RDF and JSON files. Additionally, a few hundred interactive data sets can be queried or manipulated online.
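
Because download formats vary from one data set to the next, it can help to check what is offered before choosing a data set. The following is a minimal sketch in Python, not an official method. It assumes that the Data.gov catalog exposes a CKAN-style search endpoint (the URL below) and that results follow the usual CKAN response layout; neither is described in this chapter. The sample response is hand-written and hypothetical, so the sketch runs without a network connection.

```python
from urllib.parse import urlencode

# Assumption: the Data.gov catalog answers CKAN "package_search" requests here.
CATALOG_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=10):
    """Return a catalog search URL for a free-text query."""
    return CATALOG_SEARCH + "?" + urlencode({"q": query, "rows": rows})


def formats_by_dataset(response):
    """Map each data set title to the sorted list of formats it offers.

    `response` is the parsed JSON of a CKAN-style search result.
    Resources with an empty or missing format are reported as "UNKNOWN".
    """
    result = {}
    for dataset in response["result"]["results"]:
        formats = {
            (resource.get("format") or "unknown").upper()
            for resource in dataset.get("resources", [])
        }
        result[dataset["title"]] = sorted(formats)
    return result


# Hypothetical sample shaped like a CKAN response; titles are placeholders.
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"title": "Example Data Set A",
             "resources": [{"format": "CSV"}, {"format": "json"}]},
            {"title": "Example Data Set B",
             "resources": [{"format": "PDF"}, {"format": ""}]},
        ],
    },
}

if __name__ == "__main__":
    print(build_search_url("hazardous waste"))
    print(formats_by_dataset(SAMPLE_RESPONSE))
```

To use the sketch against the live catalog, one would fetch the URL returned by build_search_url, parse the body as JSON, and pass the result to formats_by_dataset.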

Major Sources: World

United Nations

The United Nations may be the single largest disseminator of freely available data from all countries in the world. Although a plethora of separate agencies within the UN system each publish data in their own domains, the United Nations brings a large portion (although nowhere near all) of these statistics together in a single, cross-searchable site, UNdata (http://data.un.org). In addition to the data collected by its own divisions, the United Nations also uses UNdata to distribute numeric information from other, independent international organizations, such as the World Bank and the International Monetary Fund.

Major agencies whose data are available, in whole or in part, through UNdata include the following:

Food and Agriculture Organization

International Labour Organization

International Monetary Fund

International Telecommunication Union

Joint United Nations Programme on HIV/AIDS

United Nations Children’s Fund

United Nations Development Programme

United Nations Educational, Scientific and Cultural Organization

United Nations Framework Convention on Climate Change

United Nations High Commissioner for Refugees

United Nations Industrial Development Organization

United Nations Office on Drugs and Crime

United Nations Population Division

United Nations Statistics Division

World Bank

World Health Organization

World Meteorological Organization

World Tourism Organization

Note that, for many of these organizations, only some of their data is available through UNdata; the remainder is distributed only via their own websites. Agencies with significant data resources that are not available in UNdata are covered in the relevant sections later in this book.

World Bank

Although some of the World Bank’s data is available through UNdata, much of it is available only on the World Bank’s own data site (http://data.worldbank.org). The World Bank’s mission—to fight poverty and related problems in poor and middle-income countries—informs the data it makes available: thousands of indicators that are directly or indirectly related to the economic well-being of countries and their populations, including indicators of macroeconomic performance, economic inequality, the labor market, the health and education of the population, government spending, environmental quality, and gender discrimination in employment and education. More than eight thousand of these indicators are available in online interfaces that allow for visualizing the data on maps or in graphs, viewing data in tables, or downloading data sets.
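
For readers who would rather retrieve indicators programmatically than through the online interfaces, the following is a minimal sketch in Python. It assumes the World Bank's public Indicators API (version 2) at api.worldbank.org, which is not described in this chapter. The country code BRA and the indicator code SP.POP.TOTL (total population) are used only as illustrations, and the values in the sample response are made-up placeholders, not real figures.

```python
from urllib.parse import urlencode

# Assumption: base address of the World Bank Indicators API, version 2.
API_ROOT = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year, end_year):
    """Build a request URL for one indicator, one country, and a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}"},
        safe=":",  # keep the year range readable instead of encoding the colon
    )
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def series_from_response(payload):
    """Return {year: value} from a parsed response, skipping missing values.

    The API is assumed to return a two-element list:
    paging metadata first, then the list of observations.
    """
    _metadata, records = payload
    return {
        int(record["date"]): record["value"]
        for record in (records or [])
        if record["value"] is not None
    }


# Hypothetical sample in the assumed response layout; values are placeholders.
SAMPLE_PAYLOAD = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 3},
    [
        {"date": "2002", "value": 110.0},
        {"date": "2001", "value": None},
        {"date": "2000", "value": 100.0},
    ],
]

if __name__ == "__main__":
    print(indicator_url("BRA", "SP.POP.TOTL", 2000, 2002))
    print(series_from_response(SAMPLE_PAYLOAD))
```

Missing observations are dropped rather than treated as zero, since gaps are common in series for poor and middle-income countries.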

Minor Sources

National Historical Geographic Information System (University of Minnesota)

The National Historical Geographic Information System (NHGIS, www.nhgis.org), a project of the Minnesota Population Center at the University of Minnesota, distributes an impressive quantity of historical data originally collected by the U.S. Census Bureau. This includes not only population data from the Decennial Census and American Community Survey but also data from County Business Patterns, the censuses of churches and other religious bodies conducted between 1906 and 1952, agricultural data from various sources covering 1840–1959, and various special censuses conducted in the 1920s and 1930s. The data is compatible with GIS software but can also be opened and manipulated as spreadsheets.

European Union

The Eurostat database (http://ec.europa.eu/eurostat) provides access to harmonized, nation-level data for countries that are currently part of or are candidates for joining the European Union. The database contains statistics related to a wide variety of topics, including agriculture, trade, labor, health, education, and other economic and socioeconomic topics. Free registration is required for access to some data sets and download formats. More advanced users can also access Eurostat data through the European Union Open Data Portal (http://open-data.europa.eu/en/), which provides access to the data in Linked Data and other specialized formats.

Organisation for Economic Co-operation and Development

Another source for general data on European countries, as well as for other developed and upper-middle-income countries, is OECD.StatExtracts (http://stats.oecd.org), produced by the Organisation for Economic Co-operation and Development (OECD). This database includes only so-called core data from the OECD; additional data can be accessed through OECD’s subscription product, iLibrary. The data that is freely available is broad and deep, covering topics ranging from productivity to pesticides, tax revenues to tobacco use, and immigration to technological innovation. The length of time series available varies, but some data is available as far back as the 1940s or 1950s. (To access additional years of historical data, where available, open the “Customize” menu and choose “Time & Frequency” under the “Selection” options.)

Regional Economic and Social Commissions and Regional Development Banks

Two types of regional sources, the United Nations Regional Economic and Social Commissions and the regional development banks, can be helpful for patrons who need data about multiple countries in the same geographic region. All of these organizations have an interest in some aspect of economic and social development in their region, and the data they disseminate broadly reflects this interest. (The possible exception is the United Nations Economic Commission for Europe, whose membership consists almost entirely of developed countries and which focuses instead on economic integration.) They all produce economic data, but because they are concerned with development broadly defined, many of them also publish social, political, and in some cases even environmental data for their respective regions. Development banks and regional economic commissions that make significant data resources freely available on their websites include the following:

African Development Bank Group (AFDB, http://dataportal.afdb.org/Default.aspx). The AFDB’s Data Portal offers a variety of methods for accessing dozens of economic and social indicators for African countries: maps, “dashboards” that display related charts on a single screen, a “data analysis” option that allows users to manipulate data as they could in a desktop spreadsheet program, and traditional static spreadsheets.

Asian Development Bank (www.adb.org/statistics/). Of the approximately 750 indicators available in its Statistical Database System, over half are purely economic and financial, covering such topics as government finance, inflation, and trade, and the remaining indicators primarily cover social and socioeconomic issues. Statistics are also available in a variety of PDF publications.

Inter-American Development Bank (IDB, www.iadb.org). IDB makes available a variety of separate databases and downloadable data sets, from the purely economic (e.g., REVELA, which reports on expectations for inflation and economic growth) to the largely social (e.g., Sociómetro-BID, which contains copious data about the education, housing, and other indicators of socioeconomic status of countries’ populations, broken down by gender, race, and other demographic factors). IDB also has an entire database of indicators of the quality of a country’s governance, DataGov (see chapter 23).

United Nations Economic Commission for Africa (UNECA, www.uneca.org). UNECA’s statistical system, variously called the ECA Databank or StatBase (http://ecastats.uneca.org/statbase/), has data on trade, agriculture, and other economic variables; on health, education, and other social variables; and on the environment, among other topics. Several indicators are listed, but many of them are not available for every country for every year. Note that the site seems not to work in Firefox, although it works in most other browsers.

United Nations Economic Commission for Europe (UNECE, www.unece.org). The UNECE Statistical Database contains data in six areas: economics, forestry, gender, transportation, the Millennium Development Goals, and international migration.

United Nations Economic Commission for Latin America and the Caribbean (ECLAC in English, CEPAL in Spanish, www.eclac.cl). ECLAC’s statistical database, CEPALSTAT/Databases and Statistical Publications, has a user-friendly interface, with data presented in charts and graphs for basic users, in interactive interfaces with download options for intermediate users, and via an application programming interface for the most advanced users. It also provides extensive data on the environment, sustainability, and social cohesion, in addition to the standard economic and social indicators. Be aware, though, that it is possible to stumble into partially untranslated areas in CEPALSTAT, so some knowledge of Spanish can be helpful.

United Nations Economic and Social Commission for Asia and the Pacific (UN ESCAP, www.unescap.org). ESCAP’s statistical database includes many common economic, financial, and socioeconomic indicators, but it is notable for the breadth of data on environmental issues. Many indicators are available covering such topics as emissions of carbon dioxide and other pollutants, water usage, and protections for endangered species. Health and education data are also well covered by this database.

A fifth regional commission, the Economic and Social Commission for Western Asia (www.escwa.un.org), does not publish a statistical database, but some statistical information can be found in PDFs on its website.

National Statistical Agencies

One of the most useful single web pages for general statistical information seeking is “Statistics—National Agencies and Compendia” (www.library.vanderbilt.edu/govdb/natlstats.html), published by the Jean and Alexander Heard Library at Vanderbilt University. It contains links to the local equivalent of the Census Bureau and the Statistical Abstract, where available, for every country in the world, as well as notes about the languages in which information is available. If statistics are needed for a single country, and those statistics are not readily available in one of the other sources mentioned in this book, searching the data produced by the country’s national statistical agency is an excellent next step in one’s data search.

Open Knowledge Foundation

The Open Knowledge Foundation (http://okfn.org), a not-for-profit group based in Cambridge, England, is responsible for multiple wide-ranging catalogs of freely available data sets. These include datacatalogues.org, a “catalog of catalogs” with links to more than three hundred sites containing open data, and PublicData.eu, which contains almost 20,000 data sets for countries in Europe. Because of the international nature of the Open Knowledge Foundation sites, some of the data may be available only with non-English-language metadata.

Gapminder

No discussion of general data sources would be complete without mentioning the Gapminder site (www.gapminder.org). Gapminder is notable not primarily for its data (it merely disseminates data that is widely available from other sources, although the breadth of the data it has assembled is impressive) but for the interactive interface it puts over the data. This interface was made famous by Hans Rosling’s 2006 TED talk. (If you have 20 minutes and you haven’t seen the video yet, it is well worth your time.)3 Gapminder’s interface allows users to visualize data in five dimensions at once (x axis, y axis, bubble size, color, and time). Time is fixed, but for all other dimensions users can choose which variables they want to be represented. The interface makes visualization so easy that users may not even realize how much data they are taking in at once.
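
The five-dimension idea can be made concrete with a small data structure. The following Python sketch is illustrative only and is not Gapminder's own code: the Encoding class, the build_frames function, the variable names, and the sample records are all hypothetical. It treats time as the fixed animation dimension and lets the caller choose which variable drives each of the other four.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """Which variable is shown on each user-selectable dimension."""
    x: str
    y: str
    size: str
    color: str


def build_frames(records, encoding):
    """Group records by year; each frame holds one bubble per record."""
    frames = {}
    for row in records:
        bubble = {
            "label": row["country"],
            "x": row[encoding.x],
            "y": row[encoding.y],
            "size": row[encoding.size],
            "color": row[encoding.color],
        }
        frames.setdefault(row["year"], []).append(bubble)
    # Time is the fixed dimension: frames are played back in year order.
    return dict(sorted(frames.items()))


# Hypothetical records; the numbers are placeholders, not real statistics.
SAMPLE_RECORDS = [
    {"country": "Country A", "year": 2001, "income": 12,
     "life_expectancy": 61, "population": 105, "region": "Region 1"},
    {"country": "Country A", "year": 2000, "income": 10,
     "life_expectancy": 60, "population": 100, "region": "Region 1"},
    {"country": "Country B", "year": 2000, "income": 30,
     "life_expectancy": 70, "population": 50, "region": "Region 2"},
]

ENCODING = Encoding(x="income", y="life_expectancy",
                    size="population", color="region")

if __name__ == "__main__":
    for year, bubbles in build_frames(SAMPLE_RECORDS, ENCODING).items():
        print(year, bubbles)
```

Swapping a different variable name into Encoding changes what one axis, the bubble size, or the color represents without touching the grouping by year, which mirrors the choice the interface gives its users.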

Following the success of the Gapminder interface, similar moving “bubble charts” or “motion charts” have become available on other sites, including those of the United Nations Development Programme (UNDP, http://hdr.undp.org/en/data/explorer/), which can be used to explore the data that goes into the UNDP’s Human Development Reports (indicators covering social and economic issues, health, education, the environment, and technology); and the Google Public Data Explorer (www.google.com/publicdata/directory/), which allows users to visualize data from various U.S. and international agencies.

Notes

1. Until 2010, the Decennial Census (the every-ten-year undertaking commonly referred to in the United States as “the Census”) itself was a rich source of data about the employment, education, and income of American individuals, households, and families; one household in six received the so-called long form, with dozens of questions about everything from the amount of rent they paid on their residence and whether they had moved in the past five years to how well each person spoke English and how much schooling he or she had completed. After the 2000 Census, all questions beyond the most basic demographics (age, gender, race, and Hispanic ethnicity), plus one additional question (whether the residence was rented, owned free and clear, or owned but with outstanding loans), were dropped from the Decennial Census and moved to a new survey, the American Community Survey, which is covered in chapter 20.

2. ProQuest began publishing its own version of the Statistical Abstract in print and as an online subscription starting with the 2013 edition.

3. Hans Rosling, “The Best Stats You’ve Ever Seen,” Feb. 2006, www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html.