Chapter 8

Post-enumeration processing


This chapter covers post-enumeration data operations: updating and verifying the changes made to the master GIS census database, EA-based or boundary aggregations, database archiving and maintenance, and open data systems. These operations contribute to the quality assurance of data before its dissemination and, beyond dissemination, support activities such as post-census evaluation, sampling frame construction, surveys, and particularly future censuses.

Updating and verifying changes to the database

In chapter 6, we discussed updating and correcting EA maps during the pre-enumeration fieldwork process. You also learned that EA updating must continue during enumeration, particularly when the fieldwork began long before enumeration. Even when EA maps are created after an extensive fieldwork update and verification, errors may still be found because of simple human mistakes, omissions, buildings that have been destroyed, or new developments between the fieldwork and the actual enumeration. Enumerators therefore need to make and report updates and corrections to the EA maps, either manually or electronically, through “ground-truthing.”

When enumeration was done with paper maps, the census geographers at the office would gather a massive number of the EA maps after the enumeration operation, manually capture the correct data, and update the GIS census database for post-enumeration and intercensus activities. With the advent and use of handheld devices equipped with GPS, point-based data (dwellings or housing units, landmarks, addresses, etc.) is relatively easy to collect and validate, including its integration into the database. Even polygon editing is made easier with the use of GIS on mobile devices, particularly with the use of mobile map packages with imagery or basemap data and layers that can be edited interactively in the field. The edits and new points collected during enumeration, however, still need to be incorporated into the master database. There should be an established workflow of incorporating edits and verifying changes to the master geospatial census database.1
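As an illustration of what such a workflow might look like, the following minimal Python sketch applies a batch of field edits to a copy of a master point table and sets aside any edit that fails a basic verification check. The `PointEdit` record, its field names, and the rejection rules are assumptions made for this example; they are not part of any ArcGIS product or of a specific NSO procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointEdit:
    point_id: str   # hypothetical identifier of a dwelling or landmark point
    ea_code: str    # EA in which the enumerator reported the point
    lon: float
    lat: float
    action: str     # "add", "update", or "delete"


def apply_field_edits(master, edits, valid_ea_codes):
    """Apply verified edits to a copy of the master table.

    master maps point_id -> (ea_code, lon, lat). Returns the updated copy
    and a list of (edit, reason) pairs for edits that need manual review.
    """
    updated = dict(master)  # never modify the master in place
    rejected = []
    for e in edits:
        in_range = -180.0 <= e.lon <= 180.0 and -90.0 <= e.lat <= 90.0
        if e.ea_code not in valid_ea_codes or not in_range:
            rejected.append((e, "unknown EA code or coordinates out of range"))
        elif e.action == "add" and e.point_id in updated:
            rejected.append((e, "point already exists"))
        elif e.action in ("update", "delete") and e.point_id not in updated:
            rejected.append((e, "point not found in master"))
        elif e.action == "delete":
            del updated[e.point_id]
        elif e.action in ("add", "update"):
            updated[e.point_id] = (e.ea_code, e.lon, e.lat)
        else:
            rejected.append((e, "unknown action"))
    return updated, rejected
```

Working on a copy and returning the rejected edits separately keeps the master database unchanged until a census geographer has reviewed the flagged items.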

Usually, additional data-cleaning operations, coding activities, and other imputation operations are performed on the full set of data after the enumeration period, enabling the source database to produce all dissemination products. These procedures contribute to the quality assurance of data prior to its dissemination. Once data collection and processing operations are complete, it is essential to correct major errors to avoid any significant problems, and make sure that the final statistical data is quality assured, where possible, prior to its publication.2

Updating and verifying changes to the database after the field enumeration operation helps ensure the most accurate possible post-enumeration database of EAs. Curated data facilitates post-census evaluation and sampling frame construction and will be used in all future survey work. The automation of data collection has increased the speed at which updates and quality checks for database consistency can proceed. Mobile digital data collection also delivers census results faster than paper-based data collection.

For these purposes, ArcGIS Data Reviewer is a beneficial tool. Data Reviewer is designed to improve data quality by providing a complete system for automating and simplifying data quality management.

Figure 8.1. ArcGIS Data Reviewer checks.

Data Reviewer provides tools that support both the automated and visual analysis of data to improve data quality control across multiple platforms. Its simple-to-use visual review tools let users identify missing or misplaced features, improperly attributed features, and other types of issues. Data Reviewer can also be used to detect anomalies with features, attributes, and spatial relationships in a database. Its data checks contain analysis rules that can be run interactively or scheduled to run automatically as a recurring event. Depending on the type of analysis to be performed, the anomaly can be corrected as part of database maintenance or investigated further.

More specifically, Data Reviewer allows users to: (1) manage the quality control and analysis of their GIS data (for example, detecting a building in the river or on a highway would most likely be an error identified as part of quality control); (2) undertake spatial checks by analyzing the spatial relationships of features (for example, analyzing whether features overlap, intersect, reside within a specified distance of other features, or touch); (3) undertake attribute checks by analyzing the attribute values of features and tables (which can be simple field validation similar to a geodatabase domain or more complex attribute dependencies); (4) undertake feature integrity checks by analyzing the properties of features—feature integrity checks ensure that the collection rules are followed for each feature class; (5) undertake metadata checks by analyzing the metadata information of the feature datasets and feature classes; (6) conduct a managed review of data through automated or visual checks to understand the integrity of the entire database; and (7) use interactive analysis tools that provide better communication about missing features and features with inaccurate shapes.3
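To make the idea of spatial and attribute checks concrete, the short Python sketch below flags building points that fall inside a water polygon (a spatial check) and buildings whose use code is outside an allowed list (an attribute check). It is plain Python written for illustration only and does not use or represent the Data Reviewer API; the record layout and the `allowed_uses` list are assumptions.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices, first vertex not repeated."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def run_checks(buildings, water_polygons, allowed_uses):
    """Return a list of (building_id, check_type, message) tuples."""
    issues = []
    for b in buildings:
        # Attribute check: value must belong to the allowed domain.
        if b["use"] not in allowed_uses:
            issues.append((b["id"], "attribute", f"use {b['use']!r} not in domain"))
        # Spatial check: a building point should not lie inside a water body.
        if any(point_in_polygon(b["x"], b["y"], ring) for ring in water_polygons):
            issues.append((b["id"], "spatial", "building point falls inside a water body"))
    return issues
```

Each flagged record would then be corrected as part of database maintenance or passed on for further investigation, as described above.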

Data Reviewer helps users gain insight into sources of poor data quality, identify error trends, and monitor data health using reports, dashboards, and other automatically captured statistics.

Figure 8.2. ArcGIS Data Reviewer interface and widgets.

Another tool that should be considered in the processing of data is ArcGIS® Workflow Manager. Workflow Manager is useful for managing field operations, and it also extends ArcGIS by providing a centralized enterprise job management and tracking system to streamline daily tasks. Enforcing standardized and repeatable workflows across the organization can save significant time and improve the efficiency of GIS implementations. By integrating with ArcGIS geodatabase tools, Workflow Manager handles complex geodatabase tasks such as data access, version creation and management, and archiving behind the scenes. This helps ensure that the right person is working on the right data at the right time.

Aggregation

The creation of a digital census geographic database at the EA level serves the production of digital EA maps and reports at administrative and statistical units. This reporting can be done through geographic data aggregation. Based on a nested administrative hierarchy structured in the database, GIS-enabled spatial aggregation capacity allows the EAs to be aggregated to various reporting units, required for the countless geographic products for census dissemination. The aggregation process is required to preserve confidentiality, making census data available for spatial aggregates and not for individuals. Thus, the production of aggregated geographic areas enables NSOs to meet different user needs from different sectors (such as health, education, transportation, or environment). The data aggregation process is critical to the successful use of the data, in the near term and the distant future, because this will be the foundation for comparative purposes. Aggregation must be done at each level of geography to be published. It is often also necessary to the workings and needs of central and local governments. For example, for its 2016 census, Statistics Canada created a new census dissemination geographic area, a subprovincial census dissemination geography called aggregate dissemination area (ADA). The intent of the ADA geography is to ensure the availability of census data where possible, across all regions of Canada.4

Figure 8.3. Example of an application of ArcGIS Workflow Manager.

Figure 8.4. Example of aggregation: administrative areas and grid system. Source: National Statistics Center of Japan (NSTAC), Esri UC 2018 presentation “Utilizing grid square statistics.”

The UN-EG-ISGI proposed this definition: “Aggregated statistical information is aggregated from geocoded unit record level data into the dissemination geography, as opposed to disaggregated statistical information that is created using a spatial distribution model and larger statistical geographies as source data.”5 More specifically, the geocoding of the EAs, their geometrical representation, and topological structure in the database provide the basis for GIS to enact its spatial analysis capabilities and create various aggregations.
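The role of a nested geocode in aggregation can be sketched in a few lines of Python. The example assumes a hypothetical seven-character EA code made up of a two-digit province code, a two-digit district code, and a three-digit EA number; real coding schemes differ by country, so the prefix lengths here are illustrative only.

```python
from collections import defaultdict


def aggregate(ea_population, prefix_length):
    """Sum EA-level counts to the unit identified by the first prefix_length characters."""
    totals = defaultdict(int)
    for ea_code, population in ea_population.items():
        totals[ea_code[:prefix_length]] += population
    return dict(totals)


# Hypothetical EA-level counts keyed by nested geocode (province + district + EA).
ea_population = {
    "0101001": 120,
    "0101002": 80,
    "0102001": 200,
    "0201001": 50,
}

districts = aggregate(ea_population, 4)   # province + district
provinces = aggregate(ea_population, 2)   # province only
```

Because every EA nests in exactly one district and one province, the totals at each level sum to the same national figure, which is a simple consistency check worth running after any aggregation.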

Spatial aggregations combined with overlay, distances, spatial selection, intersection, and other analytical techniques provide insights and useful knowledge about many geographically related issues. Aggregations require both the boundaries and the attribute data of the reporting units, because the attribute data is the statistical data related to these units. The aggregate data for the reporting units can then be made available for use and reuse in appropriate open formats, such as comma-separated values (CSV), Extensible Markup Language (XML), and so on (see details in chapter 10).
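As a small illustration of publishing aggregates in open formats, the following Python sketch serializes a list of reporting-unit records to CSV and to XML using only the standard library. The field names (`unit_code`, `population`) and the XML element names are assumptions chosen for the example, not a published schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows, root_tag="units", row_tag="unit"):
    """Serialize a list of dicts to XML text, one child element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        unit = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(unit, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical aggregate records for two reporting units.
rows = [
    {"unit_code": "0101", "population": 200},
    {"unit_code": "0102", "population": 200},
]
csv_text = to_csv(rows, ["unit_code", "population"])
xml_text = to_xml(rows)
```

Storing unit codes as text preserves leading zeros, which matters when users later join the published table back to boundary files.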

While census data has been traditionally aggregated by various types of administrative units (villages, towns, cities, provinces, etc.), the increasing demands for small areas require aggregations of some EAs, local areas of interest, or very small units such as blocks or mesh-blocks. For some applications, the appropriate geographic units may be an ad hoc aggregation or an aggregation or group of local administrative units. However, when data is captured at the point level (dwellings or housing units, landmarks, addresses, etc.), grid systems may be used to aggregate this existing point-based data. Using a spatial reference system with squared grid cells allows for overlaying capabilities, comparisons, and other spatial analysis. Grids are covered in more detail in chapter 4.
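A grid aggregation of point data can be sketched as follows. The example assumes that point coordinates are already in a projected coordinate system measured in meters and that the grid is aligned to a chosen origin; the 1,000 m cell size is arbitrary and would in practice follow the national grid specification.

```python
import math
from collections import Counter


def grid_cell(x, y, cell_size, origin=(0.0, 0.0)):
    """Return the (column, row) index of the square cell containing the point."""
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    return col, row


def count_by_cell(points, cell_size, origin=(0.0, 0.0)):
    """Count points (for example, dwellings) per grid cell."""
    return Counter(grid_cell(x, y, cell_size, origin) for x, y in points)


# Hypothetical dwelling coordinates in meters.
dwellings = [(10, 10), (990, 999), (1000, 10), (-1, 5)]
counts = count_by_cell(dwellings, cell_size=1000)
```

Using `math.floor` rather than integer truncation keeps points with negative coordinates in the correct cell, and because the cell index depends only on the coordinates, counts from different censuses can be compared cell by cell even when administrative boundaries change.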

Database archiving and maintenance

A population and housing census generates massive amounts of data and information that constitute a valuable asset for the country, which every NSO needs to preserve and sustain. Preserving the asset requires the setup of a data repository system enabling data to be safely stored and archived, and sustaining the system requires the maintenance of the master database, as its value increases through continual updates and long-term use.

Preserving and archiving census data and the documentation related to their collection and processing contributes to the dissemination of data from the current census and to the planning and implementation of future censuses. Census data and documentation can also be used in conducting time-series or comparative analyses. Lessons drawn from records of how the census was planned, organized, and conducted, and from guidelines and documents on past processes (e.g., how a specific technology was used in a previous census), could contribute to the success of a future census. Like many other census activities, preservation procedures need to be raised early in census planning. This has become even more crucial with a technology-driven census, because rapidly changing technology affects digital file products and storage, which may require ongoing assessment to ensure access in the future.

While archiving census data is of critical importance, a survey led by UNSD during the 2010 Round of Censuses found that only 73 percent of the countries or areas responding to the questionnaire indicated that they would use a system to archive their census data. For example, only four African countries declared having a system to archive census data, but this number still reflects an increasing awareness that is promising for the 2020 Round of Censuses.6

Countries recognize that archiving a vast amount of data represents a considerable challenge, particularly when they deal with census individual records and related security and confidentiality. This challenge requires the NSO to develop a set of procedures and an archival program to ensure that the contents from data collection operations are maintained in formats that can be used by current and future censuses and for other statistical activities. In Principles and Recommendations, the UN recommends that “the national statistical authority needs to develop an institutional strategy for archiving based on three components: organizational infrastructure, technological infrastructure and resources.” The organizational aspect generally refers to a centralized unit within the NSO that oversees archiving, maintenance, storage, and the possible release of census individual records. Technological infrastructure refers to the actual technology used for digital archiving, and resources are normally planned at an early stage of the census and needed for the archiving operation in the context of the organizational and technological infrastructures.7

A strong archival program includes not only the preservation of the current census data in a physical or logical space to protect it from loss, alteration, and deterioration,8 but also the maintenance of the database, metadata, and census data products for future use, particularly for future census and statistical activities. Database maintenance is critical for ensuring that census data and information remain continually updated and accessible for long-term use. Database metadata, which explains the content and structure of the data, needs to be continually verified and completed to document any changes implemented in the database and keep up with any evolution in the definitions and related technical standards. NSOs should develop and implement database maintenance procedures immediately following a census, allowing for continuous updates of boundaries and other features as new information becomes available.

Maintaining the database also expands its use in the post-census evaluation of the census coverage and in the intercensal period, providing geospatial services for other statistical applications, such as sample surveys or sectoral applications. It also prepares the geographic base for the next census enumeration. As already stated, the benefits gained from using a GIS-based census database for many applications beyond the core tasks of a census, at the national level, outweigh the costs involved, thus greatly increasing the return on investment in opting for a digital geographic census infrastructure.

Open platform and system interoperability

As geographic information is increasingly distributed on the web and routinely integrated into thousands of applications and services, GIS has become more open. We all recognize the increasing importance of open data, open standards, data interoperability (e.g., open formats), open application programming interfaces (APIs), and specifications, all needed for easy access and sharing of data by an open community. All of these are required for an open platform that provides interoperability, an essential feature of any system to be able to interact with other systems during its mission to exchange and use information.9 Indeed, a popular dissemination method today is open data using Internet protocols, which are usually configured to allow unrestricted access by people or other computers.

Esri has a longstanding commitment to standards and interoperability. As part of Esri’s dedication to building an open and interoperable platform, its goal is to support appropriate technology specifications as they become finalized. The company also participates in the development of GIS standards through organizations like OGC and the ISO. By serving in leadership roles in many OGC initiatives and ISO/TC 211 committees, Esri contributes its knowledge of interoperability and promotes standards compliance across the ArcGIS platform.

ArcGIS supports more than 100 established standards, including data formats, metadata, and services. The ArcGIS platform conforms to open standards and enterprise IT frameworks that allow users to incorporate GIS into any application on a variety of computers and mobile devices. ArcGIS uses data format standards to store geospatial data in a common format or transfer data from system to system via ETL tools for data validation, migration, and distribution. Further, services standards are used to transfer data via the web or provide remote access to data stored on a web server. These standards allow users to interact with data, usually through simple web clients, on a live and real-time basis. This includes viewing maps, accessing and querying data, running analyses, and downloading data.

However, the trend today for open data and systems is bigger; it isn’t just standards—it really is about integration. As we stated earlier, with the advent of open data, statistical organizations are facing new challenges: integrating the primary data (census and survey data) with secondary data sources (typically administrative datasets, geospatial data, and big data or any other nontraditional source of information for official statistics).10 Having found the various data, the issue is how to integrate it into a form so that exploration, analytics, and visualization of the combined datasets can be performed. This integration is needed particularly in the context of the 2030 Agenda for Sustainable Development, where the delivery of indicators requires the combination of various multisource data.

In this regard, to fulfill its vision of openness, Esri has built ArcGIS as an open platform with the view that open systems encourage innovation, support interoperability, promote transparency, improve reliability, and increase collaboration. Indeed, the ArcGIS platform reflects Esri’s relationship to all things open—standards, interoperability, data, APIs, code, and the community. For example, the Esri platform-independent approach ensures interoperability because it supports industry and community standards, libraries in every major programming language, integration with common analysis and data management tools, and a growing repository of open-source software available on GitHub®.11 For more information, see the 2017 Esri white paper Esri Support for Geospatial Standards at https://www.esri.com/~/media/files/pdfs/library/whitepapers/pdfs/esri-support-for-geospatial-standards.pdf.

Figure 8.5. ArcGIS is an open platform for innovation.

Case study: Ireland

Joining forces, the Central Statistics Office of Ireland (CSO) and the Ordnance Survey Ireland (OSi) collaborated to make the country’s 2016 census data more meaningful and accessible. The agencies launched two new data portals that are making information about Ireland’s people, environment, and prosperity available in ways that previously were not possible.

Challenges

Every five years, the CSO conducts a census survey of the country’s 4.8 million residents, at 1.5 million households, across an area of 70,000 km². Enumerators had been using the “long form” method to collect data about everything from people’s employment status to their means of getting to work. The office traditionally presented this census data in statistical tables and published it in reports that contained a few maps and diagrams. Administrators realized, however, that they could add value to census data by presenting it in geographic context. Furthermore, if CSO provided GIS capabilities and interactive web applications, data users could make their own maps and do their own analysis.

CSO collaborated with OSi, the country’s national mapping agency, via a formal memorandum of understanding. Both organizations had been playing active roles in the government’s public-sector reform plan; both organizations worked with data and analytics; and both organizations used the ArcGIS platform. The two organizations agreed to work together to create new channels for disseminating geospatially referenced data.

Just a few months after CSO and OSi had signed the memorandum, the United Nations and Esri invited the agencies to participate in a research project to develop and deploy a new method of monitoring the UN’s Sustainable Development Goals (SDGs).

According to Esri, what makes data exploration like this feasible is having all the information in one place, which is what Esri and the UN Statistics Division (UNSD) are doing in their joint research exercise. For the project, participating member states use their existing data systems by deploying ArcGIS Hub in conjunction with ArcGIS Enterprise to help their national statistics offices integrate SDG-related data into their own work.

The exercise asks statistics offices to align their data and systems with other in-country SDG stakeholders, including National Mapping Agencies (NMAs), health ministries, natural resource and environmental agencies, and private-sector statistical data producers.

Ireland was one of seven countries selected for this groundbreaking initiative and the only country from Europe. The opportunity provided a clear focus for the partnership and provided the impetus for CSO and OSi to launch an ambitious, collaborative development project.

Solution

OSi had already developed a data-sharing platform called GeoHive®, based on the ArcGIS Hub solution, so CSO and OSi decided to use GeoHive as the technical platform for their collaborative projects. GeoHive acts as a “hub of hubs,” allowing the same data to be presented to different audiences, with different views, in subportals known as micro-hives.

While working on the UN SDG project, CSO and OSi decided to create a micro-hive to present Ireland’s Census 2016 Small Area Population Statistics (SAPS). For the first time, data would be geospatial and open. The resulting census portal (census2016.geohive.ie) allows data to be viewed, accessed, and downloaded in map form across 31 administrative counties, 95 municipal districts, 3,409 electoral divisions, and 18,641 small areas. Datasets include globally unique identifiers (GUIDs) to connect statistics and geography, which is a necessary step for using standard common IDs for spatial data in Ireland.
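The value of a shared identifier can be shown with a minimal Python sketch that joins a statistics table to boundary records on a common GUID and reports any boundary without matching statistics. The record layout, field names, and GUID values below are invented for illustration and do not reproduce the actual SAPS or OSi schemas.

```python
def join_by_guid(boundaries, statistics):
    """Attach statistics to boundary records that share the same GUID.

    Returns (joined_records, unmatched_guids).
    """
    stats_by_guid = {row["guid"]: row for row in statistics}
    joined, unmatched = [], []
    for boundary in boundaries:
        stats = stats_by_guid.get(boundary["guid"])
        if stats is None:
            unmatched.append(boundary["guid"])
        else:
            joined.append({**boundary, **stats})
    return joined, unmatched


# Hypothetical boundary and statistics records keyed by an invented GUID.
boundaries = [
    {"guid": "a1b2", "name": "Small Area 1"},
    {"guid": "c3d4", "name": "Small Area 2"},
]
statistics = [
    {"guid": "a1b2", "population": 312},
]
joined, unmatched = join_by_guid(boundaries, statistics)
```

Reporting unmatched identifiers, rather than silently dropping them, gives a quick check that the statistical and geographic datasets are using the same set of IDs.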

Using the Census 2016 portal, anyone can explore Ireland’s latest census data by theme, combine multiple data layers to create maps, embed maps in other applications, and download data or connect to it via a series of open-standard application programming interfaces (APIs).

Four months later, in November 2017, CSO and OSi launched another micro-hive, this time for sustainable development statistics. Data in the Ireland SDG portal (http://irelandsdg.geohive.ie) aligns specifically to the UN’s 17 development goals, 169 targets, and 230 indicators. The SDG portal incorporates Census 2016 variables from CSO as well as more than 100 spatial datasets ranging from biodiversity to traffic accidents. The portal provides more than 50 indicators relating to Ireland’s progress toward SDGs. Users can see very specific information, such as the total number of unemployed females in each electoral ward.

As an extension to the two portals, the joint team created a series of ArcGIS Online story maps to highlight key issues indicated by the CSO Census 2016 data and other open-data sources. Its first story map addresses climate change and unemployment issues and brings together data, interactive maps, images, and narratives to tell the story behind the statistics. People don’t need technical skills to use the story map. They simply access the map in their browsers and zoom to an area of interest to see how the issue affects that location.

Benefits

Improved ability to inform government policy decisions

By making it easier for policymakers, researchers, and government officials to visualize statistical information, the Census 2016 portal and Ireland SDG portal will play key roles in supporting government decision-making. Story maps will be particularly helpful in highlighting critical issues in society. For example, one recently completed story map, based on Census 2016 data, shows that 40 percent of children in Ireland live in rented accommodations and are therefore at risk of poverty and homelessness if rental prices increase. Story maps open issues for discussion and help to inform government policy.

Better information to encourage investment in Ireland’s economy

The Irish agency responsible for attracting foreign investment to Ireland—the Industrial Development Authority (IDA)—uses the Census 2016 portal to identify the best locations for local and foreign business investments. For instance, the agency can map potential areas meeting the criteria for graduates, skilled labor force, and transportation links and use the maps to attract investors.

Easy access to transparent, meaningful data for all citizens

For the first time, anyone can access Ireland’s 2016 census data in a geospatial format that is easy to understand and use. This improves public-sector transparency because any citizen can see the data on which government policies are based. In addition, not-for-profit organizations can use the Census 2016 portal to see, for example, where there are high levels of unemployment. They can then better allocate their resources. They can also map the areas with the greatest need and use these maps to lobby the Irish government to increase its support.

A powerful way to engage citizens in important issues

Story maps that link to the UN’s SDGs, CSO, and OSi help the Irish government raise awareness of important issues that affect the country, such as the need to protect biodiversity and preserve water quality.

A cost-effective mechanism for meeting UN reporting requirements

Significantly, Ireland’s new SDG portal will make it easier for the Irish government to meet the UN’s SDG reporting obligations. Prior to the launch of the portal, there was no single repository for all the data that the Irish government would need to find and analyze to produce the reports. Now, government working groups responsible for UN reporting can more easily find the pertinent data without having to duplicate effort or waste time manipulating data. As a result, the agencies can produce reports more quickly, which reduces costs.

The Irish government cites the Ireland SDG portal as a best-practice example of how public-sector organizations can share and use data. This country-owned, country-led project is a new strategy for the future development of Ireland’s public service. It is opening the way for policies that envision shared data across government sectors that facilitate easier access to services, better service delivery, and better decision-making, and promises to drive government efficiency.

Notes

1. See the US Census Bureau report New Technologies in Census Geographic Listing—Select Topics in International Censuses. Available at https://www.census.gov/content/dam/Census/library/working-papers/2015/demo/new-tech-census-geo.pdf.

2. See the United Nations Economic Commission for Europe (UNECE) report Conference of European Statisticians—Recommendations for the 2020 Censuses of Population and Housing. Available at https://www.unece.org/fileadmin/DAM/stats/publications/2015/ECECES41_EN.pdf.

3. See the ArcGIS Desktop Help article “What is Data Reviewer?” at http://desktop.arcgis.com/en/arcmap/latest/extensions/data-reviewer/what-is-data-reviewer.htm and the ArcGIS Data Reviewer documentation at http://www.esri.com/software/arcgis/extensions/arcgis-data-reviewer.

4. See Statistics Canada note. Retrieved from https://www12.statcan.gc.ca/census-recensement/2016/geo/ADA/adainfo-eng.cfm.

5. See the document Proposal for a Common Statistical-Geospatial Terminology Database published by the UN-GGIM EG-ISGI. Available at http://ggim.un.org/meetings/2015-2nd_Mtg_EG-ISGI-Portugal/documents/UN-GGIM%20EG Lisbon%20meeting%20session%204%20background paper%20terminology.pdf.

6. See Jean-Michel Durr’s The 2010 Round of Population and Housing Censuses in the World. Available at http://jmstat.com/publications/SINAPE%202010.pdf.

7. See additional details in the 2017 UN Principles and Recommendations publication.

8. See the book Authentic Electronic Records: Strategies for Long-Term Access by Charles M. Dollar.

9. See the book System Interoperability: The Reliability Information Analysis Center (RIAC) Guide by Chonchang Lee and Joseph Hazeltine.

10. See Guide to Data Integration for Official Statistics: Introduction. Available at https://statswiki.unece.org/display/DI/Introduction.

11. See ArcGIS: An Open Platform for Innovation. Available at http://www.esri.com/~/media/00C24660087A4EFB9F069148017EABD4.pdf.