Thus far, this book has been devoted to vector feature classes (points, lines, and polygons), aside from displaying the raster basemaps available in ArcGIS Pro and an occasional raster map layer. Vector feature classes are useful for discrete features (such as streetlight locations, streets, and city boundaries). Raster map layers are best suited for continuous features (for example, satellite images of Earth, topography, and precipitation). Raster map layers can also be used when you want to display large numbers of vector features (for example, all blocks in a city or all counties in the United States). In some cases, you have so many vector features that choropleth and other maps render them too small to read clearly. In such cases, as you will see in this chapter, you can transform vector feature classes into raster datasets that you can easily visualize. This chapter first presents some background on raster maps before discussing exactly what you’ll be doing next.
Raster dataset is the generic name for a cell-based map layer stored on a disk in a raster data format. Esri supports more than 70 raster dataset formats, including familiar image formats such as TIFF and JPEG, as well as GIS-specific formats such as Esri Grid. You can import raster datasets into file geodatabases.
All raster datasets are arrays of cells (or pixels)—each with a value and location, and rendered with a color when they are mapped. Of course, just like any other digital image, the pixels are so small at intended viewing scales that they are not individually distinguishable. The coordinates for a raster dataset are the same kind as used for vector maps (chapter 5).
All raster datasets have at least one band of values. A band is comparable to an attribute for vector map layers but stores the values of a single attribute in an array. The values can be positive or negative integers or floating-point numbers. You can use integer values for categories (codes), which must have a layer file with descriptions and colors (for example, 1 = Agriculture, brown; 2 = Forest, green). Raster dataset values can also be floating-point numbers representing magnitudes (for example, temperature and slope steepness of terrain).
Color capture and representation in raster datasets is an important topic. Color in the visible range is captured by satellites in three bands (red, green, and blue) that mix to produce any color. Color in many raster datasets, however, is represented in one band using a color map, in which each color is given a code (an integer value). Color depth is the number of bits (on/off switches in data storage) used to encode each pixel’s color. So-called “true color” uses 24 bits per pixel and can represent more than 16 million colors (the human eye can distinguish about 10 million colors).
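The arithmetic behind the 16 million figure: 24 bits allot 8 bits (256 levels) to each of the red, green, and blue bands, so the number of representable colors is

$$2^{24} = 256 \times 256 \times 256 = 16{,}777{,}216.$$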
The spatial resolution of a raster dataset is the length of one side of a square pixel. So if a pixel is 1 meter on a side, the dataset has 1-meter spatial resolution (a high resolution that yields high-quality maps). The US Geological Survey provides imagery for urban areas in the United States at this resolution or better, so you can zoom far into small parts of neighborhoods (with driveways, swimming pools, and tennis courts clearly visible). The current Landsat 7 and 8 satellites, which together image the entire Earth every eight days, have a resolution of 30 meters for most of their bands, which is good for viewing areas as small as neighborhoods of a city.
File sizes for raster datasets can be very large, requiring large amounts of disk space for storage and potentially long times to process and display on a computer screen. Raster GIS uses several mechanisms to reduce storage and processing time, including data compression and pyramids. Pyramids are additional raster layers at coarser resolutions (larger cell sizes) for zoomed-out viewing; they take less time to display than the original layer. A mosaic dataset is a data catalog for storing, managing, viewing, and querying collections of raster datasets, which often form a continuous map when viewed. Although a mosaic dataset is viewed as a single mosaicked image, you also have access to each dataset in the collection. A mosaic dataset can also store raster datasets of the same area for different times, so you can view the map for any time period and compare time periods.
The ArcGIS Pro project that you will open has single-band raster datasets for land use and elevation, downloaded from the US Geological Survey’s website. You’ll extract raster datasets for Pittsburgh from each original dataset that has extents larger than Pittsburgh’s. Because raster datasets are rectangular, you’ll display layers using Pittsburgh’s boundary as a mask: pixels in Pittsburgh’s rectangular extent but outside the city’s boundary will still exist but will be given no color, while those within Pittsburgh will have assigned colors. Finally, you’ll use the elevation layer to produce a hillshade layer, which is a shaded relief rendering of topography created by using an artificial sun to add illumination and shadows.
Raster datasets have considerable metadata that you can read as properties.
Review the properties of LandUse_Pgh. Notable properties are its format (TIFF, a common image format); its 30-meter resolution; its single band with a color map (available in a separate layer file that you’ll use in an exercise that follows); and its projection for the lower 48 states (Albers Equal Area), which distorts direction (and explains why the layer appears tilted).
Next, you’ll import LandUse_Pgh.tif into the Chapter10.gdb file geodatabase.
Environment settings affect how geoprocessing is carried out by tools in the current project. You’ll set the cell size of raster datasets you create to 50 feet, and you’ll use Pittsburgh’s boundary as the default mask.
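For reference, a minimal sketch of the equivalent environment settings in Python (arcpy); in the tutorial you set these in the Environments dialog instead, and the geodatabase path here is hypothetical:

```python
import arcpy

# Equivalent environment settings in Python (arcpy).
arcpy.env.workspace = r"C:\EsriPress\Chapter10.gdb"  # hypothetical path
arcpy.env.cellSize = 50          # output cell size: 50 feet
arcpy.env.mask = "Pittsburgh"    # city boundary as the default analysis mask
```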
Next, you’ll use the Extract By Mask tool to extract LandUse_Pittsburgh from LandUse_Pgh. The resulting raster dataset will have the same extent as Pittsburgh and therefore be a much smaller file to store than the original.
Extract NED_Pittsburgh from NED using the Extract By Mask tool. In the Geoprocessing pane, for Extract By Mask, click Environments, and then click the Select Coordinate System button to the right of Output Coordinate System. Expand Projected Coordinate System > State Plane > NAD 1983 (US Feet), select the projection for Pennsylvania South, and click OK. Then on the Parameters tab, select NED as the input raster, Pittsburgh as the mask, and NED_Pittsburgh as the output raster, and run the tool. After creating NED_Pittsburgh, remove NED from the map and delete it from Chapter10.gdb.
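The same extraction, sketched in Python (arcpy) under the assumption that the workspace environment is already set; 2272 is the WKID for NAD 1983 StatePlane Pennsylvania South (US Feet):

```python
import arcpy
arcpy.CheckOutExtension("Spatial")  # Extract By Mask requires Spatial Analyst

# Project output to State Plane Pennsylvania South (US Feet).
arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(2272)

# Keep only the cells inside Pittsburgh's boundary.
ned_pgh = arcpy.sa.ExtractByMask("NED", "Pittsburgh")
ned_pgh.save("NED_Pittsburgh")  # saved to the current workspace
```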
Hillshade provides a way to visualize elevation. The Hillshade tool simulates illumination of the earth’s elevation surface (the NED raster layer) using a hypothetical light source representing the sun. Two parameters of this function are the altitude (vertical angle) of the light source above the horizon in degrees and its azimuth (compass direction, measured clockwise from true north) in degrees. The effect of hillshade on elevation is striking because of light and shadow. You can enhance the display of another raster layer, such as land use, by making land use partially transparent and placing hillshade beneath it. You’ll use the Hillshade tool’s default values for azimuth and altitude: the sun for your map will be in the northwest (azimuth 315°) at an altitude of 45° above the horizon.
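A minimal sketch of the same step in Python (arcpy), using the tool’s default sun position:

```python
import arcpy
arcpy.CheckOutExtension("Spatial")

# Default sun position: azimuth 315 degrees (northwest),
# altitude 45 degrees above the horizon.
hillshade = arcpy.sa.Hillshade("NED_Pittsburgh", 315, 45)
hillshade.save("Hillshade_Pittsburgh")
```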
The default symbolization of hillshade can be improved upon, as you’ll do next.
Next, you’ll make LandUse_Pittsburgh partially transparent and place Hillshade_Pittsburgh beneath it to give land use shaded relief.
Another way to visualize elevation data is with elevation contours—lines of constant elevation, as commonly seen on topographic maps. For Pittsburgh, the minimum elevation is 215.2 feet and the maximum is 414.4 feet, a range of about 200 feet. If you specify 20-foot contours starting at 220 feet, there will be about 10 contours. Note that the output contours are vector line data (not polygon data).
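In Python (arcpy), a sketch of the same contouring step with a 20-foot interval and a base contour of 220 feet (the output name is hypothetical):

```python
import arcpy
arcpy.CheckOutExtension("Spatial")

# 20-foot contour interval, base contour 220 feet;
# the output is a line feature class, not polygons.
arcpy.sa.Contour("NED_Pittsburgh", "Contours_Pittsburgh", 20, 220)
```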
Kernel density smoothing (KDS) is a widely used method in statistics for smoothing data spatially. The input is a vector point layer, often centroids of polygons for population data or point locations of individual demands for goods or services. KDS distributes the attribute of interest of each point continuously and spatially, turning it into a density (or “heat”) map. For population, the density is, for example, persons per square mile.
KDS accomplishes smoothing by placing a kernel—a bell-shaped surface with a volume of 1 beneath it—over each point. If there is population N at a point, the kernel is multiplied by N so that its volume is N. Then all kernels are summed to produce a smoothed surface, a raster dataset.
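In symbols (a generic textbook form; Esri’s implementation is based on a quartic kernel), the smoothed density at any location $x$ is the sum of the scaled kernels:

$$\hat{f}(x) = \sum_{i=1}^{n} N_i \, K_h(x - x_i),$$

where the $x_i$ are the input points, the $N_i$ are their population values, and $K_h$ is a kernel scaled to the search radius $h$ whose volume is 1 and which equals 0 beyond distance $h$ from its point.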
The key parameter of KDS is its search radius, which corresponds to the radius of the kernel’s footprint. If the search radius is chosen to be small, you get highly peaked “mountains” for density. If you choose a large search radius, you get gentle, rolling hills. If the chosen search radius is too small (for example, smaller than the radius of a circle that fits inside most polygons that generate the points), you will get a small bump for each polygon, which does not amount to a smoothed surface.
Unfortunately, there are no really good guidelines on how to choose a search radius, but sometimes you can use a behavioral theory or craft your own guideline for a case at hand. For example, crime hot spots (areas of high crime concentrations) often run the length of the main street through a commercial corridor and extend one block on either side. In that case, we use a search radius of one city block’s length.
The map in this tutorial has the number of myocardial infarctions (heart attacks) occurring outside of hospitals (OHCA) during a five-year period, by city block centroid. One of the authors of this book studied this data to identify public locations for defibrillators, devices that deliver an electrical shock to revive heart attack victims. One location criterion was that the devices be in or near commercial areas. Therefore, the commercial-area buffers are commercially zoned areas plus about two blocks (600 feet) of surrounding area.
KDS is an ideal method for estimating the demand surface for a service or good because its data smoothing represents the uncertainty in locations of future demand relative to historical demand. Also, in this case, heart attacks of course do not occur exactly at block centroids, so KDS appropriately distributes heart attack data across a wider area.
Blocks in Pittsburgh average about 300 feet per side. Suppose that health-care analysts estimate that a defibrillator with public access can be identified by residents and retrieved for use from as far away as 2½ blocks from the location of a heart attack victim. They thus recommend looking at areas five blocks by five blocks in size (25 blocks total, or about 0.08 square miles), 1,500 feet on a side, with defibrillators located in the center. With this estimate in mind, you’ll use a 1,500-foot search radius to include data within reach of a defibrillator, plus data beyond reach to strengthen estimates.
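Checking the arithmetic:

$$5 \times 300\ \text{ft} = 1{,}500\ \text{ft}; \qquad \frac{(1{,}500\ \text{ft})^2}{(5{,}280\ \text{ft/mi})^2} = \frac{2{,}250{,}000}{27{,}878{,}400} \approx 0.08\ \text{mi}^2.$$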
The objective is to determine whether Pittsburgh has areas that are roughly 25 blocks in area and, as specified by policy makers, have an average of about five or more heart attacks a year outside of hospitals.
Assuming that target areas will be around a tenth of a square mile in area, suppose policy makers decide on a threshold of 250 or more heart attacks per square mile in five years (or 50 per square mile per year). Next, you’ll create vector contours from the smoothed surface to represent this policy.
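The arithmetic connecting the threshold to the objective:

$$\frac{250\ \text{per mi}^2}{5\ \text{years}} = 50\ \text{per mi}^2\ \text{per year}; \qquad 50 \times 0.1\ \text{mi}^2 = 5\ \text{heart attacks per year}.$$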
To estimate annual heart attacks, you can select OHCA centroids within each threshold area, sum the corresponding numbers of heart attacks, and divide by 5, because OHCA is a five-year sample of actual heart attacks. You will use the Threshold boundaries in a select-by-location query, in which case the Threshold layer must be polygons. However, if you examine the properties of the Threshold layer, you’ll see that it has the line vector type, not the polygon type, even though all seven areas look like polygons. The tool you ran to create Threshold produces lines because some peak areas of an input raster could overlap the border of the mask—Pittsburgh, in this case—and the lines for such areas would not be closed but left open at the border. Fortunately, ArcGIS Pro has a tool to create polygons from lines, which you’ll use next.
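A sketch of that step in Python (arcpy), using the tutorial’s layer names:

```python
import arcpy

# Build polygons from the closed contour lines; lines that form
# closed rings become polygon boundaries.
arcpy.management.FeatureToPolygon("Threshold", "ThresholdAreas")
```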
Next, you will find the ThresholdAreas polygon that has no OHCA points and is not included in the previous table. The polygon (pink fill color) is about a city block in size and is predicted to be a peak density area on the basis of contributions from the kernels of nearby OHCA points, even though it contains no points itself. Given the polygon’s small size, the 28 nearby heart attacks in the five-year sample, and its location within five blocks of the peak area, perhaps this polygon also warrants consideration as a defibrillator site.
In this tutorial, you learn more about creating and processing raster map layers. You also are introduced to ArcGIS Pro’s ModelBuilder for building models. A model, also known as a macro, is a computer script that you create without writing code (a script runs a series of tools). Instead of writing code, you drag tools onto the model’s canvas (editor interface) and connect them in a workflow; ArcGIS Pro then writes the script in the Python scripting language. Ultimately, you can run your model just as you would any other tool.
The model that you will build in this tutorial calculates an index for identifying poverty areas of a city by combining raster maps for four poverty indicators: low income (population below the poverty line), female-headed households with children, low educational attainment (no high school diploma), and male unemployment.
Low income alone is not sufficient to identify poverty areas, because some low-income persons have supplemental funds or services from government programs or relatives, and so rise above the poverty level. Female-headed households with children are among the poorest of the poor, so these populations must be represented when you consider poverty areas. Likewise, populations with low educational attainment and/or low employment levels can help identify poverty areas.
Dawes¹ provides a simple method for combining such indicator measures into an overall index. If you have a reasonable theory that several variables are predictive of a dependent variable of interest (whether or not the dependent variable is observable), Dawes contends that you can proceed by removing scale from each input and then averaging the scaled inputs to create a predictive index. A good way to remove scale from a variable is to calculate z-scores: subtract the variable’s mean and then divide by its standard deviation. Each standardized variable has a mean of 0 and a standard deviation of 1 (and therefore no scale).
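In symbols, for each observation $x_i$ of a variable with mean $\bar{x}$ and standard deviation $s$:

$$z_i = \frac{x_i - \bar{x}}{s}.$$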
Table 10-1 for Pittsburgh block groups shows that if you simply averaged the four variables, the variable female-headed households would have a small weight, given its mean of only 36.1, while the means of the other three variables are all higher than 100. Z-scores level the playing field so that all variables have an equal role.
Table 10-1. Poverty indicator variables for Pittsburgh block groups.
The workflow to create the poverty index has three steps: (1) standardize each input variable by computing z-scores, (2) create a kernel density surface from each standardized variable, and (3) combine the surfaces in a weighted average to produce the index raster layer.
Experts and stakeholders in the policy area using the raster index can judgmentally give more weight to some variables than others if they choose. The only restriction is that the weights be nonnegative and sum to 1. So if the judgmental weights for the four input variables are 0.7, 0.1, 0.1, and 0.1, the first variable is seven times more important than each of the other three. Because different stakeholders may have different preferences, a macro lets you repeatedly run the multiple-step process for creating an index raster layer with different sets of weights. For example, some policy makers (educators and grant-making foundations) may want to emphasize unemployment or education and give those inputs more weight than others, whereas other stakeholders (human services professionals) may want to weight female-headed households heavily.
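With standardized inputs $Z_i$ and judgmental weights $w_i$, the index is simply

$$\text{Index} = \sum_{i=1}^{4} w_i Z_i, \qquad w_i \ge 0, \qquad \sum_{i=1}^{4} w_i = 1.$$

Equal weights ($w_i = 0.25$) reproduce Dawes’s plain average.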
Standardizing the input variables need only be done once, so you’ll do that step manually, but then you’ll complete parts 2 and 3 of the workflow using a ModelBuilder model for creating indexes.
All input variables must come from the same point layer (block group centroids, in this case) so that the data standardization and averaging process is valid. You’ll use a 3,000-foot buffer of the study region (Pittsburgh) for two purposes.
First, KDS uses the northernmost, easternmost, southernmost, and westernmost points of its input point layer to define its extent. If the inputs are polygon centroids in a study region, the corresponding KDS raster map will be cut off and not quite cover the study region. The block group centroids added by the buffer yield KDS rasters that extend a bit beyond Pittsburgh’s border, but the Pittsburgh mask will show only the portion within Pittsburgh.
Second, in applying KDS, the buffer eliminates the boundary problem for estimation caused by abruptly ending data at the city’s edge. KDS estimates benefit from the additional data provided by the buffer beyond the city’s edge.
PittsburghBlkGrps already has three out of four input attributes standardized and ready to use in the poverty index you’ll compute, but FHHChld has not yet been standardized. For practice purposes, you will standardize this attribute next.
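One way to script the same standardization in Python (arcpy)—a sketch, not the tutorial’s click-path; the new field name ZFHHChld follows the tutorial’s naming pattern:

```python
import arcpy

# Compute the mean and (sample) standard deviation of FHHChld,
# then write z-scores to a new field, ZFHHChld.
values = [row[0] for row in arcpy.da.SearchCursor("PittsburghBlkGrps", ["FHHChld"])]
mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

arcpy.management.AddField("PittsburghBlkGrps", "ZFHHChld", "DOUBLE")
arcpy.management.CalculateField(
    "PittsburghBlkGrps", "ZFHHChld",
    f"(!FHHChld! - {mean}) / {std}", "PYTHON3")
```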
You’ll set the cell size of rasters you create to 50 feet, and you’ll use Pittsburgh’s boundary as a mask.
A toolbox is just a container for models (macros). When your project was created, ArcGIS Pro built a toolbox, named Chapter10.tbx, which is where your model will be saved.
ModelBuilder has a drag-and-drop environment: you’ll search for tools, and when you find them, you’ll drag them to your model, open their input/output/parameter forms, and fill the forms out.
Configure the three remaining Kernel Density processes, each with PittsburghBlkGrps as the input, a cell size of 50, and a search radius of 3,000, and with the population fields ZNoHighSch, ZMaleUnem, and ZPoverty. The numbers that ModelBuilder appends to the input block group elements do not have to match those in the figure; ModelBuilder just needs the names of all model elements to be unique. Also, your model may not have separate inputs for each KDS process but instead may connect the same PittsburghBlkGrps input to more than one KDS process. Resize and align model elements to make them readable. Review all four KDS processes to make sure that they have the correct z-score variable inputs and a 3,000-foot search radius, and save your model. A sketch of one such process as it might appear when exported to Python follows.
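Roughly what one of these KDS processes looks like as Python (arcpy); the output name is a hypothetical stand-in for your model’s element:

```python
import arcpy
arcpy.CheckOutExtension("Spatial")

# One KDS process: standardized variable as the population field,
# 50-foot cells, 3,000-foot search radius.
kds = arcpy.sa.KernelDensity("PittsburghBlkGrps", "ZNoHighSch", 50, 3000)
kds.save("NoHighSchKDS")  # hypothetical output name
```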
You can run the entire model by clicking its Run button, which you will do next. When you build a model, however, you can also run each process one at a time by right-clicking a process and clicking Run, which lets you isolate and fix errors. When you finish building, you’ll run the model in the Geoprocessing pane.
Although vector layers should have a maximum of about seven to nine categories for symbolization of numeric attributes (to avoid a cluttered map and allow easy interpretation using a legend), symbolization of raster layers should have many more categories to represent continuous surfaces. You’ll use standard deviations to create categories and the maximum number of categories with that method. Finally, you’ll save symbolization as a layer file for automatic use whenever the model creates its output, PovertyIndex, with whatever weights the user chooses.
One objective for the PovertyIndex model is to allow users to change the poverty index’s weights. To accomplish this change, you must create variables that will store the weights that the user inputs, and then designate each variable as a parameter input by the user. ModelBuilder automatically creates a user interface for your model (just like the interface for any tool). The user can enter weights for your model in the interface as an alternative to the default equal weights.
Add three more variant variables, named NoHighSchWeight, MaleUnemWeight, and PovertyWeight, each with the value 0.25. Make each variable a parameter, and save your model.
In this exercise, you will transfer the weights stored in variables (values that will be input by the user) to parameters of the Raster Calculator tool. The mechanism is called “in-line substitution” because each variable’s value is substituted into the tool’s expression wherever the variable’s name appears enclosed in percent signs (for example, %FHHChldWeight%).
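For example, the Raster Calculator expression might look like the following sketch, in which the KDS output names are hypothetical stand-ins for your model’s elements and each %…% term is replaced by its variable’s value at run time:

```
("%FHHChldKDS%" * %FHHChldWeight% + "%NoHighSchKDS%" * %NoHighSchWeight% +
 "%MaleUnemKDS%" * %MaleUnemWeight% + "%PovertyKDS%" * %PovertyWeight%)
```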
You’ll use the layer file you created earlier for the poverty index for this task. To do so, you must make the PovertyIndex output of the Raster Calculator process a parameter in the model.
Making PovertyIndex a parameter also adds it to the Contents pane for display in your map when you run the model as a tool from the Geoprocessing pane.
Congratulations, your model is ready to use.
Remove PovertyIndex from the Contents pane, and try running your model with different weights (nonnegative and summing to one), such as 0.7, 0.1, 0.1, and 0.1. Notice how much the output changes. Note that for a class project or work for a client, you should symbolize PovertyIndex manually to get the best color scheme and categories for the final model, depending on the distribution of densities produced. To do this symbolization, you must go to the model properties and delete the layer file; otherwise, ArcGIS Pro will not let you change symbolization. When you finish, save your project and close ArcGIS Pro.
This chapter has two assignments to complete that you can download from this book’s resource web page, esri.com/gist1arcgispro: