Chapter 4

4.1 Introduction

This chapter covers the fundamentals of the Python language, including the use of data types and data structures, variables, keywords, statements and expressions, methods, functions, and modules. These elements make up the basic syntax for writing code in Python. The second part of the chapter deals with controlling workflow, including the use of conditional statements, branching, and looping. These structures are an important element of geoprocessing scripts for ArcGIS Pro and are fundamental to batch operations. The third part of the chapter is dedicated to best practices in writing scripts, including naming conventions, code organization, and providing comments.

4.2 Locating Python documentation and resources

Before getting into Python syntax, it is useful to review where to look for Python documentation. If you installed Python as part of a typical ArcGIS Pro installation, you can find links to the Python manuals from within IDLE. Start IDLE, and from the upper menu, click Help > Python Docs. The Python manuals include the complete documentation for the version of Python that is installed on your computer.

The same Python documentation can also be found online at http://docs.python.org. You will find multiple versions of the documentation for the various Python versions, including 2.7, 3.6, 3.7, 3.8, and so on. The versions available will change over time as updates are released. You can also download the complete documentation in PDF, HTML, and text formats.

The Python documentation is extensive. The PDF version of the Python Library Reference, for example, is over 2,000 pages. This can be intimidating if you are just getting started in Python. The documentation, however, is not meant to be read cover to cover. Instead, the documentation is typically used to selectively look up the syntax for specific tasks. This chapter introduces you to the fundamentals of Python, which will give you the basic terminology needed to use the documentation more effectively.

The Python website at http://www.python.org also contains many additional resources for learning Python, including a “Beginner’s Guide to Python” at http://wiki.python.org/moin/BeginnersGuide and a set of introductory tutorials at http://wiki.python.org/moin/BeginnersGuide/NonProgrammers. You will quickly realize that there is a wealth of resources on Python, created by a large and active user community.

4.3 Working with data types and structures

Python uses several different data types, including strings, numbers, Booleans, lists, tuples, dictionaries, and more. The data type of an object determines what type of values it can have and what operations can be performed on the object. String values consist of characters, which can include letters, numbers, or other types of characters. There are two numeric data types: integers (whole numbers) and floats, or floating-point numbers (fractional numbers). Booleans can contain only the values True or False. Lists, tuples, and dictionaries are more complex data types that consist of a collection of data elements.

Some data types can contain more than a single value, and these are referred to as data structures. A data structure is a collection of elements that are structured in some meaningful way—for example, elements that are numbered sequentially. The most basic data structure in Python is the sequence, in which each element is assigned a number, or index. Strings, lists, and tuples are examples of sequences. Because sequences share the same inherent data structure, there are certain things you can do with any type of sequence; examples of working with sequences appear later in this chapter.

Strings, numbers, and tuples are immutable, which means you can’t modify them but only replace them with new values. Lists and dictionaries are mutable, which means the data elements can be modified. The following sections illustrate these concepts.
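The difference between mutable and immutable types is easiest to see by trying to change an element in place. The following is a minimal sketch. The city names are arbitrary, and it uses indexing, slicing (name[1:]), and try/except, which are covered later in this chapter and book.

```python
# Lists are mutable: an element can be replaced in place.
cities = ["Austin", "Boston", "Chicago"]
cities[0] = "Atlanta"
print(cities)  # ['Atlanta', 'Boston', 'Chicago']

# Strings are immutable: assigning to a single character fails.
name = "Paul"
try:
    name[0] = "S"
except TypeError as err:
    print("TypeError:", err)  # 'str' object does not support item assignment

# To "change" a string, build a new string and assign it to the variable.
name = "S" + name[1:]
print(name)  # Saul
```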

4.4 Working with numbers

Numbers can be integers or floats. Integers are whole numbers, that is, numbers that have no fractional part and therefore no decimals, for example, 1 and −34. Floats, or floating-point numbers, are numbers that have a decimal point, for example, 1.0 and −34.8307. Although both integers and floats are numeric data types, they can produce different types of results in calculations, so the distinction is important.

Consider a simple example:

>>> 3 - 8
-5

The result is the value of −5. Simple enough. In Python terminology, the numbers 3 and 8 are called operands, and the minus sign (-) is called an operator. There are many different types of operators; in this section, the focus is on arithmetic operators.

Another basic example of an arithmetic operator:

>>> 5 + 12
17

Next, consider the use of division in Python:

>>> 7 / 2
3.5

The code results in 3.5. Even though the numerator and denominator are integers, the result is reported as a float to capture the fractional part. This is referred to as true division, or floating-point division.

Another type of division, called floor division, rounds the result down to a whole number and uses the // symbol. You simply type the symbol for division (/) twice. Consider the example:

>>> 7 // 2
3

The result is 3. The fractional part is not kept: floor division rounds the result down to the nearest whole number. Floor division can be a bit tricky when working with negative numbers, because for a negative result, rounding down moves the value away from zero. For example, −7 / 2 is −3.5, so floor division gives −4 rather than −3. Consider the example:

>>> -7 // 2
-4

The result of −4 may appear somewhat counterintuitive at first, so be careful with floor division. Also note that floor division does not always return a result of type integer, even though the value is a whole number. When either the numerator or the denominator is a float, the return value is also a float, but it is still rounded down to a whole number:

>>> -7.0 // 2
-4.0
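To see the difference between rounding down and simply dropping the decimals, the following sketch compares floor division with math.floor() and int() from the standard library. It is an illustration only; the pairs of numbers are arbitrary.

```python
import math

# Floor division rounds down (toward negative infinity),
# whereas int() simply drops the fractional part (toward zero).
for a, b in [(7, 2), (-7, 2), (-7.0, 2)]:
    print(a, "//", b, "=", a // b,
          "| math.floor:", math.floor(a / b),
          "| int():", int(a / b))
```

For −7 // 2, floor division and math.floor() both give −4, while int(−7 / 2) gives −3.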

How about the next calculation?

>>> 17 % 4
1

The % operator here stands for modulus, or the remainder after the division. So, 17 % 4 evaluates the remainder of 17 divided by 4, which results in 1.

What if the calculation includes a float?

>>> 17.0 % 4
1.0

The result is a float, which is consistent with the earlier examples for division.
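Floor division and modulus are two halves of the same calculation. The following sketch checks the relationship a == (a // b) * b + a % b for a few arbitrary pairs and uses the built-in divmod() function, which returns both results at once. Because floor division rounds down, -7 % 2 is 1, not -1.

```python
# Floor division and modulus fit together: for any nonzero b,
# a == (a // b) * b + a % b
for a, b in [(17, 4), (17.0, 4), (-7, 2)]:
    quotient, remainder = a // b, a % b
    print(a, b, quotient, remainder, quotient * b + remainder == a)

# divmod() returns the quotient and the remainder as a tuple.
print(divmod(17, 4))  # (4, 1)
```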

Next, consider multiplication in Python:

>>> 3 * 8
24

And what if a float is used?

>>> 3.0 * 8
24.0

No surprise here.

Finally, there is the use of the exponent operator that uses the ** symbol. You simply type the symbol for multiplication (*) twice. Consider the following example:

>>> 8 ** 3
512

And again, the use of a float will influence the result:

>>> 8.0 ** 3
512.0

The basic arithmetic operators for integers and floats are summarized in table 4.1.

Table 4.1. Common operators for integers and floats in Python

Operator  Description     Integer example  Result  Float example  Result
-         Subtraction     9 - 2            7       9 - 2.0        7.0
+         Addition        9 + 2            11      9 + 2.0        11.0
/         Division        9 / 2            4.5     9 / 2.0        4.5
//        Floor division  9 // 2           4       9 // 2.0       4.0
%         Modulus         9 % 2            1       9 % 2.0        1.0
*         Multiplication  9 * 2            18      9 * 2.0        18.0
**        Exponent        9 ** 2           81      9 ** 2.0       81.0
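The values in table 4.1 can be reproduced with a short loop. This is a sketch that relies on the standard operator module, which provides a function for each arithmetic operator (operator.sub, operator.floordiv, and so on); the dictionary and loop syntax are covered later in this chapter.

```python
import operator

# Pair each operator symbol with the equivalent function in the operator module.
operators = {
    "-": operator.sub,
    "+": operator.add,
    "/": operator.truediv,
    "//": operator.floordiv,
    "%": operator.mod,
    "*": operator.mul,
    "**": operator.pow,
}

results = {}
for symbol, func in operators.items():
    # Apply each operator to two integers and to an integer and a float.
    results[symbol] = (func(9, 2), func(9, 2.0))
    print(f"9 {symbol} 2 = {results[symbol][0]}    9 {symbol} 2.0 = {results[symbol][1]}")
```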

The following sections explore the use of additional operators.

4.5 Working with variables and naming

Python scripts use variables to store information. A variable is basically a name that represents or refers to a value. You can also consider it a container that stores a value. For example, you may want to use a variable x to represent the number 17. Here is how you do it in Python:

>>> x = 17

In this assignment statement, you assign the value of 17 to variable x. Once you assign a value to a variable, you can use the variable in an expression. For example:

>>> x = 17
>>> x * 2
34

Note that you must assign a value to a variable before you can use it. The line of code x * 2 requires that it be preceded by the line of code x = 17, which assigns the value of 17 to x.

What if you use a variable before it is assigned a value? You will get an error. Consider the following example, where the variable y has not been assigned a value anywhere earlier in your code:

>>> y * 2

This code will result in an error message, which will include the following:

NameError: name 'y' is not defined

Always make sure your variables are assigned a value before you use them in an expression with an operator.

A brief note about other programming languages is in order. If you are familiar with languages such as C++ or Java, you will have learned to first declare a variable and define its type (e.g., string, numeric, and others) before you can assign it a value. Python does not require you to declare it, and you can directly assign it a value. If you have not used other programming languages, the use of variables in Python is intuitive. It is also efficient and results in fewer lines of code.

How does Python know what type your variable is? The type is implicit in the value you assign it. For instance, x = 17 means that x is an integer, whereas x = 17.629 means that x is a float, and x = "GIS" means that x is a string. This type of assignment is known as dynamic assignment, sometimes referred to as “dynamic typing.”

You can change the type of a variable by assigning it a new value. For example:

>>> a = 0.123
>>> 2 * a
0.246
>>> a = 123
>>> 2 * a
246

Initially, the variable a is assigned the value of 0.123 and is therefore a float. Later in the code, the same variable is assigned the value of 123 and becomes an integer instead. This example illustrates the consequences of dynamic assignment in Python.

Multiple variables can be assigned on the same line, which keeps your scripts compact. For example:

>>> x, y, z = 1, 2, 3

is the same as:

>>> x = 1
>>> y = 2
>>> z = 3
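The following sketch shows assignment on a single line, chained assignment of one value to several variables, and a common related idiom: swapping two values without a temporary variable. The variable names are chosen only for illustration.

```python
# Assign several variables on one line.
x, y, z = 1, 2, 3
print(x, y, z)  # 1 2 3

# Assign the same value to several variables at once.
a = b = 0

# Swap two values without a temporary variable.
x, y = y, x
print(x, y)  # 2 1
```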

Sometimes you must use a variable in your script, but you may not know in advance what its value is going to be. In such cases, you can create the variable with a placeholder value and assign the actual value later. There are two common options to do so. The first option creates an empty string:

k = ""

Note that there is no space between the quotes.

The second option assigns the special value None, which represents the absence of a value:

p = None

If you check the data type of this variable using the type() function, the result is NoneType. The value None simply indicates that the variable does not yet hold a meaningful value.
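The two placeholder options can be compared directly. This short sketch uses the same variable names, k and p, and checks their types; the expression p is None is the usual way to test whether a variable still holds None.

```python
k = ""     # empty string: a str that contains no characters
p = None   # the special value None

print(type(k))    # <class 'str'>
print(type(p))    # <class 'NoneType'>
print(len(k))     # 0
print(p is None)  # True
```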

When you write your code, you get to decide on your own variable names. You have quite a bit of flexibility, but there are some basic rules for naming variables:

In addition to the rules for variables that are technically correct, here are some guidelines for creating good variable names:

Naming Python scripts follows these same basic rules and guidelines.

Following are some examples of good variable names:

The following are examples of variable names that will produce an error:

Finally, the following are examples of variable names that will not produce an error but that are not encouraged:

The first of these examples, dataSize, employs a style known as camel case, also written as camelCase. In this style, variable names consist of the combination of two or more words without spaces or underscores, where the first word is written in all lowercase and the rest of the words start with an uppercase letter. This style is popular, but the Style Guide for Python Code discourages its use. If you want to strictly follow the official style, camel case should not be used. However, camel case is the most typical style used for variable names in other programming languages such as C++ and Java. Therefore, programmers who use multiple languages often also employ this style in Python because it does not cause any syntax issues. This group includes many Esri programmers who write Python.

Better alternatives that follow the style guide would be datasize or, for improved legibility, data_size. The latter style is called snake case, also written as snake_case, in which words are separated with one underscore and no spaces.

Regardless of which style is used, the most important aspect for naming conventions is consistency. Therefore, if you employ a certain style, be consistent throughout your code. This applies not only to variables but to other code elements as well as script names themselves. Consistency makes it easier to recognize code elements.
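Whether a name is technically valid can also be checked in code. The following is a sketch: is_valid_name() is a hypothetical helper written for this illustration, while str.isidentifier() and keyword.iskeyword() are part of the standard library. The check covers only the technical rules, not the style guidelines, so dataSize passes even though the style guide discourages it.

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a Python variable name."""
    # isidentifier() checks the characters (letters, digits, and underscores,
    # not starting with a digit); keywords such as 'for' are still reserved.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["data_size", "dataSize", "_temp", "2ndfile", "my-data", "for"]:
    print(candidate, is_valid_name(candidate))
```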

4.6 Writing statements and expressions

Once you have variables to work with, you can start writing Python expressions and statements.

An expression is a piece of code that evaluates to a value. For example, 2 * 17 is an expression, representing the value of 34. Simple expressions are built from literal values (such as 17) by using operators and functions. Section 4.10 covers functions. More complicated expressions can be created by combining simpler expressions. Expressions can also contain variables.

A statement, on the other hand, is an instruction that tells the computer to do something. This can consist of changing the value of variables through assignment, printing values to the screen, importing modules, and many other operations.

The difference between expressions and statements is a bit subtle but important. Consider the following example:

>>> 2 * 17
34

Here, 2 * 17 is an expression. It has a value (34), which is automatically printed to the screen of the interactive Python interpreter. Now consider the following:

>>> x = 2 * 17

Here, x = 2 * 17 is a statement. The variable x is assigned a value, but the statement itself is not a value. A defining property of statements is that they do something rather than represent a value. Hence, in the preceding example, the interactive Python interpreter does not print anything; you need to use the print() function for that:

>>> x = 2 * 17
>>> print(x)
34

Statements that assign values to variables are among the most important statements in any programming language. At first glance, variables may appear to be only temporary containers for values, but the real power lies in the fact that you can manipulate them without knowing the values they hold. As a result, you can write Python scripts that work with variables whose values are determined only when the script runs. For example, you can write code to process a list of numbers and find the largest number in the list without knowing the exact list of numbers when you write the code.
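A minimal sketch of the idea: the function below works for any non-empty list of numbers, even though the numbers are not known when the code is written. find_largest() is a hypothetical name; functions, loops, and conditional statements are covered later in this chapter, and in practice the built-in max() function does the same job.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list of numbers."""
    largest = numbers[0]          # start with the first value
    for value in numbers[1:]:     # compare it with each remaining value
        if value > largest:
            largest = value
    return largest

print(find_largest([12, 47, 3, 29]))  # 47
print(find_largest([-5.5, -2.0]))     # -2.0
```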

4.7 Using strings

Among the most important data types are strings. A set of characters surrounded by quotation marks is called a string literal, or simply a string. You already saw an example of it when you wrote the code print("Hello World") in chapter 2. You can create a string variable by assigning a string literal to it. Consider the following example:

>>> name = "Paul"

The variable name has been assigned the value of Paul, and because the value is in quotation marks, the variable is a string. The quotation marks are required here. For example, the following code will produce an error:

>>> name = Paul

The error message will include the following:

NameError: name 'Paul' is not defined

Therefore, the quotation marks are required to create a string.

You can use single quotation marks (' ') or double quotation marks (" "). In Python, these two styles of quotation marks have the same purpose and are used interchangeably. Quotation marks are like bookends, telling the computer where the string begins and where it ends. Having two ways to indicate bookends provides flexibility. For example, if you use a pair of double quotation marks to surround your text, you can use as many single quotation marks inside the string as you want, and vice versa. For example:

>>> print("I said: 'Let's go!'")
I said: 'Let's go!'

Using single quotation marks around this string would not work, because the apostrophe in Let's would be read as the closing quotation mark. The rest of the line can then not be interpreted, and the result is a syntax error:

>>> print('I said: "Let's go!"')
SyntaxError: invalid syntax

Even though the two types of quotation marks can be used interchangeably, they must be used in pairs. Either of the two quotation marks can be used as an opening quote, but only the same type of quotation mark can be used as a closing quote. For example:

>>> name = "Paul'
SyntaxError: EOL while scanning string literal

The result is a specific type of syntax error known as an End Of Line error, or EOL, because no closing quote could be found.

As a reminder, quotation marks in Python are always straight, for both single (') and double (") quotes. When you type quotation marks in a Python editor, they are automatically entered correctly. When you type quotation marks in a word processing application, they are often automatically formatted as curly or slanted quotation marks, for example, “new text.” These quotes are typographically correct when writing text, but copying them into Python code produces errors:

x = “new text”
SyntaxError: invalid character in identifier

Strings are used frequently in geoprocessing scripts, especially when defining tool parameters. For example, paths or partial paths are used to define the inputs and outputs of a tool, and these paths are stored as strings. Therefore, string operators are important in working with datasets in different workspaces.

Consider a few things you can do with strings. For example, you can concatenate strings by simply “adding” them up:

>>> x = "G"
>>> y = "I"
>>> z = "S"
>>> print(x + y + z)
GIS

It may appear counterintuitive to use arithmetic operators on strings. However, when you consider that strings are simply sequences of characters, it makes sense to combine the existing sequences into a new one. And the easiest way to combine things is with the plus (+) operator.

When combining strings to form a new string, you may want to add spacing in between by using double quotation marks around a space (" "). For example:

>>> x = "Geographic"
>>> y = "Information"
>>> z = "Systems"
>>> print(x + " " + y + " " + z)
Geographic Information Systems
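Concatenation is also a simple way to build the paths used by geoprocessing tools. The following sketch joins a hypothetical workspace folder and file name. It assumes two things not covered in this section: a raw string (r"...") keeps backslashes as ordinary characters, and "\\" in a regular string stands for a single backslash. The standard os.path.join() function is a more robust alternative for building paths.

```python
# A raw string (r"...") keeps the backslashes in a Windows path literal.
workspace = r"C:\EsriPress\Python\Data"
shapefile = "zipcodes.shp"

# Concatenate the folder, a backslash separator, and the file name.
path = workspace + "\\" + shapefile
print(path)  # C:\EsriPress\Python\Data\zipcodes.shp
```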

Strings can contain numeric characters. However, when trying to combine strings and numbers, numbers first must be converted to a string. Consider the following example:

>>> temp = 32
>>> print("The temperature is " + temp + " degrees")

The result is an error message because you cannot add a string and a number together:

TypeError: must be str, not int

You can use the str() function to convert the number to a string. The correct code is as follows:

>>> temp = 100
>>> print("The temperature is " + str(temp) + " degrees")
The temperature is 100 degrees

Here, str() is a function that converts the number to a string. Converting a value from one type to another is known as casting. The example also shows string formatting using the + operator to concatenate strings. Section 4.13 covers alternative ways to format strings.

In Python, strings are, by default, Unicode strings. In general, you can think of strings as “plain text.” Text is stored as particular characters, and different languages use different sets of characters. This diversity can cause problems across different computer platforms, and characters may not display correctly. You may have encountered this situation when trying to read web pages or e-mails in different languages. To overcome these problems, the Unicode system is designed to represent every character from every language. This contrasts with older encodings such as ASCII (American Standard Code for Information Interchange), which defines only 128 characters; even the extended 8-bit variants of ASCII are limited to 256. When you think of all the languages of the world (e.g., Chinese, French, German, Russian, Thai, and so on), 256 different characters is clearly not enough.

ASCII and Unicode are both encoding systems: standard schemes for representing real-world characters as the bytes used by computers. ASCII fits each character into a single byte. A byte has 8 bits, and 2 (for binary) raised to the power of 8 equals 256, which is where the limit of 256 characters for extended ASCII comes from; standard ASCII uses only 7 of those bits, giving 2 to the power of 7, or 128, characters. The most widely used Unicode encoding is UTF-8 (Unicode Transformation Format, 8-bit), which uses from 1 to 4 bytes per character. It is backward compatible with ASCII, but it provides support for all the major languages in the world. Unicode provides room for around 1.1 million different character codes, although presently only about 10 percent are assigned to real-world characters. Most of the world’s web pages in use today use UTF-8.

Because Python uses Unicode strings, you can work with characters from many different languages. You can write specific characters in Unicode by using the \u escape sequence. For example, Unicode 0394 is used for the Greek capital-letter delta:

>>> "\u0394"
'Δ'

As another example, Unicode 00E7 is used for the Latin small letter c with cedilla:

>>> "\u00E7"
'ç'

Finally, Unicode 0B89 is used for the Tamil letter U:

>>> "\u0B89"
'உ'

There is no need to remember any of these codes. The point is that characters from all major languages in the world are supported in Python. And because Python employs the widely used UTF-8 standard, there are numerous online utilities to look up the codes for any given character.
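The variable number of bytes in UTF-8 can be checked directly. This sketch uses the characters from the examples above, plus a plain ASCII letter and one emoji (U+1F30D) for comparison. It relies on the standard unicodedata module and the str.encode() method, and it prints code points and names rather than the characters themselves, so it works even in consoles that cannot display them.

```python
import unicodedata

characters = ["A", "\u00E7", "\u0394", "\u0B89", "\U0001F30D"]

for ch in characters:
    code_point = f"U+{ord(ch):04X}"    # the Unicode code point
    n_bytes = len(ch.encode("utf-8"))  # bytes needed to store it in UTF-8
    print(code_point, unicodedata.name(ch), n_bytes)
```

The byte counts range from 1 for the ASCII letter to 4 for the emoji.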

Python also assumes by default that script files are saved with UTF-8 encoding. If a script file uses a different encoding, that encoding must be declared in a special comment on the first or second line of the script, for example: # -*- coding: latin-1 -*-

4.8 Using lists

Another important data type in Python is lists. A list consists of a sequence of items surrounded by brackets ([ ]), sometimes referred to as “square brackets,” and the items are separated by commas. The items themselves can consist of numbers, strings, or several other data types, including lists. All the items in a single list are typically the same data type, although it is not required. Lists are used frequently in geoprocessing scripts. For example, you may want to create a list of all the feature classes in a workspace and perform an operation on all the feature classes in the list.

One way to create a list is to type all the items. The following example creates a list of numbers:

>>> mylist = [1, 2, 4, 8, 16, 32, 64]

Notice that the items are separated by commas, followed by a space. The space is not required, but it makes the list easier to read.

List items are not limited to numbers but can also consist of strings:

>>> mywords = ["jpg", "bmp", "tif", "img"]

You can print the contents of the list using the print() function as follows:

>>> print(mywords)
['jpg', 'bmp', 'tif', 'img']

Notice that the items retain their original order when printed. A list is an ordered sequence of items. Section 4.15 discusses how to manipulate lists, including the order of the items.
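A few basic operations work on any list. The sketch below reuses the mywords list; indexing is explained for strings in section 4.12 and works the same way for lists, and the in operator tests whether an item is part of the list.

```python
mywords = ["jpg", "bmp", "tif", "img"]

print(len(mywords))      # 4: the number of items
print(mywords[0])        # jpg: items keep their order, and indexing starts at 0
print("tif" in mywords)  # True: membership test
print("png" in mywords)  # False
```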

Generally, the elements of a list are all the same type, like the examples shown. However, the elements can be of varying types, as in the following example:

>>> newlist = [1, 2, 4, 8, "jpg", "bmp"]

Although this format is allowed, there are few scenarios where you would want to use such a list.

4.9 Working with Python objects

Now that you have seen several different data types in Python (i.e., numbers, strings, and lists), it is important to revisit the concept of Python being an object-oriented programming language and everything in Python being an object. Each object has the following: (1) a value; (2) an identity, which is a unique identifier; and (3) a type, which is the kind of values an object can hold.

Consider the following example:

>>> name = "Paul"

This statement creates an object (name), and this object has a value (Paul):

>>> name
'Paul'

The object also has a unique identifier, which differs from one computer, and from one session, to the next. You can obtain this unique identifier using the id() function:

>>> id(name)
593835200

The unique identifier is provided by the computer to keep track of the object (and its value and properties), but it is not important to know the actual number being used.

Finally, the object has a type, which can be determined using the type() function:

>>> type(name)
<class 'str'>
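Because the actual id() value changes from session to session, it is more useful to compare identifiers than to read them. In this sketch, alias is a hypothetical second variable that refers to the same object as name; the built-in isinstance() function offers another way to check an object's type.

```python
name = "Paul"
alias = name  # a second variable referring to the same object

print(id(name) == id(alias))  # True: both variables refer to one object
print(type(name))             # <class 'str'>
print(isinstance(name, str))  # True
```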

Another important concept is that variables in Python are dynamic. Consider the following example:

>>> var1 = 100
>>> type(var1)
<class 'int'>
>>> var2 = 2.0
>>> type(var2)
<class 'float'>

The type of each variable is determined by the value assigned to it. This differs from many other programming languages, such as C++ and Java, in which variables must first be declared with a type that then remains fixed.

Object type conversion can be accomplished using casting. Consider the following example:

>>> var = 100
>>> type(var)
<class 'int'>
>>> 2 * var
200

The variable called var is an integer, and you can use it in calculations. Next, the variable is cast as a string:

>>> newvar = str(var)
>>> type(newvar)
<class 'str'>
>>> print(newvar)
100
>>> 2 * newvar
'100100'

Once the variable is cast as a string, the value is still 100, but now it is a string, not an integer. When you print the value, it still looks like a number. Once you perform an operation on it, such as multiplication, it becomes clear that the value is no longer a number: the multiplication operator applied to a string repeats the string, so 2 * newvar results in '100100'.
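The following sketch puts the integer and string versions side by side and shows that casting the string back with int() restores normal arithmetic. The last line illustrates that string repetition can be handy, for example, to print a separator line.

```python
var = 100
newvar = str(var)

print(2 * var)          # 200: arithmetic on an integer
print(2 * newvar)       # 100100: the string is repeated
print(2 * int(newvar))  # 200: casting back to int restores arithmetic
print("-" * 10)         # ----------: repetition is useful for separators
```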

Trying to cast a string such as "Paul" as a number, however, is not logical. For example:

>>> name = "Paul"
>>> newname = int(name)

The result is an error message:

ValueError: invalid literal for int() with base 10: 'Paul'

Casting a number as a string is commonly employed to allow for such tasks as printing messages. The reverse, casting a string as a number, is also widely used. A typical scenario is when a script reads values from a text file or other source that contains a combination of text and numbers. Once you have identified the numeric parts, you can use casting to ensure that the values are read as numbers instead of as strings. Casting an integer as a float, and vice versa, is also common.
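Here is a minimal sketch of that scenario. The record is a hypothetical line from a comma-separated text file (station name, elevation in meters, temperature in degrees Celsius), and the split() string method, which breaks a string into a list of parts, is used here ahead of its formal introduction.

```python
# A hypothetical line read from a comma-separated text file.
record = "station_12,1250,18.5"

# Split the line into its three parts, which are all still strings.
name, elevation_text, temperature_text = record.split(",")

elevation = int(elevation_text)        # "1250" becomes 1250
temperature = float(temperature_text)  # "18.5" becomes 18.5

print(elevation + 100)  # 1350
print(temperature * 2)  # 37.0
```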

4.10 Using functions

Python expressions and statements use variables and functions among other elements. Variables were discussed earlier in this chapter. A function is like a little program that is used to carry out an action. You have already seen several functions, including int(), print(), str(), and type(). This section takes a closer look at what functions are and how to use them.

The general syntax of a function is

<function>(<parameters>)

In this syntax, <function> stands for the name of the function, followed by the parameters of the function in parentheses. Function parameters are also called arguments, and the two terms are often used interchangeably. The general syntax can therefore also be written as follows:

<function>(<arguments>)

Most functions require one or more parameters or arguments, which serve as the input for the function. Using a function is referred to as calling the function. When you call a function, you supply it with parameters, and it returns a value. And because it returns a value, a function call is a type of expression.

Consider the print() function you are already familiar with.

>>> name = "Paul"
>>> print(name)
Paul

In this example, the parameter of the print() function is a variable, and this variable has a value. The print() function outputs the value to the interactive interpreter by printing it to the next line of code.

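In Python 2, print was a statement that could be used without parentheses. In Python 3, print() is a function, so leaving out the parentheses results in a syntax error: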

>>> name = "Paul"
>>> print name
SyntaxError: Missing parentheses in call to 'print'

Python includes a set of core functions referred to as the built-in functions, which you can use directly in any statement or expression. The print(), str(), and type() functions are part of these built-in functions. Another example of a built-in function is the power function pow(), as follows:

>>> pow(2,3)
8

The pow() function returns 2 to the power of 3, or the value of 8. In this case, the parameters are 2 and 3, and the function returns a value of 8.

To see the available built-in functions, you can use the dir() function as follows:

>>> dir(__builtins__)

The dir() function returns the names of the attributes, such as properties and methods, of the specified object. In this example, __builtins__ refers to the builtins module, which contains all the built-in functions (as well as built-in constants and exceptions), so dir(__builtins__) returns a list of their names. If you first import the module using import builtins, you can also use dir(builtins).

You can also consult the Python documentation to see the available built-in functions. For Python 3.6, the complete list of built-in functions can be viewed here: https://docs.python.org/3.6/library/functions.html.

Python includes several dozen built-in functions—too many to review here in detail, but several of them are used in upcoming examples. Once you identify a function you want to use, you may need to review the description of the function and the syntax. This description is stored in a documentation string, or docstring. You can view the docstring by using the __doc__ property. For example:

>>> print(pow.__doc__)

In this example, the print() function is not required in the interactive Python interpreter, but it provides slightly better formatting of the output.

A more intuitive alternative to printing the docstring is to use the built-in help() function, as follows:

>>> help(pow)

The help() function prints a description of the pow() function:

Help on built-in function pow in module builtins:
pow(x, y, z=None, /)
Equivalent to x**y (with two arguments) or x**y % z (with three arguments)
Some types, such as ints, are able to use a more efficient algorithm when invoked using the three argument form.

The online Python documentation typically provides more detailed information, as shown in the figure.

Explanation of the pow() function in the online Python documentation.

The pow() function has three arguments, which are separated by commas. Two of these arguments are required (x and y), and the third (z) is optional, which is indicated by brackets in the online documentation and by the default value z=None in the help() output. The description refers to (**), which is the basic operator in Python for exponentiation.
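The following sketch calls pow() with two and with three arguments and compares the results with the equivalent expressions that use the ** and % operators, as described in the docstring.

```python
print(pow(2, 3))     # 8, the same as 2 ** 3
print(pow(2, 3, 5))  # 3, the same as (2 ** 3) % 5
print(pow(2.0, 3))   # 8.0, a float argument gives a float result
```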

Several other common built-in functions are listed in table 4.2. This list is not exhaustive, but it contains some of the more widely used functions.

Table 4.2. Common built-in functions in Python

Function        Description                                        Example           Returns
abs(x)          Returns the absolute value of a number             abs(-8)           8
float(x)        Converts a string or number to a float             float("8.0")      8.0
int(x)          Converts a string or number to an integer          int("8")          8
pow(x, y[, z])  Returns x to the power of y                        pow(4, 3)         64
round(x[, n])   Rounds x to n digits after the decimal point       round(2.3648, 2)  2.36
str(x)          Returns a string representation of a given object  str(10)           '10'
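Each example in table 4.2 can be checked in the interactive interpreter. The sketch below pairs each call with its expected return value and prints whether the two match; repr() shows the quotation marks around the string result.

```python
# Each example from table 4.2, paired with the value it should return.
examples = [
    (abs(-8), 8),
    (float("8.0"), 8.0),
    (int("8"), 8),
    (pow(4, 3), 64),
    (round(2.3648, 2), 2.36),
    (str(10), "10"),
]

for result, expected in examples:
    print(repr(result), result == expected)
```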

Python is not limited to built-in functions, and additional functions can be accessed using modules, as discussed in section 4.20. You can also create your own functions.

4.11 Using methods

Methods are like functions. A method is a function that is coupled to an object—for example, a number, a string, or a list. A method is used as follows:

<object>.<method>(<arguments>)

This syntax looks like calling a function, but now the object is put before the method with a period separating them. The use of this period is called dot notation. Dot notation is widely used in object-oriented programming. It indicates that what comes after the dot belongs to what comes before the dot. In this case, the method belongs to the object. Later sections in this chapter illustrate other uses of dot notation.

Here is a simple example:

>>> topic = "Geographic Information Systems"
>>> topic.count("i")
2

In this example, the variable topic is assigned a string. Because the variable is defined as a string, Python automatically makes all methods associated with a string object available. In the second line of code, the count() method is called. The argument, in this case, is a substring (the letter i), and the result is a count of the number of times the substring occurs in the string object. The count() method is case sensitive, as is most of Python syntax.
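Case sensitivity matters when counting. In this sketch, the capital I in "Information" is not counted as "i"; converting the string to lowercase first, using the lower() method described in the next section, makes the count ignore case.

```python
topic = "Geographic Information Systems"

print(topic.count("i"))          # 2: the capital I is not counted
print(topic.count("I"))          # 1
print(topic.lower().count("i"))  # 3: lowercase first to ignore case
```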

Methods are not limited to strings—many of Python’s built-in data types have associated methods, and they are widely used when working with objects in Python scripts. The following sections include more examples of using methods.

4.12 Working with strings

Strings are a useful built-in data type and are frequently used in Python scripts. Many geoprocessing parameters are strings: for example, the name of a project in a workspace, the name of a feature class in a geodatabase, and the name of a field in a table. In many cases, these strings are somewhat complex. For example, the path of a feature class might look like C:\EsriPress\Python\Data\zipcodes.shp. It is therefore useful to know the string methods available for working with such values.

The lower() method returns a lowercase version of the string value:

>>> mytext = "GIS is cool"
>>> print(mytext.lower())
gis is cool

The upper() method returns an uppercase version of the string value:

>>> mytext = "GIS is cool"
>>> print(mytext.upper())
GIS IS COOL

The title() method returns a title-case version of the string value—that is, each word starts with an uppercase letter:

>>> mytext = "GIS is cool"
>>> print(mytext.title())
Gis Is Cool

Strings (and other Python sequences such as lists) use an index positioning system. Each character in a string is assigned an index number, and you obtain a character by placing its index number in square brackets after the string. Python uses zero-based indexing, which means the first character in a sequence has index zero [0]. Spaces are counted like any other character. Consider the following string:

>>> mystring = "Geographic Information Systems"

The code to obtain the first character is as follows:

>>> mystring[0]
'G'

This approach can be used to obtain any character:

>>> mystring[23]
'S'

You can use negative index numbers to start counting from the end. The last character in a string has index −1. This makes it possible to get the last character without knowing the length of the string:

>>> mystring[-1]
's'

For any given string, you can therefore point to a specific character using two different index numbers: a positive index on the basis of forward indexing and a negative index on the basis of reverse indexing.

Forward indexing and reverse indexing of the term “Geography.”

To illustrate this pattern for the example string in the figure:

>>> text = "Geography"
>>> text[5]
'a'
>>> text[-4]
'a'

Obtaining an individual character from a string (or, more generally, a single element from a sequence) is simply called indexing.
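The short sketch below checks how the two indexing schemes relate. It loops over every position in "Geography" and prints each character with both its positive and its negative index. The loop relies on len() and a for loop, both covered later in this chapter. It shows that index i and index i minus the length of the string point to the same character.

```python
text = "Geography"
n = len(text)   # 9 characters: indices 0 to 8, or -9 to -1

# Each character can be reached with a positive index i
# or with the equivalent negative index i - n.
for i in range(n):
    print(i, i - n, text[i], text[i - n])
```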

Strings can be sliced into smaller strings. Slicing uses two indices separated by a colon. The first index is the number of the first character you want to include. The second index is the number of the first character you do not want to include. For example, the following code creates a new string containing the characters starting with index number 11, up to but not including the character with index number 22:

>>> mystring = "Geographic Information Systems"
>>> mystring[11:22]
'Information'

Initially, slicing can be confusing and a common source of errors. The key to understanding slicing is to recognize that the second index is the number of the first character not to include. Carefully consider the following example:

>>> text = "Geography"
>>> text[2:5]
'ogr'

The first index number 2 corresponds to the letter o—this is the first character to include. The second index number 5 corresponds to the letter a—this is the first character not to include. Therefore, the last character to include is the letter r at index number 4, and the letter a is not returned.

Leaving out one of the indices means you are not putting a limit on the range. For example, the following code returns a string consisting of the characters starting with index number 11, up to and including the highest index number or the last character:

>>> mystring = "Geographic Information Systems"
>>> mystring[11:]
'Information Systems'

On the other hand, the following code returns a string consisting of the characters starting with the lowest index number (which is zero), up to and including index number 9:

>>> mystring = "Geographic Information Systems"
>>> mystring[:10]
'Geographic'
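The sketch below checks two useful properties of slicing. First, a slice [start:stop] contains stop minus start characters. Second, cutting a string at any index k and joining the two pieces gives back the original string. It also uses a negative start index in a slice.

```python
mystring = "Geographic Information Systems"

# A slice [start:stop] includes start but excludes stop,
# so its length is stop - start.
word = mystring[11:22]
print(word, len(word))                          # Information 11

# Cutting at any index k and joining the two slices restores the string.
k = 11
print(mystring[:k] + mystring[k:] == mystring)  # True

# Negative indices also work in slices: the last 7 characters.
print(mystring[-7:])                            # Systems
```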

This index positioning system is also used for other sequences, including lists; section 4.15 reviews the use of index numbers in more detail.

The find() method searches for a substring and returns the leftmost index number at which the substring is found. A value of −1 is returned when the substring is not found.

>>> mystring = "Geographic Information Systems"
>>> mystring.find("Info")
11

Like most of Python, the find() method is case sensitive:

>>> mystring = "Geographic Information Systems"
>>> mystring.find("info")
-1

The returned value of −1 means that the substring was not found.

The in keyword is similar to the find() method but returns a Boolean value of True or False:

>>> mystring = "Geographic Information Systems"
>>> "Info" in mystring
True
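Both find() and the in keyword are case sensitive. A common workaround, sketched below, is to convert both strings to lowercase with lower() before searching. The variable search is only an illustrative name.

```python
mystring = "Geographic Information Systems"
search = "info"

exact = mystring.find(search)          # case sensitive: no match
print(exact)                           # -1

# Lowercase both strings for a case-insensitive search.
position = mystring.lower().find(search.lower())
found = search.lower() in mystring.lower()
print(position, found)                 # 11 True
```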

The join() method is used to join elements of a list:

>>> list_gis = ["Geographic", "Information", "Systems"]
>>> string_gis = " "
>>> string_gis.join(list_gis)
'Geographic Information Systems'

Here, the elements in the list are joined into a single string. The string object (string_gis) has a method called join(). The value of the string object, in this case, is a space (" "), and the argument of the join() method is the list of elements to be joined into a single string.

The opposite of the join() method is the split() method. The argument of the split() method is the separator to split the input string into a list of elements. The following example also uses a space, but it can be any other character.

>>> pythonstring = "Geoprocessing using Python scripts"
>>> pythonlist = pythonstring.split(" ")
>>> pythonlist
['Geoprocessing', 'using', 'Python', 'scripts']
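The sketch below shows that split() and join() undo each other when they use the same separator. It also splits a Windows path on the backslash. Note that a backslash must be written as "\\" inside a normal string literal. Finally, it calls split() with no argument, which splits on any run of whitespace.

```python
pythonstring = "Geoprocessing using Python scripts"

# split() and join() are inverses when the same separator is used.
pythonlist = pythonstring.split(" ")
rebuilt = " ".join(pythonlist)
print(rebuilt == pythonstring)       # True

# Any string can act as the separator, such as the backslash in a path.
path = "C:\\EsriPress\\Python\\Data\\zipcodes.shp"
parts = path.split("\\")
print(parts)      # ['C:', 'EsriPress', 'Python', 'Data', 'zipcodes.shp']
print(parts[-1])  # zipcodes.shp

# With no argument, split() splits on any run of whitespace.
print("  one   two three ".split())  # ['one', 'two', 'three']
```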

Another commonly used method to manipulate path and file names is the strip() method. The generic strip() method allows you to remove any combination of characters in any order from the ends of an existing string. For example:

>>> mytext = "Commenting scripts is good"
>>> mytext.strip("Cdo")
'mmenting scripts is g'

Notice that the strip() method is not specific: it removes any of the listed characters from the start or end of the string, irrespective of the order in which the characters are listed in the argument, and irrespective of whether all the characters are included on either end of the string. Stripping stops at the first character on each end that is not in the argument.

The lstrip() and rstrip() methods provide a bit more control by limiting stripping to either the left side or right side of the string, respectively. For example:

>>> mytext = "Commenting scripts is good"
>>> mytext.rstrip("Cdo")
'Commenting scripts is g'

Notice that, in this case, the leading “Co” is not removed because the rstrip() method removes only trailing characters.

Calling one of the strip methods without any arguments removes leading or trailing whitespace, which includes spaces as well as tabs and newline characters. This is useful for cleaning up strings that may have been formatted in another program. Consider the following example:

>>> white = " this is my text "
>>> white.strip()
'this is my text'

In this example, strip(" ") gives the same result, but note that it removes only space characters, not tabs or newlines.

The replace() method is much more specific and replaces all occurrences of a specific substring with another substring. This works like a find-and-replace operation in a text editor. For example:

>>> mygis = "Geographic Information Systems"
>>> mygis.replace("Systems", "Science")
'Geographic Information Science'

The replace() method can also be used to remove a file extension from a file name by replacing the extension with an empty string (""). This approach is more specific than the strip() method, because it matches the exact substring rather than individual characters. For example:

>>> myfile = "streams.shp"
>>> myfile.replace(".shp", "")
'streams'

The replace() method is more robust than stripping methods. Consider the same example code using the rstrip() method instead:

>>> myfile = "streams.shp"
>>> myfile.rstrip(".shp")
'stream'

The rstrip() method removes not only the file extension but also the final s of the file name, because it removes any of the listed characters from the end of the string, regardless of the order in which they are listed in the argument. This result is clearly not desired, and therefore stripping methods are not recommended for removing file extensions.
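The replace() method has its own pitfall: it removes every occurrence of the substring, not only the one at the end. The sketch below defines a hypothetical helper, remove_extension(), that strips the extension only when it ends the file name. It uses endswith() and slicing. Python 3.9 and later also provide a built-in str.removesuffix() method for the same purpose.

```python
def remove_extension(filename, extension):
    """Hypothetical helper: remove extension only if it ends filename."""
    # The "extension and" guard avoids filename[:-0], which would be "".
    if extension and filename.endswith(extension):
        return filename[:-len(extension)]
    return filename

print("streams.shp".rstrip(".shp"))             # stream (wrong)
print("streams.shp".replace(".shp", ""))        # streams
print("old.shp_roads.shp".replace(".shp", ""))  # old_roads (every match removed)
print(remove_extension("old.shp_roads.shp", ".shp"))  # old.shp_roads
```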

4.13 Printing and string formatting

Printing is widely used in scripts to check the values of variables and confirm that a step is completed. Printing is also used to give instructions to a user and to report results in the form of messages. The print() function prints the values of variables as messages to the interactive interpreter, and often those values include strings. As shown in earlier examples, this sometimes requires the casting of variables, as follows:

>>> temp = 100
>>> print("The temperature is " + str(temp) + " degrees")
The temperature is 100 degrees

However, this type of string concatenation can become a bit cumbersome. A better alternative is to employ string formatting, which provides a more flexible and robust approach to the use of strings, including for printing purposes. There are several approaches to string formatting.

One of the most widely used techniques is the use of the format() method. Its most basic usage is to insert a value into a string using a placeholder or replacement field. Consider the earlier example rewritten to use the format() method:

>>> temp = 100
>>> print("The temperature is {0} degrees".format(temp))
The temperature is 100 degrees

In this example, {0} is a replacement field consisting of an index number surrounded by braces {}. Anything that is not contained in the braces is considered literal text. The replacement field is replaced by the corresponding argument of the format() method, in this case the value of the variable temp.

The index number is not required, and the following code produces the same result:

>>> print("The temperature is {} degrees".format(temp))

This approach to string formatting is not limited to single replacement fields, as follows:

>>> username = "Paul"
>>> password = "surf$&9*"
>>> print("{0}'s password is {1}".format(username, password))
Paul's password is surf$&9*

Again, the index numbers are not required, and the following is also correct:

>>> print("{}'s password is {}".format(username, password))

By default, the arguments of the format() method are expected to be in the same order as the replacement fields. The index numbers are therefore optional. However, the index numbers make it possible to use the arguments out of order. Consider the following example where the arguments are used out of order by adjusting the index numbers:

>>> print("{1}'s password is {0}".format(password, username))
Paul's password is surf$&9*

The format() method is widely used to organize print messages in a consistent manner. An alternative technique is to use the % operator as a placeholder, sometimes referred to as “old-style” string formatting. Using this approach, the earlier example is as follows:

>>> username = "Paul"
>>> password = "surf$&9*"
>>> print("%s's password is %s" % (username, password))
Paul's password is surf$&9*

In this example, %s indicates that the placeholder is intended for a string. Other options include %d (or the equivalent %i) for integers and %f for floating-point numbers.

Although the difference between the two approaches to string formatting seems somewhat subtle, the use of the format() method is preferred. For example, the use of indices makes it possible to rearrange the display order of the arguments.

In addition to the format() method and the % operator, there is a newer approach to formatting strings called f-strings, introduced in Python 3.6. F-strings are also called “formatted string literals.” They are string literals that are prefixed with the letter f and contain expressions in braces. These expressions are replaced with their values at runtime.

Consider the earlier example but now using f-strings, as follows:

>>> temp = 100
>>> print(f"The temperature is {temp} degrees")
The temperature is 100 degrees

This example shows how f-strings are similar to str.format() but shorter.

F-strings also work for multiple values:

>>> username = "Paul"
>>> password = "surf$&9*"
>>> print(f"{username}'s password is {password}")
Paul's password is surf$&9*

Although the other ways of formatting strings can still be used, f-strings are concise, readable, and convenient. They are also fast and less prone to error. As a result, f-strings are strongly recommended.
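The sketch below builds the same message with all four techniques from this section and confirms that the results are identical. It then shows that an f-string can contain any expression, followed by an optional format specification after a colon (here, .1f for one digit after the decimal point). Reading the temperature as Celsius is an assumption made only for this illustration.

```python
temp = 100

# Four ways to build the same message
concatenated = "The temperature is " + str(temp) + " degrees"
with_format = "The temperature is {} degrees".format(temp)
with_percent = "The temperature is %d degrees" % temp
with_fstring = f"The temperature is {temp} degrees"
print(concatenated == with_format == with_percent == with_fstring)  # True

# Any expression can go in the braces, plus a format spec after a colon.
fahrenheit = f"{temp} degrees Celsius is {temp * 9 / 5 + 32:.1f} degrees Fahrenheit"
print(fahrenheit)  # 100 degrees Celsius is 212.0 degrees Fahrenheit
```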

4.14 Using Boolean logic

Python also works with Boolean logic, which is built around the truth value of expressions and objects. Many expressions are evaluated in a Boolean context, which allows you to evaluate conditions (i.e., whether they are true or false) and decide how to proceed depending on the result of those conditions. Boolean logic is implemented in Python in various ways, including a Boolean data type, Boolean variables, Boolean expressions, and Boolean operators. This section reviews these concepts with examples.

Python provides a Boolean data type, which can have only one of two values, True or False. For example, you can check to see whether a variable has a certain value or not, and the result can be only True or False. The following code assigns a number to a variable, and then checks to see whether this number is smaller than another number:

>>> x = 7
>>> x < 10
True

Or you can check to see if the number is equal to another number:

>>> x = 3
>>> x == 8
False

A single equal-sign (=) is used for an assignment statement, whereas a double equal-sign (==) is used to check whether two objects have the same value.

These expressions are referred to as Boolean expressions, which are expressions that evaluate to True or False. They often use comparison operators, including equal to (==), not equal to (!=), less than (<), greater than (>), less than or equal to (<=), and greater than or equal to (>=). The result can be assigned to a variable:

>>> x = 7
>>> t = x < 10
>>> t
True

The data type of this variable is Boolean:

>>> type(t)
<class 'bool'>

The Boolean values True and False are case sensitive, and true or TRUE will produce an error. For example:

>>> a = True

creates a Boolean, whereas

>>> a = true

produces an error:

NameError: name 'true' is not defined

Boolean expressions can also use Boolean operators using the Python keywords and, or, and not. Consider a Boolean variable that is set to True:

>>> a = True

The not operator logically reverses this:

>>> not a
False

The and and or operators combine two Boolean values. The or operator results in True if either Boolean is True:

>>> a = True
>>> b = False
>>> a or b
True

The and operator results in True only if both Booleans are True:

>>> a and b
False

Boolean operators are case sensitive. For example, the use of And or AND will produce an error.

>>> a AND b
SyntaxError: invalid syntax
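To see all combinations at once, the sketch below prints a small truth table for and, or, and not. Each row lists a, b, a and b, a or b, and not a. The nested for loops are covered later in this chapter.

```python
rows = []
for a in (True, False):
    for b in (True, False):
        row = (a, b, a and b, a or b, not a)
        rows.append(row)
        print(a, b, a and b, a or b, not a)
```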

Data types such as strings or numbers may not appear to be true or false, but what if the string is empty or the number is zero? You can check the truthfulness of a value by using the bool() function, which determines whether the object is true or false. Consider the following example:

>>> text = "Something"
>>> bool(text)
True

When a string contains at least one character, the string evaluates to True. And when the string is empty?

>>> notext = ""
>>> bool(notext)
False

An empty string evaluates to False.

The results are similar for numeric values:

>>> number = 123
>>> bool(number)
True
>>> zero = 0
>>> bool(zero)
False

The number zero (0 or 0.0) evaluates to False, whereas any other number evaluates to True.
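The sketch below applies bool() to several values at once, using a list comprehension (a compact way to build a list). Empty strings, zero, and empty lists, tuples, and dictionaries (covered later in this chapter) evaluate to False. Some values that look "empty," such as the string "0", a single space, or the string "False", still evaluate to True because they are non-empty strings.

```python
falsy = ["", 0, 0.0, [], (), {}]
truthy = ["0", " ", "False", -1, 0.5, [0]]

print([bool(value) for value in falsy])   # all False
print([bool(value) for value in truthy])  # all True
```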

Two other Python keywords are worthy of mention here: is and None. First, the is keyword evaluates whether two variables refer to the same object.

>>> var1 = [1, 2, 3]
>>> var2 = var1
>>> var1 is var2
True

An expression using the is keyword returns a Boolean. The expression determines whether the two variables refer to the same object, not whether they have the same value. For example, the following code compares two variables with the same value, but they are different objects.

>>> var1 = [1, 2, 3]
>>> var2 = [1, 2, 3]
>>> var1 is var2
False

These examples illustrate the difference between the is keyword and the equal to (==) operator, which compares the values of the variables, not the objects themselves.

>>> var1 = [1, 2, 3]
>>> var2 = [1, 2, 3]
>>> var1 == var2
True
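The difference between is and == matters when an object is modified. In the sketch below, var2 refers to the same list as var1, so a change made through var1 is also visible through var2. The list var3 only has equal values, so it is not affected.

```python
var1 = [1, 2, 3]
var2 = var1          # var2 refers to the same list object as var1
var3 = [1, 2, 3]     # var3 is a separate list with equal values

var1.append(4)       # modify the list through var1

print(var2)          # [1, 2, 3, 4]: same object, so the change is visible
print(var3)          # [1, 2, 3]: different object, unchanged
print(var1 is var2, var1 is var3, var1 == var3)  # True False False
```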

The None keyword in Python is used to define a null value, or no value at all. None is an object with a data type of its own, NoneType:

>>> x = None
>>> type(x)
<class 'NoneType'>

The value None evaluates to False:

>>> bool(x)
False

The None value is sometimes used to check whether user inputs have a valid value or not. None is different from the number zero or an empty string. Comparisons with None are usually written with the is keyword rather than ==:

>>> empty = ""
>>> empty is None
False
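The sketch below shows how this distinction might be used when checking inputs. It defines a hypothetical helper function, describe(), that tells None apart from values that are merely empty or zero. Defining functions is covered later in this chapter.

```python
def describe(value):
    """Hypothetical helper: distinguish None from other 'empty' values."""
    if value is None:
        return "no value"
    if not value:            # empty string, zero, empty list, and so on
        return "empty or zero"
    return "has a value"

print(describe(None))        # no value
print(describe(""))          # empty or zero
print(describe(0))           # empty or zero
print(describe("zipcodes"))  # has a value
```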

Boolean logic, including the keywords in this section, is widely used in scripts to evaluate conditions and determine the appropriate course of action on the basis of those conditions.

4.15 Working with lists

Lists are a versatile Python data type and can be manipulated in many ways. In an earlier section, you saw how items in a list were joined to form a single string and how a single string was split into items in a list. In this section, you will see a few more ways to manipulate lists.

Consider the following list:

>>> cities = ["Austin", "Baltimore", "Cleveland", "Denver", "Eugene"]

The number of items in a list can be determined using Python’s built-in len() function, as follows:

>>> print(len(cities))
5

Lists can be sorted using the sort() method. The default sort order is ascending (alphabetical for strings such as these city names), but it can be reversed by using the reverse=True argument of the sort() method, as follows:

>>> cities.sort(reverse=True)
>>> print(cities)
['Eugene', 'Denver', 'Cleveland', 'Baltimore', 'Austin']
>>> cities.sort()
>>> print(cities)
['Austin', 'Baltimore', 'Cleveland', 'Denver', 'Eugene']

Notice how methods such as sort() modify the list in place and return None rather than a new list. That is why the print messages are added so you can see what happened to the list. When using an interactive interpreter, you could simply use cities instead of print(cities), and the result would be the same. When testing your code in a stand-alone script, however, print(cities) makes the message appear in the interactive window, whereas just using cities would not display anything. It is therefore recommended to use print messages for testing purposes.
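The sketch below confirms that sort() returns None. It also shows a detail of the default order for strings: uppercase letters sort before lowercase ones. For comparison, it uses the built-in sorted() function, which is not discussed in the text above. sorted() returns a new list and leaves the original unchanged, and its key=str.lower argument makes the comparison ignore case. The mixed-case city names are made up for the illustration.

```python
cities = ["denver", "Eugene", "austin"]

result = cities.sort()       # sorts the list in place
print(result)                # None
print(cities)                # ['Eugene', 'austin', 'denver']: uppercase first

names = ["denver", "Eugene", "austin"]
print(sorted(names, key=str.lower))  # ['austin', 'denver', 'Eugene']
print(names)                         # ['denver', 'Eugene', 'austin']
```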

Lists in Python are sequences, just like strings. Therefore, just like strings, Python lists are indexed, starting with the index zero [0]. These index numbers can obtain specific items in the list or slice the list into smaller lists. The code to obtain the second item from the preceding list is as follows:

>>> cities[1]
'Baltimore'

You can use negative index numbers to start counting from the end. The last item in the list is index −1. The use of a negative index number makes it possible to get the last item without knowing the exact count:

>>> cities[-1]
'Eugene'

The following code obtains the second-to-last item:

>>> cities[-2]
'Denver'

The use of indices does not modify the original list but simply obtains the item from the list as a value.

Lists can be sliced into smaller lists. Again, it works just like the slicing of strings. Slicing uses two indices separated by a colon. The first index is the number of the first element you want to include. The second index is the number of the first element you do not want to include. For example, the following code creates a new list containing the elements starting with index number 2, up to but not including the element with index number 4:

>>> cities[2:4]
['Cleveland', 'Denver']

Leaving out one of the indices means you are not putting a limit on the range. For example, the following code creates a new list consisting of the items starting with index number 2, up to and including the highest index number:

>>> cities[2:]
['Cleveland', 'Denver', 'Eugene']

The following code obtains the items up to but not including the index number 2:

>>> cities[:2]
['Austin', 'Baltimore']

Using a single index number—for example, cities[1]—returns the value of a single item—in this case, a string. By contrast, slicing returns a new list, even if it contains only a single item. Consider the differences. Here is the code to obtain the second item from the list as a string:

>>> cities[1]
'Baltimore'

And here is the code to obtain the second item from the list as a new list:

>>> cities[1:2]
['Baltimore']

This distinction is important because strings and lists are different types of objects, and you can do different things with each type. What if you have a list with only a single element, and you want to obtain the value of that element—i.e., not as a list? Use the index zero [0].

>>> mylist = ["Peter"]
>>> mylist[0]
'Peter'
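The sketch below makes the difference between indexing and slicing concrete. An index returns a string, while a slice returns a new list. Because the slice is a separate list, appending to it leaves the original list unchanged. The name "Chicago" is only an illustrative value.

```python
cities = ["Austin", "Baltimore", "Cleveland", "Denver", "Eugene"]

single = cities[1]      # a string
sliced = cities[1:2]    # a new list with one element
print(type(single), type(sliced))  # <class 'str'> <class 'list'>

sliced.append("Chicago")
print(sliced)           # ['Baltimore', 'Chicago']
print(cities)           # ['Austin', 'Baltimore', 'Cleveland', 'Denver', 'Eugene']

print(cities[-2:])      # ['Denver', 'Eugene']: the last two items
```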

Another important list operation determines membership using the in keyword. It checks whether a value is an element of the list and returns True or False. Consider the following example:

>>> cities = ["Austin", "Baltimore", "Cleveland", "Denver", "Eugene"]
>>> "Baltimore" in cities
True
>>> "Seattle" in cities
False

Elements can be deleted using the del statement. For example, the following code deletes a specific element from a list on the basis of the index number:

>>> cities = ["Austin", "Baltimore", "Cleveland", "Denver", "Eugene"]
>>> del cities[2]
>>> cities
['Austin', 'Baltimore', 'Denver', 'Eugene']

In addition to using index numbers, you can manipulate lists using methods. You have already seen the sort() method. Other commonly used list methods include append(), count(), extend(), index(), insert(), pop(), and remove(). A brief discussion of some of these list methods follows.

The append() method appends an element to the end of the list:

>>> cities = ["Austin", "Baltimore"]
>>> cities.append("Cleveland")
>>> cities
['Austin', 'Baltimore', 'Cleveland']

The count() method determines the number of times an element occurs in a list:

>>> yesno = ["True", "True", "False", "True", "False"]
>>> yesno.count("True")
3

The extend() method allows you to append several values at once:

>>> list1 = [1, 2, 3, 4]
>>> list2 = [11, 12, 13, 14]
>>> list1.extend(list2)
>>> list1
[1, 2, 3, 4, 11, 12, 13, 14]

The index() method finds the index of the first occurrence of a value:

>>> mylist = ["The", "quick", "fox", "jumps", "over", "the", "lazy", "dog"]
>>> mylist.index("the")
5

Recall that Python is case sensitive, which is why the result is index number 5, not zero (0).

The insert() method makes it possible to insert an element into a list at a specified location, as follows:

>>> cities = ["Austin", "Cleveland", "Denver", "Eugene"]
>>> cities.insert(1, "Baltimore")
>>> cities
['Austin', 'Baltimore', 'Cleveland', 'Denver', 'Eugene']

The pop() method removes an element from a list at a specified location and returns the value of this element, as follows:

>>> cities = ["Austin", "Baltimore", "Cleveland", "Denver", "Eugene"]
>>> cities.pop(3)
'Denver'
>>> cities
['Austin', 'Baltimore', 'Cleveland', 'Eugene']

The pop() method accomplishes the same thing as the del statement. The only difference is that the pop() method returns the value of the element being removed, whereas the del statement does not. If no index number is given, pop() removes and returns the last element.

The remove() method removes the first occurrence of a value, as follows:

>>> numbers = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
>>> numbers.remove(0)
>>> numbers
[1, 1, 0, 1, 0, 1, 0, 1, 0]
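Two details of these methods often cause surprises, and the sketch below shows both. First, append() adds its argument as a single element, so appending a list creates a nested list, whereas extend() adds each element separately. Second, index() and remove() raise a ValueError when the value is not in the list, so it is safer to check membership with in first.

```python
cities = ["Austin", "Baltimore"]

cities.append(["Cleveland", "Denver"])  # adds ONE element: a nested list
print(cities)       # ['Austin', 'Baltimore', ['Cleveland', 'Denver']]
print(len(cities))  # 3

cities.pop()                            # no index: removes the last element
cities.extend(["Cleveland", "Denver"])  # adds each element separately
print(cities)       # ['Austin', 'Baltimore', 'Cleveland', 'Denver']
print(len(cities))  # 4

# Check membership first to avoid a ValueError from remove().
if "Seattle" in cities:
    cities.remove("Seattle")
else:
    print("Seattle is not in the list")
```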

There is no need to memorize all these methods. Code autocompletion prompts will help you by listing all the applicable methods. For example, consider the following code example in the Python window. After you create a list called cities, the interactive Python interpreter recognizes it as a list when you start typing the next line of code. Therefore, after typing cities followed by a dot (.), you are prompted by a drop-down list of list methods, as shown in the figure.

Python window with a drop-down list of methods for the list called cities. The list includes append(), clear(), copy(), count(), extend(), index(), insert(), pop(), remove(), reverse(), and so on.

Once you have chosen a method, you are prompted with the syntax for the method, as shown in the figure.

Python window with a prompt of the syntax for the remove method.

The earlier examples have shown how you can create the elements of a list by typing the values directly into your code. You can also create an empty list first, and then add elements to it, as follows:

>>> letters = [ ]
>>> letters.append("A")

This can be helpful if you are obtaining the elements from a function or other source, and you need a list to add them to.

Another way to create a list is to convert another object that contains the elements of interest. This other object must be an iterable, which means you can iterate over its contents one element at a time. Sequences such as strings, lists, and tuples are examples of iterables. The built-in list() function creates a list from the elements in an iterable.

Consider the following example:

>>> letter_string = "GIS"
>>> letter_list = list(letter_string)
>>> print(letter_list)
['G', 'I', 'S']

The list() function creates a list that consists of every character in the string. This is similar to casting between string and integer using str() and int(). However, the use of the list() function is a bit different. For example, it does not work in reverse, and you cannot convert the list back to the original string using str():

>>> str(letter_list)
"['G', 'I', 'S']"

Another limitation is that you cannot convert a number to a list. For example:

>>> numbers = 123
>>> list(numbers)

This produces an error:

TypeError: 'int' object is not iterable

Numbers are not iterable, and therefore they are not a valid argument of the list() function. To convert the digits of a number to a list, you must first convert the number to an iterable, such as a string, as follows:

>>> list(str(numbers))
['1', '2', '3']
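Although str() cannot turn a list of characters back into the original string, the join() method covered earlier can. The sketch below reverses both conversions. For the digits, it also casts the joined string back to an integer with int().

```python
letter_list = list("GIS")
digit_list = list(str(123))

print(str(letter_list))          # ['G', 'I', 'S']: only a text representation
print("".join(letter_list))      # GIS
print(int("".join(digit_list)))  # 123
```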

To add the actual value as an element to a list (instead of the individual digits), you can start off with an empty list, and then add the number, as follows:

>>> number = 123
>>> number_list = []
>>> number_list.append(number)
>>> print(number_list)
[123]

The list() function also can be used on tuples and dictionaries; the next sections cover these in more detail. In the case of a dictionary, the list() function returns only the keys (unique objects) of a dictionary.

>>> ages = {"Adam": 23, "Beatrice": 25, "Calvin": 19}
>>> list(ages)
['Adam', 'Beatrice', 'Calvin']

In the case of a tuple, a sequence of immutable objects, each element of the tuple becomes an element in the list. For example:

>>> mytuple = (4, 5, 6)
>>> list(mytuple)
[4, 5, 6]

Another common use of the list() function is to create a list of values from an object created by a different function. For example, the built-in range() function creates a sequence of integers. The following code creates an object that contains the values from 1 through 5 (the stop value 6 is not included):

>>> r = range(1, 6)

The range() function returns a range object, not a list. The list() function creates a list from this range object, as follows:

>>> list(r)
[1, 2, 3, 4, 5]
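The range() function also accepts a single argument and an optional third argument, the step. The sketch below uses list() to show the resulting values.

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]: start defaults to 0
print(list(range(1, 6)))      # [1, 2, 3, 4, 5]
print(list(range(0, 20, 5)))  # [0, 5, 10, 15]: the third argument is the step
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]: a negative step counts down
print(len(range(1, 6)))       # 5
```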

Lists are a common data type when writing geoprocessing scripts. Chapter 6 provides additional examples of working with lists.

4.16 Working with tuples

Lists are common in Python, and you will often use lists when writing geoprocessing scripts, including lists of projects, maps, layers, feature classes, fields, and more. Lists are versatile because you can modify them in many ways, as discussed in the previous section. Sometimes, however, you may want to use a list without allowing its elements to be modified. That is where tuples come in. A tuple (rhymes with “quintuple”) is a sequence of elements, just like a list, but a tuple is immutable, meaning that it cannot be changed. The syntax of a tuple is simple: separate a sequence of values with commas, and you have a tuple. For example, the following code creates a tuple with five elements:

>>> mytuple = 1, 2, 3, 4, 5
>>> mytuple
(1, 2, 3, 4, 5)

To create a tuple, it is not necessary to use parentheses, although they are often added for clarity. How to get a tuple with only one element? Add a comma, even though there is no element following the comma:

>>> 6,
(6,)

Working with tuples is like working with lists. Elements in the tuple have an index, which can be used to obtain specific elements of the tuple. For example:

>>> x = ("a", "b", "c")
>>> x[0]
'a'

However, the sequence of elements cannot be modified. Therefore, list operations such as deleting, appending, removing, and others are not supported by tuples. The only methods that work on tuples are count() and index() because these methods do not modify the sequence of elements. Other operations, such as slicing, can be applied but return a different tuple. Running the following code slices a tuple and returns a different tuple:

>>> x = ("a", "b", "c", "d", "e", "f", "g")
>>> x[2:5]
('c', 'd', 'e')

Notice that the slicing operation returns another tuple, and the original tuple remains the same as before. You cannot modify the elements of a tuple, unlike the elements of a list.
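The sketch below shows what happens when you try to change a tuple: Python raises a TypeError. It then uses the two methods that tuples do support. Finally, it builds a new tuple by concatenation, which is how you work around immutability. The try/except construct is used here only to catch and print the error.

```python
x = ("a", "b", "c")

try:
    x[0] = "z"                   # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

print(x.count("a"), x.index("c"))  # 1 2

y = x + ("d",)   # a new tuple; note the comma in the one-element tuple
print(y)         # ('a', 'b', 'c', 'd')
print(x)         # ('a', 'b', 'c'), unchanged
```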

If you cannot modify a tuple, why are tuples important? First, some built-in Python functions and modules return tuples—in which case, you must deal with them. Second, tuples are often used in dictionaries, which are covered next. In addition, many geoprocessing tools in ArcGIS Pro use both lists and tuples as input parameters. Although lists and tuples cannot be used interchangeably, sometimes they are used for the same purpose.

The built-in tuple() function converts an iterable to a tuple. The following code shows how a list is converted to a tuple.

>>> mylist = [1, 2, 3]
>>> tuple(mylist)
(1, 2, 3)

The tuple() function also works on strings and dictionaries, but not on numbers, the same as the list() function.

4.17 Working with dictionaries

Lists and tuples are useful for grouping elements into a structure, and the elements can be referred to by their index number, starting with zero (0). Working with index numbers works fine, but it has its limitations. Consider the example of the following list of cities:

>>> cities = ["Austin", "Baltimore", "Cleveland", "Denver"]

Suppose you want to keep track of the state for each city. You can do this by creating a second list as follows:

>>> states = ["Texas", "Maryland", "Ohio", "Colorado"]

Because the index numbers correspond, you can access elements from one list by using the index number from the other list. For example, to get the state for Cleveland, use the following code:

>>> states[cities.index("Cleveland")]
'Ohio'

This process is useful but cumbersome. For example, some lists have many elements, making it tedious to read through the contents of each list. In addition, in the example of states and cities, some states will have more than one city. Also, making only a minor edit to one of the lists would disrupt the entire sequence. You can use tuples to ensure no changes are made to the sequence, but that also has its limitations. What you really need is a lookup table that works as follows:

>>> statelookup["Cleveland"]
'Ohio'

A lookup table is commonly used to display information from one table on the basis of a corresponding value in another table. A table join operation in ArcGIS Pro is an example of using a lookup table. In Python, one way to implement a lookup table is to use a dictionary. Dictionaries consist of pairs of keys and their corresponding values. Pairs are also referred to as the items of the dictionary. A dictionary item consists of a key, followed by a colon, and then the corresponding value. Items are separated by a comma. The dictionary itself is surrounded by braces.

The dictionary for the preceding example is as follows:

>>> statelookup = {"Austin": "Texas", "Baltimore": "Maryland", "Cleveland": "Ohio", "Denver": "Colorado"}

You can now use this dictionary to look up the state for each city:

>>> statelookup["Cleveland"]
'Ohio'

The order in which items are added to a dictionary does not affect lookups. The dictionary can be modified, and as long as each key remains paired with its value, the dictionary continues to function. Keep in mind when creating a dictionary that the keys must be unique, but the values do not have to be. In the earlier example, the dictionary can have multiple cities in the same state, but it should not contain duplicate city names.
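A short sketch of both rules, reusing the city lookup from above: adding a second city in Texas is fine, but repeating a key keeps only the last value assigned to it. The extra city Houston and the value "TX" are illustrative additions, not part of the original example.

```python
# Lookup table from the text: city -> state
statelookup = {"Austin": "Texas", "Baltimore": "Maryland",
               "Cleveland": "Ohio", "Denver": "Colorado"}

# Values may repeat: a second city in Texas (illustrative addition)
statelookup["Houston"] = "Texas"
print(statelookup["Houston"])  # Texas

# Keys may not repeat: a duplicate key in a literal keeps the last value
duplicates = {"Austin": "Texas", "Austin": "TX"}
print(duplicates)       # {'Austin': 'TX'}
print(len(duplicates))  # 1
```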

Dictionaries can be created and populated at the same time, as in the preceding example. You can also create an empty dictionary first using only braces, and then add items to it. Here is the code to create a new empty dictionary:

>>> zoning = {}

Items can be added using brackets and an assignment statement as follows:

>>> zoning["RES"] = "Residential"

You can continue to add items to the dictionary. The code is as follows:

>>> zoning["IND"] = "Industry"
>>> zoning["WAT"] = "Water"
>>> print(zoning)
{'RES': 'Residential', 'IND': 'Industry', 'WAT': 'Water'}

Items can be modified using the same syntax. Setting the value using a key that is already in use overwrites the existing value. The code is as follows:

>>> zoning["IND"] = "Industrial"
>>> print(zoning)
{'RES': 'Residential', 'IND': 'Industrial', 'WAT': 'Water'}

Items can be deleted using brackets and the del statement as follows:

>>> del zoning["WAT"]
>>> print(zoning)
{'RES': 'Residential', 'IND': 'Industrial'}

There are several dictionary methods. The keys() method returns a view object that displays a list of all the keys in the dictionary, as follows:

>>> zoning.keys()
dict_keys(['RES', 'IND'])

A dictionary view object is like a window on the keys and values. The list() function obtains a list of the keys inside the view object, as follows:

>>> list(zoning.keys())
['RES', 'IND']
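Because a view is a window rather than a copy, it reflects later changes to the dictionary, while a list made with list() is a snapshot. A minimal sketch, rebuilding the zoning dictionary as it stands at this point in the text; the "COM" code is an illustrative addition, and the key order shown assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
zoning = {"RES": "Residential", "IND": "Industrial"}

keys_view = zoning.keys()        # a live view, not a copy
keys_list = list(zoning.keys())  # a snapshot taken now

zoning["COM"] = "Commercial"     # illustrative new zoning code

print(keys_view)  # dict_keys(['RES', 'IND', 'COM'])
print(keys_list)  # ['RES', 'IND']
```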

The values() method returns a view object that displays a list of all the values in the dictionary. The list() function obtains a list of the values, as follows:

>>> list(zoning.values())
['Residential', 'Industrial']

The items() method returns a view object that displays all the items in the dictionary, that is, all key-value pairs. The list() function obtains a list of the items, with each item as a tuple, as follows:

>>> list(zoning.items())
[('RES', 'Residential'), ('IND', 'Industrial')]

Dictionaries and lists have several characteristics in common. Both are mutable, which means elements can be added, removed, or changed. Both can be nested, which means that a list can contain a list, a list can contain a dictionary, a dictionary can contain a list, and so on. Lists and dictionaries differ, however, in how elements are accessed. List elements are accessed by their position in the list, whereas dictionary elements are accessed by their key. In addition, lists are sequences, and their elements keep the order in which the list was created. Dictionaries are not sequences, so their elements cannot be accessed by position. (Since Python 3.7, dictionaries do remember the order in which keys were inserted, but values are still looked up by key.)
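Nesting also addresses a limitation mentioned earlier, that a state can have more than one city. As a sketch, a dictionary can map each state to a list of its cities. The name citiesbystate and the city lists are illustrative.

```python
# Each key is a state; each value is a list of cities in that state
citiesbystate = {"Texas": ["Austin", "Houston"],
                 "Ohio": ["Cleveland"]}

# A list stored inside a dictionary is still mutable
citiesbystate["Ohio"].append("Columbus")

print(citiesbystate["Ohio"])      # ['Cleveland', 'Columbus']
print(citiesbystate["Texas"][0])  # Austin
```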

Later chapters provide several examples of the use of dictionaries when writing geoprocessing scripts.

4.18 Using sets

In addition to lists, tuples, and dictionaries, Python includes another data type for organizing elements, called a set. A set is an unordered collection of elements without duplicates. A set is therefore similar to a list, except that its elements have no order and each element is unique. Creating a set is like creating a list or a tuple, but the elements are enclosed in braces:

>>> number_set = {0, 1, 2, 4, 5}

When duplicates are entered, they are automatically removed.

>>> number_set = {1, 1, 1}
>>> number_set
{1}

Elements of a set must be immutable (more precisely, hashable). Integers, floats, strings, and tuples of immutable elements qualify, but lists and dictionaries do not.
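To see this rule in action, the sketch below creates a set holding a tuple of coordinates, which works, and then tries to add a list, which Python rejects with a TypeError because lists are mutable. The coordinate values and the name list_accepted are illustrative.

```python
coords = {(45.5, -73.6)}  # a tuple is immutable, so it can be a set element

try:
    coords.add([49.2, -123.1])  # a list is mutable, so this raises TypeError
    list_accepted = True
except TypeError as err:
    list_accepted = False
    print("TypeError:", err)    # TypeError: unhashable type: 'list'

print(coords)  # {(45.5, -73.6)}
```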

A set can also be created by converting a list or a tuple using the set() function. For example:

>>> mylist = ["G", "I", "S"]
>>> myset = set(mylist)
>>> myset
{'I', 'S', 'G'}

This example illustrates that the order of the elements is not necessarily preserved because the elements of a set are not ordered.

You can also use a string as the argument of the set() function, but because a string is iterable, every unique character becomes an element of the set:

>>> mytext = "Mississippi"
>>> myset = set(mytext)
>>> myset
{'p', 's', 'i', 'M'}

It is possible to create an empty set, but empty braces ({ }) are interpreted as an empty dictionary. The only way to create an empty set is therefore to use the set() function:

>>> newset = set()
>>> type(newset)
<class 'set'>

Some of the operations you can perform on lists and tuples can also be applied to sets—for example, the len() function and the in operator:

>>> myset = set("tesselation")
>>> myset
{'t', 'l', 'i', 's', 'a', 'n', 'o', 'e'}
>>> len(myset)
8
>>> "s" in myset
True

One of the strengths of sets is that they can be easily manipulated using various operators. Consider the following two sets:

>>> set1 = {1, 2, 4, 7, 8}
>>> set2 = {1, 5, 8, 10}

The union of the two sets consists of all the elements in either set. You can use the union operator (|) or the union() method:

>>> set1 | set2

or

>>> set1.union(set2)

This results in

{1, 2, 4, 5, 7, 8, 10}

The intersection of the two sets consists of all the elements common to both sets:

>>> set1 & set2

or

>>> set1.intersection(set2)

This results in

{8, 1}

There are several other operators to compare two or more sets, including determining the difference between two sets (difference() method or - operator), determining whether two sets have any elements in common (isdisjoint() method), and determining whether one set is a subset of the other (issubset() method or <= operator).
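Applied to the same two sets, these operations look as follows; this is a short sketch, and the comments describe what each line returns.

```python
set1 = {1, 2, 4, 7, 8}
set2 = {1, 5, 8, 10}

print(set1 - set2)            # elements in set1 but not in set2: {2, 4, 7}
print(set1.difference(set2))  # the same result, using the method
print(set1.isdisjoint(set2))  # False, because 1 and 8 are in both sets
print({1, 8} <= set1)         # True: {1, 8} is a subset of set1
print({1, 8}.issubset(set2))  # True: {1, 8} is also a subset of set2
```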

A set is mutable, just like a list, and can therefore be modified using several methods, including add(), pop(), remove(), and update(). On the other hand, a set does not support indexing because the elements of a set are not ordered.
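A minimal sketch of these methods, using an illustrative set of land-use labels. Because sets are unordered, pop() removes an arbitrary element, so the example does not predict which one, and sorted() is used only to print the contents in a predictable order.

```python
landuse = {"forest", "water"}

landuse.add("urban")                  # add one element
landuse.update(["wetland", "water"])  # add several; the duplicate "water" is ignored
landuse.remove("forest")              # remove a specific element (KeyError if missing)
print(sorted(landuse))                # ['urban', 'water', 'wetland']

item = landuse.pop()  # removes and returns an arbitrary element
print(len(landuse))   # 2

try:
    landuse[0]        # sets do not support indexing
except TypeError as err:
    print("TypeError:", err)
```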

4.19 Working with paths

When files are organized in a computer, a folder structure is used to facilitate retrieving files using a path. A path consists of folder names separated by backslashes (\), optionally followed by a file name. An example of a path for a workspace is C:\EsriPress\Python\Data\. An example of a path for a shapefile is C:\EsriPress\Python\Data\rivers.shp.

Although the backslash is commonly used when writing paths, Python treats a backslash in a string as an escape character: the backslash and the character that follows it are read together as a special character. For example, \n represents a line feed, and \t represents a tab. So, in Python, you should avoid using single backslashes in paths. To see this behavior, try the following path:

>>> mypath = "C:\Projects\Nanaimo\Data.gdb"

Looks simple enough. But when you press Enter to run this line of code, you will get an error, as shown in the figure.

Error message in the interactive interpreter in IDLE when using a backslash character in a string.

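The syntax error is caused by \N in the path. In a Python string, \N introduces a named Unicode character, such as \N{DEGREE SIGN}, so Python expects a character name enclosed in braces to follow \N. The text Nanaimo does not supply one, and the string cannot be read. What happens if you use a lowercase n instead?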

>>> mypath = "C:\Projects\nanaimo\Data.gdb"

No error appears. But all is not well:

>>> print(mypath)
C:\Projects
anaimo\Data.gdb

Even though the path is in quotation marks and therefore a string, the use of \n results in a new line with the characters \ and n being dropped. A similar issue occurs when using \t. This code results in a tab being added, and the characters \ and t are dropped:

>>> mypath = "C:\Projects\temp.gdb"
>>> print(mypath)
C:\Projects emp.gdb

As a result, you should not use regular backslashes (\) in your paths, even if you are not using \n or \t for any given path. The chance is too great that you will forget and inadvertently introduce an error in your scripts.

There are three correct ways to specify a path in Python:

  1. Use a forward slash (/), for example,
    "C:/EsriPress/Python/Data"
  2. Use two backslashes (\\), for example,
    "C:\\EsriPress\\Python\\Data"
  3. Use a raw string by placing the letter r before the opening quotation mark, for example,
    r"C:\EsriPress\Python\Data" 

The letter r stands for “raw string,” which means that a backslash will not be read as an escape character.

The style you use for paths is a matter of preference. It is recommended to adopt a single style and stay with it. This book uses forward slashes, such as "C:/EsriPress/Python/Data", but it is good to be aware of the other types of notation.
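The three notations describe the same location. As a quick check, this sketch uses the standard library's ntpath module, which is the Windows implementation of os.path and can be imported on any operating system. Its normpath() function converts forward slashes to backslashes, so all three strings normalize to the same Windows path.

```python
import ntpath  # Windows path rules, available on every platform

forward = "C:/EsriPress/Python/Data"
doubled = "C:\\EsriPress\\Python\\Data"
raw = r"C:\EsriPress\Python\Data"

print(doubled == raw)                   # True: identical strings
print(ntpath.normpath(forward) == raw)  # True: same path once normalized
```

One caveat with raw strings: they cannot end with a single backslash, so r"C:\EsriPress\Python\Data\" is a syntax error. Leave off the trailing backslash when using this style.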

One approach that some coders use to reduce typing and avoid typos is to copy the path directly from their file manager. Open your file manager (File Explorer in Windows), and navigate to the folder of interest. Once the folder is shown in the address bar at the top, right-click the bar, and click Copy address as text.

 Copying a path in File Explorer as text.

You can then paste the exact path into a script. When you do this, however, the path uses the standard backslash (\). For example:

C:\Projects\Data\Study_Area\Itaquaquecetuba

To avoid replacing each backslash, it is faster to type the r character in front:

r"C:\Projects\Data\Study_Area\Itaquaquecetuba"

This ease of use is one reason why this is a popular style for writing paths.

In Python, paths are stored as strings. For example, the following code assigns the path for a shapefile to a string variable:

>>> inputfc = "C:/EsriPress/Python/Data/rivers.shp"

Once a path is assigned to a variable, the path can be used to reference data on disk.

4.20 Working with modules

There are many more functions available in Python than just the built-in functions. Using them requires the use of modules. A module is like an extension that can be imported into Python to extend its capabilities. Typically, a module consists of several specialized functions. Modules are imported using a special statement called import. The general syntax for the import statement is as follows:

>>> import <module>

A commonly used module is the math module. Importing it into Python works like this:

>>> import math

Once you import a module, all functions in that module are available to use in Python. To call a function from the imported module, you must still refer to the module itself by writing:

<module>.<function>

Notice that the syntax shows the same dot notation that was used for objects and methods. Dot notation is used here to indicate that the function belongs to the module.

For example, to use the cosine function cos() of the math module, use the following code:

>>> math.cos(1)
0.5403023058681398

It is important to recognize that this code works only after you import the module. For example, if you started your code with math.cos(1) as the first line of code without first importing the math module, you would receive an error message that includes the following:

NameError: name 'math' is not defined

Note that the math.cos() function assumes the input value is in radians. To get the description of a function, use the help() function:

>>> help(math.cos)

Note the result:

Help on built-in function cos in module math:

cos(...)
    cos(x)

    Return the cosine of x (measured in radians).

The help() function is a convenient way to access the documentation of modules, functions, classes, and methods in Python.
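Because math.cos() expects radians, an angle in degrees must be converted first. A small sketch using math.radians(), another function in the same module; the 60-degree angle is illustrative.

```python
import math

angle_deg = 60
angle_rad = math.radians(angle_deg)  # 60 degrees is about 1.047 radians

print(math.cos(angle_rad))  # approximately 0.5 (a floating-point result)
print(math.cos(angle_deg))  # the cosine of 60 *radians*, a different value
```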

To get the list of all the functions in a module, use the dir() function:

>>> dir(math)

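In the interactive interpreter, this displays an alphabetical list of all the names defined in the math module, including its functions (such as cos) and constants (such as pi), as well as a few special names that begin with double underscores.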

All the modules also have their own documentation. For example, the math module is explained in detail for Python 3.6 at https://docs.python.org/3.6/library/math.html.

One of the reasons for using the syntax <module>.<function> to call a function is that functions from different modules can have the same name. If you are sure you won’t be using more than one function with the same name in the same script, you can shorten your code by using a variant of the import statement:

>>> from math import cos
>>> cos(1)
0.5403023058681398

Once you use the from <module> import <function> statement, you can use the function without its module prefix.
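The from form can also import several names at once, separated by commas. For example, this sketch imports the constant pi together with the cos() function:

```python
from math import cos, pi

print(cos(pi))  # -1.0
```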

Another module is the time module. You can review the functions of the time module using the following code:

>>> import time
>>> dir(time)

The code prints a list of all the functions in the time module. Try some of the simpler ones. The time.time() function returns the current time as the number of seconds since the "epoch," or reference date. Python uses the UNIX epoch, which is January 1, 1970, at 00:00:00 (UTC).

>>> time.time()
1539107680.2387223

The value you get depends on when you run the code, but you can use the function to time something. For example, to determine how long it takes to carry out a procedure, record the time before and after the procedure, and subtract the two values to get the elapsed time in seconds.
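A minimal timing sketch following that recipe. Summing a range of numbers stands in for the procedure being timed; it is an illustrative placeholder, not a geoprocessing task.

```python
import time

start = time.time()            # seconds since the epoch, before the work
total = sum(range(1000000))    # the procedure being timed (illustrative)
end = time.time()              # seconds since the epoch, after the work

elapsed = end - start
print("Elapsed time: {0:.3f} seconds".format(elapsed))
```

For measuring short intervals, the time module also offers time.perf_counter(), which has a higher resolution than time.time().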

The localtime() function converts the time in seconds to the components that make up the current local date and time:

>>> time.localtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=9, tm_hour=10, tm_min=55, tm_sec=51, tm_wday=1, tm_yday=282, tm_isdst=1)

The code returns a struct_time object, which works like a tuple and contains the following elements: year, month, day of the month, hour, minute, second, day of the week (Monday is 0), day of the year, and a daylight saving time flag (1 if daylight saving time is in effect).

The asctime() function converts time to a string, as follows:

>>> time.asctime()
'Tue Oct  9 10:56:50 2018'
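Because struct_time works like a tuple, its elements can be read by index or by attribute name. The sketch below uses time.gmtime(), a related function not covered above that returns UTC rather than local time. Applied to 0 seconds, it returns the epoch itself, so the output does not depend on your time zone.

```python
import time

epoch = time.gmtime(0)  # 0 seconds after the epoch, in UTC

print(epoch.tm_year)        # 1970
print(epoch[0])             # 1970, the same element accessed by index
print(epoch.tm_wday)        # 3: January 1, 1970, was a Thursday (Monday is 0)
print(time.asctime(epoch))  # Thu Jan  1 00:00:00 1970
```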

The keyword module can be used to see a list of Python keywords. Keywords should never be used as variable names, so it is useful to review the list of keywords, as follows:

>>> import keyword
>>> print(keyword.kwlist)
['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']

Examine the list, and notice that import is included, as are del and in. You will become familiar with many of these keywords—for example, the section on workflows later in this chapter covers the statements if, elif, and else.
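To check a single name rather than scanning the whole list, the keyword module also provides the iskeyword() function. A short sketch:

```python
import keyword

print(keyword.iskeyword("import"))  # True: import is a keyword
print(keyword.iskeyword("math"))    # False: math is a module name, not a keyword
```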

Many more modules are available in Python, and upcoming chapters provide examples of them throughout. It is also important to recognize that you can import modules from other sources. The most relevant example is ArcPy. ArcPy is referred to as a package because it contains multiple modules, but from the perspective of writing a script, it looks like a module. When you write scripts to work with ArcGIS Pro, the first thing you typically do is import ArcPy to gain access to all the tools in ArcGIS Pro. ArcPy uses the same <module>.<function> syntax you have already seen in this chapter. Chapter 5 covers this syntax in greater detail and takes a much closer look at the functionality of ArcPy.

4.21 Controlling workflow using conditional statements

The code you have looked at so far has a simple, sequential flow. Each statement or expression is run once, in the order of occurrence. More complex applications require that you selectively run certain portions of your code or repeat parts of it. Branching is one way to control the workflow in your script. It basically means deciding to take one path or another. Branching typically uses the if structure and its variants. Under the if structure, scripts can branch into a section of code or skip it, on the basis of the conditions that are set. Consider a simple example that employs the random module.

import random
x = random.randint(1, 6)
print(x)
if x == 6:
    print("You win!")

The random.randint(1, 6) expression returns a random integer from 1 to 6, including both 1 and 6, as if rolling a single die, and the value is printed. If the value is 6, the string “You win!” is printed. What happens if the value is not equal to 6? Nothing: the indented line of code is skipped.

All if structures have a condition that is either true or false. Conditions are most often created by using comparison operators. The basic operators are listed in table 4.3. Notice that the symbol for “equal to” is a double equal sign (==), not a single equal sign (=), as you might expect. The single equal sign is reserved for assigning a value to a variable. Therefore, x = 6 is an assignment, whereas x == 6 is a condition.

Table 4.3. Comparison operators in Python
Operator   Description                Example condition   Result
==         Equal to                   4 == 9              False
!=         Not equal to               4 != 9              True
>          Greater than               4 > 9               False
<          Less than                  4 < 9               True
>=         Greater than or equal to   3 >= 3              True
<=         Less than or equal to      3 <= 2              False
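The rows of table 4.3 can be checked directly. This short sketch evaluates each example condition and also shows the difference between assigning with = and comparing with ==:

```python
x = 6                         # assignment: x now refers to 6
print(x == 6)                 # comparison: True

print(4 == 9, 4 != 9, 4 > 9)  # False True False
print(4 < 9, 3 >= 3, 3 <= 2)  # True True False
```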

Also notice a few things about the syntax of the branching structure. First, the if statement ends with a colon. This syntax is required and indicates that an indented block of code begins on the next line. Second, the if statement can be used on its own, without being followed by an elif or else statement. Third, the line following the if statement is indented, and this indentation is a critical part of coding in Python.

To review, the basic structure of an if statement is as follows:

if x == 6:
    print("You win!")

In the first line of code, the keyword if is followed by a condition and then a colon. Then there is a block of code, that is, one or more lines indented in the same manner. If the condition evaluates to True, the lines of code that make up the block are run. If the condition evaluates to False, the lines of code are skipped, and the program moves on to the next statement after the if structure. Notice that there is no endif statement, as you might expect if you are familiar with other programming languages. So how does Python know when you have reached the end of the if structure? The structure ends when you stop indenting your code. Accurate indentation is therefore the key to making this type of structure work.

There are several variations on the if structure using the elif and else statements. In the following example, the condition in the elif statement is tested only if the condition in the if statement is False. The elif statement can be repeated as many times as necessary, giving you the option to specify an action for every possible input. The else statement can be used only once in a single if structure and, if used, comes after all the elif statements. It does not include a condition: its block is executed only when none of the previous conditions evaluate to True. For example:

import random
x = random.randint(1, 6)
print(x)
if x == 6:
    print("You win!")
elif x == 5:
    print("Try again!")
else:
    print("You lose!")

The if structure and its variants are referred to as control structures, because the structure directs the order of execution for the statements in a series. Under the if structure, only certain parts of your code are executed while other parts are skipped.
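To see every branch run without relying on chance, the sketch below feeds each possible roll through the same if structure. It borrows a for loop and the range() function, which are covered later in this chapter; range(1, 7) produces the integers 1 through 6. Storing the messages in a dictionary named results is an illustrative choice.

```python
results = {}
for x in range(1, 7):        # every face of the die: 1, 2, ..., 6
    if x == 6:
        results[x] = "You win!"
    elif x == 5:             # tested only when x == 6 is False
        results[x] = "Try again!"
    else:                    # runs only when both conditions are False
        results[x] = "You lose!"

for x, message in results.items():
    print(x, message)
```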

4.22 Compound statements and indentation

The use of indentation in the previous section requires a bit more explanation. A block of code is one or more consecutive lines indented by the same amount. Indentation sets these lines off visually, but also logically: Python uses the indentation to decide which lines belong together.

Blocks are commonly used as part of a control structure such as the if statement. Blocks form the statement, or group of statements, that is run if the condition is True. Indenting in Python is not optional; it is the only way to define a block.

More generally speaking, the if statement is one example of a compound statement. Examples of other compound statements are for, try, while, and with. Later sections of this chapter as well as other chapters cover these statements. Compound statements in Python contain a group of statements or a block, and these blocks are created using indentation.

The beginning of a compound statement is marked by a colon at the end of its first line. All lines in the block that follows must be indented. Incorrect indentation results in an IndentationError, which is a type of syntax error. For example, consider the earlier example code using the if statement, but this time without any indentation, as shown in the figure.

Code example in IDLE without indentation following an if statement, resulting in a syntax error.

This error illustrates the importance of indentation.

You can create indentation using tabs or spaces. There is some debate in the Python community as to which one is better, and how many spaces to use if you use spaces, but this choice is largely a matter of personal preference. The key is consistency: if you indent blocks using four spaces, always use four spaces. Tabs and spaces may look identical on screen, but mixing them causes problems; in Python 3, a mix of tabs and spaces that makes the indentation ambiguous raises a TabError. Common styles are to use one tab, two spaces, or four spaces, and the Style Guide for Python Code (PEP 8) recommends four spaces per indentation level. The choice is yours, but be consistent. Some IDEs have built-in tools to automatically convert between tabs and spaces, making it easier to create a consistent indentation style in your code.
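You can reproduce both kinds of indentation error without running a script by passing source code to the built-in compile() function. In this sketch, a missing indentation raises an IndentationError, and a tab followed by a line indented with eight spaces raises a TabError. Both are subclasses of SyntaxError, so one except clause catches either. The code strings themselves are illustrative.

```python
no_indent = 'if x == 6:\nprint("You win!")\n'
mixed = 'if x == 6:\n\tprint("You win!")\n        print("Again")\n'

errors = {}
for label, source in [("no indentation", no_indent), ("mixed", mixed)]:
    try:
        compile(source, "<example>", "exec")  # parse only; nothing is run
    except SyntaxError as err:  # IndentationError and TabError are SyntaxErrors
        errors[label] = type(err).__name__

print(errors)  # {'no indentation': 'IndentationError', 'mixed': 'TabError'}
```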

Python editors assist with writing proper indentation. First, after you write a statement that ends with a colon and press Enter, the next line is automatically indented (IDLE uses four spaces by default), as shown in the figure.

Automatic indentation following an if statement.

Second, once you start a block of code, every time you press Enter at the end of a line of code, the next line will be indented in the same manner as the previous line. Despite this assistance, however, indentation errors are common when you start writing Python code.

Using indentation to define blocks is relatively uncommon among programming languages. Many other languages, including Java and C++, use braces to indicate a group of statements. A simplified example in Java is as follows:

if (<condition>) {
    <statement1>
    <statement2>
}

Note that this is not meaningful Python code—braces in Python are used for dictionaries and sets.

Instead of braces, Python uses indentation—this syntax makes code shorter and easier to read. However, programmers who have been writing their code in a language that uses braces may take a while to get used to indentation as a critical part of Python code.

4.23 Controlling workflow using looping structures

Another way to control your workflow is to use looping structures, also called loops. Loops allow you to repeat a certain part of your code until a specified condition is reached or until all possible inputs are used. There are two basic forms of looping structures in Python: while loops and for loops.

Here is an example of a simple while loop:

i = 0
while i <= 10:
    print(i)
    i += 1

The variable i is set to a beginning value of zero (0). The while statement checks the value of the variable. If this condition evaluates to True, the block of code is run. In this block, the value of the counter variable is printed, and on the next line, the value is increased by 1. After one iteration of the loop, the value of the variable is therefore 1. The block of code is repeated until the condition evaluates to False. The result of the code is a printout of the numbers 0 through 10. There are simpler ways to accomplish this result, but the example serves to illustrate the use of the while statement.
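One of the simpler ways is a for loop over range(), a built-in function that produces a sequence of integers; for loops are covered later in this section. This sketch compares the two approaches, collecting the numbers in lists instead of printing them so the results can be compared.

```python
# The while loop from the text, collecting values instead of printing them
i = 0
with_while = []
while i <= 10:
    with_while.append(i)
    i += 1

# The same numbers from range(11), which stops before 11
with_range = list(range(11))

print(with_while == with_range)  # True
print(with_range)                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```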

A while statement requires an exit condition. The variable used in the exit condition (i, in the case of the preceding code) is called a sentry variable or loop variable. The sentry variable is compared with some other value or values. It is important to make sure that your exit condition is robust—that is, after a certain number of repetitions, the exit condition must be reached; otherwise, the loop would keep going. For example, what if your script read as follows:

i = 0
while i <= 10:
    print(i)

In this case, the sentry variable does not change in the while loop, and the exit condition is never reached. This code results in an infinite loop. You want to avoid infinite loops in your scripts because stopping one often requires forcing the application to quit. Therefore, be sure to confirm that the exit condition is, in fact, reached at some point.
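One defensive pattern, offered here as a sketch rather than a rule, is to add a maximum number of iterations to the condition so that the loop stops even if the sentry variable never reaches its target. The names iterations and max_iterations and the limit of 1000 are arbitrary choices.

```python
i = 0
iterations = 0
max_iterations = 1000  # arbitrary safety limit

# As in the faulty example, i is never updated inside the loop,
# but the second condition still guarantees that the loop ends.
while i <= 10 and iterations < max_iterations:
    iterations += 1

print(iterations)  # 1000
```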

There are several options to exit an infinite loop, however. Typically, pressing Ctrl+C stops the script from running. You will see a message KeyboardInterrupt in the interactive interpreter. Ctrl+C typically works in IDLE. Python editors such as Spyder and PyCharm provide additional options, including a Stop button (small red square in both Spyder and PyCharm).

On the other hand, you also want to make sure that the condition in the while statement can evaluate to True at least once; otherwise, the block will never run. Take the following code, for example:

i = 12
while i <= 10:
    print(i)
    i += 1

In this script, the code block will never run, because the value assigned to the sentry variable (12) prevents the condition in the while loop from ever evaluating to True. This may appear like simple logic, but as your scripts become more complex, it becomes easier to overlook the simple things.

The while loop repeats part of your code on the basis of a condition. The for loop, on the other hand, also repeats part of your code, but not based on a condition. Instead, the for loop is typically based on a sequence, such as a string or a list. A for loop repeats a block of code for each element in the sequence. When it reaches the end of the sequence, the loop ends.

Here is an example of a simple for loop, as follows:

mylist = ["A", "B", "C", "D"]
for letter in mylist:
    print(letter)

In this example, letter is the name of a variable, and for each iteration of the loop, this variable is assigned a different value from the list.

The result is a printout of every value in the sequence. A for loop visits each element of the sequence in turn until it reaches the end; the sequence itself is not changed. The preceding example code uses a list, but a for loop can also be used to iterate over strings, tuples, and dictionaries.
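A short sketch of a for loop over each of these types. A string yields one character at a time, a tuple yields its elements, and a dictionary yields its keys; the items() method yields key-value pairs instead. The zoning codes reuse the earlier dictionary example, and the lists letters and pairs only collect the results.

```python
letters = []
for char in "GIS":                 # a string yields one character at a time
    letters.append(char)
print(letters)                     # ['G', 'I', 'S']

for code in ("RES", "IND"):        # a tuple yields its elements in order
    print(code)

zoning = {"RES": "Residential", "IND": "Industrial"}
for key in zoning:                 # looping over a dictionary yields its keys
    print(key, zoning[key])

pairs = []
for key, value in zoning.items():  # items() yields (key, value) pairs
    pairs.append((key, value))
print(pairs)  # [('RES', 'Residential'), ('IND', 'Industrial')]
```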

It is important to note that when writing the for loop to iterate over a list, you typically don’t know how many elements the list contains. Unless you first check the length of the list, there is no way of knowing how many times the block of code inside the for loop will execute. What you do know is that the for loop will repeat until the code has iterated over all the elements of the list.

4.24 Working with long lines of code

Although most lines of code in Python tend to be short, sometimes you may end up writing long lines of code. Consider the following example where you want to print a long string:

message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
print(message)

You can use long lines of code in a Python editor, but it will be cumbersome to read. Long lines of code are not wrapped to fit the window, as they are in a typical word processing application. Instead, you must scroll across to see the entire line or make the window wider. The example in the figure shows how only part of the line of code is visible in a script window in IDLE.

Example script in IDLE where a long string on a single line of code is cut off.

The solution to make the code more readable is to manually break the line of code into multiple lines. You can place your cursor inside the line of code, press Enter, and place a backslash for line continuation, as follows:

message = "Lorem ipsum dolor sit amet, consectetur \
adipiscing elit, sed do eiusmod tempor incididunt \
ut labore et dolore magna aliqua."
print(message)

The backslash indicates that the line of code continues, and Python reads the first three lines as a single line of code.

Example script in IDLE with backslashes used to break up a long line of code into three lines of readable code.
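In this example, the backslashes sit inside the string, so each backslash and the line break after it are simply dropped from the string's value. The sketch below, which uses a shortened version of the sample text, confirms that the continued string equals the single-line version. It also shows a trailing backslash used outside a string to join the lines of an ordinary expression.

```python
single = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
continued = "Lorem ipsum dolor sit amet, consectetur \
adipiscing elit."

print(single == continued)  # True

# Outside a string, a trailing backslash joins two physical lines
total = 1 + 2 + \
    3
print(total)  # 6
```

Note that the continuation line inside the string is not indented: any leading spaces there would become part of the string.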

The Style Guide for Python Code (PEP 8) recommends a maximum of 79 characters per line, but this is more a practical consideration, to keep the script window a manageable size, than a strict rule.

In addition to backslashes, Python supports implied line continuation: a line of code continues for as long as a pair of parentheses ( ), brackets [ ], or braces { } remains open. Consider the example of a long list of elements, such as all Canadian provinces and territories:

provinces = ["Alberta", "British Columbia", "Manitoba", "New Brunswick", "Newfoundland and Labrador", "Northwest Territories", "Nova Scotia", "Nunavut", "Ontario", "Prince Edward Island", "Quebec", "Saskatchewan", "Yukon"]

At 222 characters, the list is much too long to fit in a manageable window.

Example script in IDLE with a list of provinces cut off after New Brunswick.

Instead of using backslashes, the line of code can be broken up by pressing Enter, without using any backslashes, as follows:

provinces = ["Alberta", "British Columbia", "Manitoba",
"New Brunswick", "Newfoundland and Labrador",
"Northwest Territories", "Nova Scotia",
"Nunavut", "Ontario", "Prince Edward Island",
"Quebec", "Saskatchewan", "Yukon"]

A Python list requires a pair of brackets, but the first line does not end with a closing bracket. As a result, the Python interpreter continues to read the next line(s) until the closing bracket is encountered. Everything up to and including the closing bracket is part of the same line of code. Line continuation is implied through syntax—i.e., a pair of brackets in this case.
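Because the brackets define a single logical line, a list split across lines is exactly the same as the one-line version. A quick check, using only the first three provinces to keep the sketch compact; the area expression in parentheses is an illustrative example of the same rule.

```python
one_line = ["Alberta", "British Columbia", "Manitoba"]
multi_line = ["Alberta",
"British Columbia",
"Manitoba"]

print(one_line == multi_line)  # True
print(len(multi_line))         # 3

# Parentheses work the same way, for example around a long expression
area = (120.5 +
        79.5)
print(area)  # 200.0
```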

The example presents a bit of a legibility issue, because at quick glance it is easy to overlook the brackets, and all the lines of code start at the same position. As a result, it is common to use a special form of indentation that aligns the subsequent lines of code with the opening bracket, as follows:

provinces = ["Alberta", "British Columbia", "Manitoba",
             "New Brunswick", "Newfoundland and Labrador",
             "Northwest Territories", "Nova Scotia",
             "Nunavut", "Ontario", "Prince Edward Island",
             "Quebec", "Saskatchewan", "Yukon"]

In a typical Python editor, including IDLE, this indentation is applied automatically when you press Enter after an opening bracket (or parenthesis or brace) that has not yet been closed. The indentation is created using spaces, but these are ignored when the code is run.

Example script in IDLE where the line of code for provinces is automatically indented following an opening bracket when breaking the code into multiple lines.

There is no specific name for this type of indentation, but it can be described as aligning the indentation with the opening bracket (or parenthesis or brace).

Implied line continuation works for anything that uses parentheses, brackets, or braces. Functions use parentheses, and line continuation is often used for functions with many arguments. Consider the Emerging Hot Spot Analysis tool in ArcGIS Pro. Following is one of the examples in the tool reference in the ArcGIS Pro help pages:

import arcpy
arcpy.env.workspace = r"C:\STPM"
arcpy.EmergingHotSpotAnalysis_stpm("Homicides.nc", "COUNT", "EHS_Homicides.shp", "5 Miles", 2, "#", "FIXED_DISTANCE", "3")

The third line is more than 120 characters long. The legibility of the script can be improved using implied line continuation, as shown in the figure.

Example script in IDLE with the parameters for the EmergingHotSpotAnalysis function broken into multiple lines using implied line continuation.

The continuation lines do not have to align with the opening parenthesis. Another common style, called hanging indentation, places nothing after the opening parenthesis on the first line and indents the following lines by one level, for example, four spaces.

Example script in IDLE with the parameters to a function broken into multiple lines using hanging indentation.
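In text form, the two styles look like the sketch below. It uses the built-in dict() function with keyword arguments as a stand-in for a geoprocessing tool call; the keyword names cube, field, and output are illustrative and are not the parameter names of any ArcGIS Pro tool.

```python
# Continuation lines aligned with the opening parenthesis
aligned = dict(cube="Homicides.nc", field="COUNT",
               output="EHS_Homicides.shp")

# Hanging indentation: nothing after the opening parenthesis,
# and each argument on its own line, indented one level
hanging = dict(
    cube="Homicides.nc",
    field="COUNT",
    output="EHS_Homicides.shp",
)

print(aligned == hanging)  # True: only the layout differs
```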

Generally, implied line continuation is preferred over the use of backslashes. This preference presents an issue for the earlier example of a long string because there are no parentheses, brackets, or braces to work with. The solution is to add a pair of parentheses just for the purpose of line continuation. This syntax presents another issue, however: a line break inside a string results in an error, EOL while scanning string literal, where EOL stands for End Of Line. (In Python 3.10 and later, the message reads unterminated string literal.)

Syntax error due to a line break inside a string.

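The error is caused by a missing closing quotation mark for the string in the first line; Python does not treat an unclosed string as implied line continuation. The solution is to use a pair of quotation marks for each line, as shown in the figure. Python automatically joins adjacent string literals into a single string, so the separate pieces inside the parentheses still form one string.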

Use of quotation marks to allow implied line continuation with strings.
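A minimal sketch of the same technique follows; the text of the string is illustrative. Each line holds a complete string literal, and Python joins adjacent string literals into one string. Note the space at the end of each piece: without it, the words on either side of a line break run together.

```python
# Parentheses added only to allow implied line continuation
message = ("This is a long string that is split "
           "across several lines using implied "
           "line continuation.")

# A missing space at a line break is a common mistake
run_together = ("split"
                "across")

print(message)
print(run_together)  # splitacross
```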

Line continuation can be a bit cumbersome, especially when it comes to establishing a meaningful and consistent alignment, but it greatly enhances the readability of your code.

4.25 Getting user input

Many Python scripts require inputs from outside the script itself. There are several ways to obtain these inputs, including the use of the input() function:

>>> x = input("")

Running this line of code in the interactive Python interpreter allows a user to enter a value. Exactly how this takes place varies with the IDE being used. For example, when the preceding line is run in IDLE, the cursor jumps to the next line of code but without a command prompt.

Interactive interpreter in IDLE showing the cursor without a command prompt.

After you type a value and press Enter, the command prompt appears again at the next line, and you can use the value in your code. The following example asks a user to enter an integer value:

>>> x = input("Type an integer and press Enter: ")

The message is printed to the next line, and the cursor is paused at the end of the message for a user to enter a value.

Type an integer and press Enter:

Once the user enters a value and presses Enter, this value is assigned to the variable. The input() function always returns a string. Therefore, to use the input as a number, you must first convert it to an integer using the int() function. For example, the following code calculates the square of the user input and prints the result as part of a message:

>>> x = input("Type an integer and press Enter: ")
Type an integer and press Enter: 12
>>> y = int(x)**2
>>> print("The squared value of your integer is: {0}".format(y))
The squared value of your integer is: 144
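Because input() always returns a string, the conversion step can be tried out on its own by passing strings directly. The following sketch wraps it in a hypothetical helper function named square_of(); in an interactive session, you would call it as square_of(input("Type an integer and press Enter: ")). Note that int() raises a ValueError if the text is not a whole number, such as "12.5" or "twelve".

```python
def square_of(text):
    """Convert text (such as the string returned by input()) to an integer and square it.

    Hypothetical helper for illustration. int() ignores surrounding
    whitespace but raises ValueError for text such as "12.5" or "twelve".
    """
    value = int(text)
    return value ** 2

print("The squared value of your integer is: {0}".format(square_of("12")))

# Handle input that cannot be converted instead of letting the script stop
try:
    square_of("twelve")
except ValueError:
    print("Please type a whole number, such as 12.")
```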

Different Python editors vary slightly in terms of how user input is obtained using the input() function. For example, the same code in PyCharm is shown in the figure.

Python Console in PyCharm with a >? prompt for user input.

The actual Python code is identical, but the >? prompt makes it clear that a user is expected to type a value.

The Python window in ArcGIS Pro, on the other hand, is not set up to receive user input in this manner. When you run the same code in the Python window, you will receive an error.

Error message in the Python window when using the input() function.

The Python window works fine as an interactive interpreter for most tasks, but it does not have all the functionality of a regular Python IDE.

Although user input is sometimes important, for geoprocessing scripts these inputs don’t typically consist of values a user would type. Instead, user inputs consist of things such as input feature classes and tool parameters.

4.26 File management

Many tasks require working with folders and files. There are several built-in modules in Python to facilitate these tasks. The most important of these is the os module, which includes several functions related to the operating system.

A commonly used function is the os.mkdir() function, which creates a new folder, also referred to as a directory. The following example creates a new folder in the current working directory:

import os
os.mkdir("test")

As discussed in an earlier section, to use a module, it first must be imported. You can then call the function using the <module>.<function> syntax.

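When you run a script from IDLE, the current working directory is typically the folder where the script is stored. More generally, it is the folder from which Python was started, which you can check using os.getcwd(). You can also create a folder in a different location by specifying the complete path: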

import os
os.mkdir("C:/Data/testing")

This code assumes that C:\Data already exists; if it does not, os.mkdir() raises a FileNotFoundError.
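The following sketch shows how os.mkdir() behaves when a folder already exists, and how the related os.makedirs() function creates any missing parent folders. So that it runs on any computer, it works inside a temporary folder created with the tempfile module instead of C:/Data; the folder names are illustrative.

```python
import os
import tempfile

# Temporary folder standing in for an existing folder such as C:/Data
base = tempfile.mkdtemp()

# os.mkdir() creates one folder; its parent folder must already exist
testing = os.path.join(base, "testing")
os.mkdir(testing)

# Creating the same folder again raises FileExistsError
try:
    os.mkdir(testing)
except FileExistsError:
    print("Folder already exists:", testing)

# os.makedirs() also creates missing parent folders;
# exist_ok=True means no error is raised if the folder already exists
nested = os.path.join(base, "projects", "demo", "output")
os.makedirs(nested, exist_ok=True)
os.makedirs(nested, exist_ok=True)  # second call does nothing
```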

Instead of writing full paths, it is common to combine different elements into a path. Suppose a variable stores a working directory other than the folder where your script is located, and you want to create one or more folders inside it. You can create the full path using the os.path.join() function:

import os
folder = "C:/Projects/Demo"
subfolder = "test1"
newpath = os.path.join(folder, subfolder)
os.mkdir(newpath)

Note that, in this example, os.path is a submodule of the os module, and join() is a function of the os.path module. The os.path.join() function concatenates the two strings into a new string and ensures the correct separator is included between them; on Windows, this separator is a backslash. You could have created this path using regular string concatenation, as follows:

newpath = folder + "/" + subfolder

However, it is cumbersome to check which separator to use and whether one is already present, so the use of os.path.join() is recommended when creating paths.
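A small sketch of why os.path.join() is easier than concatenation: when the folder path already ends in a slash, os.path.join() does not add a second one, whereas plain concatenation does. The folder names are illustrative.

```python
import os

folder = "C:/Projects/Demo/"   # note the trailing slash
subfolder = "test1"

# Concatenation blindly adds another separator
concatenated = folder + "/" + subfolder
print(concatenated)  # C:/Projects/Demo//test1

# os.path.join() only adds a separator when one is needed
joined = os.path.join(folder, subfolder)
print(joined)        # C:/Projects/Demo/test1
```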

You can create a list of the contents of a folder using the os.listdir() function. This function returns a list of names, which includes subfolders as well as files. The following example prints the names of all the files and subfolders in a specified folder:

import os
folder = "C:/Projects/Demo"
files = os.listdir(folder)
for f in files:
    print(f)

A common scenario is to look for files with a specific file extension. You can use the string method endswith() to check the file extension of each file name. This method is case sensitive, which means you should first convert the file names to lowercase using the lower() method. The following example prints the names of all files in a specified folder with file extension .gif or .GIF:

import os
folder = "C:/Demo"
files = os.listdir(folder)
for f in files:
    if f.lower().endswith(".gif"):
        print(f)

In addition to illustrating the use of the os module, this example also illustrates the use of a for loop to iterate over a list and the use of an if statement to check a condition. Note that if there are no GIF files in the folder, the script does not print anything.

A few other observations are in order. First, os.listdir() returns a list of strings, so the variable files is a list. A for loop is used to iterate over the list. Every element of the list is a string, so the variable f is a string. The lower() method returns a lowercase string, so f.lower() is also a string. Finally, the endswith() method returns a Boolean value (True or False) that indicates whether the string ends with the specified suffix, so the result can be used directly as the condition of the if statement.
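Because os.listdir() also returns the names of subfolders, a folder whose name happens to end in .gif would also be printed by the example above. The following sketch, built around a hypothetical helper named find_files(), adds a check with os.path.isfile() and collects the matches in a list instead of printing them. It creates a few empty files in a temporary folder so it can be run anywhere; the file names are illustrative.

```python
import os
import tempfile

def find_files(folder, extension):
    """Return the sorted names of files in folder that end with extension, ignoring case."""
    matches = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Skip subfolders, which os.listdir() also returns
        if os.path.isfile(full_path) and name.lower().endswith(extension.lower()):
            matches.append(name)
    return sorted(matches)

# Create a throwaway folder with a few empty files and one subfolder
demo = tempfile.mkdtemp()
for name in ["map.gif", "LEGEND.GIF", "notes.txt"]:
    open(os.path.join(demo, name), "w").close()
os.mkdir(os.path.join(demo, "archive.gif"))  # a folder, not a file

print(find_files(demo, ".gif"))  # ['LEGEND.GIF', 'map.gif']
```

Uppercase letters sort before lowercase letters in Python, which is why LEGEND.GIF is listed first.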

There are many other uses of the os module, and the following chapters cover some of them. For example, os.path.exists() is used to determine whether a specific file or folder already exists, and os.walk() is used to navigate through all the folders, subfolders, and file names of a specific folder.
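As a brief preview, the following sketch uses os.path.exists() to test for a folder and os.walk() to collect the names of all files in a folder and its subfolders. For each folder it visits, os.walk() yields a tuple containing the folder path, the names of its subfolders, and the names of its files. The small folder tree is created in a temporary folder, and its names are illustrative.

```python
import os
import tempfile

# Build a small folder tree: root/readme.txt and root/roads/2023/roads.gif
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "roads", "2023"))
open(os.path.join(root, "readme.txt"), "w").close()
open(os.path.join(root, "roads", "2023", "roads.gif"), "w").close()

# os.path.exists() returns True or False
print(os.path.exists(os.path.join(root, "roads")))   # True
print(os.path.exists(os.path.join(root, "rivers")))  # False

# os.walk() visits root and every folder below it
all_files = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        all_files.append(os.path.join(dirpath, name))

print(len(all_files))  # 2
```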

There are several other modules to work with folders and files. They include the glob module (short for “global”) to search for patterns of file names and the shutil module (short for “shell utilities”) for file operations such as copying, renaming, or removing.
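A minimal sketch of both modules: glob.glob() returns the paths that match a wildcard pattern, shutil.copy() copies a file, and shutil.rmtree() deletes a folder together with everything in it. The file names are illustrative, and everything happens inside a temporary folder.

```python
import glob
import os
import shutil
import tempfile

# Create a few empty files to search
folder = tempfile.mkdtemp()
for name in ["parcels.csv", "zoning.csv", "notes.txt"]:
    open(os.path.join(folder, name), "w").close()

# The * wildcard matches any sequence of characters
csv_files = sorted(glob.glob(os.path.join(folder, "*.csv")))
print([os.path.basename(p) for p in csv_files])  # ['parcels.csv', 'zoning.csv']

# Copy the matching files into a new backup folder
backup = os.path.join(folder, "backup")
os.mkdir(backup)
for path in csv_files:
    shutil.copy(path, backup)
copied = sorted(os.listdir(backup))

# Remove the backup folder and its contents
shutil.rmtree(backup)
```

Use shutil.rmtree() with care: unlike deleting a folder in File Explorer, it does not send anything to the Recycle Bin.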

4.27 Commenting scripts

Well-developed scripts contain comments that help users understand them. Consider the script generatepointsfromlines.py, which is used by the Generate Points Along Lines geoprocessing tool. The first section provides the name of the script, the author, and a brief description. In this case, the tool was authored by Esri staff. Comments are preceded by the number sign (#), also referred to as a "hash mark." When the script is run, any line that starts with the # symbol is not executed. In IDLE, the default color for comments is red.

Points script.

Contents of the generatepointsfromlines.py script in IDLE with comments at the top of the script.

A single number sign is not the only way to mark a comment. Sometimes double number signs (##) are used instead, although this symbol is often reserved for temporarily commenting out code to use it later; in IDLE, the Format > Comment Out Region command inserts ## in front of the selected lines. The effect is the same: lines of code that start with one or more number signs are not run as part of the script.

Comments can also be placed on the same line after pieces of code. For example, consider a sample from the same generatepointsfromlines.py script. In the sample shown, a line of code is followed by a comment on the same line. The comment is preceded by the number sign and is shown in red.

Screen capture of a script in IDLE where a line of code is followed by a comment on the same line.

Again, the text following the number sign is not run, but the code preceding it on the same line is executed.

Sometimes your comment will require multiple lines. Unlike some other languages, Python does not have a dedicated syntax for multiline comments. Consider the following example:

# This is a comment,
but this is not.

The first line will be ignored and read as a comment, but the second line will be read as code and result in an error. So you must start every line of comment with the # symbol, as follows:

# This is a comment,
# and so is this.

An alternative is to use a multiline string by wrapping comments inside a pair of triple quotation marks, also referred to as “triple quotes”:

"""
This is a multiline string,
which is technically not a comment,
but is widely used.
"""

Triple quotes mean that the value inside is a string. Because it is not assigned to a variable, it is not used anywhere in the script. It does not do anything at runtime and effectively acts as a comment. However, triple quotes are also used for docstrings, which provide documentation for things such as functions and classes directly in the script. Because triple quotes are used for docstrings, their use for comments is not recommended.
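To see the difference between a comment and a docstring, consider the following sketch with a hypothetical function named squared(). A docstring is a string placed as the first statement of a function, class, or module; Python keeps it in the object's __doc__ attribute, where the built-in help() function can find it. A comment, by contrast, is discarded when the code is read.

```python
def squared(value):
    """Return the square of value."""
    # This comment explains the code but is not stored anywhere
    return value ** 2

print(squared.__doc__)  # Return the square of value.
print(squared(3))       # 9
```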

Comments are used to provide generic descriptions of who authored the script and how it works, as well as any pertinent details about specific code elements. Commenting helps others understand how the script works, and can also help the original coder remember how the code was created and why.

Commenting is not required for a script to work properly. However, using comments is good coding practice—both as a service to others who use the script as well as a reminder to yourself of how the code works. At a minimum, each script should contain a heading section that describes what the script does, who created it and when, and what the requirements are for the script to run.
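A heading section might look like the following sketch. The layout, the script name, and the placeholder values in angle brackets are illustrative only; the short body shows a comment describing each logical section and a comment on the same line as code.

```python
# Name:         list_gif_files.py
# Description:  Prints the names of all GIF files in a folder.
# Author:       <your name>
# Created:      <date>
# Requirements: Python 3; uses only the standard library.

import os

# Folder to search; "." means the current working directory
folder = "."

# Collect the GIF files, ignoring the case of the extension
gif_files = []
for f in os.listdir(folder):
    if f.lower().endswith(".gif"):  # matches .gif, .GIF, .Gif, and so on
        gif_files.append(f)

# Report the results
for f in gif_files:
    print(f)
```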

In addition to comments, you can use blank lines to organize code. Technically, blank lines in a Python script are ignored on execution, but they make it easier to read the code. Usually, blank lines are used to keep lines of related code together and separate them from other sections. Like comments, blank lines are not required for a script to work properly, but they make code easier to read.

4.28 Following coding guidelines

Python enforces certain coding standards, and code that does not meet these standards will produce errors. In addition, coding guidelines can assist in making sure your code is not only error free, but also efficient, portable, and easy to follow. These guidelines are formalized in the Style Guide for Python Code, also known as PEP 8. This is part of a larger set of Python Enhancement Proposals (PEPs).

Following are several coding guidelines for the topics covered so far. These guidelines reflect PEP 8 as well as other considerations. As you learn more about Python and start writing your own scripts, it is a good idea to become familiar with the complete style guide, located at http://www.python.org/dev/peps/pep-0008.

Variable names

  • Start with a letter, and do not use special characters such as an asterisk (*), which are not allowed in variable names.
  • Use all lowercase, such as mycount.
  • Underscores (_) can be used if they improve readability, such as count_final.
  • Use descriptive variable names, and avoid using slang terms or abbreviations.
  • Keep variable names short.

Script names

  • Script names should follow the preceding variable naming guidelines—that is, use all lowercase, and use underscores to improve readability.

Indentation

  • Use four spaces to define each indentation level; PEP 8 recommends spaces rather than tabs.
  • Never mix tabs and spaces for indentation.

Comments

  • Scripts should contain adequate commenting. Each logical section should have an explanation.
  • Each script should have a header that contains the script name, a description of how the script works, its requirements, who wrote it, and when it was written.
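The variable-name and indentation guidelines are illustrated in the short sketch below; the variable names and values are made up for the example, and each indentation level uses four spaces.

```python
# Lowercase, descriptive names, with underscores where they help readability
distances_km = [1.5, 4.0, 7.2, 12.8]
max_distance_km = 5
count_within = 0

# Each indentation level is four spaces
for distance in distances_km:
    if distance <= max_distance_km:
        count_within += 1

print(count_within)  # 2
```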

It is important to recognize that following the Python guidelines is not required—that is, if you break from the guidelines, it will not necessarily result in errors. However, following the guidelines will improve the consistency and readability of your code.