A short Python program can happily live in a single file—but as soon as you write a larger program, you need a way to organize its code. Above functions, Python has two more levels of code organization: functions and other code live in modules, and modules live in packages. Let’s start by looking at modules.
A module defines entities such as constants and functions that you can import and use in a program. Aside from a few built-in modules that ship with the Python interpreter, a module is a Python file.
For example, here is a module named my_module.py:
| THE_ANSWER = 42 |
| |
| |
| def ask(): |
| return THE_ANSWER |
This file defines a constant and a function. Now imagine that we have a Python program in the same directory. This program can import either (or both) definitions with the import keyword:
| from my_module import ask, THE_ANSWER |
When you import a module, two things happen: first, Python executes the code in the module; and second, the imported names become available to the importing program.
For example, now my_program can call the ask function:
| ask() # => 42 |
Note that the code in the module is executed only the first time you import it. If you import a module more than once, Python marks it as “already imported” the first time around, and ignores subsequent imports.
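Here is a minimal, self-contained sketch of both behaviors. To keep it runnable from a single file, it creates my_module.py on the fly (this assumes the current directory is writable and importable); the print statement inside the module lets you see exactly when its code runs:

```python
import pathlib
import sys

# Create my_module.py on the fly, with a print so we can see when it runs.
pathlib.Path("my_module.py").write_text(
    'print("loading my_module")\n'
    "THE_ANSWER = 42\n"
    "def ask():\n"
    "    return THE_ANSWER\n"
)

sys.path.insert(0, ".")  # make sure Python looks in the current directory

import my_module          # runs the module's code: prints "loading my_module"
import my_module          # already imported: prints nothing

print(my_module.ask())             # => 42
print("my_module" in sys.modules)  # => True: Python caches imported modules
```

The second import is silent because Python keeps every imported module in the sys.modules cache and skips re-executing it.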
Instead of cherry-picking the names you want to import, as we did before, you could instead import the entire module:
| import my_module |
This line imports my_module as a whole, giving you access to all the names defined in my_module.py. When you import an entire module like that, however, those names might clash with other names in the main program, or in another module. To avoid those clashes, Python requires you to prefix the names with the name of the module, like this:
| my_module.ask() # => 42 |
| my_module.THE_ANSWER # => 42 |
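The same rule holds for any module. Here is a quick sketch using the standard library's math module, showing that a plain import binds only the module's name, not the names inside it:

```python
import math

print(math.sqrt(16))   # => 4.0: the prefixed name works

try:
    sqrt(16)           # the bare name was never imported...
except NameError:
    print("sqrt is not defined")  # ...so this branch runs
```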
To avoid prefixing the same long module name dozens of times, you can give the module a shorter name when you import it, like this:
| import my_module as mm |
| mm.ask() # => 42 |
For example, the numpy library is almost always shortened to np, like this:
| import numpy as np |
After the renaming, you can refer to NumPy functions with the np prefix, as in np.multiply(x, y).
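Aliasing works the same for any module. If you don't have NumPy installed yet, you can try the idea with the standard library instead; here statistics stands in for numpy:

```python
import statistics as st  # alias the module, just like "import numpy as np"

print(st.mean([1, 2, 3, 4]))   # => 2.5
```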
When it comes to modules, we have one last topic to mention. It concerns a very common Python idiom.
We said that a file of Python code can be either a program or a module, depending on how you use it. You execute a program directly, with a command like python3 my_code.py. By contrast, you use a module by importing it from another file.
However, it’s common for the same Python file to play both roles. A file with this binary nature can either be run as a stand-alone program, or imported as a module. Here is one such file:
| print("Executing the code in greetings.py") |
| |
| |
| def greet(name): |
| print("Hello,", name) |
| |
| |
| if __name__ == "__main__": |
| greet("human") |
Ignore the last two lines in the file for a moment. If we import greetings.py from another file (let's call it greetings_demo.py), the usual things happen: first, the code in greetings.py is executed; and second, we can access the greet function:
| import greetings |
| greetings.greet("Bill") |
If you run python3 greetings_demo.py, you get:
| Executing the code in greetings.py |
| Hello, Bill |
However, you can also run greetings.py as a stand-alone program, by typing python3 greetings.py. In that case, you get:
| Executing the code in greetings.py |
| Hello, human |
The secret to running the file as a program is in the idiom if __name__ == "__main__". (That’s a double underscore both before and after name and main.) This idiom stands for: “only execute the following code if this file is run directly.” By contrast, if the file gets imported, then the Python interpreter skips the if block.
To see how this idiom is useful, imagine writing a program that defines a bunch of functions, and then uses those functions to interact with the user. When you load the file from another program, you want to skip the user interaction—but you still want to access the functions, to reuse or test them. You can fence the user interaction behind the if… "__main__" idiom, and it will be executed only when the file runs as the main program.
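Here is a minimal sketch of that pattern (the function names are illustrative): the functions can be imported and reused from anywhere, while the user interaction only runs when the file is executed as the main program.

```python
def greet(name):
    # Reusable logic: importers can call and test this function directly.
    return "Hello, " + name


def main():
    # User interaction goes here; importing this file never triggers it.
    print(greet("human"))


if __name__ == "__main__":
    main()
```

Keeping the interaction inside a main function, rather than loose at the bottom of the file, is a common refinement of the idiom: it keeps the script's variables out of the module's global namespace.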
You’ll see the if… "__main__" idiom throughout this book, and in the source code of most Python libraries.
Above modules, packages are the next level of code organization. A package is essentially a bundle of modules, organized in a directory structure.
In this book, we don’t define our own packages—but we use them all the time, for one reason: when you install a Python library, that library comes in the form of a package.
There are multiple ways to install Python libraries. Most Python developers use the pip package manager. Others prefer an alternative tool named Conda, which I already mentioned as one way to install the Python language itself. Let’s look at both tools.
pip[45] is Python’s official package manager. Its name is a recursive acronym that stands for “pip Installs Packages.” (Yup, the Python community has a warped sense of humor. After all, the name of the language is a homage to Monty Python.)
If you have Python installed, chances are you also have pip. You can use it to install one of the many packages from PyPI,[46] which stands for “Python Package Index”—Python’s official package repository. For example, this command installs version 2.2.4 of the Keras machine learning library:
| pip3 install keras==2.2.4 |
Once you have Keras installed, you can use its modules from a Python program. This line imports the serialize function from the keras.metrics module:
| from keras.metrics import serialize |
To be precise, keras is the Keras library’s top-level package, and metrics is a module inside it.
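If you don’t have Keras installed, you can see the same package/submodule pattern in the standard library. For example, parse is a submodule of the urllib package, and you can cherry-pick names from it just like we did with keras.metrics:

```python
# Same "from package.submodule import name" pattern as keras.metrics,
# using the standard library so no installation is needed.
from urllib.parse import urlparse

url = urlparse("https://example.com/answers?q=42")
print(url.netloc)   # => example.com
print(url.query)    # => q=42
```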
pip has all the features you expect in a package manager: you can install a specific version of a package, list the packages installed, and so on. If you’re looking for a simple out-of-the-box system to install libraries, pip has you covered. If you want something more sophisticated… then keep reading.
Conda[47] is the package manager of choice in the ML community. It’s part of a hefty Python distribution called Anaconda[48] that’s especially tailored to data science.
Anaconda comes with a lot of bells and whistles, including an IDE and its own repository of packages, separated from the official Python repository. If you don’t need the extras, then you can install Miniconda,[49] which is a much slimmer install that only includes Conda and Python.
When it comes to installing a package, Conda works pretty much the same as pip:
| conda install keras=2.2.4 |
However, Conda has a couple of selling points over pip. For one, where pip focuses on Python libraries, Conda can handle data science packages written in different languages. Also, Conda allows you to create “environments” that you can activate and deactivate on the fly. Each environment can have a different set of libraries. By contrast, packages installed with pip are global by default: unless you set up a separate virtual environment, all the Python code on your machine sees the same version of the package.
Conda also integrates well with pip: if you want a package that’s only available in the PyPI repository, but not in Conda’s repository, you can run pip install in a Conda environment, and the package will only be visible in that environment.
To sum it up, the choice between pip and Conda usually boils down to this: if you’re okay with globally installed packages, then use pip; if you prefer to maintain separate environments that contain different packages (for example, a different set of packages for each project), then use Conda.
You’ll need to install a few packages to run the code in this book. Setting Up Your System contains instructions to install them with pip. If you opt for Conda, then take a look at the readme.txt in the book’s source code.