Chapter 3

Using Python to Work with Algorithms

IN THIS CHAPTER

  • Using Python to discover how algorithms work
  • Considering the various Python distributions
  • Performing a Python installation on Linux
  • Performing a Python installation on OS X
  • Performing a Python installation on Windows
  • Obtaining and installing the datasets used in this book

You have many good choices when it comes to using computer assistance to discover the wonders of algorithms. For example, apart from Python, many people rely on MATLAB and many others use R. In fact, some people use all three and then compare the sorts of outputs they get (see one such comparison at https://www.r-bloggers.com/evaluating-optimization-algorithms-in-matlab-python-and-r/). Even if you had only these three choices, you’d still need to think about them for a while and might choose to learn more than one language, but you actually have more than three choices, and this book can’t begin to cover them all. If you get deep into the world of algorithms, you discover that you can use any programming language to write algorithms and that some languages are appreciated because they boil everything down to simple operations, such as the RAM simulation described in Chapter 2. For instance, Donald Knuth, winner of the Turing Award, wrote the examples in his book The Art of Computer Programming (Addison-Wesley) in an assembly language for MIX, a hypothetical computer he designed for the book. Assembly language is a programming language that closely resembles machine code, the language used natively by computers (but not understandable by most humans).

This book uses Python for a number of good reasons, including the community support it enjoys and the fact that it’s full featured, yet easy to learn. Python code also reads much like plain English, resembling how a human creates instructions rather than how a computer interprets them. The first section of this chapter fills in the details of why this book uses Python for the examples, and also tells you why other options are useful and why you may need to consider them as your journey continues.

When you speak a human language, you add nuances of meaning by employing specific word combinations that others in your community understand. The use of nuanced meaning comes naturally and represents a dialect. In some cases, dialects also form because one group wants to demonstrate a difference from another group. For example, Noah Webster wrote and published A Grammatical Institute of the English Language, in part to remove the influence of the British aristocracy from the American public (see http://connecticuthistory.org/noah-webster-and-the-dream-of-a-common-language/ for details). Likewise, computer languages often come in flavors, and vendors purposely add extensions that make their product unique, giving you a reason to buy it over competing offerings.

The second section of the chapter introduces you to various Python distributions, each of which provides a Python dialect. This book uses Continuum Analytics Anaconda, which is the product you should use to obtain the best results from your learning experience. Using another product, essentially another dialect, can cause problems in making the examples work. It’s the same sort of thing that sometimes happens when someone who speaks British English talks to someone who speaks American English. However, knowing about other distributions can be helpful when you need access to features that Anaconda may not provide.

The next three sections of this chapter help you install Anaconda on your platform. The examples in this book are tested on the Linux, Mac OS X, and Windows platforms. They may also work with other platforms, but the examples aren’t tested on these platforms, so you have no guarantee that they’ll work. By installing Anaconda using the procedures found in this chapter, you reduce the chance of getting an installation that won’t work with the example code. To use the examples in this book, you must install Anaconda 4.2.0 with support for Python 3.5. Other versions of Anaconda and Python may not work with the example code because, as with human language dialects, they could misunderstand the instructions that the code provides.

Algorithms work with data in specific ways. To see particular output from an algorithm, you need consistent data. Fortunately, the Python community is busy creating datasets that anyone can use for testing purposes. These shared datasets let you reproduce the results that others get without having to download custom datasets from an unknown source. The final section of this chapter helps you get and install the datasets needed for the examples.

Considering the Benefits of Python

To work with algorithms on a computer, you need some means of communicating with the computer. If this were Star Trek, you could probably just tell the computer what you want and it would dutifully perform the task for you. In fact, Scotty seems quite confused about the lack of a voice computer interface in Star Trek IV (see http://www.davidalison.com/2008/07/keyboard-vs-mouse.html for details). The point is that you still need to use the mouse and keyboard, along with a special language, to communicate your ideas to the computer because the computer isn’t going to make an effort to communicate with you. Python is one of a number of languages that are especially adept at making it easy for nondevelopers to communicate ideas to the computer, but it isn’t the only choice. The following sections help you understand why this book uses Python and what your other choices are.

Understanding why this book uses Python

Every computer language available today translates algorithms into a form that the computer can process. Remember from Chapter 1 that an algorithm is a sequence of steps used to solve a problem. Some languages, such as ALGOL (ALGOrithmic Language) and FORTRAN (FORmula TRANslation), make this focus clear even in their names. The method used to perform this translation differs by language, and the techniques used by some languages are quite arcane, requiring specialized knowledge even to make an attempt.
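As a small illustration of one such translation, the following sketch assumes that you run CPython, the standard Python interpreter that Anaconda includes. CPython compiles source code into bytecode for its own virtual machine rather than straight into machine code, and the standard library’s dis module lets you look at that bytecode. Euclid’s greatest-common-divisor algorithm is just a convenient example; the exact instructions that dis prints vary from one Python version to the next.

```python
import dis


def gcd(a, b):
    """Euclid's algorithm: replace (a, b) with (b, a % b) until b reaches 0."""
    while b:
        a, b = b, a % b
    return a


if __name__ == "__main__":
    print(gcd(48, 18))  # 6
    # Print the bytecode instructions CPython generated for gcd();
    # the listing differs between Python versions.
    dis.dis(gcd)
```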

remember Computers speak only one language, machine code (the 0s and 1s that a computer interprets to perform tasks), which is so incredibly hard for humans to speak that early developers created a huge array of alternatives. Computer languages exist to make human communication with computers easier. Consequently, if you find yourself struggling to make anything work, perhaps you have the wrong language. It’s always best to have more than one language at your fingertips so that you can perform computer communications with ease. Python happens to be one of the languages that works exceptionally well for people who work in disciplines outside application development.

Python is the vision of a single person, Guido van Rossum (see his home page at https://gvanrossum.github.io/). You might be surprised to learn that Python has been around for a long time: Guido started the language in December 1989 as a replacement for the ABC language. Not much information is available as to the precise goals for Python, but it does retain ABC’s capability to create applications using less code. However, it far exceeds the capability of ABC to create applications of all types, and in contrast to ABC, it supports four programming styles: functional, imperative, object-oriented, and procedural. In short, Guido took ABC as a starting point, found it limited, and created a new language without those limitations. It’s an example of creating a new language that really is better than its predecessor.
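The idea of multiple programming styles is easier to see than to describe. The following sketch computes the same result (the sum of the squares of a short list) four ways. The style labels in the comments are the conventional names, and the example is illustrative rather than a formal definition of each style.

```python
from functools import reduce

data = [1, 2, 3, 4]

# Imperative: a sequence of statements that update program state.
total = 0
for value in data:
    total += value * value

# Procedural: the same steps packaged as a reusable procedure.
def sum_of_squares(values):
    result = 0
    for value in values:
        result += value * value
    return result

# Functional: combine functions without mutating any variables.
functional_total = reduce(lambda acc, x: acc + x * x, data, 0)

# Object-oriented: bundle the data with the behavior that works on it.
class Numbers:
    def __init__(self, values):
        self.values = list(values)

    def sum_of_squares(self):
        return sum(v * v for v in self.values)

print(total, sum_of_squares(data), functional_total, Numbers(data).sum_of_squares())
# 30 30 30 30
```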

Python has gone through a number of iterations and currently has two development paths. The 2.x path is backward compatible with previous versions of Python; the 3.x path isn’t. The compatibility issue is one that figures into how you use Python to perform algorithm-related tasks because at least some of the packages won’t work with 3.x. In addition, some versions use different licensing because Guido was working at various companies during Python’s development. You can see a listing of the versions and their respective licenses at https://docs.python.org/3/license.html. The Python Software Foundation (PSF) owns all current versions of Python, so unless you use an older version, you really don’t need to worry about the licensing issue.
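One compatibility difference that matters directly for algorithms is division. In Python 3.x, the / operator always performs true division, while in Python 2.x it performs floor division when both operands are integers (unless the module begins with from __future__ import division). The following sketch uses a simple binary search, chosen only as an example, to show why algorithm code that computes an index should use the // operator, which floors in both versions.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if it's absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        # // floors in both 2.x and 3.x. On 3.x, a plain / would produce a
        # float such as 2.0, which can't serve as a list index.
        middle = (low + high) // 2
        if items[middle] == target:
            return middle
        if items[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return -1


print(7 / 2, 7 // 2)                       # 3.5 3 on Python 3.x
print(binary_search([2, 3, 5, 7, 11], 7))  # 3
```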

technicalstuff Guido actually started Python as a skunkworks project (a project developed by a small and loosely structured group of people). The core concept was to create Python as quickly as possible, yet create a language that is flexible, runs on any platform, and provides significant potential for extension. Python provides all these features and many more. Of course, there are always bumps in the road, such as figuring out just how much of the underlying system to expose. You can read more about the Python design philosophy at http://python-history.blogspot.com/2009/01/pythons-design-philosophy.html. The history of Python at http://python-history.blogspot.com/2009/01/introduction-and-overview.html also provides some useful information.

The original development (or design) goals for Python don’t quite match what has happened to the language since that time. Guido originally intended Python as a second language for developers who needed to create one-off code but who couldn’t quite achieve their goals using a scripting language. The original target audience for Python was the C developer. You can read about these original goals in the interview at http://www.artima.com/intv/pyscale.html.

Today you can find many full-scale applications written in Python, so its use didn’t remain limited to one-off scripting. In fact, you can find listings of Python applications at https://www.python.org/about/apps/ and https://www.python.org/about/success/.

Naturally, with all these success stories to go on, people are enthusiastic about adding to Python. You can find lists of Python Enhancement Proposals (PEPs) at http://legacy.python.org/dev/peps/. These PEPs may or may not see the light of day, but they prove that Python is a living, growing language that will continue to provide features that developers truly need to create great applications of all types.

Working with MATLAB

Python has advantages over many other languages by offering multiple coding styles, fantastic flexibility, and great extensibility, but it’s still a programming language. If you honestly don’t want to use a programming language, you do have other options, such as MATLAB (https://www.mathworks.com/products/matlab/), which focuses more on algorithms. MATLAB is still a scripting language of a sort, and to perform any significant tasks with it, you still need to know a little about coding, but not as much as with Python.

One of the major issues with using MATLAB is the price you pay. Unlike Python, MATLAB requires a monetary investment on your part (see https://www.mathworks.com/pricing-licensing/ for licensing costs). The environment is indeed easier to use, but as with most things, there is no free lunch, and you must consider the cost differential as part of determining which product to use.

Many people are curious about MATLAB, that is, its strengths and weaknesses when compared to Python. This book doesn’t have room to provide a full comparison, but you can find a great overview at http://www.pyzo.org/python_vs_matlab.html. In addition, you can call Python packages from MATLAB using the techniques found at https://www.mathworks.com/help/matlab/call-python-libraries.html. In fact, MATLAB also works with the following:

Therefore, you don’t necessarily have to choose between MATLAB and Python (or another language), but the more Python features you use, the easier it becomes to simply work with Python and skip MATLAB. You can discover more about MATLAB in MATLAB For Dummies, by Jim Sizemore and John Paul Mueller (Wiley).

Considering other algorithm testing environments

A third major contender for algorithm-related work is R. The R programming language, like Python, is free of charge. It also supports a large number of packages and offers great flexibility. Some of the programming constructs are different, however, and some people find R harder to use than Python. Most people view R as the winner when it comes to performing statistics, but they see the general-purpose nature of Python as having major benefits (see the articles at https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis and http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html). The stronger community support for Python is also a major advantage.

As previously mentioned, you can use any computer programming language to perform algorithm-related work, but most languages have a specific purpose in mind. For example, you can perform algorithm-related tasks using a language such as Structured Query Language (SQL), but this language focuses on data management, so some algorithm-related tasks might become convoluted and difficult to perform. SQL notably lacks an easy way to plot data and to perform some of the translations and transformations that algorithm-specific work requires. In short, you need to consider what you plan to do when choosing a language. This book uses Python because it truly is the best overall language to perform the tasks at hand, but it’s important to realize that you may need another language at some point.
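To make the point about convoluted tasks concrete, here’s a sketch that uses Python’s built-in sqlite3 module to compute a factorial twice: once in SQL and once in plain Python. SQLite is only a convenient stand-in for SQL in general, and factorial is just a simple example. SQL has no ordinary loop, so the SQL version needs a recursive common table expression (CTE), while the Python version is a short loop.

```python
import sqlite3
from contextlib import closing


def factorial_sql(n):
    """Compute n! with a recursive common table expression (CTE) in SQLite."""
    query = """
        WITH RECURSIVE fact(i, value) AS (
            SELECT 0, 1
            UNION ALL
            SELECT i + 1, value * (i + 1) FROM fact WHERE i < ?
        )
        SELECT value FROM fact WHERE i = ?
    """
    # An in-memory database is enough; no table is needed for this query.
    with closing(sqlite3.connect(":memory:")) as conn:
        return conn.execute(query, (n, n)).fetchone()[0]


def factorial_py(n):
    """Compute the same value with an ordinary Python loop."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial_sql(10), factorial_py(10))  # 3628800 3628800
```

SQLite stores integers in 64 bits, so the SQL version is only reliable up to 20!, whereas Python’s integers grow as needed.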

Looking at the Python Distributions

You could obtain a generic copy of Python and add to it all the packages required to work with algorithms, but the process can be difficult because you need to ensure that you have all the required packages in the correct versions to guarantee success. In addition, you need to perform the configuration required to make sure that the packages are accessible when you need them. Fortunately, going through the required work isn’t necessary because numerous Python products that work well with algorithms are available for you to use. These products provide everything needed to get started with algorithm-related projects.
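To get a feel for the bookkeeping involved, here’s a minimal sketch that reports whether each package in a list is importable and, if so, which version is installed. The package names in the example are assumptions (common scientific packages), not a list of what this book requires. Because not every package defines a __version__ attribute, the function falls back to the string "unknown".

```python
import importlib
import importlib.util


def check_packages(names):
    """Map each package name to its version string, or None if it can't be found."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        module = importlib.import_module(name)
        # Not every package defines __version__, so fall back to a placeholder.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Hypothetical list: substitute the packages your own projects need.
    for name, version in check_packages(["numpy", "scipy", "matplotlib", "networkx"]).items():
        print(name, version if version else "NOT INSTALLED")
```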

remember You can use any of the packages mentioned in the following sections to work with the examples in this book. However, the book’s source code and downloadable source code rely on Continuum Analytics Anaconda 4.2.0 because this particular package works on every platform this book supports: Linux, Mac OS X, and Windows. The book doesn’t mention a specific package in the chapters that follow, but any screenshots reflect how things look when using Anaconda on Windows. You may need to tweak the code to use another package, and the screens will look different if you use Anaconda on some other platform.

warning Windows 10 presents some serious installation issues when working with Python. You can read about these issues on my (John’s) blog at http://blog.johnmuellerbooks.com/2015/10/30/python-and-windows-10/. Given that so many readers of my other Python books have sent feedback saying that Windows 10 doesn’t provide a good environment, I can’t recommend Windows 10 as a Python platform for this book. If you’re working with Windows 10, simply be aware that your road to a Python installation will be a rocky one.

Obtaining Analytics Anaconda

The basic Anaconda package is a free download that you obtain at https://store.continuum.io/cshop/anaconda/. Simply click Download Anaconda to obtain access to the free product. You do need to provide an email address to get a copy of Anaconda. After you provide your email address, you go to another page, where you can choose your platform and the installer for that platform. Anaconda supports the following platforms:

  • Windows 32-bit and 64-bit (the installer may offer you only the 64-bit or 32-bit version, depending on which version of Windows it detects)
  • Linux 32-bit and 64-bit
  • Mac OS X 64-bit

Because package support for Python 3.5 is better than it was for previous 3.x versions, the Continuum Analytics site supports Python 3.x and 2.x equally. This book uses Python 3.5 because the package support is now substantial enough and stable enough to support all the programming examples, and because Python 3.x represents the future direction of Python.
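Because the examples target Python 3.5 specifically, you might want a quick way to confirm which interpreter you’re actually running. The following sketch compares the running interpreter’s major and minor version numbers against 3.5; the constant BOOK_VERSION and the function version_matches are hypothetical helpers for illustration. It checks for an exact match because, as mentioned earlier in this chapter, other versions may not work with the example code.

```python
import sys

BOOK_VERSION = (3, 5)  # hypothetical constant: the Python version the examples target


def version_matches(current=None, expected=BOOK_VERSION):
    """Return True when the major.minor version of `current` equals `expected`.

    With no argument, the check uses the interpreter that's running the code.
    """
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(expected)


if __name__ == "__main__":
    print(sys.version)
    if not version_matches():
        print("Warning: the book's examples were tested with Python %d.%d" % BOOK_VERSION)
```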

tip You can obtain Anaconda with older versions of Python. If you want to use an older version of Python, click the installer archive link near the bottom of the page. You should use an older version of Python only when you have a pressing need to do so.

The Miniconda installer can potentially save time by limiting the number of features you install. However, trying to figure out precisely which packages you do need is an error-prone and time-consuming process. In general, you want to perform a full installation to ensure that you have everything needed for your projects. Even a full install doesn’t require much time or effort to download and install on most systems.

The free product is all you need for this book. However, when you look on the site, you see that many other add-on products are available. These products can help you create robust applications. For example, when you add Accelerate to the mix, you obtain the capability to perform multicore and GPU-enabled operations. The use of these add-on products is outside the scope of this book, but the Anaconda site provides details on using them.

Considering Enthought Canopy Express

Enthought Canopy Express is a free product for producing both technical and scientific applications using Python. You can obtain it at https://www.enthought.com/canopy-express/. Click Download Free on the main page to see a listing of the versions that you can download. Only Canopy Express is free; the full Canopy product comes at a cost. However, you can use Canopy Express to work with the examples in this book. Canopy Express supports the following platforms:

  • Windows 32-bit and 64-bit
  • Linux 32-bit and 64-bit
  • Mac OS X 32-bit and 64-bit

Choose the platform and version you want to download. When you click Download Canopy Express, you see an optional form for providing information about yourself. The download starts automatically, even if you don’t provide personal information to the company.

One of the advantages of Canopy Express is that Enthought is heavily involved in providing support for both students and teachers. People also can take classes, including online classes, that teach the use of Canopy Express in various ways (see https://training.enthought.com/courses).

Considering pythonxy

The pythonxy Integrated Development Environment (IDE) is a community project hosted at http://python-xy.github.io/. It’s a Windows-only product, so you can’t easily use it for cross-platform needs. (In fact, it supports only Windows Vista, Windows 7, and Windows 8.) However, it does come with a full set of packages, and you can easily use it for this book if you want.

Because pythonxy uses the GNU General Public License (GPL) v3 (see http://www.gnu.org/licenses/gpl.html), you have no add-ons, training, or other paid features to worry about. No one will come calling at your door hoping to sell you something. In addition, you have access to all the source code for pythonxy, so you can make modifications if you want.

Considering WinPython

The name tells you that WinPython is a Windows-only product that you can find at http://winpython.sourceforge.net/. This product is actually a spin-off of pythonxy and isn’t meant to replace it. Quite the contrary: WinPython is simply a more flexible way to work with pythonxy. You can read about the motivation for creating WinPython at http://sourceforge.net/p/winpython/wiki/Roadmap/.

The bottom line for this product is that you gain flexibility at the cost of friendliness and a little platform integration. However, for developers who need to maintain multiple versions of an IDE, WinPython may make a significant difference. When using WinPython with this book, make sure to pay particular attention to configuration issues or you’ll find that even the downloadable code has little chance of working.

Installing Python on Linux

You use the command line to install Anaconda on Linux — you’re given no graphical installation option. Before you can perform the install, you must download a copy of the Linux software from the Continuum Analytics site. You can find the required download information in the “Obtaining Analytics Anaconda” section, earlier in this chapter. The following procedure should work fine on any Linux system, whether you use the 32-bit or 64-bit version of Anaconda:

  1. Open a copy of Terminal.

    The Terminal window appears.

  2. Change directories to the downloaded copy of Anaconda on your system.

    The name of this file varies, but normally it appears as Anaconda3-4.2.0-Linux-x86.sh for 32-bit systems and Anaconda3-4.2.0-Linux-x86_64.sh for 64-bit systems. The version number is embedded as part of the filename. In this case, the filename refers to version 4.2.0, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

  3. Type bash Anaconda3-4.2.0-Linux-x86.sh (for the 32-bit version) or bash Anaconda3-4.2.0-Linux-x86_64.sh (for the 64-bit version) and press Enter.

    An installation wizard starts that asks you to accept the licensing terms for using Anaconda.

  4. Read the licensing agreement and accept the terms using the method required for your version of Linux.

    The wizard asks you to provide an installation location for Anaconda. The book assumes that you use the default location of ~/anaconda. If you choose some other location, you may have to modify some procedures later in the book to work with your setup.

  5. Provide an installation location (if necessary) and press Enter.

    The application extraction process begins. After the extraction is complete, you see a completion message.

  6. Add the installation path to your PATH statement using the method required for your version of Linux.

    You’re ready to begin using Anaconda.
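After you add Anaconda to your PATH, open a new Terminal window (or reload your shell configuration) so that the change takes effect. The following sketch, which you can run with any Python interpreter, lists the PATH entries that mention Anaconda and shows which python executable your shell would find first. It assumes that your installation directory name contains the word anaconda, as the default ~/anaconda location does; if you chose a different location, adjust the test in anaconda_path_entries (a hypothetical helper) accordingly.

```python
import os
import shutil


def anaconda_path_entries(path_value=None):
    """Return the PATH entries whose names mention Anaconda (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep)
            if "anaconda" in entry.lower()]


if __name__ == "__main__":
    print("Anaconda entries on PATH:", anaconda_path_entries())
    # shutil.which() reports the python executable the shell would run first.
    print("python resolves to:", shutil.which("python"))
```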

Installing Python on Mac OS X

The Mac OS X installation comes in only one form: 64-bit. Before you can perform the install, you must download a copy of the Mac software from the Continuum Analytics site. You can find the required download information in the “Obtaining Analytics Anaconda” section, earlier in this chapter.

The installation files come in two forms. The first depends on a graphical installer; the second relies on the command line. The command-line version works much like the Linux version described in the “Installing Python on Linux” section of this chapter. The following steps help you install Anaconda 64-bit on a Mac system using the graphical installer:

  1. Locate the downloaded copy of Anaconda on your system.

    The name of this file varies, but normally it appears as Anaconda3-4.2.0-MacOSX-x86_64.pkg. The version number is embedded as part of the filename. In this case, the filename refers to version 4.2.0, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

  2. Double-click the installation file.

    An introduction dialog box appears.

  3. Click Continue.

    The wizard asks whether you want to review the Read Me materials. You can read these materials later. For now, you can safely skip the information.

  4. Click Continue.

    The wizard displays a licensing agreement. Be sure to read through the licensing agreement so that you know the terms of usage.

  5. Click I Agree if you agree to the licensing agreement.

    The wizard asks you to provide a destination for the installation. The destination controls whether the installation is for an individual user or a group.

    warning You may see an error message stating that you can’t install Anaconda on the system. The error message occurs because of a bug in the installer and has nothing to do with your system. To get rid of the error message, choose the Install Only for Me option. You can’t install Anaconda for a group of users on a Mac system.

  6. Click Continue.

    The installer displays a dialog box containing options for changing the installation type. Click Change Install Location if you want to modify where Anaconda is installed on your system. (The book assumes that you use the default path of ~/anaconda.) Click Customize if you want to modify how the installer works. For example, you can choose not to add Anaconda to your PATH statement. However, the book assumes that you have chosen the default install options, and no good reason exists to change them unless you have another copy of Python 3.5 installed somewhere else.

  7. Click Install.

    The installation begins. A progress bar tells you how the installation process is progressing. When the installation is complete, you see a completion dialog box.

  8. Click Continue.

    You’re ready to begin using Anaconda.

Installing Python on Windows

Anaconda comes with a graphical installation application for Windows, so getting a good install means using a wizard, as you would for any other installation. Of course, you need a copy of the installation file before you begin, and you can find the required download information in the “Obtaining Analytics Anaconda” section, earlier in this chapter. The following procedure should work fine on any Windows system, whether you use the 32-bit or the 64-bit version of Anaconda:

  1. Locate the downloaded copy of Anaconda on your system.

    The name of this file varies, but normally it appears as Anaconda3-4.2.0-Windows-x86.exe for 32-bit systems and Anaconda3-4.2.0-Windows-x86_64.exe for 64-bit systems. The version number is embedded as part of the filename. In this case, the filename refers to version 4.2.0, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

  2. Double-click the installation file.

    (You may see an Open File – Security Warning dialog box that asks whether you want to run this file. Click Run if you see this dialog box pop up.) You see an Anaconda 4.2.0 Setup dialog box similar to the one shown in Figure 3-1. The exact dialog box that you see depends on which version of the Anaconda installation program you download. If you have a 64-bit operating system, using the 64-bit version of Anaconda is always best so that you obtain the best possible performance. This first dialog box tells you when you have the 64-bit version of the product.

  3. Click Next.

    The wizard displays a licensing agreement. Be sure to read through the licensing agreement so that you know the terms of usage.

  4. Click I Agree if you agree to the licensing agreement.

    You’re asked what sort of installation type to perform, as shown in Figure 3-2. In most cases, you want to install the product just for yourself. The exception is if you have multiple people using your system and they all need access to Anaconda.

  5. Choose one of the installation types and then click Next.

    The wizard asks where to install Anaconda on disk, as shown in Figure 3-3. The book assumes that you use the default location. If you choose some other location, you may have to modify some procedures later in the book to work with your setup.

  6. Choose an installation location (if necessary) and then click Next.

    You see the Advanced Installation Options, shown in Figure 3-4. These options are selected by default, and in most cases no good reason exists to change them. You might need to change them if you don’t want Anaconda to serve as your default Python 3.5 (or Python 2.7) installation. However, the book assumes that you’ve set up Anaconda using the default options.

  7. Change the advanced installation options (if necessary) and then click Install.

    You see an Installing dialog box with a progress bar. The installation process can take a few minutes, so get yourself a cup of coffee and read the comics for a while. When the installation process is over, you see a Next button enabled.

  8. Click Next.

    The wizard tells you that the installation is complete.

  9. Click Finish.

    You’re ready to begin using Anaconda.

image

FIGURE 3-1: The setup process begins by telling you whether you have the 64-bit version.

image

FIGURE 3-2: Tell the wizard how to install Anaconda on your system.

image

FIGURE 3-3: Specify an installation location.

image

FIGURE 3-4: Configure the advanced installation options.
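Whichever platform you install on, you can check from within Python whether the interpreter you’re running comes from a conda-based installation such as Anaconda. The following sketch relies on a heuristic rather than an official API: conda keeps its package records in a conda-meta folder inside each environment’s prefix directory, so the hypothetical helper is_conda_python simply looks for that folder under sys.prefix.

```python
import os
import sys


def is_conda_python(prefix=None):
    """Guess whether an interpreter belongs to a conda-based install such as Anaconda.

    Heuristic: conda records installed packages in a conda-meta folder
    under the environment's prefix directory.
    """
    prefix = sys.prefix if prefix is None else prefix
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Conda-based install:", is_conda_python())
```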

Downloading the Datasets and Example Code

This book is about using Python to work with algorithms. Of course, you can spend all your time creating the example code from scratch, debugging it, and only then discovering how it relates to algorithms, or you can take the easy way and download the prewritten code from the Dummies site (see the Introduction of this book for details) so that you can get right to work. Likewise, creating datasets large enough for algorithm learning purposes would take quite a while. Fortunately, you can access standardized, precreated datasets quite easily by using features provided in some of the data science packages (which also work just fine for all sorts of purposes, including learning to work with algorithms). The following sections help you download and use the example code and datasets so that you can save time and get right to work with algorithm-specific tasks.
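As one example of such a standardized dataset, the networkx package (which the full Anaconda installation includes) can build Zachary’s karate club graph, a small, well-known social network, with a single call. The sketch below assumes networkx is available and raises an error with a hint if it isn’t; load_karate_club is a hypothetical wrapper, while nx.karate_club_graph() is the networkx function that does the work.

```python
try:
    import networkx as nx  # part of the full Anaconda installation
except ImportError:        # a bare Python installation may not include it
    nx = None


def load_karate_club():
    """Return Zachary's karate club network as a networkx Graph."""
    if nx is None:
        raise RuntimeError("networkx isn't installed; install Anaconda or add the networkx package")
    return nx.karate_club_graph()
```

For example, load_karate_club().number_of_nodes() returns 34 (the club members) and load_karate_club().number_of_edges() returns 78 (the friendships between them). Because everyone gets the same graph, you can compare your algorithm’s output directly with results that others publish.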

Using Jupyter Notebook

To make working with the relatively complex code in this book easier, you use Jupyter Notebook. This interface lets you easily create Python notebook files that can contain any number of examples, each of which can run individually. The program runs in your browser, so which platform you use for development doesn’t matter; as long as it has a browser, you should be okay.

Starting Jupyter Notebook

Most platforms provide an icon to access Jupyter Notebook. Just click this icon to access Jupyter Notebook. For example, on a Windows system, you choose Start ⇒ All Programs ⇒ Anaconda 3 ⇒ Jupyter Notebook. Figure 3-5 shows how the interface looks when viewed in a Firefox browser. The precise appearance on your system depends on the browser you use and the kind of platform you have installed.

image

FIGURE 3-5: Jupyter Notebook provides an easy method to create algorithm examples.

If you have a platform that doesn’t offer easy access through an icon, you can use these steps to access Jupyter Notebook:

  1. Open a Command Prompt or Terminal Window on your system.

    The window opens so that you can type commands.

  2. Change directories to the \Anaconda3\Scripts directory on your machine.

    Most systems let you use the CD command for this task.

  3. Type python jupyter-notebook-script.py and press Enter.

    The Jupyter Notebook page opens in your browser.
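If you prefer to start the server from a script rather than typing a command, the following sketch builds a suitable command line. It assumes that either the jupyter command is on your PATH or the notebook package can run as a module (python -m notebook), which holds for recent Notebook releases; notebook_command is a hypothetical helper name.

```python
import shutil
import sys


def notebook_command():
    """Build a command line that starts the Jupyter Notebook server."""
    jupyter = shutil.which("jupyter")
    if jupyter is not None:
        return [jupyter, "notebook"]
    # Fallback: run the notebook package as a module with this interpreter.
    return [sys.executable, "-m", "notebook"]
```

Passing the result to subprocess.run(notebook_command()) starts the server and blocks until you stop it, as described next.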

Stopping the Jupyter Notebook server

No matter how you start Jupyter Notebook (or just Notebook, as it appears in the remainder of the book), the system generally opens a command prompt or terminal window to host Jupyter Notebook. This window contains a server that makes the application work. When a session is complete, close the browser window, select the server window, and press Ctrl+C or Ctrl+Break to stop the server.

Defining the code repository

The code you create and use in this book will reside in a repository on your hard drive. Think of a repository as a kind of filing cabinet where you put your code. Notebook opens a drawer, takes out the folder, and shows the code to you. You can modify it, run individual examples within the folder, add new examples, and simply interact with your code in a natural manner. The following sections get you started with Notebook so that you can see how this whole repository concept works.

Defining the book’s folder

It pays to organize your files so that you can access them more easily later. This book keeps its files in the A4D (Algorithms For Dummies) folder. Use these steps within Notebook to create a new folder:

  1. Choose New ⇒ Folder.

    Notebook creates a new folder named Untitled Folder, as shown in Figure 3-6. The folder appears in alphanumeric order, so you may not see it at first; scroll down to the correct location.

  2. Select the box next to the Untitled Folder entry.
  3. Click Rename at the top of the page.

    You see a Rename Directory dialog box like the one shown in Figure 3-7.

  4. Type A4D and click OK.

    Notebook changes the name of the folder for you.

  5. Click the new A4D entry in the list.

    Notebook changes the location to the A4D folder, where you perform the tasks related to the exercises in this book.

FIGURE 3-6: New folders appear with a name of Untitled Folder.

FIGURE 3-7: Rename the folder so that you remember the kinds of entries it contains.
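If you prefer to create the folder from code, a notebook cell can do the same job with Python's standard pathlib module. This is only a sketch, not how Notebook itself creates folders: the helper name create_book_folder is hypothetical, and the folder is created relative to the current working directory, which may not match the location Notebook's Home page shows.

```python
from pathlib import Path

def create_book_folder(base=".", name="A4D"):
    """Hypothetical helper: create the book's folder under base and return its path."""
    folder = Path(base) / name
    # parents=True creates missing parent folders; exist_ok=True avoids an
    # error when the folder already exists
    folder.mkdir(parents=True, exist_ok=True)
    return folder

book_folder = create_book_folder()
print(book_folder.resolve())  # full path of the A4D folder
```

Calling the helper a second time does nothing harmful, because exist_ok=True makes folder creation safe to repeat.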

Creating a new notebook

Every new notebook is like a file folder. You can place individual examples within the file folder, just as you would sheets of paper into a physical file folder. Each example appears in a cell. You can put other sorts of things in the file folder, too, but you see how these things work as the book progresses. Use these steps to create a new notebook:

  1. Click New ⇒ Python (default).

    A new tab opens in the browser with the new notebook, as shown in Figure 3-8. Notice that the notebook contains a cell and that Notebook has highlighted the cell so that you can begin typing code in it. The title of the notebook is Untitled right now. That’s not a particularly helpful title, so you need to change it.

  2. Click Untitled on the page.

    Notebook asks what you want to use as a new name, as shown in Figure 3-9.

  3. Type A4D; 03; Sample and press Enter.

    The new name tells you that this is the Sample file for Algorithms For Dummies, Chapter 3 (Notebook saves it as A4D; 03; Sample.ipynb). Using this naming convention lets you easily distinguish these files from other files in your repository.

FIGURE 3-8: A notebook contains cells that you use to hold code.

FIGURE 3-9: Provide a new name for your notebook.
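Behind the scenes, a notebook is stored as a JSON file whose cells list holds one entry per cell. The following simplified sketch builds that structure for a notebook with a single code cell. The helper make_notebook is hypothetical, and it assumes the nbformat version 4 layout; files that Notebook itself writes carry extra metadata, such as the kernel name, that this sketch leaves out.

```python
import json

def make_notebook(code_sources):
    """Hypothetical helper: build a minimal nbformat 4 notebook, one code cell per string."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,  # stored as null until the cell runs
            "metadata": {},
            "outputs": [],            # results appear here after you run the cell
            "source": source,         # the code you type into the cell
        }
        for source in code_sources
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

notebook = make_notebook(["print('Python is really cool!')"])
text = json.dumps(notebook, indent=1)  # the kind of text an .ipynb file holds
print(notebook["cells"][0]["source"])
```

Each string you pass becomes its own cell, which mirrors the file-folder analogy: the notebook is the folder and each cell is a sheet of paper inside it.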

Of course, the Sample notebook doesn’t contain anything just yet. Place the cursor in the cell, type print('Python is really cool!'), and then click the Run button (the button with the right-pointing arrow on the toolbar). You see the output shown in Figure 3-10. The output is part of the same cell as the code: the code resides in a square box and the output appears outside that box, but both belong to the cell. Notebook visually separates the output from the code so that you can tell them apart, and it automatically creates a new cell for you.

FIGURE 3-10: Notebook uses cells to store your code.

When you finish working with a notebook, it’s important to shut it down. To close a notebook, choose File ⇒ Close and Halt. You return to the Home page, where the notebook you just created now appears in the list, as shown in Figure 3-11.

FIGURE 3-11: Any notebooks you create appear in the repository list.

Exporting a notebook

Creating notebooks and keeping them all to yourself isn’t much fun. At some point, you want to share them with other people. To perform this task, you must export your notebook from the repository to a file. You can then send the file to someone else, who will import it into his or her repository.

The previous section shows how to create a notebook named A4D; 03; Sample. You can open this notebook by clicking its entry in the repository list. The file reopens so that you can see your code again. To export this code, choose File ⇒ Download As ⇒ Notebook (.ipynb). What you see next depends on your browser, but you generally see some sort of dialog box for saving the notebook as a file. Save the IPython Notebook file the same way you save any other file with your browser.
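Because an exported .ipynb file is plain JSON, you can inspect it with Python's json module before sending it to someone. The sketch below lists the code in each code cell. The function name list_code_cells is hypothetical, and the demonstration writes a small hand-made notebook to a temporary folder as a stand-in for a real download, assuming the nbformat version 4 layout.

```python
import json
import tempfile
from pathlib import Path

def list_code_cells(path):
    """Hypothetical helper: return the source text of each code cell in an .ipynb file."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # The format allows source as one string or as a list of lines
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources

# Stand-in for an exported file: a minimal notebook with one code cell
with tempfile.TemporaryDirectory() as tmp:
    exported = Path(tmp) / "A4D; 03; Sample.ipynb"
    exported.write_text(json.dumps({
        "cells": [{"cell_type": "code", "execution_count": 1, "metadata": {},
                   "outputs": [], "source": ["print('Python is really cool!')"]}],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
    }), encoding="utf-8")
    cells = list_code_cells(exported)

print(cells)
```

To check one of your own downloads, pass its path to list_code_cells instead of the temporary file.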

Removing a notebook

Sometimes notebooks get outdated or you simply don’t need to work with them any longer. Rather than allow your repository to get clogged with files you don’t need, you can remove these unwanted notebooks from the list. Use these steps to remove the file:

  1. Select the box next to the A4D; 03; Sample.ipynb entry.
  2. Click the trash can icon (Delete) at the top of the page.

    You see a Delete notebook warning message like the one shown in Figure 3-12.

  3. Click Delete.

    The file gets removed from the list.

FIGURE 3-12: Notebook warns you before removing any files from the repository.

Importing a notebook

To use the source code from this book, you must import the downloaded files into your repository. The source code comes in an archive file that you extract to a location on your hard drive. The archive contains a number of .ipynb (IPython Notebook) files holding the source code for this book (see the Introduction for details on downloading the source code). The following steps explain how to import these files into your repository:

  1. Click Upload at the top of the page.

    What you see depends on your browser. In most cases, you see some type of File Upload dialog box that provides access to the files on your hard drive.

  2. Navigate to the directory containing the files that you want to import into Notebook.
  3. Highlight one or more files to import and click the Open (or other, similar) button to begin the upload process.

    You see the file added to an upload list, as shown in Figure 3-13. The file isn’t part of the repository yet — you’ve simply selected it for upload.

    tip When you export a file, Notebook converts any special characters to a form that your system will accept with greater ease. Figure 3-13 shows this conversion in action: the semicolons appear as %3B, and the spaces appear as + (plus signs). To see the title as you expect it, change these characters back to their original form (%3B to a semicolon and + to a space).

  4. Click Upload.

    Notebook places the file in the repository so that you can begin using it.

FIGURE 3-13: The files that you want to add to the repository appear as part of an upload list consisting of one or more filenames.
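The %3B and + forms described in the tip match standard URL query encoding, so Python's urllib.parse module can convert a name in either direction. This sketch only illustrates the encoding; it doesn't claim that Notebook calls these functions internally, and the encoded filename is an assumed example built from the tip's description.

```python
from urllib.parse import quote_plus, unquote_plus

# Assumed example of an encoded name, following the tip's description
uploaded_name = "A4D%3B+03%3B+Sample.ipynb"

# Decode: %3B becomes ';' and '+' becomes a space
print(unquote_plus(uploaded_name))

# Encode the original title to see the same conversion in reverse
print(quote_plus("A4D; 03; Sample.ipynb"))
```

The first call prints A4D; 03; Sample.ipynb, which is the title you want the notebook to have after the upload.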

Understanding the datasets used in this book

This book uses a number of datasets, all of which appear in the scikit-learn package. These datasets demonstrate various ways in which you can interact with data, and you use them in the examples to perform a variety of tasks. The following list provides a quick overview of the function used to import each of the datasets into your Python code:

  • load_boston(): Regression analysis with the Boston house-prices dataset (removed in scikit-learn 1.2, so it requires an earlier release)
  • load_iris(): Classification with the iris dataset
  • load_diabetes(): Regression with the diabetes dataset
  • load_digits([n_class]): Classification with the digits dataset
  • fetch_20newsgroups(subset='train'): Data from 20 newsgroups
  • fetch_olivetti_faces(): Olivetti faces dataset from AT&T

The technique for loading each of these datasets is the same across examples. The following example shows how to load the Boston house-prices dataset. You can find the code in the A4D; 03; Dataset Load.ipynb notebook.

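# Note: load_boston() was removed in scikit-learn 1.2, so this
# example requires an earlier scikit-learn release.
from sklearn.datasets import load_boston
Boston = load_boston()    # returns a Bunch object holding the dataset
print(Boston.data.shape)  # (rows, columns) of the feature matrix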

(506, 13)

To see how the code works, click Run Cell. The print() call outputs (506, 13): the dataset contains 506 rows (houses) and 13 feature columns. You can see the output shown in Figure 3-14.

FIGURE 3-14: The Boston object contains the loaded dataset.
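The same loading technique works for the other bundled loaders in the list, which also lets you try it on scikit-learn releases that no longer include load_boston(). This sketch assumes a recent scikit-learn installation; it loops over three loaders that ship their data with the package, so no download is needed, and prints the shape of each feature matrix.

```python
from sklearn.datasets import load_diabetes, load_digits, load_iris

# Each loader returns a Bunch object; its .data attribute holds the
# feature matrix as a NumPy array of shape (rows, columns)
for loader in (load_iris, load_diabetes, load_digits):
    dataset = loader()
    print(loader.__name__, dataset.data.shape)
```

You should see (150, 4) for iris, (442, 10) for diabetes, and (1797, 64) for digits, where each digits row holds the 64 pixel values of an 8 x 8 image.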