© Orhan Gazi Yalçın 2021
O. G. Yalçın, Applied Neural Networks with TensorFlow 2, https://doi.org/10.1007/978-1-4842-6513-0_4

4. Complementary Libraries to TensorFlow 2.x

Orhan Gazi Yalçın
Istanbul, Turkey
Now that we have covered the basics of machine learning and deep learning, we can slowly move on to the applied side of deep learning. As you know, every machine learning application, including deep learning applications, has a pipeline consisting of several steps. TensorFlow offers us modules for all of these steps. Even though TensorFlow is very powerful for model building, training, evaluation, and making predictions, we still need other complementary libraries for certain tasks, especially for data preparation. Although the libraries you may use in a deep learning pipeline vary to a great extent, the most popular complementary libraries are as follows:

• NumPy
• SciPy
• Pandas
• Matplotlib
• Scikit-learn
• Flask

Especially with TensorFlow 2.x, more and more data preparation, visualization, and other relevant capabilities have been added to TensorFlow. However, these capabilities cannot yet be compared to what the dedicated libraries have to offer. Table 4-1 lists these libraries with their core capabilities.
Table 4-1 The Libraries Complementary to TensorFlow and Their Main Use Cases

Library – Core Capability
NumPy – Array processing
SciPy – Scientific computing
Pandas – Array processing and data analysis, including data visualization
Matplotlib – Data visualization
Scikit-learn – Machine learning
Flask – Web framework for deployment

Let’s take a look at how to install them all with pip, the package installer for Python.

Installation with Pip

Pip is the de facto standard package-management system for Python, and it is already included in the Python installation package. You can easily install and manage Python libraries with pip.

The usual environments for running pip are Terminal on macOS and Command Prompt on Windows. However, you can also use pip inside Jupyter Notebook and Google Colab with a small adjustment: the only difference between the two options is an exclamation mark (!).

Terminal and Command Prompt – pip install package-name
Jupyter Notebook and Google Colab – !pip install package-name

If you decide to follow this book with your local Jupyter Notebook installation, you have to make sure that pip is installed on your system.

Use of Pip in Google Colab

If you are using Google Colab as recommended, you don’t have to worry about whether pip is on your system. You can use pip inside your Google Colab notebook by prefixing the command with an exclamation mark.

Pip installation, or its confirmation, can be achieved in four steps:

1. Open Terminal for macOS or Command Prompt for Windows:

   a. On macOS, you can open a Terminal window from Launchpad, under the Others folder.

   b. On Windows, you can open a Command Prompt window by (i) pressing Windows+X to open the Power Users menu and then (ii) clicking “Command Prompt” or “Command Prompt (Admin).”

2. Check whether pip is installed and view the version currently installed on your system with the following command:

pip --version

3. If Terminal/Command Prompt does not return version info, install pip with the following command:

python -m pip install -U pip

If it returns version info, you have confirmed that pip is installed on your system.

4. Close the Terminal/Command Prompt window.
Installation of the Libraries

Now that we have confirmed that you have pip on your system, you can install all the libraries mentioned in this chapter with the scripts listed in Table 4-2.
Table 4-2 The Complementary Libraries with Pip Installation Scripts

Library – Installation Script
NumPy – pip install numpy
SciPy – pip install scipy
Pandas – pip install pandas
Matplotlib – pip install matplotlib
Scikit-learn – pip install scikit-learn
Flask – pip install flask
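If you prefer, pip also accepts several package names in a single command, so all of these libraries can be installed at once. This one-liner is simply a convenience, not a requirement:

pip install numpy scipy pandas matplotlib scikit-learn flask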

Beware of Already Installed Packages

Both Google Colab and local Jupyter Notebook environments already come with most of these libraries preinstalled. Just run the scripts mentioned earlier once to make sure you have them installed, so that missing packages don’t interrupt the case studies later on.

Now that we are sure you have these libraries installed on your system (either in Google Colab or in a local Jupyter Notebook), we can dive into the details of these libraries.

NumPy – Array Processing

NumPy (Numerical Python) is a very popular open-source numerical Python library, created by Travis Oliphant. NumPy provides multidimensional arrays along with a significant number of useful functions for mathematical operations.

NumPy acts as a wrapper around core routines implemented in C. Therefore, it offers the best of both worlds: (i) the efficiency of C and (ii) the ease of use of Python. NumPy arrays are easy-to-create and efficient objects for (i) storing data and (ii) fast matrix operations. With NumPy, you can quickly generate arrays filled with random numbers, which is perfect for an enhanced learning experience and for proof-of-concept tasks. Also, the Pandas library, which we will cover later on, heavily relies on NumPy objects and almost works as a NumPy extension.

Thanks to NumPy arrays, we can process data in large volumes and perform advanced mathematical operations with ease. Compared to built-in Python sequences, NumPy’s ndarray object executes much faster and more efficiently with less code. A growing number of libraries rely on NumPy arrays for processing data, which shows the power of NumPy. Since deep learning models are usually trained with millions of data points, the size and speed advantages of NumPy arrays are essential for machine learning experts.
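As a minimal sketch of the kind of array operations described above (the array names and values are arbitrary examples, not from the book):

import numpy as np

# Create an array from a Python list and a 3x3 array of random numbers
a = np.array([1.0, 2.0, 3.0])
b = np.random.rand(3, 3)   # uniformly distributed values in [0, 1)

# Vectorized operations run in optimized C code, with no explicit Python loops
c = b @ a                  # matrix-vector product, shape (3,)
d = np.exp(a) + a.mean()   # element-wise math combined with a reduction

print(c.shape, d)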

Useful Information About NumPy

SciPy – Scientific Computing

SciPy is an open-source Python library that contains a collection of functions for mathematical, scientific, and engineering studies. SciPy functions are built on the NumPy library. SciPy allows users to manipulate and visualize their data with an easy-to-use syntax. SciPy boosts developers’ data processing and system-prototyping capabilities and makes Python as effective as rival systems such as MATLAB, IDL, Octave, R-Lab, and SciLab. Therefore, SciPy’s collection of data processing and prototyping functions further strengthens Python’s already established position as a general-purpose programming language.

SciPy’s vast collection of functions is organized into domain-based sub-packages. SciPy sub-packages must be imported separately from the parent scipy package, for example:

from scipy import stats, special

Table 4-3 lists the SciPy sub-packages; a short usage sketch follows the table.
Table 4-3 SciPy Sub-packages

Sub-package – Description
stats – Statistical functions and distributions
linalg – Linear algebra
special – Special functions
io – Input and output
spatial – Spatial data structures and algorithms
interpolate – Interpolation and smoothing splines
sparse – Sparse matrices and associated routines
integrate – Integration and equation solving
signal – Signal processing
fftpack – Fast Fourier transform routines
optimize – Optimization and root-finding routines
constants – Physical and mathematical constants
odr – Orthogonal distance regression
cluster – Clustering algorithms
ndimage – N-dimensional image processing
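For instance, here is a minimal sketch of two of these sub-packages in use (the numbers are chosen only for illustration):

from scipy import optimize, stats

# stats: cumulative probability of a standard normal distribution at 1.96
p = stats.norm.cdf(1.96)

# optimize: numerically find the minimum of a simple quadratic function
result = optimize.minimize(lambda x: (x[0] - 3.0) ** 2, x0=0.0)

print(round(p, 3))   # roughly 0.975
print(result.x)      # roughly [3.]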

  
Useful Information About SciPy

Pandas – Array Processing and Data Analysis

Pandas is a Python library that offers flexible and expressive data structures suitable for performing fast mathematical operations. Pandas is a comprehensive and easy-to-use data analysis library, and it aims to become the leading open-source, language-neutral data analysis tool.

One-dimensional Series and two-dimensional DataFrames are the two main data structures in Pandas. Since Pandas is built on top of NumPy and extends its capabilities, it almost operates as a NumPy extension. Pandas also offers several data visualization methods, which are very useful for deriving insights from datasets.

You can analyze your data and perform several calculation tasks with Pandas. Here is a non-exhaustive list of the things you can do with Pandas; a short sketch follows the list:
  • Handling missing data by filling and dropping

  • Data insertion and deletion thanks to allowed mutability

  • Automatic and explicit data alignment

  • Group-by and order-by functionality

  • Easily converting unorganized objects to DataFrames

  • Slice, index, and subset operations

  • Merge, concatenate, and join operations

  • Reshape and pivot operations

  • Hierarchical and multiple labeling

  • Specific operations for time-series and sequence data

  • Robust input and output operations with extensive file format support (including CSV, XLSX, HTML, HDF5)
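As a minimal sketch of a few of these capabilities (the column names and values below are arbitrary examples, not from the book):

import numpy as np
import pandas as pd

# A one-dimensional Series and a two-dimensional DataFrame
s = pd.Series([10, 20, 30], name="scores")
df = pd.DataFrame({
    "city":  ["Istanbul", "Ankara", "Istanbul", "Izmir"],
    "sales": [120.0, np.nan, 95.0, 80.0],
})

# Handle missing data by filling, then group and aggregate
df["sales"] = df["sales"].fillna(df["sales"].mean())
summary = df.groupby("city")["sales"].sum()

print(s.mean())
print(summary)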

Since Pandas is a de facto extension of NumPy that improves on its capabilities, we take advantage of Pandas more often than NumPy. But there are cases where we have to rely on NumPy directly because of the limitations of other complementary libraries.

Useful Information About Pandas

Matplotlib and Seaborn – Data Visualization

Matplotlib is a Python data visualization library for creating static, animated, and interactive graphs and plots. You can produce high-quality plots for academic publications, blogs, and books, and you can also derive insights from large datasets using Matplotlib.

In addition to deriving insights with your Google Colab Notebook, you can also use the object-oriented API of Matplotlib for embedding plots into applications. The three main functionalities of Matplotlib can be listed as follows:
  • Create: With Matplotlib, you can create high-quality plots with a minimal amount of code. Matplotlib offers hundreds of graph types – from histograms to heat maps, from bar charts to surface plots.

  • Customize: Matplotlib plots are flexible in the sense that you can customize line styles, font properties, colors, and axes information. You can also export your plots and embed data into them.

  • Extend: You can take advantage of numerous third-party libraries extending Matplotlib. Some of these libraries are also extremely useful, such as Seaborn.

The things you can do with Matplotlib may be listed as follows; a minimal example follows the list:
  • Use PyPlot module and create interactive plots.

  • Create hundreds of different graphs and plots using lines, bars, markers, and other objects.

  • Create unique plots such as surface and contour plots.

  • Add images and fields to your plots.

  • Create multiple subplots under a single figure.

  • Flexibly edit text, axes, colors, labels, and annotations in a plot.

  • Create one or more shapes with Matplotlib.

  • Create showcase figures.

  • Take advantage of the animation support.
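To give a feel for the PyPlot module mentioned above, here is a minimal sketch that creates two subplots under a single figure (the data is generated only for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# Two subplots under a single figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.plot(x, np.sin(x), label="sin(x)")      # line plot with a label
ax1.set_title("Line plot")
ax1.legend()

ax2.hist(np.random.randn(1000), bins=30)    # histogram of random values
ax2.set_title("Histogram")

plt.tight_layout()
plt.show()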

Useful Information About Matplotlib

Besides vanilla Matplotlib, third-party packages are widely used to increase Matplotlib’s capabilities. One of these is Seaborn, a data visualization library built on top of Matplotlib. It provides a high-level interface that extends the capabilities of Matplotlib, and it can reduce the time required to generate insightful graphs.
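A minimal sketch of how Seaborn builds on Matplotlib (the DataFrame below is invented purely for illustration):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# A small, invented dataset
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 70, 74, 80],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# One high-level call produces a styled scatter plot with an automatic legend
sns.scatterplot(data=df, x="hours", y="score", hue="group")
plt.show()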

Useful Information About Seaborn

Scikit-learn – Machine Learning

Scikit-learn is a powerful open-source machine learning library for Python, initially developed by David Cournapeau as a Google Summer of Code project. You can use scikit-learn as a stand-alone machine learning library and successfully build a wide range of traditional machine learning models. Besides being able to create machine learning models, scikit-learn, which is built on top of NumPy, SciPy, and Matplotlib, provides simple and efficient tools for predictive data analysis. There are six main functionalities of scikit-learn, which are listed as follows:
  • Classification: Scikit-learn offers several algorithms to identify which category an object belongs to, such as support vector machines, logistic regression, k-nearest neighbors, decision trees, and many more.

  • Regression: Several algorithms offered by scikit-learn can predict a continuous-valued response variable associated with an object such as linear regression, gradient boosting, random forest, decision trees, and many more.

  • Clustering: Scikit-learn also offers clustering algorithms, which are used for automated grouping of similar objects into clusters, such as k-means clustering, spectral clustering, mean shift, and many more.

  • Dimensionality Reduction: Scikit-learn provides several algorithms to reduce the number of explanatory variables to consider, such as PCA, feature selection, nonnegative matrix factorization, and many more.

  • Model Selection: Scikit-learn can help with model validation and comparison, and it can also help choose parameters and models. You can compare your TensorFlow models with scikit-learn’s traditional machine learning models. Grid search, cross-validation, and metrics are some of the tools used for model selection and validation.

  • Preprocessing: With preprocessing, feature extraction, and feature scaling options, you can transform your data where TensorFlow falls short.

Scikit-learn is especially useful when we want to compare our deep learning models with other machine learning algorithms. In addition, with scikit-learn, we can preprocess our data before feeding it into our deep learning pipeline.
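Here is a minimal sketch of the preprocessing and model comparison roles described above (the data is randomly generated and the model choice is arbitrary, purely for illustration):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Randomly generated toy data: 200 samples, 4 features, binary labels
X = np.random.rand(200, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Preprocessing: scale the features before feeding them to a model
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# A traditional model that a deep learning model could later be compared against
clf = LogisticRegression().fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))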

Useful Information About Scikit-learn

Flask – Deployment

As opposed to the libraries mentioned earlier, Flask is not a data science library; it is a micro web framework for Python. It is considered a microframework because it is not packaged with the components that other web frameworks deem essential, such as a database abstraction layer and form validation. These components can be added to a Flask application with powerful third-party extensions. This characteristic makes Flask simple and lightweight and reduces development time. Flask is a perfect option if you want to serve your trained deep learning models and don’t want to spend too much time on web programming.

Flask is easy to learn and to implement compared to Django, a very well-documented and popular web framework for Python. Because of its large size and many built-in extension packages, Django is a better choice for large projects. Currently, Flask has more stars on its GitHub repo than any other web framework for Python, and it was voted the most popular web framework in the Python Developers Survey 2018.
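As a minimal sketch of how a trained model might be served with Flask (the /predict route and the predict_fn placeholder are hypothetical examples, not from the book):

from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_fn(features):
    # Placeholder for a trained model's prediction logic (hypothetical)
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()              # e.g., {"features": [0.1, 0.2, 0.3]}
    prediction = predict_fn(data["features"])
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)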

Useful Information About Flask

Final Evaluations

In this chapter, we made an introduction to the most commonly used libraries complementary to TensorFlow. We predominantly use TensorFlow thanks to its growing number of modules addressing the needs of developers at every step of the pipeline. However, there are still some operations for which we have to rely on these libraries.

While NumPy and Pandas are very powerful data processing libraries, Matplotlib and Seaborn are useful for data visualization. While SciPy helps us with complex mathematical operations, scikit-learn is particularly useful for advanced preprocessing operations and validation tasks. Finally, Flask is the web framework of our choice to serve our trained models quickly.

In the next chapter, we dive into TensorFlow modules with actual code examples.