© Orhan Gazi Yalçın 2021
O. G. Yalçın, Applied Neural Networks with TensorFlow 2, https://doi.org/10.1007/978-1-4842-6513-0_4

4. Complementary Libraries to TensorFlow 2.x

Orhan Gazi Yalçın
Istanbul, Turkey
Now that we have covered the basics of machine learning and deep learning, we can slowly move on to the applied side of deep learning. As you know, every machine learning application, including deep learning applications, has a pipeline consisting of several steps. TensorFlow offers us modules for all of these steps. Even though TensorFlow is very powerful for model building, training, evaluation, and making predictions, we still need other complementary libraries for certain tasks, especially for data preparation. Although the libraries you may use in a deep learning pipeline vary to a great extent, the most popular complementary libraries are as follows:

• NumPy
• SciPy
• Pandas
• Matplotlib
• Scikit-learn
• Flask

Especially with TensorFlow 2.x, more and more data preparation, visualization, and other relevant capabilities have been added to TensorFlow. However, these capabilities cannot yet be compared to what the dedicated libraries have to offer. Table 4-1 lists these libraries with their core capabilities.
Table 4-1 The Libraries Complementary to TensorFlow and Their Main Use Cases

Library – Core Capability
NumPy – Array processing
SciPy – Scientific computing
Pandas – Array processing and data analysis, including data visualization
Matplotlib – Data visualization
Scikit-learn – Machine learning
Flask – Web framework for deployment

Let’s take a look at how to install them all with pip, the package installer for Python.

Installation with Pip

Pip is the de facto standard package-management system for Python, and it is already included in the Python installation package. You can easily install and manage Python libraries with pip.

The usual environments for running pip are Terminal on macOS and Command Prompt on Windows. However, you can also use pip inside Jupyter Notebook and Google Colab with a small adjustment: the only difference between the two options is an exclamation mark (!).

Terminal and Command Prompt – pip install package-name
Jupyter Notebook and Google Colab – !pip install package-name

If you decide to follow this book with your local Jupyter Notebook installation, you have to make sure that pip is installed on your system.

Use of Pip in Google Colab

If you are using Google Colab as recommended, you don’t have to worry about whether pip is on your system. You can use pip inside your Google Colab notebook by prefixing the command with an exclamation mark.

Pip installation, or its confirmation, can be achieved in four steps:

1. Open Terminal for macOS or Command Prompt for Windows:

   a. On macOS, you can open a Terminal window from Launchpad, under the Others folder.

   b. On Windows, you can open a Command Prompt window by (i) pressing Windows+X to open the Power Users menu and then (ii) clicking “Command Prompt” or “Command Prompt (Admin).”

2. Check whether pip is installed and view the version currently installed on your system with the following command:

pip --version

3. If Terminal/Command Prompt does not return version info, install pip with the following command:

python -m pip install -U pip

If it returns version info, you have confirmed that pip is installed on your system.

4. Close the Terminal/Command Prompt window.
Installation of the Libraries

Now that we have confirmed that you have pip on your system, you can install all the libraries mentioned in this chapter with the scripts listed in Table 4-2.
Table 4-2 The Complementary Libraries with Pip Installation Scripts

Library – Installation Script
NumPy – pip install numpy
SciPy – pip install scipy
Pandas – pip install pandas
Matplotlib – pip install matplotlib
Scikit-learn – pip install scikit-learn
Flask – pip install flask
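If you prefer, pip also accepts several package names in a single command, so all of these libraries can be installed at once. This one-liner is simply a convenience, not a requirement:

pip install numpy scipy pandas matplotlib scikit-learn flask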

Beware of Already Installed Packages

Both Google Colab and local Jupyter Notebook environments already come with most of these libraries preinstalled. Just run the scripts mentioned earlier once to make sure you have them installed, so that missing packages don’t interrupt the case studies later on.

Now that we are sure you have these libraries installed on your system (either in Google Colab or in a local Jupyter Notebook), we can dive into the details of these libraries.

NumPy – Array Processing

NumPy (Numerical Python) is a very popular open-source numerical Python library, created by Travis Oliphant. NumPy provides multidimensional arrays along with a significant number of useful functions for mathematical operations.

NumPy acts as a wrapper around core routines implemented in C. Therefore, it offers the best of both worlds: (i) the efficiency of C and (ii) the ease of use of Python. NumPy arrays are easy-to-create and efficient objects for (i) storing data and (ii) fast matrix operations. With NumPy, you can quickly generate arrays filled with random numbers, which is perfect for an enhanced learning experience and for proof-of-concept tasks. Also, the Pandas library, which we will cover later on, heavily relies on NumPy objects and almost works as a NumPy extension.

Thanks to NumPy arrays, we can process data in large volumes and perform advanced mathematical operations with ease. Compared to built-in Python sequences, NumPy’s ndarray object executes much faster and more efficiently with less code. A growing number of libraries rely on NumPy arrays for processing data, which shows the power of NumPy. Since deep learning models are usually trained with millions of data points, the size and speed advantages of NumPy arrays are essential for machine learning experts.
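As a minimal sketch of the kind of array operations described above (the array names and values are arbitrary examples, not from the book):

import numpy as np

# Create an array from a Python list and a 3x3 array of random numbers
a = np.array([1.0, 2.0, 3.0])
b = np.random.rand(3, 3)   # uniformly distributed values in [0, 1)

# Vectorized operations run in optimized C code, with no explicit Python loops
c = b @ a                  # matrix-vector product, shape (3,)
d = np.exp(a) + a.mean()   # element-wise math combined with a reduction

print(c.shape, d)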

Useful Information About NumPy

SciPy – Scientific Computing

SciPy is an open-source Python library that contains a collection of functions for mathematical, scientific, and engineering studies. SciPy functions are built on the NumPy library. SciPy allows users to manipulate and visualize their data with an easy-to-use syntax. SciPy boosts developers’ data processing and system-prototyping capabilities and makes Python as effective as rival systems such as MATLAB, IDL, Octave, R-Lab, and SciLab. Therefore, SciPy’s collection of data processing and prototyping functions further strengthens Python’s already established position as a general-purpose programming language.

SciPy’s vast collection of functions is organized into domain-based sub-packages. SciPy sub-packages must be imported separately from the parent scipy package, for example:

from scipy import stats, special

Table 4-3 lists the SciPy sub-packages; a short usage sketch follows the table.
Table 4-3 SciPy Sub-packages

Sub-package – Description
stats – Statistical functions and distributions
linalg – Linear algebra
special – Special functions
io – Input and output
spatial – Spatial data structures and algorithms
interpolate – Interpolation and smoothing splines
sparse – Sparse matrices and associated routines
integrate – Integration and equation solving
signal – Signal processing
fftpack – Fast Fourier transform routines
optimize – Optimization and root-finding routines
constants – Physical and mathematical constants
odr – Orthogonal distance regression
cluster – Clustering algorithms
ndimage – N-dimensional image processing
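For instance, here is a minimal sketch of two of these sub-packages in use (the numbers are chosen only for illustration):

from scipy import optimize, stats

# stats: cumulative probability of a standard normal distribution at 1.96
p = stats.norm.cdf(1.96)

# optimize: numerically find the minimum of a simple quadratic function
result = optimize.minimize(lambda x: (x[0] - 3.0) ** 2, x0=0.0)

print(round(p, 3))   # roughly 0.975
print(result.x)      # roughly [3.]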

  
Useful Information About SciPy

Pandas – Array Processing and Data Analysis

Pandas is a Python library that offers flexible and expressive data structures suitable for performing fast mathematical operations. Pandas is a comprehensive and easy-to-use data analysis library, and it aims to become the leading open-source, language-neutral data analysis tool.

One-dimensional Series and two-dimensional DataFrames are the two main data structures in Pandas. Since Pandas is built on top of NumPy and extends its capabilities, it almost operates as a NumPy extension. Pandas also offers several data visualization methods, which are very useful for deriving insights from datasets.

You can analyze your data and perform several calculation tasks with Pandas. Here is a non-exhaustive list of the things you can do with Pandas; a short sketch follows the list:
  • Handling missing data by filling and dropping

  • Data insertion and deletion thanks to allowed mutability

  • Automatic and explicit data alignment

  • Group-by and order-by functionality

  • Easily converting unorganized objects to DataFrames

  • Slice, index, and subset operations

  • Merge, concatenate, and join operations

  • Reshape and pivot operations

  • Hierarchical and multiple labeling

  • Specific operations for time-series and sequence data

  • Robust input and output operations with extensive file format support (including CSV, XLSX, HTML, HDF5)
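As a minimal sketch of a few of these capabilities (the column names and values below are arbitrary examples, not from the book):

import numpy as np
import pandas as pd

# A one-dimensional Series and a two-dimensional DataFrame
s = pd.Series([10, 20, 30], name="scores")
df = pd.DataFrame({
    "city":  ["Istanbul", "Ankara", "Istanbul", "Izmir"],
    "sales": [120.0, np.nan, 95.0, 80.0],
})

# Handle missing data by filling, then group and aggregate
df["sales"] = df["sales"].fillna(df["sales"].mean())
summary = df.groupby("city")["sales"].sum()

print(s.mean())
print(summary)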

Since Pandas is a de facto extension of NumPy that improves on its capabilities, we take advantage of Pandas more often than NumPy. But there are cases where we have to rely on NumPy directly because of the limitations of other complementary libraries.

Useful Information About Pandas

Matplotlib and Seaborn – Data Visualization

Matplotlib is a Python data visualization library for creating static, animated, and interactive graphs and plots. You can produce high-quality plots for academic publications, blogs, and books, and you can also derive insights from large datasets using Matplotlib.

In addition to deriving insights with your Google Colab Notebook, you can also use the object-oriented API of Matplotlib for embedding plots into applications. The three main functionalities of Matplotlib can be listed as follows:
  • Create: With Matplotlib, you can create high-quality plots with a minimal amount of code. Matplotlib offers hundreds of graph types – from histograms to heat maps, from bar charts to surface plots.

  • Customize: Matplotlib plots are flexible in the sense that you can customize line styles, font properties, colors, and axes information. You can also export your plots and embed data into them.

  • Extend: You can take advantage of numerous third-party libraries extending Matplotlib. Some of these libraries are also extremely useful, such as Seaborn.

The things you can do with Matplotlib may be listed as follows; a minimal example follows the list:
  • Use PyPlot module and create interactive plots.

  • Create hundreds of different graphs and plots using lines, bars, markers, and other objects.

  • Create unique plots such as surface and contour plots.

  • Add images and fields to your plots.

  • Create multiple subplots under a single figure.

  • Flexibly edit text, axes, colors, labels, and annotations in a plot.

  • Create one or more shapes with Matplotlib.

  • Create showcase figures.

  • Take advantage of the animation support.
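To give a feel for the PyPlot module mentioned above, here is a minimal sketch that creates two subplots under a single figure (the data is generated only for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# Two subplots under a single figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.plot(x, np.sin(x), label="sin(x)")      # line plot with a label
ax1.set_title("Line plot")
ax1.legend()

ax2.hist(np.random.randn(1000), bins=30)    # histogram of random values
ax2.set_title("Histogram")

plt.tight_layout()
plt.show()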

Useful Information About Matplotlib

Besides vanilla Matplotlib, third-party packages are widely used to increase Matplotlib’s capabilities. One of these is Seaborn, a data visualization library built on top of Matplotlib. It provides a high-level interface that extends the capabilities of Matplotlib, and it can reduce the time required to generate insightful graphs.
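A minimal sketch of how Seaborn builds on Matplotlib (the DataFrame below is invented purely for illustration):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# A small, invented dataset
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 70, 74, 80],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# One high-level call produces a styled scatter plot with an automatic legend
sns.scatterplot(data=df, x="hours", y="score", hue="group")
plt.show()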

Useful Information About Seaborn

Scikit-learn – Machine Learning

Scikit-learn is a powerful open-source machine learning library for Python, initially developed by David Cournapeau as a Google Summer of Code project. You can use scikit-learn as a stand-alone machine learning library and successfully build a wide range of traditional machine learning models. Besides being able to create machine learning models, scikit-learn, which is built on top of NumPy, SciPy, and Matplotlib, provides simple and efficient tools for predictive data analysis. There are six main functionalities of scikit-learn, which are listed as follows:
  • Classification: Scikit-learn offers several algorithms to identify which category an object belongs to, such as support vector machines, logistic regression, k-nearest neighbors, decision trees, and many more.

  • Regression: Several algorithms offered by scikit-learn can predict a continuous-valued response variable associated with an object such as linear regression, gradient boosting, random forest, decision trees, and many more.

  • Clustering: Scikit-learn also offers clustering algorithms, which are used for automated grouping of similar objects into clusters, such as k-means clustering, spectral clustering, mean shift, and many more.

  • Dimensionality Reduction: Scikit-learn provides several algorithms to reduce the number of explanatory variables to consider, such as PCA, feature selection, nonnegative matrix factorization, and many more.

  • Model Selection: Scikit-learn can help with model validation and comparison, and it can also help choose parameters and models. You can compare your TensorFlow models with scikit-learn’s traditional machine learning models. Grid search, cross-validation, and metrics are some of the tools used for model selection and validation.

  • Preprocessing: With preprocessing, feature extraction, and feature scaling options, you can transform your data where TensorFlow falls short.

Scikit-learn is especially useful when we want to compare our deep learning models with other machine learning algorithms. In addition, with scikit-learn, we can preprocess our data before feeding it into our deep learning pipeline.
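Here is a minimal sketch of the preprocessing and model comparison roles described above (the data is randomly generated and the model choice is arbitrary, purely for illustration):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Randomly generated toy data: 200 samples, 4 features, binary labels
X = np.random.rand(200, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Preprocessing: scale the features before feeding them to a model
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# A traditional model that a deep learning model could later be compared against
clf = LogisticRegression().fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))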

Useful Information About Scikit-learn

Flask – Deployment

As opposed to the libraries mentioned earlier, Flask is not a data science library; it is a micro web framework for Python. It is considered a microframework because it is not packaged with the components that other web frameworks deem essential, such as a database abstraction layer and form validation. These components can be added to a Flask application with powerful third-party extensions. This characteristic makes Flask simple and lightweight and reduces development time. Flask is a perfect option if you want to serve your trained deep learning models and don’t want to spend too much time on web programming.

Flask is easy to learn and to implement compared to Django, a very well-documented and popular web framework for Python. Because of its large size and many built-in extension packages, Django is a better choice for large projects. Currently, Flask has more stars on its GitHub repo than any other web framework for Python, and it was voted the most popular web framework in the Python Developers Survey 2018.
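As a minimal sketch of how a trained model might be served with Flask (the /predict route and the predict_fn placeholder are hypothetical examples, not from the book):

from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_fn(features):
    # Placeholder for a trained model's prediction logic (hypothetical)
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()              # e.g., {"features": [0.1, 0.2, 0.3]}
    prediction = predict_fn(data["features"])
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)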

Useful Information About Flask

Final Evaluations

In this chapter, we made an introduction to the most commonly used libraries complementary to TensorFlow. We predominantly use TensorFlow thanks to its growing number of modules addressing the needs of developers at every step of the pipeline. However, there are still some operations for which we have to rely on these libraries.

While NumPy and Pandas are very powerful data processing libraries, Matplotlib and Seaborn are useful for data visualization. While SciPy helps us with complex mathematical operations, scikit-learn is particularly useful for advanced preprocessing operations and validation tasks. Finally, Flask is the web framework of our choice to serve our trained models quickly.

In the next chapter, we dive into TensorFlow modules with actual code examples.