• NumPy
• SciPy
• Pandas
• Matplotlib
• Scikit-learn
• Flask
The Libraries Complementary to TensorFlow and Their Main Use Cases
Library | Core Capability |
---|---|
NumPy | Array processing |
SciPy | Scientific computing |
Pandas | Array processing and data analysis including data visualization |
Matplotlib | Data visualization |
Scikit-learn | Machine learning |
Flask | Web framework for deployment |
Let’s take a look at how to install them all together using pip, Python’s package installer.
Installation with Pip
Pip is the de facto standard package-management system for Python, and it is already included in the Python installation package. You can easily install and manage Python libraries with pip.
Terminal and Command Prompt | Jupyter Notebook and Google Colab |
---|---|
pip install package-name | !pip install package-name |
If you decide to follow this book with your local Jupyter Notebook installation, we have to make sure that you have pip installed on your system.
If you are using Google Colab as recommended, you don’t have to worry about whether you have pip on your system. You can use pip inside your Google Colab Notebook with an exclamation mark.
1. Open Terminal (macOS) or Command Prompt (Windows):
   a. On macOS, you can open a Terminal window from Launchpad, in the Other folder.
   b. On Windows, you can open a Command Prompt window by (i) pressing Windows+X to open the Power Users menu and then (ii) clicking “Command Prompt” or “Command Prompt (Admin).”
2. Check whether pip is installed and view the version currently on your system with the following script:
   pip --version
3. If Terminal/Command Prompt does not return version info, install pip with the following command:
   python -m pip install -U pip
   This command upgrades pip through Python’s own module runner, so it also works when pip is installed but not on your PATH. If pip is missing entirely, python -m ensurepip --upgrade bootstraps it first.
4. Close the Terminal/Command Prompt window.
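The version check in the steps above can also be run from inside Python, which is handy in a notebook. The following is a minimal sketch: the helper name get_pip_version is our own, and it assumes that the interpreter running the code is the one whose pip you want to inspect.

```python
import subprocess
import sys


def get_pip_version():
    """Return pip's version banner, or None if pip is unavailable."""
    try:
        completed = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # pip is missing or could not be started for this interpreter.
        return None
    return completed.stdout.strip()


pip_version = get_pip_version()
print(pip_version or "pip is not available for this interpreter")
```

Calling pip through sys.executable rather than a bare pip command ensures that the check targets the same Python installation that runs your notebook.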
The Complementary Libraries with Pip Installation Scripts
Library | Installation Script |
---|---|
NumPy | pip install numpy |
SciPy | pip install scipy |
Pandas | pip install pandas |
Matplotlib | pip install matplotlib |
Scikit-learn | pip install scikit-learn |
Flask | pip install flask |
Both Google Colab Notebooks and Jupyter Notebooks already come with most of these libraries preinstalled. Just run the scripts mentioned earlier once to make sure you have them installed so that you won’t be bothered during case studies in case some of them are missing.
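If you would rather check which libraries are present before installing anything, a short script such as the one below may help. It is only a sketch: the names REQUIRED and check_installed are hypothetical, and the script only tests whether each package can be found, not whether its version is recent enough.

```python
import importlib.util

# Import names of the libraries from the table above.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "sklearn", "flask"]


def check_installed(module_names):
    """Map each module name to True if it can be found, else False."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


status = check_installed(REQUIRED)
for name, found in status.items():
    print(f"{name:12s} {'installed' if found else 'MISSING'}")
```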
Now that we are sure that you have these libraries installed in your system (either Google Colab or Jupyter Notebook), we can dive into details of these libraries.
NumPy – Array Processing
NumPy (Numerical Python) is a very popular open-source numerical Python library, created by Travis Oliphant. NumPy provides multidimensional arrays along with a significant number of useful functions for mathematical operations.
NumPy acts as a wrapper around the corresponding library implemented in C. Therefore, it offers the best of two worlds: (i) efficiency of C and (ii) ease of use of Python. NumPy arrays are easy-to-create and efficient objects for (i) storing data and (ii) fast matrix operations. With NumPy, you can quickly generate arrays with random numbers, which is perfect for an enhanced learning experience and proof of concept tasks. Also, the Pandas library, which we will cover later on, heavily relies on NumPy objects and almost works as a NumPy extension.
Thanks to NumPy arrays, we can process data in large volumes and do advanced mathematical operations with ease. Compared to built-in Python sequences, NumPy’s ndarray object executes much faster and more efficiently, and it requires less code. A growing number of libraries rely on NumPy arrays for processing data, which shows the power of NumPy. Since deep learning models are usually trained with millions of data points, the compactness and speed of NumPy arrays are essential for machine learning practitioners.
Website: www.numpy.org/
Documentation URL: https://numpy.org/doc/
Installation Command: pip install numpy
Preferred Alias for Importing: import numpy as np
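To make the points above concrete, here is a small sketch of array creation, random number generation, and vectorized math. The variable names and the seed value are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator so the random numbers are reproducible.
rng = np.random.default_rng(seed=0)

# A 3x4 array of random floats drawn uniformly from [0, 1).
data = rng.random((3, 4))

# Vectorized arithmetic: applied to every element, no explicit loop.
scaled = data * 10.0

# Aggregation along an axis: one mean per column.
col_means = scaled.mean(axis=0)

# Matrix multiplication with the @ operator: (3x4) @ (4x3) -> (3x3).
gram = scaled @ scaled.T

print(scaled.shape, col_means.shape, gram.shape)
```

Each operation works on the whole array at once, which is where the speed advantage over looping through Python lists comes from.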
SciPy – Scientific Computing
SciPy is an open-source Python library that contains a collection of functions used for mathematical, scientific, and engineering studies. SciPy functions are built on the NumPy library. SciPy allows users to manipulate and visualize their data with an easy-to-use syntax. SciPy is a library that boosts developers’ data processing and system-prototyping capabilities and makes Python as effective as the rival systems such as MATLAB, IDL, Octave, R-Lab, and SciLab. Therefore, SciPy’s collection of data processing and prototyping functions strengthens Python’s already established superiority as a general-purpose programming language even further.
SciPy Sub-packages
Sub-package | Description | Sub-package | Description |
---|---|---|---|
stats | Statistical functions and distributions | linalg | Linear algebra |
special | Special functions | io | Input and output |
spatial | Spatial data structures and algorithms | interpolate | Interpolation and smoothing splines |
sparse | Sparse matrices and associated routines | integrate | Integration and equation solving |
signal | Signal processing | fftpack | Fast Fourier transform routines |
optimize | Optimization and root-finding routines | constants | Physical and mathematical constants |
odr | Orthogonal distance regression | cluster | Clustering algorithms |
ndimage | N-dimensional image processing | | |
Website: https://www.scipy.org/scipylib/
Documentation URL: https://docs.scipy.org/doc/
Installation Command: pip install scipy
Preferred Alias for Importing: from scipy import sub-package-name
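As a brief illustration of the sub-package import style, the sketch below uses three of the sub-packages from the table: integrate, optimize, and linalg. The functions and the tiny problems they solve are chosen only as examples.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# integrate: numerically integrate sin(x) from 0 to pi (exact value is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# optimize: find the minimum of (x - 3)^2, which lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# linalg: solve the linear system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(area, result.x, x)
```

Note that the inputs and outputs are plain NumPy arrays and Python floats, which reflects how SciPy builds on NumPy.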
Pandas – Array Processing and Data Analysis
Pandas is a Python library that offers flexible and expressive data structures suitable for performing fast mathematical operations. Pandas is a comprehensive and easy-to-use data analysis library, and it aims to become the leading open-source language-neutral data analysis tool.
One-dimensional Series and two-dimensional DataFrames are the two main data structures in Pandas. Since it is built on top of NumPy and extends its capabilities, Pandas almost operates as a NumPy extension. Pandas also offers several data visualization methods, which are very useful for deriving insights from datasets.
The main features of Pandas include:
• Handling missing data by filling and dropping
• Data insertion and deletion thanks to allowed mutability
• Automatic and explicit data alignment
• Group-by and order-by functionality
• Easily converting unorganized objects to DataFrames
• Slice, index, and subset operations
• Merge, concatenate, and join operations
• Reshape and pivot operations
• Hierarchical and multiple labeling
• Specific operations for time-series and sequence data
• Robust input and output operations with extensive file format support (including CSV, XLSX, HTML, HDF5)
Because Pandas is a de facto extension of NumPy that adds to its capabilities, we use Pandas more often than NumPy. But there are cases where we have to rely on NumPy due to limitations of other complementary libraries.
Website: https://pandas.pydata.org/
Documentation URL: https://pandas.pydata.org/docs/
Installation Command: pip install pandas
Preferred Alias for Importing: import pandas as pd
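A few of the features listed above (missing-data handling, group-by, and merge) can be sketched as follows. The data is made up for illustration, and the column names are our own.

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value (NaN).
sales = pd.DataFrame(
    {
        "city": ["Ankara", "Izmir", "Ankara", "Izmir"],
        "units": [10.0, np.nan, 30.0, 20.0],
    }
)

# Handle missing data: fill the NaN with 0 (dropna() would drop the row).
filled = sales.fillna({"units": 0.0})

# Group-by: total units per city.
totals = filled.groupby("city")["units"].sum()

# Merge: attach a region to every row via the shared "city" column.
regions = pd.DataFrame(
    {"city": ["Ankara", "Izmir"], "region": ["Central", "Aegean"]}
)
merged = filled.merge(regions, on="city", how="left")

print(totals)
print(merged)
```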
Matplotlib and Seaborn – Data Visualization
Matplotlib is a Python data visualization library for creating static, animated, and interactive graphs and plots. You can produce high-quality plots for academic publications, blogs, and books, and you can also derive insights from large datasets using Matplotlib.
Create: With Matplotlib, you can create high-quality plots with a minimal amount of code. Matplotlib offers hundreds of graph types, from histograms to heatmaps and from bar charts to surface plots.
Customize: Matplotlib plots are flexible in the sense that you can customize line styles, font properties, colors, and axes information. You can also export your plots and embed data in them.
Extend: You can take advantage of numerous third-party libraries that extend Matplotlib. Some of these libraries, such as Seaborn, are extremely useful.
With Matplotlib, you can:
• Use the pyplot module to create interactive plots.
• Create hundreds of different graphs and plots using lines, bars, markers, and other objects.
• Create unique plots such as surface and contour plots.
• Add images and fields to your plots.
• Create multiple subplots under a single figure.
• Flexibly edit text, axes, colors, labels, and annotations in a plot.
• Create one or more shapes.
• Create showcase figures.
• Take advantage of the animation support.
Website: https://matplotlib.org/
Documentation URL: https://matplotlib.org/3.2.1/contents.html (make sure you enter the latest version)
Installation Command: pip install matplotlib
Preferred Alias for Importing: import matplotlib.pyplot as plt
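The sketch below shows two of the capabilities described above: several subplots under a single figure and basic customization of colors, line styles, and labels. The output file name is hypothetical, and the Agg backend line is only needed when running outside a notebook without a display.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend; omit this in a notebook.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

# Two subplots side by side under a single figure.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left: a customized line plot with labels and a legend.
ax_line.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.set_title("Line plot")
ax_line.legend()

# Right: a histogram of normally distributed random numbers.
samples = np.random.default_rng(seed=0).normal(size=1000)
ax_hist.hist(samples, bins=30, color="tab:orange")
ax_hist.set_title("Histogram")

fig.tight_layout()
fig.savefig("example_plot.png")  # Hypothetical output file name.
```

In a Jupyter or Colab notebook, the figure is displayed inline, so the savefig call is optional.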
Besides vanilla Matplotlib, third-party packages are widely used to extend its capabilities. One of the most useful is Seaborn, a data visualization library built on top of Matplotlib. It provides a high-level interface to Matplotlib, which reduces the time required to generate insightful graphs.
Website: https://seaborn.pydata.org/
Gallery: https://seaborn.pydata.org/examples/
Installation Command: pip install seaborn
Preferred Alias for Importing: import seaborn as sns
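To give a sense of the high-level interface, here is a minimal sketch that draws a grouped scatter plot in a single call. The dataset is randomly generated for illustration, and the column names and output file name are our own.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend; omit this in a notebook.

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: a noisy linear relationship, split into two groups.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 10.0, size=100)
df = pd.DataFrame(
    {
        "x": x,
        "y": 2.0 * x + rng.normal(scale=2.0, size=100),
        "group": np.where(x > 5.0, "high", "low"),
    }
)

# One call draws the points, colors them by group, and adds a legend.
ax = sns.scatterplot(data=df, x="x", y="y", hue="group")
ax.set_title("Scatter plot colored by group")
ax.figure.savefig("example_seaborn.png")  # Hypothetical output file name.
```

The returned object is a regular Matplotlib Axes, so everything you know about customizing Matplotlib plots still applies.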
Scikit-learn – Machine Learning
Classification: Scikit-learn offers several algorithms to identify which category an object belongs to, such as support vector machines, logistic regression, k-nearest neighbors, decision trees, and many more.
Regression: Several algorithms offered by scikit-learn can predict a continuous-valued response variable associated with an object such as linear regression, gradient boosting, random forest, decision trees, and many more.
Clustering: Scikit-learn also offers clustering algorithms, which are used for automated grouping of similar objects into clusters, such as k-means clustering, spectral clustering, mean shift, and many more.
Dimensionality Reduction: Scikit-learn provides several algorithms to reduce the number of explanatory variables to consider, such as PCA, feature selection, nonnegative matrix factorization, and many more.
Model Selection: Scikit-learn can help with model validation and comparison, and it can also help you choose parameters and models. You can compare your TensorFlow models with scikit-learn’s traditional machine learning models. Grid search, cross-validation, and metrics are some of the tools used for model selection and validation.
Preprocessing: With preprocessing, feature extraction, and feature scaling options, you can transform your data where TensorFlow falls short.
Scikit-learn is especially useful when we want to compare our deep learning models with other machine learning algorithms. In addition, with scikit-learn, we can preprocess our data before feeding it into our deep learning pipeline.
Website: https://scikit-learn.org/
User Guide: https://scikit-learn.org/stable/user_guide.html
Installation Command: pip install scikit-learn
Preferred Alias for Importing: from sklearn import module-name (the package is installed as scikit-learn but imported as sklearn)
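The preprocessing, classification, and model selection tools described above can be combined in a few lines. The following is a sketch on synthetic data, not a recommended modeling recipe; the choice of logistic regression and the split sizes are arbitrary. Note that the package is imported under the name sklearn.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (a stand-in for a real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing and classification chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression())

# Model selection: 5-fold cross-validated accuracy on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set, then score on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(cv_scores.mean(), test_accuracy)
```

A score obtained this way can serve as a baseline when you evaluate a TensorFlow model on the same split.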
Flask – Deployment
As opposed to the libraries mentioned earlier, Flask is not a data science library but a micro web framework for Python. It is considered a microframework because it does not ship with components that other web frameworks deem essential, such as a database abstraction layer and form validation. These components can be added to a Flask application with powerful third-party extensions. This characteristic makes Flask simple and lightweight and reduces development time. Flask is a perfect option if you want to serve your trained deep learning models and you don’t want to spend too much time on web programming.
Flask is easier to learn and to implement than Django. Django is a very well-documented and popular web framework for Python. But because of its large size and its many built-in extension packages, Django is a better choice for large projects. At the time of writing, Flask has more stars on its GitHub repo than any other web framework for Python, and it was voted the most popular web framework in the Python Developers Survey 2018.
Website: https://palletsprojects.com/p/flask/
Documentation URL: https://flask.palletsprojects.com/
Installation Command: pip install flask
Preferred Alias for Importing: from flask import Flask
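To show how little code a model-serving endpoint needs, here is a minimal sketch of a Flask application. The /predict route, the JSON layout, and the predict function are all hypothetical; in particular, predict only sums its inputs and stands in for a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Placeholder for a trained model: returns the sum of the inputs."""
    return float(sum(features))


@app.route("/predict", methods=["POST"])
def predict_route():
    # Expect a JSON body such as {"features": [1.0, 2.0, 3.0]}.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    is_number_list = isinstance(features, list) and all(
        isinstance(value, (int, float)) for value in features
    )
    if not is_number_list:
        return jsonify({"error": "expected a list of numbers under 'features'"}), 400
    return jsonify({"prediction": predict(features)})


# Exercise the endpoint in-process with Flask's built-in test client.
with app.test_client() as client:
    response = client.post("/predict", json={"features": [1.0, 2.0, 3.0]})
    print(response.get_json())
```

The test client lets you try the endpoint without starting a server. To serve real requests during development, you would call app.run() instead; a production deployment would normally sit behind a dedicated WSGI server.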
Final Evaluations
In this chapter, we introduced the most commonly used libraries complementary to TensorFlow. We predominantly use TensorFlow thanks to its growing number of modules addressing the needs of developers at every step of the pipeline. However, there are still some operations for which we have to rely on these libraries.
While NumPy and Pandas are very powerful data processing libraries, Matplotlib and Seaborn are useful for data visualization. While SciPy helps us with complex mathematical operations, scikit-learn is particularly useful for advanced preprocessing operations and validation tasks. Finally, Flask is the web framework of our choice to serve our trained models quickly.
In the next chapter, we dive into TensorFlow modules with actual code examples.