Preface

Data science is becoming increasingly popular as industries continue to value its importance. Recent advancements in open source sofware have made this discipline accessible to a wide range of people. In this book, we show how Jupyter Notebooks can be used with Python for various data science applications. Aside from being an ideal "virtual playground" for data exploration, Jupyter Notebooks are equally suitable for creating reproducible data processing pipelines, visualizations, and prediction models. By using Python with Jupyter Notebooks, many challenges presented by data science become simple to conceptualize and implement. This is achieved by leveraging Python libraries, which offer abstractions to the more complicated underlying algorithms. The result is that data science becomes very approachable for beginners. Furthermore, the Python ecosystem is very strong and is growing with each passing year. As such, students who wish to continue learning about the topics covered in this book will fnd excellent resources to do so.

By the end of this book, you will be equipped to analyse data using Python and use Jupyter notebooks effectively.

What This Book Covers

Lesson 1, Jupyter Fundamentals, covers the fundamentals of data analysis in Jupyter. We will start with usage instructions and features of Jupyter such as magic functions and tab completion. We will then transition to data science specific material. We will run an exploratory analysis in a live Jupyter Notebook. We will use visual assists such as scatter plots, histograms, and violin plots to deepen our understanding of the data. We will also perform simple predictive modeling.

Lesson 2, Data Cleaning and Advanced Machine Learning, shows how predictive models can be trained in Jupyter Notebooks. We will talk about how to plan a machine learning strategy. This lesson also explains the machine learning terminology such as supervised learning, unsupervised learning, classification, and regression. We will discuss methods for preprocessing data using scikit-learn and pandas.

Lesson 3, Web Scraping and Interactive Visualizations, explains how to scrap web page tables and then use interactive visualizations to study the data. We will start by looking at how HTTP requests work, focusing on GET requests and their response status codes. Then, we will go into the Jupyter Notebook and make HTTP requests with Python using the Requests library. We will see how Jupyter can be used to render HTML in the notebook, along with actual web pages that can be interacted with. After making requests, we will see how Beautiful Soup can be used to parse text from the HTML, and used this library to scrape tabular data.