Table of Contents

Beginning Data Science with Python and Jupyter

Why Subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

What This Book Covers

What You Need for This Book

Installation and Setup

Installing Anaconda

Updating Jupyter and Installing Dependencies

Who This Book is for

Conventions

Reader Feedback

Customer Support

Downloading the Example Code

Errata

Piracy

Questions

1. Jupyter Fundamentals

Lesson Objectives

Basic Functionality and Features

Subtopic A: What is a Jupyter Notebook and Why is it Useful?

Subtopic B: Navigating the Platform

Introducing Jupyter Notebooks

Subtopic C: Jupyter Features

Explore some of Jupyter's most useful features

Converting a Jupyter Notebook to a Python Script

Subtopic D: Python Libraries

Import the external libraries and set up the plotting environment

Our First Analysis - The Boston Housing Dataset

Subtopic A: Loading the Data into Jupyter Using a Pandas DataFrame

Load the Boston housing dataset

Subtopic B: Data Exploration

Explore the Boston housing dataset

Subtopic C: Introduction to Predictive Analytics with Jupyter Notebooks

Linear models with Seaborn and scikit-learn

Activity B: Building a Third-Order Polynomial Model

Subtopic D: Using Categorical Features for Segmentation Analysis

Create categorical fields from continuous variables and make segmented visualizations

Summary

2. Data Cleaning and Advanced Machine Learning

Preparing to Train a Predictive Model

Subtopic A: Determining a Plan for Predictive Analytics

Subtopic B: Preprocessing Data for Machine Learning

Explore data preprocessing tools and methods

Activity A: Preparing to Train a Predictive Model for the Employee-Retention Problem

Training Classification Models

Subtopic A: Introduction to Classification Algorithms

Training two-feature classification models with scikit-learn

The plot_decision_regions Function

Training k-nearest neighbors for our model

Training a Random Forest

Subtopic B: Assessing Models with k-Fold Cross-Validation and Validation Curves

Using k-fold cross-validation and validation curves in Python with scikit-learn

Subtopic C: Dimensionality Reduction Techniques

Training a predictive model for the employee retention problem

Summary

3. Web Scraping and Interactive Visualizations

Lesson Objectives

Scraping Web Page Data

Subtopic A: Introduction to HTTP Requests

Subtopic B: Making HTTP Requests in the Jupyter Notebook

Handling HTTP requests with Python in a Jupyter Notebook

Subtopic C: Parsing HTML in the Jupyter Notebook

Parsing HTML with Python in a Jupyter Notebook

Activity A: Web Scraping with Jupyter Notebooks

Interactive Visualizations

Subtopic A: Building a DataFrame to Store and Organize Data

Building and merging Pandas DataFrames

Subtopic B: Introduction to Bokeh

Introduction to interactive visualizations with Bokeh

Activity B: Exploring Data with Interactive Visualizations

Summary

Index