Python Machine Learning By Example, Third Edition by Yuxi (Hayden) Liu

Index

Preface

Who this book is for
What this book covers
To get the most out of this book
Get in touch

Getting Started with Machine Learning and Python

An introduction to machine learning

Understanding why we need machine learning
Differentiating between machine learning and automation
Machine learning applications

Knowing the prerequisites
Getting started with three types of machine learning

A brief history of the development of machine learning algorithms

Digging into the core of machine learning

Generalizing with data
Overfitting, underfitting, and the bias-variance trade-off

Overfitting
Underfitting
The bias-variance trade-off

Avoiding overfitting with cross-validation
Avoiding overfitting with regularization
Avoiding overfitting with feature selection and dimensionality reduction

Data preprocessing and feature engineering

Preprocessing and exploration
Dealing with missing values
Label encoding
One-hot encoding
Scaling
Feature engineering
Polynomial transformation
Power transforms
Binning

Combining models

Voting and averaging
Bagging
Boosting
Stacking

Installing software and setting up

Setting up Python and environments
Installing the main Python packages

NumPy
SciPy
Pandas
Scikit-learn
TensorFlow

Introducing TensorFlow 2

Summary
Exercises

Building a Movie Recommendation Engine with Naïve Bayes

Getting started with classification

Binary classification
Multiclass classification
Multi-label classification

Exploring Naïve Bayes

Learning Bayes' theorem by example
The mechanics of Naïve Bayes

Implementing Naïve Bayes

Implementing Naïve Bayes from scratch
Implementing Naïve Bayes with scikit-learn

Building a movie recommender with Naïve Bayes
Evaluating classification performance
Tuning models with cross-validation
Summary
Exercise
References

Recognizing Faces with Support Vector Machine

Finding the separating boundary with SVM

Scenario 1 – identifying a separating hyperplane
Scenario 2 – determining the optimal hyperplane
Scenario 3 – handling outliers
Implementing SVM
Scenario 4 – dealing with more than two classes
Scenario 5 – solving linearly non-separable problems with kernels
Choosing between linear and RBF kernels

Classifying face images with SVM

Exploring the face image dataset
Building an SVM-based image classifier
Boosting image classification performance with PCA

Fetal state classification on cardiotocography
Summary
Exercises

Predicting Online Ad Click-Through with Tree-Based Algorithms

A brief overview of ad click-through prediction
Getting started with two types of data – numerical and categorical
Exploring a decision tree from the root to the leaves

Constructing a decision tree
The metrics for measuring a split

Gini Impurity
Information Gain

Implementing a decision tree from scratch
Implementing a decision tree with scikit-learn
Predicting ad click-through with a decision tree
Ensembling decision trees – random forest
Ensembling decision trees – gradient boosted trees
Summary
Exercises

Predicting Online Ads Click-Through with Logistic Regression

Converting categorical features to numerical – one-hot encoding and ordinal encoding
Classifying data with logistic regression

Getting started with the logistic function
Jumping from the logistic function to logistic regression

Training a logistic regression model

Training a logistic regression model using gradient descent
Predicting ad click-through with logistic regression using gradient descent
Training a logistic regression model using stochastic gradient descent
Training a logistic regression model with regularization
Feature selection using L1 regularization

Training on large datasets with online learning
Handling multiclass classification
Implementing logistic regression using TensorFlow
Feature selection using random forest
Summary
Exercises

Scaling Up Prediction to Terabyte Click Logs

Learning the essentials of Apache Spark

Breaking down Spark
Installing Spark
Launching and deploying Spark programs

Programming in PySpark
Learning on massive click logs with Spark

Loading click logs
Splitting and caching the data
One-hot encoding categorical features
Training and testing a logistic regression model

Feature engineering on categorical variables with Spark

Hashing categorical features
Combining multiple variables – feature interaction

Summary
Exercises

Predicting Stock Prices with Regression Algorithms

A brief overview of the stock market and stock prices
What is regression?
Mining stock price data

Getting started with feature engineering
Acquiring data and generating features

Estimating with linear regression

How does linear regression work?
Implementing linear regression from scratch
Implementing linear regression with scikit-learn
Implementing linear regression with TensorFlow

Estimating with decision tree regression

Transitioning from classification trees to regression trees
Implementing decision tree regression
Implementing a regression forest

Estimating with support vector regression

Implementing SVR

Evaluating regression performance
Predicting stock prices with the three regression algorithms
Summary
Exercises

Predicting Stock Prices with Artificial Neural Networks

Demystifying neural networks

Starting with a single-layer neural network

Layers in neural networks

Activation functions
Backpropagation
Adding more layers to a neural network: DL

Building neural networks

Implementing neural networks from scratch
Implementing neural networks with scikit-learn
Implementing neural networks with TensorFlow

Picking the right activation functions
Preventing overfitting in neural networks

Dropout
Early stopping

Predicting stock prices with neural networks

Training a simple neural network
Fine-tuning the neural network

Summary
Exercise

Mining the 20 Newsgroups Dataset with Text Analysis Techniques

How computers understand language – NLP

What is NLP?
The history of NLP
NLP applications

Touring popular NLP libraries and picking up NLP basics

Installing famous NLP libraries
Corpora
Tokenization
PoS tagging
NER
Stemming and lemmatization
Semantics and topic modeling

Getting the newsgroups data
Exploring the newsgroups data
Thinking about features for text data

Counting the occurrence of each word token
Text preprocessing
Dropping stop words
Reducing inflectional and derivational forms of words

Visualizing the newsgroups data with t-SNE

What is dimensionality reduction?
t-SNE for dimensionality reduction

Summary
Exercises

Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling

Learning without guidance – unsupervised learning
Clustering newsgroups data using k-means

How does k-means clustering work?
Implementing k-means from scratch
Implementing k-means with scikit-learn
Choosing the value of k
Clustering newsgroups data using k-means

Discovering underlying topics in newsgroups

Topic modeling using NMF
Topic modeling using LDA

Summary
Exercises

Machine Learning Best Practices

Machine learning solution workflow
Best practices in the data preparation stage

Best practice 1 – Completely understanding the project goal
Best practice 2 – Collecting all fields that are relevant
Best practice 3 – Maintaining the consistency of field values
Best practice 4 – Dealing with missing data
Best practice 5 – Storing large-scale data

Best practices in the training sets generation stage

Best practice 6 – Identifying categorical features with numerical values
Best practice 7 – Deciding whether to encode categorical features
Best practice 8 – Deciding whether to select features, and if so, how to do so
Best practice 9 – Deciding whether to reduce dimensionality, and if so, how to do so
Best practice 10 – Deciding whether to rescale features
Best practice 11 – Performing feature engineering with domain expertise
Best practice 12 – Performing feature engineering without domain expertise
Binarization
Discretization
Interaction
Polynomial transformation
Best practice 13 – Documenting how each feature is generated
Best practice 14 – Extracting features from text data
Tf and tf-idf
Word embedding
Word embedding with pre-trained models

Best practices in the model training, evaluation, and selection stage

Best practice 15 – Choosing the right algorithm(s) to start with

Naïve Bayes
Logistic regression
SVM
Random forest (or decision tree)
Neural networks

Best practice 16 – Reducing overfitting
Best practice 17 – Diagnosing overfitting and underfitting
Best practice 18 – Modeling on large-scale datasets

Best practices in the deployment and monitoring stage

Best practice 19 – Saving, loading, and reusing models
Saving and restoring models using pickle
Saving and restoring models in TensorFlow
Best practice 20 – Monitoring model performance
Best practice 21 – Updating models regularly

Summary
Exercises

Categorizing Images of Clothing with Convolutional Neural Networks

Getting started with CNN building blocks

The convolutional layer
The nonlinear layer
The pooling layer

Architecting a CNN for classification
Exploring the clothing image dataset
Classifying clothing images with CNNs

Architecting the CNN model
Fitting the CNN model
Visualizing the convolutional filters

Boosting the CNN classifier with data augmentation

Horizontal flipping for data augmentation
Rotation for data augmentation
Shifting for data augmentation

Improving the clothing image classifier with data augmentation
Summary
Exercises

Making Predictions with Sequences Using Recurrent Neural Networks

Introducing sequential learning
Learning the RNN architecture by example

Recurrent mechanism
Many-to-one RNNs
One-to-many RNNs
Many-to-many (synced) RNNs
Many-to-many (unsynced) RNNs

Training an RNN model
Overcoming long-term dependencies with Long Short-Term Memory
Analyzing movie review sentiment with RNNs

Analyzing and preprocessing the data
Building a simple LSTM network
Stacking multiple LSTM layers

Writing your own War and Peace with RNNs

Acquiring and analyzing the training data
Constructing the training set for the RNN text generator
Building an RNN text generator
Training the RNN text generator

Advancing language understanding with the Transformer model

Exploring the Transformer's architecture
Understanding self-attention

Summary
Exercises

Making Decisions in Complex Environments with Reinforcement Learning

Setting up the working environment

Installing PyTorch
Installing OpenAI Gym

Introducing reinforcement learning with examples

Elements of reinforcement learning
Cumulative rewards
Approaches to reinforcement learning

Solving the FrozenLake environment with dynamic programming

Simulating the FrozenLake environment
Solving FrozenLake with the value iteration algorithm
Solving FrozenLake with the policy iteration algorithm

Performing Monte Carlo learning

Simulating the Blackjack environment
Performing Monte Carlo policy evaluation
Performing on-policy Monte Carlo control

Solving the Taxi problem with the Q-learning algorithm

Simulating the Taxi environment
Developing the Q-learning algorithm

Summary
Exercises

Other Books You May Enjoy
Index
