Building Machine Learning Systems with Python Second Edition by Coelho, Luis Pedro

Index

Building Machine Learning Systems with Python Second Edition

Table of Contents
Building Machine Learning Systems with Python Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?
Free access for Packt account holders

Preface

What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support

Downloading the example code
Errata
Piracy
Questions

1. Getting Started with Python Machine Learning

Machine learning and Python – a dream team
What the book will teach you (and what it will not)
What to do when you are stuck
Getting started

Introduction to NumPy, SciPy, and matplotlib
Installing Python
Chewing data efficiently with NumPy and intelligently with SciPy
Learning NumPy

Indexing
Handling nonexisting values
Comparing the runtime

Learning SciPy

Our first (tiny) application of machine learning

Reading in the data
Preprocessing and cleaning the data
Choosing the right model and learning algorithm

Before building our first model…
Starting with a simple straight line
Towards some advanced stuff
Stepping back to go forward – another look at our data
Training and testing
Answering our initial question

Summary

2. Classifying with Real-world Examples

The Iris dataset

Visualization is a good first step
Building our first classification model

Evaluation – holding out data and cross-validation

Building more complex classifiers
A more complex dataset and a more complex classifier

Learning about the Seeds dataset
Features and feature engineering
Nearest neighbor classification

Classifying with scikit-learn

Looking at the decision boundaries

Binary and multiclass classification
Summary

3. Clustering – Finding Related Posts

Measuring the relatedness of posts

How not to do it
How to do it

Preprocessing – similarity measured as a similar number of common words

Converting raw text into a bag of words

Counting words
Normalizing word count vectors
Removing less important words
Stemming

Installing and using NLTK
Extending the vectorizer with NLTK's stemmer

Stop words on steroids

Our achievements and goals

Clustering

K-means
Getting test data to evaluate our ideas on
Clustering posts

Solving our initial challenge

Another look at noise

Tweaking the parameters
Summary

4. Topic Modeling

Latent Dirichlet allocation

Building a topic model

Comparing documents by topics

Modeling the whole of Wikipedia

Choosing the number of topics
Summary

5. Classification – Detecting Poor Answers

Sketching our roadmap
Learning to classify classy answers

Tuning the instance
Tuning the classifier

Fetching the data

Slimming the data down to chewable chunks
Preselection and processing of attributes
Defining what is a good answer

Creating our first classifier

Starting with kNN
Engineering the features
Training the classifier
Measuring the classifier's performance
Designing more features

Deciding how to improve

Bias-variance and their tradeoff
Fixing high bias
Fixing high variance
High bias or low bias

Using logistic regression

A bit of math with a small example
Applying logistic regression to our post classification problem

Looking behind accuracy – precision and recall
Slimming the classifier
Ship it!
Summary

6. Classification II – Sentiment Analysis

Sketching our roadmap
Fetching the Twitter data
Introducing the Naïve Bayes classifier

Getting to know the Bayes' theorem
Being naïve
Using Naïve Bayes to classify
Accounting for unseen words and other oddities
Accounting for arithmetic underflows

Creating our first classifier and tuning it

Solving an easy problem first
Using all classes
Tuning the classifier's parameters

Cleaning tweets
Taking the word types into account

Determining the word types
Successfully cheating using SentiWordNet
Our first estimator
Putting everything together

Summary

7. Regression

Predicting house prices with regression

Multidimensional regression
Cross-validation for regression

Penalized or regularized regression

L1 and L2 penalties
Using Lasso or ElasticNet in scikit-learn
Visualizing the Lasso path
P-greater-than-N scenarios
An example based on text documents
Setting hyperparameters in a principled way

Summary

8. Recommendations

Rating predictions and recommendations

Splitting into training and testing
Normalizing the training data
A neighborhood approach to recommendations
A regression approach to recommendations
Combining multiple methods

Basket analysis

Obtaining useful predictions
Analyzing supermarket shopping baskets
Association rule mining
More advanced basket analysis

Summary

9. Classification – Music Genre Classification

Sketching our roadmap
Fetching the music data

Converting into a WAV format

Looking at music

Decomposing music into sine wave components

Using FFT to build our first classifier

Increasing experimentation agility
Training the classifier
Using a confusion matrix to measure accuracy in multiclass problems
An alternative way to measure classifier performance using receiver-operator characteristics

Improving classification performance with Mel Frequency Cepstral Coefficients
Summary

10. Computer Vision

Introducing image processing

Loading and displaying images
Thresholding
Gaussian blurring
Putting the center in focus
Basic image classification
Computing features from images
Writing your own features
Using features to find similar images
Classifying a harder dataset

Local feature representations
Summary

11. Dimensionality Reduction

Sketching our roadmap
Selecting features

Detecting redundant features using filters

Correlation
Mutual information

Asking the model about the features using wrappers
Other feature selection methods

Feature extraction

About principal component analysis

Sketching PCA
Applying PCA

Limitations of PCA and how LDA can help

Multidimensional scaling
Summary

12. Bigger Data

Learning about big data

Using jug to break up your pipeline into tasks
An introduction to tasks in jug
Looking under the hood
Using jug for data analysis
Reusing partial results

Using Amazon Web Services

Creating your first virtual machines

Installing Python packages on Amazon Linux
Running jug on our cloud machine

Automating the generation of clusters with StarCluster

Summary

A. Where to Learn More Machine Learning

Online courses
Books

Question and answer sites
Blogs
Data sources
Getting competitive

All that was left out
Summary

Index
