Building Machine Learning Systems with Python, by Willi Richert


Building Machine Learning Systems with Python

Table of Contents
Building Machine Learning Systems with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?
Free Access for Packt account holders

Preface

What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support

Downloading the example code
Errata
Piracy
Questions

1. Getting Started with Python Machine Learning

Machine learning and Python – the dream team
What the book will teach you (and what it will not)
What to do when you are stuck
Getting started

Introduction to NumPy, SciPy, and Matplotlib
Installing Python
Chewing data efficiently with NumPy and intelligently with SciPy
Learning NumPy

Indexing
Handling non-existing values
Comparing runtime behaviors

Learning SciPy

Our first (tiny) machine learning application

Reading in the data
Preprocessing and cleaning the data
Choosing the right model and learning algorithm

Before building our first model
Starting with a simple straight line
Towards some advanced stuff
Stepping back to go forward – another look at our data
Training and testing
Answering our initial question

Summary

2. Learning How to Classify with Real-world Examples

The Iris dataset

The first step is visualization
Building our first classification model

Evaluation – holding out data and cross-validation

Building more complex classifiers
A more complex dataset and a more complex classifier

Learning about the Seeds dataset
Features and feature engineering
Nearest neighbor classification

Binary and multiclass classification
Summary

3. Clustering – Finding Related Posts

Measuring the relatedness of posts

How not to do it
How to do it

Preprocessing – similarity measured as similar number of common words

Converting raw text into a bag-of-words
Counting words
Normalizing the word count vectors
Removing less important words
Stemming

Installing and using NLTK
Extending the vectorizer with NLTK's stemmer

Stop words on steroids
Our achievements and goals

Clustering

KMeans
Getting test data to evaluate our ideas on
Clustering posts

Solving our initial challenge

Another look at noise

Tweaking the parameters
Summary

4. Topic Modeling

Latent Dirichlet allocation (LDA)

Building a topic model

Comparing similarity in topic space

Modeling the whole of Wikipedia

Choosing the number of topics
Summary

5. Classification – Detecting Poor Answers

Sketching our roadmap
Learning to classify classy answers

Tuning the instance
Tuning the classifier

Fetching the data

Slimming the data down to chewable chunks
Preselection and processing of attributes
Defining what is a good answer

Creating our first classifier

Starting with the k-nearest neighbor (kNN) algorithm
Engineering the features
Training the classifier
Measuring the classifier's performance
Designing more features

Deciding how to improve

Bias-variance and its trade-off
Fixing high bias
Fixing high variance
High bias or low bias

Using logistic regression

A bit of math with a small example
Applying logistic regression to our post classification problem

Looking behind accuracy – precision and recall
Slimming the classifier
Ship it!
Summary

6. Classification II – Sentiment Analysis

Sketching our roadmap
Fetching the Twitter data
Introducing the Naive Bayes classifier

Getting to know the Bayes theorem
Being naive
Using Naive Bayes to classify
Accounting for unseen words and other oddities
Accounting for arithmetic underflows

Creating our first classifier and tuning it

Solving an easy problem first
Using all the classes
Tuning the classifier's parameters

Cleaning tweets
Taking the word types into account

Determining the word types
Successfully cheating using SentiWordNet
Our first estimator
Putting everything together

Summary

7. Regression – Recommendations

Predicting house prices with regression

Multidimensional regression
Cross-validation for regression

Penalized regression

L1 and L2 penalties
Using Lasso or Elastic nets in scikit-learn

P greater than N scenarios

An example based on text
Setting hyperparameters in a smart way
Rating prediction and recommendations

Summary

8. Regression – Recommendations Improved

Improved recommendations

Using the binary matrix of recommendations
Looking at the movie neighbors
Combining multiple methods

Basket analysis

Obtaining useful predictions
Analyzing supermarket shopping baskets
Association rule mining
More advanced basket analysis

Summary

9. Classification III – Music Genre Classification

Sketching our roadmap
Fetching the music data

Converting into a wave format

Looking at music

Decomposing music into sine wave components

Using FFT to build our first classifier

Increasing experimentation agility
Training the classifier
Using the confusion matrix to measure accuracy in multiclass problems
An alternate way to measure classifier performance using receiver operator characteristic (ROC)

Improving classification performance with Mel Frequency Cepstral Coefficients
Summary

10. Computer Vision – Pattern Recognition

Introducing image processing
Loading and displaying images

Basic image processing

Thresholding
Gaussian blurring
Filtering for different effects

Adding salt and pepper noise

Putting the center in focus

Pattern recognition
Computing features from images
Writing your own features

Classifying a harder dataset
Local feature representations
Summary

11. Dimensionality Reduction

Sketching our roadmap
Selecting features

Detecting redundant features using filters

Correlation
Mutual information

Asking the model about the features using wrappers

Other feature selection methods
Feature extraction

About principal component analysis (PCA)

Sketching PCA
Applying PCA

Limitations of PCA and how LDA can help

Multidimensional scaling (MDS)
Summary

12. Big(ger) Data

Learning about big data
Using jug to break up your pipeline into tasks

About tasks
Reusing partial results
Looking under the hood
Using jug for data analysis

Using Amazon Web Services (AWS)

Creating your first machines

Installing Python packages on Amazon Linux
Running jug on our cloud machine

Automating the generation of clusters with starcluster

Summary

A. Where to Learn More about Machine Learning

Online courses
Books

Q&A sites
Blogs
Data sources
Getting competitive

What was left out
Summary

Index
