Imperial Library

Index
Python: Real World Machine Learning

Credits
Preface: What this learning path covers; What you need for this learning path; Who this learning path is for; Reader feedback; Customer support; Downloading the example code; Errata; Piracy; Questions

I. Module 1
  1. The Realm of Supervised Learning: Introduction; Preprocessing data using different techniques; Mean removal; Scaling; Normalization; Binarization; One Hot Encoding; Label encoding; Building a linear regressor; Computing regression accuracy; Achieving model persistence; Building a ridge regressor; Building a polynomial regressor; Estimating housing prices; Computing the relative importance of features; Estimating bicycle demand distribution
  2. Constructing a Classifier: Introduction; Building a simple classifier; Building a logistic regression classifier; Building a Naive Bayes classifier; Splitting the dataset for training and testing; Evaluating the accuracy using cross-validation; Visualizing the confusion matrix; Extracting the performance report; Evaluating cars based on their characteristics; Extracting validation curves; Extracting learning curves; Estimating the income bracket
  3. Predictive Modeling: Introduction; Building a linear classifier using Support Vector Machines (SVMs); Building a nonlinear classifier using SVMs; Tackling class imbalance; Extracting confidence measurements; Finding optimal hyperparameters; Building an event predictor; Estimating traffic
  4. Clustering with Unsupervised Learning: Introduction; Clustering data using the k-means algorithm; Compressing an image using vector quantization; Building a Mean Shift clustering model; Grouping data using agglomerative clustering; Evaluating the performance of clustering algorithms; Automatically estimating the number of clusters using the DBSCAN algorithm; Finding patterns in stock market data; Building a customer segmentation model
  5. Building Recommendation Engines: Introduction; Building function compositions for data processing; Building machine learning pipelines; Finding the nearest neighbors; Constructing a k-nearest neighbors classifier; Constructing a k-nearest neighbors regressor; Computing the Euclidean distance score; Computing the Pearson correlation score; Finding similar users in the dataset; Generating movie recommendations
  6. Analyzing Text Data: Introduction; Preprocessing data using tokenization; Stemming text data; Converting text to its base form using lemmatization; Dividing text using chunking; Building a bag-of-words model; Building a text classifier; Identifying the gender; Analyzing the sentiment of a sentence; Identifying patterns in text using topic modeling
  7. Speech Recognition: Introduction; Reading and plotting audio data; Transforming audio signals into the frequency domain; Generating audio signals with custom parameters; Synthesizing music; Extracting frequency domain features; Building Hidden Markov Models; Building a speech recognizer
  8. Dissecting Time Series and Sequential Data: Introduction; Transforming data into the time series format; Slicing time series data; Operating on time series data; Extracting statistics from time series data; Building Hidden Markov Models for sequential data; Building Conditional Random Fields for sequential text data; Analyzing stock market data using Hidden Markov Models
  9. Image Content Analysis: Introduction; Operating on images using OpenCV-Python; Detecting edges; Histogram equalization; Detecting corners; Detecting SIFT feature points; Building a Star feature detector; Creating features using visual codebook and vector quantization; Training an image classifier using Extremely Random Forests; Building an object recognizer
  10. Biometric Face Recognition: Introduction; Capturing and processing video from a webcam; Building a face detector using Haar cascades; Building eye and nose detectors; Performing Principal Components Analysis; Performing Kernel Principal Components Analysis; Performing blind source separation; Building a face recognizer using Local Binary Patterns Histogram
  11. Deep Neural Networks: Introduction; Building a perceptron; Building a single layer neural network; Building a deep neural network; Creating a vector quantizer; Building a recurrent neural network for sequential data analysis; Visualizing the characters in an optical character recognition database; Building an optical character recognizer using neural networks
  12. Visualizing Data: Introduction; Plotting 3D scatter plots; Plotting bubble plots; Animating bubble plots; Drawing pie charts; Plotting date-formatted time series data; Plotting histograms; Visualizing heat maps; Animating dynamic signals

II. Module 2
  1. Unsupervised Machine Learning: Principal component analysis; PCA – a primer; Employing PCA; Introducing k-means clustering; Clustering – a primer; Kick-starting clustering analysis; Tuning your clustering configurations; Self-organizing maps; SOM – a primer; Employing SOM; Further reading; Summary
  2. Deep Belief Networks: Neural networks – a primer; The composition of a neural network; Network topologies; Restricted Boltzmann Machine; Introducing the RBM; Topology; Training; Applications of the RBM; Further applications of the RBM; Deep belief networks; Training a DBN; Applying the DBN; Validating the DBN; Further reading; Summary
  3. Stacked Denoising Autoencoders: Autoencoders; Introducing the autoencoder; Topology; Training; Denoising autoencoders; Applying a dA; Stacked Denoising Autoencoders; Applying the SdA; Assessing SdA performance; Further reading; Summary
  4. Convolutional Neural Networks: Introducing the CNN; Understanding the convnet topology; Understanding convolution layers; Understanding pooling layers; Training a convnet; Putting it all together; Applying a CNN; Further reading; Summary
  5. Semi-Supervised Learning: Introduction; Understanding semi-supervised learning; Semi-supervised algorithms in action; Self-training; Implementing self-training; Finessing your self-training implementation; Improving the selection process; Contrastive Pessimistic Likelihood Estimation; Further reading; Summary
  6. Text Feature Engineering: Introduction; Text feature engineering; Cleaning text data; Text cleaning with BeautifulSoup; Managing punctuation and tokenizing; Tagging and categorising words; Tagging with NLTK; Sequential tagging; Backoff tagging; Creating features from text data; Stemming; Bagging and random forests; Testing our prepared data; Further reading; Summary
  7. Feature Engineering Part II: Introduction; Creating a feature set; Engineering features for ML applications; Using rescaling techniques to improve the learnability of features; Creating effective derived variables; Reinterpreting non-numeric features; Using feature selection techniques; Performing feature selection; Correlation; LASSO; Recursive Feature Elimination; Genetic models; Feature engineering in practice; Acquiring data via RESTful APIs; Testing the performance of our model; Twitter; Translink Twitter; Consumer comments; The Bing Traffic API; Deriving and selecting variables using feature engineering techniques; The weather API; Further reading; Summary
  8. Ensemble Methods: Introducing ensembles; Understanding averaging ensembles; Using bagging algorithms; Using random forests; Applying boosting methods; Using XGBoost; Using stacking ensembles; Applying ensembles in practice; Using models in dynamic applications; Understanding model robustness; Identifying modeling risk factors; Strategies for managing model robustness; Further reading; Summary
  9. Additional Python Machine Learning Tools: Alternative development tools; Introduction to Lasagne; Getting to know Lasagne; Introduction to TensorFlow; Getting to know TensorFlow; Using TensorFlow to iteratively improve our models; Knowing when to use these libraries; Further reading; Summary
  A. Chapter Code Requirements

III. Module 3
  1. First Steps to Scalability: Explaining scalability in detail; Making large scale examples; Introducing Python; Scale up with Python; Scale out with Python; Python for large scale machine learning; Choosing between Python 2 and Python 3; Package upgrades; Scientific distributions; Introducing Jupyter/IPython; Python packages; NumPy; SciPy; Pandas; Scikit-learn; The matplotlib package; Gensim; H2O; XGBoost; Theano; TensorFlow; The sknn library; Theanets; Keras; Other useful packages to install on your system; Summary
  2. Scalable Learning in Scikit-learn: Out-of-core learning; Subsampling as a viable option; Optimizing one instance at a time; Building an out-of-core learning system; Streaming data from sources; Datasets to try the real thing yourself; The first example – streaming the bike-sharing dataset; Using pandas I/O tools; Working with databases; Paying attention to the ordering of instances; Stochastic learning; Batch gradient descent; Stochastic gradient descent; The Scikit-learn SGD implementation; Defining SGD learning parameters; Feature management with data streams; Describing the target; The hashing trick; Other basic transformations; Testing and validation in a stream; Trying SGD in action; Summary
  3. Fast SVM Implementations: Datasets to experiment with on your own; The bike-sharing dataset; The covertype dataset; Support Vector Machines; Hinge loss and its variants; Understanding the Scikit-learn SVM implementation; Pursuing nonlinear SVMs by subsampling; Achieving SVM at scale with SGD; Feature selection by regularization; Including non-linearity in SGD; Trying explicit high-dimensional mappings; Hyperparameter tuning; Other alternatives for SVM fast learning; Nonlinear and faster with Vowpal Wabbit; Installing VW; Understanding the VW data format; Python integration; A few examples using reductions for SVM and neural nets; Faster bike-sharing; The covertype dataset crunched by VW; Summary
  4. Neural Networks and Deep Learning: The neural network architecture; What and how neural networks learn; Choosing the right architecture; The input layer; The hidden layer; The output layer; Neural networks in action; Parallelization for sknn; Neural networks and regularization; Neural networks and hyperparameter optimization; Neural networks and decision boundaries; Deep learning at scale with H2O; Large scale deep learning with H2O; Gridsearch on H2O; Deep learning and unsupervised pretraining; Deep learning with theanets; Autoencoders and unsupervised learning; Autoencoders; Summary
  5. Deep Learning with TensorFlow: TensorFlow installation; TensorFlow operations; GPU computing; Linear regression with SGD; A neural network from scratch in TensorFlow; Machine learning on TensorFlow with SkFlow; Deep learning with large files – incremental learning; Keras and TensorFlow installation; Convolutional Neural Networks in TensorFlow through Keras; The convolution layer; The pooling layer; The fully connected layer; CNNs with an incremental approach; GPU Computing; Summary
  6. Classification and Regression Trees at Scale: Bootstrap aggregation; Random forest and extremely randomized forest; Fast parameter optimization with randomized search; Extremely randomized trees and large datasets; CART and boosting; Gradient Boosting Machines; max_depth; learning_rate; Subsample; Faster GBM with warm_start; Speeding up GBM with warm_start; Training and storing GBM models; XGBoost; XGBoost regression; XGBoost and variable importance; XGBoost streaming large datasets; XGBoost model persistence; Out-of-core CART with H2O; Random forest and gridsearch on H2O; Stochastic gradient boosting and gridsearch on H2O; Summary
  7. Unsupervised Learning at Scale: Unsupervised methods; Feature decomposition – PCA; Randomized PCA; Incremental PCA; Sparse PCA; PCA with H2O; Clustering – K-means; Initialization methods; K-means assumptions; Selection of the best K; Scaling K-means – mini-batch; K-means with H2O; LDA; Scaling LDA – memory, CPUs, and machines; Summary
  8. Distributed Environments – Hadoop and Spark: From a standalone machine to a bunch of nodes; Why do we need a distributed framework?; Setting up the VM; VirtualBox; Vagrant; Using the VM; The Hadoop ecosystem; Architecture; HDFS; MapReduce; YARN; Spark; pySpark; Summary
  9. Practical Machine Learning with Spark: Setting up the VM for this chapter; Sharing variables across cluster nodes; Broadcast read-only variables; Accumulators write-only variables; Broadcast and accumulators together – an example; Data preprocessing in Spark; JSON files and Spark DataFrames; Dealing with missing data; Grouping and creating tables in-memory; Writing the preprocessed DataFrame or RDD to disk; Working with Spark DataFrames; Machine learning with Spark; Spark on the KDD99 dataset; Reading the dataset; Feature engineering; Training a learner; Evaluating a learner's performance; The power of the ML pipeline; Manual tuning; Cross-validation; Final cleanup; Summary
  A. Introduction to GPUs and Theano: GPU computing; Theano – parallel computing on the GPU; Installing Theano

A. Bibliography
Index
