Index
Title Page
Copyright and Credits
Feature Engineering Made Easy
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Feature Engineering
Motivating example – AI-powered communications
Why feature engineering matters
What is feature engineering?
Understanding the basics of data and machine learning
Supervised learning
Unsupervised learning
Unsupervised learning example – marketing segments
Evaluation of machine learning algorithms and feature engineering procedures
Example of feature engineering procedures – can anyone really predict the weather?
Steps to evaluate a feature engineering procedure
Evaluating supervised learning algorithms
Evaluating unsupervised learning algorithms
Feature understanding – what's in my dataset?
Feature improvement – cleaning datasets
Feature selection – say no to bad attributes
Feature construction – can we build it?
Feature transformation – enter math-man
Feature learning – using AI to better our AI
Summary
Feature Understanding – What's in My Dataset?
The structure, or lack thereof, of data
An example of unstructured data – server logs
Quantitative versus qualitative data
Salary ranges by job classification
The four levels of data
The nominal level
Mathematical operations allowed
The ordinal level
Mathematical operations allowed
The interval level
Mathematical operations allowed
Plotting two columns at the interval level
The ratio level
Mathematical operations allowed
Recap of the levels of data
Summary
Feature Improvement – Cleaning Datasets
Identifying missing values in data
The Pima Indian Diabetes Prediction dataset
The exploratory data analysis (EDA)
Dealing with missing values in a dataset
Removing harmful rows of data
Imputing the missing values in data
Imputing values in a machine learning pipeline
Pipelines in machine learning
Standardization and normalization
Z-score standardization
The min-max scaling method
The row normalization method
Putting it all together
Summary
Feature Construction
Examining our dataset
Imputing categorical features
Custom imputers
Custom category imputer
Custom quantitative imputer
Encoding categorical variables
Encoding at the nominal level
Encoding at the ordinal level
Bucketing continuous features into categories
Creating our pipeline
Extending numerical features
Activity recognition from the Single Chest-Mounted Accelerometer dataset
Polynomial features
Parameters
Exploratory data analysis
Text-specific feature construction
Bag of words representation
CountVectorizer
CountVectorizer parameters
The Tf-idf vectorizer
Using text in machine learning pipelines
Summary
Feature Selection
Achieving better performance in feature engineering
A case study – a credit card defaulting dataset
Creating a baseline machine learning pipeline
The types of feature selection
Statistical-based feature selection
Using Pearson correlation to select features
Feature selection using hypothesis testing
Interpreting the p-value
Ranking the p-value
Model-based feature selection
A brief refresher on natural language processing
Using machine learning to select features
Tree-based model feature selection metrics
Linear models and regularization
A brief introduction to regularization
Linear model coefficients as another feature importance metric
Choosing the right feature selection method
Summary
Feature Transformations
Dimension reduction – feature transformations versus feature selection versus feature construction
Principal Component Analysis
How PCA works
PCA with the Iris dataset – manual example
Creating the covariance matrix of the dataset
Calculating the eigenvalues of the covariance matrix
Keeping the top k eigenvalues (sorted by the descending eigenvalues)
Using the kept eigenvectors to transform new data-points
Scikit-learn's PCA
How centering and scaling data affects PCA
A deeper look into the principal components
Linear Discriminant Analysis
How LDA works
Calculating the mean vectors of each class
Calculating within-class and between-class scatter matrices
Calculating eigenvalues and eigenvectors for S_W⁻¹S_B
Keeping the top k eigenvectors by ordering them by descending eigenvalues
Using the top eigenvectors to project onto the new space
How to use LDA in scikit-learn
LDA versus PCA – iris dataset
Summary
Feature Learning
Parametric assumptions of data
Non-parametric fallacy
The algorithms of this chapter
Restricted Boltzmann Machines
Not necessarily dimension reduction
The graph of a Restricted Boltzmann Machine
The restriction of a Boltzmann Machine
Reconstructing the data
MNIST dataset
The BernoulliRBM
Extracting PCA components from MNIST
Extracting RBM components from MNIST
Using RBMs in a machine learning pipeline
Using a linear model on raw pixel values
Using a linear model on extracted PCA components
Using a linear model on extracted RBM components
Learning text features – word vectorizations
Word embeddings
Two approaches to word embeddings – Word2vec and GloVe
Word2Vec – another shallow neural network
The gensim package for creating Word2vec embeddings
Application of word embeddings – information retrieval
Summary
Case Studies
Case study 1 – facial recognition
Applications of facial recognition
The data
Some data exploration
Applied facial recognition
Case study 2 – predicting topics of hotel reviews data
Applications of text clustering
Hotel review data
Exploration of the data
The clustering model
SVD versus PCA components
Latent semantic analysis
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think