Preface
    Who this book is for
    What this book covers
    To get the most out of this book
        Download the example code files
        Download the color images
        Conventions used
    Get in touch
        Reviews

Introduction to Feature Engineering
    Motivating example – AI-powered communications
    Why feature engineering matters
    What is feature engineering?
    Understanding the basics of data and machine learning
        Supervised learning
        Unsupervised learning
        Unsupervised learning example – marketing segments
    Evaluation of machine learning algorithms and feature engineering procedures
        Example of feature engineering procedures – can anyone really predict the weather?
        Steps to evaluate a feature engineering procedure
        Evaluating supervised learning algorithms
        Evaluating unsupervised learning algorithms
    Feature understanding – what's in my dataset?
    Feature improvement – cleaning datasets
    Feature selection – say no to bad attributes
    Feature construction – can we build it?
    Feature transformation – enter math-man
    Feature learning – using AI to better our AI
    Summary

Feature Understanding – What's in My Dataset?
    The structure, or lack thereof, of data
        An example of unstructured data – server logs
    Quantitative versus qualitative data
        Salary ranges by job classification
    The four levels of data
        The nominal level
            Mathematical operations allowed
        The ordinal level
            Mathematical operations allowed
        The interval level
            Mathematical operations allowed
            Plotting two columns at the interval level
        The ratio level
            Mathematical operations allowed
    Recap of the levels of data
    Summary

Feature Improvement – Cleaning Datasets
    Identifying missing values in data
        The Pima Indian Diabetes Prediction dataset
        The exploratory data analysis (EDA)
    Dealing with missing values in a dataset
        Removing harmful rows of data
        Imputing the missing values in data
        Imputing values in a machine learning pipeline
            Pipelines in machine learning
    Standardization and normalization
        Z-score standardization
        The min-max scaling method
        The row normalization method
        Putting it all together
    Summary

Feature Construction
    Examining our dataset
    Imputing categorical features
        Custom imputers
            Custom category imputer
            Custom quantitative imputer
    Encoding categorical variables
        Encoding at the nominal level
        Encoding at the ordinal level
        Bucketing continuous features into categories
        Creating our pipeline
    Extending numerical features
        Activity recognition from the Single Chest-Mounted Accelerometer dataset
        Polynomial features
            Parameters
            Exploratory data analysis
    Text-specific feature construction
        Bag of words representation
        CountVectorizer
            CountVectorizer parameters
        The Tf-idf vectorizer
        Using text in machine learning pipelines
    Summary

Feature Selection
    Achieving better performance in feature engineering
        A case study – a credit card defaulting dataset
        Creating a baseline machine learning pipeline
    The types of feature selection
        Statistical-based feature selection
            Using Pearson correlation to select features
            Feature selection using hypothesis testing
                Interpreting the p-value
                Ranking the p-value
        Model-based feature selection
            A brief refresher on natural language processing
            Using machine learning to select features
                Tree-based model feature selection metrics
            Linear models and regularization
                A brief introduction to regularization
                Linear model coefficients as another feature importance metric
    Choosing the right feature selection method
    Summary

Feature Transformations
    Dimension reduction – feature transformations versus feature selection versus feature construction
    Principal Component Analysis
        How PCA works
        PCA with the Iris dataset – manual example
            Creating the covariance matrix of the dataset
            Calculating the eigenvalues of the covariance matrix
            Keeping the top k eigenvalues (sorted by the descending eigenvalues)
            Using the kept eigenvectors to transform new data points
        Scikit-learn's PCA
        How centering and scaling data affects PCA
        A deeper look into the principal components
    Linear Discriminant Analysis
        How LDA works
            Calculating the mean vectors of each class
            Calculating within-class and between-class scatter matrices
            Calculating eigenvalues and eigenvectors for S_W^-1 S_B
            Keeping the top k eigenvectors by ordering them by descending eigenvalues
            Using the top eigenvectors to project onto the new space
        How to use LDA in scikit-learn
    LDA versus PCA – iris dataset
    Summary

Feature Learning
    Parametric assumptions of data
        Non-parametric fallacy
        The algorithms of this chapter
    Restricted Boltzmann Machines
        Not necessarily dimension reduction
        The graph of a Restricted Boltzmann Machine
            The restriction of a Boltzmann Machine
        Reconstructing the data
        MNIST dataset
    The BernoulliRBM
        Extracting PCA components from MNIST
        Extracting RBM components from MNIST
    Using RBMs in a machine learning pipeline
        Using a linear model on raw pixel values
        Using a linear model on extracted PCA components
        Using a linear model on extracted RBM components
    Learning text features – word vectorizations
        Word embeddings
            Two approaches to word embeddings – Word2vec and GloVe
            Word2Vec – another shallow neural network
            The gensim package for creating Word2vec embeddings
        Application of word embeddings – information retrieval
    Summary

Case Studies
    Case study 1 – facial recognition
        Applications of facial recognition
        The data
        Some data exploration
        Applied facial recognition
    Case study 2 – predicting topics of hotel reviews data
        Applications of text clustering
        Hotel review data
        Exploration of the data
        The clustering model
        SVD versus PCA components
        Latent semantic analysis
    Summary

Other Books You May Enjoy
    Leave a review – let other readers know what you think