Machine Learning with Spark · 2nd Edition by Pentreath, Nick

Index

Machine Learning with Spark, Second Edition: Title Page; Copyright; Credits; About the Authors; About the Reviewer; www.PacktPub.com; Customer Feedback; Table of Contents

Preface: What this book covers; What you need for this book; Who this book is for; Conventions; Reader feedback; Customer support; Downloading the example code; Downloading the color images of this book; Errata; Piracy; Questions

Getting Up and Running with Spark: Installing and setting up Spark locally; Spark clusters; The Spark programming model; SparkContext and SparkConf; SparkSession; The Spark shell; Resilient Distributed Datasets; Creating RDDs; Spark operations; Caching RDDs; Broadcast variables and accumulators; SchemaRDD; Spark data frame; The first step to a Spark program in Scala; The first step to a Spark program in Java; The first step to a Spark program in Python; The first step to a Spark program in R; SparkR DataFrames; Getting Spark running on Amazon EC2; Launching an EC2 Spark cluster; Configuring and running Spark on Amazon Elastic Map Reduce; UI in Spark; Supported machine learning algorithms by Spark; Benefits of using Spark ML as compared to existing libraries; Spark Cluster on Google Compute Engine - DataProc; Hadoop and Spark Versions; Creating a Cluster; Submitting a Job; Summary

Math for Machine Learning: Linear algebra; Setting up the Scala environment in Intellij; Setting up the Scala environment on the Command Line; Fields; Real numbers; Complex numbers; Vectors; Vector spaces; Vector types; Vectors in Breeze; Vectors in Spark; Vector operations; Hyperplanes; Vectors in machine learning; Matrix; Types of matrices; Matrix in Spark; Distributed matrix in Spark; Matrix operations; Determinant; Eigenvalues and eigenvectors; Singular value decomposition; Matrices in machine learning; Functions; Function types; Functional composition; Hypothesis; Gradient descent; Prior, likelihood, and posterior; Calculus; Differential calculus; Integral calculus; Lagranges multipliers; Plotting; Summary

Designing a Machine Learning System: What is Machine Learning?; Introducing MovieStream; Business use cases for a machine learning system; Personalization; Targeted marketing and customer segmentation; Predictive modeling and analytics; Types of machine learning models; The components of a data-driven machine learning system; Data ingestion and storage; Data cleansing and transformation; Model training and testing loop; Model deployment and integration; Model monitoring and feedback; Batch versus real time; Data Pipeline in Apache Spark; An architecture for a machine learning system; Spark MLlib; Performance improvements in Spark ML over Spark MLlib; Comparing algorithms supported by MLlib; Classification; Clustering; Regression; MLlib supported methods and developer APIs; Spark Integration; MLlib vision; MLlib versions compared; Spark 1.6 to 2.0; Summary

Obtaining, Processing, and Preparing Data with Spark: Accessing publicly available datasets; The MovieLens 100k dataset; Exploring and visualizing your data; Exploring the user dataset; Count by occupation; Movie dataset; Exploring the rating dataset; Rating count bar chart; Distribution of number ratings; Processing and transforming your data; Filling in bad or missing data; Extracting useful features from your data; Numerical features; Categorical features; Derived features; Transforming timestamps into categorical features; Extract time of day; Text features; Simple text feature extraction; Sparse Vectors from Titles; Normalizing features; Using ML for feature normalization; Using packages for feature extraction; TFID; IDF; Word2Vector; Skip-gram model; Standard scalar; Summary

Building a Recommendation Engine with Spark: Types of recommendation models; Content-based filtering; Collaborative filtering; Matrix factorization; Explicit matrix factorization; Implicit Matrix Factorization; Basic model for Matrix Factorization; Alternating least squares; Extracting the right features from your data; Extracting features from the MovieLens 100k dataset; Training the recommendation model; Training a model on the MovieLens 100k dataset; Training a model using Implicit feedback data; Using the recommendation model; ALS Model recommendations; User recommendations; Generating movie recommendations from the MovieLens 100k dataset; Inspecting the recommendations; Item recommendations; Generating similar movies for the MovieLens 100k dataset; Inspecting the similar items; Evaluating the performance of recommendation models; ALS Model Evaluation; Mean Squared Error; Mean Average Precision at K; Using MLlib's built-in evaluation functions; RMSE and MSE; MAP; FP-Growth algorithm; FP-Growth Basic Sample; FP-Growth Applied to Movie Lens Data; Summary

Building a Classification Model with Spark: Types of classification models; Linear models; Logistic regression; Multinomial logistic regression; Visualizing the StumbleUpon dataset; Extracting features from the Kaggle/StumbleUpon evergreen classification dataset; StumbleUponExecutor; Linear support vector machines; The naive Bayes model; Decision trees; Ensembles of trees; Random Forests; Gradient-Boosted Trees; Multilayer perceptron classifier; Extracting the right features from your data; Training classification models; Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset; Using classification models; Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset; Evaluating the performance of classification models; Accuracy and prediction error; Precision and recall; ROC curve and AUC; Improving model performance and tuning parameters; Feature standardization; Additional features; Using the correct form of data; Tuning model parameters; Linear models; Iterations; Step size; Regularization; Decision trees; Tuning tree depth and impurity; The naive Bayes model; Cross-validation; Summary

Building a Regression Model with Spark: Types of regression models; Least squares regression; Decision trees for regression; Evaluating the performance of regression models; Mean Squared Error and Root Mean Squared Error; Mean Absolute Error; Root Mean Squared Log Error; The R-squared coefficient; Extracting the right features from your data; Extracting features from the bike sharing dataset; Training and using regression models; BikeSharingExecutor; Training a regression model on the bike sharing dataset; Generalized linear regression; Decision tree regression; Ensembles of trees; Random forest regression; Gradient boosted tree regression; Improving model performance and tuning parameters; Transforming the target variable; Impact of training on log-transformed targets; Tuning model parameters; Creating training and testing sets to evaluate parameters; Splitting data for Decision tree; The impact of parameter settings for linear models; Iterations; Step size; L2 regularization; L1 regularization; Intercept; The impact of parameter settings for the decision tree; Tree depth; Maximum bins; The impact of parameter settings for the Gradient Boosted Trees; Iterations; MaxBins; Summary

Building a Clustering Model with Spark: Types of clustering models; k-means clustering; Initialization methods; Mixture models; Hierarchical clustering; Extracting the right features from your data; Extracting features from the MovieLens dataset; K-means - training a clustering model; Training a clustering model on the MovieLens dataset; K-means - interpreting cluster predictions on the MovieLens dataset; Interpreting the movie clusters; Interpreting the movie clusters; K-means - evaluating the performance of clustering models; Internal evaluation metrics; External evaluation metrics; Computing performance metrics on the MovieLens dataset; Effect of iterations on WSSSE; Bisecting KMeans; Bisecting K-means - training a clustering model; WSSSE and iterations; Gaussian Mixture Model; Clustering using GMM; Plotting the user and item data with GMM clustering; GMM - effect of iterations on cluster boundaries; Summary

Dimensionality Reduction with Spark: Types of dimensionality reduction; Principal components analysis; Singular value decomposition; Relationship with matrix factorization; Clustering as dimensionality reduction; Extracting the right features from your data; Extracting features from the LFW dataset; Exploring the face data; Visualizing the face data; Extracting facial images as vectors; Loading images; Converting to grayscale and resizing the images; Extracting feature vectors; Normalization; Training a dimensionality reduction model; Running PCA on the LFW dataset; Visualizing the Eigenfaces; Interpreting the Eigenfaces; Using a dimensionality reduction model; Projecting data using PCA on the LFW dataset; The relationship between PCA and SVD; Evaluating dimensionality reduction models; Evaluating k for SVD on the LFW dataset; Singular values; Summary

Advanced Text Processing with Spark: What's so special about text data?; Extracting the right features from your data; Term weighting schemes; Feature hashing; Extracting the tf-idf features from the 20 Newsgroups dataset; Exploring the 20 Newsgroups data; Applying basic tokenization; Improving our tokenization; Removing stop words; Excluding terms based on frequency; A note about stemming; Feature Hashing; Building a tf-idf model; Analyzing the tf-idf weightings; Using a tf-idf model; Document similarity with the 20 Newsgroups dataset and tf-idf features; Training a text classifier on the 20 Newsgroups dataset using tf-idf; Evaluating the impact of text processing; Comparing raw features with processed tf-idf features on the 20 Newsgroups dataset; Text classification with Spark 2.0; Word2Vec models; Word2Vec with Spark MLlib on the 20 Newsgroups dataset; Word2Vec with Spark ML on the 20 Newsgroups dataset; Summary

Real-Time Machine Learning with Spark Streaming: Online learning; Stream processing; An introduction to Spark Streaming; Input sources; Transformations; Keeping track of state; General transformations; Actions; Window operators; Caching and fault tolerance with Spark Streaming; Creating a basic streaming application; The producer application; Creating a basic streaming application; Streaming analytics; Stateful streaming; Online learning with Spark Streaming; Streaming regression; A simple streaming regression program; Creating a streaming data producer; Creating a streaming regression model; Streaming K-means; Online model evaluation; Comparing model performance with Spark Streaming; Structured Streaming; Summary

Pipeline APIs for Spark ML: Introduction to pipelines; DataFrames; Pipeline components; Transformers; Estimators; How pipelines work; Machine learning pipeline with an example; StumbleUponExecutor; Summary
