Mastering Java for Data Science by Grigorev, Alexey

Index

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Data Science Using Java

Data science

Machine learning

Supervised learning

Unsupervised learning

Clustering

Dimensionality reduction

Natural Language Processing

Data science process models

CRISP-DM

A running example

Data science in Java

Data science libraries

Data processing libraries

Math and stats libraries

Machine learning and data mining libraries

Text processing

Summary

Data Processing Toolbox

Standard Java library

Collections

Input/Output

Reading input data

Writing output data

Streaming API

Extensions to the standard library

Apache Commons

Commons Lang

Commons IO

Commons Collections

Other commons modules

Google Guava

AOL Cyclops React

Accessing data

Text data and CSV

Web and HTML

JSON

Databases

DataFrames

Search engine - preparing data

Summary

Exploratory Data Analysis

Exploratory data analysis in Java

Search engine datasets

Apache Commons Math

Joinery

Interactive Exploratory Data Analysis in Java

JVM languages

Interactive Java

Joinery shell

Summary

Supervised Learning - Classification and Regression

Classification

Binary classification models

Smile

JSAT

LIBSVM and LIBLINEAR

Encog

Evaluation

Accuracy

Precision, recall, and F1

ROC and AU ROC (AUC)

Result validation

K-fold cross-validation

Training, validation, and testing

Case study - page prediction

Regression

Machine learning libraries for regression

Smile

JSAT

Other libraries

Evaluation

MSE

MAE

Case study - hardware performance

Summary

Unsupervised Learning - Clustering and Dimensionality Reduction

Dimensionality reduction

Unsupervised dimensionality reduction

Principal Component Analysis

Truncated SVD

Truncated SVD for categorical and sparse data

Random projection

Cluster analysis

Hierarchical methods

K-means

Choosing K in K-Means

DBSCAN

Clustering for supervised learning

Clusters as features

Clustering as dimensionality reduction

Supervised learning via clustering

Evaluation

Manual evaluation

Supervised evaluation

Unsupervised Evaluation

Summary

Working with Text - Natural Language Processing and Information Retrieval

Natural Language Processing and information retrieval

Vector Space Model - Bag of Words and TF-IDF

Vector space model implementation

Indexing and Apache Lucene

Natural Language Processing tools

Stanford CoreNLP

Customizing Apache Lucene

Machine learning for texts

Unsupervised learning for texts

Latent Semantic Analysis

Text clustering

Word embeddings

Supervised learning for texts

Text classification

Learning to rank for information retrieval

Reranking with Lucene

Summary

Extreme Gradient Boosting

Gradient Boosting Machines and XGBoost

Installing XGBoost

XGBoost in practice

XGBoost for classification

Parameter tuning

Text features

Feature importance

XGBoost for regression

XGBoost for learning to rank

Summary

Deep Learning with DeepLearning4J

Neural Networks and DeepLearning4J

ND4J - N-dimensional arrays for Java

Neural networks in DeepLearning4J

Convolutional Neural Networks

Deep learning for cats versus dogs

Reading the data

Creating the model

Monitoring the performance

Data augmentation

Running DeepLearning4J on GPU

Summary

Scaling Data Science

Apache Hadoop

Hadoop MapReduce

Common Crawl

Apache Spark

Link prediction

Reading the DBLP graph

Extracting features from the graph

Node features

Negative sampling

Edge features

Link Prediction with MLlib and XGBoost

Link suggestion

Summary

Deploying Data Science Models

Microservices

Spring Boot

Search engine service

Online evaluation

A/B testing

Multi-armed bandits

Summary
