Mastering Java Machine Learning
Table of Contents
Mastering Java Machine Learning
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Machine Learning Review
Machine learning – history and definition
What is not machine learning?
Machine learning – concepts and terminology
Machine learning – types and subtypes
Datasets used in machine learning
Machine learning applications
Practical issues in machine learning
Machine learning – roles and process
Roles
Process
Machine learning – tools and datasets
Datasets
Summary
2. Practical Approach to Real-World Supervised Learning
Formal description and notation
Data quality analysis
Descriptive data analysis
Basic label analysis
Basic feature analysis
Visualization analysis
Univariate feature analysis
Categorical features
Continuous features
Multivariate feature analysis
Data transformation and preprocessing
Feature construction
Handling missing values
Outliers
Discretization
Data sampling
Is sampling needed?
Undersampling and oversampling
Stratified sampling
Training, validation, and test set
Feature relevance analysis and dimensionality reduction
Feature search techniques
Feature evaluation techniques
Filter approach
Univariate feature selection
Information theoretic approach
Statistical approach
Multivariate feature selection
Minimal redundancy maximal relevance (mRMR)
Correlation-based feature selection (CFS)
Wrapper approach
Embedded approach
Model building
Linear models
Linear Regression
Algorithm input and output
How does it work?
Advantages and limitations
Naïve Bayes
Algorithm input and output
How does it work?
Advantages and limitations
Logistic Regression
Algorithm input and output
How does it work?
Advantages and limitations
Non-linear models
Decision Trees
Algorithm inputs and outputs
How does it work?
Advantages and limitations
K-Nearest Neighbors (KNN)
Algorithm inputs and outputs
How does it work?
Advantages and limitations
Support vector machines (SVM)
Algorithm inputs and outputs
How does it work?
Advantages and limitations
Ensemble learning and meta learners
Bootstrap aggregating or bagging
Algorithm inputs and outputs
How does it work?
Random Forest
Advantages and limitations
Boosting
Algorithm inputs and outputs
How does it work?
Advantages and limitations
Model assessment, evaluation, and comparisons
Model assessment
Model evaluation metrics
Confusion matrix and related metrics
ROC and PRC curves
Gain charts and lift curves
Model comparisons
Comparing two algorithms
McNemar's Test
Paired-t test
Wilcoxon signed-rank test
Comparing multiple algorithms
ANOVA test
Friedman's test
Case Study – Horse Colic Classification
Business problem
Machine learning mapping
Data analysis
Label analysis
Features analysis
Supervised learning experiments
Weka experiments
Sample end-to-end process in Java
Weka experimenter and model selection
RapidMiner experiments
Visualization analysis
Feature selection
Model process flow
Model evaluation metrics
Evaluation on Confusion Metrics
ROC Curves, Lift Curves, and Gain Charts
Results, observations, and analysis
Summary
References
3. Unsupervised Machine Learning Techniques
Issues in common with supervised learning
Issues specific to unsupervised learning
Feature analysis and dimensionality reduction
Notation
Linear methods
Principal component analysis (PCA)
Inputs and outputs
How does it work?
Advantages and limitations
Random projections (RP)
Inputs and outputs
How does it work?
Advantages and limitations
Multidimensional Scaling (MDS)
Inputs and outputs
How does it work?
Advantages and limitations
Nonlinear methods
Kernel Principal Component Analysis (KPCA)
Inputs and outputs
How does it work?
Advantages and limitations
Manifold learning
Inputs and outputs
How does it work?
Advantages and limitations
Clustering
Clustering algorithms
k-Means
Inputs and outputs
How does it work?
Advantages and limitations
DBSCAN
Inputs and outputs
How does it work?
Advantages and limitations
Mean shift
Inputs and outputs
How does it work?
Advantages and limitations
Expectation maximization (EM) or Gaussian mixture modeling (GMM)
Input and output
How does it work?
Advantages and limitations
Hierarchical clustering
Input and output
How does it work?
Advantages and limitations
Self-organizing maps (SOM)
Inputs and outputs
How does it work?
Advantages and limitations
Spectral clustering
Inputs and outputs
How does it work?
Advantages and limitations
Affinity propagation
Inputs and outputs
How does it work?
Advantages and limitations
Clustering validation and evaluation
Internal evaluation measures
Notation
R-Squared
Dunn's Indices
Davies-Bouldin index
Silhouette's index
External evaluation measures
Rand index
F-Measure
Normalized mutual information index
Outlier or anomaly detection
Outlier algorithms
Statistical-based
Inputs and outputs
How does it work?
Advantages and limitations
Distance-based methods
Inputs and outputs
How does it work?
Advantages and limitations
Density-based methods
Inputs and outputs
How does it work?
Advantages and limitations
Clustering-based methods
Inputs and outputs
How does it work?
Advantages and limitations
High-dimensional-based methods
Inputs and outputs
How does it work?
Advantages and limitations
One-class SVM
Inputs and outputs
How does it work?
Advantages and limitations
Outlier evaluation techniques
Supervised evaluation
Unsupervised evaluation
Real-world case study
Tools and software
Business problem
Machine learning mapping
Data collection
Data quality analysis
Data sampling and transformation
Feature analysis and dimensionality reduction
PCA
Random projections
ISOMAP
Observations on feature analysis and dimensionality reduction
Clustering models, results, and evaluation
Observations and clustering analysis
Outlier models, results, and evaluation
Observations and analysis
Summary
References
4. Semi-Supervised and Active Learning
Semi-supervised learning
Representation, notation, and assumptions
Semi-supervised learning techniques
Self-training SSL
Inputs and outputs
How does it work?
Advantages and limitations
Co-training SSL or multi-view SSL
Inputs and outputs
How does it work?
Advantages and limitations
Cluster and label SSL
Inputs and outputs
How does it work?
Advantages and limitations
Transductive graph label propagation
Inputs and outputs
How does it work?
Advantages and limitations
Transductive SVM (TSVM)
Inputs and outputs
How does it work?
Advantages and limitations
Case study in semi-supervised learning
Tools and software
Business problem
Machine learning mapping
Data collection
Data quality analysis
Data sampling and transformation
Datasets and analysis
Feature analysis results
Experiments and results
Analysis of semi-supervised learning
Active learning
Representation and notation
Active learning scenarios
Active learning approaches
Uncertainty sampling
How does it work?
Least confident sampling
Smallest margin sampling
Label entropy sampling
Advantages and limitations
Version space sampling
Query by disagreement (QBD)
How does it work?
Query by Committee (QBC)
How does it work?
Advantages and limitations
Data distribution sampling
How does it work?
Expected model change
Expected error reduction
Variance reduction
Density weighted methods
Advantages and limitations
Case study in active learning
Tools and software
Business problem
Machine learning mapping
Data Collection
Data sampling and transformation
Feature analysis and dimensionality reduction
Models, results, and evaluation
Pool-based scenarios
Stream-based scenarios
Analysis of active learning results
Summary
References
5. Real-Time Stream Machine Learning
Assumptions and mathematical notations
Basic stream processing and computational techniques
Stream computations
Sliding windows
Sampling
Concept drift and drift detection
Data management
Partial memory
Full memory
Detection methods
Monitoring model evolution
Widmer and Kubat
Drift Detection Method or DDM
Early Drift Detection Method or EDDM
Monitoring distribution changes
Welch's t test
Kolmogorov-Smirnov's test
CUSUM and Page-Hinckley test
Adaptation methods
Explicit adaptation
Implicit adaptation
Incremental supervised learning
Modeling techniques
Linear algorithms
Online linear models with loss functions
Inputs and outputs
How does it work?
Advantages and limitations
Online Naïve Bayes
Inputs and outputs
How does it work?
Advantages and limitations
Non-linear algorithms
Hoeffding trees or very fast decision trees (VFDT)
Inputs and outputs
How does it work?
Advantages and limitations
Ensemble algorithms
Weighted majority algorithm
Inputs and outputs
How does it work?
Advantages and limitations
Online Bagging algorithm
Inputs and outputs
How does it work?
Advantages and limitations
Online Boosting algorithm
Inputs and outputs
How does it work?
Advantages and limitations
Validation, evaluation, and comparisons in online setting
Model validation techniques
Prequential evaluation
Holdout evaluation
Controlled permutations
Evaluation criteria
Comparing algorithms and metrics
Incremental unsupervised learning using clustering
Modeling techniques
Partition based
Online k-Means
Inputs and outputs
How does it work?
Advantages and limitations
Hierarchical based and micro clustering
Inputs and outputs
How does it work?
Advantages and limitations
Inputs and outputs
How does it work?
Advantages and limitations
Density based
Inputs and outputs
How does it work?
Advantages and limitations
Grid based
Inputs and outputs
How does it work?
Advantages and limitations
Validation and evaluation techniques
Key issues in stream cluster evaluation
Evaluation measures
Cluster Mapping Measures (CMM)
V-Measure
Other external measures
Unsupervised learning using outlier detection
Partition-based clustering for outlier detection
Inputs and outputs
How does it work?
Advantages and limitations
Distance-based clustering for outlier detection
Inputs and outputs
How does it work?
Exact Storm
Abstract-C
Direct Update of Events (DUE)
Micro Clustering based Algorithm (MCOD)
Approx Storm
Advantages and limitations
Validation and evaluation techniques
Case study in stream learning
Tools and software
Business problem
Machine learning mapping
Data collection
Data sampling and transformation
Feature analysis and dimensionality reduction
Models, results, and evaluation
Supervised learning experiments
Concept drift experiments
Clustering experiments
Outlier detection experiments
Analysis of stream learning results
Summary
References
6. Probabilistic Graph Modeling
Probability revisited
Concepts in probability
Conditional probability
Chain rule and Bayes' theorem
Random variables, joint, and marginal distributions
Marginal independence and conditional independence
Factors
Factor types
Distribution queries
Probabilistic queries
MAP queries and marginal MAP queries
Graph concepts
Graph structure and properties
Subgraphs and cliques
Path, trail, and cycles
Bayesian networks
Representation
Definition
Reasoning patterns
Causal or predictive reasoning
Evidential or diagnostic reasoning
Intercausal reasoning
Combined reasoning
Independencies, flow of influence, D-Separation, I-Map
Flow of influence
D-Separation
I-Map
Inference
Elimination-based inference
Variable elimination algorithm
Input and output
How does it work?
Advantages and limitations
Clique tree or junction tree algorithm
Input and output
How does it work?
Advantages and limitations
Propagation-based techniques
Belief propagation
Factor graph
Messaging in factor graph
Input and output
How does it work?
Advantages and limitations
Sampling-based techniques
Forward sampling with rejection
Input and output
How does it work?
Advantages and limitations
Learning
Learning parameters
Maximum likelihood estimation for Bayesian networks
Bayesian parameter estimation for Bayesian network
Prior and posterior using the Dirichlet distribution
Learning structures
Measures to evaluate structures
Methods for learning structures
Constraint-based techniques
Inputs and outputs
How does it work?
Advantages and limitations
Search and score-based techniques
Inputs and outputs
How does it work?
Advantages and limitations
Markov networks and conditional random fields
Representation
Parameterization
Gibbs parameterization
Factor graphs
Log-linear models
Independencies
Global
Pairwise Markov
Markov blanket
Inference
Learning
Conditional random fields
Specialized networks
Tree augmented network
Input and output
How does it work?
Advantages and limitations
Markov chains
Hidden Markov models
Most probable path in HMM
Posterior decoding in HMM
Tools and usage
OpenMarkov
Weka Bayesian Network GUI
Case study
Business problem
Machine learning mapping
Data sampling and transformation
Feature analysis
Models, results, and evaluation
Analysis of results
Summary
References
7. Deep Learning
Multi-layer feed-forward neural network
Inputs, neurons, activation function, and mathematical notation
Multi-layered neural network
Structure and mathematical notations
Activation functions in NN
Sigmoid function
Hyperbolic tangent ("tanh") function
Training neural network
Empirical risk minimization
Parameter initialization
Loss function
Gradients
Gradient at the output layer
Gradient at the Hidden Layer
Parameter gradient
Feed forward and backpropagation
How does it work?
Regularization
L2 regularization
L1 regularization
Limitations of neural networks
Vanishing gradients, local optimum, and slow training
Deep learning
Building blocks for deep learning
Rectified linear activation function
Restricted Boltzmann Machines
Definition and mathematical notation
Conditional distribution
Free energy in RBM
Training the RBM
Sampling in RBM
Contrastive divergence
Inputs and outputs
How does it work?
Persistent contrastive divergence
Autoencoders
Definition and mathematical notations
Loss function
Limitations of Autoencoders
Denoising Autoencoder
Unsupervised pre-training and supervised fine-tuning
Deep feed-forward NN
Input and outputs
How does it work?
Deep Autoencoders
Deep Belief Networks
Inputs and outputs
How does it work?
Deep learning with dropouts
Definition and mathematical notation
Inputs and outputs
How does it work?
Learning Training and testing with dropouts
Sparse coding
Convolutional Neural Network
Local connectivity
Parameter sharing
Discrete convolution
Pooling or subsampling
Normalization using ReLU
CNN Layers
Recurrent Neural Networks
Structure of Recurrent Neural Networks
Learning and associated problems in RNNs
Long Short Term Memory
Gated Recurrent Units
Case study
Tools and software
Business problem
Machine learning mapping
Data sampling and transformation
Feature analysis
Models, results, and evaluation
Basic data handling
Multi-layer perceptron
Parameters used for MLP
Code for MLP
Convolutional Network
Parameters used for ConvNet
Code for CNN
Variational Autoencoder
Parameters used for the Variational Autoencoder
Code for Variational Autoencoder
DBN
Parameter search using Arbiter
Results and analysis
Summary
References
8. Text Mining and Natural Language Processing
NLP, subfields, and tasks
Text categorization
Part-of-speech tagging (POS tagging)
Text clustering
Information extraction and named entity recognition
Sentiment analysis and opinion mining
Coreference resolution
Word sense disambiguation
Machine translation
Semantic reasoning and inferencing
Text summarization
Automating question and answers
Issues with mining unstructured data
Text processing components and transformations
Document collection and standardization
Inputs and outputs
How does it work?
Tokenization
Inputs and outputs
How does it work?
Stop words removal
Inputs and outputs
How does it work?
Stemming or lemmatization
Inputs and outputs
How does it work?
Local/global dictionary or vocabulary?
Feature extraction/generation
Lexical features
Character-based features
Word-based features
Part-of-speech tagging features
Taxonomy features
Syntactic features
Semantic features
Feature representation and similarity
Vector space model
Binary
Term frequency (TF)
Inverse document frequency (IDF)
Term frequency-inverse document frequency (TF-IDF)
Similarity measures
Euclidean distance
Cosine distance
Pairwise-adaptive similarity
Extended Jaccard coefficient
Dice coefficient
Feature selection and dimensionality reduction
Feature selection
Information theoretic techniques
Statistical-based techniques
Frequency-based techniques
Dimensionality reduction
Topics in text mining
Text categorization/classification
Topic modeling
Probabilistic latent semantic analysis (PLSA)
Input and output
How does it work?
Advantages and limitations
Text clustering
Feature transformation, selection, and reduction
Clustering techniques
Generative probabilistic models
Input and output
How does it work?
Advantages and limitations
Distance-based text clustering
Non-negative matrix factorization (NMF)
Input and output
How does it work?
Advantages and limitations
Evaluation of text clustering
Named entity recognition
Hidden Markov models for NER
Input and output
How does it work?
Advantages and limitations
Maximum entropy Markov models for NER
Input and output
How does it work?
Advantages and limitations
Deep learning and NLP
Tools and usage
Mallet
KNIME
Topic modeling with mallet
Business problem
Machine Learning mapping
Data collection
Data sampling and transformation
Feature analysis and dimensionality reduction
Models, results, and evaluation
Analysis of text processing results
Summary
References
9. Big Data Machine Learning – The Final Frontier
What are the characteristics of Big Data?
Big Data Machine Learning
General Big Data framework
Big Data cluster deployment frameworks
Hortonworks Data Platform
Cloudera CDH
Amazon Elastic MapReduce
Microsoft Azure HDInsight
Data acquisition
Publish-subscribe frameworks
Source-sink frameworks
SQL frameworks
Message queueing frameworks
Custom frameworks
Data storage
HDFS
NoSQL
Key-value databases
Document databases
Columnar databases
Graph databases
Data processing and preparation
Hive and HQL
Spark SQL
Amazon Redshift
Real-time stream processing
Machine Learning
Visualization and analysis
Batch Big Data Machine Learning
H2O as Big Data Machine Learning platform
H2O architecture
Machine learning in H2O
Tools and usage
Case study
Business problem
Machine Learning mapping
Data collection
Data sampling and transformation
Experiments, results, and analysis
Feature relevance and analysis
Evaluation on test data
Analysis of results
Spark MLlib as Big Data Machine Learning platform
Spark architecture
Machine Learning in MLlib
Tools and usage
Experiments, results, and analysis
k-Means
k-Means with PCA
Bisecting k-Means (with PCA)
Gaussian Mixture Model
Random Forest
Analysis of results
Real-time Big Data Machine Learning
SAMOA as a real-time Big Data Machine Learning framework
SAMOA architecture
Machine Learning algorithms
Tools and usage
Experiments, results, and analysis
Analysis of results
The future of Machine Learning
Summary
References
A. Linear Algebra
Vector
Scalar product of vectors
Matrix
Transpose of a matrix
Matrix addition
Scalar multiplication
Matrix multiplication
Properties of matrix product
Linear transformation
Matrix inverse
Eigendecomposition
Positive definite matrix
Singular value decomposition (SVD)
B. Probability
Axioms of probability
Bayes' theorem
Density estimation
Mean
Variance
Standard deviation
Gaussian standard deviation
Covariance
Correlation coefficient
Binomial distribution
Poisson distribution
Gaussian distribution
Central limit theorem
Error propagation
Index