Preface
What’s in This Book?
Who Is “The Practitioner”?
Who Should Read This Book?
The Enterprise Machine Learning Practitioner
The practicing data scientist
The Java engineer
The Enterprise Executive
The Academic
Conventions Used in This Book
Using Code Examples
Administrative Notes
O’Reilly Safari
How to Contact Us
Acknowledgments
Josh
Adam
1. A Review of Machine Learning
The Learning Machines
How Can Machines Learn?
Biological Inspiration
What Is Deep Learning?
Going Down the Rabbit Hole
Framing the Questions
The Math Behind Machine Learning: Linear Algebra
Scalars
Vectors
Matrices
Tensors
Hyperplanes
Relevant Mathematical Operations
Dot product
Element-wise product
Outer product
Converting Data Into Vectors
Solving Systems of Equations
Methods for solving systems of linear equations
Iterative methods
Iterative methods and linear algebra
The Math Behind Machine Learning: Statistics
Probability
Conditional Probabilities
Posterior Probability
Distributions
Samples Versus Population
Resampling Methods
Selection Bias
Likelihood
How Does Machine Learning Work?
Regression
Setting up the model
Visualizing linear regression
Relating the linear regression model
Classification
Clustering
Underfitting and Overfitting
Optimization
Convex Optimization
Gradient Descent
Stochastic Gradient Descent
Mini-batch training and SGD
Quasi-Newton Optimization Methods
Generative Versus Discriminative Models
Logistic Regression
The Logistic Function
Understanding Logistic Regression Output
Evaluating Models
The Confusion Matrix
Sensitivity versus specificity
Accuracy
Precision
Recall
F1
Context and interpreting scores
Building an Understanding of Machine Learning
2. Foundations of Neural Networks and Deep Learning
Neural Networks
The Biological Neuron
Synapses
Dendrites
Axons
Information flow across the biological neuron
From biological to artificial
The Perceptron
History of the perceptron
Definition of the perceptron
The perceptron learning algorithm
Limitations of the early perceptron
Multilayer Feed-Forward Networks
Evolution of the artificial neuron
Artificial neuron input
Connection weights
Biases
Activation functions
Comparing the biological neuron and the artificial neuron
Feed-forward neural network architecture
Input layer
Hidden layer
Output layer
Connections between layers
Training Neural Networks
Backpropagation Learning
Algorithm intuition
A closer look at backpropagation
Understanding backpropagation pseudocode
Updating the output layer weights
Further expressing the error term
The new propagation rule for the error value
Updating the hidden layers
Activation Functions
Linear
Sigmoid
Tanh
Hard Tanh
Softmax
Rectified Linear
Leaky ReLU
Softplus
Loss Functions
Loss Function Notation
Loss Functions for Regression
Mean squared error loss
Other loss functions for regression
Mean absolute error loss
Mean squared log error loss
Mean absolute percentage error loss
Regression loss function discussion
Loss Functions for Classification
Hinge loss
Logistic loss
Negative log likelihood
Loss Functions for Reconstruction
Hyperparameters
Learning Rate
Regularization
Momentum
Sparsity
3. Fundamentals of Deep Networks
Defining Deep Learning
What Is Deep Learning?
Defining deep networks
Evolutionary progress and resurgence
Advances in network architecture
Advances in layer types
Advances in neuron types
Hybrid architectures
From feature engineering to automated feature learning
Feature engineering
Feature learning
Generative modeling
Inceptionism
Modeling artistic style
GANs
Recurrent Neural Networks
The Tao of deep learning
Organization of This Chapter
Common Architectural Principles of Deep Networks
Parameters
Layers
Activation Functions
Activation functions for general architecture
Hidden layer activation functions
Output layer for regression
Output layer for binary classification
Output layer for multiclass classification
Loss Functions
Reconstruction cross-entropy
Optimization Algorithms
First-order methods
Second-order methods
L-BFGS
Conjugate gradient
Hessian-free
Hyperparameters
Layer size
Magnitude hyperparameters
Learning rate
Nesterov’s momentum
AdaGrad
RMSProp
AdaDelta
ADAM
Regularization
Dropout
DropConnect
L1
L2
Mini-batching
Summary
Building Blocks of Deep Networks
RBMs
Network layout
Visible and hidden layers
Connections and weights
Biases
Training
Reconstruction
Other uses of RBMs
Autoencoders
Similarities to multilayer perceptrons
Defining features of autoencoders
Unsupervised learning of unlabeled data
Learning to reproduce the input data
Training autoencoders
Common variants of autoencoders
Compression autoencoders
Denoising autoencoders
Applications of autoencoders
Variational Autoencoders
4. Major Architectures of Deep Networks
Unsupervised Pretrained Networks
Deep Belief Networks
Feature Extraction with RBM Layers
Learning higher-order features automatically
Initializing the feed-forward network
Fine-tuning a DBN with a feed-forward multilayer neural network
Gentle backpropagation
The output layer
Current state of DBNs
Generative Adversarial Networks
Training generative models, unsupervised learning, and GANs
The discriminator network
The generative network
Building generative models and Deep Convolutional Generative Adversarial Networks
Conditional GANs
Comparing GANs and variational autoencoders
Convolutional Neural Networks (CNNs)
Biological Inspiration
Intuition
CNN Architecture Overview
Neuron spatial arrangements
Evolution of the connections between layers
Input Layers
Convolutional Layers
Convolution
Filters
Activation maps
Parameter sharing
Learned filters and renders
ReLU activation functions as layers
Convolutional layer hyperparameters
Filter size
Output depth
Stride
Zero-padding
Batch normalization and layers
Pooling Layers
Fully Connected Layers
Other Applications of CNNs
CNNs of Note
Summary
Recurrent Neural Networks
Modeling the Time Dimension
Lost in time
Temporal feedback and loops in connections
Sequences and time-series data
Understanding model input and output
3D Volumetric Input
Uneven time-series and masking
Why Not Markov Models?
General Recurrent Neural Network Architecture
Recurrent Neural Network architecture and time-steps
LSTM Networks
Properties of LSTM networks
LSTM network architecture
LSTM units
LSTM layers
Training
BPTT and truncated BPTT
Domain-Specific Applications and Blended Networks
Recursive Neural Networks
Network Architecture
Varieties of Recursive Neural Networks
Applications of Recursive Neural Networks
Summary and Discussion
Will Deep Learning Make Other Algorithms Obsolete?
Different Problems Have Different Best Methods
When Do I Need Deep Learning?
When to use deep learning
When to stick with traditional machine learning
5. Building Deep Networks
Matching Deep Networks to the Right Problem
Columnar Data and Multilayer Perceptrons
Images and Convolutional Neural Networks
Time-series Sequences and Recurrent Neural Networks
Using Hybrid Networks
The DL4J Suite of Tools
Vectorization and DataVec
Runtimes and ND4J
ND4J and the need for speed
JavaCPP
CPU backends
GPU backends
Benchmarking ND4J and DL4J
Basic Concepts of the DL4J API
Loading and Saving Models
Writing a trained model to disk
Writing to HDFS
Reading a saved model from disk
Reading from HDFS
Getting Input for the Model
Loading data during training
Setting Up Model Architecture
Building layer-oriented architectures
Hyperparameters
Training and Evaluation
Making a prediction
Training, validation, and test data
Modeling CSV Data with Multilayer Perceptron Networks
Setting Up Input Data
Determining Network Architecture
General hyperparameters
First hidden layer
Output layer for classification
Training the Model
Evaluating the Model
Modeling Handwritten Images Using CNNs
Java Code Listing for the LeNet CNN
Loading and Vectorizing the Input Images
Network Architecture for LeNet in DL4J
General hyperparameters
Convolution layers
Max-pooling layers
Output layer
Training the CNN
Modeling Sequence Data by Using Recurrent Neural Networks
Generating Shakespeare via LSTMs
High-level modeling workflow
Java code for modeling Shakespeare
Setting up input data and vectorization
LSTM network architecture
General comments on hyperparameters
Training the LSTM network
Generating Shakespeare samples
Classifying Sensor Time-series Sequences Using LSTMs
Java code listing for recurrent classification example
Setting up input data and vectorization
Network architecture and training
Using Autoencoders for Anomaly Detection
Java Code Listing for Autoencoder Example
Setting Up Input Data
Autoencoder Network Architecture and Training
Evaluating the Model
Using Variational Autoencoders to Reconstruct MNIST Digits
Code Listing to Reconstruct MNIST Digits
Examining the VAE Model
Understanding the scatterplot
Understanding the generated images
Applications of Deep Learning in Natural Language Processing
Learning Word Embedding Using Word2Vec
The Word2Vec model and algorithm
Modeling context
Learning similar meaning and semantic relationships
Vector arithmetic and word embedding
Java code listing for Word2Vec example
Understanding the Word2Vec example
Other practical uses of Word2Vec
Distributed Representations of Sentences with Paragraph Vectors
Building paragraph vectors
Understanding the paragraph vectors example
Using Paragraph Vectors for Document Classification
Understanding the paragraph vectors classification example
Further exploration of the Word2Vec approach
Extensions into specific domains: Gov2Vec
Graphs and Node2Vec
Recommendation engines and Item2Vec
Computer vision and FaceNet
6. Tuning Deep Networks
Basic Concepts in Tuning Deep Networks
An Intuition for Building Deep Networks
Building the Intuition as a Step-by-Step Process
Matching Input Data and Network Architectures
Summary
Relating Model Goal and Output Layers
Regression Model Output Layer
Classification Model Output Layer
Single-label classification models
Models with more than two labels
Multiclass classification models
Multilabel classification models
Working with Layer Count, Parameter Count, and Memory
Feed-Forward Multilayer Neural Networks
Determining hidden-layer count
Determining neuron count per layer
Controlling Layer and Parameter Counts
Getting the parameter count for a network
Estimating Network Memory Requirements
Weight Initialization Strategies
Using Activation Functions
Summary Table for Activation Functions
Applying Loss Functions
Understanding Learning Rates
Using the Ratio of Updates-to-Parameters
Specific Recommendations for Learning Rates
How Sparsity Affects Learning
Applying Methods of Optimization
SGD Best Practices
Using Parallelization and GPUs for Faster Training
Online Learning and Parallel Iterative Algorithms
Task parallelism
Data parallelism
Parallelizing SGD in DL4J
Parallel SGD execution
GPUs
Controlling Epochs and Mini-Batch Size
Understanding Mini-Batch Size Trade-Offs
How to Use Regularization
Priors as Regularizers
Max-Norm Regularization
Dropout
Issues with dropout
Other Regularization Topics
Working with Class Imbalance
Methods for Sampling Classes
Weighted Loss Functions
Dealing with Overfitting
Using Network Statistics from the Tuning UI
Detecting Poor Weight Initialization
Detecting Nonshuffled Data
Detecting Issues with Regularization
7. Tuning Specific Deep Network Architectures
Convolutional Neural Networks (CNNs)
Common Convolutional Architectural Patterns
Configuring Convolutional Layers
Setting the stride for filters
Using padding
Choosing the number of filters
Configuring filter size
Convolution mode and calculating spatial size of output volume
Configuring Pooling Layers
Transfer Learning
An alternative to training from scratch
When to consider trying transfer learning
Recurrent Neural Networks
Network Input Data and Input Layers
Output Layers and RnnOutputLayer
Training the Network
Initializing weights
Backpropagation through time
Regularization
Debugging Common Issues with LSTMs
Padding and Masking
Applying padding and masking to volumetric input
Evaluation and Scoring With Masking
Classification using the evaluation class
Scoring new data with MultiLayerNetwork
Variants of Recurrent Network Architectures
Restricted Boltzmann Machines
Hidden Units and Modeling Available Information
Using Different Units
Using Regularization with RBMs
DBNs
Using Momentum
Using Regularization
Dropout
Determining Hidden Unit Count
8. Vectorization
Introduction to Vectorization in Machine Learning
Why Do We Need to Vectorize Data?
Strategies for Dealing with Columnar Raw Data Attributes
Nominal
Ordinal
Interval
Ratio
Feature Engineering and Normalization Techniques
Feature copying
Normalization
Standardization and zero mean, unit variance
Min-max scaling
Whitening and principal component analysis
Applying normalization in Recurrent Neural Networks and CNNs
Normalization for regression models
Binarization
Using DataVec for ETL and Vectorization
Vectorizing Image Data
Image Data Representation in DL4J
Image Data and Vector Normalization with DataVec
Working with Sequential Data in Vectorization
Major Variations of Sequential Data Sources
Vectorizing Sequential Data with DataVec
Converting time-series to a single vector
Converting sequential data to a DataSet object in local mode
Building custom DataSets from sequential data
Working with Text in Vectorization
Bag of Words
TF-IDF
TF
IDF
Computing the full TF-IDF score
Comparing Word2Vec and VSM
Working with Graphs
9. Using Deep Learning and DL4J on Spark
Introduction to Using DL4J with Spark and Hadoop
Operating Spark from the Command Line
spark-submit
Working with Hadoop security and Kerberos
Uploading the Spark assembly
Initializing Kerberos
Configuring and Tuning Spark Execution
Running Spark on Mesos
Running Spark on YARN
Comparing Spark execution modes
General Spark Tuning Guide
Setting the number of executors
Spark executors and CPU cores
Spark executors and memory
Spark and YARN container resource allocation
Understanding executor memory requests in YARN
Understanding Spark, the JVM, and garbage collection
Dealing with slowing garbage collection efficiency or pauses
Selecting a garbage collector for the JVM and Spark
Tuning DL4J Jobs on Spark
Tuning the number of executors
Tuning the amount of memory for executors
Setting Up a Maven Project Object Model for Spark and DL4J
A pom.xml File Dependency Template
Setting Up a POM File for CDH 5.X
Setting Up a POM File for HDP 2.4
Troubleshooting Spark and Hadoop
Common Issues with ND4J
ND4J and Kryo serialization
jnind4j and java.library.path
DL4J Parallel Execution on Spark
A Minimal Spark Training Example
DL4J API Best Practices for Spark
Multilayer Perceptron Spark Example
Setting Up MLP Network Architecture for Spark
Distributed Training and Model Evaluation
Building and Executing a DL4J Spark Job
Generating Shakespeare Text with Spark and Long Short-Term Memory
Setting Up the LSTM Network Architecture
Training, Tracking Progress, and Understanding Results
Modeling MNIST with a Convolutional Neural Network on Spark
Configuring the Spark Job and Loading MNIST Data
Setting Up the LeNet CNN Architecture and Training
A. What Is Artificial Intelligence?
The Story So Far
Defining Deep Learning
Defining Artificial Intelligence
The study of intelligence
Cognitive dissonance and modern definitions
What AI is not
Moving the goal posts
Segmenting the definitions of AI
Critical commentary on segments
A fifth aspirational definition of AI
The AI winters
AI Winter I: 1974–1980
AI Winter II: Late 1980s
The common patterns of AI Winters
What Is Driving Interest Today in AI?
Winter Is Coming
B. RL4J and Reinforcement Learning
Preliminaries
Markov Decision Process
Terminology
Different Settings
Model-Free
Observation Setting
Single-Player and Adversarial Games
Q-Learning
From Policy to Neural Networks
Policy Iteration
Exploration Versus Exploitation
Bellman Equation
Initial State Sampling
Q-Learning Implementation
Modeling Q(s,a)
Experience Replay
Compression
Convolutional Layers and Image Preprocessing
Image processing
History Processing
Double Q-Learning
Clipping
Scaling Rewards
Prioritized Replay
Graph, Visualization, and Mean-Q
RL4J
Conclusion
C. Numbers Everyone Should Know
D. Neural Networks and Backpropagation: A Mathematical Approach
Introduction
Backpropagation in a Multilayer Perceptron
E. Using the ND4J API
Design and Basic Usage
Understanding NDArrays
ND4J General Syntax
The Basics of Working with NDArrays
The ND4J class
Nd4j.zeros( int ... )
Nd4j.ones( int ... )
Initializing with other values
Initializing with random numbers
Controlling the shape of NDArrays
Creating basic arrays
Example: create a 2 x 2 NDArray
Example: add two 2 x 2 NDArrays together
Creating NDArrays from Java arrays
Getting and setting individual NDArray values
Working with NDArray rows
Get a single row
Get multiple rows
Setting a single row
Quick reference for determining the size/dimensions of NDArrays
Dataset
Relationship to NDArray
Common uses
Creating Input Vectors
Basics of Vector Creation
Sizing the vector
Setting feature values
Setting the label
Single-label output
Multiple-label output
Regression output
Using MLLibUtil
Converting from INDArray to MLLib Vector
Converting from MLLib Vector to INDArray
Making Model Predictions with DL4J
Using DL4J and ND4J Together
Differences between output vectors depending on output layer type
Logistic output layer for binary classification
Softmax output layer for multilabel classification
Linear output layer for regression output
Getting the predicted label from the returned INDArray
F. Using DataVec
Loading Data for Machine Learning
Loading CSV Data for Multilayer Perceptrons
Loading Image Data for Convolutional Neural Networks
Loading Sequence Data for Recurrent Neural Networks
Transforming Data: Data Wrangling with DataVec
DataVec Transforms: Key Concepts
DataVec Transform Functionality: An Example
G. Working with DL4J from Source
Verifying Git Is Installed
Cloning Key DL4J GitHub Projects
Downloading Source via Zip File
Using Maven to Build Source Code
H. Setting Up DL4J Projects
Creating a New DL4J Project
Java
Working with Maven
A minimal Project Object Model file
Project Object Model explanation
IDEs
Quickstart a DL4J project by using IntelliJ
Setting Up Other Maven POMs
ND4J and Maven
I. Setting Up GPUs for DL4J Projects
Switching Backends to GPU
Picking a GPU
Training on a Multi-GPU System
CUDA on Different Platforms
Monitoring GPU Performance
NVIDIA System Management Interface
J. Troubleshooting DL4J Installations
Previous Installation
Memory Errors When Installing From Source
Older Versions of Maven
Maven and PATH Variables
Bad JDK Versions
C++ and Other Development Tools
Windows and Include Paths
Monitoring GPUs
Using JVisualVM
Working with Clojure
OS X and Float Support
Fork-Join Bug in Java 7
Precautions
Other Local Repositories
Check Maven Dependencies
Reinstall Dependencies
If All Else Fails
Different Platforms
OS X
Windows
Setting up Visual Studio
Working with Windows on 64-bit platforms
Linux
Ubuntu
CentOS
Index