The Reinforcement Learning Workshop
Preface
About the Book
Audience
About the Chapters
Conventions
Code Presentation
Setting up Your Environment
Installing Anaconda for Jupyter Notebook
Installing a Virtual Environment
Installing Gym
Installing TensorFlow 2
Installing PyTorch
Installing OpenAI Baselines
Installing Pillow
Installing Torch
Installing Other Libraries
Accessing the Code Files
1. Introduction to Reinforcement Learning
Introduction
Learning Paradigms
Introduction to Learning Paradigms
Supervised versus Unsupervised versus RL
Classifying Common Problems into Learning Scenarios
Predicting Whether an Image Contains a Dog or a Cat
Detecting and Classifying All Dogs and Cats in an Image
Playing Chess
Fundamentals of Reinforcement Learning
Elements of RL
Agent
Actions
Environment
Policy
An Example of an Autonomous Driving Environment
Exercise 1.01: Implementing a Toy Environment Using Python
The Agent-Environment Interface
What's the Agent? What's in the Environment?
Environment Types
Finite versus Continuous
Deterministic versus Stochastic
Fully Observable versus Partially Observable
POMDP versus MDP
Single Agents versus Multiple Agents
An Action and Its Types
Policy
Stochastic Policies
Policy Parameterizations
Exercise 1.02: Implementing a Linear Policy
Goals and Rewards
Why Discount?
Reinforcement Learning Frameworks
OpenAI Gym
Getting Started with Gym – CartPole
Gym Spaces
Exercise 1.03: Creating a Space for Image Observations
Rendering an Environment
Rendering CartPole
A Reinforcement Learning Loop with Gym
Exercise 1.04: Implementing the Reinforcement Learning Loop with Gym
Activity 1.01: Measuring the Performance of a Random Agent
OpenAI Baselines
Getting Started with Baselines – DQN on CartPole
Applications of Reinforcement Learning
Games
Go
Dota 2
StarCraft
Robot Control
Autonomous Driving
Summary
2. Markov Decision Processes and Bellman Equations
Introduction
Markov Processes
The Markov Property
Markov Chains
Markov Reward Processes
Value Functions and Bellman Equations for MRPs
Solving Linear Systems of Equations Using SciPy
Exercise 2.01: Finding the Value Function in an MRP
Markov Decision Processes
The State-Value Function and the Action-Value Function
Bellman Optimality Equation
Solving the Bellman Optimality Equation
Solving MDPs
Algorithm Categorization
Value-Based Algorithms
Policy Search Algorithms
Linear Programming
Exercise 2.02: Determining the Best Policy for an MDP Using Linear Programming
Gridworld
Activity 2.01: Solving Gridworld
Summary
3. Deep Learning in Practice with TensorFlow 2
Introduction
An Introduction to TensorFlow and Keras
TensorFlow
Keras
Exercise 3.01: Building a Sequential Model with the Keras High-Level API
How to Implement a Neural Network Using TensorFlow
Model Creation
Model Training
Loss Function Definition
Optimizer Choice
Learning Rate Scheduling
Feature Normalization
Model Validation
Performance Metrics
Model Improvement
Overfitting
Regularization
Early Stopping
Dropout
Data Augmentation
Batch Normalization
Model Testing and Inference
Standard Fully Connected Neural Networks
Exercise 3.02: Building a Fully Connected Neural Network Model with the Keras High-Level API
Convolutional Neural Networks
Exercise 3.03: Building a Convolutional Neural Network Model with the Keras High-Level API
Recurrent Neural Networks
Exercise 3.04: Building a Recurrent Neural Network Model with the Keras High-Level API
Simple Regression Using TensorFlow
Exercise 3.05: Creating a Deep Neural Network to Predict the Fuel Efficiency of Cars
Simple Classification Using TensorFlow
Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson
TensorBoard – How to Visualize Data Using TensorBoard
Exercise 3.07: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson Using TensorBoard for Visualization
Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
Summary
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
Introduction
OpenAI Gym
How to Interact with a Gym Environment
Exercise 4.01: Interacting with the Gym Environment
Action and Observation Spaces
How to Implement a Custom Gym Environment
OpenAI Universe – Complex Environment
OpenAI Universe Infrastructure
Environments
Atari Games
Flash Games
Browser Tasks
Running an OpenAI Universe Environment
Validating the Universe Infrastructure
TensorFlow for Reinforcement Learning
Implementing a Policy Network Using TensorFlow
Exercise 4.02: Building a Policy Network with TensorFlow
Exercise 4.03: Feeding the Policy Network with Environment State Representation
How to Save a Policy Network
OpenAI Baselines
Proximal Policy Optimization
Command-Line Usage
Methods in OpenAI Baselines
Custom Policy Network Architecture
Training an RL Agent to Solve a Classic Control Problem
Exercise 4.04: Solving a CartPole Environment with the PPO Algorithm
Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
Summary
5. Dynamic Programming
Introduction
Solving Dynamic Programming Problems
Memoization
The Tabular Method
Exercise 5.01: Memoization in Practice
Exercise 5.02: The Tabular Method in Practice
Identifying Dynamic Programming Problems
Optimal Substructures
Overlapping Subproblems
The Coin-Change Problem
Exercise 5.03: Solving the Coin-Change Problem
Dynamic Programming in RL
Policy and Value Iteration
State-Value Functions
Action-Value Functions
OpenAI Gym: Taxi-v3 Environment
Policy Iteration
Value Iteration
The FrozenLake-v0 Environment
Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
Summary
6. Monte Carlo Methods
Introduction
The Workings of Monte Carlo Methods
Understanding Monte Carlo with Blackjack
Exercise 6.01: Implementing Monte Carlo in Blackjack
Types of Monte Carlo Methods
First Visit Monte Carlo Prediction for Estimating the Value Function
Exercise 6.02: First Visit Monte Carlo Prediction for Estimating the Value Function in Blackjack
Every Visit Monte Carlo Prediction for Estimating the Value Function
Exercise 6.03: Every Visit Monte Carlo Prediction for Estimating the Value Function
Exploration versus Exploitation Trade-Off
Importance Sampling
The Pseudocode for Monte Carlo Off-Policy Evaluation
Exercise 6.04: Importance Sampling with Monte Carlo
Solving Frozen Lake Using Monte Carlo
Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
The Pseudocode for Every Visit Monte Carlo Control for Epsilon Soft
Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
Summary
7. Temporal Difference Learning
Introduction to TD Learning
TD(0) – SARSA and Q-Learning
SARSA – On-Policy Control
Exercise 7.01: Using TD(0) SARSA to Solve FrozenLake-v0 Deterministic Transitions
The Stochasticity Test
Exercise 7.02: Using TD(0) SARSA to Solve FrozenLake-v0 Stochastic Transitions
Q-Learning – Off-Policy Control
Exercise 7.03: Using TD(0) Q-Learning to Solve FrozenLake-v0 Deterministic Transitions
Expected SARSA
N-Step TD and TD(λ) Algorithms
N-Step TD
N-Step SARSA
N-Step Off-Policy Learning
TD(λ)
SARSA(λ)
Exercise 7.04: Using TD(λ) SARSA to Solve FrozenLake-v0 Deterministic Transitions
Exercise 7.05: Using TD(λ) SARSA to Solve FrozenLake-v0 Stochastic Transitions
The Relationship between DP, Monte Carlo, and TD Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
Summary
8. The Multi-Armed Bandit Problem
Introduction
Formulation of the MAB Problem
Applications of the MAB Problem
Background and Terminology
MAB Reward Distributions
The Python Interface
The Greedy Algorithm
Implementing the Greedy Algorithm
The Explore-then-Commit Algorithm
The ε-Greedy Algorithm
Exercise 8.01: Implementing the ε-Greedy Algorithm
The Softmax Algorithm
The UCB Algorithm
Optimism in the Face of Uncertainty
Other Properties of UCB
Exercise 8.02: Implementing the UCB Algorithm
Thompson Sampling
Introduction to Bayesian Probability
The Thompson Sampling Algorithm
Exercise 8.03: Implementing the Thompson Sampling Algorithm
Contextual Bandits
Context That Defines a Bandit Problem
Queueing Bandits
Working with the Queueing API
Activity 8.01: Queueing Bandits
Summary
9. What Is Deep Q-Learning?
Introduction
Basics of Deep Learning
Basics of PyTorch
Exercise 9.01: Building a Simple Deep Learning Model in PyTorch
PyTorch Utilities
The view Function
The squeeze Function
The unsqueeze Function
The max Function
The gather Function
The State-Value Function and the Bellman Equation
Expected Value
The Value Function
The Value Function for a Deterministic Environment
The Value Function for a Stochastic Environment
The Action-Value Function (Q Value Function)
Implementing Q Learning to Find Optimal Actions
Advantages of Q Learning
OpenAI Gym Review
Exercise 9.02: Implementing the Q Learning Tabular Method
Deep Q Learning
Exercise 9.03: Implementing a Working DQN Network with PyTorch in a CartPole-v0 Environment
Challenges in DQN
Correlation between Steps and the Convergence Issue
Experience Replay
The Challenge of a Non-Stationary Target
The Concept of a Target Network
Exercise 9.04: Implementing a Working DQN Network with Experience Replay and a Target Network in PyTorch
The Challenge of Overestimation in a DQN
Double Deep Q Network (DDQN)
Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
Summary
10. Playing an Atari Game with Deep Recurrent Q-Networks
Introduction
Understanding the Breakout Environment
Exercise 10.01: Playing Breakout with a Random Agent
CNNs in TensorFlow
Exercise 10.02: Designing a CNN Model with TensorFlow
Combining a DQN with a CNN
Activity 10.01: Training a DQN with CNNs to Play Breakout
RNNs in TensorFlow
Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow
Building a DRQN
Activity 10.02: Training a DRQN to Play Breakout
Introduction to the Attention Mechanism and DARQN
Activity 10.03: Training a DARQN to Play Breakout
Summary
11. Policy-Based Methods for Reinforcement Learning
Introduction
Introduction to Value-Based and Model-Based RL
Introduction to Actor-Critic Model
Policy Gradients
Exercise 11.01: Landing a Spacecraft on the Lunar Surface Using Policy Gradients and the Actor-Critic Method
Deep Deterministic Policy Gradients
Ornstein-Uhlenbeck Noise
The ReplayBuffer Class
The Actor-Critic Model
Exercise 11.02: Creating a Learning Agent
Activity 11.01: Creating an Agent That Learns a Model Using DDPG
Improving Policy Gradients
Trust Region Policy Optimization
Proximal Policy Optimization
Exercise 11.03: Improving the Lunar Lander Example Using PPO
The Advantage Actor-Critic Method
Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
Summary
12. Evolutionary Strategies for RL
Introduction
Problems with Gradient-Based Methods
Exercise 12.01: Optimization Using Stochastic Gradient Descent
Introduction to Genetic Algorithms
Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs
Components: Population Creation
Exercise 12.03: Population Creation
Components: Parent Selection
Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques
Components: Crossover Application
Exercise 12.05: Crossover for a New Generation
Components: Population Mutation
Exercise 12.06: New Generation Development Using Mutation
Application to Hyperparameter Selection
Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training
NEAT and Other Formulations
Exercise 12.08: XNOR Gate Functionality Using NEAT
Activity 12.01: Cart-Pole Activity
Summary
Appendix
1. Introduction to Reinforcement Learning
Activity 1.01: Measuring the Performance of a Random Agent
2. Markov Decision Processes and Bellman Equations
Activity 2.01: Solving Gridworld
3. Deep Learning in Practice with TensorFlow 2
Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
5. Dynamic Programming
Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
6. Monte Carlo Methods
Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
7. Temporal Difference Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
8. The Multi-Armed Bandit Problem
Activity 8.01: Queueing Bandits
9. What Is Deep Q-Learning?
Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
10. Playing an Atari Game with Deep Recurrent Q-Networks
Activity 10.01: Training a DQN with CNNs to Play Breakout
Activity 10.02: Training a DRQN to Play Breakout
Activity 10.03: Training a DARQN to Play Breakout
11. Policy-Based Methods for Reinforcement Learning
Activity 11.01: Creating an Agent That Learns a Model Using DDPG
Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
12. Evolutionary Strategies for RL
Activity 12.01: Cart-Pole Activity