The Reinforcement Learning Workshop
Preface
About the Book
Audience
About the Chapters
Conventions
Code Presentation
Setting up Your Environment
Installing Anaconda for Jupyter Notebook
Installing a Virtual Environment
Installing Gym
Installing TensorFlow 2
Installing PyTorch
Installing OpenAI Baselines
Installing Pillow
Installing Torch
Installing Other Libraries
Accessing the Code Files
1. Introduction to Reinforcement Learning
Introduction
Learning Paradigms
Introduction to Learning Paradigms
Supervised versus Unsupervised versus RL
Classifying Common Problems into Learning Scenarios
Predicting Whether an Image Contains a Dog or a Cat
Detecting and Classifying All Dogs and Cats in an Image
Playing Chess
Fundamentals of Reinforcement Learning
Elements of RL
Agent
Actions
Environment
Policy
An Example of an Autonomous Driving Environment
Exercise 1.01: Implementing a Toy Environment Using Python
The Agent-Environment Interface
What's the Agent? What's in the Environment?
Environment Types
Finite versus Continuous
Deterministic versus Stochastic
Fully Observable versus Partially Observable
POMDP versus MDP
Single Agents versus Multiple Agents
An Action and Its Types
Policy
Stochastic Policies
Policy Parameterizations
Exercise 1.02: Implementing a Linear Policy
Goals and Rewards
Why Discount?
Reinforcement Learning Frameworks
OpenAI Gym
Getting Started with Gym – CartPole
Gym Spaces
Exercise 1.03: Creating a Space for Image Observations
Rendering an Environment
Rendering CartPole
A Reinforcement Learning Loop with Gym
Exercise 1.04: Implementing the Reinforcement Learning Loop with Gym
Activity 1.01: Measuring the Performance of a Random Agent
OpenAI Baselines
Getting Started with Baselines – DQN on CartPole
Applications of Reinforcement Learning
Games
Go
Dota 2
StarCraft
Robot Control
Autonomous Driving
Summary
2. Markov Decision Processes and Bellman Equations
Introduction
Markov Processes
The Markov Property
Markov Chains
Markov Reward Processes
Value Functions and Bellman Equations for MRPs
Solving Linear Systems of Equations Using SciPy
Exercise 2.01: Finding the Value Function in an MRP
Markov Decision Processes
The State-Value Function and the Action-Value Function
Bellman Optimality Equation
Solving the Bellman Optimality Equation
Solving MDPs
Algorithm Categorization
Value-Based Algorithms
Policy Search Algorithms
Linear Programming
Exercise 2.02: Determining the Best Policy for an MDP Using Linear Programming
Gridworld
Activity 2.01: Solving Gridworld
Summary
3. Deep Learning in Practice with TensorFlow 2
Introduction
An Introduction to TensorFlow and Keras
TensorFlow
Keras
Exercise 3.01: Building a Sequential Model with the Keras High-Level API
How to Implement a Neural Network Using TensorFlow
Model Creation
Model Training
Loss Function Definition
Optimizer Choice
Learning Rate Scheduling
Feature Normalization
Model Validation
Performance Metrics
Model Improvement
Overfitting
Regularization
Early Stopping
Dropout
Data Augmentation
Batch Normalization
Model Testing and Inference
Standard Fully Connected Neural Networks
Exercise 3.02: Building a Fully Connected Neural Network Model with the Keras High-Level API
Convolutional Neural Networks
Exercise 3.03: Building a Convolutional Neural Network Model with the Keras High-Level API
Recurrent Neural Networks
Exercise 3.04: Building a Recurrent Neural Network Model with the Keras High-Level API
Simple Regression Using TensorFlow
Exercise 3.05: Creating a Deep Neural Network to Predict the Fuel Efficiency of Cars
Simple Classification Using TensorFlow
Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson
TensorBoard – How to Visualize Data Using TensorBoard
Exercise 3.07: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson Using TensorBoard for Visualization
Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
Summary
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
Introduction
OpenAI Gym
How to Interact with a Gym Environment
Exercise 4.01: Interacting with the Gym Environment
Action and Observation Spaces
How to Implement a Custom Gym Environment
OpenAI Universe – Complex Environment
OpenAI Universe Infrastructure
Environments
Atari Games
Flash Games
Browser Tasks
Running an OpenAI Universe Environment
Validating the Universe Infrastructure
TensorFlow for Reinforcement Learning
Implementing a Policy Network Using TensorFlow
Exercise 4.02: Building a Policy Network with TensorFlow
Exercise 4.03: Feeding the Policy Network with Environment State Representation
How to Save a Policy Network
OpenAI Baselines
Proximal Policy Optimization
Command-Line Usage
Methods in OpenAI Baselines
Custom Policy Network Architecture
Training an RL Agent to Solve a Classic Control Problem
Exercise 4.04: Solving a CartPole Environment with the PPO Algorithm
Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
Summary
5. Dynamic Programming
Introduction
Solving Dynamic Programming Problems
Memoization
The Tabular Method
Exercise 5.01: Memoization in Practice
Exercise 5.02: The Tabular Method in Practice
Identifying Dynamic Programming Problems
Optimal Substructures
Overlapping Subproblems
The Coin-Change Problem
Exercise 5.03: Solving the Coin-Change Problem
Dynamic Programming in RL
Policy and Value Iteration
State-Value Functions
Action-Value Functions
OpenAI Gym: Taxi-v3 Environment
Policy Iteration
Value Iteration
The FrozenLake-v0 Environment
Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
Summary
6. Monte Carlo Methods
Introduction
The Workings of Monte Carlo Methods
Understanding Monte Carlo with Blackjack
Exercise 6.01: Implementing Monte Carlo in Blackjack
Types of Monte Carlo Methods
First Visit Monte Carlo Prediction for Estimating the Value Function
Exercise 6.02: First Visit Monte Carlo Prediction for Estimating the Value Function in Blackjack
Every Visit Monte Carlo Prediction for Estimating the Value Function
Exercise 6.03: Every Visit Monte Carlo Prediction for Estimating the Value Function
Exploration versus Exploitation Trade-Off
Importance Sampling
The Pseudocode for Monte Carlo Off-Policy Evaluation
Exercise 6.04: Importance Sampling with Monte Carlo
Solving Frozen Lake Using Monte Carlo
Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
The Pseudocode for Every Visit Monte Carlo Control for Epsilon Soft
Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
Summary
7. Temporal Difference Learning
Introduction to TD Learning
TD(0) – SARSA and Q-Learning
SARSA – On-Policy Control
Exercise 7.01: Using TD(0) SARSA to Solve FrozenLake-v0 Deterministic Transitions
The Stochasticity Test
Exercise 7.02: Using TD(0) SARSA to Solve FrozenLake-v0 Stochastic Transitions
Q-Learning – Off-Policy Control
Exercise 7.03: Using TD(0) Q-Learning to Solve FrozenLake-v0 Deterministic Transitions
Expected SARSA
N-Step TD and TD(λ) Algorithms
N-Step TD
N-Step SARSA
N-Step Off-Policy Learning
TD(λ)
SARSA(λ)
Exercise 7.04: Using TD(λ) SARSA to Solve FrozenLake-v0 Deterministic Transitions
Exercise 7.05: Using TD(λ) SARSA to Solve FrozenLake-v0 Stochastic Transitions
The Relationship between DP, Monte Carlo, and TD Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
Summary
8. The Multi-Armed Bandit Problem
Introduction
Formulation of the MAB Problem
Applications of the MAB Problem
Background and Terminology
MAB Reward Distributions
The Python Interface
The Greedy Algorithm
Implementing the Greedy Algorithm
The Explore-then-Commit Algorithm
The ε-Greedy Algorithm
Exercise 8.01: Implementing the ε-Greedy Algorithm
The Softmax Algorithm
The UCB Algorithm
Optimism in the Face of Uncertainty
Other Properties of UCB
Exercise 8.02: Implementing the UCB Algorithm
Thompson Sampling
Introduction to Bayesian Probability
The Thompson Sampling Algorithm
Exercise 8.03: Implementing the Thompson Sampling Algorithm
Contextual Bandits
Context That Defines a Bandit Problem
Queueing Bandits
Working with the Queueing API
Activity 8.01: Queueing Bandits
Summary
9. What Is Deep Q-Learning?
Introduction
Basics of Deep Learning
Basics of PyTorch
Exercise 9.01: Building a Simple Deep Learning Model in PyTorch
PyTorch Utilities
The view Function
The squeeze Function
The unsqueeze Function
The max Function
The gather Function
The State-Value Function and the Bellman Equation
Expected Value
The Value Function
The Value Function for a Deterministic Environment
The Value Function for a Stochastic Environment
The Action-Value Function (Q Value Function)
Implementing Q Learning to Find Optimal Actions
Advantages of Q Learning
OpenAI Gym Review
Exercise 9.02: Implementing the Q Learning Tabular Method
Deep Q Learning
Exercise 9.03: Implementing a Working DQN Network with PyTorch in a CartPole-v0 Environment
Challenges in DQN
Correlation between Steps and the Convergence Issue
Experience Replay
The Challenge of a Non-Stationary Target
The Concept of a Target Network
Exercise 9.04: Implementing a Working DQN Network with Experience Replay and a Target Network in PyTorch
The Challenge of Overestimation in a DQN
Double Deep Q Network (DDQN)
Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
Summary
10. Playing an Atari Game with Deep Recurrent Q-Networks
Introduction
Understanding the Breakout Environment
Exercise 10.01: Playing Breakout with a Random Agent
CNNs in TensorFlow
Exercise 10.02: Designing a CNN Model with TensorFlow
Combining a DQN with a CNN
Activity 10.01: Training a DQN with CNNs to Play Breakout
RNNs in TensorFlow
Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow
Building a DRQN
Activity 10.02: Training a DRQN to Play Breakout
Introduction to the Attention Mechanism and DARQN
Activity 10.03: Training a DARQN to Play Breakout
Summary
11. Policy-Based Methods for Reinforcement Learning
Introduction
Introduction to Value-Based and Model-Based RL
Introduction to Actor-Critic Model
Policy Gradients
Exercise 11.01: Landing a Spacecraft on the Lunar Surface Using Policy Gradients and the Actor-Critic Method
Deep Deterministic Policy Gradients
Ornstein-Uhlenbeck Noise
The ReplayBuffer Class
The Actor-Critic Model
Exercise 11.02: Creating a Learning Agent
Activity 11.01: Creating an Agent That Learns a Model Using DDPG
Improving Policy Gradients
Trust Region Policy Optimization
Proximal Policy Optimization
Exercise 11.03: Improving the Lunar Lander Example Using PPO
The Advantage Actor-Critic Method
Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
Summary
12. Evolutionary Strategies for RL
Introduction
Problems with Gradient-Based Methods
Exercise 12.01: Optimization Using Stochastic Gradient Descent
Introduction to Genetic Algorithms
Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs
Components: Population Creation
Exercise 12.03: Population Creation
Components: Parent Selection
Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques
Components: Crossover Application
Exercise 12.05: Crossover for a New Generation
Components: Population Mutation
Exercise 12.06: New Generation Development Using Mutation
Application to Hyperparameter Selection
Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training
NEAT and Other Formulations
Exercise 12.08: XNOR Gate Functionality Using NEAT
Activity 12.01: Cart-Pole Activity
Summary
Appendix
1. Introduction to Reinforcement Learning
Activity 1.01: Measuring the Performance of a Random Agent
2. Markov Decision Processes and Bellman Equations
Activity 2.01: Solving Gridworld
3. Deep Learning in Practice with TensorFlow 2
Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
5. Dynamic Programming
Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
6. Monte Carlo Methods
Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
7. Temporal Difference Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
8. The Multi-Armed Bandit Problem
Activity 8.01: Queueing Bandits
9. What Is Deep Q-Learning?
Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
10. Playing an Atari Game with Deep Recurrent Q-Networks
Activity 10.01: Training a DQN with CNNs to Play Breakout
Activity 10.02: Training a DRQN to Play Breakout
Activity 10.03: Training a DARQN to Play Breakout
11. Policy-Based Methods for Reinforcement Learning
Activity 11.01: Creating an Agent That Learns a Model Using DDPG
Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
12. Evolutionary Strategies for RL
Activity 12.01: Cart-Pole Activity