Hands-On Reinforcement Learning with Python · Master reinforcement and deep reinforcement learning using OpenAI Gym and TensorFlow by Ravichandiran, Sudharsan

Index

Title Page
Copyright and Credits

Hands-On Reinforcement Learning with Python

Dedication
Packt Upsell

Why subscribe?
PacktPub.com

Contributors

About the author
About the reviewers
Packt is searching for authors like you

Preface

Who this book is for
What this book covers
To get the most out of this book

Download the example code files
Download the color images
Conventions used

Get in touch

Reviews

Introduction to Reinforcement Learning

What is RL?
RL algorithm
How RL differs from other ML paradigms
Elements of RL

Agent
Policy function
Value function
Model

Agent environment interface
Types of RL environment

Deterministic environment
Stochastic environment
Fully observable environment
Partially observable environment
Discrete environment
Continuous environment
Episodic and non-episodic environment
Single and multi-agent environment

RL platforms

OpenAI Gym and Universe
DeepMind Lab
RL-Glue
Project Malmo
ViZDoom

Applications of RL

Education
Medicine and healthcare
Manufacturing
Inventory management
Finance
Natural Language Processing and Computer Vision

Summary
Questions
Further reading

Getting Started with OpenAI and TensorFlow

Setting up your machine

Installing Anaconda
Installing Docker
Installing OpenAI Gym and Universe

Common error fixes

OpenAI Gym

Basic simulations
Training a robot to walk

OpenAI Universe

Building a video game bot

TensorFlow

Variables, constants, and placeholders

Variables
Constants
Placeholders

Computation graph
Sessions
TensorBoard

Adding scope

Summary
Questions
Further reading

The Markov Decision Process and Dynamic Programming

The Markov chain and Markov process
Markov Decision Process

Rewards and returns
Episodic and continuous tasks
Discount factor
The policy function
State value function
State-action value function (Q function)

The Bellman equation and optimality

Deriving the Bellman equation for value and Q functions

Solving the Bellman equation

Dynamic programming

Value iteration
Policy iteration

Solving the frozen lake problem

Value iteration
Policy iteration

Summary
Questions
Further reading

Gaming with Monte Carlo Methods

Monte Carlo methods

Estimating the value of pi using Monte Carlo

Monte Carlo prediction

First visit Monte Carlo
Every visit Monte Carlo
Let's play Blackjack with Monte Carlo

Monte Carlo control

Monte Carlo exploration starts
On-policy Monte Carlo control
Off-policy Monte Carlo control

Summary
Questions
Further reading

Temporal Difference Learning

TD learning
TD prediction
TD control

Q learning

Solving the taxi problem using Q learning

SARSA

Solving the taxi problem using SARSA

The difference between Q learning and SARSA
Summary
Questions
Further reading

Multi-Armed Bandit Problem

The MAB problem

The epsilon-greedy policy
The softmax exploration algorithm
The upper confidence bound algorithm
The Thompson sampling algorithm

Applications of MAB
Identifying the right advertisement banner using MAB
Contextual bandits
Summary
Questions
Further reading

Deep Learning Fundamentals

Artificial neurons
ANNs

Input layer
Hidden layer
Output layer
Activation functions

Deep diving into ANN

Gradient descent

Neural networks in TensorFlow
RNN

Backpropagation through time

Long Short-Term Memory RNN

Generating song lyrics using LSTM RNN

Convolutional neural networks

Convolutional layer
Pooling layer
Fully connected layer
CNN architecture

Classifying fashion products using CNN
Summary
Questions
Further reading

Atari Games with Deep Q Network

What is a Deep Q Network?
Architecture of DQN

Convolutional network
Experience replay
Target network
Clipping rewards
Understanding the algorithm

Building an agent to play Atari games
Double DQN
Prioritized experience replay
Dueling network architecture
Summary
Questions
Further reading

Playing Doom with a Deep Recurrent Q Network

DRQN

Architecture of DRQN

Training an agent to play Doom 

Basic Doom game
Doom with DRQN

DARQN

Architecture of DARQN

Summary
Questions
Further reading

The Asynchronous Advantage Actor Critic Network

The Asynchronous Advantage Actor Critic

The three As
The architecture of A3C
How A3C works

Driving up a mountain with A3C

Visualization in TensorBoard

Summary
Questions
Further reading

Policy Gradients and Optimization

Policy gradient

Lunar Lander using policy gradients

Deep deterministic policy gradient

Swinging a pendulum

Trust Region Policy Optimization
Proximal Policy Optimization
Summary
Questions
Further reading

Capstone Project – Car Racing Using DQN

Environment wrapper functions
Dueling network
Replay memory
Training the network
Car racing
Summary
Questions
Further reading

Recent Advancements and Next Steps

Imagination augmented agents
Learning from human preference
Deep Q learning from demonstrations
Hindsight experience replay
Hierarchical reinforcement learning

MAXQ Value Function Decomposition

Inverse reinforcement learning
Summary
Questions
Further reading

Assessments

Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13

Other Books You May Enjoy

Leave a review - let other readers know what you think
