Reinforcement Learning With TensorFlow · A Beginner's Guide to Designing Self-Learning Systems With TensorFlow and OpenAI Gym by Dutta, Sayon

Index

Title Page
Copyright and Credits

Reinforcement Learning with TensorFlow

Packt Upsell

Why subscribe?
PacktPub.com

Contributors

About the author
About the reviewer
Packt is searching for authors like you

Preface

Who this book is for
What this book covers
To get the most out of this book

Download the example code files
Download the color images
Conventions used

Get in touch

Reviews

Deep Learning – Architectures and Frameworks

Deep learning

Activation functions for deep learning

The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function

Logistic regression as a neural network

Notation
Objective
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent

What is Xavier initialization?
Why do we use Xavier initialization?

The neural network model

Recurrent neural networks
Long Short Term Memory Networks
Convolutional neural networks

The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model

Limitations of deep learning

The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning

Reinforcement learning

Basic terminologies and conventions
Optimality criteria

The value function for optimality
The policy model for optimality

The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic

Introduction to TensorFlow and OpenAI Gym

Basic computations in TensorFlow
An introduction to OpenAI Gym

The pioneers and breakthroughs in reinforcement learning

David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Libratus

Summary

Training Reinforcement Learning Agents Using OpenAI Gym

The OpenAI Gym

Understanding an OpenAI Gym environment

Programming an agent using an OpenAI Gym environment

Q-Learning

The Epsilon-Greedy approach

Using the Q-Network for real-world applications

Summary

Markov Decision Process

Markov decision processes

The Markov property
The S state set
Actions
Transition model
Rewards
Policy
The sequence of rewards - assumptions

The infinite horizons
Utility of sequences

The Bellman equations

Solving the Bellman equation to find policies

An example of value iteration using the Bellman equation
Policy iteration

Partially observable Markov decision processes

State estimation
Value iteration in POMDPs

Training the FrozenLake-v0 environment using MDP
Summary

Policy Gradients

The policy optimization method
Why policy optimization methods?

Why stochastic policy?

Example 1 - rock, paper, scissors
Example 2 - state aliased grid-world

Policy objective functions

Policy Gradient Theorem

Temporal difference rule

TD(1) rule
TD(0) rule
TD(λ) rule

Policy gradients

The Monte Carlo policy gradient
Actor-critic algorithms
Using a baseline to reduce variance
Vanilla policy gradient

Agent learning pong using policy gradients
Summary

Q-Learning and Deep Q-Networks

Why reinforcement learning?
Model based learning and model free learning

Monte Carlo learning
Temporal difference learning
On-policy and off-policy learning

Q-learning

The exploration exploitation dilemma
Q-learning for the mountain car problem in OpenAI gym

Deep Q-networks

Using a convolution neural network instead of a single layer neural network
Use of experience replay
Separate target network to compute the target Q-values
Advancements in deep Q-networks and beyond

Double DQN
Dueling DQN

Deep Q-network for mountain car problem in OpenAI gym
Deep Q-network for Cartpole problem in OpenAI gym
Deep Q-network for Atari Breakout in OpenAI gym

The Monte Carlo tree search algorithm

Minimax and game trees
The Monte Carlo Tree Search

The SARSA algorithm

SARSA algorithm for mountain car problem in OpenAI gym

Summary

Asynchronous Methods

Why asynchronous methods?
Asynchronous one-step Q-learning
Asynchronous one-step SARSA
Asynchronous n-step Q-learning
Asynchronous advantage actor critic
A3C for Pong-v0 in OpenAI gym
Summary

Robo Everything – Real Strategy Gaming

Real-time strategy games
Reinforcement learning and other approaches

Online case-based planning

Drawbacks to real-time strategy games

Why reinforcement learning?

Reinforcement learning in RTS gaming

Deep autoencoder
How is reinforcement learning better?

Summary

AlphaGo – Reinforcement Learning at Its Best

What is Go?

Go versus chess

How did DeepBlue defeat Garry Kasparov?

Why is the game tree approach no good for Go?

AlphaGo – mastering Go

Monte Carlo Tree Search
Architecture and properties of AlphaGo
Energy consumption analysis – Lee Sedol versus AlphaGo

AlphaGo Zero

Architecture and properties of AlphaGo Zero

Training process in AlphaGo Zero 

Summary

Reinforcement Learning in Autonomous Driving

Machine learning for autonomous driving
Reinforcement learning for autonomous driving

Creating autonomous driving agents
Why reinforcement learning?

Proposed frameworks for autonomous driving

Spatial aggregation

Sensor fusion
Spatial features

Recurrent temporal aggregation
Planning

DeepTraffic – MIT simulator for autonomous driving
Summary

Financial Portfolio Management

Introduction
Problem definition
Data preparation
Reinforcement learning
Further improvements
Summary

Reinforcement Learning in Robotics

Reinforcement learning in robotics

Evolution of reinforcement learning

Challenges in robot reinforcement learning

High dimensionality problem
Real-world challenges
Issues due to model uncertainty
What's the final objective a robot wants to achieve?

Open questions and practical challenges

Open questions
Practical challenges for robotic reinforcement learning

Key takeaways
Summary

Deep Reinforcement Learning in Ad Tech

Computational advertising challenges and bidding strategies

Business models used in advertising
Sponsored-search advertisements

Search-advertisement management
Adwords

Bidding strategies of advertisers

Real-time bidding by reinforcement learning in display advertising
Summary

Reinforcement Learning in Image Processing

Hierarchical object detection with deep reinforcement learning

Related works

Region-based convolution neural networks
Spatial pyramid pooling networks
Fast R-CNN
Faster R-CNN
You Only Look Once
Single Shot Detector

Hierarchical object detection model

State
Actions
Reward
Model and training

Training specifics

Summary

Deep Reinforcement Learning in NLP

Text summarization

Deep reinforced model for Abstractive Summarization

Neural intra-attention model

Intra-temporal attention on input sequence while decoding
Intra-decoder attention
Token generation and pointer

Hybrid learning objective

Supervised learning with teacher forcing
Policy learning
Mixed training objective function

Text question answering

Mixed objective and deep residual coattention for Question Answering

Deep residual coattention encoder
Mixed objective using self-critical policy learning

Summary

Further topics in Reinforcement Learning

Continuous action space algorithms

Trust region policy optimization
Deterministic policy gradients

Scoring mechanism in sequential models in NLP

BLEU

What is BLEU score and what does it do?

ROUGE

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
