Practical Reinforcement Learning · Develop Self-Evolving, Intelligent Agents With OpenAI Gym, Python and Java by Akhtar, Engr. S.M. Farrukh

Index

Preface

What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support

Downloading the example code
Errata
Piracy
Questions

Reinforcement Learning

Overview of machine learning

What is machine learning?

Speech conversion from one language to another
Suspicious activity detection from CCTVs
Medical diagnostics for detecting diseases

Supervised learning
Unsupervised learning
Reinforcement learning

Introduction to reinforcement learning

Positive reinforcement learning
Negative reinforcement learning
Applications of reinforcement learning
Self-driving cars
Drone autonomous aerial taxi
Aerobatics autonomous helicopter
TD-Gammon – computer game
AlphaGo

The agent environment setup

Exploration versus exploitation
Neural network and reinforcement learning
Reinforcement learning frameworks/toolkits

OpenAI Gym

Getting Started with OpenAI Gym
Docker

Docker installation on Windows environment
Docker installation on a Linux environment

Running an environment

Brown-UMBC Reinforcement Learning and Planning

Walkthrough with Hello GridWorld
Hello GridWorld project

Summary

Markov Decision Process

Introduction to MDP

State
Action
Model
Reward
Policy
MDP - more about rewards
Optimal policy
More about policy

Bellman equation
A practical example of building an MDP domain

GridWorld
Terminal states
Java interfaces for MDP definitions

Single-agent domain
State
Action
Action type
SampleModel
Environment
EnvironmentOutcome
TransitionProb

Defining a GridWorld state
Defining a GridWorld model
Creating the state visualizer
Testing it out

Markov chain
Building an object-oriented MDP domain
Summary

Dynamic Programming

Learning and planning
Evaluating a policy
Value iteration

Value iteration implementation using BURLAP

Output of the value iteration

Policy iteration
Bellman equations

The relationship between Bellman equations

Summary

Temporal Difference Learning

Introducing TD learning
TD lambda
Estimating from data
Learning rate

Properties of learning rate

Overview of TD(1)

An example of TD(1)
Why TD(1) is wrong

Overview of TD(0)
TD lambda rule
K-step estimator
Relationship between k-step estimators and TD lambda
Summary

Monte Carlo Methods

Monte Carlo methods

First visit Monte Carlo
Example – Blackjack

Objective of the game
Card scoring/values
The deal
Naturals
The gameplay
Applying the Monte Carlo approach
Blackjack game implementation

Monte Carlo for control

Monte Carlo Exploring Starts
Example – Blackjack

Summary

Learning and Planning

Q-learning

Q-learning example by hand
Value iteration

Testing the value iteration code

Q-learning code

Testing Q-learning code
Output of the Q-learning program

Summary

Deep Reinforcement Learning

What is a neural network?

A single neuron
Feed-forward neural network
Multi-Layer Perceptron

Deep learning
Deep Q Network

Experience replay
The DQN algorithm
DQN example – PyTorch and Gym

Task

Packages
Replay memory
Q-network
Input extraction
Training
Training loop

Example – Flappy Bird using Keras

Dependencies
qlearn.py
Game screen input
Image preprocessing
Convolution Neural Network
DQN implementation
Complete code

Output

Summary

Game Theory

Introduction to game theory

Example of game theory
Minimax
Fundamental results
Game tree
von Neumann theorem
Mini Poker game
Mixed strategies

OpenAI Gym examples

Agents
Environments

Example 1 – simple random agent
Example 2 – learning agent
Example 3 – keyboard learning agent

Summary

Reinforcement Learning Showdown

Reinforcement learning frameworks

PyBrain

Setup
Ready to code
Environment
Agent
Task
Experiment

RLPy

Setup
Ready to code

Maja Machine Learning Framework

Setup

RL-Glue

Setup

RL-Glue components
Sample project
sample_sarsa_agent.py
sample_mines_environment.py
sample_experiment.py

Mindpark

Setup

Summary

Applications and Case Studies – Reinforcement Learning

Inverse Reinforcement Learning

IRL algorithm
Implementing a car obstacle avoidance problem

Results and observations

Partially Observable Markov Decision Process

POMDP example
State estimator
Value iteration in POMDP
Reinforcement learning for POMDP

Summary

Current Research – Reinforcement Learning

Hierarchical reinforcement learning

Advantages of hierarchical reinforcement learning
The SMDP model
Hierarchical RL model

Reinforcement learning with hierarchies of abstract machines

HAM framework
Running a HAM algorithm
HAM for mobile robot example
HAM for a RoboCup keepaway example

MAXQ value function decomposition

Taxi world example
Decomposition of the projected value function

Summary

Article

State
Action
Model
Reward
Policy
MDP - more about rewards
Optimal policy
More about Policy
Summary
