Imperial Library
Index
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Reinforcement Learning
Overview of machine learning
What is machine learning?
Speech conversion from one language to another
Suspicious activity detection from CCTVs
Medical diagnostics for detecting diseases
Supervised learning
Unsupervised learning
Reinforcement learning
Introduction to reinforcement learning
Positive reinforcement learning
Negative reinforcement learning
Applications of reinforcement learning
Self-driving cars
Drone autonomous aerial taxi
Aerobatics autonomous helicopter
TD-Gammon – computer game
AlphaGo
The agent environment setup
Exploration versus exploitation
Neural network and reinforcement learning
Reinforcement learning frameworks/toolkits
OpenAI Gym
Getting Started with OpenAI Gym
Docker
Docker installation on Windows environment
Docker installation on a Linux environment
Running an environment
Brown-UMBC Reinforcement Learning and Planning
Walkthrough with Hello GridWorld
Hello GridWorld project
Summary
Markov Decision Process
Introduction to MDP
State
Action
Model
Reward
Policy
MDP - more about rewards
Optimal policy
More about policy
Bellman equation
A practical example of building an MDP domain
GridWorld
Terminal states
Java interfaces for MDP definitions
Single-agent domain
State
Action
Action type
SampleModel
Environment
EnvironmentOutcome
TransitionProb
Defining a GridWorld state
Defining a GridWorld model
Creating the state visualizer
Testing it out
Markov chain
Building an object-oriented MDP domain
Summary
Dynamic Programming
Learning and planning
Evaluating a policy
Value iteration
Value iteration implementation using BURLAP
Output of the value iteration
Policy iteration
Bellman equations
The relationship between Bellman equations
Summary
Temporal Difference Learning
Introducing TD learning
TD lambda
Estimating from data
Learning rate
Properties of learning rate
Overview of TD(1)
An example of TD(1)
Why TD(1) is wrong
Overview of TD(0)
TD lambda rule
K-step estimator
Relationship between k-step estimators and TD lambda
Summary
Monte Carlo Methods
Monte Carlo methods
First visit Monte Carlo
Example – Blackjack
Objective of the game
Card scoring/values
The deal
Naturals
The gameplay
Applying the Monte Carlo approach
Blackjack game implementation
Monte Carlo for control
Monte Carlo Exploring Starts
Example – Blackjack
Summary
Learning and Planning
Q-learning
Q-learning example by hand
Value iteration
Testing the value iteration code
Q-learning code
Testing Q-learning code
Output of the Q-learning program
Summary
Deep Reinforcement Learning
What is a neural network?
A single neuron
Feed-forward neural network
Multi-Layer Perceptron
Deep learning
Deep Q Network
Experience replay
The DQN algorithm
DQN example – PyTorch and Gym
Task
Packages
Replay memory
Q-network
Input extraction
Training
Training loop
Example – Flappy Bird using Keras
Dependencies
qlearn.py
Game screen input
Image preprocessing
Convolution Neural Network
DQN implementation
Complete code
Output
Summary
Game Theory
Introduction to game theory
Example of game theory
Minimax
Fundamental results
Game tree
von Neumann theorem
Mini Poker game
Mixed strategies
OpenAI Gym examples
Agents
Environments
Example 1 – simple random agent
Example 2 – learning agent
Example 3 – keyboard learning agent
Summary
Reinforcement Learning Showdown
Reinforcement learning frameworks
PyBrain
Setup
Ready to code
Environment
Agent
Task
Experiment
RLPy
Setup
Ready to code
Maja Machine Learning Framework
Setup
RL-Glue
Setup
RL-Glue components
Sample project
sample_sarsa_agent.py
sample_mines_environment.py
sample_experiment.py
Mindpark
Setup
Summary
Applications and Case Studies – Reinforcement Learning
Inverse Reinforcement Learning
IRL algorithm
Implementing a car obstacle avoidance problem
Results and observations
Partially Observable Markov Decision Process
POMDP example
State estimator
Value iteration in POMDP
Reinforcement learning for POMDP
Summary
Current Research – Reinforcement Learning
Hierarchical reinforcement learning
Advantages of hierarchical reinforcement learning
The SMDP model
Hierarchical RL model
Reinforcement learning with hierarchies of abstract machines
HAM framework
Running a HAM algorithm
HAM for mobile robot example
HAM for a RoboCup keepaway example
MAXQ value function decomposition
Taxi world example
Decomposition of the projected value function
Summary
Article
State
Action
Model
Reward
Policy
MDP - more about rewards
Optimal policy
More about policy
Summary