Index
Title Page
Copyright and Credits
Hands-On Reinforcement Learning for Games
Dedication
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Exploring the Environment
Understanding Rewards-Based Learning
Technical requirements
Understanding rewards-based learning
The elements of RL
The history of RL
Why RL in games?
Introducing the Markov decision process
The Markov property and MDP
Building an MDP
Using value learning with multi-armed bandits
Coding a value learner
Implementing a greedy policy
Exploration versus exploitation
Exploring Q-learning with contextual bandits
Implementing a Q-learning agent
Removing discounted rewards
Summary
Questions
Dynamic Programming and the Bellman Equation
Introducing DP
Regular programming versus DP
Enter DP and memoization
Understanding the Bellman equation
Unraveling the finite MDP
The Bellman optimality equation
Building policy iteration
Installing OpenAI Gym
Testing Gym
Policy evaluation
Policy improvement
Building value iteration
Playing with policy versus value iteration
Exercises
Summary
Monte Carlo Methods
Understanding model-based and model-free learning
Introducing the Monte Carlo method
Solving for π
Implementing Monte Carlo
Plotting the guesses
Adding RL
Monte Carlo control
Playing the FrozenLake game
Using prediction and control
Incremental means
Exercises
Summary
Temporal Difference Learning
Understanding the TCA problem
Introducing TDL
Bootstrapping and backup diagrams
Applying TD prediction
TD(0) or one-step TD
Tuning hyperparameters
Applying TDL to Q-learning
Exploring TD(0) in Q-learning
Exploration versus exploitation revisited
Teaching an agent to drive a taxi
Running off- versus on-policy
Exercises
Summary
Exploring SARSA
Exploring SARSA on-policy learning
Using continuous spaces with SARSA
Discretizing continuous state spaces
Expected SARSA
Extending continuous spaces
Working with TD(λ) and eligibility traces
Backward views and eligibility traces
Understanding SARSA(λ)
SARSA lambda and the Lunar Lander
Exercises
Summary
Section 2: Exploiting the Knowledge
Going Deep with DQN
DL for RL
DL frameworks for DRL
Using PyTorch for DL
Computational graphs with tensors
Training a neural network – computational graph
Building neural networks with Torch
Understanding DQN in PyTorch
Refreshing the environment
Partially observable Markov decision process
Constructing DQN
The replay buffer
The DQN class
Calculating loss and training
Exercising DQN
Revisiting the LunarLander and beyond
Exercises
Summary
Going Deeper with DDQN
Understanding visual state
Encoding visual state
Introducing CNNs
Working with a DQN on Atari
Adding CNN layers
Introducing DDQN
Double DQN or the fixed Q targets
Dueling DQN or the real DDQN
Extending replay with prioritized experience replay
Exercises
Summary
Policy Gradient Methods
Understanding policy gradient methods
Policy gradient ascent
Introducing REINFORCE
Using advantage actor-critic
Actor-critic
Training advantage AC
Building a deep deterministic policy gradient
Training DDPG
Exploring trust region policy optimization
Conjugate gradients
Trust region methods
The TRPO step
Exercises
Summary
Optimizing for Continuous Control
Understanding continuous control with MuJoCo
Introducing proximal policy optimization
The hows of policy optimization
PPO and clipped objectives
Using PPO with recurrent networks
Deciding on synchronous and asynchronous actors
Using A2C
Using A3C
Building actor-critic with experience replay
Exercises
Summary
All about Rainbow DQN
Rainbow – combining improvements in deep reinforcement learning
Using TensorBoard
Introducing distributional RL
Back to TensorBoard
Understanding noisy networks
Noisy networks for exploration and importance sampling
Unveiling Rainbow DQN
When does training fail?
Exercises
Summary
Exploiting ML-Agents
Installing ML-Agents
Building a Unity environment
Building for Gym wrappers
Training a Unity environment with Rainbow
Creating a new environment
Coding an agent/environment
Advancing RL with ML-Agents
Curriculum learning
Behavioral cloning
Curiosity learning
Training generalized reinforcement learning agents
Exercises
Summary
DRL Frameworks
Choosing a framework
Introducing Google Dopamine
Playing with Keras-RL
Exploring RLlib
Using TF-Agents
Exercises
Summary
Section 3: Reward Yourself
3D Worlds
Reasoning on 3D worlds
Training a visual agent
Generalizing 3D vision
ResNet for visual observation encoding
Challenging the Unity Obstacle Tower Challenge
Pre-training the agent
Prierarchy – implicit hierarchies
Exploring Habitat – embodied agents by FAIR
Installing Habitat
Training in Habitat
Exercises
Summary
From DRL to AGI
Learning meta learning
Learning 2 learn
Model-agnostic meta learning
Training a meta learner
Introducing meta reinforcement learning
MAML-RL
Using hindsight experience replay
Imagination and reasoning in RL
Generating imagination
Understanding imagination-augmented agents
Exercises
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think