Index
Title Page
Copyright and Credits
  Hands-On Reinforcement Learning with Python
Dedication
Packt Upsell
  Why subscribe?
  PacktPub.com
Contributors
  About the author
  About the reviewers
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
  Get in touch
    Reviews
Introduction to Reinforcement Learning
  What is RL?
  RL algorithm
  How RL differs from other ML paradigms
  Elements of RL
    Agent
    Policy function
    Value function
    Model
  Agent environment interface
  Types of RL environment
    Deterministic environment
    Stochastic environment
    Fully observable environment
    Partially observable environment
    Discrete environment
    Continuous environment
    Episodic and non-episodic environment
    Single and multi-agent environment
  RL platforms
    OpenAI Gym and Universe
    DeepMind Lab
    RL-Glue
    Project Malmo
    ViZDoom
  Applications of RL
    Education
    Medicine and healthcare
    Manufacturing
    Inventory management
    Finance
    Natural Language Processing and Computer Vision
  Summary
  Questions
  Further reading
Getting Started with OpenAI and TensorFlow
  Setting up your machine
    Installing Anaconda
    Installing Docker
    Installing OpenAI Gym and Universe
      Common error fixes
  OpenAI Gym
    Basic simulations
    Training a robot to walk
  OpenAI Universe
    Building a video game bot
  TensorFlow
    Variables, constants, and placeholders
      Variables
      Constants
      Placeholders
    Computation graph
    Sessions
    TensorBoard
      Adding scope
  Summary
  Questions
  Further reading
The Markov Decision Process and Dynamic Programming
  The Markov chain and Markov process
  Markov Decision Process
    Rewards and returns
    Episodic and continuous tasks
    Discount factor
    The policy function
    State value function
    State-action value function (Q function)
  The Bellman equation and optimality
    Deriving the Bellman equation for value and Q functions
  Solving the Bellman equation
    Dynamic programming
      Value iteration
      Policy iteration
  Solving the frozen lake problem
    Value iteration
    Policy iteration
  Summary
  Questions
  Further reading
Gaming with Monte Carlo Methods
  Monte Carlo methods
    Estimating the value of pi using Monte Carlo
  Monte Carlo prediction
    First visit Monte Carlo
    Every visit Monte Carlo
    Let's play Blackjack with Monte Carlo
  Monte Carlo control
    Monte Carlo exploration starts
    On-policy Monte Carlo control
    Off-policy Monte Carlo control
  Summary
  Questions
  Further reading
Temporal Difference Learning
  TD learning
  TD prediction
  TD control
    Q learning
      Solving the taxi problem using Q learning
    SARSA
      Solving the taxi problem using SARSA
  The difference between Q learning and SARSA
  Summary
  Questions
  Further reading
Multi-Armed Bandit Problem
  The MAB problem
    The epsilon-greedy policy
    The softmax exploration algorithm
    The upper confidence bound algorithm
    The Thompson sampling algorithm
  Applications of MAB
  Identifying the right advertisement banner using MAB
  Contextual bandits
  Summary
  Questions
  Further reading
Deep Learning Fundamentals
  Artificial neurons
  ANNs
    Input layer
    Hidden layer
    Output layer
    Activation functions
  Deep diving into ANN
    Gradient descent
  Neural networks in TensorFlow
  RNN
    Backpropagation through time
  Long Short-Term Memory RNN
    Generating song lyrics using LSTM RNN
  Convolutional neural networks
    Convolutional layer
    Pooling layer
    Fully connected layer
    CNN architecture
  Classifying fashion products using CNN
  Summary
  Questions
  Further reading
Atari Games with Deep Q Network
  What is a Deep Q Network?
  Architecture of DQN
    Convolutional network
    Experience replay
    Target network
    Clipping rewards
    Understanding the algorithm
  Building an agent to play Atari games
  Double DQN
  Prioritized experience replay
  Dueling network architecture
  Summary
  Questions
  Further reading
Playing Doom with a Deep Recurrent Q Network
  DRQN
    Architecture of DRQN
  Training an agent to play Doom
    Basic Doom game
    Doom with DRQN
  DARQN
    Architecture of DARQN
  Summary
  Questions
  Further reading
The Asynchronous Advantage Actor Critic Network
  The Asynchronous Advantage Actor Critic
    The three As
    The architecture of A3C
    How A3C works
  Driving up a mountain with A3C
    Visualization in TensorBoard
  Summary
  Questions
  Further reading
Policy Gradients and Optimization
  Policy gradient
    Lunar Lander using policy gradients
  Deep deterministic policy gradient
    Swinging a pendulum
  Trust Region Policy Optimization
  Proximal Policy Optimization
  Summary
  Questions
  Further reading
Capstone Project – Car Racing Using DQN
  Environment wrapper functions
  Dueling network
  Replay memory
  Training the network
  Car racing
  Summary
  Questions
  Further reading
Recent Advancements and Next Steps
  Imagination augmented agents
  Learning from human preference
  Deep Q learning from demonstrations
  Hindsight experience replay
  Hierarchical reinforcement learning
    MAXQ Value Function Decomposition
  Inverse reinforcement learning
  Summary
  Questions
  Further reading
Assessments
  Chapter 1
  Chapter 2
  Chapter 3
  Chapter 4
  Chapter 5
  Chapter 6
  Chapter 7
  Chapter 8
  Chapter 9
  Chapter 10
  Chapter 11
  Chapter 12
  Chapter 13
Other Books You May Enjoy
  Leave a review - let other readers know what you think