Index
The Reinforcement Learning Workshop
Preface
About the Book
Audience
About the Chapters
Conventions
Code Presentation
Setting up Your Environment
Installing Anaconda for Jupyter Notebook
Installing a Virtual Environment
Installing Gym
Installing TensorFlow 2
Installing PyTorch
Installing OpenAI Baselines
Installing Pillow
Installing Torch
Installing Other Libraries
Accessing the Code Files
1. Introduction to Reinforcement Learning
Introduction
Learning Paradigms
Introduction to Learning Paradigms
Supervised versus Unsupervised versus RL
Classifying Common Problems into Learning Scenarios
Predicting Whether an Image Contains a Dog or a Cat
Detecting and Classifying All Dogs and Cats in an Image
Playing Chess
Fundamentals of Reinforcement Learning
Elements of RL
Agent
Actions
Environment
Policy
An Example of an Autonomous Driving Environment
Exercise 1.01: Implementing a Toy Environment Using Python
The Agent-Environment Interface
What's the Agent?
What's in the Environment?
Environment Types
Finite versus Continuous
Deterministic versus Stochastic
Fully Observable versus Partially Observable
POMDP versus MDP
Single Agents versus Multiple Agents
An Action and Its Types
Policy
Stochastic Policies
Policy Parameterizations
Exercise 1.02: Implementing a Linear Policy
Goals and Rewards
Why Discount?
Reinforcement Learning Frameworks
OpenAI Gym
Getting Started with Gym – CartPole
Gym Spaces
Exercise 1.03: Creating a Space for Image Observations
Rendering an Environment
Rendering CartPole
A Reinforcement Learning Loop with Gym
Exercise 1.04: Implementing the Reinforcement Learning Loop with Gym
Activity 1.01: Measuring the Performance of a Random Agent
OpenAI Baselines
Getting Started with Baselines – DQN on CartPole
Applications of Reinforcement Learning
Games
Go
Dota 2
StarCraft
Robot Control
Autonomous Driving
Summary
2. Markov Decision Processes and Bellman Equations
Introduction
Markov Processes
The Markov Property
Markov Chains
Markov Reward Processes
Value Functions and Bellman Equations for MRPs
Solving Linear Systems of an Equation Using SciPy
Exercise 2.01: Finding the Value Function in an MRP
Markov Decision Processes
The State-Value Function and the Action-Value Function
Bellman Optimality Equation
Solving the Bellman Optimality Equation
Solving MDPs
Algorithm Categorization
Value-Based Algorithms
Policy Search Algorithms
Linear Programming
Exercise 2.02: Determining the Best Policy for an MDP Using Linear Programming
Gridworld
Activity 2.01: Solving Gridworld
Summary
3. Deep Learning in Practice with TensorFlow 2
Introduction
An Introduction to TensorFlow and Keras
TensorFlow
Keras
Exercise 3.01: Building a Sequential Model with the Keras High-Level API
How to Implement a Neural Network Using TensorFlow
Model Creation
Model Training
Loss Function Definition
Optimizer Choice
Learning Rate Scheduling
Feature Normalization
Model Validation
Performance Metrics
Model Improvement
Overfitting
Regularization
Early Stopping
Dropout
Data Augmentation
Batch Normalization
Model Testing and Inference
Standard Fully Connected Neural Networks
Exercise 3.02: Building a Fully Connected Neural Network Model with the Keras High-Level API
Convolutional Neural Networks
Exercise 3.03: Building a Convolutional Neural Network Model with the Keras High-Level API
Recurrent Neural Networks
Exercise 3.04: Building a Recurrent Neural Network Model with the Keras High-Level API
Simple Regression Using TensorFlow
Exercise 3.05: Creating a Deep Neural Network to Predict the Fuel Efficiency of Cars
Simple Classification Using TensorFlow
Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson
TensorBoard – How to Visualize Data Using TensorBoard
Exercise 3.07: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson Using TensorBoard for Visualization
Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
Summary
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
Introduction
OpenAI Gym
How to Interact with a Gym Environment
Exercise 4.01: Interacting with the Gym Environment
Action and Observation Spaces
How to Implement a Custom Gym Environment
OpenAI Universe – Complex Environment
OpenAI Universe Infrastructure
Environments
Atari Games
Flash Games
Browser Tasks
Running an OpenAI Universe Environment
Validating the Universe Infrastructure
TensorFlow for Reinforcement Learning
Implementing a Policy Network Using TensorFlow
Exercise 4.02: Building a Policy Network with TensorFlow
Exercise 4.03: Feeding the Policy Network with Environment State Representation
How to Save a Policy Network
OpenAI Baselines
Proximal Policy Optimization
Command-Line Usage Methods in OpenAI Baselines
Custom Policy Network Architecture
Training an RL Agent to Solve a Classic Control Problem
Exercise 4.04: Solving a CartPole Environment with the PPO Algorithm
Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
Summary
5. Dynamic Programming
Introduction
Solving Dynamic Programming Problems
Memoization
The Tabular Method
Exercise 5.01: Memoization in Practice
Exercise 5.02: The Tabular Method in Practice
Identifying Dynamic Programming Problems
Optimal Substructures
Overlapping Subproblems
The Coin-Change Problem
Exercise 5.03: Solving the Coin-Change Problem
Dynamic Programming in RL
Policy and Value Iteration
State-Value Functions
Action-Value Functions
OpenAI Gym: Taxi-v3 Environment
Policy Iteration
Value Iteration
The FrozenLake-v0 Environment
Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
Summary
6. Monte Carlo Methods
Introduction
The Workings of Monte Carlo Methods
Understanding Monte Carlo with Blackjack
Exercise 6.01: Implementing Monte Carlo in Blackjack
Types of Monte Carlo Methods
First Visit Monte Carlo Prediction for Estimating the Value Function
Exercise 6.02: First Visit Monte Carlo Prediction for Estimating the Value Function in Blackjack
Every Visit Monte Carlo Prediction for Estimating the Value Function
Exercise 6.03: Every Visit Monte Carlo Prediction for Estimating the Value Function
Exploration versus Exploitation Trade-Off
Importance Sampling
The Pseudocode for Monte Carlo Off-Policy Evaluation
Exercise 6.04: Importance Sampling with Monte Carlo
Solving Frozen Lake Using Monte Carlo
Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
The Pseudocode for Every Visit Monte Carlo Control for Epsilon Soft
Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
Summary
7. Temporal Difference Learning
Introduction to TD Learning
TD(0) – SARSA and Q-Learning
SARSA – On-Policy Control
Exercise 7.01: Using TD(0) SARSA to Solve FrozenLake-v0 Deterministic Transitions
The Stochasticity Test
Exercise 7.02: Using TD(0) SARSA to Solve FrozenLake-v0 Stochastic Transitions
Q-Learning – Off-Policy Control
Exercise 7.03: Using TD(0) Q-Learning to Solve FrozenLake-v0 Deterministic Transitions
Expected SARSA
N-Step TD and TD(λ) Algorithms
N-Step TD
N-Step SARSA
N-Step Off-Policy Learning
TD(λ)
SARSA(λ)
Exercise 7.04: Using TD(λ) SARSA to Solve FrozenLake-v0 Deterministic Transitions
Exercise 7.05: Using TD(λ) SARSA to Solve FrozenLake-v0 Stochastic Transitions
The Relationship between DP, Monte-Carlo, and TD Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
Summary
8. The Multi-Armed Bandit Problem
Introduction
Formulation of the MAB Problem
Applications of the MAB Problem
Background and Terminology
MAB Reward Distributions
The Python Interface
The Greedy Algorithm
Implementing the Greedy Algorithm
The Explore-then-Commit Algorithm
The ε-Greedy Algorithm
Exercise 8.01: Implementing the ε-Greedy Algorithm
The Softmax Algorithm
The UCB Algorithm
Optimism in the Face of Uncertainty
Other Properties of UCB
Exercise 8.02: Implementing the UCB Algorithm
Thompson Sampling
Introduction to Bayesian Probability
The Thompson Sampling Algorithm
Exercise 8.03: Implementing the Thompson Sampling Algorithm
Contextual Bandits
Context That Defines a Bandit Problem
Queueing Bandits
Working with the Queueing API
Activity 8.01: Queueing Bandits
Summary
9. What Is Deep Q-Learning?
Introduction
Basics of Deep Learning
Basics of PyTorch
Exercise 9.01: Building a Simple Deep Learning Model in PyTorch
PyTorch Utilities
The view Function
The squeeze Function
The unsqueeze Function
The max Function
The gather Function
The State-Value Function and the Bellman Equation
Expected Value
The Value Function
The Value Function for a Deterministic Environment
The Value Function for a Stochastic Environment
The Action-Value Function (Q Value Function)
Implementing Q Learning to Find Optimal Actions
Advantages of Q Learning
OpenAI Gym Review
Exercise 9.02: Implementing the Q Learning Tabular Method
Deep Q Learning
Exercise 9.03: Implementing a Working DQN Network with PyTorch in a CartPole-v0 Environment
Challenges in DQN
Correlation between Steps and the Convergence Issue
Experience Replay
The Challenge of a Non-Stationary Target
The Concept of a Target Network
Exercise 9.04: Implementing a Working DQN Network with Experience Replay and a Target Network in PyTorch
The Challenge of Overestimation in a DQN
Double Deep Q Network (DDQN)
Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
Summary
10. Playing an Atari Game with Deep Recurrent Q-Networks
Introduction
Understanding the Breakout Environment
Exercise 10.01: Playing Breakout with a Random Agent
CNNs in TensorFlow
Exercise 10.02: Designing a CNN Model with TensorFlow
Combining a DQN with a CNN
Activity 10.01: Training a DQN with CNNs to Play Breakout
RNNs in TensorFlow
Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow
Building a DRQN
Activity 10.02: Training a DRQN to Play Breakout
Introduction to the Attention Mechanism and DARQN
Activity 10.03: Training a DARQN to Play Breakout
Summary
11. Policy-Based Methods for Reinforcement Learning
Introduction
Introduction to Value-Based and Model-Based RL
Introduction to Actor-Critic Model
Policy Gradients
Exercise 11.01: Landing a Spacecraft on the Lunar Surface Using Policy Gradients and the Actor-Critic Method
Deep Deterministic Policy Gradients
Ornstein-Uhlenbeck Noise
The ReplayBuffer Class
The Actor-Critic Model
Exercise 11.02: Creating a Learning Agent
Activity 11.01: Creating an Agent That Learns a Model Using DDPG
Improving Policy Gradients
Trust Region Policy Optimization
Proximal Policy Optimization
Exercise 11.03: Improving the Lunar Lander Example Using PPO
The Advantage Actor-Critic Method
Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
Summary
12. Evolutionary Strategies for RL
Introduction
Problems with Gradient-Based Methods
Exercise 12.01: Optimization Using Stochastic Gradient Descent
Introduction to Genetic Algorithms
Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs
Components: Population Creation
Exercise 12.03: Population Creation
Components: Parent Selection
Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques
Components: Crossover Application
Exercise 12.05: Crossover for a New Generation
Components: Population Mutation
Exercise 12.06: New Generation Development Using Mutation
Application to Hyperparameter Selection
Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training
NEAT and Other Formulations
Exercise 12.08: XNOR Gate Functionality Using NEAT
Activity 12.01: Cart-Pole Activity
Summary
Appendix
1. Introduction to Reinforcement Learning
Activity 1.01: Measuring the Performance of a Random Agent
2. Markov Decision Processes and Bellman Equations
Activity 2.01: Solving Gridworld
3. Deep Learning in Practice with TensorFlow 2
Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
5. Dynamic Programming
Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
6. Monte Carlo Methods
Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
7. Temporal Difference Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
8. The Multi-Armed Bandit Problem
Activity 8.01: Queueing Bandits
9. What Is Deep Q-Learning?
Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
10. Playing an Atari Game with Deep Recurrent Q-Networks
Activity 10.01: Training a DQN with CNNs to Play Breakout
Activity 10.02: Training a DRQN to Play Breakout
Activity 10.03: Training a DARQN to Play Breakout
11. Policy-Based Methods for Reinforcement Learning
Activity 11.01: Creating an Agent That Learns a Model Using DDPG
Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
12. Evolutionary Strategies for RL
Activity 12.01: Cart-Pole Activity